Unit 1: Simple Linear Regression Flashcards

To understand the primary terminology and core content of Simple Linear Regression

1
Q

Why do we use simple linear regression?

A

To model a quantitative response variable Y as a linear function of a single predictor variable X

2
Q

What is Covariance (SXY)?

A

Covariance describes the joint behavior of two random variables (X and Y).
The sign indicates the direction of the linear relationship, but the magnitude does not tell us the strength because covariance depends on the units of X and Y.
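
The unit dependence can be illustrated with a minimal sketch. The data and the helper name `sample_covariance` are hypothetical, chosen only for illustration; the n - 1 denominator is the usual sample convention.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: heights in metres, weights in kilograms.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [50.0, 58.0, 65.0, 72.0, 85.0]

cov_m = sample_covariance(heights_m, weights_kg)

# The same heights expressed in centimetres.
heights_cm = [100 * h for h in heights_m]
cov_cm = sample_covariance(heights_cm, weights_kg)

# Same data, same relationship, but the covariance is about 100 times larger.
print(cov_m, cov_cm)
```

Both values are positive, so the direction is the same, but the magnitude changes with the units and says nothing about strength on its own.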

3
Q

What is the correlation coefficient (R) and what does it tell us?

A

The correlation coefficient (R) measures the linear relationship between two quantitative variables and falls between -1 and 1. The sign of R gives the direction of the linear relationship and its magnitude gives the strength: values near -1 or 1 indicate a strong linear relationship, and values near 0 indicate a weak or absent one.
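
As a rough sketch of why R is easier to interpret than covariance, the snippet below computes it on hypothetical data and shows that rescaling X leaves it unchanged. The function name `correlation` and the data are assumptions made for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: heights in metres, weights in kilograms.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [50.0, 58.0, 65.0, 72.0, 85.0]

r_m = correlation(heights_m, weights_kg)
# Changing units (metres to centimetres) does not change R.
r_cm = correlation([100 * h for h in heights_m], weights_kg)
# Flipping the sign of one variable flips the sign of R.
r_neg = correlation(heights_m, [-w for w in weights_kg])

print(r_m, r_cm, r_neg)
```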

4
Q

What is the coefficient of determination (R²)? What can it tell you about the linear relationship?

A

The coefficient of determination (R²) = SSM/SST.

It is the proportion of the variability in Y explained by the linear association with X. It falls between 0 and 1. In SLR it equals the square of the correlation coefficient R.
It can tell you the strength of the relationship but not the direction.
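
The ratio SSM/SST can be sketched on a small hypothetical data set. The helper `fit_line` and the data are illustrative assumptions; the fit uses the standard least squares formulas.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
fitted = [b0 + b1 * x for x in xs]
y_bar = sum(ys) / len(ys)

sst = sum((y - y_bar) ** 2 for y in ys)               # total variability
ssm = sum((f - y_bar) ** 2 for f in fitted)           # explained by the model
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # left in the residuals

r_squared = ssm / sst
print(r_squared)
```

Because SST = SSM + SSE for a least squares fit with an intercept, the same value can be written as 1 - SSE/SST.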

5
Q

If the covariance of two variables = 0, what can you say about the independence of the variables?

A

You cannot conclude that the variables are independent just because the covariance is 0. You can only conclude that there is no linear relationship between them; a nonlinear relationship may still exist. If two variables are KNOWN to be independent, then the covariance equals 0, but you cannot assume independence when the covariance is 0.
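
A small counterexample makes the point concrete. In this sketch (hypothetical data, illustrative helper name), Y is completely determined by X, yet the covariance is 0 because the relationship is not linear.

```python
def sample_covariance(xs, ys):
    """Sample covariance of two equal-length lists (n - 1 denominator)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# X is symmetric around 0 and Y = X squared, so Y depends entirely on X.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

cov = sample_covariance(xs, ys)
print(cov)  # zero covariance despite perfect (nonlinear) dependence
```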

6
Q

What is Fisher’s Z Transformation?

A
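It is a variance-stabilizing transformation of the sample correlation that allows you to construct a confidence interval for any ρ (rho). It can also be used to test the null hypothesis that ρ = ρ0 (rho equals a hypothesized value) for any ρ0 not equal to 0.
Intervals for ρ built this way are more accurate near the boundaries (ρ close to -1 or 1), where the sampling distribution of the sample correlation is skewed.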
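
A minimal sketch of the interval and test, assuming the usual large-sample result that the transformed correlation atanh(r) is approximately normal with standard error 1/sqrt(n - 3). The function names, the 1.96 critical value (95% interval), and the inputs r = 0.9, n = 28 are illustrative assumptions.

```python
import math

def fisher_z_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for rho via Fisher's Z transformation."""
    z = math.atanh(r)                 # Fisher's Z: 0.5 * ln((1 + r) / (1 - r))
    se = 1 / math.sqrt(n - 3)         # approximate standard error on the Z scale
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)   # back-transform to the rho scale

def fisher_z_statistic(r, n, rho0):
    """Approximate standard normal statistic for H0: rho = rho0."""
    return (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)

# Hypothetical sample correlation and sample size.
lower, upper = fisher_z_interval(r=0.9, n=28)
z_stat = fisher_z_statistic(r=0.9, n=28, rho0=0.7)
print(lower, upper, z_stat)
```

The interval is symmetric on the Z scale but not on the ρ scale: near the boundary it extends further toward 0 than toward 1, and it can never leave the range -1 to 1.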
7
Q

What are residuals?

A

Estimated error = observed Y - predicted (fitted) Y
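
As a short illustration, the residuals for a fitted line can be computed directly from this definition. The data are hypothetical, and the coefficients 0.05 and 1.99 are assumed to be the least squares estimates for that data.

```python
def residuals(xs, ys, b0, b1):
    """Observed Y minus the value predicted by the line b0 + b1 * x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical data and its (assumed) least squares fit.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = 0.05, 1.99

res = residuals(xs, ys, b0, b1)
print(res)
# For a least squares line with an intercept, the residuals sum to
# (numerically) zero.
print(sum(res))
```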

8
Q

What are the hypotheses for the overall F test for SLR?

A

H0: B1 = 0 (The slope of X is 0, so the intercept-only model is adequate)
H1: B1 ≠ 0 (The slope of X is not 0, so the model with X is the better model)

9
Q

What are the assumptions for SLR?

A

Linearity (the mean of Y is a linear function of X)
Independence of the errors
Normality of the errors
Homoskedasticity (the errors have constant variance)

10
Q

What do violations of SLR assumptions look like?

A

Curved shape in the residual plot (nonlinearity)
Fanning shape in the residual plot (heteroskedasticity of the residuals)

11
Q

What do we do when assumptions are violated?

A

For minor violations, proceed with the analysis, because inference is robust to minor deviations from the assumptions when n is large.

For major violations, consider variable transformations or adding higher-order polynomial terms.

For clear trends, consider adding predictors (MLR).

For heteroskedasticity, consider advanced regression techniques.

12
Q

What causes the Coefficient of Determination (R²) to increase?

A
Increase in SSM
Increase in MSM
Decrease in SSE
Decrease in Residual Variance (σ²)
Stronger Linear relationship between X and Y
13
Q

What causes the Coefficient of Determination (R²) to decrease?

A
Decrease in SSM
Decrease in MSM
Increase in SSE
Increase in Residual Variance (σ²)
Weaker linear relationship between X and Y
14
Q

What are outliers?

A

Outliers are observations that lie far from the rest of the data. They include points of high leverage and influential points.

15
Q

Why do we use the method of least squares?

A

It has a “closed form” solution
The estimates (B0 & B1) are identical to the Maximum Likelihood Estimates (MLE) when the errors are normally distributed
The estimates are unbiased and have the smallest possible variance among linear unbiased estimators
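
For reference, the closed form can be written out as follows. The hat notation and the indexing over n observations are notational assumptions; the estimates correspond to B1 and B0 above.

```latex
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \, \bar{x}
\]
```

No iterative search is needed: both estimates follow directly from the sample means and the sums of squares and cross-products.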

16
Q

What three tests are identical in SLR?

A
  1. T-test for correlation (H0: ρ = 0)
  2. T-test for slope (H0: B1 = 0)
  3. F-test for overall model fit (H0: Y = B0 + E)
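
The equivalence can be checked numerically with a rough sketch on hypothetical data. The function name `slr_test_statistics` is an illustrative assumption; the statistics themselves use the standard SLR formulas with n - 2 error degrees of freedom.

```python
import math

def slr_test_statistics(xs, ys):
    """Return (t for slope, t for correlation, F for overall fit) in SLR."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b1 = sxy / sxx                    # least squares slope
    ssm = b1 * sxy                    # model sum of squares
    sse = syy - ssm                   # error sum of squares
    mse = sse / (n - 2)               # estimate of the residual variance

    t_slope = b1 / math.sqrt(mse / sxx)
    r = sxy / math.sqrt(sxx * syy)
    t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    f_stat = ssm / mse                # MSM = SSM / 1 in SLR
    return t_slope, t_corr, f_stat

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

t_slope, t_corr, f_stat = slr_test_statistics(xs, ys)
# The two t statistics agree, and the F statistic is their square.
print(t_slope, t_corr, f_stat)
```

Since the t statistics are equal and F = t², all three tests give the same p-value.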