Linear Regression as a Statistical Model Flashcards

(36 cards)

1
Q

What is linear regression at a high level?

A

A model that assumes the expected value of a response variable Y is a linear combination of input features X plus an error term.

2
Q

What is the standard form of a multiple linear regression model?

A

Y = β₀ + β₁X₁ + ··· + β_pX_p + ε, where ε is an error term with mean zero.

3
Q

In linear regression, what do the coefficients β represent?

A

Each coefficient βⱼ quantifies the expected change in the response Y associated with a one-unit increase in the predictor Xⱼ, holding the other predictors constant.

4
Q

What is the role of the error term ε in the linear regression model?

A

It captures unexplained variability in Y due to measurement noise, omitted variables, and inherent randomness.

5
Q

What is the main goal when fitting a linear regression model?

A

To estimate coefficients β that best explain the relationship between X and Y according to some optimality criterion, usually minimizing squared error.

6
Q

What is ordinary least squares (OLS)?

A

A method of estimating β by minimizing the sum of squared residuals between observed Y and model predictions.

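As a minimal sketch with hypothetical toy data (NumPy assumed, and the true coefficients 2 and 3 invented for illustration), OLS amounts to minimizing the sum of squared residuals:

```python
import numpy as np

# Toy data: y is roughly 2 + 3x plus Gaussian noise (values are made up).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=50)

# Design matrix with an intercept column of ones.
X = np.column_stack([np.ones_like(x), x])

# OLS: minimize the sum of squared residuals ||y - X @ beta||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # estimates should land close to [2, 3]
```

`np.linalg.lstsq` solves the least-squares problem directly; the recovered coefficients should be close to the values used to generate the data.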
7
Q

What is a residual in regression?

A

The difference between the observed value of Y and the predicted value Ŷ for a given observation.

8
Q

Why is the sum of squared residuals used in OLS?

A

It penalizes larger errors more heavily, leads to convenient analytic solutions, and corresponds to maximum likelihood under Gaussian noise.

9
Q

Under what distributional assumption does OLS coincide with maximum likelihood estimation?

A

When the errors ε are assumed i.i.d. Normal with mean zero and constant variance.

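As a sketch of why this holds: with i.i.d. Gaussian errors, the log-likelihood depends on β only through the sum of squared residuals.

```latex
% Assume \varepsilon_i \sim \mathcal{N}(0, \sigma^2) i.i.d.,
% so y_i \sim \mathcal{N}(x_i^\top \beta, \sigma^2). The log-likelihood is
\ell(\beta, \sigma^2)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - x_i^\top\beta\bigr)^2 .
% For any fixed \sigma^2, maximizing \ell over \beta is equivalent to
% minimizing \sum_i (y_i - x_i^\top\beta)^2, which is the OLS criterion.
```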
10
Q

What does ‘linear in parameters’ mean?

A

The model is linear with respect to the coefficients β, even if it includes nonlinear transformations of inputs (e.g., X², log X).

11
Q

How can we model nonlinear relationships within linear regression?

A

By including engineered features such as polynomial terms, interactions, or transformations of the original inputs.

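A small illustration with invented data (NumPy assumed): a quadratic relationship fit by ordinary linear regression, simply by adding an x² column. The model remains linear in the coefficients.

```python
import numpy as np

# Hypothetical nonlinear data: y = 1 + 0.5x - 0.2x^2 plus small noise.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=80)
y = 1.0 + 0.5 * x - 0.2 * x**2 + rng.normal(0, 0.1, size=80)

# Engineered features: the model is still linear in beta,
# even though the x^2 column makes it nonlinear in x.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The same OLS machinery recovers the intercept, linear, and quadratic coefficients; only the feature matrix changed.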
12
Q

What is the design matrix X in linear regression?

A

An n×(p+1) matrix whose rows are observations and columns are predictors (including a column of ones for the intercept).

13
Q

What is the closed-form OLS solution in matrix notation (when XᵀX is invertible)?

A

β̂ = (XᵀX)⁻¹ Xᵀ y.

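A sketch of the normal equations on simulated data (NumPy assumed; the true coefficient vector below is invented). The explicit inverse mirrors the formula; in practice a linear solve is preferred for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # n x (p+1) design matrix
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(0, 0.5, size=n)

# Closed-form OLS via the normal equations (valid when X^T X is invertible).
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Numerically preferable: solve the linear system instead of forming the inverse.
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)
```

Both routes give the same β̂, close to the coefficients that generated the data.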
14
Q

What are the common assumptions of the classical linear regression model?

A

Linearity in parameters, independence of errors, errors with mean zero, constant error variance (homoscedasticity), and often Normality of errors for inference.

15
Q

What does homoscedasticity mean?

A

That the variance of the errors is constant across all levels of the predictors.

16
Q

What is heteroscedasticity?

A

A situation where error variance changes with the level of predictors, violating the homoscedasticity assumption.

17
Q

Why is heteroscedasticity problematic for classical inference?

A

It can make standard errors and confidence intervals from basic formulas invalid, even if OLS estimates remain unbiased under certain conditions.

18
Q

What is multicollinearity in linear regression?

A

A condition where some predictors are highly correlated with each other, making coefficient estimates unstable and difficult to interpret.

19
Q

How does severe multicollinearity affect OLS estimates?

A

Small changes in data can cause large swings in estimated coefficients, and standard errors become large.

20
Q

Does multicollinearity necessarily harm predictive performance?

A

Not always; predictions can still be good, but interpretability of individual coefficients suffers.
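A hypothetical demonstration (NumPy assumed, data invented): two nearly identical predictors make individual coefficients poorly identified, while the quantity that drives predictions stays well determined.

```python
import numpy as np

# x2 is almost a copy of x1, so the design is nearly collinear.
rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.01, size=n)            # nearly collinear with x1
y = 1.0 + x1 + x2 + rng.normal(0, 0.5, size=n)   # true coefficients sum to 2

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# A large condition number of the design matrix signals multicollinearity.
cond = np.linalg.cond(X)

# beta_hat[1] and beta_hat[2] individually can land far from 1,
# but their sum (which drives predictions) is estimated precisely.
print(cond, beta_hat[1] + beta_hat[2])
```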

21
Q

What is R² (coefficient of determination)?

A

The proportion of variability in Y explained by the regression model relative to a baseline that predicts the mean of Y.

22
Q

How is R² interpreted?

A

R² = 1 means the model explains all variability in Y; R² = 0 means it does no better than predicting the mean for all observations.

23
Q

Why can R² be misleading when comparing models with different numbers of predictors?

A

R² never decreases when new predictors are added, even if they offer no real explanatory power, encouraging overfitting.

24
Q

What is adjusted R²?

A

A modified version of R² that penalizes additional predictors, providing a more balanced comparison across models.
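Both quantities can be computed from scratch on simulated data (NumPy assumed; coefficients invented). The adjusted version rescales the unexplained fraction by the degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(0, 1.0, size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# R^2: explained variability relative to the predict-the-mean baseline.
ss_res = np.sum(resid**2)
ss_tot = np.sum((y - y.mean())**2)
r2 = 1 - ss_res / ss_tot

# Adjusted R^2 penalizes the number of predictors p.
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
```

Adjusted R² is always at most R², and the gap grows as predictors are added without explanatory payoff.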

25
Q

What is the Gauss–Markov theorem (intuitively)?

A

Under certain assumptions, OLS gives the best linear unbiased estimator (BLUE) of β, meaning it has the smallest variance among all linear unbiased estimators.
26
Q

What does 'best linear unbiased estimator' not guarantee?

A

It does not guarantee the best estimator overall: optimality holds only among linear unbiased estimators, and biased estimators (e.g., ridge regression) or nonlinear estimators can achieve lower mean squared error.
27
Q

What is the difference between in-sample and out-of-sample performance?

A

In-sample performance measures fit on the training data, while out-of-sample performance evaluates generalization to new data.

28
Q

Why should model selection be based on out-of-sample performance rather than just R² on the training set?

A

Because R² on training data can be inflated by overfitting, while out-of-sample performance reflects generalization ability.
29
Q

What is leverage in linear regression diagnostics?

A

A measure of how extreme an observation's predictor values are relative to others; high-leverage points can strongly influence the fitted line.

30
Q

What is an influential point?

A

An observation that, if removed, would significantly change the fitted regression; often identified using leverage and residual diagnostics.
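A sketch of how leverage is computed (NumPy assumed; the extreme value 8.0 is an invented outlier in predictor space): the leverages are the diagonal entries of the hat matrix.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
x = rng.normal(size=n)
x[0] = 8.0                     # one extreme predictor value -> high leverage
X = np.column_stack([np.ones(n), x])

# Hat matrix H = X (X^T X)^{-1} X^T; its diagonal entries are the leverages.
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)

# Leverages lie in [1/n, 1] and sum to the number of parameters (here 2).
print(leverage[0], leverage.sum())
```

The observation with the extreme x value dominates the leverage distribution, flagging it for closer diagnostic inspection.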
31
Q

Why are diagnostic plots important for regression?

A

They help detect violations of assumptions (nonlinearity, heteroscedasticity), outliers, and influential points that may distort results.

32
Q

What is the difference between interpolation and extrapolation in regression?

A

Interpolation predicts within the range of observed X values; extrapolation predicts outside that range, which is often risky and unreliable.
33
Q

Why can a linear regression with a high R² still be a poor model?

A

It might violate assumptions, be driven by outliers, capture spurious relationships, or be mis-specified for the question at hand.

34
Q

How does linear regression relate to least-squares linear classification?

A

If you threshold continuous predictions to classify, you effectively get a linear decision boundary, though logistic regression is usually more appropriate for classification.

35
Q

In ML practice, why is linear regression still widely used despite more complex models?

A

It is simple, fast, interpretable, and can perform competitively when paired with good feature engineering and regularization.

36
Q

In one sentence, what is the core mental model for linear regression as a statistical model?

A

It assumes that the average response is a linear function of features plus noise and estimates the coefficients by minimizing squared error under assumptions that justify inference and interpretation.