What is linear regression at a high level?
A model that assumes the expected value of a response variable Y is a linear combination of input features X plus an error term.
What is the standard form of a multiple linear regression model?
Y = β₀ + β₁X₁ + ··· + β_pX_p + ε, where ε is an error term with mean zero.
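A minimal numpy sketch of generating data from this model (the coefficients and noise scale here are illustrative choices, not part of the definition):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)

beta0, beta1, beta2 = 2.0, 0.5, -1.5     # illustrative true coefficients
eps = rng.normal(scale=0.3, size=n)      # error term with mean zero

# Response is a linear combination of the features plus noise
Y = beta0 + beta1 * X1 + beta2 * X2 + eps
```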
In linear regression, what do the coefficients β represent?
They quantify the expected change in the response Y associated with a one-unit change in the predictor X_j, holding the other predictors constant.

What is the role of the error term ε in the linear regression model?
It captures unexplained variability in Y due to measurement noise, omitted variables, and inherent randomness.
What is the main goal when fitting a linear regression model?
To estimate coefficients β that best explain the relationship between X and Y according to some optimality criterion, usually minimizing squared error.
What is ordinary least squares (OLS)?
A method of estimating β by minimizing the sum of squared residuals between observed Y and model predictions.
What is a residual in regression?
The difference between the observed value of Y and the predicted value Ŷ for a given observation.
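A sketch of fitting a simple OLS model with numpy and computing the residuals (the data-generating numbers are arbitrary). With an intercept in the model, OLS residuals sum to zero:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)

X = np.column_stack([np.ones_like(x), x])       # design matrix with intercept column
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat                            # fitted values Ŷ
residuals = y - y_hat                           # observed minus predicted
```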
Why is the sum of squared residuals used in OLS?
It penalizes larger errors more heavily, leads to convenient analytic solutions, and corresponds to maximum likelihood under Gaussian noise.
Under what distributional assumption does OLS coincide with maximum likelihood estimation?
When the errors ε are assumed i.i.d. Normal with mean zero and constant variance.
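To see the equivalence, write the Gaussian log-likelihood of the observations under the model (a standard derivation; x_i denotes the i-th row of the design matrix):

```latex
\ell(\beta, \sigma^2)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - x_i^\top \beta\bigr)^2
```

For any fixed σ², maximizing ℓ over β is equivalent to minimizing Σᵢ (yᵢ − xᵢᵀβ)², which is exactly the OLS criterion.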
What does ‘linear in parameters’ mean?
The model is linear with respect to the coefficients β, even if it includes nonlinear transformations of inputs (e.g., X², log X).
How can we model nonlinear relationships within linear regression?
By including engineered features such as polynomial terms, interactions, or transformations of the original inputs.
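A sketch of fitting a quadratic relationship with plain OLS (the true curve and noise level are made up for illustration). The model stays linear in β even though the features are nonlinear in x:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=80)
y = 1.0 - 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.2, size=80)

# Engineered design matrix: intercept, x, and the polynomial term x^2
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```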
What is the design matrix X in linear regression?
An n×(p+1) matrix whose rows are observations and columns are predictors (including a column of ones for the intercept).
What is the closed-form OLS solution in matrix notation (when XᵀX is invertible)?
β̂ = (XᵀX)⁻¹ Xᵀ y.
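A sketch of the closed-form solution on simulated data (coefficients chosen arbitrarily). Solving the normal equations XᵀXβ = Xᵀy directly is numerically preferable to forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=60)

# Normal equations: (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```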
What are the common assumptions of the classical linear regression model?
Linearity in parameters, independence of errors, errors with mean zero, constant error variance (homoscedasticity), and often Normality of errors for inference.
What does homoscedasticity mean?
That the variance of the errors is constant across all levels of the predictors.
What is heteroscedasticity?
A situation where error variance changes with the level of predictors, violating the homoscedasticity assumption.
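A sketch of simulating a classic heteroscedastic pattern, where the noise scale grows with the predictor (all constants are illustrative). Comparing the residual spread on the low-x and high-x halves makes the violation visible:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, size=200)
# Error spread grows with x: variance is not constant across predictor levels
y = 3.0 + 0.8 * x + rng.normal(scale=0.2 * x)

X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
res = y - X @ beta_hat

low, high = res[x < 5.5], res[x >= 5.5]   # residuals where x is small vs large
```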
Why is heteroscedasticity problematic for classical inference?
The usual standard-error formulas assume constant error variance, so standard errors, confidence intervals, and hypothesis tests become invalid; the OLS coefficient estimates themselves remain unbiased (given a correctly specified mean), but are no longer efficient.
What is multicollinearity in linear regression?
A condition where some predictors are highly correlated with each other, making coefficient estimates unstable and difficult to interpret.
How does severe multicollinearity affect OLS estimates?
Small changes in data can cause large swings in estimated coefficients, and standard errors become large.
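A sketch demonstrating near-collinearity with a predictor that nearly duplicates another (the 0.01 noise scale is an arbitrary choice). The condition number of XᵀX blows up, which is exactly what destabilizes the coefficient estimates:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly identical to x1

X = np.column_stack([np.ones(n), x1, x2])
# A huge condition number signals severe multicollinearity
cond = np.linalg.cond(X.T @ X)
```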
Does multicollinearity necessarily harm predictive performance?
Not always; predictions can still be good, but interpretability of individual coefficients suffers.
What is R² (coefficient of determination)?
The proportion of variability in Y explained by the regression model relative to a baseline that predicts the mean of Y.
How is R² interpreted?
R²=1 means the model explains all variability in Y; R²=0 means it does no better than predicting the mean for all observations.
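A sketch of computing R² from its definition on simulated data (the true slope and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)

X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot                # proportion of variability explained
```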
Why can R² be misleading when comparing models with different numbers of predictors?
R² never decreases when new predictors are added, even if they offer no real explanatory power, encouraging overfitting.
What is adjusted R²?
A modified version of R² that penalizes additional predictors, providing a more balanced comparison across models.
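The standard formula, sketched as a small helper (n is the number of observations, p the number of predictors excluding the intercept). Adjusted R² is always below plain R² whenever p ≥ 1 and R² < 1:

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```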