What is linear regression at a high level?
A model that assumes the expected value of a response variable Y is a linear combination of input features X plus an error term.
What is the standard form of a multiple linear regression model?
Y = β₀ + β₁X₁ + ··· + β_pX_p + ε, where ε is an error term with mean zero.
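A minimal numpy sketch of generating data from this model (the coefficients and noise scale here are illustrative choices, not part of the definition):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)

beta0, beta1, beta2 = 2.0, 0.5, -1.5     # illustrative true coefficients
eps = rng.normal(scale=0.3, size=n)      # error term with mean zero

# Response is a linear combination of the features plus noise
Y = beta0 + beta1 * X1 + beta2 * X2 + eps
```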
In linear regression, what do the coefficients β represent?
They quantify the expected change in the response Y associated with a one-unit change in the predictor X_j, holding the other predictors constant.

What is the role of the error term ε in the linear regression model?
It captures unexplained variability in Y due to measurement noise, omitted variables, and inherent randomness.
What is the main goal when fitting a linear regression model?
To estimate coefficients β that best explain the relationship between X and Y according to some optimality criterion, usually minimizing squared error.
What is ordinary least squares (OLS)?
A method of estimating β by minimizing the sum of squared residuals between observed Y and model predictions.
What is a residual in regression?
The difference between the observed value of Y and the predicted value Ŷ for a given observation.
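A sketch of fitting a simple OLS model with numpy and computing the residuals (the data-generating numbers are arbitrary). With an intercept in the model, OLS residuals sum to zero:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)

X = np.column_stack([np.ones_like(x), x])       # design matrix with intercept column
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat                            # fitted values Ŷ
residuals = y - y_hat                           # observed minus predicted
```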
Why is the sum of squared residuals used in OLS?
It penalizes larger errors more heavily, leads to convenient analytic solutions, and corresponds to maximum likelihood under Gaussian noise.
Under what distributional assumption does OLS coincide with maximum likelihood estimation?
When the errors ε are assumed i.i.d. Normal with mean zero and constant variance.
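To see the equivalence, write the Gaussian log-likelihood of the observations under the model (a standard derivation; x_i denotes the i-th row of the design matrix):

```latex
\ell(\beta, \sigma^2)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\bigl(y_i - x_i^\top \beta\bigr)^2
```

For any fixed σ², maximizing ℓ over β is equivalent to minimizing Σᵢ (yᵢ − xᵢᵀβ)², which is exactly the OLS criterion.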
What does ‘linear in parameters’ mean?
The model is linear with respect to the coefficients β, even if it includes nonlinear transformations of inputs (e.g., X², log X).
How can we model nonlinear relationships within linear regression?
By including engineered features such as polynomial terms, interactions, or transformations of the original inputs.
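A sketch of fitting a quadratic relationship with plain OLS (the true curve and noise level are made up for illustration). The model stays linear in β even though the features are nonlinear in x:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=80)
y = 1.0 - 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.2, size=80)

# Engineered design matrix: intercept, x, and the polynomial term x^2
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```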
What is the design matrix X in linear regression?
An n×(p+1) matrix whose rows are observations and columns are predictors (including a column of ones for the intercept).
What is the closed-form OLS solution in matrix notation (when XᵀX is invertible)?
β̂ = (XᵀX)⁻¹ Xᵀ y.
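A sketch of the closed-form solution on simulated data (coefficients chosen arbitrarily). Solving the normal equations XᵀXβ = Xᵀy directly is numerically preferable to forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(60), rng.normal(size=(60, 2))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=60)

# Normal equations: (X^T X) beta_hat = X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```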
What are the common assumptions of the classical linear regression model?
Linearity in parameters, independence of errors, errors with mean zero, constant error variance (homoscedasticity), and often Normality of errors for inference.
What does homoscedasticity mean?
That the variance of the errors is constant across all levels of the predictors.
What is heteroscedasticity?
A situation where error variance changes with the level of predictors, violating the homoscedasticity assumption.
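A sketch of simulating a classic heteroscedastic pattern, where the noise scale grows with the predictor (all constants are illustrative). Comparing the residual spread on the low-x and high-x halves makes the violation visible:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, size=200)
# Error spread grows with x: variance is not constant across predictor levels
y = 3.0 + 0.8 * x + rng.normal(scale=0.2 * x)

X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
res = y - X @ beta_hat

low, high = res[x < 5.5], res[x >= 5.5]   # residuals where x is small vs large
```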
Why is heteroscedasticity problematic for classical inference?
The usual standard-error formulas assume constant error variance, so standard errors, confidence intervals, and hypothesis tests become invalid; the OLS coefficient estimates themselves remain unbiased (given a correctly specified mean), but are no longer efficient.
What is multicollinearity in linear regression?
A condition where some predictors are highly correlated with each other, making coefficient estimates unstable and difficult to interpret.
How does severe multicollinearity affect OLS estimates?
Small changes in data can cause large swings in estimated coefficients, and standard errors become large.
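A sketch demonstrating near-collinearity with a predictor that nearly duplicates another (the 0.01 noise scale is an arbitrary choice). The condition number of XᵀX blows up, which is exactly what destabilizes the coefficient estimates:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly identical to x1

X = np.column_stack([np.ones(n), x1, x2])
# A huge condition number signals severe multicollinearity
cond = np.linalg.cond(X.T @ X)
```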
Does multicollinearity necessarily harm predictive performance?
Not always; predictions can still be good, but interpretability of individual coefficients suffers.
What is R² (coefficient of determination)?
The proportion of variability in Y explained by the regression model relative to a baseline that predicts the mean of Y.
How is R² interpreted?
R²=1 means the model explains all variability in Y; R²=0 means it does no better than predicting the mean for all observations.
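A sketch of computing R² from its definition on simulated data (the true slope and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=100)

X = np.column_stack([np.ones_like(x), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

ss_res = np.sum((y - y_hat) ** 2)       # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot                # proportion of variability explained
```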
Why can R² be misleading when comparing models with different numbers of predictors?
R² never decreases when new predictors are added, even if they offer no real explanatory power, encouraging overfitting.
What is adjusted R²?
A modified version of R² that penalizes additional predictors, providing a more balanced comparison across models.
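The standard formula, sketched as a small helper (n is the number of observations, p the number of predictors excluding the intercept). Adjusted R² is always below plain R² whenever p ≥ 1 and R² < 1:

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
```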