GLMs - Anderson Flashcards

1
Q

Problems with One-way analysis

A
  1. Potentially distorted by correlations among rating variables.
  2. Does not consider inter-dependencies between rating variables in the way they impact what is being modeled.
2
Q

Problems with Classical Linear Models

A
  1. It is difficult to assert Normality and constant variance for response variables.
  2. The values of the dependent variable (the Y variable) may be restricted to positive values, but the Normal distribution allows negative values, violating this restriction.
  3. If Y is always positive, then intuitively the variance of Y moves toward zero as the mean of Y moves toward zero, so the variance is related to the mean.
  4. The linear model only allows for additive relationships between predictor variables, but those might be inadequate to describe the response variable.
3
Q

Benefits of GLMs

A
  1. The statistical framework allows for explicit assumptions about the nature of the data and its relationship with predictive variables.
  2. The method of solving GLMs (maximum likelihood estimation) is more technically efficient than iterative methods (e.g., the iterative procedures used in minimum bias approaches).
  3. GLMs provide statistical diagnostics which aid in selecting only significant variables and validating model assumptions.
  4. GLMs adjust for correlations between variables and allow for interaction effects.
4
Q

Steps to solving a Classical Linear Model

A
  1. Set up the general equation in terms of Y, ß, and X’s.
  2. Write down an equation for each observation by replacing the X’s and Y with observed values in the data. You will have the same number of equations as observations in the data. For observation i, the equation may contain some ß values and will contain an error term error_i.
  3. Solve each equation for error_i.
  4. Calculate the equation for the Sum of Squared Errors (SSE) by plugging in the error_i expressions: SSE = Σ (i = 1 to n) error_i²
  5. Minimize the SSE by taking derivatives of it with respect to each ß and setting them equal to 0.
  6. Solve the system of equations for the ß values.
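
The steps above amount to solving the normal equations. A minimal NumPy sketch with made-up data (the X and Y values are purely illustrative):

```python
import numpy as np

# Hypothetical data: n = 5 observations, p = 2 covariates (made-up values).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([3.1, 2.9, 7.2, 6.8, 10.1])

# Minimizing SSE = sum of error_i^2 by setting the derivatives with respect
# to each beta to zero yields the normal equations: (X'X) beta = X'y.
beta = np.linalg.solve(X.T @ X, X.T @ y)

errors = y - X @ beta
sse = np.sum(errors ** 2)
print(beta, sse)
```

The closed-form solution is why classical linear models need no iteration: steps 5 and 6 collapse into one linear solve.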
5
Q

Components of Classical Linear Models

A
  1. Systematic - The p covariates are combined to give the “linear predictor” eta, where eta = ß1 X1 + ß2 X2 + ß3 X3 + … + ßp Xp
  2. Random - The error term is Normally distributed with mean zero and constant variance sigma². Var(Y_i) = sigma²
  3. Link function - Equal to the identity function.
6
Q

Components of Generalized Linear Models

A
  1. Systematic - The p covariates are combined to give the “linear predictor” eta, where eta = ß1 X1 + ß2 X2 + ß3 X3 +…+ ßp Xp
  2. Random - Each Y_i is independent and from the exponential family of distributions. Var(Y_i) = phi * V(mu_i) / omega_i
  3. Link function - Must be differentiable and monotonic.
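
The three components can be seen in a small simulation. A minimal NumPy sketch, assuming a Poisson random component with a log link and made-up coefficients:

```python
import numpy as np

rng = np.random.default_rng(42)

# Systematic component: linear predictor eta = X @ beta (made-up betas).
n = 100_000
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
beta = np.array([0.5, 1.0])
eta = X @ beta

# Link function: here the log link, so mu = inverse link of eta = exp(eta).
mu = np.exp(eta)

# Random component: each Y_i drawn independently from an exponential-family
# distribution (Poisson), where Var(Y_i) = phi * V(mu_i) / omega_i = mu_i.
y = rng.poisson(mu)

print(y.mean(), mu.mean())  # the sample mean tracks the model mean
```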
7
Q

Common exponential family distribution variance functions

A

Error Distribution : Variance Function

Normal : V(x) = 1 (as in a Classical Linear Model)

Poisson : V(x) = x

Gamma : V(x) = x²

Binomial : V(x) = x(1 - x)

Inverse Gaussian : V(x) = x³

Tweedie : V(x) = (1 / lambda) * x^p, where p < 0 or 1 < p < 2 or p > 2
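
These variance functions are simple to write down directly. A minimal sketch (the dictionary and helper names are illustrative; the Tweedie power p and scale lambda are parameters):

```python
# Variance functions V(x) for common exponential-family error distributions.
variance_functions = {
    "Normal":           lambda x: 1.0,          # constant variance
    "Poisson":          lambda x: x,            # variance equals mean
    "Gamma":            lambda x: x ** 2,       # constant coefficient of variation
    "Binomial":         lambda x: x * (1 - x),
    "Inverse Gaussian": lambda x: x ** 3,
}

def tweedie_variance(x, p, lam=1.0):
    """Tweedie variance function (1 / lambda) * x**p,
    valid for p < 0, 1 < p < 2, or p > 2."""
    return (1.0 / lam) * x ** p

print(variance_functions["Gamma"](2.0))  # 4.0
```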

8
Q

Methods of estimating the scale parameter

A
  1. Maximum likelihood (not feasible in practice)
  2. The moment estimator (Pearson chi² statistic): phi hat = (1 / (n - p)) * Σ (i = 1 to n) [omega_i * (Y_i - mu_i)² / V(mu_i)]
  3. The total deviance estimator: phi hat = D / (n - p)
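
The moment estimator can be computed directly from fitted values. A sketch with made-up data from a hypothetical Gamma fit (so V(mu) = mu²), assuming unit weights:

```python
import numpy as np

def pearson_phi(y, mu, V, omega, p):
    """Moment (Pearson chi-square) estimator of the scale parameter:
    phi_hat = (1 / (n - p)) * sum_i omega_i * (y_i - mu_i)**2 / V(mu_i)."""
    y, mu, omega = map(np.asarray, (y, mu, omega))
    n = len(y)
    return np.sum(omega * (y - mu) ** 2 / V(mu)) / (n - p)

# Made-up observations and fitted means from a hypothetical Gamma model:
y = np.array([1.2, 0.8, 2.5, 1.9, 3.1, 0.6])
mu = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 1.0])
omega = np.ones(6)                 # unit prior weights
phi_hat = pearson_phi(y, mu, lambda m: m ** 2, omega, p=2)
print(phi_hat)
```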
9
Q

Common Link Functions

A

Link Function: Function & Inverse Function

Identity: x, x

Log: ln(x), e^x

Logit: ln(x / (1 - x)), e^x / (1 + e^x)

Reciprocal: 1 / x, 1 / x
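
A quick way to check each function/inverse pair is a round trip: the inverse should undo the link. A minimal sketch (the dictionary layout is illustrative):

```python
import math

# Each entry maps a link name to (link function, inverse link function).
links = {
    "identity":   (lambda x: x,                     lambda x: x),
    "log":        (lambda x: math.log(x),           lambda x: math.exp(x)),
    "logit":      (lambda x: math.log(x / (1 - x)), lambda x: math.exp(x) / (1 + math.exp(x))),
    "reciprocal": (lambda x: 1 / x,                 lambda x: 1 / x),
}

# Round trip: g_inv(g(x)) should return x for each link.
for name, (g, g_inv) in links.items():
    assert abs(g_inv(g(0.3)) - 0.3) < 1e-9, name
print("all links invert correctly")
```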

10
Q

Common model forms for insurance data

A
  1. Claim frequencies/counts - Multiplicative Poisson (Log link function, Poisson error term)
  2. Claim severity - Multiplicative Gamma (Log link function, Gamma error term)
  3. Pure Premium - Tweedie (compound of Poisson and Gamma above)
  4. Probability (e.g., of policyholder retention) - Logistic (Logit link function, Binomial error term)
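
The log link is what makes the first two model forms multiplicative: the fitted mean is a base rate times one factor per rating variable. A minimal sketch with made-up coefficients, also showing the logit inverse producing a probability:

```python
import math

# Hypothetical fitted coefficients for a log-link model (made-up values).
b0, b_age, b_terr = 0.2, 0.3, -0.1
x_age, x_terr = 1.0, 1.0

# Additive on the linear-predictor scale ...
mu = math.exp(b0 + b_age * x_age + b_terr * x_terr)

# ... which is the same as multiplying per-variable factors together.
mu_mult = math.exp(b0) * math.exp(b_age * x_age) * math.exp(b_terr * x_terr)
assert abs(mu - mu_mult) < 1e-12

# With a logit link, the inverse maps any eta to a probability in (0, 1).
eta = 1.5
prob = math.exp(eta) / (1 + math.exp(eta))
print(mu, prob)
```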
11
Q

Aliasing and near-aliasing

A

Aliasing is when there is a linear dependency among the covariates in the model. Types of aliasing:

  1. Intrinsic aliasing - When the linear dependency occurs by definition of the covariates.
  2. Extrinsic aliasing - When the linear dependency occurs by the nature of the data.
  3. Near-aliasing - When covariates are nearly linearly dependent, but not perfectly linearly dependent.
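
Aliasing shows up directly in the design matrix. A sketch of intrinsic aliasing and near-aliasing using a rank check (the factor levels and perturbation are made up):

```python
import numpy as np

# Intrinsic aliasing: dummy-encoding every level of a factor alongside an
# intercept creates an exact linear dependency by definition:
intercept = np.ones(6)
male   = np.array([1, 0, 1, 0, 1, 0])
female = 1 - male                      # male + female == intercept, always

X = np.column_stack([intercept, male, female])
print(np.linalg.matrix_rank(X))  # 2, not 3: one column is redundant

# Near-aliasing: columns almost, but not exactly, linearly dependent
# (e.g., one stray record in the data breaks the exact dependency).
female_noisy = female.astype(float)
female_noisy[0] += 1e-8
X_near = np.column_stack([intercept, male, female_noisy])
print(np.linalg.cond(X_near))  # enormous condition number -> unstable estimates
```

Exact aliasing makes a parameter unestimable (software drops it); near-aliasing leaves it estimable but with wildly unstable values, which is why it is often harder to spot.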
12
Q

Ways to decide whether to include a factor in the model

A
  1. Size of confidence intervals (usually viewed graphically in practice)
  2. Type III testing
  3. See if the parameter estimate is consistent over time
  4. Intuition that the factor should impact the result
13
Q

Type III test statistics

A
  1. chi² test statistic = D1* - D2* ~ chi²(df1 - df2)
  2. F test statistic = [(D1 - D2) / (df1 - df2)] / (D2 / df2) ~ F(df1 - df2, df2)
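
Both statistics compare a smaller (restricted) model against a larger one. A sketch using SciPy for the reference distributions; the helper name, deviances, and degrees of freedom are all made up for illustration:

```python
from scipy import stats

def type_iii_tests(D1, D2, df1, df2, phi):
    """Hypothetical helper: compare a smaller model (deviance D1, residual
    df df1) against a larger one (D2, df2). Scaled deviance is D* = D / phi."""
    chi2_stat = (D1 - D2) / phi                      # D1* - D2* ~ chi2(df1 - df2)
    f_stat = ((D1 - D2) / (df1 - df2)) / (D2 / df2)  # ~ F(df1 - df2, df2)
    chi2_p = stats.chi2.sf(chi2_stat, df1 - df2)
    f_p = stats.f.sf(f_stat, df1 - df2, df2)
    return chi2_stat, chi2_p, f_stat, f_p

# Made-up example: dropping a 3-level factor (2 parameters) raises the
# deviance from 100 to 120 while freeing 2 residual degrees of freedom.
chi2_stat, chi2_p, f_stat, f_p = type_iii_tests(
    D1=120.0, D2=100.0, df1=97, df2=95, phi=1.0)
print(chi2_stat, chi2_p, f_stat, f_p)
```

Small p-values suggest the factor explains a significant share of the deviance and should be kept.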