GLMs - Anderson Flashcards

1
Q

Problems with One-way analysis

A
  1. Potentially distorted by correlations among rating variables.
  2. Does not consider inter-dependencies between rating variables in the way they impact what is being modeled.
2
Q

Problems with Classical Linear Models

A
  1. It is difficult to assert Normality and constant variance for response variables.
  2. The values of the dependent variable (the Y variable) may be restricted to positive values, but the Normal distribution allows negative values, violating this restriction.
  3. If Y is always positive, then intuitively the variance of Y moves toward zero as the mean of Y moves toward zero, so the variance is related to the mean.
  4. The linear model only allows for additive relationships between predictor variables, but those might be inadequate to describe the response variable.
3
Q

Benefits of GLMs

A
  1. The statistical framework allows for explicit assumptions about the nature of the data and its relationship with predictive variables.
  2. The method of solving GLMs (maximum likelihood estimation) is more technically efficient than iterative methods (e.g., the iterative procedures used in minimum bias approaches).
  3. GLMs provide statistical diagnostics which aid in selecting only significant variables and validating model assumptions.
  4. GLMs adjust for correlations between variables and allow for interaction effects.
4
Q

Steps to solving a Classical Linear Model

A
  1. Set up the general equation in terms of Y, ß, and X’s.
  2. Write down an equation for each observation by replacing the X’s and Y with observed values in the data. You will have the same number of equations as observations in the data. For observation i, the equation may contain some ß values and will contain an error term error_i.
  3. Solve each equation for error_i.
  4. Calculate the equation for the Sum of Squared Errors (SSE) by plugging in the error_i expressions: SSE = Σ (i = 1 to n) error_i²
  5. Minimize the SSE by taking derivatives of it with respect to each ß and setting them equal to 0.
  6. Solve the system of equations for the ß values.
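
The steps above amount to solving the normal equations. A minimal NumPy sketch with made-up data (the X and Y values are purely illustrative):

```python
import numpy as np

# Hypothetical data: n = 5 observations, p = 2 covariates (made-up values).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([3.1, 2.9, 7.2, 6.8, 10.1])

# Minimizing SSE = sum of error_i^2 by setting the derivatives with respect
# to each beta to zero yields the normal equations: (X'X) beta = X'y.
beta = np.linalg.solve(X.T @ X, X.T @ y)

errors = y - X @ beta
sse = np.sum(errors ** 2)
print(beta, sse)
```

The closed-form solution is why classical linear models need no iteration: steps 5 and 6 collapse into one linear solve.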
5
Q

Components of Classical Linear Models

A
  1. Systematic - The p covariates are combined to give the “linear predictor” eta, where eta = ß1 X1 + ß2 X2 + ß3 X3 + … + ßp Xp
  2. Random - The error term is Normally distributed with mean zero and constant variance sigma². Var(Y_i) = sigma²
  3. Link function - Equal to the identity function.
6
Q

Components of Generalized Linear Models

A
  1. Systematic - The p covariates are combined to give the “linear predictor” eta, where eta = ß1 X1 + ß2 X2 + ß3 X3 +…+ ßp Xp
  2. Random - Each Y_i is independent and from the exponential family of distributions. Var(Y_i) = phi * V(mu_i) / omega_i
  3. Link function - Must be differentiable and monotonic.
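
The three components can be seen in a small simulation. A minimal NumPy sketch, assuming a Poisson random component with a log link and made-up coefficients:

```python
import numpy as np

rng = np.random.default_rng(42)

# Systematic component: linear predictor eta = X @ beta (made-up betas).
n = 100_000
X = np.column_stack([np.ones(n), rng.uniform(0, 1, n)])
beta = np.array([0.5, 1.0])
eta = X @ beta

# Link function: here the log link, so mu = inverse link of eta = exp(eta).
mu = np.exp(eta)

# Random component: each Y_i drawn independently from an exponential-family
# distribution (Poisson), where Var(Y_i) = phi * V(mu_i) / omega_i = mu_i.
y = rng.poisson(mu)

print(y.mean(), mu.mean())  # the sample mean tracks the model mean
```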
7
Q

Common exponential family distribution variance functions

A

Error Distribution : Variance Function

Normal : V(x) = 1 (as in a Classical Linear Model)

Poisson : V(x) = x

Gamma : V(x) = x²

Binomial : V(x) = x(1 - x)

Inverse Gaussian : V(x) = x³

Tweedie : V(x) = (1 / lambda) * x^p, where p < 0 or 1 < p < 2 or p > 2
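
These variance functions are simple to write down directly. A minimal sketch (the dictionary and helper names are illustrative; the Tweedie power p and scale lambda are parameters):

```python
# Variance functions V(x) for common exponential-family error distributions.
variance_functions = {
    "Normal":           lambda x: 1.0,          # constant variance
    "Poisson":          lambda x: x,            # variance equals mean
    "Gamma":            lambda x: x ** 2,       # constant coefficient of variation
    "Binomial":         lambda x: x * (1 - x),
    "Inverse Gaussian": lambda x: x ** 3,
}

def tweedie_variance(x, p, lam=1.0):
    """Tweedie variance function (1 / lambda) * x**p,
    valid for p < 0, 1 < p < 2, or p > 2."""
    return (1.0 / lam) * x ** p

print(variance_functions["Gamma"](2.0))  # 4.0
```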

8
Q

Methods of estimating the scale parameter

A
  1. Maximum likelihood (not feasible in practice)
  2. The moment estimator (Pearson chi² statistic): phi hat = (1 / (n - p)) * Σ (i = 1 to n) [omega_i * (Y_i - mu_i)² / V(mu_i)]
  3. The total deviance estimator: phi hat = D / (n - p)
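
The moment estimator can be computed directly from fitted values. A sketch with made-up data from a hypothetical Gamma fit (so V(mu) = mu²), assuming unit weights:

```python
import numpy as np

def pearson_phi(y, mu, V, omega, p):
    """Moment (Pearson chi-square) estimator of the scale parameter:
    phi_hat = (1 / (n - p)) * sum_i omega_i * (y_i - mu_i)**2 / V(mu_i)."""
    y, mu, omega = map(np.asarray, (y, mu, omega))
    n = len(y)
    return np.sum(omega * (y - mu) ** 2 / V(mu)) / (n - p)

# Made-up observations and fitted means from a hypothetical Gamma model:
y = np.array([1.2, 0.8, 2.5, 1.9, 3.1, 0.6])
mu = np.array([1.0, 1.0, 2.0, 2.0, 3.0, 1.0])
omega = np.ones(6)                 # unit prior weights
phi_hat = pearson_phi(y, mu, lambda m: m ** 2, omega, p=2)
print(phi_hat)
```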
9
Q

Common Link Functions

A

Link Function: Function & Inverse Function

Identity: x, x

Log: ln(x), e^x

Logit: ln(x / (1 - x)), e^x / (1 + e^x)

Reciprocal: 1 / x, 1 / x
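
A quick way to check each function/inverse pair is a round trip: the inverse should undo the link. A minimal sketch (the dictionary layout is illustrative):

```python
import math

# Each entry maps a link name to (link function, inverse link function).
links = {
    "identity":   (lambda x: x,                     lambda x: x),
    "log":        (lambda x: math.log(x),           lambda x: math.exp(x)),
    "logit":      (lambda x: math.log(x / (1 - x)), lambda x: math.exp(x) / (1 + math.exp(x))),
    "reciprocal": (lambda x: 1 / x,                 lambda x: 1 / x),
}

# Round trip: g_inv(g(x)) should return x for each link.
for name, (g, g_inv) in links.items():
    assert abs(g_inv(g(0.3)) - 0.3) < 1e-9, name
print("all links invert correctly")
```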

10
Q

Common model forms for insurance data

A
  1. Claim frequencies/counts - Multiplicative Poisson (Log link function, Poisson error term)
  2. Claim severity - Multiplicative Gamma (Log link function, Gamma error term)
  3. Pure Premium - Tweedie (compound of Poisson and Gamma above)
  4. Probability (e.g., of policyholder retention) - Logistic (Logit link function, Binomial error term)
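
The log link is what makes the first two model forms multiplicative: the fitted mean is a base rate times one factor per rating variable. A minimal sketch with made-up coefficients, also showing the logit inverse producing a probability:

```python
import math

# Hypothetical fitted coefficients for a log-link model (made-up values).
b0, b_age, b_terr = 0.2, 0.3, -0.1
x_age, x_terr = 1.0, 1.0

# Additive on the linear-predictor scale ...
mu = math.exp(b0 + b_age * x_age + b_terr * x_terr)

# ... which is the same as multiplying per-variable factors together.
mu_mult = math.exp(b0) * math.exp(b_age * x_age) * math.exp(b_terr * x_terr)
assert abs(mu - mu_mult) < 1e-12

# With a logit link, the inverse maps any eta to a probability in (0, 1).
eta = 1.5
prob = math.exp(eta) / (1 + math.exp(eta))
print(mu, prob)
```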
11
Q

Aliasing and near-aliasing

A

Aliasing is when there is a linear dependency among the covariates in the model. Types of aliasing:

  1. Intrinsic aliasing - When the linear dependency occurs by definition of the covariates.
  2. Extrinsic aliasing - When the linear dependency occurs by the nature of the data.
  3. Near-aliasing - When covariates are nearly linearly dependent, but not perfectly linearly dependent.
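
Aliasing shows up directly in the design matrix. A sketch of intrinsic aliasing and near-aliasing using a rank check (the factor levels and perturbation are made up):

```python
import numpy as np

# Intrinsic aliasing: dummy-encoding every level of a factor alongside an
# intercept creates an exact linear dependency by definition:
intercept = np.ones(6)
male   = np.array([1, 0, 1, 0, 1, 0])
female = 1 - male                      # male + female == intercept, always

X = np.column_stack([intercept, male, female])
print(np.linalg.matrix_rank(X))  # 2, not 3: one column is redundant

# Near-aliasing: columns almost, but not exactly, linearly dependent
# (e.g., one stray record in the data breaks the exact dependency).
female_noisy = female.astype(float)
female_noisy[0] += 1e-8
X_near = np.column_stack([intercept, male, female_noisy])
print(np.linalg.cond(X_near))  # enormous condition number -> unstable estimates
```

Exact aliasing makes a parameter unestimable (software drops it); near-aliasing leaves it estimable but with wildly unstable values, which is why it is often harder to spot.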
12
Q

Ways to decide whether to include a factor in the model

A
  1. Size of confidence intervals (usually viewed graphically in practice)
  2. Type III testing
  3. See if the parameter estimate is consistent over time
  4. Intuition that the factor should impact the result
13
Q

Type III test statistics

A
  1. chi² test statistic = D1* - D2* ~ chi²(df1 - df2)
  2. F test statistic = [(D1 - D2) / (df1 - df2)] / (D2 / df2) ~ F(df1 - df2, df2)
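
Both statistics compare a smaller (restricted) model against a larger one. A sketch using SciPy for the reference distributions; the helper name, deviances, and degrees of freedom are all made up for illustration:

```python
from scipy import stats

def type_iii_tests(D1, D2, df1, df2, phi):
    """Hypothetical helper: compare a smaller model (deviance D1, residual
    df df1) against a larger one (D2, df2). Scaled deviance is D* = D / phi."""
    chi2_stat = (D1 - D2) / phi                      # D1* - D2* ~ chi2(df1 - df2)
    f_stat = ((D1 - D2) / (df1 - df2)) / (D2 / df2)  # ~ F(df1 - df2, df2)
    chi2_p = stats.chi2.sf(chi2_stat, df1 - df2)
    f_p = stats.f.sf(f_stat, df1 - df2, df2)
    return chi2_stat, chi2_p, f_stat, f_p

# Made-up example: dropping a 3-level factor (2 parameters) raises the
# deviance from 100 to 120 while freeing 2 residual degrees of freedom.
chi2_stat, chi2_p, f_stat, f_p = type_iii_tests(
    D1=120.0, D2=100.0, df1=97, df2=95, phi=1.0)
print(chi2_stat, chi2_p, f_stat, f_p)
```

Small p-values suggest the factor explains a significant share of the deviance and should be kept.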