Unit 3: Logistic Regression Flashcards
Which plot can be used to check the independence of observations assumption in logistic regression?
Scatterplot of residuals vs. the order of data collection
Sensitivity:
The probability that a test classifies someone as sick given that the person is truly sick
P(T+|D+)
Specificity:
The probability that a test classifies someone as healthy given that the person is truly healthy
P(T-|D-)
Accuracy:
The probability that a test correctly classifies someone
How to calculate:
Add the correctly classified (concordant) cells — true positives and true negatives — and divide by the total.
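The three measures above can be sketched from a 2x2 table; the counts below are made up purely for illustration:

```python
# Hypothetical 2x2 screening-test results (invented counts):
#                 Disease+   Disease-
# Test positive      40         10      (TP, FP)
# Test negative       5         45      (FN, TN)
tp, fp, fn, tn = 40, 10, 5, 45

sensitivity = tp / (tp + fn)                # P(T+ | D+)
specificity = tn / (tn + fp)                # P(T- | D-)
accuracy = (tp + tn) / (tp + fp + fn + tn)  # concordant cells / total

print(round(sensitivity, 3))  # 40/45 -> 0.889
print(round(specificity, 3))  # 45/55 -> 0.818
print(round(accuracy, 3))     # 85/100 -> 0.85
```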
Area Under the Curve (AUC):
describes the overall predictive ability of
the screening test (a coin-flip has AUC=0.5)
We want AUC to be close to 1.
Shown on the ROC curve; ROC curves are useful for quantitative screening tests
Which cutpoint (i.e., 'decision rule') is the best?
It depends on the purpose of the screening test and the cost of misclassification
Usually you desire a balance between sensitivity and specificity
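One way to see AUC as "overall predictive ability": it equals the probability that a randomly chosen diseased case gets a higher test score than a randomly chosen healthy one (ties count 1/2). A minimal sketch with made-up scores:

```python
# Toy screening-test scores (hypothetical data); higher = more suspicious.
diseased = [0.9, 0.8, 0.7, 0.6, 0.55]
healthy  = [0.5, 0.4, 0.65, 0.3, 0.2]

def auc(pos, neg):
    """AUC = P(score of a random diseased case > score of a random healthy
    case), ties counted as 1/2 — equivalent to the area under the ROC curve."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(auc(diseased, healthy))  # 23/25 = 0.92, well above the coin-flip 0.5
```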
When do we do logistic regression?
When the outcome is a binary/dichotomous variable.
The appropriate measure for describing a dichotomous (binary) outcome depends on the study design, but generally ODDS RATIO is always appropriate
What are the three equivalent overall tests we can do in logistic regression?
Three asymptotically-equivalent tests:
(1) Likelihood ratio
(2) Score
(3) Wald
Rejecting H0 indicates that the model with all predictors is better than a model with no predictors (i.e., an intercept-only model)
- Similar to Overall F-test in MLR
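The likelihood ratio version of the overall test can be sketched with hypothetical log-likelihoods (the numbers below are invented, not from a real fit):

```python
# Hypothetical log-likelihoods from two fitted models (made-up values):
ll_null = -120.0   # intercept-only model
ll_full = -105.5   # model with j = 3 predictors

# LR statistic G ~ chi-square with df = 3 under H0 (no predictors matter)
G = -2 * (ll_null - ll_full)
print(G)  # 29.0

# Compare to the chi-square critical value, df = 3, alpha = 0.05 (7.815):
print(G > 7.815)  # True -> reject H0; full model beats intercept-only model
```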
What is the Type 3 test for an individual predictor?
Type 3 test asks: Is the predictor variable
associated with the outcome, given the association with the other predictors has already been accounted for?
Type 3 Test can accommodate multi-level categorical predictors, in addition to continuous and binary predictors
Hypotheses:
H0 : The predictor is not important (given all other predictors)
H1 : The predictor is important (given all other predictors)
Consider doing Type 3 after rejecting H0 in the Overall test.
Rejecting H0 of Type 3 implies that there is significant evidence of a linear association between the predictor and the binary response, given all other predictors are already included in the model.
Rejecting H0 of Type 3 Test implies an adjusted odds ratio not equal to 1
*Similar to partial F test in MLR
What is the difference between the estimated model and the predicted model?
The estimated model is on the logit (log-odds) scale:
logit(p) = ln(p / (1 - p)) = B0 + B1X1 + B2X2 + ... + BjXj
The predicted model is the probability model obtained by inverting (exponentiating) the logit:
p = exp(B0 + B1X1 + ... + BjXj) / (1 + exp(B0 + B1X1 + ... + BjXj))
(The odds are exp(B0 + B1X1 + ... + BjXj) = p / (1 - p).)
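A minimal numeric sketch of moving between the two scales, using invented coefficients for a one-predictor model:

```python
import math

# Hypothetical fitted coefficients (illustrative values only):
b0, b1 = -2.0, 0.8   # intercept and slope for a single predictor X
x = 3.0

# Estimated (logit) model: logit(p) = ln(p / (1 - p)) = b0 + b1*x
logit = b0 + b1 * x               # -2.0 + 2.4 = 0.4

# Invert the logit to get odds and the predicted probability:
odds = math.exp(logit)            # p / (1 - p)
p_hat = odds / (1 + odds)

print(round(logit, 4))  # 0.4
print(round(odds, 4))   # e^0.4 ~ 1.4918
print(round(p_hat, 4))  # ~ 0.5987
```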
What is the Individual coefficient test?
Tests a single Bj predictor
H0 : Bj = 0 (given all other predictors)
H1 : Bj not equal to 0 (given all other predictors)
Rejecting H0 implies that there is significant evidence of an association between Xj and Y, given all other predictors are in the model.
Depending on the problem, we may be interested in testing against other null values (e.g., H0 : Bj = 1).
Should not be used for multi-level categorical covariates.
* Similar to Partial T-test in MLR
What is the large sample assumption of logistic regression?
Hypothesis testing in Logistic Regression is based on large sample theory and asymptotics - large sample sizes are recommended
At least ~100 observations, with enough observations in each category/group of the outcome
Each Bj 'costs' about 10 observations to estimate
Why the odds ratio?
Regardless of the specific study design used to collect the data, it is always appropriate to report an odds ratio
Since we are actually modeling the log(odds) in logistic regression, odds ratios tend to ‘fall out’ naturally.
What are the two types of odds ratios?
Simple odds ratios associated with individual predictors can be obtained by exponentiating the corresponding regression coefficient (e.g., exp(Bj))
Complex odds ratios comparing any two predictor-profiles can be obtained by first determining the appropriate contrast and then exponentiating
OR1v2 = odds1/odds2
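Both kinds of odds ratios can be sketched from the coefficients; the model and profile values below are hypothetical:

```python
import math

# Hypothetical fitted model (invented coefficients):
# logit(p) = b0 + b1*age + b2*smoker
b0, b1, b2 = -3.0, 0.05, 0.9

# Simple OR for smoker (1 vs 0), holding age fixed: exponentiate b2.
or_smoker = math.exp(b2)

# Complex OR comparing two profiles: 60-year-old smoker vs 50-year-old
# non-smoker. Form the contrast (difference of linear predictors), then
# exponentiate: OR_1v2 = odds1 / odds2 = exp(logit1 - logit2).
contrast = (b1 * 60 + b2 * 1) - (b1 * 50 + b2 * 0)   # 10*b1 + b2 = 1.4
or_profiles = math.exp(contrast)

print(round(or_smoker, 3))    # e^0.9 ~ 2.46
print(round(or_profiles, 3))  # e^1.4 ~ 4.055
```

Note the intercept b0 cancels out of the contrast, which is why only coefficient differences matter for the odds ratio.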
What are the assumptions of Logistic Regression?
Linear Relationship: Logit(p) can be modeled as a linear function of the predictors
Large sample with independent observations of equal importance (implied by the study design)
The predictors are independent of each other (no multicollinearity)
*No error/residual assumptions, because the logistic model has no error term