Understanding Logistic Regression Model Selection in Statistics
Statistics, as Florence Nightingale famously said, is the most important science in the world. This chapter on logistic regression covers model selection (forward selection, backward elimination, and stepwise selection), interpretation of parameters, model diagnostics and predictive power (including ROC curves), and the Cochran-Mantel-Haenszel test. Guidelines for selecting models with discrete predictors and the SAS application for model selection are also discussed. The aim is to find a balance between complexity and interpretability in fitting data.
Presentation Transcript
Statistics ... the most important science in the whole world: for upon it depends the practical application of every other science and of every art; for it only gives results of our experience.
-- Florence Nightingale, 1820-1910
CHAPTER 6: Logistic Regression -- Application
In this chapter we demonstrate how the logistic regression model is used to make inferences. We cover:
- model selection
- interpretation of parameters
- the Mantel-Haenszel test
6.1 MODEL SELECTION
Model selection for logistic regression models follows the same principle as for least squares regression model selection: search for a model that is complex enough to fit the data well but simple enough to interpret. Just as in least squares regression, there are three model selection procedures to choose from (a minimal SAS sketch of all three follows):
- Forward selection (start with no predictors in the model)
- Backward elimination (start with all predictors in the model)
- Stepwise selection (start with no predictors in the model)
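As a minimal sketch of how the three procedures are requested in PROC LOGISTIC (using the crab-data variable names introduced below; SLENTRY= and SLSTAY= set the entry and stay significance levels):

proc logistic data=crab descending;
  class color spine;                                  /* discrete predictors */
  model y = width weight color spine / selection=forward slentry=0.05;
  /* or: selection=backward slstay=0.05                  */
  /* or: selection=stepwise slentry=0.05 slstay=0.05     */
run;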
MODEL SELECTION, continued
Some guidelines in model selection follow:
- For an m-level discrete predictor, m-1 0-1 dummy variables are included in the model. In the model selection procedure, where predictors are added or deleted according to their adjusted statistical significance, the entire set of m-1 dummy variables for a given discrete predictor is added or deleted together. This is done in SAS by using the CLASS statement in PROC LOGISTIC.
- We often wish to retain all main-effect predictors in the model so that each predictor's effect on the logit is adjusted for the other main-effect terms.
- Consider only hierarchical models, i.e., models in which any terms involved in an interaction are also included individually (if X*Y*Z is in the model, then X*Y, X*Z, Y*Z, X, Y, and Z must also be in the model).
- The backward elimination model selection procedure is recommended, based on the principle that it is safer to delete terms from an overly complex model than to add terms to an overly simple model.
MODEL SELECTION -- SAS

data crab;
  input color spine width weight satellites;
  if satellites ge 1 then y=1; else y=0;
  cards;
2 3 28.3 3.05 8
3 3 26.0 2.60 4
. . .
;
title 'Horseshoe Crab Data, Table 4.3, page 123, Agresti';
proc logistic descending;
  class color spine;
  model y = width color spine weight / selection=backward;
run;

/* Check multicollinearity */
data crab;
  set crab;
  if color=2 then light=1;   else light=0;
  if color=3 then med=1;     else med=0;
  if color=4 then darkmed=1; else darkmed=0;
  if spine=1 then both=1;    else both=0;
  if spine=2 then one=1;     else one=0;
  regy = 1;
proc corr pearson spearman;
  var light med darkmed both one width weight;
proc reg;
  model regy = light med darkmed both one width weight / vif;
run;

Notes: SAS uses 1, -1 (effect) coding for CLASS indicator variables by default, so the reference category (by default, the last level) is assigned -1 for all m-1 indicator variables. The PROC REG step is a fake regression (the outcome regy is set to 1 for all observations); it is run only in order to obtain the VIFs for the predictors.
MODEL SELECTION -- SAS Output
[SAS output from the backward elimination run appears here in the original slides.]
6.2 MODEL PERFORMANCE AND DIAGNOSTICS
Diagnostics and model performance measures were discussed in Chapter 3 for two-way tables. We now discuss these concepts for logistic regression.

PEARSON RESIDUALS
Suppose there are n observations ("subjects") in the data set. Each subject has a value for perhaps several continuous variables and several indicator variables. For example, from the horseshoe crab data we have:

                COLOR                  SPINE CONDITION
Subject         Light  Med  Darkmed    Both  One        Width   Weight
1               1      0    0          0     1          26.2    3020
2               0      0    1          1     0          28.3    3112
3               0      1    0          etc.
MODEL PERFORMANCE
PEARSON RESIDUALS, continued
Consider the $i$th setting of the predictors, $i = 1, 2, \ldots, N$, where $N \le n$ (a setting may be shared by more than one subject). If there are several continuous variables in the data set, then it is possible that $N = n$ (each subject has its own unique setting). In this case, for setting $i$,
$$n_i = \{\#\text{ subjects at setting } i\} = 1 \quad\text{and}\quad y_i = \{\#\text{ successes at setting } i\} = 0 \text{ or } 1.$$
MODEL PERFORMANCE
PEARSON RESIDUALS, continued
Logistic regression model: $\mathrm{logit}(\pi_i) = \alpha + \sum_{j=1}^{p}\beta_j x_{ij}$.
The Pearson residual for the $i$th setting is
$$e_i = \frac{y_i - n_i\hat{\pi}_i}{\sqrt{n_i\hat{\pi}_i(1-\hat{\pi}_i)}}$$
where $y_i$ = # of successes for setting $i$, $n_i$ = # of observations having setting $i$, and $\hat{\pi}_i$ is the success probability for setting $i$ estimated from the logistic regression model:
$$\hat{\pi}_i = \frac{e^{\hat{\alpha} + \sum_{j=1}^{p}\hat{\beta}_j x_{ij}}}{1 + e^{\hat{\alpha} + \sum_{j=1}^{p}\hat{\beta}_j x_{ij}}}.$$
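As a quick worked illustration (hypothetical numbers, not from the crab data): suppose setting $i$ has $n_i = 8$ subjects, $y_i = 3$ successes, and fitted probability $\hat{\pi}_i = 0.5$. Then
$$e_i = \frac{3 - 8(0.5)}{\sqrt{8(0.5)(0.5)}} = \frac{-1}{\sqrt{2}} \approx -0.71,$$
a small residual, consistent with an adequate fit at that setting.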
MODEL PERFORMANCE
PEARSON RESIDUALS, continued
$$e_i = \frac{y_i - n_i\hat{\pi}_i}{\sqrt{n_i\hat{\pi}_i(1-\hat{\pi}_i)}}, \qquad e_i^2 \sim \chi^2_1 \text{ approximately.}$$
Pooling over the settings gives the Pearson statistic
$$X^2 = \sum_{i=1}^{N} e_i^2.$$
The $\chi^2$ approximation for the individual $e_i^2$ is unreliable when $n_i$ is small (e.g., $n_i = 1$), a point taken up again with the deviance residuals below.
MODEL PERFORMANCE
PEARSON STANDARDIZED RESIDUALS
$$r_i = \frac{e_i}{\sqrt{1-\hat{h}_i}} \sim N(0,1) \text{ approximately.}$$
$\hat{h}_i$ is called the leverage; it is the $i$th diagonal element of the $N \times N$ hat matrix
$$\widehat{H} = \widehat{W}^{1/2} X \left(X^{\mathsf T}\widehat{W}X\right)^{-1} X^{\mathsf T}\widehat{W}^{1/2},$$
where $\widehat{W}$ is the $N \times N$ diagonal matrix with diagonal elements $n_i\hat{\pi}_i(1-\hat{\pi}_i)$ for the $n_i$ observations at setting $i$, and $X$ is the $N \times p$ design matrix with elements $x_{ij}$.
MODEL PERFORMANCE
DEVIANCE RESIDUALS
For the $i$th setting,
$$d_i = \mathrm{sign}(y_i - n_i\hat{\pi}_i)\sqrt{D_i}$$
where
$$D_i = 2\left[y_i \log\frac{y_i}{n_i\hat{\pi}_i} + (n_i - y_i)\log\frac{n_i - y_i}{n_i(1-\hat{\pi}_i)}\right]$$
is the contribution to the deviance for setting $i$, so that $D = \sum_{i=1}^{N} d_i^2$.
Note: as discussed above, when there are several continuous variables in the model we may have some settings $i$ with a single subject; in these cases, the residuals may be uninformative.
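In SAS, the residuals and leverages above can be requested through the OUTPUT statement of PROC LOGISTIC. A minimal sketch, using the crab data with the single predictor width (the output variable names after the = signs are our own choices):

proc logistic data=crab descending;
  model y = width;
  output out=diag p=pihat reschi=pearson stdreschi=stdpearson
         resdev=devres h=leverage;    /* fitted prob., Pearson, standardized
                                         Pearson, deviance residuals, leverage */
run;
proc print data=diag(obs=10);         /* inspect the first 10 subjects */
  var y width pihat pearson stdpearson devres leverage;
run;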
MODEL PERFORMANCE -- EXAMPLE
We will obtain residuals and goodness-of-fit indices for the horseshoe crab data with the single predictor width:

proc logistic;
  model y = width / influence lackfit;
run;
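The LACKFIT option requests the Hosmer-Lemeshow goodness-of-fit test, which produces the P-value interpreted on the next slide. As a reminder of what that test computes (standard form; not spelled out on the slide): the subjects are sorted by $\hat{\pi}$ and split into $G$ (typically 10) groups, and
$$\hat{C} = \sum_{g=1}^{G} \frac{(O_g - n_g\bar{\pi}_g)^2}{n_g\bar{\pi}_g(1-\bar{\pi}_g)} \;\sim\; \chi^2_{G-2} \text{ approximately,}$$
where $O_g$, $n_g$, and $\bar{\pi}_g$ are the observed success count, group size, and average fitted probability in group $g$.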
MODEL PERFORMANCE -- EXAMPLE, continued
$H_0$: the model fits the data well.
P = 0.5847 => we cannot reject $H_0$ at the 0.05 level of significance.
Conclusion: the model fits the data well.
6.3 PREDICTIVE POWER
CLASSIFICATION MATRIX
The classification matrix (introduced in Chapter 5) is used to assess predictive performance. For a given observation, let $y = 0$ or $1$ according to whether that observation corresponds to a failure or success, respectively. The model-predicted value of $y$, $\hat{y}$, is calculated as follows:
$$\hat{y} = \begin{cases} 1, & \text{if } \hat{\pi} > \pi_0 \\ 0, & \text{if } \hat{\pi} \le \pi_0 \end{cases}$$
SAS uses $\pi_0 = 0.5$ as a default.
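A classification table for one or more cut-offs can be requested directly in PROC LOGISTIC with the CTABLE and PPROB= model options. A sketch, again with the crab data and the single predictor width:

proc logistic data=crab descending;
  model y = width / ctable pprob=(0.1 to 0.9 by 0.1);  /* one table per cut-off */
run;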
PREDICTIVE POWER
An example classification matrix and several summary measures are given below.

                         $\hat{y}$ (predicted by model)
                         0        1        Total
$y$ (observed)   0       45       10       55
                 1       5        40       45
                 Total   50       50       100

Sensitivity = $P(\hat{y}=1 \mid y=1)$ = 40/45 = 0.89
Specificity = $P(\hat{y}=0 \mid y=0)$ = 45/55 = 0.82
False Positive Rate = $P(\hat{y}=1 \mid y=0)$ = 10/55 = 0.18
False Negative Rate = $P(\hat{y}=0 \mid y=1)$ = 5/45 = 0.11
(These are the values that the diagnostician/researcher is most interested in.)

Positive Predictive Value (PPV) = $P(y=1 \mid \hat{y}=1)$ = 40/50 = 0.80
Negative Predictive Value (NPV) = $P(y=0 \mid \hat{y}=0)$ = 45/50 = 0.90
(These are the values that the patient is most interested in.)

Prevalence Rate = $P(y=1)$ = 45/100 = 0.45
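As a consistency check (our own arithmetic, via Bayes' rule), PPV can be recovered from sensitivity, specificity, and prevalence:
$$\mathrm{PPV} = \frac{\mathrm{sens}\cdot\mathrm{prev}}{\mathrm{sens}\cdot\mathrm{prev} + (1-\mathrm{spec})(1-\mathrm{prev})} = \frac{(40/45)(0.45)}{(40/45)(0.45) + (10/55)(0.55)} = \frac{0.40}{0.40 + 0.10} = 0.80,$$
matching the table above.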
PREDICTIVE POWER
RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE
How the logistic regression model predicts success or failure depends on the choice of $\pi_0$; recall
$$\hat{y} = \begin{cases} 1, & \text{if } \hat{\pi} > \pi_0 \\ 0, & \text{if } \hat{\pi} \le \pi_0 \end{cases}$$
So there may be an optimal choice of $\pi_0$, i.e., a choice of $\pi_0$ that optimizes the prediction accuracy. The ROC curve is used to find this optimal choice. The ROC curve is a plot of sensitivity versus 1 - specificity for selected values of $\pi_0$. The plot is a curve, concave down, extending from (0,0) to (1,1).
PREDICTIVE POWER
[ROC curve figure: AUC = area under the curve]
PREDICTIVE POWER
RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE, continued
The concordance index $c$ = area under the curve (AUC) = the probability that, over all pairs of observations with different outcomes, the observation with the larger $y$ also has the larger $\hat{\pi}$.
$c = 0.5$ => the prediction is equivalent to random guessing. For the logistic regression model having only the intercept term (no predictors), the ROC curve is a straight line connecting (0,0) and (1,1).
The optimal $\pi_0$ is the one corresponding to the "elbow" of the ROC curve.
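In SAS output, $c$ is reported in the "Association of Predicted Probabilities and Observed Responses" table. In the standard pair-counting form (not spelled out on the slide), over all pairs with one $y=0$ and one $y=1$ observation:
$$c = \frac{\#\text{concordant pairs} + 0.5\,(\#\text{tied pairs})}{\#\text{pairs with differing } y}.$$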
PREDICTIVE POWER
RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE, continued
[ROC curve figure: $\pi_0 = 0$ corresponds to the point (1,1), $\pi_0 = 1$ to (0,0), with a candidate "optimal point?" marked at the elbow]
PREDICTIVE POWER
RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE, continued
How is the ROC curve generated?
1. Run the logistic regression for your data: $\mathrm{logit}(\pi) = \alpha + \sum_{j=1}^{p}\beta_j x_j$.
2. Solve for the estimated probability:
$$\hat{\pi} = \frac{e^{\hat{\alpha}+\sum_{j=1}^{p}\hat{\beta}_j x_j}}{1 + e^{\hat{\alpha}+\sum_{j=1}^{p}\hat{\beta}_j x_j}}.$$
3. For a given value of $\pi_0$ in $[0,1]$, calculate sensitivity = $P(\hat{y}=1 \mid y=1)$ and 1 - specificity = $P(\hat{y}=1 \mid y=0)$ based on
$$\hat{y} = \begin{cases} 1, & \text{if } \hat{\pi} > \pi_0 \\ 0, & \text{if } \hat{\pi} \le \pi_0 \end{cases}$$
Note: for $\pi_0 = 0$, $\hat{y} = 1$ always => sensitivity = 1 - specificity = 1; for $\pi_0 = 1$, $\hat{y} = 0$ always => sensitivity = 1 - specificity = 0.
PREDICTIVE POWER
RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE, continued
SAS LANGUAGE:

title 'Horseshoe Crab Data, Table 4.3, page 123,';
title3 'y = 1 implies at least one satellite';
proc logistic descending;
  class color;
  model y = width color / outroc=roc;
run;
proc plot data=roc;
  plot _sensit_*_1mspec_;
  title 'ROC curve';
run;
proc print data=roc;
run;
goptions cback=white colors=(black) border;
axis1 length=2.5in;
axis2 order=(0 to 1 by .1) length=2.5in;
proc gplot data=roc;
  symbol1 i=join v=none;
  plot _sensit_*_1mspec_ / haxis=axis1 vaxis=axis2;
run;
quit;
PREDICTIVE POWER
RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE, continued
[SAS output: the ROC curve plot, with a candidate optimal cut-off marked at the elbow, followed by the printed OUTROC= data set]
PREDICTIVE POWER
RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE, continued
The optimal cut-off point ($\pi_0 \approx 0.68$) has, approximately, sensitivity = 0.63 and 1 - specificity = 0.26.
6.4 COCHRAN-MANTEL-HAENSZEL (CMH) TEST
2 x 2 x K contingency tables are common in research. Often for such tables it is of interest to test for independence between X (row) and Y (column) conditional on Z (layer). A non-model-based test of $H_0$: X independent of Y given Z, i.e., a model of conditional independence, has been developed.
For each partial table (layer) $k = 1, 2, \ldots, K$, condition on
(1) the predictor (row) marginal totals $n_{i+k}$, $i = 1, 2$, and
(2) the response (column) marginal totals $n_{+jk}$, $j = 1, 2$:

          j=1         j=2         TOTAL
i=1       $n_{11k}$   $n_{12k}$   $n_{1+k}$
i=2       $n_{21k}$   $n_{22k}$   $n_{2+k}$
TOTAL     $n_{+1k}$   $n_{+2k}$   $n_{++k}$

This leads to a hypergeometric distribution for $n_{11k}$ for each partial table.
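For reference (a standard result, not written out on the slide), with all margins fixed the conditional distribution of $n_{11k}$ is
$$P(n_{11k} = t) = \frac{\dbinom{n_{1+k}}{t}\dbinom{n_{2+k}}{n_{+1k}-t}}{\dbinom{n_{++k}}{n_{+1k}}},$$
the hypergeometric distribution.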
COCHRAN-MANTEL-HAENSZEL (CMH) TEST
Under $H_0$: X independent of Y given Z, we define
$$\mu_{11k} = E(n_{11k}) = \frac{n_{1+k}\,n_{+1k}}{n_{++k}} \quad\text{and}\quad \mathrm{Var}(n_{11k}) = \frac{n_{1+k}\,n_{2+k}\,n_{+1k}\,n_{+2k}}{n_{++k}^2\,(n_{++k}-1)}.$$
The $n_{11k}$ are independent for $k = 1, 2, \ldots, K$. To test for conditional independence we pool information from the K partial tables.
Cochran-Mantel-Haenszel test for conditional independence:
$$\mathrm{CMH} = \frac{\left[\sum_{k=1}^{K}\left(n_{11k} - \mu_{11k}\right)\right]^2}{\sum_{k=1}^{K}\mathrm{Var}(n_{11k})} \;\sim\; \chi^2_1 \text{ under } H_0.$$
This test is most effective when the X-Y association is similar for all layers.
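To make the pieces concrete, take center 1 of the multi-center example below (Drug: 11 successes, 25 failures; Control: 10 successes, 27 failures; the arithmetic is our own):
$$\mu_{111} = \frac{36 \times 21}{73} \approx 10.36, \qquad \mathrm{Var}(n_{111}) = \frac{36 \times 37 \times 21 \times 52}{73^2 \times 72} \approx 3.79,$$
so center 1 contributes $n_{111} - \mu_{111} = 11 - 10.36 = 0.64$ to the numerator sum of the CMH statistic.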
COCHRAN-MANTEL-HAENSZEL (CMH) TEST
To estimate the common OR we use the CMH (Mantel-Haenszel) estimate:
$$\widehat{OR}_{MH} = \frac{\sum_{k=1}^{K} n_{11k}\,n_{22k}/n_{++k}}{\sum_{k=1}^{K} n_{12k}\,n_{21k}/n_{++k}}.$$
This estimate gives more weight to layers with larger sample sizes. If K is large and the tables are sparse, then this estimate is better than the MLE.
To test for homogeneity of ORs across the K layers we use the Breslow-Day test. The null hypothesis can be stated thus:
$$H_0:\; OR(1) = OR(2) = \cdots = OR(K).$$
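Applied to the 8-center example below (our own arithmetic from the table that follows), the numerator terms $n_{11k}n_{22k}/n_{++k}$ sum to $11\cdot 27/73 + 16\cdot 10/52 + \cdots + 4\cdot 1/13 \approx 16.37$ and the denominator terms $n_{12k}n_{21k}/n_{++k}$ sum to $25\cdot 10/73 + 4\cdot 22/52 + \cdots + 2\cdot 6/13 \approx 7.67$, giving
$$\widehat{OR}_{MH} \approx \frac{16.37}{7.67} \approx 2.13,$$
i.e., within center, the estimated odds of success on the drug are about twice those on control. Note that centers 5 and 6, whose sample ORs are infinite (a zero cell each), still contribute to the pooled estimate.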
COCHRAN-MANTEL-HAENSZEL (CMH) TEST -- EXAMPLE
A study relating treatment to response is conducted at 8 different centers (Beitler and Landis, 1985):

Center   Treatment   Success   Failure   OR
1        Drug        11        25        1.19
         Control     10        27
2        Drug        16        4         1.82
         Control     22        10
3        Drug        14        5         4.80*
         Control     7         12
4        Drug        2         14        2.29
         Control     1         16
5        Drug        6         11        *
         Control     0         12
6        Drug        1         10
         Control     0         10
7        Drug        1         4         2.0
         Control     1         8
8        Drug        4         2         0.33
         Control     6         1

* = statistically significant at $\alpha$ = 0.05

data clinical;
  do center=1 to 8;
    do trt=1 to 2;
      do y=1 to 2;
        input count @@;
        output;
      end;
    end;
  end;
datalines;
11 25 10 27 16 4 22 10
...
;
proc logistic;
  class center;
  weight count;
  model y = trt center / rsq lackfit;
run;
proc freq;
  weight count;
  tables center*trt*y / nocol nopercent cmh chisq measures;
run;
COCHRAN-MANTEL-HAENSZEL (CMH) TEST -- EXAMPLE, continued
(The table of the 8 centers is repeated on this slide.)
$H_0$: the OR is the same for all centers.
COMPARISONS
Two competitors to the logistic regression model are linear discriminant analysis and probit analysis.
LINEAR DISCRIMINANT ANALYSIS
Logistic regression analysis and linear discriminant analysis generally do the same thing: predict group membership. However, logistic regression does not have as many restrictive model assumptions. Also, it functions more like a regression: it can predict the outcome as well as quantify the effect of predictors on the outcome.
PROBIT ANALYSIS
Logistic regression and probit analysis are also very similar: the probit and logit S-curves are nearly the same. However, logistic regression is applicable to retrospective studies, unlike probit analysis. Also, in logistic regression sufficient statistics are used in the estimation, unlike the probit model.
CHAPTER 6 CONCLUSION
This chapter has presented the practical aspects of carrying out a logistic regression analysis, including:
- estimating and interpreting parameters
- diagnosing and correcting assumption violations
- introducing Receiver Operating Characteristic curves
- presenting the Cochran-Mantel-Haenszel test for conditional independence in 2 x 2 x K tables
SUMMARY EXERCISES
1. In words, what is the ROC curve?
2. What is the AUC?
3. What is the technique for obtaining the optimal value of $\pi_0$ with respect to sensitivity and specificity?
4. What is the purpose of the Cochran-Mantel-Haenszel test?
5. When is the Cochran-Mantel-Haenszel test most effective?
6. What is the name of the test used to determine if the X-Y association is homogeneous across levels of Z?