Binary Logistic Regression with SPSS – A Comprehensive Guide by Karl L. Wuensch
Binary Logistic Regression with SPSS Karl L. Wuensch Dept of Psychology East Carolina University
Download the Instructional Document Go to http://core.ecu.edu/psyc/wuenschk/SPSS/SPSS-MV.htm. Click on Binary Logistic Regression. Save to desktop. Open the document.
When to Use Binary Logistic Regression The criterion variable is dichotomous. Predictor variables may be categorical or continuous. If predictors are all continuous and nicely distributed, may use discriminant function analysis. If predictors are all categorical, may use logit analysis.
Wuensch & Poteat, 1998 Cats being used as research subjects. Stereotaxic surgery. Subjects pretend they are on university research committee. Complaint filed by animal rights group. Vote to stop or continue the research.
Purpose of the Research Cosmetic Theory Testing Meat Production Veterinary Medical
Predictor Variables Gender Ethical Idealism (9-point Likert) Ethical Relativism (9-point Likert) Purpose of the Research
Model 1: Decision = Gender
Decision: 0 = stop, 1 = continue. Gender: 0 = female, 1 = male. The model is:

logit = ln(ODDS) = ln(Y/(1 - Y)) = a + bX

Y is the predicted probability of the event which is coded with 1 (continue the research) rather than with 0 (stop the research).
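The logit link above and its inverse can be sketched in a few lines of Python (a sketch, not part of the handout; the function names are mine):

```python
import math

def logit(p):
    # logit(Y) = ln(Y / (1 - Y)) -- the log odds of the event coded 1
    return math.log(p / (1 - p))

def inv_logit(z):
    # Solve the logit equation back for Y: Y = e^z / (1 + e^z)
    return math.exp(z) / (1 + math.exp(z))
```

Applying inv_logit to a fitted value a + bX recovers the predicted probability of voting to continue.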
Iterative Maximum Likelihood Procedure SPSS starts with arbitrary regression coefficients, tinkers with them to find those which best reduce error, and converges on a final model.
SPSS Bring the data into SPSS from http://core.ecu.edu/psyc/wuenschk/SPSS/Logistic.sav Analyze, Regression, Binary Logistic
Decision Dependent Gender Covariate(s), OK
Look at the Output

Case Processing Summary
Unweighted Cases(a)                          N     Percent
Selected Cases    Included in Analysis       315   100.0
                  Missing Cases              0     .0
                  Total                      315   100.0
Unselected Cases                             0     .0
Total                                        315   100.0
a. If weight is in effect, see classification table for the total number of cases.

We have 315 cases.
Block 0 Model, Odds Look at Variables in the Equation. The model contains only the intercept (constant, B0), a function of the marginal distribution of the decisions.

Variables in the Equation
                  B      S.E.   Wald     df   Sig.   Exp(B)
Step 0  Constant  -.379  .115   10.919   1    .001   .684

ln(ODDS) = ln(Y/(1 - Y)) = -.379
Exponentiate Both Sides Exponentiate both sides of the equation: e^-.379 = .684 = Exp(B0) = odds of deciding to continue the research.

Y/(1 - Y) = Exp(-.379) = .684 = 128/187

128 voted to continue the research, 187 to stop it.
Probabilities Randomly select one participant. P(votes continue) = 128/315 = 40.6% P(votes stop) = 187/315 = 59.4% Odds = 40.6/59.4 = .684 Repeatedly sample one participant and guess how s/he will vote.
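These marginal odds, and the intercept-only coefficient, can be verified with a short Python sketch using the counts from the slides:

```python
import math

# Marginal counts: 128 voted to continue, 187 voted to stop.
continue_n, stop_n = 128, 187
total = continue_n + stop_n            # 315

p_continue = continue_n / total        # about .406
p_stop = stop_n / total                # about .594
odds = p_continue / p_stop             # about .684, same as 128/187

# The Block 0 intercept B0 is just the log of these odds:
b0 = math.log(odds)                    # about -.379
```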
Humans vs. Goldfish Humans Match Probabilities (suppose p = .7, q = .3) .7(.7) + .3(.3) = .49 + .09 = .58 Goldfish Maximize Probabilities .7(1) = .70 The goldfish win!
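The two guessing strategies can be checked numerically; a sketch using the hypothetical p = .7, q = .3 split from the slide:

```python
# Expected accuracy of probability matching vs. maximizing
# for an event with P = .7 (hypothetical values from the slide).
p, q = 0.7, 0.3

# Humans match: guess the event with probability p, the non-event with q.
matching = p * p + q * q        # .49 + .09 = .58

# Goldfish maximize: always guess the more frequent category.
maximizing = max(p, q)          # .70 -- the goldfish win
```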
SPSS Model 0 vs. Goldfish Look at the Classification Table for Block 0.

Classification Table(a,b)
                                Predicted decision
Observed decision               stop   continue   Percentage Correct
Step 0   stop                   187    0          100.0
         continue               128    0          .0
         Overall Percentage                       59.4
a. Constant is included in the model.
b. The cut value is .500

SPSS predicts STOP for every participant. SPSS is as smart as a goldfish here.
Block 1 Model Gender has now been added to the model. In the Model Summary, the -2 Log Likelihood measures how poorly the model fits the data.

Model Summary
         -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
Step 1   399.913(a)          .078                   .106
a. Estimation terminated at iteration number 3 because parameter estimates changed by less than .001.
Block 1 Model For the intercept-only model, -2LL = 425.566. Add gender and -2LL = 399.913. Omnibus Tests: the drop in -2LL, 25.653, is the Model chi-square; df = 1, p < .001.

Omnibus Tests of Model Coefficients
                 Chi-square   df   Sig.
Step 1   Step    25.653       1    .000
         Block   25.653       1    .000
         Model   25.653       1    .000
Variables in the Equation

ln(odds) = -.847 + 1.217 * Gender
ODDS = e^(a + b*Gender)

Variables in the Equation
                      B       S.E.   Wald     df   Sig.   Exp(B)
Step 1(a)   gender    1.217   .245   24.757   1    .000   3.376
            Constant  -.847   .154   30.152   1    .000   .429
a. Variable(s) entered on step 1: gender.
Odds, Women

ODDS = e^(-.847 + 1.217(0)) = e^-.847 = .429

A woman is only .429 times as likely to decide to continue the research as she is to decide to stop it.
Odds, Men

ODDS = e^(-.847 + 1.217(1)) = e^.37 = 1.448

A man is 1.448 times as likely to vote to continue the research as to stop it.
Odds Ratio

male_odds / female_odds = 1.448 / .429 = e^1.217 = 3.376

1.217 was the B (slope) for Gender; 3.376 is Exp(B), the exponentiated slope, the odds ratio. The odds of a man voting to continue the research are 3.376 times the odds of a woman doing so.
Convert Odds to Probabilities

For our women: Y = ODDS / (1 + ODDS) = .429 / 1.429 = .30
For our men:   Y = ODDS / (1 + ODDS) = 1.448 / 2.448 = .59
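The odds and probability computations for the two genders can be reproduced in Python; a sketch using the reported coefficients (small rounding differences are possible):

```python
import math

# Coefficients from the Block 1 output: intercept -.847, gender slope 1.217.
a, b = -0.847, 1.217

odds_female = math.exp(a + b * 0)       # about .429
odds_male = math.exp(a + b * 1)         # about 1.448
odds_ratio = odds_male / odds_female    # e^1.217, about 3.38

# Convert odds back to probabilities: Y = ODDS / (1 + ODDS)
p_female = odds_female / (1 + odds_female)   # about .30
p_male = odds_male / (1 + odds_male)         # about .59
```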
Classification Decision rule: if Prob(event) >= Cutoff, then predict the event will take place. By default, SPSS uses .5 as the cutoff. For every man, Prob(continue) = .59, so predict he will vote to continue. For every woman, Prob(continue) = .30, so predict she will vote to stop it.
Overall Success Rate Look at the Classification Table.

Classification Table(a)
                                Predicted decision
Observed decision               stop   continue   Percentage Correct
Step 1   stop                   140    47         74.9
         continue               60     68         53.1
         Overall Percentage                       66.0
a. The cut value is .500

(140 + 68) / 315 = 208 / 315 = 66%

SPSS beat the goldfish!
Sensitivity P(correct prediction | event did occur) = P(predict Continue | subject voted to Continue). Of all those who voted to continue the research, for how many did we correctly predict that?

68 / (68 + 60) = 68 / 128 = 53%
Specificity P(correct prediction | event did not occur) = P(predict Stop | subject voted to Stop). Of all those who voted to stop the research, for how many did we correctly predict that?

140 / (140 + 47) = 140 / 187 = 75%
False Positive Rate P(incorrect prediction | predicted occurrence) = P(subject voted to Stop | we predicted Continue). Of all those for whom we predicted a vote to Continue the research, how often were we wrong?

47 / (47 + 68) = 47 / 115 = 41%
False Negative Rate P(incorrect prediction | predicted nonoccurrence) = P(subject voted to Continue | we predicted Stop). Of all those for whom we predicted a vote to Stop the research, how often were we wrong?

60 / (140 + 60) = 60 / 200 = 30%
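All four rates above, plus the overall success rate, follow directly from the 2x2 classification table; a Python sketch (the variable names are mine):

```python
# Block 1 classification table: counts of correct and incorrect predictions.
tn = 140  # voted stop, predicted stop
fp = 47   # voted stop, predicted continue
fn = 60   # voted continue, predicted stop
tp = 68   # voted continue, predicted continue

accuracy = (tp + tn) / (tp + tn + fp + fn)   # 208/315, about 66%
sensitivity = tp / (tp + fn)                 # 68/128,  about 53%
specificity = tn / (tn + fp)                 # 140/187, about 75%
false_pos_rate = fp / (fp + tp)              # 47/115,  about 41%
false_neg_rate = fn / (fn + tn)              # 60/200,  30%
```

Note that the false positive and false negative rates here are conditioned on the prediction, not on the observed outcome, matching the slides' definitions.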
Pearson Chi-Square Analyze, Descriptive Statistics, Crosstabs Gender Rows; Decision Columns
Crosstabs Statistics Statistics, Chi-Square, Continue
Crosstabs Cells Cells, Observed Counts, Row Percentages
Crosstabs Output Continue, OK. The 59% and 30% match logistic regression's predictions.

gender * decision Crosstabulation
                              decision
gender                        stop     continue   Total
Female   Count                140      60         200
         % within gender      70.0%    30.0%      100.0%
Male     Count                47       68         115
         % within gender      40.9%    59.1%      100.0%
Total    Count                187      128        315
         % within gender      59.4%    40.6%      100.0%
Crosstabs Output Likelihood Ratio chi-square = 25.653, as with logistic regression.

Chi-Square Tests
                      Value       df   Asymp. Sig. (2-sided)
Pearson Chi-Square    25.685(b)   1    .000
Likelihood Ratio      25.653     1    .000
N of Valid Cases      315
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 46.73.
Model 2: Decision = Idealism, Relativism, Gender Analyze, Regression, Binary Logistic Decision Dependent Gender, Idealism, Relatvsm Covariate(s)
Click Options and check Hosmer-Lemeshow goodness of fit and CI for exp(B) 95%. Continue, OK.
Comparing Nested Models With only intercept and gender, -2LL = 399.913. Adding idealism and relativism dropped -2LL to 346.503, a drop of 53.41.

Chi-square(2) = 399.913 - 346.503 = 53.41, p = ?

Model Summary
         -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
Step 1   346.503(a)          .222                   .300
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.
Obtain p Transform, Compute Target Variable = p Numeric Expression = 1 - CDF.CHISQ(53.41,2)
p = ? OK Data Editor, Variable View Set Decimal Points to 5 for p
p < .0001 Data Editor, Data View p = .00000 Adding the ethical ideology variables significantly improved the model.
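The same p value can be obtained without SPSS. For df = 2 the chi-square survival function reduces to e^(-x/2), so plain Python suffices (a sketch, not the SPSS CDF.CHISQ procedure):

```python
import math

# Drop in -2LL when idealism and relativism are added to the model.
chi_sq = 399.913 - 346.503     # 53.41, on 2 degrees of freedom

# A chi-square variable with 2 df is exponential with mean 2,
# so P(chi-square >= x) = exp(-x / 2) exactly for df = 2.
p = math.exp(-chi_sq / 2)      # far below .0001
```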
Hosmer-Lemeshow H0: the predictions made by the model fit perfectly with observed group memberships. Cases are arranged in order by their predicted probability on the criterion, then divided into (usually) ten bins with approximately equal n. This gives ten rows in the table.
For each bin and each event, we have the number of observed cases and the expected number predicted from the model.

Contingency Table for Hosmer and Lemeshow Test
         decision = stop         decision = continue
Step 1   Observed   Expected     Observed   Expected    Total
  1      29         29.331       3          2.669       32
  2      30         27.673       2          4.327       32
  3      28         25.669       4          6.331       32
  4      20         23.265       12         8.735       32
  5      22         20.693       10         11.307      32
  6      15         18.058       17         13.942      32
  7      15         15.830       17         16.170      32
  8      10         12.920       22         19.080      32
  9      12         9.319        20         22.681      32
  10     6          4.241        21         22.759      27
Note that the expected frequencies decline down the first column and rise down the second. The nonsignificant chi-square indicates good fit of the data to the model.

Hosmer and Lemeshow Test
         Chi-square   df   Sig.
Step 1   8.810        8    .359
Hosmer-Lemeshow There are problems with this procedure, which Hosmer and Lemeshow have acknowledged. Even with good fit the test may be significant if sample sizes are large. Even with poor fit the test may not be significant if sample sizes are small. The number of bins can have a big effect on the results of this test.
Linearity of the Logit We have assumed that the log odds are related to the predictors in a linear fashion. Use the Box-Tidwell test to evaluate this assumption. For each continuous predictor, compute the natural log. Include in the model interactions between each predictor and its natural log.
Box-Tidwell If an interaction is significant, there is a problem. For the troublesome predictor, try including the square of that predictor; that is, add a polynomial component to the model. See "T-Test versus Binary Logistic Regression."
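Constructing the Box-Tidwell terms by hand is straightforward; a Python sketch (the idealism scores below are made up for illustration, and in SPSS you would enter these products as additional covariates alongside the original predictors):

```python
import math

def box_tidwell_term(x):
    # Box-Tidwell: for each continuous predictor x, add x * ln(x) to the
    # model; a significant coefficient on this term flags nonlinearity of
    # the logit in x. Requires x > 0, so shift the scale first if needed.
    return x * math.log(x)

# Hypothetical 9-point idealism ratings (not from the actual data set).
idealism = [2.0, 5.5, 8.0]
bt_terms = [box_tidwell_term(x) for x in idealism]
```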