Understanding Interaction Effects in Regression Analysis using SAS 9.4
Regression models estimate the effects of independent variables (IVs) on dependent variables (DVs), for example the effect of exercise time on weight loss. Interactions describe how the effect of one IV is modified by another IV, the moderating variable (MV). This seminar demonstrates techniques to estimate, test, and graph interaction effects using PROC PLM, with examples such as sex and height interacting to influence weight. Simple effects and simple slopes help interpret the conditional effects within interaction models.
ANALYZING AND VISUALIZING INTERACTIONS IN SAS 9.4. Andy Lin, IDRE Statistical Consulting.
Background. Regression models the effects of IVs on DVs, e.g. does the amount of time spent exercising predict weight loss? We can also model the effect of an IV as modified by another IV, the moderating variable (MV), e.g. is the effect of exercise time on weight loss modified by the type of exercise? Effect modification = interaction.
Background. Interactions are products of IVs, typically entered with the IVs into the regression. All we get out of the regression is a coefficient, which is not enough to understand the interaction. What are the conditional effects? They include simple effects and slopes, and conditional interactions.
Purpose of seminar. Demonstrate methods to estimate, test, and graph effects within an interaction. Specifically, we will use PROC PLM to: calculate and estimate simple effects; compare simple effects; graph simple effects.
Main effects vs interaction models. Main effects model: $weight = \beta_0 + \beta_s SEX + \beta_h HEIGHT$. In main effects models, IV effects are constrained to be the same across levels of all other IVs in the model. The main effect of height is constrained to be the same across sexes; it is an average of the male and female height effects.
Main effects vs interaction models. Interaction models allow the effect of an IV to vary with the levels of another IV. The interaction term is formed as the product of 2 IVs: $weight = \beta_0 + \beta_s SEX + \beta_h HEIGHT + \beta_{sh} SEX \times HEIGHT$. Now the effect of height may vary between sexes, and the effect of sex may vary at different heights.
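Simple effects and slopes. From the interaction equation $weight = \beta_0 + \beta_s SEX + \beta_h HEIGHT + \beta_{sh} SEX \times HEIGHT$ we can derive sex-specific regression equations. Males (sex=0): $weight = \beta_0 + \beta_h HEIGHT$. Females (sex=1): $weight = (\beta_0 + \beta_s) + (\beta_h + \beta_{sh}) HEIGHT$.
Simple effects and slopes. Each sex has its own height effect. Males (sex=0): slope $\beta_h$. Females (sex=1): slope $\beta_h + \beta_{sh}$, where $\beta_{sh}$ is the coefficient of the SEX-by-HEIGHT product term. These are the simple slopes of height within each group. The interaction coefficient is the difference in simple slopes.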
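The sex-specific equations follow from substituting the two values of SEX into the interaction model. The derivation below is a sketch that assumes the model has the form given in its first line; the name $\beta_{sh}$ for the product-term coefficient is a label chosen here, not notation confirmed by the slides.

```latex
\begin{align*}
\text{weight} &= \beta_0 + \beta_s\,\text{SEX} + \beta_h\,\text{HEIGHT}
                 + \beta_{sh}\,\text{SEX}\times\text{HEIGHT} \\
\text{Males } (\text{SEX}=0):\quad \text{weight} &= \beta_0 + \beta_h\,\text{HEIGHT} \\
\text{Females } (\text{SEX}=1):\quad \text{weight} &= (\beta_0 + \beta_s) + (\beta_h + \beta_{sh})\,\text{HEIGHT} \\
\text{Difference in simple slopes}:\quad (\beta_h + \beta_{sh}) - \beta_h &= \beta_{sh} \\
\text{Female minus male weight at HEIGHT}=h:\quad \Delta(h) &= \beta_s + \beta_{sh}\,h
\end{align*}
```

The last line is the simple effect of sex at a given height: it changes with $h$ whenever $\beta_{sh} \neq 0$, which is the sense in which the effect of sex may vary at different heights.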
PROC PLM. We use proc plm for most of our analyses. Proc plm performs post-estimation analyses and graphing. It uses an item store as input, which contains model information (coefficients and covariance matrices). The item store is created in other procs, including glm, genmod, logistic, phreg, mixed, glimmix, and more.
PROC PLM: important statements used in this seminar. Estimate statement: forms linear combinations of coefficients and tests them against 0. It is very flexible: the linear combinations can be means, effects, contrasts, etc. We use it to estimate and compare simple slopes. Its syntax is a bit more difficult.
PROC PLM: important statements used in this seminar. Slice statement: specifically analyzes simple effects, with very simple syntax. Lsmestimate statement: compares estimated marginal means, i.e. calculates simple effects; it is more versatile than slice.
PROC PLM. Lsmeans statement: estimates marginal means and can calculate differences between them. Effectplot statement: plots predicted values of the outcome across a range of values of 1 or more predictors. It can visualize interactions and offers many types of plots.
Why PROC PLM? Many of these statements are also found in the regression procs, so why use PROC PLM? We do not have to rerun the model each time we run code for the interaction analysis. These statements sometimes have more functionality in PROC PLM.
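The point about not rerunning the model can be illustrated with a small sketch. It assumes the seminar's `exercise` dataset with variables `loss`, `hours`, and `effort`; the item store name `demo_store` and the main-effects model are hypothetical choices made for this illustration.

```sas
/* Fit the model once and save it to an item store (name is hypothetical) */
proc glm data=exercise;
    model loss = hours effort / solution;
    store demo_store;
run;

/* Later: display the stored parameter estimates without refitting */
proc plm restore=demo_store;
    show parameters;
run;

/* Later still: a new post-estimation request against the same store */
proc plm restore=demo_store;
    estimate 'slope of hours' hours 1;
run;
```

Each proc plm step reads the saved coefficients and covariance matrix, so additional estimates or plots do not require the original data step or model fit to be repeated.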
Dataset used in seminar. Study of average weekly weight loss achieved by subjects in 3 exercise programs; 900 subjects. Important variables: Loss: continuous, normal outcome, average weekly weight loss. Hours: continuous predictor, average weekly hours of exercise. Effort: continuous predictor, average weekly rating of exertion when exercising, ranging from 0 to 50.
Dataset used in the seminar. Important variables, continued: Prog: 3-category predictor, which exercise program the subject followed, 1=jogging, 2=swimming, 3=reading (control). Female: binary predictor, gender, 0=male, 1=female. Satisfied: binary outcome, subject's overall satisfaction with weight loss due to participation in the exercise program, 0=unsatisfied, 1=satisfied.
Continuous-by-continuous: the model. We first model the interaction of 2 continuous IVs. The effect of a continuous IV on the outcome is called a slope; it expresses the change in the outcome per unit increase in the IV. With the interaction of 2 continuous variables, the slope of each IV is allowed to vary with the other IV. These are simple slopes.
Continuous-by-continuous: the model. Let us look at a model where Y is predicted by continuous X, continuous Z, and their interaction: $Y = \beta_0 + \beta_x X + \beta_z Z + \beta_{xz} XZ$. Be careful when interpreting $\beta_x$ and $\beta_z$: they are simple effects (when the interacting variable = 0), not main effects.
Continuous-by-continuous: the model. The coefficient $\beta_{xz}$ is interpreted as the change in the simple slope of X per unit increase in Z. Equation for the simple slope of X: $slope_X = \beta_x + \beta_{xz} Z$.
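The simple slope can be obtained by differentiating the regression equation with respect to the focal predictor. The sketch below assumes the model form written in its first line, which matches the slide's description of X, Z, and their product; the subscripted $\beta$ names are labels chosen for this illustration.

```latex
\begin{align*}
Y &= \beta_0 + \beta_x X + \beta_z Z + \beta_{xz} XZ \\
\frac{\partial Y}{\partial X} &= \beta_x + \beta_{xz} Z
   \quad\text{(simple slope of } X \text{ at a given } Z\text{)} \\
\frac{\partial Y}{\partial Z} &= \beta_z + \beta_{xz} X
   \quad\text{(simple slope of } Z \text{ at a given } X\text{)} \\
\left.\frac{\partial Y}{\partial X}\right|_{Z=z+1}
 - \left.\frac{\partial Y}{\partial X}\right|_{Z=z} &= \beta_{xz}
\end{align*}
```

Setting $Z = 0$ in the second line gives $\beta_x$, which is why that coefficient is a simple slope at $Z = 0$ rather than a main effect.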
Continuous-by-continuous: example model. We regress loss on hours, effort, and their interaction. Is the effect of hours modified by the effort that the subject exerts? And the converse: is the effect of effort modified by hours?
Continuous-by-continuous: example model.
    proc glm data=exercise;
        model loss = hours|effort / solution;
        store contcont;
    run;
The | requests main effects and interactions. solution requests a table of regression coefficients. store contcont creates an item store of the model for proc plm.
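For two variables, the bar operator is shorthand for listing both main effects and their product. The sketch below writes the same model with the terms spelled out; the item store name `contcont_explicit` is a hypothetical name used here so that the seminar's `contcont` store is not overwritten.

```sas
/* Same model as loss = hours|effort, with each term listed explicitly */
proc glm data=exercise;
    model loss = hours effort hours*effort / solution;
    store contcont_explicit;   /* hypothetical item store name */
run;
```

The effect names that appear in this model statement (`hours`, `effort`, `hours*effort`) are the names later referenced in estimate statements.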
Continuous-by-continuous: example model. The interaction is significant. Remember that the hours and effort terms are simple slopes.
Continuous-by-continuous: calculating simple slopes. The estimate statement is used to form linear combinations of regression coefficients, including simple slopes (and effects). It is very flexible. Understanding the regression equation is very helpful in coding estimate statements.
Estimate statement syntax: estimate 'label' coefficient values / e; The e option displays the coefficients of the linear combination. For example, to estimate expected loss when hours=2 and effort=30:
    proc plm restore=contcont;
        estimate 'pred loss, hours=2, effort=30' intercept 1 hours 2 effort 30 hours*effort 60 / e;
    run;
The regression coefficients are multiplied by their values and summed to form the estimate, which is tested against 0.
The output shows that the coefficient values are correct, along with a test against 0 (not interesting here).
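The values in the estimate statement are the multipliers applied to each estimated coefficient. Written out, with $b$ used here as an assumed notation for the estimated coefficients:

```latex
\begin{align*}
\widehat{\text{loss}}(\text{hours}=2,\ \text{effort}=30)
 &= 1\cdot b_{\text{intercept}} + 2\cdot b_{\text{hours}} + 30\cdot b_{\text{effort}}
    + (2\times 30)\cdot b_{\text{hours*effort}} \\
 &= b_{\text{intercept}} + 2\,b_{\text{hours}} + 30\,b_{\text{effort}}
    + 60\,b_{\text{hours*effort}}
\end{align*}
```

The multiplier on the interaction coefficient is the product of the two predictor values, which is where the 60 comes from.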
Continuous-by-continuous: calculating simple slopes. Let's revisit the formula for the simple slope of X moderated by Z: $slope_X = \beta_x + \beta_{xz} Z$. In the estimate statement, we will put a 1 after the X term and the value of z after the X*Z term. In our model, X = hours and Z = effort.
Continuous-by-continuous: calculating simple slopes. What values of effort should we choose to evaluate the simple slopes of hours? Two common choices: substantively important values (education=12yrs, BMI=18, temperature=98.6, etc.) or data-driven values (mean, mean+sd, mean-sd). There are no a priori important values of effort, so we choose (mean-sd, mean, mean+sd) = (24.52, 29.66, 34.8).
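The data-driven values can be computed from the data rather than typed in by hand. This is a minimal sketch assuming the seminar's `exercise` dataset and its `effort` variable; the output dataset and variable names (`effort_stats`, `effort_values`, `effort_lo`, `effort_hi`, and so on) are hypothetical.

```sas
/* Mean and standard deviation of effort in the exercise dataset */
proc means data=exercise noprint;
    var effort;
    output out=effort_stats mean=effort_mean std=effort_sd;
run;

/* Derive the three moderator values: mean-sd, mean, mean+sd */
data effort_values;
    set effort_stats;
    effort_lo = effort_mean - effort_sd;
    effort_hi = effort_mean + effort_sd;
run;

proc print data=effort_values noobs;
    var effort_lo effort_mean effort_hi;
run;
```

The printed values are the ones to place after `hours*effort` in the estimate statement.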
Continuous-by-continuous: calculating simple slopes.
    proc plm restore=contcont;
        estimate 'hours, effort=mean-sd' hours 1 hours*effort 24.52,
                 'hours, effort=mean'    hours 1 hours*effort 29.66,
                 'hours, effort=mean+sd' hours 1 hours*effort 34.8 / e;
    run;
Continuous-by-continuous: calculating simple slopes. We might be interested in whether those simple slopes are different, but we don't need to test it. Why? If the moderator is continuous and the interaction is significant, then the simple slopes will always be different. We demonstrate a difference to show this.
Continuous-by-continuous: calculating simple slopes. To get the difference between simple slopes, take the difference between values across coefficients in the estimate statement:
      (hours 1  hours*effort 29.66)
    - (hours 1  hours*effort 24.52)
    = (hours 0  hours*effort  5.14)
Continuous-by-continuous: calculating simple slopes. Coefficients with 0 values can be omitted:
    proc plm restore=contcont;
        estimate 'diff slopes, mean - (mean-sd)' hours*effort 5.14;
    run;
The result has the same t-value and p-value as the interaction coefficient.
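The reason the test matches the interaction coefficient can be seen algebraically. The sketch below assumes the simple slope of X has the form $\beta_x + \beta_{xz} z$ and uses $z_1$, $z_2$ as labels for any two moderator values, here one standard deviation (5.14) apart.

```latex
\begin{align*}
\text{slope}(z_2) - \text{slope}(z_1)
 &= (\beta_x + \beta_{xz} z_2) - (\beta_x + \beta_{xz} z_1)
  = \beta_{xz}\,(z_2 - z_1) \\
t &= \frac{\hat\beta_{xz}\,(z_2 - z_1)}
          {\operatorname{SE}\!\left(\hat\beta_{xz}\right)\,\lvert z_2 - z_1\rvert}
   = \pm\,\frac{\hat\beta_{xz}}{\operatorname{SE}\!\left(\hat\beta_{xz}\right)}
\end{align*}
```

The difference is the interaction coefficient multiplied by a constant, and rescaling an estimate by a constant rescales its standard error by the same amount, so the t-value (up to sign) and the p-value are unchanged.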
Continuous-by-continuous: graphing simple slopes. We use the effectplot statement in proc plm to plot the predicted outcome across a range of values of the predictors. We will plot across the range of 2 predictors to depict an interaction.
Simple slopes as contour plots.
    proc plm source=contcont;
        effectplot contour (x=hours y=effort);
    run;
Simple slopes as contour plots. Contour plots are uncommon. It is nice that both continuous variables are represented continuously. Simple slopes of hours are horizontal lines across the graph. The more the color changes along a line, the steeper the slope.
Simple slopes as a fit plot.
    proc plm source=contcont;
        effectplot fit (x=hours) / at(effort=24.52 29.66 34.8);
    run;
Effort will not be represented continuously, so we must specify the values that we want. A separate graph will be plotted for each effort value.
Simple slopes as a fit plot. More easily understood. But why not all 3 on one graph?
Creating a custom graph through scoring. We can make the graph ourselves by getting predicted loss values across a range of hours at the 3 selected effort values (24.52, 29.66, 34.8) by: (1) creating a dataset of hours and effort values at which to predict the outcome loss; (2) using the score statement in proc plm to predict the outcome and its 95% confidence interval; (3) using the scored dataset in proc sgplot to create a plot.
Creating a custom graph through scoring.
    /* grid of hours values at each of the 3 effort values */
    data scoredata;
        do effort = 24.52, 29.66, 34.8;
            do hours = 0 to 4 by 0.1;
                output;
            end;
        end;
    run;
    /* predicted loss and confidence limits for each grid point */
    proc plm source=contcont;
        score data=scoredata out=plotdata predicted=pred lclm=lower uclm=upper;
    run;
    /* one line and confidence band per effort value */
    proc sgplot data=plotdata;
        band x=hours upper=upper lower=lower / group=effort transparency=0.5;
        series x=hours y=pred / group=effort;
        yaxis label="predicted loss";
    run;
Creating a custom graph through scoring Purty!
Quadratic effect: the model. A quadratic effect is a special case of a continuous-by-continuous interaction: the interaction of an IV with itself. It allows the (linear) effect of the IV to vary depending on the level of the IV itself, and models a curvilinear relationship between DV and IV.
Quadratic effect: the model. The regression equation with linear and quadratic effects of continuous predictor X: $Y = \beta_0 + \beta_x X + \beta_{xx} X^2$. $\beta_x$ is still interpreted as the slope of X when X=0. The interpretation of $\beta_{xx}$ is slightly different: it represents half of the change in the slope of X when X increases by 1 unit.
Quadratic effect: the model. To get the formula for the simple slope of X, we must use the partial derivative: $\partial Y / \partial X = \beta_x + 2\beta_{xx} X$. Here we see that the slope of X changes by $2\beta_{xx}$ per unit increase in X.
Quadratic effect: example model. We regress loss on the linear and quadratic effects of hours.
    proc glm data=exercise order=internal;
        model loss = hours|hours / solution;
        store quad;
    run;
Quadratic effect: example model. The quadratic effect is significant. The negative sign indicates that the slope decreases (becomes more negative) as hours increases, giving an inverted U-shaped curve. There are diminishing returns on increasing hours.
Quadratic effect: calculating simple slopes. We construct estimate statements for simple slopes in the same way as before, BUT we must be careful to multiply the value after the quadratic effect by 2. We will put a 1 after the linear term and the value of 2*x after the quadratic term. There are no a priori important values of hours, so we choose mean=2, mean+sd=2.5, and mean-sd=1.5.
Quadratic effect: calculating simple slopes.
    proc plm restore=quad;
        estimate 'hours, hours=mean-sd(1.5)' hours 1 hours*hours 3,
                 'hours, hours=mean(2)'      hours 1 hours*hours 4,
                 'hours, hours=mean+sd(2.5)' hours 1 hours*hours 5 / e;
    run;
The slopes decrease as hours increase and eventually become non-significant.
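The multipliers 3, 4, and 5 on `hours*hours` are twice the chosen values of hours. A short worked version, assuming the quadratic model $Y = \beta_0 + \beta_x X + \beta_{xx} X^2$ with X = hours (the $\beta$ names are labels chosen for this illustration):

```latex
\begin{align*}
\frac{\partial Y}{\partial X}\bigg|_{X=x_0} &= 1\cdot\beta_x + (2x_0)\cdot\beta_{xx} \\
x_0 = 1.5:&\quad \beta_x + 3\,\beta_{xx} \\
x_0 = 2:&\quad \beta_x + 4\,\beta_{xx} \\
x_0 = 2.5:&\quad \beta_x + 5\,\beta_{xx}
\end{align*}
```

Using the raw value of hours instead of $2x_0$ would understate how quickly the slope changes by a factor of two.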
Quadratic effect: comparing simple slopes. We do not need to compare them: the significance of any difference between simple slopes is always the same as that of the quadratic (interaction) coefficient.
Quadratic effect: graphing the quadratic effect. The fit type of effectplot is made for plotting the outcome vs a single continuous predictor.
    proc plm restore=quad;
        effectplot fit (x=hours);
    run;
Quadratic effect: graphing the quadratic effect. Diminishing returns are apparent. Too many hours of exercise may lead to weight gain.
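The point at which additional hours stop helping is where the simple slope of hours equals zero. A sketch under the assumed quadratic model $Y = \beta_0 + \beta_x X + \beta_{xx} X^2$; no numeric value is given here because the coefficient estimates are not reproduced in this transcript.

```latex
\begin{align*}
\frac{\partial Y}{\partial X} = \beta_x + 2\beta_{xx} X^{*} &= 0 \\
X^{*} &= -\frac{\beta_x}{2\beta_{xx}}
\end{align*}
```

When $\beta_{xx} < 0$ this turning point is a maximum: predicted loss rises up to $X^{*}$ and the slope is negative beyond it.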
Continuous-by-categorical: the model. We can also estimate the simple slopes in a continuous-by-categorical interaction. We will estimate the slope of the continuous variable within each category of the categorical variable. We could also look at the simple effects of the categorical variable across levels of the continuous variable. First, how do categorical variables enter regression models?
Categorical predictors and dummy variables. A categorical predictor with k categories can be represented by k dummy variables. Each dummy codes for membership in a category, where 0=non-membership and 1=membership. However, typically only k-1 dummies are entered into the regression model. Why? With an intercept in the model, each dummy is a linear combination of all the other dummies (collinearity), and a regression model cannot estimate a coefficient for a collinear predictor.
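The collinearity can be checked directly by building all k dummies for the 3-category `prog` variable. This is a minimal sketch assuming the seminar's `exercise` dataset with no missing values of `prog`; the dataset name `exercise_dummies` and the variable names `jog`, `swim`, `read`, and `dummy_sum` are hypothetical.

```sas
data exercise_dummies;
    set exercise;
    /* a comparison such as (prog = 1) evaluates to 1 when true, 0 when false */
    jog  = (prog = 1);
    swim = (prog = 2);
    read = (prog = 3);
    /* the three dummies sum to 1 for every subject with a valid prog */
    dummy_sum = jog + swim + read;
run;

/* min and max of dummy_sum should both equal 1 */
proc means data=exercise_dummies min max;
    var dummy_sum;
run;
```

Because the sum is constant, any one dummy equals 1 minus the other two, so it carries no information beyond the intercept and the remaining k-1 dummies.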