Understanding Latent Variable Modeling in Statistical Analysis
Latent variable modeling, including factor analysis and path analysis, is used in statistical analysis to uncover hidden relationships and causal effects among observed variables. The approach involves exploring covariances, partitioning variances, and estimating the causal versus noncausal parts of observed associations. Through common factor models, exploratory and confirmatory factor analysis, and the study of causal relations with path analysis, researchers aim to gain meaningful insight into complex data structures.
Presentation Transcript
Tate Center Lecture Series Brooks Applegate, EMR 3/10/2014
Because the analysis often focuses on covariances, SEM (structural equation modeling) is also referred to as Covariance Structure Modeling or Structural Regression Modeling. Because it often involves unobserved (latent) variables, it is also referred to as Latent Variable Modeling. Because SEM models often test or estimate (theoretically modeled) causal effects, it is also referred to as Causal Modeling.
About 100 years ago, Spearman laid the foundation of what is today called the Common Factor Model (CFM) and of the technique used to analyze it statistically, Factor Analysis (FA). FA is one of the most frequently used multivariate statistical techniques today. The CFM and FA study the variance structure within a group. The fundamental intent of FA is to determine the number and nature of the latent variables that account for the variation and covariation in a larger set of observed variables.
The CFM postulates that each indicator in a set of observed measures is a linear function of one or more common factors and one unique factor. The task of FA is to partition the variance of each indicator into common variance and unique variance. Two broad FA families do this: Exploratory FA (EFA) and Confirmatory FA (CFA).
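For standardized indicators and uncorrelated (orthogonal) factors, this partition is simple arithmetic: the common variance (communality) is the sum of an indicator's squared loadings, and the unique variance is 1 minus the communality. The sketch below uses hypothetical loadings; with correlated factors, the communality would also include cross-product terms involving the factor correlations.

```python
# Sketch: split each standardized indicator's variance into common and unique parts.
# Assumes orthogonal (uncorrelated) standardized factors; the loadings are hypothetical.

def partition_variance(loadings):
    """Return (common, unique) variance for one standardized indicator."""
    common = sum(lam ** 2 for lam in loadings)  # communality: sum of squared loadings
    unique = 1.0 - common                       # uniqueness: specific + error variance
    return common, unique


# Hypothetical loadings of three indicators on two orthogonal factors.
LOADINGS = {
    "x1": [0.8, 0.1],
    "x2": [0.7, 0.2],
    "x3": [0.2, 0.6],
}

for name, lams in LOADINGS.items():
    common, unique = partition_variance(lams)
    print(f"{name}: common = {common:.2f}, unique = {unique:.2f}")
```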
In path analysis (PA), the researcher specifies a model that attempts to explain why X and Y (and other observed variables) covary. Part of the explanation may include presumed causal effects (e.g., X causes Y). Other parts may reflect presumed noncausal relations, such as a spurious association. The overall goal in PA is to estimate the causal versus noncausal aspects of observed covariances.
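The split into causal and noncausal parts can be illustrated with Wright's tracing rule on a small standardized model. This sketch assumes a hypothetical model in which Z is a common cause of X and Y (paths a and b) and X also affects Y directly (path c); all names and values are illustrative. The model-implied correlation is r(X, Y) = c + ab, where c is the causal part and ab is the spurious part, and a simulation from the same model serves as a rough check.

```python
import math
import random

# Hypothetical standardized path model: Z -> X (a), Z -> Y (b), X -> Y (c).
# Z is a common cause of X and Y, so part of their correlation is spurious.

def decompose_r_xy(a, b, c):
    """Split the model-implied r(X, Y) into causal and spurious parts (tracing rule)."""
    causal = c          # direct effect X -> Y
    spurious = a * b    # back-door path X <- Z -> Y
    return causal, spurious, causal + spurious


def pearson_r(u, v):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)


a, b, c = 0.6, 0.3, 0.4
causal, spurious, implied = decompose_r_xy(a, b, c)
print(f"causal = {causal:.2f}, spurious = {spurious:.2f}, implied r = {implied:.2f}")

# Rough check by simulation: generate standardized data from the same model.
random.seed(1)
n = 20000
resid_var_y = 1 - (c**2 + b**2 + 2 * a * b * c)  # keeps Var(Y) = 1
xs, ys = [], []
for _ in range(n):
    z = random.gauss(0, 1)
    x = a * z + random.gauss(0, math.sqrt(1 - a**2))
    y = c * x + b * z + random.gauss(0, math.sqrt(resid_var_y))
    xs.append(x)
    ys.append(y)
print(f"simulated r = {pearson_r(xs, ys):.3f}")
```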
To reasonably infer that X is a cause of Y, all of the following conditions must be met:
1. There is time precedence; that is, X precedes Y in time.
2. The direction of the causal relation is correctly specified; that is, X causes Y, rather than the reverse or X and Y causing each other.
3. The association between X and Y does not disappear when external variables, such as common causes of both, are held constant (i.e., it is not spurious).
It is very unlikely that all these conditions would be satisfied in a single study.
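For a measured common cause Z, condition 3 can be examined with the first-order partial correlation r(XY.Z) = (r(XY) - r(XZ) r(YZ)) / sqrt((1 - r(XZ)^2)(1 - r(YZ)^2)). The sketch below uses hypothetical correlations. Controlling for one measured variable cannot rule out unmeasured common causes, and it says nothing about conditions 1 and 2.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y holding Z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical correlations with a measured common cause Z.
spurious_case = partial_corr(r_xy=0.30, r_xz=0.6, r_yz=0.5)   # X-Y association vanishes
remaining_case = partial_corr(r_xy=0.50, r_xz=0.6, r_yz=0.5)  # association survives

print(f"partial r (spurious case)  = {spurious_case:.3f}")
print(f"partial r (remaining case) = {remaining_case:.3f}")
```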
Assessing variables at different times at least provides a measurement framework consistent with the specification of directional causal effects. However, longitudinal designs pose many potential difficulties, such as subject attrition and the need for additional resources. When the variables are measured concurrently, it is not possible to demonstrate time precedence. Therefore, when all variables are measured at the same time, the researcher needs a very clear, substantive rationale for specifying that X causes Y instead of the reverse, or that X and Y mutually influence each other.
[Figure: path diagram relating observed variables X1, X2, Y1, Y2, and Y3]
The basic foundation of SEM was laid in the 1970s, and it became generally accessible to researchers in the 1980s. New developments (statistical estimation theory, numerical analysis, and desktop computing power) have made this family of techniques accessible to researchers who have not had extensive training in applied statistics and measurement.
SEM can generally be thought of as a generalization or extension of ANOVA (DOE), regression, and Principal Factor Analysis. It can model multilevel data, and, given an appropriate link function, it can accommodate nonlinear response data, as in item response models and growth mixture models.
SEM fits (and tests) a priori models against data, or fits a posteriori models to data. SEM models employ both estimation and hypothesis testing, and they require the explicit representation of observed (indicator) and unobserved (latent) variables. They can be applied to experimental and nonexperimental data, and they can be exploratory, confirmatory, or a mixture of both. Joreskog (1993) distinguishes three situations: strictly confirmatory, alternative models, and model-generating.
Path models: all variables are observed indicators; used for mediation and moderation analysis and for cause and effect.
Measurement models (EFA, E/CFA, and CFA): the definition, structure, and relationships of the latent factors.
Structural regression models: the integration of path models (depicting relations among latent factors) with CFA measurement models.
Special models: latent growth models (longitudinal multilevel models), latent class models, and item response models (IRT).
Whole and Parts
Symbol: Interpretation
X: observed exogenous variable
Y: observed endogenous variable
D: unobserved exogenous variable (i.e., a disturbance)
Curved two-headed arrow from a variable back to itself: variance of an exogenous variable
Curved two-headed arrow between two variables: covariance between a pair of exogenous variables
X → Y: presumed direct causal effect
Y1 ⇄ Y2: presumed reciprocal causal effects
[Figure: full structural regression model with indicators x1 to x5 and y1 to y10, disturbances (D), and unit loading constraints]
[Figure: the same model with the latent factors X1, X2, Y1, Y2, and Y3 labeled, each measured by its indicators]
[Figure: the structural (path) portion relating the factors X1, X2, Y1, Y2, and Y3]
1. Specify the model (where is your theory?).
2. Establish that the model is identified.
3. Prepare and screen the data and variables.
4. Estimate the model:
   a. Examine model fit.
   b. Interpret.
   c. Consider alternative or equivalent models.
5. Respecify the model.
6. Write it up accurately.
7. Replicate (cross-validate) your model.
8. Apply the results.
[Figure: flowchart of the SEM process linking specification, identification, data acquisition and preparation, estimation, fit evaluation, respecification, and interpretation and reporting]
Combining a path model with latent variables and their measurement components yields Structural Regression (SR) models.
In a path model, the disturbances of the endogenous variables reflect both measurement error and all omitted causes. In a CFA model, measurement errors are moved to the unique variances of the observed indicators, and there is no counterpart to omitted causes. In an SEM model, measurement errors are reflected in the measurement model, and omitted causes are reflected in the disturbances of the endogenous latent variables.
Exogenous factors are uncorrelated with the disturbances of the endogenous factors, and measurement errors are uncorrelated.
Unit Loading Identification (ULI): used to scale disturbances and measurement errors; the common software default. It is generally not a problem unless a factor has only two indicators and the model has equality constraints involving the other indicator (a constraint interaction).
Unit Variance Identification (UVI): common for scaling exogenous variables.
Disturbances and measurement errors are typically assigned a scale through unit loading (ULI) constraints. Exogenous factors are typically scaled either by ULI (one indicator per factor has its loading fixed to 1.0, so the factor is unstandardized) or by fixing the variance of the factor to 1.0, which standardizes it. Common SEM software limits the scaling of endogenous factors to ULI only (thus treating them as unstandardized).
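Because ULI and UVI are alternative ways of scaling the same factor, they should reproduce the same model-implied covariance matrix. A minimal sketch, assuming a hypothetical one-factor model with three indicators: under ULI the first loading is 1.0 and the factor variance phi is estimated; the equivalent UVI solution fixes phi to 1.0 and multiplies every loading by sqrt(phi). The implied covariance matrix (lambda phi lambda' + Theta) is the same either way; only the metric of the factor changes.

```python
import math

def implied_cov(loadings, factor_var, error_vars):
    """Implied covariance matrix of a one-factor model: lambda * phi * lambda' + Theta."""
    p = len(loadings)
    return [
        [loadings[i] * loadings[j] * factor_var + (error_vars[i] if i == j else 0.0)
         for j in range(p)]
        for i in range(p)
    ]


# Hypothetical unstandardized solution under ULI: first loading fixed to 1.0.
uli_loadings = [1.0, 0.8, 0.6]
uli_factor_var = 2.0
error_vars = [0.5, 0.4, 0.3]

# Equivalent UVI solution: factor variance fixed to 1.0, loadings rescaled by sqrt(phi).
scale = math.sqrt(uli_factor_var)
uvi_loadings = [lam * scale for lam in uli_loadings]

sigma_uli = implied_cov(uli_loadings, uli_factor_var, error_vars)
sigma_uvi = implied_cov(uvi_loadings, 1.0, error_vars)

for row_uli, row_uvi in zip(sigma_uli, sigma_uvi):
    print([round(v, 4) for v in row_uli], [round(v, 4) for v in row_uvi])
```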
The issues and considerations are basically the same as for CFA and path models. Start with the number of observations, which depends on the number of variables (not the sample size): v(v+1)/2, where v is the number of observed variables. The model must be just-identified or overidentified; that is, the number of free parameters must not exceed the number of observations. To count the free parameters:
1. Count the variances and covariances of all exogenous variables (measurement errors, disturbances, and exogenous factors).
2. Count the direct effects on endogenous variables (factor loadings and direct effects on endogenous factors from other factors).
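The counting rule can be written as a small helper. This is a sketch with hypothetical function names: it compares v(v+1)/2 with the number of free parameters q and reports the model degrees of freedom, df = v(v+1)/2 - q. Passing this rule is necessary but not sufficient for identification. The usage example counts parameters for a hypothetical two-factor CFA with three indicators per factor, scaled by ULI.

```python
def n_observations(v):
    """Number of observations (variances + covariances) for v observed variables."""
    return v * (v + 1) // 2


def counting_rule(v, free_params):
    """Return (df, status); a necessary, not sufficient, identification check."""
    df = n_observations(v) - free_params
    if df < 0:
        status = "underidentified"
    elif df == 0:
        status = "just-identified"
    else:
        status = "overidentified"
    return df, status


# Hypothetical two-factor CFA, three indicators per factor, ULI scaling.
free = (
    6      # measurement error variances (exogenous)
    + 2    # factor variances (exogenous)
    + 1    # factor covariance
    + 4    # free loadings (one loading per factor fixed to 1.0)
)
print(n_observations(6), free, counting_rule(6, free))
```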
The two-step rule is a sufficient condition for SEM identification:
1. The measurement model must be identified.
2. If the structural model is recursive, the full model is identified.
If the model is nonrecursive, things are a bit more complicated. Empirical underidentification can be created when there is substantial model misspecification.
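Whether a structural model is recursive can be checked mechanically. A minimal sketch, assuming the common definition that a recursive model has only unidirectional causal effects (no feedback loops) and uncorrelated disturbances; the variable names and models are hypothetical. It uses Python's standard-library graphlib (Python 3.9+), whose TopologicalSorter raises CycleError when the graph of direct effects contains a cycle.

```python
from graphlib import CycleError, TopologicalSorter


def is_recursive(direct_effects, correlated_disturbances=()):
    """Check the two defining features of a recursive structural model.

    direct_effects maps each endogenous variable to the set of its direct causes.
    correlated_disturbances lists pairs of endogenous variables whose
    disturbances are allowed to covary.
    """
    if correlated_disturbances:
        return False  # recursive models have uncorrelated disturbances
    try:
        # static_order() raises CycleError if there is a feedback loop.
        tuple(TopologicalSorter(direct_effects).static_order())
    except CycleError:
        return False
    return True


# Hypothetical structural models among factors X1, X2, Y1, Y2, Y3.
recursive_model = {"Y1": {"X1"}, "Y2": {"X2", "Y1"}, "Y3": {"Y1", "Y2"}}
feedback_model = {"Y1": {"X1", "Y2"}, "Y2": {"X2", "Y1"}, "Y3": {"Y2"}}  # Y1 <-> Y2

print(is_recursive(recursive_model))                                          # True
print(is_recursive(feedback_model))                                           # False
print(is_recursive(recursive_model, correlated_disturbances=[("Y1", "Y2")]))  # False
```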
Doing it all at once (estimating the full model in a single step) is probably not recommended. If the overall fit is good, good job. If the overall fit is poor, what do you do? Is the poor fit due to a poor measurement model, or to a poor structural model?
The two-step approach is based on the recognition that the structural part of the SEM is nested under the more general (correlated) CFA model.
1. Respecify the full SEM as a measurement (CFA) model, estimate it, and fix it if needed. If the measurement model fits adequately, move to Step 2. If the measurement model fits poorly, the structural model will fit just as poorly or worse.
2. Place constraints on the structural part of the CFA model to bring it into line with your structural model, and then consider alternative structural models.
If alternative structural models dramatically affect the measurement portion of the SEM, the measurement model is not invariant. This results in interpretational confounding: the meaning and interpretation of the measurement model become a function of the structural model.
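Because the structural model is nested within the CFA model from Step 1, the two are commonly compared with a chi-square difference test. A minimal sketch, assuming maximum likelihood estimation with approximately multivariate normal data (robust estimators need a scaled difference test instead) and hypothetical fit statistics. The helper chi2_sf is a self-contained upper-tail chi-square probability for integer degrees of freedom, so no external library is needed. A significant difference suggests that the structural constraints worsen fit relative to the measurement model.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x), integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: start from df = 1 and step up by 2 using
    # Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    total = math.erfc(math.sqrt(half))                          # Q(1)
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)  # term for k = 1
    k = 1
    while k < df:
        total += term                    # now Q(k + 2)
        term *= half / (k / 2.0 + 1.0)   # term for k + 2
        k += 2
    return total


def chi2_difference_test(chi2_restricted, df_restricted, chi2_general, df_general):
    """Compare a restricted (structural) model with the general model it is nested in."""
    delta = chi2_restricted - chi2_general
    ddf = df_restricted - df_general
    if ddf <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    return delta, ddf, chi2_sf(delta, ddf)


# Hypothetical fit statistics (not from the lecture).
cfa_chi2, cfa_df = 40.0, 30   # Step 1: CFA with all factors freely correlated
sr_chi2, sr_df = 52.0, 33     # Step 2: structural model imposes 3 constraints

delta, ddf, p = chi2_difference_test(sr_chi2, sr_df, cfa_chi2, cfa_df)
print(f"delta chi2 = {delta:.1f}, delta df = {ddf}, p = {p:.4f}")
```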
The four-step approach is an expansion of the two-step process. Requirement: each latent factor has at least four indicator variables (mixed methods).
1. E/CFA.
2. CFA with all latent factors freely correlated (Step 1 of the two-step approach).
3. Begin to place constraints on the structural portion of the model.
4. Incremental or sequential tests of a priori hypotheses.
Steps 3 and 4 are really an incremental refinement of the structural part of the model, from the general CFA to the final SEM.
SAS/CALIS
IBM SPSS Amos 21.0.0 (Arbuckle & Wothke), bundled with SPSS: http://www-142.ibm.com/software/products/us/en/spss-amos/
AMOS 21.0.0 stand-alone ($1590.00)
LISREL 9.1 (Joreskog & Sorbom), $495.00: http://www.ssicentral.com (free student versions)
EQS 6.2 for Windows (Bentler), $595.00: http://www.mvsoft.com
Mplus 7 (Muthen & Muthen), $595.00: https://www.statmodel.com (demo version available; add-ons for multilevel and mixture models)
Many open-source programs, e.g., R
Structural Regression Model
Consider a research question that focuses on the regression of Y on X (e.g., do principals' experiences predict school building health?). A survey is constructed with 4 items, all theoretically related to a latent construct X (principal experiences), and 3 different items theoretically related to a latent construct Y (school health).
Typically, a researcher derives an X composite variable, X = x1 + x2 + x3 + x4, and a Y composite variable, Y = y1 + y2 + y3, and then regresses Y on X.
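A small simulation can show what the composite approach ignores. This sketch assumes hypothetical data: latent X and Y with variance 1 and correlation 0.5, and four x items and three y items that each equal their factor plus independent error with variance 1. It builds the composites X = x1 + x2 + x3 + x4 and Y = y1 + y2 + y3 and regresses Y on X. Under these settings, the population correlation between the composites is 6/sqrt(240), about 0.39, rather than 0.5, because the item errors are folded into the composites.

```python
import math
import random


def pearson_r(u, v):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)


def ols_slope(x, y):
    """Slope from regressing y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    return sxy / sxx


random.seed(2014)
n = 5000
x_comp, y_comp, fx_true, fy_true = [], [], [], []
for _ in range(n):
    fx = random.gauss(0, 1)                                # latent X, variance 1
    fy = 0.5 * fx + random.gauss(0, math.sqrt(0.75))       # latent Y, variance 1, r = 0.5
    items_x = [fx + random.gauss(0, 1) for _ in range(4)]  # x1..x4, error variance 1 each
    items_y = [fy + random.gauss(0, 1) for _ in range(3)]  # y1..y3, error variance 1 each
    x_comp.append(sum(items_x))                            # X = x1 + x2 + x3 + x4
    y_comp.append(sum(items_y))                            # Y = y1 + y2 + y3
    fx_true.append(fx)
    fy_true.append(fy)

print(f"slope of Y on X (composites): {ols_slope(x_comp, y_comp):.3f}")
print(f"r between composites:         {pearson_r(x_comp, y_comp):.3f}")
print(f"r between latent scores:      {pearson_r(fx_true, fy_true):.3f}")
```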
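[Figure: structural regression model in which latent Y is regressed on latent X; X is measured by x1 to x4 and Y by y1 to y3, one loading per factor is fixed to 1, and asterisks mark the parameters to be estimated]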
Understanding the Measurement Model
Consider the regression of Y on X. Expressed as a path model, the regression of Y on X is diagrammed as X → Y.
Modeling measurement error in X (the measurement error variance is 0.019):
[Figure: Y regressed on FX, the true score on X; X is the single indicator of FX, with error variance 0.019]
Modeling measurement errors in both X and Y:
[Figure: FY regressed on FX; FX is the true score on X (error variance 0.019) and FY is the true score on Y (error variance 0.022)]
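Fixing an indicator's error variance has a closed-form effect in this simplest case. Assuming a single-indicator model that is just-identified (so the estimates reproduce the sample moments exactly), the unstandardized slope on FX is cov(X, Y) / (var(X) - thetaX), whether or not Y's error is also modeled, and the correlation between the true scores is cov(X, Y) / sqrt((var(X) - thetaX)(var(Y) - thetaY)). The error variances 0.019 and 0.022 come from the slide; the sample variances and covariance below are hypothetical.

```python
import math


def single_indicator_slope(cov_xy, var_x, theta_x):
    """Unstandardized slope on FX when X's error variance is fixed to theta_x."""
    return cov_xy / (var_x - theta_x)


def true_score_correlation(cov_xy, var_x, var_y, theta_x, theta_y):
    """Correlation of FX and FY after removing fixed error variances from X and Y."""
    return cov_xy / math.sqrt((var_x - theta_x) * (var_y - theta_y))


THETA_X, THETA_Y = 0.019, 0.022          # error variances from the slide
var_x, var_y, cov_xy = 0.20, 0.25, 0.09  # hypothetical sample moments

naive_slope = cov_xy / var_x              # ordinary regression of Y on X
naive_r = cov_xy / math.sqrt(var_x * var_y)
reliability_x = (var_x - THETA_X) / var_x

print(f"reliability of X:          {reliability_x:.3f}")
print(f"slope: naive {naive_slope:.3f}, corrected {single_indicator_slope(cov_xy, var_x, THETA_X):.3f}")
print(f"r: naive {naive_r:.3f}, true-score "
      f"{true_score_correlation(cov_xy, var_x, var_y, THETA_X, THETA_Y):.3f}")
```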
Confirmatory Factor Analysis
CFA is a type of SEM that deals specifically with measurement models, that is, the relationships between observed variables (indicators) and latent variables (factors). CFA models are hypothesis driven. CFA has become one of the most popular methods used in applied social and health science research.
CFA is used for psychometric evaluation of test instruments, construct validation, measurement invariance, and investigation of method effects.
Missing Data, Non-normality, and Categorical Data
Conduct a Missing Data Analysis
MCAR (missing completely at random): the probability of missing on Y is unrelated to Y and to all other variables in the data set.
MAR (missing at random): the probability of missing on Y depends on one or more X variables, but is unrelated to Y when X is held constant.
NMAR (missing not at random): missingness is related to the values that would have been observed.
Planned missingness (ignorable missingness).
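The practical difference between these mechanisms can be seen in a simulation. The sketch below uses hypothetical data in which Y depends on X and has a true mean of 0, deletes Y under each mechanism, and averages Y over the remaining cases (listwise deletion). Under MCAR the complete-case mean stays near 0; under MAR (missingness driven by the observed X) and NMAR (missingness driven by Y itself) it is biased downward, because cases with high Y are more likely to be lost.

```python
import random

random.seed(7)
N = 20000

# Hypothetical complete data: Y depends on X; the true mean of Y is 0.
data = []
for _ in range(N):
    x = random.gauss(0, 1)
    y = 0.6 * x + random.gauss(0, 0.8)
    data.append((x, y))


def complete_case_mean_y(rows, p_missing):
    """Mean of Y among cases whose Y survives the given missingness mechanism."""
    kept = [y for x, y in rows if random.random() >= p_missing(x, y)]
    return sum(kept) / len(kept)


def mcar(x, y):
    return 0.4                    # same chance of missing for every case


def mar(x, y):
    return 0.8 if x > 0 else 0.1  # depends only on the observed X


def nmar(x, y):
    return 0.8 if y > 0 else 0.1  # depends on the unobserved value of Y itself


mean_mcar = complete_case_mean_y(data, mcar)
mean_mar = complete_case_mean_y(data, mar)
mean_nmar = complete_case_mean_y(data, nmar)
print(f"MCAR: {mean_mcar:.3f}  MAR: {mean_mar:.3f}  NMAR: {mean_nmar:.3f}")
```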
Consult your favorite statistician
Listwise deletion: loss of power.
If missingness is MCAR, estimates are consistent (unbiased) but usually not efficient (large standard errors).
If missingness is MAR, estimates may be neither consistent nor efficient.
Pairwise deletion: if you use a covariance or correlation matrix (you will probably lie), the matrix may not be positive definite, which ML estimation requires.
If missingness is MCAR, estimates are consistent (in large samples), but standard errors are biased.
If missingness is MAR, both estimates and standard errors are biased.
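Positive definiteness can be checked with a Cholesky factorization, which succeeds only for positive definite matrices. A minimal sketch with hypothetical correlation matrices (the function name is illustrative): in the second matrix, each pairwise correlation looks plausible on its own, as can happen when each entry is computed from a different subset of cases under pairwise deletion, but the three values cannot all hold in one data set, so the factorization fails.

```python
import math


def is_positive_definite(m):
    """Attempt a Cholesky factorization of symmetric m; it succeeds only if m is positive definite."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # non-positive pivot: not positive definite
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


# Hypothetical correlation matrices for three variables.
consistent = [[1.0, 0.9, 0.9],
              [0.9, 1.0, 0.8],
              [0.9, 0.8, 1.0]]
# Each r could look plausible alone, but together they are impossible.
inconsistent = [[1.0, 0.9, 0.9],
                [0.9, 1.0, 0.2],
                [0.9, 0.2, 1.0]]

print(is_positive_definite(consistent))    # True
print(is_positive_definite(inconsistent))  # False
```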