Latent Variable Modeling in Statistical Analysis

Tate Center Lecture Series
Brooks Applegate, EMR
3/10/2014
Often the analysis focuses on covariances, so it is referred to as
Covariance Structure Modeling
or Structural Regression Models
Often involves unobserved or latent variables, so it is referred to as
Latent Variable Modeling
Often SEM models test/estimate causal effects (theoretically modeled), so it is referred to as
Causal Modeling
3/10/14
Tate Center Series
2
About 100 years ago Spearman put down the
foundation of what is today called the Common
Factor Model and the techniques to statistically
analyze it – Factor Analysis
FA is one of the most frequently used multivariate
statistical techniques in use today.
CFM & FA study the variance structures within a group.
The fundamental intent of FA is to determine the number
and nature of the latent variables that account for the
variation/covariation in a larger set of observed variables
The Common Factor Model postulates that each indicator in a set of
observed measures is a linear function of one
or more common factors and one unique factor.
The task of FA is to partition the variance of each indicator into
Common variance
Unique variance
There are two broad FA families that do this
Exploratory FA (EFA)
Confirmatory FA (CFA)
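The partition of indicator variance into common and unique parts can be sketched numerically. This is a minimal illustration, assuming standardized indicators and uncorrelated (orthogonal) factors; the loading values are hypothetical and do not come from the lecture.

```python
import numpy as np

# Hypothetical standardized loadings: 4 indicators (rows) on 2 orthogonal factors (columns).
loadings = np.array([
    [0.8, 0.0],
    [0.7, 0.1],
    [0.1, 0.6],
    [0.0, 0.9],
])

# Common variance (communality): sum of squared loadings for each indicator.
common = (loadings ** 2).sum(axis=1)

# Unique variance: what remains of each indicator's unit variance.
unique = 1.0 - common

for i, (c, u) in enumerate(zip(common, unique), start=1):
    print(f"indicator {i}: common = {c:.2f}, unique = {u:.2f}")
```

With correlated factors the common variance would also involve the factor correlations, so the simple sum of squared loadings applies only under the orthogonality assumption made here.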
[Figure: factor model diagram with observed indicators X1 to X7 and latent factors ξ1 and ξ2]
[Figure: second factor model diagram with observed indicators X1 to X7 and latent factors ξ1 and ξ2]
In path analysis (PA), the researcher specifies a
model that attempts to explain why X and Y
(and other observed variables) covary
Part of the explanation about why two variables
covary may include presumed causal effects (e.g., X
causes Y)
Other parts of the explanation may reflect presumed
noncausal relations, such as a spurious association
The overall goal in PA is to estimate causal
versus noncausal aspects of observed
covariances
To reasonably infer that X is a cause of Y, all of the
following conditions must be met:
1. There is time precedence, that is, X precedes Y in time
2. The direction of the causal relation is correctly specified, that is, X causes Y instead of the reverse or that X and Y cause each other
3. The association between X and Y does not disappear when external variables, such as common causes of both, are held constant (i.e., it is not spurious)
It is very unlikely that all these conditions would
be satisfied in a single study
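The third condition (non-spuriousness) can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated, Z is a hypothetical common cause, and X has no direct effect on Y, so the X-Y association should vanish once Z is held constant.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

z = rng.normal(size=n)             # hypothetical common cause
x = 0.7 * z + rng.normal(size=n)   # X depends on Z
y = 0.7 * z + rng.normal(size=n)   # Y depends on Z only, not on X

def residual(v, control):
    """Part of v that is not linearly predictable from the control variable."""
    slope = np.cov(v, control)[0, 1] / np.var(control, ddof=1)
    return v - slope * control

r_xy = np.corrcoef(x, y)[0, 1]
r_xy_given_z = np.corrcoef(residual(x, z), residual(y, z))[0, 1]

print(f"r(X, Y)     = {r_xy:.3f}")
print(f"r(X, Y | Z) = {r_xy_given_z:.3f}")
```

The raw correlation is clearly positive while the partial correlation is close to zero, which is the signature of a spurious association.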
The assessment of variables at different times at least
provides a measurement framework consistent with
the specification of directional causal effects
Longitudinal designs, however, pose many potential difficulties,
such as subject attrition and the need for additional
resources
When the variables are concurrently measured, it is not
possible to demonstrate time precedence
Therefore, the researcher needs a very clear,
substantive rationale for specifying that X causes Y
instead of the reverse or that X and Y mutually
influence each other when all variables are measured at
the same time
[Figure: path diagram with observed variables X1, X2, Y1, Y2, and Y3]
The basic foundation was laid down in the 1970s
SEM generally became accessible to researchers in
the 1980s
New developments (statistical estimation
theory, numerical analysis & desktop
computing power) have made this family
accessible to researchers who have not had
extensive training in applied statistics and
measurement
SEM can be generally thought of as a
generalization or extension of:
ANOVA (DOE)
Regression
Principal Factor Analysis
It can model multilevel data
Provided an appropriate link function it can
accommodate non-linear response data
Item Response Models
Growth Mixture Models
SEM models fit (and test) a priori models to data
OR fit a posteriori (post hoc) models to data
Models employ both estimation and hypothesis testing
Models require the explicit representation of
observed (indicator) and unobserved (latent)
variables
Models can be applied to experimental and
non-experimental data
Models can be exploratory, confirmatory or a
mixture of both (Jöreskog, 1993)
Strictly confirmatory
Alternative models
Model-generating
Path Models
All observed indicator variables
Mediation & moderation analysis
Cause and effects
Measurement Models (EFA, E/CFA & CFA)
Definition, structure and relationships of the latent factors
Structural Regression Models
Integration of path models (depicting relations among
latent factors) together with the CFA measurement
models
Special models
Latent Growth Models (longitudinal multilevel models)
Latent Class Models
Item Response Models (IRT)
Whole and Parts
Symbol: Interpretation
X: Observed exogenous variable
Y: Observed endogenous variable
D: Unobserved exogenous variable (i.e., a disturbance)
[curved two-headed arrow on one variable]: Variance of exogenous variable
[curved two-headed arrow between two variables]: Covariance between a pair of exogenous variables
→: Presumed direct causal effect (e.g., X → Y)
⇄: Presumed reciprocal causal effects (e.g., Y1 ⇄ Y2)
[Figure: full model diagram with indicators x1 to x5 and y1 to y10, latent exogenous factors ξ1 and ξ2, latent endogenous factors η1, η2, and η3, and disturbances (D)]
[Figure: the same diagram with indicators x1 to x5 and y1 to y10 and the factors labeled X1, X2, Y1, Y2, and Y3]
[Figure: diagram showing only X1, X2, Y1, Y2, and Y3]
1. Specify the model – where is your theory?
2. Establish that the model is identified
3. Prepare/screen the data/variables
4. Estimate the model
   1. Examine model fit
   2. Interpret
   3. Consider alternative/equivalent models
5. Re-specify the model
6. Write it up accurately
7. Replicate (cross validate) your model
8. Apply the results
[Figure: flowchart of the SEM process with boxes labeled specification, identification, data acquisition & preparation, estimation, fit evaluation, specification, and interpretation & reporting; paths labeled a, b1, and b2]
Combining path model with latent variables
and their measurement components into
Structural Regression (SR) Models
In a path model the disturbance factors of the
endogenous variables reflect both
Measurement error
All omitted causes
In a CFA model measurement errors are moved to
the unique variances of the observed indicator
variables
There is no counterpart to omitted causes
In an SEM model
Measurement errors are reflected in the measurement
model
Omitted causes are reflected in the disturbance factors of
the endogenous latent variables
Exogenous factors are uncorrelated with the
disturbances of the endogenous factors
Measurement errors are uncorrelated
Unit Loading Identification (ULI)
Used to scale disturbance and measurement errors
Common software default
Generally not a problem unless there are only 2
indicator variables for a factor and the model has
equality constraints involving the other indicator (a
constraint interaction)
Unit Variance Identification (UVI)
Common for scaling exogenous variables
Disturbances & measurement errors are
typically assigned a scale through unit
loading identification (ULI) constraints
Exogenous variables are typically scaled by
either ULI (one indicator per factor is fixed to
1.0), which leaves the factor unstandardized, or UVI
(the variance of the factor is fixed to 1.0), which
standardizes it
Common SEM software limits the scaling of the
endogenous factors to only ULI (thus treating
them as unstandardized)
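The two scaling choices describe the same model. A minimal sketch, assuming a one-factor model with hypothetical loadings and error variances, shows that rescaling from UVI to ULI leaves the model-implied covariance matrix unchanged.

```python
import numpy as np

def implied_cov(loadings, factor_var, error_vars):
    """Model-implied covariance matrix of a one-factor model."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    return factor_var * (lam @ lam.T) + np.diag(error_vars)

error_vars = [0.5, 0.4, 0.6]            # hypothetical measurement error variances

# UVI: factor variance fixed to 1.0, all loadings free.
uvi_loadings = np.array([0.8, 0.6, 0.7])
sigma_uvi = implied_cov(uvi_loadings, 1.0, error_vars)

# ULI: first loading fixed to 1.0, factor variance free.
uli_loadings = uvi_loadings / uvi_loadings[0]
uli_factor_var = uvi_loadings[0] ** 2
sigma_uli = implied_cov(uli_loadings, uli_factor_var, error_vars)

print(np.allclose(sigma_uvi, sigma_uli))  # prints True
```

Because both parameterizations reproduce the same covariances, the choice affects the metric of the estimates rather than model fit (setting aside the constraint-interaction case).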
Basically the same issues and considerations as
CFA & Path models
Start with the number of variables (not the sample
size)
The number of observations is v(v+1)/2, where v = # of observed variables
Need to have a just-identified or over-identified
model
Count the number of variances and covariances of all
the exogenous variables (measurement errors,
disturbances and exogenous factors)
Count the number of direct effects on endogenous
variables (factor loadings, direct effects on endogenous
factors from other factors)
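The counting rule above can be written as a short calculation. This is a sketch using a hypothetical two-factor CFA with three indicators per factor and ULI scaling; the parameter counts are assumptions made for illustration, not taken from the slides.

```python
def sem_degrees_of_freedom(n_observed, n_free_parameters):
    """Return (unique variances/covariances, degrees of freedom)."""
    n_moments = n_observed * (n_observed + 1) // 2   # v(v+1)/2
    return n_moments, n_moments - n_free_parameters

# Hypothetical two-factor CFA, three indicators per factor, ULI scaling.
free_parameters = {
    "factor loadings": 4,               # 6 loadings minus 2 fixed to 1.0
    "measurement error variances": 6,
    "factor variances": 2,
    "factor covariance": 1,
}

moments, df = sem_degrees_of_freedom(6, sum(free_parameters.values()))
print(f"observations = {moments}, free parameters = {moments - df}, df = {df}")
```

A non-negative df is a necessary condition only; a model can pass this count and still fail to be identified.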
The 2-Step rule is a sufficient condition for
SEM identification
1. The measurement model must be identified
2. If the structural model is recursive, the full model is identified
If the model is nonrecursive, things are a bit more complicated!
Empirical underidentification
Created when there is substantial model misspecification
Do it all at once
Probably not recommended
If the overall fit is good  - GOOD JOB
If the overall fit is poor what to do?
Is the poor fit due to a poor measurement model?
Is the poor fit due to a poor structural model?
Based on the recognition that the structural part of
the SEM is actually nested under the more general
(correlated) CFA model
1. Respecify the full SEM model as a measurement (CFA) model and estimate (and fix!?!)
If the measurement model adequately fits, move to Step 2
If the measurement model is a poor fit, then the structural model will be just as poor or worse
2. Place constraints on the structural part of the CFA model to bring it into line with your structural model
Now consider alternative structural models
If alternative structural models dramatically affect the measurement model portion of the SEM, the measurement model is not invariant
This results in interpretational confounding; that is, the meaning/interpretation of the measurement model is a function of the structural model
Expansion of the 2-step process
Requirement: Each latent factor has at least 4 indicator variables (mixed methods)
1. E/CFA
2. CFA with all latent factors freely correlated (step 1 in the 2-step approach)
3. Begin to place constraints on the structural portion of the model
4. Incremental or sequential tests of a priori hypotheses
Steps 3 & 4 are really incremental refinement of the
structural part of the model, from the general CFA
to the end SEM
SAS/CALIS
IBM SPSS Amos (21.0.0) (Arbuckle & Wothke) bundled with SPSS
http://www-142.ibm.com/software/products/us/en/spss-amos/
AMOS 21.0.0 stand alone ($1590.00)
LISREL (9.1) (Jöreskog & Sörbom) $495.00
http://www.ssicentral.com
Free student versions
EQS (6.2) for Windows (Bentler) $595.00
http://www.mvsoft.com
Mplus-7 (Muthén & Muthén) ($595.00)
https://www.statmodel.com
Demo version available
Add-ons for multilevel and mixture models
Many open source programs, e.g. R
Structural Regression Model
A research question that focuses on the
regression of Y on X (e.g., do principal
experience(s) predict school building health)?
A survey is constructed with 4 items all
theoretically related to a latent construct X
(principal experiences)
And 3 different items theoretically related to a
latent construct Y (school health)
Typically a researcher derives an X-composite variable
X = (x1 + x2 + x3 + x4)
And a Y-composite variable
Y = (y1 + y2 + y3)
Then regresses Y on X
[Figure: structural regression model in which X is measured by x1 to x4 and Y by y1 to y3; two parameters are fixed to 1 and the remaining parameters are marked with *]
Understanding the Measurement Model
Consider the regression of Y on X (diagrammed as follows)
[Figure: regression of Y on X]
Expressed as a path model, the regression of Y on X is diagrammed
[Figure: path model of Y regressed on X]
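Modeling Measurement Error in X (measurement error variance is 0.019)
[Figure: path model in which observed X is an indicator of FX with its loading fixed to 1 and error variance 0.019, and FX predicts Y]
FX is the True Score on X
Modeling Measurement Errors in both X & Y
[Figure: path model in which observed X and Y are indicators of FX and FY with loadings fixed to 1 and error variances 0.019 and 0.022, and FX predicts FY]
FX is the True Score on X
FY is the True Score on Y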
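The effect of modeling measurement error in X can be sketched with a simulation. This is an illustration under stated assumptions: only the error variance 0.019 comes from the slide; the sample size, true-score variance, and true slope of 0.5 are hypothetical. Subtracting the known error variance from the variance of observed X recovers the slope on the true score FX.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
error_var_x = 0.019                     # measurement error variance from the slide

# Simulated (hypothetical) data: true score FX, observed X = FX + error.
fx = rng.normal(0.0, 0.3, n)
x = fx + rng.normal(0.0, np.sqrt(error_var_x), n)
y = 0.5 * fx + rng.normal(0.0, 0.2, n)  # Y depends on the true score, slope 0.5

cov_xy = np.cov(x, y)[0, 1]
var_x = np.var(x, ddof=1)

naive_slope = cov_xy / var_x                      # regression of Y on observed X
corrected_slope = cov_xy / (var_x - error_var_x)  # regression of Y on true score FX

print(f"naive: {naive_slope:.3f}, corrected: {corrected_slope:.3f}")
```

The naive slope is attenuated toward zero, while the corrected slope is close to the simulated value; an SEM program obtains the same kind of correction by fixing the error variance in the measurement model.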
Confirmatory Factor Analysis
CFA is a type of SEM that deals specifically
with measurement models
The relationships between observed variables
(indicators) and latent variables (factors)
CFA models are hypothesis driven
CFA has become one of the most popular
techniques/methods used in applied social and
health science research
Psychometric evaluation of “test” instruments
Construct validation
Measurement invariance
Investigation of method effects
[Figure: CFA diagram with observed indicators X1 to X7 and latent factors ξ1 and ξ2]
Missing data, Non-normality, & Categorical Data
Conduct a Missing Data Analysis
MCAR (missing completely at random)
Probability of missing on Y is unrelated to Y and all other variables in the data set
MAR (missing at random)
When the probability of missing on Y depends on one (or more) X variables (not related to Y when X is held constant)
NMAR (missing not at random)
When missingness is related to values that could have been observed
Planned missingness (ignorable missing)
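The three mechanisms can be contrasted with a small simulation. This is a sketch on hypothetical data in which the true mean of Y is 0; the cutoffs used to generate missingness are assumptions chosen to make each mechanism easy to see.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)   # hypothetical data, true mean of Y is 0

# Each mask marks the cases where Y is missing (about half of the sample).
mcar = rng.random(n) < 0.5    # unrelated to X and Y
mar = x > 0                   # depends only on the fully observed X
nmar = y > 0                  # depends on the value of Y itself

for name, missing in [("MCAR", mcar), ("MAR", mar), ("NMAR", nmar)]:
    print(f"{name}: mean of observed Y = {y[~missing].mean():.3f}")
```

Only under MCAR does the complete-case mean stay near the true value. Under MAR the bias can be removed by methods that condition on X, whereas under NMAR the observed data alone do not contain the information needed to correct it.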
Consult your favorite statistician
Listwise deletion
  Loss of power
  If missingness is MCAR
    Estimates are consistent (unbiased)
    Usually not efficient (large std errors)
  If missingness is MAR
    Estimates may not be consistent nor efficient
Pairwise deletion
  If you use a covariance or correlation matrix (you will probably lie)
  Matrix may not be positive definite (means it cannot be inverted)
  MCAR
    Consistent estimates (in large samples)
    Biased std errors
  MAR
    Estimates and std errors are biased
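The positive-definiteness problem with pairwise deletion can be reproduced with a deliberately extreme, hypothetical data set. In this sketch each pair of variables is observed on a different block of cases, so the three pairwise correlations are mutually inconsistent.

```python
import numpy as np

t = np.arange(1.0, 6.0)
nan = np.full(5, np.nan)

# Hypothetical data with three missingness blocks; columns are x, y, z.
data = np.vstack([
    np.column_stack([t, t, nan]),    # x and y observed: r = +1
    np.column_stack([nan, t, t]),    # y and z observed: r = +1
    np.column_stack([t, nan, -t]),   # x and z observed: r = -1
])

def pairwise_corr(a):
    """Correlation matrix where each entry uses the cases complete for that pair."""
    k = a.shape[1]
    r = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            ok = ~np.isnan(a[:, i]) & ~np.isnan(a[:, j])
            r[i, j] = r[j, i] = np.corrcoef(a[ok, i], a[ok, j])[0, 1]
    return r

r = pairwise_corr(data)
eigenvalues = np.linalg.eigvalsh(r)
print(r)
print("smallest eigenvalue:", eigenvalues.min())
```

A negative eigenvalue means the matrix is not a valid correlation matrix for any single sample, which is why estimators that assume a positive definite input can fail on pairwise-deleted matrices.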
Mean substitution, LVCF, regression substitution
  Tends to underestimate variances & std errors but overestimate correlations
EM imputation
  Missingness must be MCAR or MAR + multivariate normal
  Std errors are consistent
Multiple imputation (3 step process)
  Generate m (5 is enough) imputed data sets (typically EM + MCMC)
  Analyze the 5 parallel data sets individually
  Combine the analyses
  SAS PROC MIANALYZE
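The "combine the analyses" step is commonly done with Rubin's rules. The sketch below is a minimal illustration for a single parameter; the function name and the five estimates and standard errors are hypothetical, and it is not a substitute for a dedicated procedure such as PROC MIANALYZE.

```python
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Combine one parameter across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])   # average sampling variance
    between = variance(estimates)                   # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5

# Hypothetical results from m = 5 imputed data sets.
estimates = [0.50, 0.54, 0.47, 0.52, 0.49]
std_errors = [0.10, 0.11, 0.10, 0.12, 0.10]

pooled_estimate, pooled_se = pool_rubin(estimates, std_errors)
print(f"pooled estimate = {pooled_estimate:.3f}, pooled SE = {pooled_se:.3f}")
```

The pooled standard error exceeds the average within-imputation standard error because it also carries the between-imputation variance, which is the uncertainty due to the missing data.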
Preferred choice for handling missing data
MCAR or MAR + multivariate normality
MLM estimator (robust ML) can be used with
non-normal data
Available in Mplus, LISREL,  Amos, Mx, SAS
SPSS(?)
ML & GLS estimators are robust to minor
departures from multivariate normality
When non-normality is pronounced, don't use
ML or GLS estimators
ML is very sensitive to high kurtosis
Inflated χ²
Modest underestimation of fit indices
Moderate to severe underestimation of std errors
All bad things get much worse in small samples
Weighted Least Squares (WLS)
Requires very large samples
Robust ML (best choice)
Typically requires raw data
SB χ²
Cannot be used the same way for testing nested models; it must be adjusted
There is quite a bit of variation among the
popular SEM programs, so read and get
familiar with the tool you use
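One widely used adjustment for nested-model tests with the SB χ² is the Satorra-Bentler scaled difference test. The slide does not name a specific adjustment, so treating it as this one is an assumption; the function name and all input values below are hypothetical.

```python
def sb_scaled_difference(t0, c0, d0, t1, c1, d1):
    """Satorra-Bentler scaled chi-square difference statistic.

    t0, c0, d0: scaled chi-square, scaling correction factor, and df of the
                nested (more restricted) model.
    t1, c1, d1: the same quantities for the comparison (less restricted) model.
    Returns the scaled difference statistic and its degrees of freedom.
    """
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)   # scaling correction for the difference
    trd = (t0 * c0 - t1 * c1) / cd         # t * c recovers the uncorrected ML chi-square
    return trd, d0 - d1

# Hypothetical output from two nested models.
trd, df_diff = sb_scaled_difference(t0=85.0, c0=1.20, d0=26,
                                    t1=60.0, c1=1.25, d1=24)
print(f"scaled difference = {trd:.2f} on {df_diff} df")
```

The result is referred to a χ² distribution with the difference in degrees of freedom. Simply subtracting the two scaled χ² values (here 85 - 60 = 25) would give a different and incorrect statistic. The correction term can come out negative in some samples, in which case this version of the test is not usable.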
Don't use an ML estimator
Produces attenuated correlation estimates if there is a floor or ceiling effect
Produces pseudofactors that are artifacts of item difficulty or extremeness
Produces incorrect test statistics and std errors
Possibly produces incorrect parameter estimates if there is a floor or ceiling effect
WLS
Weight matrix requires a large sample
Assume 10 indicators
b = (10*11)/2 = 55 so W matrix is (b*(b+1))/2 or
(55*56)/2=1540 elements
Skewness can aggravate a small sample size problem
Robust WLS (WLSMV – Mplus)
Appears to be the best choice for categorical data
But still requires large samples
ULS
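The weight-matrix arithmetic above generalizes to any number of indicators. This short sketch (the function name is an assumption) shows how quickly the number of distinct elements in W grows, which is why WLS needs large samples.

```python
def wls_weight_matrix_size(n_indicators):
    """Return (b, w): unique variances/covariances and distinct elements of W."""
    b = n_indicators * (n_indicators + 1) // 2
    w = b * (b + 1) // 2
    return b, w

for p in (5, 10, 20):
    b, w = wls_weight_matrix_size(p)
    print(f"{p} indicators: b = {b}, W has {w} distinct elements")
```

For 10 indicators this reproduces the figures on the slide (b = 55, 1540 elements).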