Updated Nov 21, 2002
Web references I've found on the subject of SEM
Papers with suggestions on how to write up the results of a
structural equation model in a paper.
- Hoyle and Panter (1995) "Writing about Structural Equation Models" in Structural Equation Modeling: Concepts, Issues, and Applications, ed. Hoyle, pp. 158-176.
- Raykov, T., Tomer, A., and Nesselroade, J. (1991) "Reporting structural equation modeling results in Psychology and Aging: Some proposed guidelines" Psychology and Aging, Vol. 6, No. 4, 499-503. (Thanks to Muree Larson-Bright for finding this article Nov. 2002)
Structural Equation Modeling (Hybrid Models)
- Path analysis with latent variables
- A Two Step Approach (a common reference is Anderson,
J.C. and Gerbing, D.W. (1988), Psychological Bulletin)
- Measurement model (specifies relationships between the
latent variables and their indicators)
- Structural model (specifies relationships between latent variables)
- When we perform a test of an SEM, we are simultaneously
testing whether the combined measurement and structural model is
adequate to explain the structure of the data.
Handout of example from Hatcher: 6 underlying latent variables.
The first step is to fit the CFA model to the 18 observed variables with 6 factors,
letting every factor be freely correlated with every other factor.
The second step is to fit the hypothesized direct relationships between
the latent factors. This basically puts restrictions on the PHI
matrix (the covariance matrix of the latent factors).
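As a rough illustration of what the second step changes, the degrees of freedom of the two models can be counted by hand. The sketch below is not taken from the Hatcher handout. It assumes simple structure, one loading per factor fixed to 1 to set the scale, and a recursive structural model with uncorrelated disturbances. The function names and the example of 7 direct paths and 1 exogenous covariance are hypothetical.

```python
def cfa_degrees_of_freedom(n_indicators, n_factors):
    """df of a simple-structure CFA in which all factors are freely correlated."""
    # Distinct variances and covariances among the observed variables.
    n_moments = n_indicators * (n_indicators + 1) // 2
    # One loading per factor is fixed to 1 (assumed scaling convention).
    free_loadings = n_indicators - n_factors
    error_variances = n_indicators
    factor_variances = n_factors
    factor_covariances = n_factors * (n_factors - 1) // 2
    n_parameters = (free_loadings + error_variances
                    + factor_variances + factor_covariances)
    return n_moments - n_parameters


def structural_degrees_of_freedom(n_indicators, n_factors,
                                  n_paths, n_exogenous_covariances):
    """df after replacing the free factor covariances with direct paths."""
    factor_covariances = n_factors * (n_factors - 1) // 2
    # Each factor keeps one variance (or disturbance variance), so the only
    # change is covariances swapped for paths plus exogenous covariances.
    restrictions = factor_covariances - (n_paths + n_exogenous_covariances)
    return cfa_degrees_of_freedom(n_indicators, n_factors) + restrictions


if __name__ == "__main__":
    print(cfa_degrees_of_freedom(18, 6))               # step 1: CFA
    print(structural_degrees_of_freedom(18, 6, 7, 1))  # step 2: hypothetical paths
```

The difference between the two counts is the number of restrictions placed on the PHI matrix, which is also the degrees of freedom of a chi-square difference test comparing step 2 against step 1.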
Identifiability
- Lots of rules exist for determining identifiability, but none of the
simple ones is both sufficient and necessary
- Just because you have positive degrees of freedom doesn't mean
the model is identified
- Here are some rules that are sufficient for
identifiability when the factors all have "simple structure"
(i.e. each observed variable loads on only one factor) and the
model is recursive
- If each factor has at least 3 indicators then the model is identified
- If a factor has only two indicators, then the model is identified
provided that factor is correlated with some other factor.
- If a factor has only one indicator then you
need to do one of the following
- Assume that it measures the latent variable without error OR
- Use some estimate of its reliability (perhaps Cronbach's alpha)
and fix its error variance. That is, if you know the
reliability of the scale is .7 then fix its measurement error variance to
(1 - .7) * (variance of the scale)
- For other models, check out the order and rank conditions (Maruyama and Kline discuss these) or just go ahead and try to fit the model and let the software tell you
whether it is identified or not.
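The single-indicator rule above is plain arithmetic. A minimal sketch follows. The function name and the scale variance of 10 are made up for illustration.

```python
def fixed_error_variance(reliability, scale_variance):
    """Error variance to fix for a single indicator with known reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if scale_variance < 0.0:
        raise ValueError("variance cannot be negative")
    # (1 - reliability) is the share of the scale variance attributed to error.
    return (1.0 - reliability) * scale_variance


if __name__ == "__main__":
    # Reliability .7 and an observed scale variance of 10 (hypothetical numbers).
    print(fixed_error_variance(0.7, 10.0))  # about 3.0
```

A reliability of 1 gives an error variance of 0, which is the "measures the latent variable without error" option.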
Heywood Cases
- A Heywood case occurs when a parameter is estimated to be a value that lies outside
of its known range of possibilities (i.e. outside the parameter space).
- Most common scenario: the maximum likelihood estimate for a
measurement error variance turns out to be NEGATIVE.
Why?
- model is very misspecified
- small sample size
- the true value of the variance is just very close to zero but
unfortunately was estimated to be negative (There is a way to check
this by fixing the error variance to equal zero and then doing a Chi-square
difference test)
- Another scenario, not as easy to detect, is when the estimated
covariance matrix of the latent factors is not positive
definite. This is also caused by model misspecification and/or
small sample sizes.
- For both of these, AMOS will say "this solution is not
admissible".
- Heywood cases occur more commonly when there are only two indicators per
latent variable
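The checks described above can be sketched in a few lines. This is only an illustration of the logic, not what AMOS does internally, and all function names are hypothetical. The p-value function assumes a chi-square difference test with 1 degree of freedom (one error variance fixed to zero), for which the upper tail probability equals erfc(sqrt(d/2)).

```python
import math


def negative_error_variances(error_variances):
    """Names of indicators whose estimated error variance is negative."""
    return [name for name, value in error_variances.items() if value < 0]


def is_positive_definite(matrix):
    """Try a Cholesky factorization of a symmetric matrix; failure means not PD."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                diagonal = matrix[i][i] - partial
                if diagonal <= 0.0:
                    return False
                lower[i][i] = math.sqrt(diagonal)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


def chi_square_difference_p_value_df1(chisq_restricted, chisq_free):
    """p-value of a 1-df chi-square difference test."""
    difference = max(chisq_restricted - chisq_free, 0.0)
    return math.erfc(math.sqrt(difference / 2.0))


if __name__ == "__main__":
    print(negative_error_variances({"x1": 0.42, "x2": -0.08}))
    print(is_positive_definite([[1.0, 1.2], [1.2, 1.0]]))
    print(chi_square_difference_p_value_df1(101.3, 100.1))
```

A large p-value from the difference test is consistent with the true error variance being close to zero rather than the model being badly misspecified.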
GOODNESS OF FIT INDICES. Handout of AMOS manual appendix.
- Discuss problems with the Chi-squared test as sample size increases
(with a large enough sample, even trivial misfit becomes statistically significant)
- NFI (Normed Fit Index) - compares your model to the independence
model: NFI = 1 - (Chisquare of your model) / (Chisquare of the
independence model). Rule of thumb: > .9 is acceptable.
- AIC - adds a penalty to the chi-squared test for additional
parameters. At some point adding additional parameters is just
modeling the noise, not the signal. Compare AIC across different
models; smaller AIC is better.
- RMSEA (root mean square error of approximation) =
sqrt( (chisquare - df) / (N*df) ), taken to be 0 when chisquare < df.
Rule of thumb: < .05 is indicative of close fit and < .08 is
reasonable fit, but RMSEA > .1 is bad fit. Check out Browne
and Cudeck (1993) "Alternative Ways of Assessing Model Fit" in
Testing Structural Equation Models, eds. Bollen and Long,
pp. 136-162.
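The indices above are easy to compute by hand from the chi-square values that the software reports. The sketch below follows the formulas as written in these notes. The function names are hypothetical, the AIC form (chi-square plus 2 per free parameter) is an assumption about the convention in use, and packages differ in small details such as N versus N - 1.

```python
import math


def nfi(chisq_model, chisq_independence):
    """Normed Fit Index: 1 - model chi-square / independence chi-square."""
    return 1.0 - chisq_model / chisq_independence


def aic(chisq_model, n_free_parameters):
    """Chi-square plus a penalty of 2 per free parameter (assumed convention)."""
    return chisq_model + 2.0 * n_free_parameters


def rmsea(chisq_model, df, n):
    """sqrt((chisq - df) / (N * df)), set to 0 when chisq < df."""
    return math.sqrt(max(chisq_model - df, 0.0) / (n * df))


def chisq_from_discrepancy(f_min, n):
    """ML chi-square is (N - 1) times the minimized discrepancy function."""
    return (n - 1) * f_min


if __name__ == "__main__":
    print(nfi(50.0, 500.0))
    print(aic(120.0, 51))
    print(rmsea(120.0, 100, 200))
    # Same amount of misfit, growing sample: the chi-square grows with N.
    for n in (101, 1001, 10001):
        print(n, chisq_from_discrepancy(0.1, n))
```

The last loop illustrates the sample-size problem with the chi-squared test: the discrepancy is held fixed, yet the test statistic grows in proportion to N - 1.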
CROSS VALIDATION.
- A good way to establish a model (especially in circumstances where changes to the model are made using the data) is to use cross-validation.
- When the sample size is large enough, split the sample in half, determine the best model given one half of the data (the calibration sample), and then examine the fit of that model applied to the second half (the validation sample).
- This assessment of the fit of the model based on the calibration sample to the validation sample can be done in several ways.
- For an overview...MacCallum, R. C., Roznowski, M., Mar, M. and Reith, J.V. (1994) Alternative strategies for cross-validation of covariance structure modeling. Multivariate Behavioral Research, 29, 1-32.
- Cudeck and Browne (1983) "Cross validation of covariance structures" Multivariate Behavioral Research, 18, 147-167 develop a cross-validation index (CVI) that looks at the discrepancy between the estimated model covariance matrix based on the calibration sample and the sample covariance matrix of the validation sample.
- Cross-validation IN AMOS...Treat the calibration and validation samples as two different groups and test whether equality constraints across the groups significantly affect the model fit. That is, fit the model (developed on the calibration sample) to both groups with and without equality constraints across the groups and perform a chi-squared difference test. If there is no significant difference then we cannot reject that the model fits the two groups (samples) equally well.
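The split-half idea and a cross-validation discrepancy can be sketched as follows. This is an illustration only. It assumes the maximum likelihood discrepancy F(S, Sigma) = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p as the measure of distance between the validation sample covariance matrix S and the model-implied matrix Sigma estimated from the calibration sample. All function names are hypothetical, and fitting the model itself is left to SEM software.

```python
import math
import random


def split_half(rows, seed=0):
    """Randomly split observations into calibration and validation halves."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def inverse_and_logdet(matrix):
    """Gauss-Jordan inverse and log-determinant of a positive definite matrix."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] becomes [I | A^-1].
    work = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
            for i, row in enumerate(matrix)]
    logdet = 0.0
    for col in range(n):
        pivot = work[col][col]
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        logdet += math.log(pivot)
        work[col] = [v / pivot for v in work[col]]
        for r in range(n):
            if r != col:
                factor = work[r][col]
                work[r] = [x - factor * y for x, y in zip(work[r], work[col])]
    return [row[n:] for row in work], logdet


def ml_discrepancy(s, sigma):
    """ln|Sigma| - ln|S| + tr(S Sigma^-1) - p; zero when Sigma equals S."""
    p = len(s)
    sigma_inverse, logdet_sigma = inverse_and_logdet(sigma)
    _, logdet_s = inverse_and_logdet(s)
    trace = sum(s[i][k] * sigma_inverse[k][i]
                for i in range(p) for k in range(p))
    return logdet_sigma - logdet_s + trace - p


if __name__ == "__main__":
    calibration, validation = split_half(range(10), seed=1)
    print(len(calibration), len(validation))
    s_validation = [[2.0, 0.5], [0.5, 1.0]]     # hypothetical validation covariance
    sigma_calibration = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical implied covariance
    print(ml_discrepancy(s_validation, sigma_calibration))
```

When several candidate models are fit to the calibration half, the one whose implied covariance matrix gives the smallest discrepancy on the validation half cross-validates best.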
Alternatives to doing a full SEM.
These methods eliminate the explicit use of latent variables when modeling observed data where there is some hypothesized underlying latent variable.
Rather than using the CFA model directly in the SEM, the alternative methods take the results from the CFA and...
- Choose one observed variable to represent each factor. Choosing the one with the largest standardized factor loading is common. When more than one variable has a similarly large loading, choose the variable that is easier to interpret given subject-matter knowledge.
- Add up (or average) the indicators of each factor and use this sum to represent the factor.
- Create a factor score estimate, the predicted value of the factor given the observations. SEE HANDOUT. AMOS only provides the weights for the sum; other software (SAS, SPSS, MPLUS) takes the additional step of calculating the factor score estimate for each observation.
Each of these methods loses information because the surrogate used
for each factor will subsequently be treated as if it observed the
factor perfectly, which we know is not true.
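The first two surrogates are simple enough to sketch directly. The function names, variable names, and loadings below are hypothetical. Factor score estimates are not shown because they need the weights produced by the SEM software.

```python
def pick_single_indicator(standardized_loadings):
    """Name of the indicator with the largest absolute standardized loading."""
    return max(standardized_loadings,
               key=lambda name: abs(standardized_loadings[name]))


def composite_scores(rows, indicator_names, average=True):
    """Sum or average the named indicators for each observation."""
    scores = []
    for row in rows:
        total = sum(row[name] for name in indicator_names)
        scores.append(total / len(indicator_names) if average else total)
    return scores


if __name__ == "__main__":
    loadings = {"x1": 0.62, "x2": 0.81, "x3": 0.74}  # made-up CFA results
    data = [{"x1": 1, "x2": 2, "x3": 3}, {"x1": 4, "x2": 4, "x3": 1}]
    print(pick_single_indicator(loadings))
    print(composite_scores(data, ["x1", "x2", "x3"]))
    print(composite_scores(data, ["x1", "x2", "x3"], average=False))
```

Either surrogate then enters an ordinary regression or path analysis as if it were the factor itself, which is exactly where the measurement error gets ignored.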
Recent research topics in SEM that I will not cover
- Multi-level analysis (nested samples). Incorporating random effects due to clinics or schools into the model. MPLUS can do this.
- Nonlinear relationships between factors. SEM is limited to simple linear relations between all variables (because it only analyzes covariances).