This page shows an example of a latent growth curve model (LGCM) with footnotes explaining the output. An LGCM can be similar to a multilevel model, which many readers have already seen. To help you understand the LGCM and its output, we first fit a multilevel model in HLM and then in Stata, and then analyze the same data in Mplus as an LGCM. Footnotes relate the Mplus output to the multilevel model results. We suggest viewing this page in two browser windows side by side, with the Stata output in one window and the corresponding Mplus output in the other.
This example is drawn from the Mplus User’s Guide (example 6.10) and we suggest that you see the Mplus User’s Guide for more details about this example. We thank the kind people at Muthén & Muthén for permission to use examples from their manual.
Example using HLM
Each subject is observed on the variable y at four different times. A covariate called a is measured at each of the four time points. In addition, the variables x1 and x2 are each measured once per person. Conceptualized as a multilevel model, time and a are level 1 variables (time is coded 0, 1, 2, and 3), and x1 and x2 are level 2 variables. The model uses time and a to predict the values of y at level 1, and uses x1 and x2 to predict the intercept and the slope of time at level 2. We can write this model using multiple equations as shown below. This uses the ex610.mdm file.
Level-1 Model
    Y = B0 + B1*(A) + B2*(TIME) + R

Level-2 Model
    B0 = G00 + G01*(X1) + G02*(X2) + U0
    B1 = G10
    B2 = G20 + G21*(X1) + G22*(X2) + U2
Here is the output from HLM, condensed to save space. Footnotes are included for relating the output to Mplus.
Sigma_squared =      0.54200I

Tau
 INTRCPT1,B0      1.08757F     0.05079
     TIME,B2      0.05079H     0.20495G

Tau (as correlations)
 INTRCPT1,B0  1.000  0.108
     TIME,B2  0.108  1.000

Final estimation of fixed effects:
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          0.570413A   0.054807    10.408       497    0.000
          X1, G01          0.560548B   0.054574    10.271       497    0.000
          X2, G02          0.716557B   0.055865    12.827       497    0.000
 For      A slope, B1
    INTRCPT2, G10          0.296872E   0.021381    13.885      1993    0.000
 For   TIME slope, B2
    INTRCPT2, G20          1.010207C   0.025332    39.879       497    0.000
          X1, G21          0.263030D   0.025223    10.428       497    0.000
          X2, G22          0.473419D   0.025819    18.336       497    0.000
 ----------------------------------------------------------------------------
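The "Tau (as correlations)" block is simply the Tau covariance matrix rescaled by the two standard deviations. The short Python sketch below reproduces the off-diagonal entry from the numbers printed above; the variable and function names are ours, chosen only for illustration.

```python
import math

# Tau entries copied from the HLM output above (footnote letters dropped).
var_intercept = 1.08757   # variance of U0 (footnote F)
var_slope = 0.20495       # variance of U2 (footnote G)
cov_int_slope = 0.05079   # covariance of U0 and U2 (footnote H)


def covariance_to_correlation(cov, var_a, var_b):
    """Divide a covariance by the product of the two standard deviations."""
    return cov / math.sqrt(var_a * var_b)


tau_correlation = covariance_to_correlation(cov_int_slope, var_intercept, var_slope)
print(round(tau_correlation, 3))
```

Rounded to three decimals, the result agrees with the 0.108 that HLM reports.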
Example using Stata
Combining the two equations into one by substituting the level 2 equation into the level 1 equation, we have the equation below, with the random effects identified by placing them in square brackets.
Composite model
    Y = G00 + G01*(X1) + G02*(X2) + G10*A + G20*TIME + G21*X1*TIME + G22*X2*TIME + [ U0 + U2*TIME + R ]
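To make the composite model concrete, here is a minimal Python sketch that evaluates its fixed part using the HLM coefficients reported earlier. The function name and the example person (x1 = 1, x2 = 0, a = 0) are hypothetical and chosen only for illustration; the random effects U0, U2, and R are set to their mean of zero.

```python
# Fixed-effect estimates copied from the HLM output (footnote letters dropped).
G00, G01, G02 = 0.570413, 0.560548, 0.716557
G10 = 0.296872
G20, G21, G22 = 1.010207, 0.263030, 0.473419


def predicted_y(x1, x2, a, time):
    """Fixed part of the composite model; U0, U2 and R are taken to be 0."""
    intercept = G00 + G01 * x1 + G02 * x2      # level-2 equation for B0
    time_slope = G20 + G21 * x1 + G22 * x2     # level-2 equation for B2
    return intercept + G10 * a + time_slope * time


# Hypothetical person: x1 = 1, x2 = 0, a = 0, observed at time 0 through 3.
trajectory = [predicted_y(1, 0, 0, t) for t in range(4)]
for t, y_hat in enumerate(trajectory):
    print(t, round(y_hat, 3))
```

Grouping the terms this way shows why x1 and x2 appear twice in the composite model: once as main effects that shift the intercept and once as interactions with time that shift the slope.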
Based on the composite model, this is the same example using Stata. Please note that this is Stata 12 code.
infile y1-y4 x1 x2 a1-a4 using https://stats.idre.ucla.edu/stat/mplus/output/ex6.10.dat
generate id = _n
reshape long y a, i(id) j(time)
generate t1 = time - 1
xtmixed y a c.t1##c.x1 c.t1##c.x2 || id: t1, cov(un) var mle

Mixed-effects ML regression                     Number of obs      =      2000
Group variable: id                              Number of groups   =       500

                                                Obs per group: min =         4
                                                               avg =       4.0
                                                               max =         4

                                                Wald chi2(6)       =   2871.89
Log likelihood = -3075.8519                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |  .2967777E   .0213597    13.89   0.000     .2549135    .3386419
          t1 |  1.010207C   .0252504    40.01   0.000     .9607168    1.059696
          x1 |   .560547B   .0544035    10.30   0.000     .4539181    .6671759
             |
   c.t1#c.x1 |  .2630303D   .0251425    10.46   0.000     .2137519    .3123087
             |
          t1 |          0  (omitted)
          x2 |   .716562B   .0556897    12.87   0.000     .6074121    .8257118
             |
   c.t1#c.x2 |  .4734171D   .0257359    18.40   0.000     .4229756    .5238586
             |
       _cons |   .570413A   .0546356    10.44   0.000     .4633292    .6774968
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Unstructured             |
                   var(time) |    .20304G   .0203019      .1669054    .2469978
                  var(_cons) |  1.178701F   .1311813      .9476997    1.466008
             cov(time,_cons) | -.1515308H    .041814     -.2334848   -.0695768
-----------------------------+------------------------------------------------
               var(Residual) |  .5416011I   .0242353      .4961241    .5912467
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(3) =  1344.80   Prob > chi2 = 0.0000
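The random-effects table is labeled with time rather than t1, and its intercept variance and covariance differ noticeably from the HLM Tau matrix. One possible explanation, offered here as our assumption rather than something the output states, is that the random part was estimated with time coded 1 to 4, which places the random intercept one unit before the first occasion. The Python sketch below shows how moving the zero point of time changes the intercept variance and the intercept-slope covariance while leaving the slope variance untouched; the function name is hypothetical.

```python
def shift_time_origin(var_int, var_slope, cov, shift):
    """Variance components after recoding time as (time + shift).

    Each person's new intercept equals (old intercept) - shift * (slope),
    and the slope itself is unchanged.
    """
    new_var_int = var_int - 2 * shift * cov + shift ** 2 * var_slope
    new_cov = cov - shift * var_slope
    return new_var_int, var_slope, new_cov


# Tau from the HLM output, where time is coded 0, 1, 2, 3.
var_u0, var_u2, cov_u0_u2 = 1.08757, 0.20495, 0.05079

# Assumption: time coded 1..4 instead of 0..3, that is, a shift of one unit.
new_var_int, new_var_slope, new_cov = shift_time_origin(var_u0, var_u2, cov_u0_u2, 1)
print(round(new_var_int, 3), round(new_var_slope, 3), round(new_cov, 3))
```

Under that assumption the shifted HLM values land close to Stata's var(_cons) of 1.179 and cov(time,_cons) of -0.152, which suggests that footnotes F and H in the Stata table may describe an intercept located at time = 0 on the 1 to 4 coding rather than at the first occasion.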
Mplus example #1
Here is the same example analyzed as a latent growth curve model using Mplus, based on the ex6.10.dat data file. We should reiterate that the multilevel model is similar to the LGCM but not identical, so the results are analogous rather than identical. We use the comparison as a means of helping you understand a technique and output that might be new to you.
TITLE:      this is an example of a linear growth
            model for a continuous outcome with
            time-invariant and time-varying covariates
DATA:       FILE IS ex6.10.dat;
VARIABLE:   NAMES ARE y11-y14 x1 x2 a31-a34;
MODEL:      i s | y11@0 y12@1 y13@2 y14@3;
            i s ON x1 x2;
            y11 ON a31 ;
            y12 ON a32 ;
            y13 ON a33 ;
            y14 ON a34 ;

SUMMARY OF ANALYSIS

Number of observations                                         500

TESTS OF MODEL FIT

Chi-Square Test of Model Fit
          Value                             25.786
          Degrees of Freedom                    21
          P-Value                           0.2147

Chi-Square Test of Model Fit for the Baseline Model
          Value                           2862.582
          Degrees of Freedom                    30
          P-Value                           0.0000

CFI/TLI
          CFI                                0.998
          TLI                                0.998

Loglikelihood
          H0 Value                       -7255.873
          H1 Value                       -7242.980

Information Criteria
          Number of Free Parameters             17
          Akaike (AIC)                   14545.745
          Bayesian (BIC)                 14617.393
          Sample-Size Adjusted BIC       14563.434
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
          Estimate                           0.021
          90 Percent C.I.                    0.000  0.046
          Probability RMSEA <= .05           0.978

SRMR (Standardized Root Mean Square Residual)
          Value                              0.014

MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

 I        |
    Y11                1.000    0.000      0.000
    Y12                1.000    0.000      0.000
    Y13                1.000    0.000      0.000
    Y14                1.000    0.000      0.000

 S        |
    Y11                0.000    0.000      0.000
    Y12                1.000    0.000      0.000
    Y13                2.000    0.000      0.000
    Y14                3.000    0.000      0.000

 I        ON
    X1                 0.557B   0.054     10.286
    X2                 0.718B   0.055     12.953

 S        ON
    X1                 0.264D   0.025     10.549
    X2                 0.473D   0.026     18.438

 Y11      ON
    A31                0.190E   0.044      4.302

 Y12      ON
    A32                0.323E   0.038      8.433

 Y13      ON
    A33                0.344E   0.038      9.016

 Y14      ON
    A34                0.303E   0.050      6.004

 S        WITH
    I                  0.055H   0.035      1.588

 Intercepts
    Y11                0.000    0.000      0.000
    Y12                0.000    0.000      0.000
    Y13                0.000    0.000      0.000
    Y14                0.000    0.000      0.000
    I                  0.570A   0.054     10.477
    S                  1.010C   0.025     40.112

 Residual Variances
    Y11                0.509I   0.068      7.512
    Y12                0.597I   0.048     12.348
    Y13                0.481I   0.049      9.858
    Y14                0.579I   0.088      6.607
    I                  1.074F   0.098     10.922
    S                  0.201G   0.022      9.092
- A. This is analogous to G00 in the multilevel model. It is the predicted value of y when time, a, x1, and x2 are all 0.
- B. These are analogous to G01 and G02 in the multilevel model. They are the predicted increases in the intercept for a one unit increase in x1 and x2, respectively.
- C. This is analogous to G20 in the multilevel model. It is the slope for time when x1 and x2 are held constant at 0.
- D. These are analogous to G21 and G22 in the multilevel model. They are the predicted increases in the time slope for a one unit increase in x1 and x2, respectively.
- E. These are the four slopes representing the regression of y11 on a31, y12 on a32, y13 on a33, and y14 on a34. Note that in the multilevel model there is only one such slope (G10), whereas in this model there is a separate coefficient for each time point.
- F. This is the variance of the intercept, analogous to the variance component for the intercept (the variance of U0) in the multilevel model.
- G. This is the variance of the slope for time, analogous to the variance component for the time slope (the variance of U2) in the multilevel model.
- H. This is the covariance of the intercept and slope, analogous to the covariance of B0 and B2 (that is, of U0 and U2) from the multilevel model.
- I. This is the residual variance for each time point. Note that in the LGCM there is a separate residual variance at each time point. These are analogous to the variance of R (Sigma_squared) from the multilevel model, where there is a single residual variance.
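The information criteria in the Mplus output can be recomputed from the H0 loglikelihood, the number of free parameters, and the sample size. The Python sketch below does so using the standard AIC and BIC formulas together with the n* = (n + 2) / 24 adjustment that Mplus prints; the function name is ours and is only illustrative.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return (AIC, BIC, sample-size adjusted BIC) for a fitted model."""
    deviance = -2 * loglik
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n_obs)
    n_star = (n_obs + 2) / 24                      # adjustment shown in the output
    adjusted_bic = deviance + n_params * math.log(n_star)
    return aic, bic, adjusted_bic


# Values from Mplus example #1: H0 loglikelihood, free parameters, observations.
aic, bic, adjusted_bic = information_criteria(-7255.873, 17, 500)
print(round(aic, 3), round(bic, 3), round(adjusted_bic, 3))
```

The results match the printed values to within about 0.001, a gap that comes from the loglikelihood being reported to only three decimals.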
Mplus example #2
Here is a second example, a variation that uses constraints to make the assumptions more similar to those of the multilevel model: the four slopes of y on a are constrained to be equal, and so are the four residual variances. As before, the multilevel model and the LGCM are similar but not identical, so the results are analogous rather than identical.
TITLE:      this is an example of a linear growth
            model for a continuous outcome with
            time-invariant and time-varying covariates
DATA:       FILE IS ex6.10.dat;
VARIABLE:   NAMES ARE y11-y14 x1 x2 a31-a34;
MODEL:      i s | y11@0 y12@1 y13@2 y14@3;
            i s ON x1 x2;
            y11 ON a31 (1);
            y12 ON a32 (1);
            y13 ON a33 (1);
            y14 ON a34 (1);
            y11 y12 y13 y14 (2);

SUMMARY OF ANALYSIS

Number of observations                                         500

TESTS OF MODEL FIT

Loglikelihood
          H0 Value                       -7261.105
          H1 Value                       -7242.980

Information Criteria
          Number of Free Parameters             11
          Akaike (AIC)                   14544.210
          Bayesian (BIC)                 14590.571
          Sample-Size Adjusted BIC       14555.656
            (n* = (n + 2) / 24)

MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

 I        |
    Y11                1.000    0.000      0.000
    Y12                1.000    0.000      0.000
    Y13                1.000    0.000      0.000
    Y14                1.000    0.000      0.000

 S        |
    Y11                0.000    0.000      0.000
    Y12                1.000    0.000      0.000
    Y13                2.000    0.000      0.000
    Y14                3.000    0.000      0.000

 I        ON
    X1                 0.561B   0.054     10.303
    X2                 0.717B   0.056     12.867

 S        ON
    X1                 0.263D   0.025     10.462
    X2                 0.473D   0.026     18.395

 Y11      ON
    A31                0.297E   0.021     13.894

 Y12      ON
    A32                0.297E   0.021     13.894

 Y13      ON
    A33                0.297E   0.021     13.894

 Y14      ON
    A34                0.297E   0.021     13.894

 S        WITH
    I                  0.052H   0.031      1.641

 Intercepts
    Y11                0.000    0.000      0.000
    Y12                0.000    0.000      0.000
    Y13                0.000    0.000      0.000
    Y14                0.000    0.000      0.000
    I                  0.570A   0.055     10.440
    S                  1.010C   0.025     40.008

 Residual Variances
    Y11                0.542I   0.024     22.361
    Y12                0.542I   0.024     22.361
    Y13                0.542I   0.024     22.361
    Y14                0.542I   0.024     22.361
    I                  1.079F   0.094     11.506
    S                  0.203G   0.020     10.012
See the footnotes above for descriptions of the results. Exceptions are noted below.
- E. Note how the coefficients predicting y from a are all the same. Now they are closer to the multilevel model because the LGCM has been constrained to be more similar to the multilevel model.
- I. Note how the residual variances are now the same at every time point. They are closer to (but not the same as) the single residual variance from the multilevel model.
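Because the second Mplus model is the first model with equality constraints added, the two H0 loglikelihoods can be compared with a likelihood-ratio test. This comparison is not part of the original example; the Python sketch below assumes the usual chi-square reference distribution for nested maximum likelihood models, and the helper function is a hypothetical convenience that only handles even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# H0 loglikelihoods and free-parameter counts from the two Mplus outputs.
loglik_free, params_free = -7255.873, 17                 # example #1
loglik_constrained, params_constrained = -7261.105, 11   # example #2

lr_statistic = 2 * (loglik_free - loglik_constrained)
lr_df = params_free - params_constrained
p_value = chi2_sf_even_df(lr_statistic, lr_df)
print(round(lr_statistic, 3), lr_df, round(p_value, 3))
```

The six degrees of freedom correspond to the three slopes and three residual variances that the constraints remove. Under the stated assumption the p-value comes out a little above 0.10, so the constraints do not significantly worsen the fit at the .05 level, which is consistent with the constrained model also having the smaller AIC and BIC.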