This seminar is a continuation of our introduction to Mplus seminar. We will review the basics of Mplus syntax and show some examples for simple analyses, such as regression models for continuous and binary variables. Then we’ll move on to more advanced models, such as factor analysis, path analysis, growth curve models and latent class models. Some of the examples will be demonstrated by running Mplus in real time. The data files and the input files are zipped for an easy download and can be accessed by following the link.

#### Introduction

“We started to develop Mplus eleven years ago with the goal of providing applied researchers with powerful new statistical modeling techniques. We saw a wide gap between new statistical methods presented in the statistical literature and the statistical methods used by researchers in applied papers. Our goal was to help bridge this gap with easy-to-use but powerful software.” — February 2006, Preface to the Mplus User’s Guide.

Mplus has been very successful in achieving this goal and has been improving constantly since it was first released in 1998. Its general framework of continuous and categorical latent variables gives us a new way to formulate statistical models. For example, we can perform not only growth curve analysis but also latent class growth analysis, and not only discrete-time survival analysis but also discrete-time survival mixture analysis. This range of modeling possibilities makes Mplus a very attractive piece of software. It also offers several options for dealing with missing data, including maximum likelihood estimation and estimation based on multiply imputed data sets.

Over the years, we have recommended to our clients the “get in and get out” approach with Mplus (and some other statistical packages) and it seems to us that this approach has worked well. This approach consists of a few steps: deciding the appropriate models for the study; deciding if switching to Mplus is necessary; preparing the data structure for Mplus using a familiar software package; and moving to Mplus and performing the analyses.

Our goal for this seminar is to ease the transition to Mplus. We will discuss the overall structure and syntax of Mplus input files. We will also discuss the usage of the Mplus 4 User’s Guide and the online resources for Mplus. Starting with some basic models, we will then move on to some more advanced models.

#### Overall structure of Mplus input file

An input file defines the data set to use and the model to run. It is similar to a SAS program file, an SPSS syntax file and a Stata .do file. Below is an example of an input file. It is here to show the general structure of an input file. We are not going to explain what analysis it does.

    Data:
      File is d:workdatarawtable3_4.dat ;
    Variable:
      names are a b c d freq;
      missing are all (-9999) ;
      usevariables are a b c d;
      weight is freq ; !default is frequency weight
      categorical are a b c d;
      classes = cl(2);
    Analysis:
      Type = mixture ;
      starts = 0;
    Model:
      %overall%
      [a$1*10 b$1*10 c$1*10 d$1*10] (1);
      %cl#1%
      [a$1*-10 b$1*-10 c$1*-10 d$1*-10] (2);
    plot:
      type= plot3;
      series is a(1) b(2) c(3) d(4);

Here are some characteristics of an input file:

- an input line can not exceed 80 characters in width;
- variable names can not exceed 8 characters in length;
- only one model per input file;
- only one output file per input file;
- comments start with “!”;
- the default of the analysis type is **type = general**;
- the keywords **categorical** and **count** are for outcome variables only;
- new variables can be defined using the **define** command.
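The two length limits in the list above are easy to check before handing a file to Mplus. The following is a minimal sketch in Python, not part of Mplus itself; the function names `long_lines` and `long_names` are hypothetical, and the limits are simply the ones stated above.

```python
MAX_LINE_WIDTH = 80   # input line limit stated in the list above
MAX_NAME_LENGTH = 8   # variable name limit stated in the list above


def long_lines(input_text):
    """Return the 1-based numbers of lines wider than MAX_LINE_WIDTH."""
    return [number
            for number, line in enumerate(input_text.splitlines(), start=1)
            if len(line) > MAX_LINE_WIDTH]


def long_names(names):
    """Return the variable names longer than MAX_NAME_LENGTH characters."""
    return [name for name in names if len(name) > MAX_NAME_LENGTH]


# Hypothetical input text: the second line is 81 characters wide.
sample_input = "Data: File is hsb2.dat ;\n" + "x" * 81
flagged_lines = long_lines(sample_input)
flagged_names = long_names(["female", "socioeconomic"])
```

Running such a check in the package used for data preparation fits the “get in and get out” approach: problems are caught before switching to Mplus.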

Here are some characteristics of a data file:

- must be in ASCII format;
- can be in fixed format or delimited;
- can be raw data or correlation data;
- no variable names in the first line;
- only numeric variables are allowed;
- use stata2mplus to convert a Stata data file to an ASCII data file and an Mplus input file.
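For readers who prepare data outside Stata, the requirements above (numeric values only, no header line, a numeric missing-value code) can be sketched in Python as follows. This is only an illustration under stated assumptions: the records are made-up toy values, the helper names are hypothetical, and the code -9999 is chosen to match the `Missing are all (-9999)` statements used in this seminar.

```python
import csv
import io

MISSING_CODE = -9999  # matches "Missing are all (-9999)" in the input files


def to_mplus_rows(records, columns):
    """Convert dict records to numeric rows, replacing None by MISSING_CODE."""
    rows = []
    for record in records:
        row = []
        for column in columns:
            value = record.get(column)
            if value is None:
                row.append(MISSING_CODE)
            elif isinstance(value, (int, float)):
                row.append(value)
            else:
                # Only numeric variables are allowed in the data file.
                raise ValueError(f"non-numeric value in {column!r}: {value!r}")
        rows.append(row)
    return rows


def write_mplus_data(records, columns, handle):
    """Write comma-delimited rows with no variable names in the first line."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerows(to_mplus_rows(records, columns))


# Made-up toy records, for illustration only.
records = [
    {"id": 1, "female": 0, "read": 57},
    {"id": 2, "female": 1, "read": None},
]
buffer = io.StringIO()
write_mplus_data(records, ["id", "female", "read"], buffer)
text = buffer.getvalue()
```

The column order passed to `write_mplus_data` has to be repeated in the `names are` statement of the input file, since the data file itself carries no names.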

#### Overall review of Mplus syntax for the model command

Mplus has made a great effort to make the syntax as simple as possible. Since there are so many analyses that Mplus can perform, the **model** command can still get really involved. We have compiled a short list here of commonly used keywords.

- “on” for regression (regress response variable “on” predictor variables);
- “by” for factors (measured “by” observed variables);
- “with” for covariance (correlated “with”);
- “[x]” for means or intercepts;
- “x” alone means the variance of x;
- “*” for starting values;
- “@” for fixed values;
- “|” for random effects;
- use (_number_) to constrain a set of parameters to be equal.

#### Use of User’s Guide and online resources

The Mplus User’s Guide is an excellent reference both for Mplus syntax and for the types of models possible in Mplus. It has the flavor of learning by doing. Its organization is very different from that of other user guides, such as those for Stata, SAS or SPSS. Examples for basic models can be found in the first chapter, and more advanced models are divided into later chapters. The section on syntax is near the end. A very important feature is that almost all of the examples in the Guide are included with the software itself. If one sees an interesting example, one can always run the model to see the output and modify the example to suit one’s own modeling needs. An equally important feature is that each example in the book has a Monte Carlo simulation counterpart. In fact, Monte Carlo simulation was used to generate most of the data sets used in the User’s Guide. The help system of Mplus has **A SUMMARY OF THE Mplus LANGUAGE** for quick reference.

The Mplus website has tremendous resources, including a very active discussion group on many topics for serious modelers and many examples that one can download. One can get access to the entire User’s Guide in PDF format from the Mplus website and search it for examples and commands. It is a great place to learn new modeling possibilities and to learn the Mplus language as well.

**Post estimation**

Mplus has three commands for post estimation: the **output** command, the **savedata** command and the **plot** command. The **output** command is used for requesting types of output to be included in the output file. For example, we can request sample statistics to be displayed by using the option **sampstat** in the **output** command. The **savedata** command is used for creating an ASCII data file for further data analysis. The **plot** command is needed for requesting plots. Mplus offers many model-related plots, and the controls over the plots are easy to use.

#### Simple examples

We will review how some simple models are done in Mplus. We will start with linear regression and then discuss models with binary outcomes.

**Example 1. Where is the output for intercept? (linear regression)**

The code below is for a simple linear regression with the dependent variable **write** regressed on the predictor variables **female** and **read**. So we use the keyword **on** in the **model** statement.

    Data:
      File is hsb2.dat ;
    Variable:
      Names are
        id female race ses schtyp prog read write math science socst;
      Missing are all (-9999) ;
      usevariables are female write read;
    Model:
      write on female read;

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

     WRITE    ON
        FEMALE             5.487    1.007      5.451
        READ               0.566    0.049     11.546

     Residual Variances
        WRITE             50.113    5.011     10.000

Notice that something is missing in the output: the intercept. What does this mean? By default, Mplus analyzes only the covariance structure of the variables, so means and intercepts are not part of the model. To understand what it is doing, let’s perform this analysis manually from the covariance matrix. We create the covariance matrix for the variables **write**, **female** and **read**, and use this covariance matrix as the input for our analysis.

    Title: example of using covariance matrix.
           input data is an matrix:
           89.8436
           1.21369  .249221
           57.9967  -.271709  105.123;
    Data:
      file is cov.dat;
      type is covariance;
      nobservations = 200;
    Variable:
      names are write female read;
    Model:
      write on female read;

                       Estimates     S.E.  Est./S.E.

     WRITE    ON
        FEMALE             5.487    1.007      5.451
        READ               0.566    0.049     11.546

     Residual Variances
        WRITE             50.113    5.011     10.000
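To see why the covariance matrix alone is enough for the slopes, the same estimates can be worked out by hand. The sketch below, in plain Python rather than Mplus, solves the two normal equations from the matrix shown in the input above. The last step is an assumption on our part: it supposes that the matrix was computed with the usual n - 1 divisor and that the maximum likelihood residual variance uses the divisor n, which appears consistent with the printed output.

```python
# Covariance matrix from the input above (order: write, female, read).
var_write = 89.8436
cov_write_female, var_female = 1.21369, 0.249221
cov_write_read, cov_female_read, var_read = 57.9967, -0.271709, 105.123
n = 200  # nobservations in the input above

# Solve the 2x2 system  Sxx * b = Sxy  with Cramer's rule.
det = var_female * var_read - cov_female_read ** 2
b_female = (var_read * cov_write_female
            - cov_female_read * cov_write_read) / det
b_read = (var_female * cov_write_read
          - cov_female_read * cov_write_female) / det

# Residual variance: total variance minus the part explained by predictors.
explained = b_female * cov_write_female + b_read * cov_write_read
resid_var_unbiased = var_write - explained       # divisor n - 1 (assumed)
resid_var_ml = resid_var_unbiased * (n - 1) / n  # divisor n (assumed ML scale)

print(round(b_female, 3), round(b_read, 3), round(resid_var_ml, 3))
```

No means enter this calculation anywhere, which is why the intercept cannot be recovered from the covariance matrix alone.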

That shows that the analysis we did at the beginning of this example is just an analysis of the covariance structure. In order to estimate the intercept, which is the expected value of the outcome when all predictor variables are zero, we need to tell Mplus that we are also interested in the analysis of means. This can be done easily by adding **type = meanstructure** to the **analysis** command. Every model has an **analysis** command associated with it. In this example, we don’t see the **analysis** command because we are using the default setting. The default setting is **analysis: type = general**. Models that can be estimated using **type = general** include regression analysis, path analysis, confirmatory factor analysis, structural equation modeling and growth curve modeling. Within any specific analysis setting, we can add more options, such as **type = missing** when the data set has missing values and we don’t want to do listwise deletion. Or we can add **type = meanstructure** to have the mean or intercept displayed in the output, as we are going to do here.

    Data:
      File is hsb2.dat ;
    Variable:
      Names are
        id female race ses schtyp prog read write math science socst;
      Missing are all (-9999) ;
      usevariables are female write read;
    Analysis:
      type=meanstructure;
    Model:
      write on female read;

                       Estimates     S.E.  Est./S.E.

     WRITE    ON
        FEMALE             5.487    1.007      5.451
        READ               0.566    0.049     11.546

     Intercepts
        WRITE             20.228    2.693      7.511

     Residual Variances
        WRITE             50.113    5.011     10.000

**Example 2. Is it a probit or a logit regression? (binary outcome)**

Now let’s switch to binary outcomes. Using the same data set as in the previous example, we create a new dichotomous variable called **hon** based on the variable **write**. We also declare that the new variable **hon** is a categorical variable. As we have mentioned before, the keyword **categorical** is for outcome variables only. If we have categorical variables as predictors, we have to make sure that dummy variables have been created for them (usually in another software package before the data are moved into Mplus).

    Data:
      File is hsb2.dat ;
    Variable:
      Names are
        id female race ses schtyp prog read write math science socst;
      Missing are all (-9999) ;
      usevariables are female math read hon;
      categorical is hon;
    define:
      hon = (write>60);
    Model:
      hon on female math read;

    Observed dependent variables

      Binary and ordered categorical (ordinal)
       HON

    Observed independent variables
       FEMALE      MATH        READ

    Estimator                                                    WLSMV
    Maximum number of iterations                                  1000
    Convergence criterion                                    0.500D-04
    Maximum number of steepest descent iterations                   20
    Parameterization                                             DELTA

    Input data file(s)
      hsb2.dat

    Input data format  FREE

    SUMMARY OF CATEGORICAL DATA PROPORTIONS

        HON
          Category 1    0.755
          Category 2    0.245

    (output omitted...)

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

     HON      ON
        FEMALE             0.574    0.246      2.335
        MATH               0.069    0.016      4.324
        READ               0.038    0.017      2.275

    R-SQUARE

        Observed   Residual
        Variable   Variance   R-Square

        HON           1.000      0.489

Now, is this a probit model or a logit model? Mplus is not very explicit about it. By default, it is a probit model. Even if we don’t know the default, we can still tell that this is a probit model, since the output has an R-square section with a residual variance of 1. This is what probit models assume: the residual follows a standard normal distribution, so its variance is 1. Now did we miss something again? Yes. We don’t see the intercept. This is exactly the same situation as we had with the linear regression. Adding **type = meanstructure** will give us the intercept, which Mplus reports as a “threshold” (the threshold has the opposite sign of the usual intercept).

    Data:
      File is hsb2.dat ;
    Variable:
      Names are
        id female race ses schtyp prog read write math science socst;
      Missing are all (-9999) ;
      Usevariables are female math read hon;
      Categorical is hon;
    Define:
      hon = (write>60);
    Analysis:
      type=meanstructure;
    Model:
      hon on female math read;

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

     HON      ON
        FEMALE             0.574    0.246      2.335
        MATH               0.069    0.016      4.324
        READ               0.038    0.017      2.275

     Thresholds
        HON$1              6.887    1.063      6.482

    R-SQUARE

        Observed   Residual
        Variable   Variance   R-Square

        HON           1.000      0.489
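To make the role of the threshold concrete, the sketch below turns the probit estimates above into a predicted probability. It assumes the usual probit form, P(hon = 1) = Φ(xb − threshold), with a residual variance of 1 as reported in the R-square section. The student profile is hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Estimates copied from the probit output above.
coef = {"female": 0.574, "math": 0.069, "read": 0.038}
threshold = 6.887


def probit_probability(x):
    """P(hon = 1) under the assumed form: Phi(linear predictor - threshold)."""
    linear = sum(coef[name] * x[name] for name in coef)
    return normal_cdf(linear - threshold)


# Hypothetical student: female, with math and reading scores of 60.
student = {"female": 1, "math": 60, "read": 60}
p_hon = probit_probability(student)
print(round(p_hon, 2))
```

Because the threshold is subtracted, a larger threshold means a lower probability of being in the upper category, which is the opposite of how an intercept behaves.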

What about a logistic regression with the same data? To do a logistic regression, we will change the estimation method from the default method of WLSMV to ML.

    Data:
      File is hsb2.dat ;
    Variable:
      Names are
        id female race ses schtyp prog read write math science socst;
      Missing are all (-9999) ;
      Usevariables are female math read hon;
      Categorical is hon;
    Define:
      hon = (write>60);
    Analysis:
      estimator = ml;
    Model:
      hon on female math read;

    Estimator                                                       ML
    (output omitted...)
    Link                                                         LOGIT
    Cholesky                                                       OFF

    Input data file(s)
      hsb2.dat
    Input data format  FREE

    SUMMARY OF CATEGORICAL DATA PROPORTIONS

        HON
          Category 1    0.755
          Category 2    0.245

    THE MODEL ESTIMATION TERMINATED NORMALLY

    TESTS OF MODEL FIT

    Loglikelihood

              H0 Value                         -78.085

    Information Criteria

              Number of Free Parameters              4
              Akaike (AIC)                     164.170
              Bayesian (BIC)                   177.363
              Sample-Size Adjusted BIC         164.690
                (n* = (n + 2) / 24)

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

     HON      ON
        FEMALE             0.980    0.422      2.324
        MATH               0.123    0.031      3.931
        READ               0.059    0.027      2.224

     Thresholds
        HON$1             11.770    1.711      6.880

    LOGISTIC REGRESSION ODDS RATIO RESULTS

     HON      ON
        FEMALE             2.664
        MATH               1.131
        READ               1.061
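The odds ratios and information criteria in this output follow directly from the coefficients and the loglikelihood, and recomputing them is a quick way to read the output. The sketch below uses only the numbers printed above. The sample size n = 200 is an assumption taken from the `nobservations = 200` statement used for the same data set in the linear regression example.

```python
import math

# Logit coefficients from the MODEL RESULTS above.
coef = {"female": 0.980, "math": 0.123, "read": 0.059}

# An odds ratio is the exponentiated logit coefficient.
odds_ratios = {name: math.exp(b) for name, b in coef.items()}

# Information criteria from the loglikelihood and the parameter count.
loglik = -78.085
k = 4      # number of free parameters
n = 200    # assumed sample size (see the lead-in)

aic = -2 * loglik + 2 * k
bic = -2 * loglik + k * math.log(n)
sabic = -2 * loglik + k * math.log((n + 2) / 24)  # n* = (n + 2) / 24

for name, value in odds_ratios.items():
    print(name, round(value, 3))
print(round(aic, 3), round(bic, 3), round(sabic, 3))
```

The values agree with the printed output up to rounding of the coefficients, which supports reading the ML results as an ordinary logistic regression.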

#### Advanced examples

**Example 1. Exploratory factor analysis**

Exploratory factor analysis has often been used to explore the variable structures. But most statistical software lacks the sophisticated techniques to deal with the missing value issue or binary variables. On the other hand, Mplus allows us to take care of both issues. Let’s start with a simple exploratory factor analysis. This example is taken from our Annotated SPSS Output Factor Analysis page. The data set has many variables, and we are only going to use item13 – item24, as they are all about instructors.

    Data:
      File is factor.dat ;
    Variable:
      Names are
        facsex facethn facnat facrank employm salary yrsteach yrsut degree
        sample remind nstud studrank studsex grade gpa satisfy religion psd
        item13 item14 item15 item16 item17 item18 item19 item20 item21 item22
        item23 item24 item25 item26 item27 item28 item29 item30 item31 item32
        item33 item34 item35 item36 item37 item38 item39 item40 item41 item42
        item43 item44 item45 item46 item47 item48 item49 item50 item51 item52
        race sexism racism rpolicy casteman competen sensitiv cstatus;
      Missing are all (-9999) ;
      Usevariables are item13 - item24;
    Analysis:
      estimator = ml;
      Type = efa 1 3 ;

INPUT READING TERMINATED NORMALLY

SUMMARY OF ANALYSIS

Number of groups 1 Number of observations 1428

Number of dependent variables 12 Number of independent variables 0 Number of continuous latent variables 0

Observed dependent variables

Continuous ITEM13 ITEM14 ITEM15 ITEM16 ITEM17 ITEM18 ITEM19 ITEM20 ITEM21 ITEM22 ITEM23 ITEM24

Estimator ML Information matrix EXPECTED Maximum number of iterations 1000 Convergence criterion 0.500D-04 Maximum number of steepest descent iterations 20

Input data file(s) factor.dat

Input data format FREE

RESULTS FOR EXPLORATORY FACTOR ANALYSIS

EIGENVALUES FOR SAMPLE CORRELATION MATRIX 1 2 3 4 5 ________ ________ ________ ________ ________ 1 6.073 1.223 0.735 0.648 0.572

EIGENVALUES FOR SAMPLE CORRELATION MATRIX 6 7 8 9 10 ________ ________ ________ ________ ________ 1 0.539 0.485 0.429 0.383 0.334

EIGENVALUES FOR SAMPLE CORRELATION MATRIX 11 12 ________ ________ 1 0.311 0.267

(output omitted...)

EXPLORATORY ANALYSIS WITH 3 FACTOR(S) :

CHI-SQUARE VALUE 147.541 DEGREES OF FREEDOM 33 PROBABILITY VALUE 0.0000

RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) : ESTIMATE (90 PERCENT C.I.) IS 0.049 ( 0.041 0.058) PROBABILITY RMSEA LE 0.05 IS 0.540

ROOT MEAN SQUARE RESIDUAL IS 0.0175

    VARIMAX ROTATED LOADINGS
                      1         2         3
               ________  ________  ________
    ITEM13        0.744     0.158     0.236
    ITEM14        0.753     0.197     0.213
    ITEM15        0.650     0.303     0.258
    ITEM16        0.581     0.292     0.177
    ITEM17        0.532     0.468     0.300
    ITEM18        0.277     0.731     0.240
    ITEM19        0.158     0.745     0.130
    ITEM20        0.243     0.470     0.187
    ITEM21        0.350     0.504     0.383
    ITEM22        0.189     0.531     0.319
    ITEM23        0.409     0.365     0.724
    ITEM24        0.321     0.309     0.604

    PROMAX ROTATED LOADINGS
                      1         2         3
               ________  ________  ________
    ITEM13        0.820    -0.098     0.050
    ITEM14        0.828    -0.037     0.001
    ITEM15        0.645     0.110     0.063
    ITEM16        0.591     0.152    -0.029
    ITEM17        0.424     0.342     0.105
    ITEM18        0.035     0.790     0.009
    ITEM19       -0.079     0.890    -0.116
    ITEM20        0.093     0.475     0.032
    ITEM21        0.144     0.402     0.268
    ITEM22       -0.048     0.510     0.218
    ITEM23        0.128     0.044     0.786
    ITEM24        0.079     0.048     0.662

    PROMAX FACTOR CORRELATIONS
                      1         2         3
               ________  ________  ________
    1             1.000
    2             0.611     1.000
    3             0.658     0.685     1.000

ESTIMATED RESIDUAL VARIANCES ITEM13 ITEM14 ITEM15 ITEM16 ITEM17 ________ ________ ________ ________ ________ 1 0.367 0.349 0.418 0.545 0.408

ESTIMATED RESIDUAL VARIANCES ITEM18 ITEM19 ITEM20 ITEM21 ITEM22 ________ ________ ________ ________ ________ 1 0.331 0.404 0.685 0.477 0.581

ESTIMATED RESIDUAL VARIANCES ITEM23 ITEM24 ________ ________ 1 0.176 0.436
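A few simple checks tie the pieces of this output together. The sketch below, in plain Python, uses only numbers printed above: the eigenvalues of a correlation matrix should sum to the number of items, the degrees of freedom of an EFA with p items and m factors are ((p − m)² − (p + m)) / 2, and for an orthogonal (varimax) solution the residual variance of an item is one minus its sum of squared loadings. The "eigenvalue greater than one" count is a common rule of thumb that we bring in for illustration; it is not something Mplus reports.

```python
eigenvalues = [6.073, 1.223, 0.735, 0.648, 0.572, 0.539,
               0.485, 0.429, 0.383, 0.334, 0.311, 0.267]
p, m = 12, 3  # number of items and number of factors in the last solution

# Eigenvalues of a correlation matrix sum to the number of variables.
total_variance = sum(eigenvalues)
share_first = eigenvalues[0] / total_variance

# Rule of thumb (not part of the Mplus output): eigenvalues above 1.
n_above_one = sum(1 for value in eigenvalues if value > 1.0)

# Degrees of freedom of the chi-square test for an m-factor EFA.
efa_df = ((p - m) ** 2 - (p + m)) // 2

# Communality and residual variance of ITEM13 from its varimax loadings.
item13_varimax = [0.744, 0.158, 0.236]
communality = sum(loading ** 2 for loading in item13_varimax)
uniqueness = 1.0 - communality

print(round(total_variance, 3), n_above_one, efa_df, round(uniqueness, 3))
```

The degrees of freedom match the 33 shown for the three-factor solution, and the residual variance for ITEM13 agrees with the printed 0.367 up to rounding of the loadings.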

**Example 2. Exploratory factor analysis with binary variables**

For the purpose of illustration, we dichotomized the variables item13-item24 from the previous example. We will do the same exploratory factor analysis again, but with the binary variables. Factor analysis with binary variables uses the tetrachoric correlation structure. It requires a much larger sample size than factor analysis with continuous variables.

    Data:
      File is cat_factor.dat ;
    Variable:
      Names are
        item13 item14 item15 item16 item17 item18 item19 item20 item21 item22
        item23 item24 cat_13 - cat_24;
      Missing are all (-9999) ;
      usevariables are cat_13 - cat_24;
      categorical are cat_13 - cat_24;
    Analysis:
      Type = efa 1 3 ;

SUMMARY OF ANALYSIS

Number of groups 1 Number of observations 1428

Number of dependent variables 12 Number of independent variables 0 Number of continuous latent variables 0

Observed dependent variables

Binary and ordered categorical (ordinal) CAT_13 CAT_14 CAT_15 CAT_16 CAT_17 CAT_18 CAT_19 CAT_20 CAT_21 CAT_22 CAT_23 CAT_24

Estimator ULS Maximum number of iterations 1000 Convergence criterion 0.500D-04 Maximum number of steepest descent iterations 20

(output omitted...) RESULTS FOR EXPLORATORY FACTOR ANALYSIS

EIGENVALUES FOR SAMPLE CORRELATION MATRIX 1 2 3 4 5 ________ ________ ________ ________ ________ 1 7.208 1.280 0.768 0.622 0.451

EIGENVALUES FOR SAMPLE CORRELATION MATRIX 6 7 8 9 10 ________ ________ ________ ________ ________ 1 0.424 0.374 0.259 0.180 0.174

EIGENVALUES FOR SAMPLE CORRELATION MATRIX 11 12 ________ ________ 1 0.157 0.104

(output omitted...)

EXPLORATORY ANALYSIS WITH 3 FACTOR(S) :

ROOT MEAN SQUARE RESIDUAL IS 0.0199

    VARIMAX ROTATED LOADINGS
                      1         2         3
               ________  ________  ________
    CAT_13        0.813     0.260     0.297
    CAT_14        0.806     0.228     0.306
    CAT_15        0.824     0.293     0.262
    CAT_16        0.758     0.307     0.141
    CAT_17        0.724     0.502     0.226
    CAT_18        0.254     0.794     0.241
    CAT_19        0.363     0.728     0.154
    CAT_20        0.223     0.484     0.153
    CAT_21        0.271     0.574     0.414
    CAT_22        0.177     0.592     0.320
    CAT_23        0.412     0.413     0.812
    CAT_24        0.337     0.368     0.613

    PROMAX ROTATED LOADINGS
                      1         2         3
               ________  ________  ________
    CAT_13        0.832    -0.005     0.120
    CAT_14        0.832    -0.049     0.144
    CAT_15        0.844     0.050     0.062
    CAT_16        0.791     0.136    -0.087
    CAT_17        0.657     0.369    -0.028
    CAT_18       -0.032     0.882     0.008
    CAT_19        0.151     0.799    -0.112
    CAT_20        0.064     0.516    -0.003
    CAT_21        0.017     0.517     0.300
    CAT_22       -0.079     0.604     0.193
    CAT_23        0.137     0.102     0.842
    CAT_24        0.115     0.144     0.612

    PROMAX FACTOR CORRELATIONS
                      1         2         3
               ________  ________  ________
    1             1.000
    2             0.606     1.000
    3             0.574     0.645     1.000

ESTIMATED RESIDUAL VARIANCES CAT_13 CAT_14 CAT_15 CAT_16 CAT_17 ________ ________ ________ ________ ________ 1 0.183 0.205 0.166 0.312 0.172

ESTIMATED RESIDUAL VARIANCES CAT_18 CAT_19 CAT_20 CAT_21 CAT_22 ________ ________ ________ ________ ________ 1 0.247 0.315 0.693 0.426 0.516

ESTIMATED RESIDUAL VARIANCES CAT_23 CAT_24 ________ ________ 1 0.001 0.376

**Example 3. Exploratory factor analysis on continuous outcome variables with missing data**

For the purpose of illustration again, we have created another version of the data set. This data set is based on the data set in **Example 1** in the section of Advanced Examples. We have created a lot of missing values, and the pattern of missingness is completely random. For the same analysis, we will add the **type = missing** option to tell Mplus that the analysis will be done without deleting any cases. In general, Mplus offers ML estimation under the assumption of MCAR and MAR. From the output labeled “PROPORTION OF DATA PRESENT”, we can see that many variables have a good amount of missing data.

    Data:
      File is factor_missing.dat ;
    Variable:
      Names are
        item13 item14 item15 item16 item17 item18 item19 item20 item21 item22
        item23 item24;
      Missing are all (-9999) ;
    Analysis:
      Type = efa 1 3 missing;

INPUT READING TERMINATED NORMALLY

SUMMARY OF ANALYSIS

Number of groups 1 Number of observations 1428

Number of dependent variables 12 Number of independent variables 0 Number of continuous latent variables 0

Observed dependent variables

Continuous ITEM13 ITEM14 ITEM15 ITEM16 ITEM17 ITEM18 ITEM19 ITEM20 ITEM21 ITEM22 ITEM23 ITEM24

Estimator ML Information matrix OBSERVED Maximum number of iterations 1000 Convergence criterion 0.500D-04 Maximum number of steepest descent iterations 20

Input data file(s) factor_missing.dat

Input data format FREE

SUMMARY OF DATA

Number of patterns 940

COVARIANCE COVERAGE OF DATA

Minimum covariance coverage value 0.100

    PROPORTION OF DATA PRESENT

    Covariance Coverage
                 ITEM13    ITEM14    ITEM15    ITEM16    ITEM17
               ________  ________  ________  ________  ________
    ITEM13        0.492
    ITEM14        0.209     0.436
    ITEM15        0.216     0.183     0.433
    ITEM16        0.266     0.235     0.225     0.513
    ITEM17        0.277     0.235     0.227     0.280     0.544
    ITEM18        0.257     0.228     0.237     0.264     0.282
    ITEM19        0.245     0.218     0.218     0.263     0.275
    ITEM20        0.271     0.232     0.214     0.288     0.293
    ITEM21        0.305     0.277     0.272     0.319     0.343
    ITEM22        0.349     0.298     0.305     0.370     0.379
    ITEM23        0.422     0.377     0.371     0.443     0.477
    ITEM24        0.410     0.368     0.368     0.436     0.466

    Covariance Coverage
                 ITEM18    ITEM19    ITEM20    ITEM21    ITEM22
               ________  ________  ________  ________  ________
    ITEM18        0.520
    ITEM19        0.258     0.508
    ITEM20        0.272     0.272     0.533
    ITEM21        0.327     0.318     0.333     0.625
    ITEM22        0.370     0.361     0.382     0.438     0.704
    ITEM23        0.449     0.440     0.453     0.543     0.606
    ITEM24        0.431     0.428     0.451     0.539     0.590

    Covariance Coverage
                 ITEM23    ITEM24
               ________  ________
    ITEM23        0.867
    ITEM24        0.732     0.848

RESULTS FOR EXPLORATORY FACTOR ANALYSIS

EIGENVALUES FOR SAMPLE CORRELATION MATRIX 1 2 3 4 5 ________ ________ ________ ________ ________ 1 6.043 1.257 0.736 0.658 0.627

EIGENVALUES FOR SAMPLE CORRELATION MATRIX 6 7 8 9 10 ________ ________ ________ ________ ________ 1 0.551 0.454 0.439 0.422 0.331

EIGENVALUES FOR SAMPLE CORRELATION MATRIX 11 12 ________ ________ 1 0.267 0.213

(output omitted...)

EXPLORATORY ANALYSIS WITH 3 FACTOR(S) :

CHI-SQUARE VALUE 90.822 DEGREES OF FREEDOM 33 PROBABILITY VALUE 0.0000

RMSEA (ROOT MEAN SQUARE ERROR OF APPROXIMATION) : ESTIMATE (90 PERCENT C.I.) IS 0.035 ( 0.027 0.044) PROBABILITY RMSEA LE 0.05 IS 0.998

ROOT MEAN SQUARE RESIDUAL IS 0.0286

    VARIMAX ROTATED LOADINGS
                      1         2         3
               ________  ________  ________
    ITEM13        0.789     0.168     0.151
    ITEM14        0.742     0.216     0.176
    ITEM15        0.598     0.347     0.312
    ITEM16        0.549     0.176     0.345
    ITEM17        0.535     0.264     0.483
    ITEM18        0.233     0.231     0.750
    ITEM19        0.142     0.183     0.700
    ITEM20        0.278     0.151     0.510
    ITEM21        0.337     0.362     0.546
    ITEM22        0.169     0.310     0.520
    ITEM23        0.358     0.768     0.392
    ITEM24        0.315     0.554     0.348

    PROMAX ROTATED LOADINGS
                      1         2         3
               ________  ________  ________
    ITEM13        0.897    -0.041    -0.091
    ITEM14        0.812     0.031    -0.066
    ITEM15        0.540     0.203     0.101
    ITEM16        0.526    -0.034     0.235
    ITEM17        0.433     0.041     0.388
    ITEM18       -0.025    -0.021     0.848
    ITEM19       -0.109    -0.043     0.827
    ITEM20        0.137    -0.055     0.546
    ITEM21        0.127     0.210     0.484
    ITEM22       -0.061     0.194     0.519
    ITEM23        0.062     0.829     0.089
    ITEM24        0.096     0.558     0.137

    PROMAX FACTOR CORRELATIONS
                      1         2         3
               ________  ________  ________
    1             1.000
    2             0.628     1.000
    3             0.614     0.686     1.000

ESTIMATED RESIDUAL VARIANCES ITEM13 ITEM14 ITEM15 ITEM16 ITEM17 ________ ________ ________ ________ ________ 1 0.327 0.373 0.424 0.549 0.411

ESTIMATED RESIDUAL VARIANCES ITEM18 ITEM19 ITEM20 ITEM21 ITEM22 ________ ________ ________ ________ ________ 1 0.329 0.456 0.639 0.457 0.605

ESTIMATED RESIDUAL VARIANCES ITEM23 ITEM24 ________ ________ 1 0.128 0.473

**Example 4. Path analysis with indirect and direct effects**

We have created a fake data set on school performance. We hypothesize that school performance will be related to a student’s IQ, ambition and socioeconomic status (ses). In addition, a student’s IQ might also be related to ses. In the hypothesized model, iq, ambition and ses each have a direct path to performance (pfrm), and ses also has a path to iq, so ses can affect performance both directly and indirectly through iq.

Mplus offers a very straightforward way to display all the possible direct and indirect effects by using the **model indirect** statement.

    Data:
      File is path_anlaysis.dat ;
    Variable:
      Names are pfrm ses ambition iq;
      Missing are all (-9999) ;
    Model:
      pfrm on iq ambition ses;
      iq on ses;
    Model indirect:
      pfrm ind ses;

    TESTS OF MODEL FIT

    Chi-Square Test of Model Fit
              Value                              0.060
              Degrees of Freedom                     1
              P-Value                           0.8066

    Chi-Square Test of Model Fit for the Baseline Model
              Value                            135.440
              Degrees of Freedom                     5
              P-Value                           0.0000

    CFI/TLI
              CFI                                1.000
              TLI                                1.036

    Loglikelihood
              H0 Value                       -1775.747
              H1 Value                       -1775.717

    Information Criteria
              Number of Free Parameters              6
              Akaike (AIC)                    3563.494
              Bayesian (BIC)                  3583.283
              Sample-Size Adjusted BIC        3564.275
                (n* = (n + 2) / 24)

    RMSEA (Root Mean Square Error Of Approximation)
              Estimate                           0.000
              90 Percent C.I.                    0.000  0.117
              Probability RMSEA <= .05           0.849

    SRMR (Standardized Root Mean Square Residual)
              Value                              0.006

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

     PFRM     ON
        IQ                 0.547    0.051     10.728
        AMBITION           5.635    1.009      5.584
        SES                0.930    0.727      1.279

     IQ       ON
        SES                4.152    0.957      4.339

     Residual Variances
        PFRM              49.706    4.971     10.000
        IQ                95.599    9.560     10.000

    TOTAL, TOTAL INDIRECT, SPECIFIC INDIRECT, AND DIRECT EFFECTS

                       Estimates     S.E.  Est./S.E.

    Effects from SES to PFRM

      Total                3.201    0.870      3.677
      Total indirect       2.271    0.565      4.022

      Specific indirect
        PFRM
        IQ
        SES                2.271    0.565      4.022

      Direct
        PFRM
        SES                0.930    0.727      1.279
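The indirect effect reported above is the product of the two path coefficients along the route from ses through iq to pfrm, and the total effect is the sum of the direct and indirect effects. The sketch below reproduces these numbers from the MODEL RESULTS section. The standard error uses the first-order delta method (often called the Sobel formula); that Mplus uses this formula here is an assumption on our part, although the result agrees with the printed value.

```python
import math

# Path coefficients and standard errors from the MODEL RESULTS above.
a, se_a = 4.152, 0.957   # IQ ON SES
b, se_b = 0.547, 0.051   # PFRM ON IQ
direct = 0.930           # PFRM ON SES

indirect = a * b
total = direct + indirect

# First-order delta-method (Sobel) standard error of the product a * b.
se_indirect = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
z_indirect = indirect / se_indirect

print(round(indirect, 3), round(total, 3),
      round(se_indirect, 3), round(z_indirect, 3))
```

Read this way, most of the total effect of ses on performance (about 2.27 of 3.20) runs through iq, while the direct path is small relative to its standard error.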

**Example 5. Growth curve modeling with the long format approach**

We have chosen a simple example to show how Mplus can handle growth curve modeling. Unlike most statistical software, Mplus does growth curve modeling in both long and wide format. The two approaches offer different ways of looking at the same model and offer alternative models to one another. The example here is taken from Chapter 7 of Singer and Willett’s _Applied Longitudinal Data Analysis_. The outcome variable is the response time on a timed cognitive task called “opposites naming”. It is measured at four time points. We will start with the long format approach. This means that each subject will have potentially four rows of observations on the dependent variable and other covariates. In other words, this is the univariate approach. This is also the standard hierarchical linear model approach.

    Data:
      File is opposites_pp.dat;
    Variable:
      Names are id time opp cog ccog wave;
      Missing are all (-9999) ;
      Usevariables are time opp ccog;
      Cluster = id;
      Within are time ;
      Between are ccog;
    Analysis:
      type = random twolevel;
    Model:
      %within%
      s | opp on time;
      %between%
      opp s on ccog;
      opp with s;

    SUMMARY OF ANALYSIS

    Number of groups                                                 1
    Number of observations                                         140

    Number of dependent variables                                    1
    Number of independent variables                                  2
    Number of continuous latent variables                            1

    Observed dependent variables
      Continuous
       OPP

    Observed independent variables
       TIME        CCOG

    Continuous latent variables
       S

    Variables with special functions
      Cluster variable      ID
      Within variables
       TIME
      Between variables
       CCOG

    Estimator                                                      MLR
    Information matrix                                        OBSERVED
    Maximum number of iterations                                  1000
    Convergence criterion                                    0.100D-05
    Maximum number of EM iterations                                500
    Convergence criteria for the EM algorithm
      Loglikelihood change                                   0.100D-02
      Relative loglikelihood change                          0.100D-05
      Derivative                                             0.100D-02
    Minimum variance                                         0.100D-03
    Maximum number of steepest descent iterations                   20
    Maximum number of iterations for H1                           2000
    Convergence criterion for H1                             0.100D-03
    Optimization algorithm                                         EMA

    Input data file(s)
      opposites_pp.dat
    Input data format  FREE

    SUMMARY OF DATA

         Number of clusters                 35

         Size (s)    Cluster ID with Size s
            4        1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
                     19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35

         Average cluster size        4.000

         Estimated Intraclass Correlations for the Y Variables

                    Intraclass              Intraclass
         Variable  Correlation   Variable  Correlation
         OPP           0.406

    THE MODEL ESTIMATION TERMINATED NORMALLY

    TESTS OF MODEL FIT

    Loglikelihood
              H0 Value                        -633.451
              H0 Scaling Correction Factor       0.793
                for MLR

    Information Criteria
              Number of Free Parameters              8
              Akaike (AIC)                    1282.901
              Bayesian (BIC)                  1306.434
              Sample-Size Adjusted BIC        1281.123
                (n* = (n + 2) / 24)

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

    Within Level

     Residual Variances
        OPP              159.727   23.491      6.800

    Between Level

     S        ON
        CCOG               0.433    0.121      3.566

     OPP      ON
        CCOG              -0.114    0.416     -0.274

     OPP      WITH
        S               -165.185   67.783     -2.437

     Intercepts
        OPP              164.384    6.024     27.286
        S                 26.954    1.936     13.923

     Residual Variances
        OPP             1158.985  278.161      4.167
        S                 99.238   23.369      4.247

**Example 6a. Growth curve modeling with the wide format approach**

Now let’s move to growth curve modeling with a wide format approach. The data structure is now in wide format. That is, each subject has only one row of data, with four dependent variables corresponding to the four time points. In other words, this is the multivariate approach. To this end, we have to restructure the data from long to wide (in another statistical package). In order to match the results from the long format approach, we have to constrain the residual variances at the four time points to be equal to each other. This also hints that the residual variances don’t always have to be equal, which leads to more flexible models.
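The long-to-wide restructuring mentioned above can be done in any package. As one possible illustration, here is a minimal Python sketch. The rows are made-up toy values, not the actual opposites-naming data, and we assume that the wave variable is coded 1 to 4. A wave that is absent for a subject is filled with the missing-value code -9999 used throughout this seminar.

```python
MISSING_CODE = -9999  # matches "Missing are all (-9999)" in the input files


def long_to_wide(rows, n_waves=4):
    """Reshape (id, wave, opp, ccog) rows into [id, opp1..oppN, ccog] rows.

    Assumes wave is coded 1..n_waves and ccog is constant within a subject.
    """
    wide = {}
    for subject, wave, opp, ccog in rows:
        record = wide.setdefault(
            subject, {"opp": [MISSING_CODE] * n_waves, "ccog": ccog})
        record["opp"][wave - 1] = opp
    return [[subject] + record["opp"] + [record["ccog"]]
            for subject, record in sorted(wide.items())]


# Made-up toy rows: subject 2 has no observation at wave 3.
long_rows = [
    (1, 1, 200, 5.0), (1, 2, 220, 5.0), (1, 3, 240, 5.0), (1, 4, 260, 5.0),
    (2, 1, 180, -3.0), (2, 2, 190, -3.0), (2, 4, 215, -3.0),
]
wide_rows = long_to_wide(long_rows)
for row in wide_rows:
    print(row)
```

The sketch leaves out the `cog` column that appears in the wide file used below; a real conversion would carry every time-invariant covariate along in the same way as `ccog`.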

    Data:
      File is opposites_wide.dat ;
    Variable:
      Names are id opp1 opp2 opp3 opp4 cog ccog;
      Missing are all (-9999) ;
      usev = opp1-opp4 ccog;
    Analysis:
      Type = meanstructure;
    Model:
      i s | opp1@0 opp2@1 opp3@2 opp4@3;
      i s on ccog;
      [i s];
      [opp1-opp4@0];    ! constraining the mean to be zero at all time points.
      opp1 - opp4 (1);  ! constraining the residual variance to be equal
                        ! at all time points.

INPUT READING TERMINATED NORMALLY

SUMMARY OF ANALYSIS

Number of groups 1 Number of observations 35

Number of dependent variables 4 Number of independent variables 1 Number of continuous latent variables 2

Observed dependent variables

Continuous OPP1 OPP2 OPP3 OPP4

Observed independent variables CCOG

Continuous latent variables I S

Estimator ML Information matrix EXPECTED Maximum number of iterations 1000 Convergence criterion 0.500D-04 Maximum number of steepest descent iterations 20

Input data file(s) opposites_wide.dat

Input data format FREE

THE MODEL ESTIMATION TERMINATED NORMALLY

TESTS OF MODEL FIT

Chi-Square Test of Model Fit

Value 6.899 Degrees of Freedom 10 P-Value 0.7350

Chi-Square Test of Model Fit for the Baseline Model

Value 134.996 Degrees of Freedom 10 P-Value 0.0000

CFI/TLI

CFI 1.000 TLI 1.025

Loglikelihood

H0 Value -770.987 H1 Value -767.538

Information Criteria

Number of Free Parameters 8 Akaike (AIC) 1557.975 Bayesian (BIC) 1570.418 Sample-Size Adjusted BIC 1545.438 (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)

Estimate 0.000 90 Percent C.I. 0.000 0.134 Probability RMSEA <= .05 0.787

SRMR (Standardized Root Mean Square Residual)

Value 0.043

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

     I        |
        OPP1               1.000    0.000      0.000
        OPP2               1.000    0.000      0.000
        OPP3               1.000    0.000      0.000
        OPP4               1.000    0.000      0.000

     S        |
        OPP1               0.000    0.000      0.000
        OPP2               1.000    0.000      0.000
        OPP3               2.000    0.000      0.000
        OPP4               3.000    0.000      0.000

     I        ON
        CCOG              -0.114    0.489     -0.232

     S        ON
        CCOG               0.433    0.157      2.753

     S        WITH
        I               -165.303   78.279     -2.112

     Intercepts
        OPP1               0.000    0.000      0.000
        OPP2               0.000    0.000      0.000
        OPP3               0.000    0.000      0.000
        OPP4               0.000    0.000      0.000
        I                164.374    6.026     27.277
        S                 26.960    1.936     13.925

     Residual Variances
        OPP1             159.475   26.956      5.916
        OPP2             159.475   26.956      5.916
        OPP3             159.475   26.956      5.916
        OPP4             159.475   26.956      5.916
        I               1159.354  304.409      3.809
        S                 99.298   31.821      3.121

**Example 6b. Growth curve modeling with the wide format approach (different parameterization)**

As in the previous models, the random intercept and the random slope are allowed to be correlated with each other. With the wide format approach, we can also model this relationship as a regression of the slope on the intercept. This simply reparameterizes the model, but it lets us describe the relationship between the intercept and the slope in terms of change: how much the slope is expected to differ per unit difference in the intercept.

    Data:
      File is opposites_wide.dat ;
    Variable:
      Names are id opp1 opp2 opp3 opp4 cog ccog;
      Missing are all (-9999) ;
      usev = opp1-opp4 ccog;
    Analysis:
      Type = meanstructure;
    Model:
      i s | opp1@0 opp2@1 opp3@2 opp4@3;
      i s on ccog;
      [i s];
      s on i;           ! different parameterization happens here
      [opp1-opp4@0];    ! constraining the mean to be zero at all time points.
      opp1 - opp4 (1);  ! constraining the residual variance to be equal
                        ! at all time points.

    TESTS OF MODEL FIT

    Chi-Square Test of Model Fit

              Value                              6.899
              Degrees of Freedom                    10
              P-Value                           0.7350

    Chi-Square Test of Model Fit for the Baseline Model

              Value                            134.996
              Degrees of Freedom                    10
              P-Value                           0.0000

    CFI/TLI

              CFI                                1.000
              TLI                                1.025

    Loglikelihood

              H0 Value                        -770.987
              H1 Value                        -767.538

    Information Criteria

              Number of Free Parameters              8
              Akaike (AIC)                    1557.975
              Bayesian (BIC)                  1570.418
              Sample-Size Adjusted BIC        1545.438
                (n* = (n + 2) / 24)

    RMSEA (Root Mean Square Error Of Approximation)

              Estimate                           0.000
              90 Percent C.I.                    0.000  0.134
              Probability RMSEA <= .05           0.787

    SRMR (Standardized Root Mean Square Residual)

              Value                              0.043

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

     I        |
        OPP1               1.000    0.000      0.000
        OPP2               1.000    0.000      0.000
        OPP3               1.000    0.000      0.000
        OPP4               1.000    0.000      0.000

     S        |
        OPP1               0.000    0.000      0.000
        OPP2               1.000    0.000      0.000
        OPP3               2.000    0.000      0.000
        OPP4               3.000    0.000      0.000

     S        ON
        I                 -0.143    0.051     -2.773

     I        ON
        CCOG              -0.114    0.489     -0.232

     S        ON
        CCOG               0.417    0.135      3.091

     Intercepts
        OPP1               0.000    0.000      0.000
        OPP2               0.000    0.000      0.000
        OPP3               0.000    0.000      0.000
        OPP4               0.000    0.000      0.000
        I                164.374    6.026     27.277
        S                 50.398    8.613      5.852

     Residual Variances
        OPP1             159.477   26.957      5.916
        OPP2             159.477   26.957      5.916
        OPP3             159.477   26.957      5.916
        OPP4             159.477   26.957      5.916
        I               1159.380  304.416      3.809
        S                 75.726   23.268      3.255
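Since the two parameterizations describe the same model, the Example 6b estimates for the slope equation can be recovered by hand from the Example 6a estimates. The sketch below uses the standard relationship between a residual covariance and a regression coefficient; the variable names are ours, and small discrepancies from the reported values are expected because we start from numbers rounded to three decimals.

```python
# Example 6a estimates (covariance parameterization), copied from its output.
cov_is = -165.303        # S WITH I: residual covariance of slope and intercept
resvar_i = 1159.354      # residual variance of I
resvar_s = 99.298        # residual variance of S
int_i, int_s = 164.374, 26.960          # intercepts of I and S
i_on_ccog, s_on_ccog = -0.114, 0.433    # regressions on ccog

# Regressing S on I: the coefficient is the residual covariance
# divided by the residual variance of the predictor I.
s_on_i = cov_is / resvar_i

# The rest of the S equation after the part explained by I is removed.
int_s_new = int_s - s_on_i * int_i
s_on_ccog_new = s_on_ccog - s_on_i * i_on_ccog
resvar_s_new = resvar_s - cov_is ** 2 / resvar_i

print(f"S ON I              {s_on_i:8.3f}   (reported: -0.143)")
print(f"Intercept of S      {int_s_new:8.3f}   (reported: 50.398)")
print(f"S ON CCOG           {s_on_ccog_new:8.3f}   (reported: 0.417)")
print(f"Residual var. of S  {resvar_s_new:8.3f}   (reported: 75.726)")
```

The parameters of the intercept equation (I ON CCOG and the intercept of I) are untouched by the reparameterization, which is why they are identical in the two outputs.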

**Example 7a. Latent class analysis**

This example uses the hsb2 data set, which contains test scores and demographic variables for the students in the sample. We want to see whether we can classify students based on their test scores and how class membership relates to other variables. The example is purely illustrative and does not reflect any substantive theory. Notice that we rely on the default model specification. We are looking for a two-class solution based on the scores on read, write, math, science and social studies (socst). Class membership is then regressed on the variables female and ses.

Our model runs “successfully”, but Mplus gives us warning messages. They tell us that, by default, Mplus assumes that all the variables are uncorrelated within each latent class. Can we accept this assumption? Maybe not. But for the time being, let’s take a look at the rest of the output. We have the average scores for each of the two latent classes: the first class has lower means on all the variables and the second has higher means, so the two classes make sense to us. Class membership is also strongly related to ses (Est./S.E. = -3.506), but not to female (Est./S.E. = -0.502).

    Data:
      File is hsb2.dat ;
    Variable:
      Names are id female race ses schtyp prog read write math science socst;
      Usevariables are read write math science socst female ses;
      classes = grp(2);
    Analysis:
      type=mixture;
    Model:
      %overall%
      grp#1 on female ses;

    *** WARNING in Model command
      Variable is uncorrelated with all other variables within class:  READ
    *** WARNING in Model command
      Variable is uncorrelated with all other variables within class:  WRITE
    *** WARNING in Model command
      Variable is uncorrelated with all other variables within class:  MATH
    *** WARNING in Model command
      Variable is uncorrelated with all other variables within class:  SCIENCE
    *** WARNING in Model command
      Variable is uncorrelated with all other variables within class:  SOCST
    *** WARNING in Model command
      All least one variable is uncorrelated with all other variables within class.
      Check that this is what is intended.
       6 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

    Latent Class Analysis with Graphs

    SUMMARY OF ANALYSIS

    Number of groups                                                 1
    Number of observations                                         200

    Number of dependent variables                                    5
    Number of independent variables                                  2
    Number of continuous latent variables                            0
    Number of categorical latent variables                           1

    Observed dependent variables

      Continuous
       READ        WRITE       MATH        SCIENCE     SOCST

    Observed independent variables
       FEMALE      SES

    Categorical latent variables
       GRP

    Estimator                                                      MLR

    (output omitted...)

    TESTS OF MODEL FIT

    Loglikelihood

              H0 Value                       -3510.499
              H0 Scaling Correction Factor       1.126
                for MLR

    Information Criteria

              Number of Free Parameters             18
              Akaike (AIC)                    7056.999
              Bayesian (BIC)                  7116.369
              Sample-Size Adjusted BIC        7059.343
                (n* = (n + 2) / 24)
              Entropy                            0.852

    FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
    BASED ON THE ESTIMATED MODEL

        Latent
       Classes

           1         96.61160          0.48306
           2        103.38840          0.51694

    FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
    BASED ON ESTIMATED POSTERIOR PROBABILITIES

        Latent
       Classes

           1         96.61161          0.48306
           2        103.38839          0.51694

    CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

    Class Counts and Proportions

        Latent
       Classes

           1               95          0.47500
           2              105          0.52500

    Average Latent Class Probabilities for Most Likely Latent Class Membership (Row)
    by Latent Class (Column)

               1        2

        1   0.963    0.037
        2   0.049    0.951

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

    Latent Class 1

     Means
        READ              44.645    1.107     40.336
        WRITE             45.822    1.197     38.269
        MATH              45.766    0.806     56.784
        SCIENCE           45.189    1.405     32.153
        SOCST             45.785    1.375     33.288

     Variances
        READ              50.830    5.261      9.662
        WRITE             44.222    5.109      8.656
        MATH              43.108    4.842      8.903
        SCIENCE           56.073    7.406      7.572
        SOCST             73.733    7.395      9.970

    Latent Class 2

     Means
        READ              59.318    1.168     50.791
        WRITE             59.272    0.913     64.939
        MATH              59.073    1.256     47.018
        SCIENCE           58.075    0.836     69.495
        SOCST             58.591    1.041     56.288

     Variances
        READ              50.830    5.261      9.662
        WRITE             44.222    5.109      8.656
        MATH              43.108    4.842      8.903
        SCIENCE           56.073    7.406      7.572
        SOCST             73.733    7.395      9.970

    Categorical Latent Variables

     GRP#1      ON
        FEMALE            -0.173    0.344     -0.502
        SES               -0.779    0.222     -3.506

     Intercepts
        GRP#1              1.622    0.556      2.917

    LOGISTIC REGRESSION ODDS RATIO RESULTS

    Categorical Latent Variables

     GRP#1      ON
        FEMALE             0.841
        SES                0.459

    ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION

    Parameterization using Reference Class 1

     GRP#2      ON
        FEMALE             0.173    0.344      0.502
        SES                0.779    0.222      3.506

     Intercepts
        GRP#2             -1.622    0.556     -2.917
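The regression of class membership on female and ses is a logistic regression, so the reported odds ratios are simply the exponentiated coefficients, and the coefficients can be turned into predicted probabilities of belonging to class 1. The sketch below illustrates this with the estimates above. It assumes that female is coded 0/1 and ses is coded 1 to 3, as in the commonly distributed hsb2 file; check the coding in your own copy of the data.

```python
import math

# Logit coefficients for GRP#1 copied from the Example 7a output.
INTERCEPT = 1.622
B_FEMALE = -0.173
B_SES = -0.779


def prob_class1(female, ses):
    """Predicted probability of membership in latent class 1."""
    logit = INTERCEPT + B_FEMALE * female + B_SES * ses
    return 1.0 / (1.0 + math.exp(-logit))


# Odds ratios are the exponentiated logit coefficients.
odds_ratios = {"female": math.exp(B_FEMALE), "ses": math.exp(B_SES)}
for name, value in odds_ratios.items():
    print(f"odds ratio for {name}: {value:.3f}")

# Assumed coding: female in {0, 1}, ses in {1, 2, 3}.
for female in (0, 1):
    for ses in (1, 2, 3):
        p = prob_class1(female, ses)
        print(f"female={female} ses={ses}: P(class 1) = {p:.3f}")
```

The probability of being in the lower-scoring class 1 falls as ses increases, which is the same story the odds ratio of 0.459 tells.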

**Example 7b. Latent class analysis with graphics**

Now, let’s take up the issue of correlations among the variables within latent classes. We will also request some plots. Should we allow all the test scores to be correlated with each other? Maybe not. In this example, we allow reading scores to be correlated with all the other test scores, writing scores with social studies scores, and math scores with science scores. Comparing the AIC values (6958.313 here versus 7056.999 in Example 7a), we conclude that this model fits better than the previous one.

    Data:
      File is hsb2.dat ;
    Variable:
      Names are id female race ses schtyp prog read write math science socst;
      Usevariables are read write math science socst female ses;
      classes = grp(2);
    Analysis:
      type=mixture;
    Model:
      %overall%
      read with write;
      read with math;
      read with science;
      read with socst;
      write with socst;
      math with science;
      grp#1 on female ses;
    Plot:
      type is plot3;
      series is read (1) write (2) math (3) science (4) socst (5);

    (output omitted...)

    TESTS OF MODEL FIT

    Loglikelihood

              H0 Value                       -3455.156
              H0 Scaling Correction Factor       1.068
                for MLR

    Information Criteria

              Number of Free Parameters             24
              Akaike (AIC)                    6958.313
              Bayesian (BIC)                  7037.472
              Sample-Size Adjusted BIC        6961.438
                (n* = (n + 2) / 24)
              Entropy                            0.838

    FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
    BASED ON THE ESTIMATED MODEL

        Latent
       Classes

           1         77.82126          0.38911
           2        122.17874          0.61089

    FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
    BASED ON ESTIMATED POSTERIOR PROBABILITIES

        Latent
       Classes

           1         77.82125          0.38911
           2        122.17875          0.61089

    CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP

    Class Counts and Proportions

        Latent
       Classes

           1               76          0.38000
           2              124          0.62000

    Average Latent Class Probabilities for Most Likely Latent Class Membership (Row)
    by Latent Class (Column)

               1        2

        1   0.956    0.044
        2   0.042    0.958

    MODEL RESULTS

                       Estimates     S.E.  Est./S.E.

    Latent Class 1

     READ     WITH
        WRITE              9.024    3.276      2.755
        MATH              24.570    5.285      4.649
        SCIENCE           27.390    5.820      4.706
        SOCST             25.783    5.457      4.724

     WRITE    WITH
        SOCST             18.927    3.559      5.319

     MATH     WITH
        SCIENCE           27.609    6.718      4.109

     Means
        READ              45.417    0.942     48.209
        WRITE             42.995    1.347     31.917
        MATH              45.527    0.722     63.091
        SCIENCE           45.100    1.172     38.487
        SOCST             45.613    1.261     36.185

     Variances
        READ              66.360    5.860     11.324
        WRITE             28.467    4.359      6.530
        MATH              55.061    6.780      8.121
        SCIENCE           68.513    9.495      7.216
        SOCST             85.301    8.522     10.010

    Latent Class 2

     READ     WITH
        WRITE              9.024    3.276      2.755
        MATH              24.570    5.285      4.649
        SCIENCE           27.390    5.820      4.706
        SOCST             25.783    5.457      4.724

     WRITE    WITH
        SOCST             18.927    3.559      5.319

     MATH     WITH
        SCIENCE           27.609    6.718      4.109

     Means
        READ              56.570    1.153     49.054
        WRITE             59.005    0.580    101.768
        MATH              57.179    1.072     53.347
        SCIENCE           56.150    1.018     55.171
        SOCST             56.731    1.024     55.428

     Variances
        READ              66.360    5.860     11.324
        WRITE             28.467    4.359      6.530
        MATH              55.061    6.780      8.121
        SCIENCE           68.513    9.495      7.216
        SOCST             85.301    8.522     10.010

    Categorical Latent Variables

     GRP#1      ON
        FEMALE            -1.166    0.419     -2.780
        SES               -1.069    0.278     -3.842

     Intercepts
        GRP#1              2.297    0.665      3.456

    LOGISTIC REGRESSION ODDS RATIO RESULTS

    Categorical Latent Variables

     GRP#1      ON
        FEMALE             0.312
        SES                0.343

    ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION

    Parameterization using Reference Class 1

     GRP#2      ON
        FEMALE             1.166    0.419      2.780
        SES                1.069    0.278      3.842

     Intercepts
        GRP#2             -2.297    0.665     -3.456
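The comparison of Examples 7a and 7b rests on the information criteria, which can be reproduced from the loglikelihood and the number of free parameters reported in each output. The sketch below uses the usual textbook formulas together with the n* = (n + 2) / 24 adjustment printed in the output. The function names are ours, and results may differ from the printed values in the last decimal because the loglikelihoods are rounded.

```python
import math

N_OBS = 200  # number of observations reported in the Example 7a summary

# (H0 loglikelihood, number of free parameters) copied from the two outputs.
MODELS = {
    "7a (uncorrelated within class)": (-3510.499, 18),
    "7b (selected covariances)": (-3455.156, 24),
}


def aic(loglik, k):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n):
    """Bayesian information criterion."""
    return -2.0 * loglik + k * math.log(n)


def sabic(loglik, k, n):
    """Sample-size adjusted BIC, with n replaced by n* = (n + 2) / 24."""
    return -2.0 * loglik + k * math.log((n + 2) / 24.0)


for label, (loglik, k) in MODELS.items():
    print(f"{label}: AIC = {aic(loglik, k):.3f}, "
          f"BIC = {bic(loglik, k, N_OBS):.3f}, "
          f"adjusted BIC = {sabic(loglik, k, N_OBS):.3f}")
```

All three criteria are lower for Example 7b even though it estimates six more parameters, which is consistent with preferring the model that allows some within-class covariances.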