*These code fragments are examples that we are using to try
and understand these techniques using Mplus. We ask that you treat them as
works in progress that explore these techniques, rather than definitive answers
as to how to analyze any particular kind of data.*

Consider the Stata file hsb6, which has 600 observations with information about students, such as their reading, writing, math and other achievement scores. For the variables locus concept mot read-ss we will make a binary variable called hi___ that is 1 if the score is over the median, and 0 otherwise. This will be useful when we need a binary variable. Here we read the data from Stata, make the binary versions of the variables, compress the file, and then convert it to Mplus using stata2mplus.

    use https://stats.idre.ucla.edu/stat/mplus/code/hsb6, clear
    foreach varname of varlist locus concept mot read-ss {
      summarize `varname', detail
      generate hi`varname' = `varname' > `r(p50)'
    }
    compress
    save hsb6, replace
    stata2mplus using hsb6
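The median split done in the Stata loop above can be sketched outside Stata as well. The following is a minimal Python illustration, not part of the original workflow; the function name `median_split` and the scores are hypothetical and are not taken from hsb6. As in the Stata code, a score exactly equal to the median is coded 0.

```python
from statistics import median

def median_split(scores):
    """Return 1 for scores strictly above the median, otherwise 0."""
    cutoff = median(scores)
    return [1 if s > cutoff else 0 for s in scores]

# Hypothetical scores for illustration only (not from hsb6).
scores = [46.9, 52.1, 61.9, 41.1, 54.1, 38.5]
flags = median_split(scores)
print(flags)
```

With an even number of scores, `statistics.median` averages the two middle values, so half of these six hypothetical scores end up flagged.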

We now have the input file //mplus/code/hsb6.inp and the data file it reads, called https://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb6.dat.

Example 1: A latent class analysis with 2 classes, and continuous indicators

Here is the input file

    Data:
      File is I:mplushttps://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb6.dat ;
    Variable:
      Names are
        id gender race ses sch prog locus concept mot career read write
        math sci ss hilocus hiconcep himot hiread hiwrite himath hisci
        hiss academic;
      Usevariables are
        read write math sci ss ;
      classes = c(2);
    Analysis:
      Type=mixture;
    MODEL:
      %C#1%
      [read math sci ss write * 30 ];
      %C#2%
      [read math sci ss write * 60];
    OUTPUT:
      TECH8;
    SAVEDATA:
      file is lca_ex1.txt ;
      save is cprob;
      format is free;

Here is the output

------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1        274.09000          0.45682
  Class 2        325.91000          0.54318

#1. One way to view the second column is as the average probability of falling into class 1 and into class 2. As a result, column 1 is the average probability times 600, the sample size (see the Stata example below for comparison).

A second way to view these values is to take each person's probability of falling into a class and sum those probabilities across people, which gives column 1. If person #6 has a .8 estimated probability of being in class 1, and .2 of being in class 2, then that person contributes .8 to class 1 and .2 to class 2. This is why the counts are fractional (see the Stata example below for comparison).

A third way of viewing this is that there is an underlying continuum of the latent variable, and there is a threshold for being categorized as class 1 or class 2, and that threshold can be used to compute the probabilities of being in the classes; see section #5.
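The first two views can be illustrated with a small sketch. This is a Python illustration under stated assumptions: the three people and their probabilities are hypothetical (the first row is the ".8 / .2" person from the text), and the variable names are ours rather than anything produced by Mplus.

```python
# Hypothetical posterior probabilities (class 1, class 2) for three people.
# The first row mirrors the ".8 / .2" person described in the text.
posteriors = [(0.8, 0.2), (0.1, 0.9), (0.3, 0.7)]

n = len(posteriors)

# Column 1 of the output: each person's probabilities summed over people.
class_counts = [sum(p[k] for p in posteriors) for k in range(2)]

# Column 2 of the output: the same sums divided by the sample size,
# i.e. the average probability of being in each class.
class_props = [count / n for count in class_counts]

print(class_counts)
print(class_props)
```

The counts are fractional because each person is split across the classes, and they still add up to the number of people.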

------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

  Class 1              272          0.45333
  Class 2              328          0.54667

#2. This shows the count of people who fall into each class by taking their probability of membership in each class and assigning them to the class which they have the highest probability of falling into. Note the counts are exact whole numbers.

------------------------------------------------------------------------------
Average Class Probabilities by Class

               1        2

 Class 1     0.957    0.043
 Class 2     0.042    0.958

#3. This is related to the output in #1, but takes the probabilities of class membership and averages them by class, see Stata portion below for more on this.
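Sections #2 and #3 can be sketched together: assign each person to the class with the highest probability, count the assignments, and then average the probabilities within each assigned class. The Python below is a rough illustration, not the Mplus implementation. The probabilities are those of observations 200 to 210 from the Stata listing of lca_ex1.txt further down this page, so the results describe only those 11 rows and will not match the full-sample output. The function names are ours.

```python
# (cprob1, cprob2) for observations 200-210, copied from the listing of
# lca_ex1.txt shown on this page; this is only 11 of the 600 rows.
rows = [
    (0.944, 0.056), (0.900, 0.100), (0.983, 0.017), (1.000, 0.000),
    (0.016, 0.984), (0.998, 0.002), (1.000, 0.000), (0.827, 0.173),
    (0.934, 0.066), (0.995, 0.005), (0.170, 0.830),
]

def most_likely_class(probs):
    """Return the 1-based class with the largest probability (first on ties)."""
    return max(range(len(probs)), key=lambda k: probs[k]) + 1

def counts_and_averages(rows):
    """Count most-likely-class assignments and average probabilities within each."""
    n_classes = len(rows[0])
    counts = {c: 0 for c in range(1, n_classes + 1)}
    sums = {c: [0.0] * n_classes for c in counts}
    for probs in rows:
        c = most_likely_class(probs)
        counts[c] += 1
        for k, p in enumerate(probs):
            sums[c][k] += p
    # Skip classes nobody was assigned to, to avoid dividing by zero.
    averages = {c: [s / counts[c] for s in sums[c]] for c in counts if counts[c]}
    return counts, averages

counts, averages = counts_and_averages(rows)
print(counts)
print(averages)
```

The counts are whole numbers because each person is placed entirely in one class, while the averages show how clear-cut those placements were.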

------------------------------------------------------------------------------
MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

CLASS 1

 Means
    READ              43.783    0.642     68.152
    WRITE             45.068    0.730     61.738
    MATH              44.794    0.469     95.540
    SCI               44.446    0.740     60.051
    SS                45.574    0.658     69.237

 Variances
    READ              46.463    2.785     16.681
    WRITE             49.427    3.011     16.415
    MATH              46.634    3.133     14.884
    SCI               49.022    3.388     14.470
    SS                62.216    4.109     15.141

CLASS 2

 Means
    READ              58.730    0.605     97.000
    WRITE             58.538    0.497    117.764
    MATH              57.782    0.687     84.120
    SCI               57.917    0.499    116.079
    SS                57.488    0.589     97.629

 Variances
    READ              46.463    2.785     16.681
    WRITE             49.427    3.011     16.415
    MATH              46.634    3.133     14.884
    SCI               49.022    3.388     14.470
    SS                62.216    4.109     15.141

#4. This shows the average on the scores for the two classes. Class 1 is a low performing group, and class 2 is a high performing group.

------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART

 Means
    C#1               -0.173    0.133     -1.298

#5. This is the threshold for dividing the two classes. If you are below the threshold, you are in class 1; if you are above it, you are in class 2. We see the threshold is -0.173. Say that we then convert this threshold to a probability like this.

Prob(class 1) = 1/(1 + exp(-threshold1)) = 1 / (1 + exp(0.173)) = .45686 (compare to section #1 above).

Prob(class 2) = 1 - 1/(1 + exp(-threshold1)) = 1 - 1 / (1 + exp(0.173)) = .54314 (compare to section #1 above).
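The conversion above is the usual inverse-logit calculation, and it can be checked with a few lines of code. This is a minimal Python sketch; the function name `two_class_probs` is ours, and the only input is the C#1 value of -0.173 from the output.

```python
import math

def two_class_probs(threshold1):
    """Convert the C#1 value into (Prob(class 1), Prob(class 2))."""
    p1 = 1.0 / (1.0 + math.exp(-threshold1))
    return p1, 1.0 - p1

p1, p2 = two_class_probs(-0.173)
print(round(p1, 5), round(p2, 5))
```

Because -0.173 is itself rounded to three decimals in the output, the result agrees with the proportions in section #1 only to about four decimal places.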

------------------------------------------------------------------------------

We now read the saved data file into Stata for comparison to the Mplus output.

infile read write math sci ss cprob1 cprob2 class using lca_ex1.txt

Below we show observations 200 to 210, from the middle of this file. Note that cprob1 is the probability of being in class 1, cprob2 is the probability of being in class 2, and class is the class membership based on the class with the highest probability.

. list in 200/210

     +-------------------------------------------------------------+
     | read   write   math    sci     ss   cprob1   cprob2   class |
     |-------------------------------------------------------------|
200. | 46.9    52.1   42.5   47.7   60.5     .944     .056       1 |
201. | 46.9    51.5     57   49.8   40.6       .9       .1       1 |
202. | 46.9    52.8   49.3   53.1   35.6     .983     .017       1 |
203. | 46.9    43.7   41.9   41.7   35.6        1        0       1 |
204. | 46.9    61.9     53   52.6   60.5     .016     .984       2 |
     |-------------------------------------------------------------|
205. | 46.9    41.1   45.3   47.1   55.6     .998     .002       1 |
206. | 46.9    38.5   47.1   41.7   25.7        1        0       1 |
207. | 46.9    54.1   46.4   49.8   55.6     .827     .173       1 |
208. | 46.9    51.5   48.5   49.8   50.6     .934     .066       1 |
209. | 46.9    41.1   53.6   41.7   55.6     .995     .005       1 |
     |-------------------------------------------------------------|
210. | 46.9    61.9   46.2   60.7   45.6      .17      .83       2 |
     +-------------------------------------------------------------+
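To see roughly where a value such as cprob1 comes from, we can try to rebuild it for observation 204 from the estimates in sections #4 and #5. The Python below is a sketch under assumptions that we infer from the output rather than from Mplus documentation: each score is treated as normally distributed within a class, the five scores are treated as independent within a class, and the variances are equal across classes (as the MODEL RESULTS show). The function names are ours, and the estimates are the rounded values printed above.

```python
import math

# Estimates copied from the Example 1 MODEL RESULTS, in the order
# read, write, math, sci, ss.
means = {
    1: [43.783, 45.068, 44.794, 44.446, 45.574],
    2: [58.730, 58.538, 57.782, 57.917, 57.488],
}
variances = [46.463, 49.427, 46.634, 49.022, 62.216]  # same in both classes
threshold1 = -0.173  # C#1 from the latent class regression part

def log_normal_density(x, mean, var):
    """Log of the normal density with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def posterior_class1(scores):
    """Probability of class 1 given the five scores (assumptions as stated)."""
    prior1 = 1.0 / (1.0 + math.exp(-threshold1))
    priors = {1: prior1, 2: 1.0 - prior1}
    joint = {}
    for c in (1, 2):
        loglik = sum(
            log_normal_density(x, m, v)
            for x, m, v in zip(scores, means[c], variances)
        )
        joint[c] = priors[c] * math.exp(loglik)
    return joint[1] / (joint[1] + joint[2])

obs_204 = [46.9, 61.9, 53.0, 52.6, 60.5]
print(round(posterior_class1(obs_204), 3))
```

Under these assumptions the result should land close to the .016 listed for observation 204; small differences are expected because the estimates and scores are rounded.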

Note that if we tabulate class, we see where the values from section #2 of the output came from.

. tab class

      class |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        272       45.33       45.33
          2 |        328       54.67      100.00
------------+-----------------------------------
      Total |        600      100.00

Note that if we take the average of cprob1 and cprob2, we can relate these values to column 2 of section #1 of the output.

. summ cprob1 cprob2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      cprob1 |       600    .4568233    .4664192          0          1
      cprob2 |       600    .5431767    .4664192          0          1

If we sum the probabilities, we can relate these to column 1 of section #1 of the output.

. tabstat cprob1 cprob2, stat(sum)

   stats |    cprob1    cprob2
---------+--------------------
     sum |   274.094   325.906
------------------------------

If we average the probabilities by class, we can relate these values to section #3 of the output.

. tabstat cprob1 cprob2, by(class)

Summary statistics: mean
  by categories of: class

   class |    cprob1    cprob2
---------+--------------------
       1 |  .9570699  .0429301
       2 |  .0419848  .9580152
---------+--------------------
   Total |  .4568233  .5431767
------------------------------

Say that we get the means of the reading, writing, math, science and social science scores, first weighting by the probability of being in class 1 and then weighting by the probability of being in class 2. Note the correspondence between these means and the means from section #4 of the output.

. tabstat read write math sci ss [aw=cprob1], stat(mean)

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  43.78268  45.06829  44.79421  44.44601   45.5743
------------------------------------------------------------

. tabstat read write math sci ss [aw=cprob2], stat(mean)

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  58.73021  58.53821  57.78224  57.91736  57.48822
------------------------------------------------------------
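The weighted means above are sums of weight times score divided by the sum of the weights. As a rough Python sketch of that calculation, we use the writing scores and class probabilities of observations 200 to 210 from the listing earlier on this page. The function name `weighted_mean` is ours, and because this uses only 11 of the 600 rows the results will not match the full-sample means.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# write, cprob1 and cprob2 for observations 200-210, copied from the listing.
write = [52.1, 51.5, 52.8, 43.7, 61.9, 41.1, 38.5, 54.1, 51.5, 41.1, 61.9]
cprob1 = [0.944, 0.9, 0.983, 1.0, 0.016, 0.998, 1.0, 0.827, 0.934, 0.995, 0.17]
cprob2 = [0.056, 0.1, 0.017, 0.0, 0.984, 0.002, 0.0, 0.173, 0.066, 0.005, 0.83]

write_class1 = weighted_mean(write, cprob1)
write_class2 = weighted_mean(write, cprob2)
print(round(write_class1, 2), round(write_class2, 2))
```

Even in this small subset, the class 2 weighted mean is pulled toward the two high writing scores that carry most of the class 2 probability.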

Example 2: A latent class analysis with 3 classes, and continuous indicators

Here is the input file

    Data:
      File is I:mplushttps://stats.idre.ucla.edu/wp-content/uploads/2016/02/hsb6.dat ;
    Variable:
      Names are
        id gender race ses sch prog locus concept mot career read write
        math sci ss hilocus hiconcep himot hiread hiwrite himath hisci
        hiss academic;
      Usevariables are
        read write math sci ss ;
      classes = c(3);
    Analysis:
      Type=mixture;
    MODEL:
      %C#1%
      [read math sci ss write *30 ];
      %C#2%
      [read math sci ss write *45];
      %C#3%
      [read math sci ss write *60];
    OUTPUT:
      TECH8;
    SAVEDATA:
      file is lca_ex2.txt ;
      save is cprob;
      format is free;

Here is the output

------------------------------------------------------------------------------
FINAL CLASS COUNTS AND PROPORTIONS OF TOTAL SAMPLE SIZE
BASED ON ESTIMATED POSTERIOR PROBABILITIES

  Class 1        194.55375          0.32426
  Class 2        252.39798          0.42066
  Class 3        153.04826          0.25508

#1. One way to view the second column is as the average probability of falling into class 1, class 2 and class 3. As a result, column 1 is the average probability times 600, the sample size (see the Stata example below for comparison).

A second way to view these values is to take each person's probability of falling into a class and sum those probabilities across people, which gives column 1. If person #6 has a .8 estimated probability of being in class 1, .2 of being in class 2, and 0 of being in class 3, then that person contributes .8 to class 1 and .2 to class 2. This is why the counts are fractional (see the Stata example below for comparison).

A third way of viewing this is that there is an underlying continuum of the latent variable, with thresholds for being categorized into each class, and those thresholds can be used to compute the probabilities of being in the classes; see section #5.

------------------------------------------------------------------------------
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY CLASS MEMBERSHIP

Class Counts and Proportions

  Class 1              197          0.32833
  Class 2              249          0.41500
  Class 3              154          0.25667

#2. This shows the count of people who fall into each class by taking their probability of membership in each class and assigning them to the class which they have the highest probability of falling into. Note the counts are exact whole numbers.

------------------------------------------------------------------------------
Average Class Probabilities by Class

               1        2        3

 Class 1     0.940    0.060    0.000
 Class 2     0.038    0.912    0.050
 Class 3     0.000    0.087    0.913

#3. This is related to the output in section #1, but takes the probabilities of class membership and averages them by class, see Stata portion below for more on this.

------------------------------------------------------------------------------
MODEL RESULTS

                   Estimates     S.E.  Est./S.E.

CLASS 1

 Means
    READ              41.735    0.477     87.540
    WRITE             42.703    0.962     44.390
    MATH              43.178    0.516     83.648
    SCI               42.160    0.663     63.625
    SS                43.848    0.695     63.097

 Variances
    READ              32.997    2.820     11.699
    WRITE             42.369    3.775     11.223
    MATH              34.562    2.422     14.269
    SCI               38.395    2.714     14.146
    SS                53.884    3.850     13.996

CLASS 2

 Means
    READ              52.618    0.925     56.866
    WRITE             54.507    0.727     74.938
    MATH              52.008    0.835     62.319
    SCI               53.172    0.835     63.680
    SS                52.794    0.808     65.324

 Variances
    READ              32.997    2.820     11.699
    WRITE             42.369    3.775     11.223
    MATH              34.562    2.422     14.269
    SCI               38.395    2.714     14.146
    SS                53.884    3.850     13.996

CLASS 3

 Means
    READ              63.644    0.948     67.117
    WRITE             61.193    0.453    135.170
    MATH              62.610    0.865     72.404
    SCI               61.648    0.667     92.451
    SS                61.232    0.758     80.759

 Variances
    READ              32.997    2.820     11.699
    WRITE             42.369    3.775     11.223
    MATH              34.562    2.422     14.269
    SCI               38.395    2.714     14.146
    SS                53.884    3.850     13.996

#4. This shows the average on the scores for the three classes. Class 1 is a low performing group, class 2 is a medium performing group, and class 3 is a high performing group.

------------------------------------------------------------------------------
LATENT CLASS REGRESSION MODEL PART

 Means
    C#1                0.240    0.218      1.099
    C#2                0.500    0.181      2.766

#5. These are the thresholds for dividing the three classes. Note that this is now like a multinomial logistic regression, where the thresholds divide three multinomial categories and class 3 is the reference category. C#1 is the threshold for being in class 1 as compared to class 3, and C#2 is the threshold for being in class 2 as compared to class 3. For the comparison group, class 3, the probability of being in that class is computed as below, letting "t1" be threshold 1 (.24) and "t2" be threshold 2 (.5).

P(class=3) = 1 / (1 + exp(t1) + exp(t2)) = 1 / (1 + exp(.24) + exp(.5)) = .25510397 .

For classes 1 and 2, the formula is a bit different since these are not the comparison class. For class 1, the formula is

P(class=1) = exp(t1) / (1 + exp(t1) + exp(t2)) = exp(.24) / (1 + exp(.24) + exp(.5)) = .32430071.

For class 2, the formula is

P(class=2) = exp(t2) / (1 + exp(t1) + exp(t2)) = exp(.5) / (1 + exp(.24) + exp(.5)) = .42059533.
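The three formulas above follow one pattern: exponentiate each threshold, use 1 for the reference class, and divide each term by the total. Here is a minimal Python sketch of that pattern; the function name `class_probs` is ours, and it assumes, as in the text, that the last class is the reference class.

```python
import math

def class_probs(thresholds):
    """Class probabilities from the C#k values; the last class is the reference."""
    # The reference class contributes exp(0) = 1 to the denominator.
    terms = [math.exp(t) for t in thresholds] + [1.0]
    total = sum(terms)
    return [term / total for term in terms]

# t1 = .24 and t2 = .5 from the Example 2 output.
probs = class_probs([0.24, 0.5])
print([round(p, 5) for p in probs])
```

With a single threshold the same function gives the two-class case, so `class_probs([-0.173])` should return roughly the Example 1 proportions.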

------------------------------------------------------------------------------

We now read the saved data file into Stata for comparison to the Mplus output.

infile read write math sci ss cprob1 cprob2 cprob3 class using lca_ex2.txt

Below we show observations from the middle of this file. Note that cprob1 is the probability of being in class 1, cprob2 is the probability of being in class 2, cprob3 is the probability of being in class 3, and class is the class membership based on the class with the highest probability. Note that none of the observations shown here are in class 3, but the full file does contain members of class 3.

. list in 200/210

     +----------------------------------------------------------------------+
     | read   write   math    sci     ss   cprob1   cprob2   cprob3   class |
     |----------------------------------------------------------------------|
200. | 46.9    52.1   42.5   47.7   60.5     .133     .867        0       2 |
201. | 46.9    51.5     57   49.8   40.6     .062     .938        0       2 |
202. | 46.9    52.8   49.3   53.1   35.6     .228     .772        0       2 |
203. | 46.9    43.7   41.9   41.7   35.6     .998     .002        0       1 |
204. | 46.9    61.9     53   52.6   60.5        0     .996     .004       2 |
     |----------------------------------------------------------------------|
205. | 46.9    41.1   45.3   47.1   55.6     .812     .188        0       1 |
206. | 46.9    38.5   47.1   41.7   25.7        1        0        0       1 |
207. | 46.9    54.1   46.4   49.8   55.6     .039     .961        0       2 |
208. | 46.9    51.5   48.5   49.8   50.6       .1       .9        0       2 |
209. | 46.9    41.1   53.6   41.7   55.6     .709     .291        0       1 |
     |----------------------------------------------------------------------|
210. | 46.9    61.9   46.2   60.7   45.6     .001     .999        0       2 |
     +----------------------------------------------------------------------+

Note that if we tabulate class, we see where the values from section #2 of the output came from.

. tab class

      class |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        197       32.83       32.83
          2 |        249       41.50       74.33
          3 |        154       25.67      100.00
------------+-----------------------------------
      Total |        600      100.00

Note that if we take the average of cprob1, cprob2, and cprob3, we can relate these values to column 2 of section #1 of the output.

. summ cprob1 cprob2 cprob3

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      cprob1 |       600    .3242633     .440132          0          1
      cprob2 |       600    .4206317    .4326395          0       .999
      cprob3 |       600    .2550817    .3989861          0          1

If we sum the probabilities, we can relate these to column 1 of section #1 of the output.

. tabstat cprob1 cprob2 cprob3, stat(sum)

   stats |    cprob1    cprob2    cprob3
---------+------------------------------
     sum |   194.558   252.379   153.049
----------------------------------------

If we average the probabilities by class, we can relate these values to section #3 of the output.

. tabstat cprob1 cprob2 cprob3, by(class)

Summary statistics: mean
  by categories of: class

   class |    cprob1    cprob2    cprob3
---------+------------------------------
       1 |  .9401117  .0598883         0
       2 |  .0375743  .9123735   .049996
       3 |         0   .087013   .912987
---------+------------------------------
   Total |  .3242633  .4206317  .2550817
----------------------------------------

Say that we get the means of the reading, writing, math, science and social science scores, weighting first by the probability of being in class 1, then by the probability of being in class 2, and likewise for class 3. Note the correspondence between these means and the means from section #4 of the output.

. tabstat read write math sci ss [aw=cprob1], stat(mean)

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  41.73485  42.70297  43.17746  42.16013  43.84801
------------------------------------------------------------

. tabstat read write math sci ss [aw=cprob2], stat(mean)

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  52.61804  54.50678  52.00815  53.17197  52.79395
------------------------------------------------------------

. tabstat read write math sci ss [aw=cprob3], stat(mean)

   stats |      read     write      math       sci        ss
---------+--------------------------------------------------
    mean |  63.64527  61.19303  62.61002   61.6482   61.2325
------------------------------------------------------------