NOTE: Code for this page was tested in Stata 12.
Several researchers wish to conduct a longitudinal multilevel study that tests a control condition versus a treatment condition. In order to get funding for the study and to insure the quality of the research they will conduct a Monte Carlo power analysis based on values taken from a 20 subject pilot study.
The study has two level-2 predictors of the random intercept, a covariate, cv, and the dummy (indicator) variable for the treatment group, grp. At level-1 there is only one predictor, time, which takes on the values 0 through 3. Here is the equation for the level-1 model.
- Yij = β0j + β1j*time + εij
And here is the equation for the level-2 model.
- β0j = γ00 + γ01*cv + γ02*grp + μ0j
Next, we will look at some of the information from the pilot study beginning with the descriptive statistics for the covariate.
summarize cv if time==0 Variable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- cv | 20 21.123 4.787203 15 28
Here is our random intercept model for the pilot data using xtmixed.
xtmixed dep cv i.grp time || id: Performing EM optimization: Performing gradient-based optimization: Iteration 0: log restricted-likelihood = -224.6518 Iteration 1: log restricted-likelihood = -224.6518 Computing standard errors: Mixed-effects REML regression Number of obs = 80 Group variable: id Number of groups = 20 Obs per group: min = 4 avg = 4.0 max = 4 Wald chi2(3) = 24.15 Log restricted-likelihood = -224.6518 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ dep | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- cv | .4558327 .2300973 1.98 0.048 .0048503 .906815 1.grp | -4.187218 2.147263 -1.95 0.051 -8.395775 .0213394 time | -1.31745 .3202822 -4.11 0.000 -1.945192 -.6897085 _cons | 8.999306 5.08345 1.77 0.077 -.9640741 18.96269 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ id: Identity | sd(_cons) | 4.514276 .8731865 3.089884 6.595294 -----------------------------+------------------------------------------------ sd(Residual) | 3.202822 .2948436 2.674073 3.83612 ------------------------------------------------------------------------------ LR test vs. linear regression: chibar2(01) = 40.39 Prob >= chibar2 = 0.0000
The coefficients for cv, 1.grp and time will be used for γ01, γ02 and β1j, respectively. From the random effects output, sd(_cons) and sd(Residual) will be used for μ0j and εij in our Monte Carlo simulation.
To understand the simulation process we will generate one random sample for 20 subjects manually before we write our simulation program. We start by generating our level-2 data. For testing purposes will will set the random seed to 98765.
set seed 987654 drop _all set obs 20 generate cv=rnormal(21.123,4.787203) /* from summarize output above */ generate grp=0 replace grp=1 in 1/10 /* create grp dummy with 10 in each group */ /* level-2 model */ generate beta0=8.999306+.4558327*cv-4.187218*grp+rnormal(0,4.514276) generate id=_n /* create an id variable */
Now that we have our level-2 data, we need to expand the data to have four observations for each subject and generate our level-1 model.
expand 4 /* expand the dataset */ sort id by id: generate time=_n-1 /* create time variable with values 0 thru 3 */ /* level-1 model */ generate y = beta0-1.31745*time+rnormal(0,3.202822)
We can test our simulated data using xtmixed.
xtmixed y cv i.grp time || id: Performing EM optimization: Performing gradient-based optimization: Iteration 0: log restricted-likelihood = -233.61504 Iteration 1: log restricted-likelihood = -233.61504 Computing standard errors: Mixed-effects REML regression Number of obs = 80 Group variable: id Number of groups = 20 Obs per group: min = 4 avg = 4.0 max = 4 Wald chi2(3) = 26.99 Log restricted-likelihood = -233.61504 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ y | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- cv | .4492532 .2682906 1.67 0.094 -.0765866 .9750931 1.grp | -7.274475 2.801053 -2.60 0.009 -12.76444 -1.784512 time | -1.304538 .3487458 -3.74 0.000 -1.988067 -.6210089 _cons | 10.66226 6.500366 1.64 0.101 -2.078223 23.40274 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ id: Identity | sd(_cons) | 5.754637 1.078625 3.985404 8.309283 -----------------------------+------------------------------------------------ sd(Residual) | 3.487458 .3210465 2.91172 4.177038 ------------------------------------------------------------------------------ LR test vs. linear regression: chibar2(01) = 51.72 Prob >= chibar2 = 0.0000
The treatment effect in this simulation is much stronger than in the pilot study data but the other estimates are not too far off.
Now that we understand the basic process of generating the data, our next step is to write a simulation program which will be called from the simulate command. This will be an rclass program that will save to the p-values for grp from each of the simulated datasets. We suggest that you place the program in the ado/personal directory wherever it it is located on your computer system. Here is the mlsim.ado program.
program mlsim, rclass version 11 syntax [, obs(integer 20) ] drop _all set obs `obs' tempvar cv grp beta0 id time y gen `cv'=rnormal(21.123,4.787203) gen `grp'=0 local obs2=`obs'/2 replace `grp'=1 in 1/`obs2' /* level-2 model */ generate `beta0'=8.999306+.4558327*`cv'-4.187218*`grp'+rnormal(0,4.514276) generate `id'=_n expand 4 sort `id' by `id': gen `time'=_n-1 /* level-1 model */ generate `y' = `beta0'-1.31745*`time'+rnormal(0,3.202822) xtmixed `y' `cv' i.`grp' `time' || `id': test 1.`grp' return scalar pv = r(p) end
Now that we have the program we can use it to generate hundreds or thousands of samples from a population similar to the one that the pilot data came from. For each sample we will run xtmixed and obtain a p-value for the test of grp. When the simulate command is done the data in Stata’s memory will be a distribution of p-values. Let’s begin by running a small example with 20 observations, basically the same as our pilot study. We will just use 100 replications.
set seed 7654321 simulate pv=r(pv), reps(100): mlsim, obs(20) command: mlsim, obs(20) pv: r(pv) Simulations (100) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 .................................................. 100
count if pv<=.05 50 display "Monte Carlo simulated power = " r(N)/100 Monte Carlo simulated power = .5
Things seem to be working okay and our estimated power of 0.5 seems quite reasonable. Let’s try it for real with 50 subjects (25 in each treatment group) and 1000 replications. You may need to be patient while this runs.
set seed 87654321 simulate pv=r(pv), reps(1000): mlsim, obs(50) command: mlsim, obs(50) pv: r(pv) Simulations (1000) ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 .................................................. 50 .................................................. 100 (output omitted) .................................................. 950 .................................................. 1000 count if pv<=.05 874 display "Monte Carlo simulated power = " r(N)/1000 Monte Carlo simulated power = .874 count if pv<=.01 697 display "Monte Carlo simulated power = " r(N)/1000 Monte Carlo simulated power = .697
So, with 25 subjects in each group the estimated power is 0.874 when alpha equals 0.05, which should meet the requirements of the funding agency. If you are curious about the estimated power at an alpha level of 0.01, it is about 0.697.
We have shown on this page the basic steps in generating random samples for a multilevel model.