Monte Carlo Power Simulation of a Multilevel Model

NOTE: Code for this page was tested in Stata 12.

Several researchers wish to conduct a longitudinal multilevel study that tests a control condition versus a treatment condition. In order to get funding for the study and to insure the quality of the research they will conduct a Monte Carlo power analysis based on values taken from a 20 subject pilot study.

The study has two level-2 predictors of the random intercept, a covariate, cv, and the dummy (indicator) variable for the treatment group, grp. At level-1 there is only one predictor, time, which takes on the values 0 through 3. Here is the equation for the level-1 model.

Y_ij = β_0j + β_1j*time + ε_ij

And here is the equation for the level-2 model.

β_0j = γ₀₀ + γ₀₁*cv + γ₀₂*grp + μ_0j

Next, we will look at some of the information from the pilot study beginning with the descriptive statistics for the covariate.

summarize cv if time==0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          cv |        20      21.123    4.787203         15         28

Here is our random intercept model for the pilot data using xtmixed.

xtmixed dep cv i.grp time || id:

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood =  -224.6518  
Iteration 1:   log restricted-likelihood =  -224.6518  

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =        80
Group variable: id                              Number of groups   =        20

                                                Obs per group: min =         4
                                                               avg =       4.0
                                                               max =         4


                                                Wald chi2(3)       =     24.15
Log restricted-likelihood =  -224.6518          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          cv |   .4558327   .2300973     1.98   0.048     .0048503     .906815
       1.grp |  -4.187218   2.147263    -1.95   0.051    -8.395775    .0213394
        time |   -1.31745   .3202822    -4.11   0.000    -1.945192   -.6897085
       _cons |   8.999306    5.08345     1.77   0.077    -.9640741    18.96269
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                   sd(_cons) |   4.514276   .8731865      3.089884    6.595294
-----------------------------+------------------------------------------------
                sd(Residual) |   3.202822   .2948436      2.674073     3.83612
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    40.39 Prob >= chibar2 = 0.0000

The coefficients for cv, 1.grp and time will be used for γ₀₁, γ₀₂ and β_1j, respectively. From the random effects output, sd(_cons) and sd(Residual) will be used for μ_0j and ε_ij in our Monte Carlo simulation.

To understand the simulation process we will generate one random sample for 20 subjects manually before we write our simulation program. We start by generating our level-2 data. For testing purposes will will set the random seed to 98765.

set seed 987654
drop _all
set obs 20
generate cv=rnormal(21.123,4.787203) /* from summarize output above */
generate grp=0
replace grp=1 in 1/10  /* create grp dummy with 10 in each group */
/* level-2 model */
generate beta0=8.999306+.4558327*cv-4.187218*grp+rnormal(0,4.514276)
generate id=_n  /* create an id variable */

Now that we have our level-2 data, we need to expand the data to have four observations for each subject and generate our level-1 model.

expand 4   /* expand the dataset */
sort id
by id: generate time=_n-1  /* create time variable with values 0 thru 3 */
/* level-1 model */
generate y = beta0-1.31745*time+rnormal(0,3.202822)

We can test our simulated data using xtmixed.

xtmixed y cv i.grp time || id:

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -233.61504  
Iteration 1:   log restricted-likelihood = -233.61504  

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =        80
Group variable: id                              Number of groups   =        20

                                                Obs per group: min =         4
                                                               avg =       4.0
                                                               max =         4


                                                Wald chi2(3)       =     26.99
Log restricted-likelihood = -233.61504          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          cv |   .4492532   .2682906     1.67   0.094    -.0765866    .9750931
       1.grp |  -7.274475   2.801053    -2.60   0.009    -12.76444   -1.784512
        time |  -1.304538   .3487458    -3.74   0.000    -1.988067   -.6210089
       _cons |   10.66226   6.500366     1.64   0.101    -2.078223    23.40274
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                   sd(_cons) |   5.754637   1.078625      3.985404    8.309283
-----------------------------+------------------------------------------------
                sd(Residual) |   3.487458   .3210465       2.91172    4.177038
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    51.72 Prob >= chibar2 = 0.0000

The treatment effect in this simulation is much stronger than in the pilot study data but the other estimates are not too far off.

Now that we understand the basic process of generating the data, our next step is to write a simulation program which will be called from the simulate command. This will be an rclass program that will save to the p-values for grp from each of the simulated datasets. We suggest that you place the program in the ado/personal directory wherever it it is located on your computer system. Here is the mlsim.ado program.

program mlsim, rclass
  version 11
  syntax [, obs(integer 20) ]
  drop _all
  set obs `obs'
  tempvar cv grp beta0 id time y
  gen `cv'=rnormal(21.123,4.787203)
  gen `grp'=0
  local obs2=`obs'/2
  replace `grp'=1 in 1/`obs2'
  /* level-2 model */
  generate `beta0'=8.999306+.4558327*`cv'-4.187218*`grp'+rnormal(0,4.514276)
  generate `id'=_n
  expand 4
  sort `id'
  by `id': gen `time'=_n-1
  /* level-1 model */
  generate `y' = `beta0'-1.31745*`time'+rnormal(0,3.202822)
  xtmixed `y' `cv' i.`grp' `time' || `id':
  test 1.`grp'
  return scalar pv = r(p)
end

Now that we have the program we can use it to generate hundreds or thousands of samples from a population similar to the one that the pilot data came from. For each sample we will run xtmixed and obtain a p-value for the test of grp. When the simulate command is done the data in Stata’s memory will be a distribution of p-values. Let’s begin by running a small example with 20 observations, basically the same as our pilot study. We will just use 100 replications.

set seed 7654321

simulate pv=r(pv), reps(100): mlsim, obs(20)

      command:  mlsim, obs(20)
           pv:  r(pv)

Simulations (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100

count if pv<=.05

   50

display "Monte Carlo simulated power = " r(N)/100

Monte Carlo simulated power = .5

Things seem to be working okay and our estimated power of 0.5 seems quite reasonable. Let’s try it for real with 50 subjects (25 in each treatment group) and 1000 replications. You may need to be patient while this runs.

set seed 87654321

simulate pv=r(pv), reps(1000): mlsim, obs(50)

      command:  mlsim, obs(50)
           pv:  r(pv)

Simulations (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100
(output omitted)
..................................................   950
..................................................  1000

count if pv<=.05

  874

display "Monte Carlo simulated power = " r(N)/1000

Monte Carlo simulated power = .874

count if pv<=.01

  697

display "Monte Carlo simulated power = " r(N)/1000

Monte Carlo simulated power = .697

So, with 25 subjects in each group the estimated power is 0.874 when alpha equals 0.05, which should meet the requirements of the funding agency. If you are curious about the estimated power at an alpha level of 0.01, it is about 0.697.

We have shown on this page the basic steps in generating random samples for a multilevel model.