One of the assumptions of Analysis of Variance (ANOVA) is that the variance of the dependent variable is the same across the groups being studied. When this assumption is violated, the results of the analysis may not be trustworthy: the reported p-value from the significance test may be too liberal (yielding a higher than expected type I error rate) or too conservative (yielding a lower than expected type I error rate). We will look at three Stata programs that you can download for analyzing data where you suspect you may have violated the homogeneity of variance assumption. These programs can help you assess whether a standard ANOVA will be too liberal or too conservative given your data, and they show how you can perform alternative analyses that are more robust to violations of the homogeneity of variance assumption.

First, let’s consider the program **simanova**. You can download **simanova** by typing **search simanova** (see How can I use the search command to search for programs and get additional help? for more information about using **search**).

As shown below, we can supply information about a hypothetical study that has 3 groups, with a sample size of 10 and a standard deviation of 1 in each group. Since we have not specified the means for the groups, they are assumed to be equal. Given these conditions, which are consistent with the assumptions of ANOVA, we simulate 5000 analyses and report the number of significant results at a nominal p-value of 0.05. As we would expect, the proportion of results significant at 0.05 was 0.0488, quite close to 0.05, and the confidence interval contains 0.05.

```
simanova , groups(3) n(10 10 10) s(1 1 1) nomp(0.05) reps(5000)

Information about Sample Sizes and Standard Deviations
------------------------------------------------------
 N1 = 10 and S1 = 1
 N2 = 10 and S2 = 1
 N3 = 10 and S3 = 1

5000 simulated ANOVA F tests
--------------------------------
 Nominal   Simulated    Simulated
 P value    P Value     P Value [95% Conf. Interval]
-----------------------------------------
  0.0500     0.0488      0.0430 - 0.0551
```

Let’s now make the standard deviations unequal, setting the standard deviation for group 3 to 3. Even though this violates the homogeneity of variance assumption, ANOVA has been found to be fairly robust to such violations when the sample sizes are equal. Still, as we see below, the actual proportion significant (0.0714) somewhat exceeds the proportion we would expect (0.05), making the ordinary F test somewhat too liberal under these conditions.

```
simanova , groups(3) n(10 10 10) s(1 1 3) nomp(0.05) reps(5000)

Information about Sample Sizes and Standard Deviations
------------------------------------------------------
 N1 = 10 and S1 = 1
 N2 = 10 and S2 = 1
 N3 = 10 and S3 = 3

5000 simulated ANOVA F tests
--------------------------------
 Nominal   Simulated    Simulated
 P value    P Value     P Value [95% Conf. Interval]
-----------------------------------------
  0.0500     0.0714      0.0644 - 0.0789
```

If we also make the sample sizes unequal, ANOVA is known to show type I error rates that can be quite different from expectations. Below, we make the sample size for group 3 equal to 40, while leaving the sample sizes at 10 for groups 1 and 2. As we see, the observed proportion of significant results (0.0018) is far below the nominal 0.05. When the group with the larger standard deviation also has the larger sample size, the ANOVA test becomes too conservative: a test you believe has a 5% type I error rate actually rejects only about 0.2% of the time.

```
simanova , groups(3) n(10 10 40) s(1 1 3) nomp(0.05) reps(5000)

Information about Sample Sizes and Standard Deviations
------------------------------------------------------
 N1 = 10 and S1 = 1
 N2 = 10 and S2 = 1
 N3 = 40 and S3 = 3

5000 simulated ANOVA F tests
--------------------------------
 Nominal   Simulated    Simulated
 P value    P Value     P Value [95% Conf. Interval]
-----------------------------------------
  0.0500     0.0018      0.0008 - 0.0034
```

Let’s reverse the pattern shown above, giving groups 1 and 2 sample sizes of 40 and group 3 a sample size of 10. Now the groups with the larger sample sizes have the smaller standard deviation. As you might have expected, the results below show that the ANOVA test is too liberal under these conditions: when you believed the probability of a type I error was 0.05, it was actually around 28%.

```
simanova , groups(3) n(40 40 10) s(1 1 3) nomp(0.05) reps(5000)

Information about Sample Sizes and Standard Deviations
------------------------------------------------------
 N1 = 40 and S1 = 1
 N2 = 40 and S2 = 1
 N3 = 10 and S3 = 3

5000 simulated ANOVA F tests
--------------------------------
 Nominal   Simulated    Simulated
 P value    P Value     P Value [95% Conf. Interval]
-----------------------------------------
  0.0500     0.2778      0.2654 - 0.2904
```
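The pattern in these three runs does not depend on **simanova** itself; it can be reproduced with a short Monte Carlo in any language. The sketch below is a pure-Python illustration (the function names and the rep count of 2000 are our own choices, not part of **simanova**): it draws normal samples with equal means and compares the average F statistic across the three designs.

```python
import random
import statistics

def anova_f(groups):
    """One-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [statistics.mean(g) for g in groups]
    grand = sum(len(g) * m for g, m in zip(groups, means)) / n_total
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    return (ssb / (k - 1)) / (ssw / (n_total - k))

def mean_null_f(ns, sds, reps=2000, seed=12345):
    """Average F over `reps` simulated data sets with equal (zero) means."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        groups = [[rng.gauss(0, s) for _ in range(n)] for n, s in zip(ns, sds)]
        total += anova_f(groups)
    return total / reps

# Equal n, equal sd: F behaves as advertised (average F near 1).
f_equal = mean_null_f((10, 10, 10), (1, 1, 1))
# The big group has the big sd: F runs small (conservative test).
f_conserv = mean_null_f((10, 10, 40), (1, 1, 3))
# The small group has the big sd: F runs large (liberal test).
f_liberal = mean_null_f((40, 40, 10), (1, 1, 3))
print(round(f_conserv, 2), round(f_equal, 2), round(f_liberal, 2))
```

Since the critical value is the same in each design, an F statistic that runs systematically large (or small) under the null translates directly into a liberal (or conservative) test.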

So far, we have illustrated how you can use **simanova** to assess actual type I error rates given an arbitrary set of sample sizes and standard deviations. Let’s look at an example of how we can use **simanova** in analyzing our data. Consider the fictitious data file below called **homvar**, with a dependent variable called **dv** and an independent variable called **group**. We can use this data file and perform a standard ANOVA on it. Based just on the results below, we would conclude that there is a relationship between **group** and the score on **dv**.

```
use simstb, clear
anova dv group

                           Number of obs =     100     R-squared     =  0.0636
                           Root MSE      = 2.59622     Adj R-squared =  0.0443

                  Source |  Partial SS    df       MS           F     Prob > F
              -----------+----------------------------------------------------
                   Model |  44.4095133     2   22.2047567      3.29     0.0413
                         |
                   group |  44.4095133     2   22.2047567      3.29     0.0413
                         |
                Residual |   653.81335    97   6.74034381
              -----------+----------------------------------------------------
                   Total |  698.222863    99   7.05275619
```

However, do these data meet the assumptions of ANOVA? As you see below, the sample sizes are unequal and the groups with the smaller sample sizes have the larger standard deviations.

```
tabulate group, sum(dv)

            |            Summary of dv
      group |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          1 |   2.1116685   6.3250411          10
          2 |  -.22530662   2.8095381          30
          3 |  -.02013056   1.0483759          60
------------+------------------------------------
      Total |   .13149653   2.6557026         100
```

We can use **simanova** to perform simulations given this pattern of sample sizes and standard deviations, assuming the means are equal, and assess the type I error rate that would be expected given this pattern of data. When we supply the name of the dependent variable (**dv**) followed by the independent variable (**group**), **simanova** computes the sample sizes and standard deviations for the groups, reports them back to us, shows the results of an ordinary ANOVA, and then shows the results of 5000 simulated F tests (in which there were no differences in the group means). As shown below, the ANOVA reported a p value of 0.0413, compared to a simulated p value of 0.3110. These simulation results suggest that the actual p value for this test is really about 31%, not less than 5%.

```
simanova dv group, nomp(0.05) reps(5000)

Information about Sample Sizes and Standard Deviations
------------------------------------------------------
 N1 = 10 and S1 = 6.3250413
 N2 = 30 and S2 = 2.8095381
 N3 = 60 and S3 = 1.0483758

Results of Standard ANOVA
----------------------------------------------------------------------
Dependent Variable is dv and Independent Variable is group
F( 2, 97.00) = 3.294, p= 0.0413
----------------------------------------------------------------------

5000 simulated ANOVA F tests
--------------------------------
 Nominal   Simulated    Simulated
 P value    P Value     P Value [95% Conf. Interval]
-----------------------------------------
  0.0413     0.3110      0.2982 - 0.3240
  0.0500     0.3320      0.3189 - 0.3452
```

Another way to handle this is to use a test that is less sensitive to violations of homogeneity of variance. The **F*** test is a modification of the standard F test that is much less sensitive to violations of the homogeneity of variance assumption. Let’s analyze these data using the **fstar** command. As you see, the results of **fstar** are much more in line with the results that we found with the simulation.

You can download the **fstar** command by typing **search fstar** (see How can I use the search command to search for programs and get additional help? for more information about using **search**).

```
fstar dv group

----------------------------------------------------------------------
Dependent Variable is dv and Independent Variable is group
Fstar( 2, 12.14) = 1.058, p= 0.3771
----------------------------------------------------------------------
```
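The F* test is the Brown and Forsythe (1974) statistic: it keeps the usual between-groups numerator but replaces the pooled error term with group variances weighted by (1 - n_j/N), with Satterthwaite-style denominator degrees of freedom. As a sketch of what **fstar** is computing (based on the published formulas, not on the **fstar** code itself), in Python:

```python
import statistics

def brown_forsythe_fstar(groups):
    """Brown-Forsythe F* statistic with approximate denominator df."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [statistics.mean(g) for g in groups]
    variances = [statistics.variance(g) for g in groups]
    # Numerator: the usual between-groups sum of squares.
    num = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Denominator: group variances weighted by (1 - n_j/N), not pooled MSE.
    terms = [(1 - len(g) / n_total) * v for g, v in zip(groups, variances)]
    den = sum(terms)
    fstar = num / den
    # Satterthwaite approximation for the denominator df.
    c = [t / den for t in terms]
    df2 = 1 / sum(cj ** 2 / (len(g) - 1) for cj, g in zip(c, groups))
    return fstar, k - 1, df2

# With equal group means the numerator, and hence F*, is exactly zero.
fstar, df1, df2 = brown_forsythe_fstar([[1.0, 3.0], [0.0, 4.0], [-2.0, 6.0]])
print(fstar, df1, round(df2, 2))   # 0.0 2 1.62
```

Because each group's variance enters with its own weight, a group with a large variance no longer inflates or deflates a shared error term for the other groups.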

The W test is another test that is more robust to violations of homogeneity of variance than the traditional F test. Let’s use the **wtest** command to perform this test. While the results are not identical to the F* test, they agree in that both tests indicate that these results are far from significant. You can download the **wtest** command by typing **search wtest** (see How can I use the search command to search for programs and get additional help? for more information about using **search**).

```
wtest dv group

----------------------------------------------------------------------
Dependent Variable is dv and Independent Variable is group
WStat( 2, 18.99) = 0.626, p= 0.5457
----------------------------------------------------------------------
```
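The W test is Welch's statistic, which weights each group mean by n_j/s_j^2 so that noisy groups count for less. Again as a sketch of the published formulas rather than the **wtest** code itself, in Python:

```python
import statistics

def welch_w(groups):
    """Welch's W statistic with its approximate denominator df."""
    k = len(groups)
    # Each group is weighted by n_j / s_j^2: precise groups count more.
    w = [len(g) / statistics.variance(g) for g in groups]
    w_sum = sum(w)
    means = [statistics.mean(g) for g in groups]
    grand = sum(wj * m for wj, m in zip(w, means)) / w_sum
    # h measures how unequal the weights are, per group df.
    h = sum((1 - wj / w_sum) ** 2 / (len(g) - 1) for wj, g in zip(w, groups))
    num = sum(wj * (m - grand) ** 2 for wj, m in zip(w, means)) / (k - 1)
    den = 1 + 2 * (k - 2) * h / (k ** 2 - 1)
    df2 = (k ** 2 - 1) / (3 * h)
    return num / den, k - 1, df2

# As with F*, equal group means make the numerator, and hence W, zero.
wstat, df1, df2 = welch_w([[1.0, 3.0], [0.0, 4.0], [-2.0, 6.0]])
print(wstat, df1)   # 0.0 2
```

F* and W differ mainly in how the group variances enter (as an error-term weighting versus as mean weights), which is why their p values agree in direction here without matching exactly.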

Let’s consider a couple of other alternatives, some of which have not, to our knowledge, been explored very much. First, what if we recast this ANOVA as a regression with dummy variables and then use the **robust** option to request robust standard errors? Below we see that the test of the two dummy variables is quite comparable to the simulated results, the F* results, and the W test results. Had we omitted the **robust** option, we would have gotten the same p value as the standard ANOVA, so it appears from this one example that the **robust** option may offer some robustness with respect to the homogeneity of variance assumption. One example is certainly not enough, so we will investigate this later in this page.

```
xi: regress dv i.group, robust

i.group           _Igroup_1-3         (naturally coded; _Igroup_1 omitted)

Regression with robust standard errors                 Number of obs =     100
                                                       F(  2,    97) =    0.69
                                                       Prob > F      =  0.5030
                                                       R-squared     =  0.0636
                                                       Root MSE      =  2.5962

------------------------------------------------------------------------------
             |               Robust
          dv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Igroup_2 |  -2.336975    1.99352    -1.17   0.244    -6.293561    1.619611
   _Igroup_3 |  -2.131799   1.931445    -1.10   0.272    -5.965183    1.701585
       _cons |   2.111669   1.926632     1.10   0.276    -1.712162    5.935499
------------------------------------------------------------------------------

test _Igroup_2 _Igroup_3

 ( 1)  _Igroup_2 = 0.0
 ( 2)  _Igroup_3 = 0.0

       F(  2,    97) =    0.69
            Prob > F =    0.5030
```
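It is worth seeing why the **robust** option helps in this setting. For a dummy-variable regression, the HC0 sandwich variance of a group-difference coefficient works out to per-group residual variances, s1^2/n1 + s2^2/n2 (with n-divisor variances), whereas classical OLS plugs one pooled residual variance into (1/n1 + 1/n2). The two-group Python sketch below uses made-up data, and HC0 rather than Stata's finite-sample-adjusted HC1, to contrast the two:

```python
import statistics

# Two made-up groups: a small group with a large spread,
# and a large group with a small spread.
g1 = [0.0, 6.0] * 5     # n1 = 10, mean 3, within-group SS = 90
g2 = [0.0, 2.0] * 20    # n2 = 40, mean 1, within-group SS = 40

n1, n2 = len(g1), len(g2)
m1, m2 = statistics.mean(g1), statistics.mean(g2)
ss1 = sum((x - m1) ** 2 for x in g1)
ss2 = sum((x - m2) ** 2 for x in g2)

# Classical OLS variance of the dummy coefficient (m2 - m1):
# pooled residual variance times (1/n1 + 1/n2).
mse = (ss1 + ss2) / (n1 + n2 - 2)
ols_var = mse * (1 / n1 + 1 / n2)

# HC0 sandwich variance: each group keeps its own residual variance.
robust_var = ss1 / n1 ** 2 + ss2 / n2 ** 2

print(round(ols_var, 3), round(robust_var, 3))   # 0.339 0.925
```

When the small group carries the large variance, the pooled estimate understates the sampling variance of the difference; the sandwich estimate does not, which is consistent with the better behavior of the **robust** results above.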

If the **robust** option is useful, what about using the **rreg** command for robust regression? Consider the example below. The **rreg** results are even more dramatically off base than the standard ANOVA, yielding a p value of 0.0000. Again, one example is not enough to draw a conclusion, but this suggests that **rreg** may not perform well when the homogeneity of variance assumption is violated.

```
xi: rreg dv i.group

i.group           _Igroup_1-3         (naturally coded; _Igroup_1 omitted)

  <iterations omitted>

Robust regression estimates                            Number of obs =     100
                                                       F(  2,    97) =   17.54
                                                       Prob > F      =  0.0000

------------------------------------------------------------------------------
          dv |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Igroup_2 |  -3.383854     .62812    -5.39   0.000    -4.630498   -2.137209
   _Igroup_3 |  -3.401373   .5875525    -5.79   0.000    -4.567502   -2.235244
       _cons |    3.39778   .5439679     6.25   0.000     2.318155    4.477406
------------------------------------------------------------------------------

test _Igroup_2 _Igroup_3

 ( 1)  _Igroup_2 = 0.0
 ( 2)  _Igroup_3 = 0.0

       F(  2,    97) =   17.54
            Prob > F =    0.0000
```

So far we have seen how **simanova** can simulate type I error rates for a given condition, but since it uses return values to report the resulting type I error rates, it can also be used for simulation studies that examine the performance of these various tests under different conditions. Below, we show a simple example where we vary the sample size for each of the 3 groups between 10 and 40, and likewise vary the standard deviation for each group between 1 and 3. In other words, we can use **simanova** not only for analysis but also for simulation studies, varying the Ns and SDs.

```
postfile simrse n1 n2 n3 s1 s2 s3 fp rsep rrp wstatp fstarp using simrse, replace
foreach n1 of numlist 10 40 {
  foreach n2 of numlist 10 40 {
    foreach n3 of numlist 10 40 {
      foreach s1 of numlist 1 3 {
        foreach s2 of numlist 1 3 {
          foreach s3 of numlist 1 3 {
            simanova , groups(3) n(`n1' `n2' `n3') s(`s1' `s2' `s3') fstar wtest rse rreg /*
              */ nomp(0.05) reps(5000)
            post simrse (`n1') (`n2') (`n3') (`s1') (`s2') (`s3') (`r(_fp)') (`r(rsep)') /*
              */ (`r(_rrp)') (`r(_wstatp)') (`r(_fstarp)')
          }
        }
      }
    }
  }
}
postclose simrse
```

We then use the resulting file, create variables encoding the standard deviations and sample sizes for the groups, and display the simulated p values in a table using the **tabdisp** command. Within each cell, the first row shows p values for the regular F test, then the W test, then the F* test, then regression with robust standard errors, and finally robust regression.

```
use simrse2, clear
gen str8 s = string(s1,"%02.0f") + "," + string(s2,"%02.0f") + "," + string(s3,"%02.0f")
gen str8 n = string(n1,"%02.0f") + "," + string(n2,"%02.0f") + "," + string(n3,"%02.0f")
tabdisp s n, cellvar(fp wstatp fstarp rsep rrp)
```

While this is a very limited study, it does reveal some very interesting information.

- As we expect, the regular F tests (the first line) are most liberal when the large sample sizes are associated with the small standard deviations. Conversely, the regular F tests are most conservative when the large sample sizes are associated with the large standard deviations.
- The W and F* tests (lines 2 and 3) are both reasonably robust to the violations of homogeneity of variance studied here.
- The regression with robust standard errors (line 4) was also reasonably robust, although it occasionally had type I error rates as high as 10%. This suggests that when an ordinary regression includes dummy-coded categorical variables that exhibit heterogeneity of variance, the **robust** option may be useful for increasing robustness with respect to this heterogeneity. Further study in this area would be needed before drawing firmer conclusions.
- The robust regression (line 5) was not robust at all to the violations of homogeneity of variance in this study, and frequently performed much more poorly than the standard ANOVA. This suggests that robust regression may be a poor choice when categorical predictors show heterogeneity of variance. Further study in this area would be needed before drawing firmer conclusions.

```
------------------------------------------------------------------------------------------
          |                                        n
        s | 10,10,10  10,10,40  10,40,10  10,40,40  40,10,10  40,10,40  40,40,10  40,40,40
----------+-------------------------------------------------------------------------------
 01,01,01 |    .0518      .048     .0542     .0476      .049      .051      .048      .052
          |    .0466     .0516     .0566      .047     .0522     .0508     .0508     .0546
          |    .0492     .0516     .0528     .0468     .0512     .0538     .0496      .052
          |    .0612     .0816     .0864     .0638      .083     .0692       .07     .0572
          |    .0532     .0514      .055      .046      .051     .0522      .047     .0526
          |
 01,01,03 |    .0784     .0028      .214     .0254     .2098      .031     .2714      .071
          |     .047     .0486     .0548      .049     .0486     .0508     .0558     .0534
          |     .065     .0566     .0658      .061      .063     .0652      .072     .0684
          |    .0606     .0634     .0958      .063     .0802     .0668      .083     .0578
          |    .3098      .013     .5208      .174       .52     .1784     .5528     .3138
          |
 01,03,01 |      .08     .1952     .0028       .03      .204     .2818     .0268     .0706
          |    .0548     .0496     .0508     .0528     .0546     .0518     .0476     .0488
          |    .0704     .0606     .0592     .0674     .0652     .0678     .0652     .0652
          |    .0708     .0822     .0672     .0726     .0882     .0806     .0654     .0524
          |     .336     .5062     .0128      .161     .5098      .551      .171      .317
          |
 01,03,03 |    .0598     .0268     .0258     .0204     .2488     .0952     .0896     .0554
          |    .0492     .0514      .045     .0472     .0486     .0538     .0488     .0458
          |     .052     .0614     .0614     .0584      .044     .0552     .0506     .0538
          |    .0692     .0724     .0682     .0542     .0928     .0814     .0744     .0504
          |    .1762     .0412     .0376      .028     .6388      .279      .288     .1338
          |
 03,01,01 |     .085     .2074     .2026     .2772     .0012     .0298     .0258      .071
          |     .052     .0542     .0506     .0476     .0484     .0526     .0474     .0556
          |    .0682       .07     .0626      .059     .0624     .0646     .0614     .0658
          |    .0712     .0862      .083     .0708     .0646     .0716     .0636     .0596
          |    .3268     .5032     .5062     .5558     .0086      .169     .1724      .311
          |
 03,01,03 |    .0676     .0266      .254     .0922     .0286     .0202     .0944     .0574
          |     .056     .0462     .0546     .0546      .049      .047     .0502     .0504
          |     .061     .0598     .0498     .0542     .0594     .0576     .0508     .0556
          |    .0806     .0676     .1016     .0794     .0698     .0538      .075      .057
          |    .1742      .042     .6372      .283     .0462      .028     .2804     .1306
          |
 03,03,01 |    .0628     .2534      .035     .0958     .0268     .0914     .0238     .0556
          |    .0526     .0532      .056      .048     .0474     .0484      .053     .0522
          |    .0566     .0486     .0682     .0538      .057     .0532     .0664     .0542
          |    .0702     .0974     .0792     .0752     .0722     .0734      .063     .0558
          |    .1724     .6292     .0466     .2786     .0424     .2796     .0294     .1312
          |
 03,03,03 |    .0538     .0466     .0444     .0522     .0462     .0488     .0546     .0528
          |      .05     .0496     .0428     .0544     .0504      .049     .0564     .0508
          |    .0512     .0466     .0414     .0508     .0486     .0474      .054     .0526
          |    .0652     .0786     .0704     .0698     .0772     .0652     .0756     .0546
          |    .0606      .048     .0422     .0556     .0472       .05     .0526     .0514
------------------------------------------------------------------------------------------
```

## References

- Brown, M. B., & Forsythe, A. B. (1974). The ANOVA and multiple comparisons for data with heterogeneous variances. *Biometrics, 30*, 719-724.
- Wilcox, R. R. (1987). New designs in analysis of variance. *Annual Review of Psychology, 38*, 29-60.
- Wilcox, R. R., Charlin, V. L., & Thompson, K. L. (1986). New Monte Carlo results on the robustness of the ANOVA F, W and F* statistics. *Communications in Statistics - Simulation and Computation, 15*(4), 933-943.