Here is a traditional regression model with an interaction:

regress y x1 x2 x1#x2

We see two main effects (**x1** & **x2**) in addition to the interaction term (**x1#x2**).
Is it “legal” to omit one or both main effects?
Is it really necessary to include both main effects when the interaction is present?

The simple answer is no, you don’t always need main effects when there is an interaction. However, the interaction term will not have the same meaning as it would if both main effects were included in the model.

We will explore regression models that include an interaction term but only one of two main effect terms using the **hsbanova**
dataset.

use http://www.ats.ucla.edu/stat/data/hsbanova, clear

## Case 1: Categorical by categorical interaction

We will begin by looking at a model with two categorical main effects and an interaction. We will refer to this model as the “full” model.

regress write i.female##i.grp

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  7,   192) =   11.05
           Model |  5135.17494     7   733.59642           Prob > F      =  0.0000
        Residual |  12743.7001   192  66.3734378           R-squared     =  0.2872
    -------------+------------------------------           Adj R-squared =  0.2612
           Total |   17878.875   199   89.843593           Root MSE      =   8.147

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        1.female |   9.136876   2.311726     3.95   0.000     4.577236    13.69652
                 |
             grp |
              2  |    7.31677   2.458951     2.98   0.003     2.466743     12.1668
              3  |   10.10248   2.292658     4.41   0.000     5.580454    14.62452
              4  |   16.75286   2.525696     6.63   0.000     11.77119    21.73453
                 |
      female#grp |
            1 2  |  -5.029733   3.357123    -1.50   0.136    -11.65131    1.591845
            1 3  |  -3.721697   3.128694    -1.19   0.236    -9.892723    2.449328
            1 4  |  -9.831208   3.374943    -2.91   0.004    -16.48793   -3.174482
                 |
           _cons |   41.82609   1.698765    24.62   0.000     38.47545    45.17672
    ------------------------------------------------------------------------------

This model has an overall F of 11.05 with 7 and 192 degrees of freedom and an R^{2} of .2872.

## Example 1.1

Now, let’s run the model but leave **female** out of the **regress** command.

regress write i.grp i.female#i.grp

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  7,   192) =   11.05
           Model |  5135.17494     7   733.59642           Prob > F      =  0.0000
        Residual |  12743.7001   192  66.3734378           R-squared     =  0.2872
    -------------+------------------------------           Adj R-squared =  0.2612
           Total |   17878.875   199   89.843593           Root MSE      =   8.147

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             grp |
              2  |    7.31677   2.458951     2.98   0.003     2.466743     12.1668
              3  |   10.10248   2.292658     4.41   0.000     5.580454    14.62452
              4  |   16.75286   2.525696     6.63   0.000     11.77119    21.73453
                 |
      female#grp |
            1 1  |   9.136876   2.311726     3.95   0.000     4.577236    13.69652
            1 2  |   4.107143   2.434379     1.69   0.093    -.6944172    8.908703
            1 3  |   5.415179   2.108234     2.57   0.011     1.256906    9.573452
            1 4  |   -.694332   2.458895    -0.28   0.778    -5.544247    4.155583
                 |
           _cons |   41.82609   1.698765    24.62   0.000     38.47545    45.17672
    ------------------------------------------------------------------------------

This model has the same overall F, degrees of freedom and R^{2} as our “full” model. So, in fact, this
is just a reparameterization of the “full” model. It contains all of the information
from our first model but it is organized differently. Stata compensates for the missing main effect by generating
an “interaction” term with four degrees of freedom instead of three, which keeps the overall
model degrees of freedom at seven.

In this case, the coefficients for the “interaction” are actually simple effects. For
example, the first “interaction” coefficient is the simple effect of **female** at
**grp** equal to one. It shows that there is a significant male/female difference for **grp** 1.
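As a rough check on this reading, each simple effect of **female** can be rebuilt by hand from the “full” model output shown earlier: add the **1.female** coefficient to the matching **female#grp** coefficient. The symbol $b$ below is shorthand introduced here for the full-model coefficients; it does not appear in the Stata output.

```latex
\begin{align*}
\text{grp 1:}\quad & b_{1.female} = 9.136876 \\
\text{grp 2:}\quad & b_{1.female} + b_{1.female\#2.grp} = 9.136876 - 5.029733 = 4.107143 \\
\text{grp 3:}\quad & b_{1.female} + b_{1.female\#3.grp} = 9.136876 - 3.721697 = 5.415179 \\
\text{grp 4:}\quad & b_{1.female} + b_{1.female\#4.grp} = 9.136876 - 9.831208 = -0.694332
\end{align*}
```

These four values match the four “interaction” coefficients reported for Example 1.1.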

We could get the same four simple effects tests from the “full” regression model using the following Stata 12 code.

regress write i.female##i.grp

contrast female@grp

## Example 1.2

What if we ran the regression including just the main effect for **female**?

regress write i.female i.female#i.grp

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  7,   192) =   11.05
           Model |  5135.17494     7   733.59642           Prob > F      =  0.0000
        Residual |  12743.7001   192  66.3734378           R-squared     =  0.2872
    -------------+------------------------------           Adj R-squared =  0.2612
           Total |   17878.875   199   89.843593           Root MSE      =   8.147

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        1.female |   9.136876   2.311726     3.95   0.000     4.577236    13.69652
                 |
      female#grp |
            0 2  |    7.31677   2.458951     2.98   0.003     2.466743     12.1668
            0 3  |   10.10248   2.292658     4.41   0.000     5.580454    14.62452
            0 4  |   16.75286   2.525696     6.63   0.000     11.77119    21.73453
            1 2  |   2.287037   2.285571     1.00   0.318    -2.221015     6.79509
            1 3  |   6.380787   2.128954     3.00   0.003     2.181646    10.57993
            1 4  |   6.921652   2.238549     3.09   0.002     2.506347    11.33696
                 |
           _cons |   41.82609   1.698765    24.62   0.000     38.47545    45.17672
    ------------------------------------------------------------------------------

Again, this model has the same overall F, degrees of freedom and R^{2} as before. So, it is
a different reparameterization of our “full” model. This time the “interaction” coefficients
are simple contrasts: each compares one level of **grp** with **grp** 1 within a single level of **female**.
To get the three-degree-of-freedom simple effects of **grp** at each level of **female**
we need to run the following **test** commands.

test 0.female#2.grp 0.female#3.grp 0.female#4.grp

     ( 1)  0b.female#2.grp = 0
     ( 2)  0b.female#3.grp = 0
     ( 3)  0b.female#4.grp = 0

           F(  3,   192) =   15.33
                Prob > F =    0.0000

test 1.female#2.grp 1.female#3.grp 1.female#4.grp

     ( 1)  1.female#2.grp = 0
     ( 2)  1.female#3.grp = 0
     ( 3)  1.female#4.grp = 0

           F(  3,   192) =    4.55
                Prob > F =    0.0042

You can obtain the same simple effects from the “full” model with this Stata 12 code.

regress write i.female##i.grp

contrast grp@female
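The individual simple contrasts from Example 1.2 can also be recovered from the “full” model one at a time. The sketch below assumes the hsbanova data are already in memory and uses **lincom** to add the **grp** coefficient to the matching **female#grp** coefficient.

```stata
* Sketch only: assumes the hsbanova dataset is already loaded.
quietly regress write i.female##i.grp

* grp 2 vs. grp 1 among females (female = 1):
* 2.grp coefficient plus the 1.female#2.grp interaction coefficient
lincom 2.grp + 1.female#2.grp

* grp 4 vs. grp 1 among females
lincom 4.grp + 1.female#4.grp
```

From the full-model output, the first sum is 7.31677 - 5.029733 = 2.287037 and the second is 16.75286 - 9.831208 = 6.921652, which are the **1 2** and **1 4** coefficients reported in Example 1.2.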

## Example 1.3

Let’s push things one step further and remove all of the main effects from our model, leaving only the interaction term.

regress write i.female#i.grp

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  7,   192) =   11.05
           Model |  5135.17494     7   733.59642           Prob > F      =  0.0000
        Residual |  12743.7001   192  66.3734378           R-squared     =  0.2872
    -------------+------------------------------           Adj R-squared =  0.2612
           Total |   17878.875   199   89.843593           Root MSE      =   8.147

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      female#grp |
            0 2  |    7.31677   2.458951     2.98   0.003     2.466743     12.1668
            0 3  |   10.10248   2.292658     4.41   0.000     5.580454    14.62452
            0 4  |   16.75286   2.525696     6.63   0.000     11.77119    21.73453
            1 1  |   9.136876   2.311726     3.95   0.000     4.577236    13.69652
            1 2  |   11.42391   2.377259     4.81   0.000     6.735015    16.11281
            1 3  |   15.51766   2.227099     6.97   0.000     11.12494    19.91039
            1 4  |   16.05853   2.332086     6.89   0.000     11.45873    20.65833
                 |
           _cons |   41.82609   1.698765    24.62   0.000     38.47545    45.17672
    ------------------------------------------------------------------------------

Again, the overall F, degrees of freedom and R^{2} are the same as our “full” model.
This model is a variation of a cell means model in which the intercept (41.82609) is the
mean for the cell **female** = 0 and **grp** = 1. The “interaction” coefficients
give the difference between each of the cell means and the mean for cell(0,1).

We can get a clearer picture of the cell means model by rerunning the analysis with the
**noconstant** option and using **ibn** factor variable notation to
suppress a reference group.

regress write ibn.female#ibn.grp, nocons

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  8,   192) = 1058.74
           Model |    562175.3     8  70271.9125           Prob > F      =  0.0000
        Residual |  12743.7001   192  66.3734378           R-squared     =  0.9778
    -------------+------------------------------           Adj R-squared =  0.9769
           Total |      574919   200    2874.595           Root MSE      =   8.147

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
      female#grp |
            0 1  |   41.82609   1.698765    24.62   0.000     38.47545    45.17672
            0 2  |   49.14286   1.777819    27.64   0.000     45.63629    52.64942
            0 3  |   51.92857   1.539636    33.73   0.000      48.8918    54.96534
            0 4  |   58.57895   1.869048    31.34   0.000     54.89244    62.26545
            1 1  |   50.96296   1.567889    32.50   0.000     47.87046    54.05546
            1 2  |      53.25   1.662997    32.02   0.000     49.96991    56.53009
            1 3  |   57.34375   1.440198    39.82   0.000     54.50311    60.18439
            1 4  |   57.88462   1.597756    36.23   0.000     54.73321    61.03602
    ------------------------------------------------------------------------------

This model has 8 and 192 degrees of freedom. The overall F and R^{2}
are very different from the previous model because, without a constant, the total sum of
squares is no longer centered on the mean of **write** (note the total of 574919 on 200 degrees
of freedom). The residual sum of squares, however, is the same in both models. This time each
coefficient is an individual cell mean. Even though the model seems very different we
can replicate the coefficients from the previous model using **lincom**.

For example, the first coefficient in the previous model is 7.31677 (standard error 2.458951) with t = 2.98, i.e.,
the difference in cell means between cell(0,2) and cell(0,1).
Here is the **lincom** code to obtain that value.

lincom 0.female#2.grp - 0.female#1.grp

     ( 1)  - 0bn.female#1bn.grp + 0bn.female#2.grp = 0

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |    7.31677   2.458951     2.98   0.003     2.466743     12.1668
    ------------------------------------------------------------------------------

## Case 2: Categorical by continuous interaction

Consider the following model with a categorical and a continuous predictor.

regress write i.grp##c.socst

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  7,   192) =   19.01
           Model |  7319.63342     7  1045.66192           Prob > F      =  0.0000
        Residual |  10559.2416   192  54.9960499           R-squared     =  0.4094
    -------------+------------------------------           Adj R-squared =  0.3879
           Total |   17878.875   199   89.843593           Root MSE      =  7.4159

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             grp |
              2  |  -9.264093   7.699529    -1.20   0.230    -24.45062     5.92243
              3  |   8.384216   7.052153     1.19   0.236    -5.525425    22.29386
              4  |   5.122424   10.11178     0.51   0.613    -14.82202    25.06687
                 |
           socst |   .4307724   .0994109     4.33   0.000     .2346948    .6268501
                 |
     grp#c.socst |
              2  |   .2259628   .1559057     1.45   0.149    -.0815451    .5334706
              3  |  -.0850639   .1377873    -0.62   0.538    -.3568351    .1867073
              4  |   .0064412   .1817305     0.04   0.972    -.3520035    .3648858
                 |
           _cons |   27.36662   4.596719     5.95   0.000     18.30007    36.43318
    ------------------------------------------------------------------------------

This time the overall F is 19.01 with 7 and 192 degrees of freedom and an R^{2} of .4094.

## Example 2.1

Next, we will rerun the model without **socst** in the **regress** command.

regress write i.grp i.grp#c.socst

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  7,   192) =   19.01
           Model |  7319.63342     7  1045.66192           Prob > F      =  0.0000
        Residual |  10559.2416   192  54.9960499           R-squared     =  0.4094
    -------------+------------------------------           Adj R-squared =  0.3879
           Total |   17878.875   199   89.843593           Root MSE      =  7.4159

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             grp |
              2  |  -9.264093   7.699529    -1.20   0.230    -24.45062     5.92243
              3  |   8.384216   7.052153     1.19   0.236    -5.525425    22.29386
              4  |   5.122424   10.11178     0.51   0.613    -14.82202    25.06687
                 |
     grp#c.socst |
              1  |   .4307724   .0994109     4.33   0.000     .2346948    .6268501
              2  |   .6567352   .1201002     5.47   0.000       .41985    .8936204
              3  |   .3457085   .0954087     3.62   0.000     .1575248    .5338923
              4  |   .4372136   .1521297     2.87   0.005     .1371535    .7372738
                 |
           _cons |   27.36662   4.596719     5.95   0.000     18.30007    36.43318
    ------------------------------------------------------------------------------

Once again, the overall F, degrees of freedom and R^{2} are the same as our
“full” model. So, once again, this is just a reparameterization of the “full” model.

In this model, the “interaction” coefficients represent the simple slopes of **write** on
**socst** for each of the four levels of **grp**.

You can obtain the same results with these Stata commands.

regress write i.grp##c.socst

margins grp, dydx(socst)
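The simple slopes can also be rebuilt by hand from the “full” model coefficients: the slope for a given level of **grp** is the **socst** coefficient plus that level's **grp#c.socst** coefficient. The symbol $b$ is shorthand introduced here for the full-model estimates.

```latex
\begin{align*}
\text{grp 1:}\quad & b_{socst} = 0.4307724 \\
\text{grp 2:}\quad & b_{socst} + b_{2.grp\#c.socst} = 0.4307724 + 0.2259628 = 0.6567352 \\
\text{grp 3:}\quad & b_{socst} + b_{3.grp\#c.socst} = 0.4307724 - 0.0850639 = 0.3457085 \\
\text{grp 4:}\quad & b_{socst} + b_{4.grp\#c.socst} = 0.4307724 + 0.0064412 = 0.4372136
\end{align*}
```

These agree with the four **grp#c.socst** coefficients in Example 2.1.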

So far, each time we have dropped a term from the regression command the model has remained the
same. Sure, the coefficients are different but the overall F, degrees of freedom and R^{2}
have remained the same. If we drop the categorical variable (**grp**) from our model we will lose
three degrees of freedom and the overall F and R^{2} will change. Let’s see what
happens.

## Example 2.2

regress write c.socst i.grp#c.socst

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  4,   195) =   31.73
           Model |  7048.83282     4   1762.2082           Prob > F      =  0.0000
        Residual |  10830.0422   195  55.5386779           R-squared     =  0.3943
    -------------+------------------------------           Adj R-squared =  0.3818
           Total |   17878.875   199   89.843593           Root MSE      =  7.4524

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           socst |   .4110007   .0650009     6.32   0.000     .2828056    .5391958
                 |
     grp#c.socst |
              2  |   .0505514   .0318924     1.59   0.115    -.0123469    .1134497
              3  |    .065381   .0302781     2.16   0.032     .0056664    .1250956
              4  |   .0963406   .0320259     3.01   0.003     .0331789    .1595023
                 |
           _cons |   28.30563   2.891027     9.79   0.000     22.60393    34.00733
    ------------------------------------------------------------------------------

This time things are very different. The overall F, degrees of freedom and R^{2}
differ from the “full” model. This model is not a simple reparameterization of the original
model. The coefficients in this model do not have a simple interpretation. This model may, in fact,
be misspecified.

So here’s what’s going on in this model. There is just one intercept, 28.30563, shared by the regression
lines for all four levels of **grp**. The coefficients for the “interaction”
are the differences in slope between each level of **grp** and **grp** 1. We can show this using the
**margins** command. We will begin by computing the intercepts for each **grp**.

margins, at(grp=(1 2 3 4) socst=0) noatlegend

    Adjusted predictions                              Number of obs   =        200
    Model VCE    : OLS

    Expression   : Linear prediction, predict()

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             _at |
              1  |   28.30563   2.891027     9.79   0.000     22.63932    33.97194
              2  |   28.30563   2.891027     9.79   0.000     22.63932    33.97194
              3  |   28.30563   2.891027     9.79   0.000     22.63932    33.97194
              4  |   28.30563   2.891027     9.79   0.000     22.63932    33.97194
    ------------------------------------------------------------------------------

Next, we will compute the slopes. We will include the **post** option so that we can compute
the differences in slopes using the **lincom** command.

margins, dydx(socst) at(grp=(1 2 3 4)) noatlegend post

    Average marginal effects                          Number of obs   =        200
    Model VCE    : OLS

    Expression   : Linear prediction, predict()
    dy/dx w.r.t. : socst

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    socst        |
             _at |
              1  |   .4110007   .0650009     6.32   0.000     .2836012    .5384002
              2  |   .4615521   .0593735     7.77   0.000     .3451821     .577922
              3  |   .4763817   .0535655     8.89   0.000     .3713952    .5813681
              4  |   .5073413   .0519691     9.76   0.000     .4054838    .6091988
    ------------------------------------------------------------------------------

/* slope 1 vs slope 2 */

lincom 2._at-1._at

     ( 1)  - [socst]1bn._at + [socst]2._at = 0

    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   .0505514   .0318924     1.59   0.113    -.0119566    .1130594
    ------------------------------------------------------------------------------

/* slope 1 vs slope 3 */

lincom 3._at-1._at

     ( 1)  - [socst]1bn._at + [socst]3._at = 0

    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |    .065381   .0302781     2.16   0.031      .006037     .124725
    ------------------------------------------------------------------------------

/* slope 1 vs slope 4 */

lincom 4._at-1._at

     ( 1)  - [socst]1bn._at + [socst]4._at = 0

    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             (1) |   .0963406   .0320259     3.01   0.003     .0335709    .1591103
    ------------------------------------------------------------------------------

The values computed by the **lincom** commands match the “interaction”
coefficients in the regression model we ran.
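Written as equations, the difference between the two models can be sketched as follows. The symbols $\beta_0$, $\beta_1$, $\alpha_g$ and $\delta_g$ are notation introduced here for illustration; $g$ indexes the level of **grp**, with level 1 as the reference.

```latex
\begin{align*}
\text{``full'' model:}\quad
  \widehat{\text{write}} &= (\beta_0 + \alpha_g) + (\beta_1 + \delta_g)\,\text{socst},
  \qquad \alpha_1 = \delta_1 = 0 \\
\text{Example 2.2:}\quad
  \widehat{\text{write}} &= \beta_0 + (\beta_1 + \delta_g)\,\text{socst},
  \qquad \alpha_2 = \alpha_3 = \alpha_4 = 0
\end{align*}
```

Forcing the three $\alpha_g$ to zero is what removes three model degrees of freedom (7 down to 4). With the Example 2.2 estimates, $\beta_0 = 28.30563$ and, for instance, the **grp** 4 slope is $0.4110007 + 0.0963406 = 0.5073413$.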

A plot of the model looks like this.

You will need to decide from looking at the plot whether this is truly the type of model you are interested in. If the above model is very different from what you expected then you may have run a misspecified model.
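If a common intercept does seem plausible, one way to check whether the data can support it is to test the three **grp** main-effect coefficients in the “full” model. This is a sketch only; it assumes the hsbanova data are already in memory.

```stata
* Sketch only: assumes the hsbanova dataset is already loaded.
quietly regress write i.grp##c.socst

* Joint test that the grp intercept shifts are all zero,
* i.e., that the four regression lines share one intercept
test 2.grp 3.grp 4.grp
```

Judging from the two residual sums of squares reported above (10830.0422 and 10559.2416), the F statistic should come out near 1.64 on 3 and 192 degrees of freedom.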

## Example 2.3

This time we will run an “interaction” only model.

regress write i.grp#c.socst

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  4,   195) =   31.73
           Model |  7048.83282     4   1762.2082           Prob > F      =  0.0000
        Residual |  10830.0422   195  55.5386779           R-squared     =  0.3943
    -------------+------------------------------           Adj R-squared =  0.3818
           Total |   17878.875   199   89.843593           Root MSE      =  7.4524

    ------------------------------------------------------------------------------
           write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     grp#c.socst |
              1  |   .4110007   .0650009     6.32   0.000     .2828056    .5391958
              2  |   .4615521   .0593735     7.77   0.000     .3444554    .5786487
              3  |   .4763817   .0535655     8.89   0.000     .3707396    .5820238
              4  |   .5073413   .0519691     9.76   0.000     .4048477    .6098349
                 |
           _cons |   28.30563   2.891027     9.79   0.000     22.60393    34.00733
    ------------------------------------------------------------------------------

This example has exactly the same fit (overall F, degrees of freedom and R^{2}) as the
previous example where we dropped the **grp** term. Instead of a three-degree-of-freedom
“interaction”, Stata gives us a four-degree-of-freedom term in which the coefficients
are the slopes within each level of **grp**.

## Case 3: Continuous by continuous interaction

Let’s look at a “full” model using **math** and **socst** as predictors of **read**.

regress read c.math##c.socst

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  3,   196) =   78.61
           Model |  11424.7622     3  3808.25406           Prob > F      =  0.0000
        Residual |  9494.65783   196  48.4421318           R-squared     =  0.5461
    -------------+------------------------------           Adj R-squared =  0.5392
           Total |    20919.42   199  105.122714           Root MSE      =    6.96

    --------------------------------------------------------------------------------
              read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
              math |  -.1105123   .2916338    -0.38   0.705    -.6856552    .4646307
             socst |  -.2200442   .2717539    -0.81   0.419    -.7559812    .3158928
                   |
    c.math#c.socst |   .0112807   .0052294     2.16   0.032     .0009677    .0215938
                   |
             _cons |   37.84271   14.54521     2.60   0.010     9.157506    66.52792
    --------------------------------------------------------------------------------

estimates store m1

The overall F is 78.61 with 3 and 196 degrees of freedom for the model and an
R^{2} of .5461. The intercept, 37.84271, is the predicted value of **read**
when both **math** and **socst** equal zero. For each unit change in **socst**
the slope of **read** on **math** increases by .0112807. Here is what the graph of this
model looks like when plotted over the range of 0 to 70 for both variables.

One way to think about this model is that there is a regression line for each value of **socst**, and
these regression lines differ in both intercepts and slopes although they all intersect when
**math** equals 19.51.
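The crossing point follows directly from the coefficients. Collecting the terms that involve **socst** in the fitted equation gives:

```latex
\begin{align*}
\widehat{\text{read}}
  &= 37.84271 - 0.1105123\,\text{math}
     + \left(-0.2200442 + 0.0112807\,\text{math}\right)\text{socst} \\[4pt]
0 &= -0.2200442 + 0.0112807\,\text{math}
  \;\;\Longrightarrow\;\;
  \text{math} = \frac{0.2200442}{0.0112807} \approx 19.51
\end{align*}
```

At that value of **math** the **socst** term vanishes, so every line passes through the same predicted **read**, roughly $37.84271 - 0.1105123 \times 19.51 \approx 35.69$.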

## Example 3.1

Next, we will rerun the regression leaving the main effect for **socst** out of the model.

regress read c.math c.math#c.socst

          Source |       SS       df       MS              Number of obs =     200
    -------------+------------------------------           F(  2,   197) =  117.80
           Model |  11393.0014     2  5696.50068           Prob > F      =  0.0000
        Residual |  9526.41864   197   48.357455           R-squared     =  0.5446
    -------------+------------------------------           Adj R-squared =  0.5400
           Total |    20919.42   199  105.122714           Root MSE      =   6.954

    --------------------------------------------------------------------------------
              read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
              math |   .1097745   .1049659     1.05   0.297    -.0972266    .3167757
                   |
    c.math#c.socst |   .0071334   .0010534     6.77   0.000     .0050559    .0092108
                   |
             _cons |    26.3823   3.349592     7.88   0.000     19.77664    32.98796
    --------------------------------------------------------------------------------

Now the overall F is 117.80 with 2 and 197 degrees of freedom for the model and an
R^{2} of .5446. Let’s jump straight to the graph of this model.

Again, we have a model with different slopes for different values of **socst**. However,
this time each regression line has the same intercept, 26.3823. The researcher needs to
decide whether this model makes theoretical sense. If the researcher concludes that the
model does make theoretical sense then it is possible to test whether the data can support
the model with a common intercept. Basically, we will test to see if the model without
**socst** fits significantly worse than the “full” model. We will do this using the **lrtest** command.

lrtest m1 .

    Likelihood-ratio test                                 LR chi2(1)  =      0.67
    (Assumption: . nested in m1)                          Prob > chi2 =    0.4138

This test is essentially equivalent to testing the coefficient for **socst** in the “full” model.

estimates restore m1

test socst

     ( 1)  socst = 0

           F(  1,   196) =    0.66
                Prob > F =    0.4191

The tests above support the hypothesis that the model without **socst** does not fit
the data significantly worse than the “full” model.

If instead of dropping **socst** we had dropped **math**, the graph of the model
would have looked very similar. The degrees of freedom would be the same and the overall
F and R^{2} would be close. Both the intercept and the “interaction” coefficient
would also differ, but not in any noticeable way. The same thing happens when we drop
both **math** and **socst**. The graph is similar and there are small differences
in the overall F and R^{2}. The model with only the “interaction” term has
1 and 198 degrees of freedom.
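For readers who want to check these two variants themselves, a minimal sketch of the commands follows. It assumes the hsbanova data are already in memory; no output is shown because the results are only summarized above.

```stata
* Sketch only: assumes the hsbanova dataset is already loaded.

* Drop the math main effect, keep socst and the interaction
regress read c.socst c.math#c.socst

* Drop both main effects, leaving only the interaction term
regress read c.math#c.socst
```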

The most likely reason that these three models appear so similar is that when the “interaction”
is in the model neither predictor is significant.
Further, both **math** and **socst** are scaled similarly with nearly equal means
and standard deviations.

## Concluding remarks

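When you drop one or both predictors from a model with an interaction term, one of two things can happen.

1) The model remains the same but the coefficients are reparameterizations of the original estimates. This situation occurs when the interaction can absorb the dropped term: Stata adds additional degrees of freedom to the “interaction” term so that, overall, the degrees of freedom and fit of the model do not change. We saw this in every categorical by categorical example and when the continuous main effect was dropped in Example 2.1.

2) The model changes, such that it is no longer the same model at all. This occurs when the interaction cannot absorb the dropped term, as when the categorical main effect is dropped from a categorical by continuous model (Example 2.2) or a main effect is dropped from a continuous by continuous model (Example 3.1). The result is a decrease in the model degrees of freedom as well as a substantial change in the meaning of the coefficients.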
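A quick way to tell which of the two situations you are in is to compare the model degrees of freedom and the residual sum of squares of the two fits. The sketch below assumes the hsbanova data are in memory; the scalar names `rss_full` and `df_full` are made up for this illustration.

```stata
* Sketch only: assumes the hsbanova dataset is already loaded.
quietly regress write i.female##i.grp
scalar rss_full = e(rss)     // residual sum of squares, "full" model
scalar df_full  = e(df_m)    // model degrees of freedom, "full" model

quietly regress write i.female#i.grp

* A reparameterization leaves both quantities unchanged
display "model df: " df_full " vs " e(df_m)
display "relative difference in residual SS: " reldif(e(rss), rss_full)
```

If the model degrees of freedom match and the relative difference is essentially zero, the second model is a reparameterization of the first. Repeating the comparison with the Example 2.2 model in place of the second fit would show a drop in degrees of freedom and a larger residual sum of squares.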