How can there be an intercept in the fixed-effects model estimated by xtreg, fe?

This page was adapted from a FAQ at the Stata Corp. FAQ page. We thank Stata for their permission to adapt and distribute this page via our web site.

The results that xtreg, fe reports have simply been reformulated so that the reported intercept is the average value of the fixed effects.

Intuition

One way of writing the fixed-effects model is

            y   = a + x  b + v  + e                                (1)
             it        it     i    it

where v_i (i=1, …, n) are simply the fixed effects to be estimated. With no further constraints, the parameters a and v_i do not have a unique solution. You can see that by rearranging the terms in equation (1):

            y   = (a + v ) + x  b + e
             it         i     it     it

Consider some solution that has, say, a=3. In that case, we could just as well say that a=4 and subtract the value 1 from each of the estimated v_i.

Thus, before equation (1) can be estimated, we must place an additional constraint on the system. Any constraint will do and the choice we make will have no effect on the estimated b. One popular constraint is a=0 but it is important to understand that we could just as well constrain a=3. Changing the value of a would merely change the corresponding values of v_i. Nor do we have to constrain a; we could place a constraint on v_i. We could, for instance, constrain v_1=0, or v_5=3.
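
The non-uniqueness is easy to check numerically. What follows is a minimal Python sketch, not Stata output. The values of a, b, v_i and x are made up for illustration, and fitted() and shift() are hypothetical helper names. It moves a constant between a and the v_i and confirms that a + x_it b + v_i does not change.

```python
# Hypothetical parameter values, chosen only for illustration.
a = 3.0
b = 2.0
v = {1: -1.0, 2: 0.5, 3: 0.5}          # fixed effect v_i for each group i

# A few made-up (group, x) observations.
obs = [(1, 0.0), (1, 8.0), (2, 10.0), (3, 4.0), (3, 11.0)]


def fitted(a, b, v, obs):
    """Return a + x*b + v_i for every observation."""
    return [a + x * b + v[i] for i, x in obs]


def shift(a, v, c):
    """Add c to the intercept and subtract c from every fixed effect."""
    return a + c, {i: vi - c for i, vi in v.items()}


base = fitted(a, b, v, obs)

# a = 4 with every v_i lowered by 1 fits exactly as well as a = 3.
a4, v4 = shift(a, v, 1.0)
same_with_a4 = fitted(a4, b, v4, obs) == base

# Constraining a = 0 instead simply pushes the 3 into every v_i.
a0, v0 = shift(a, v, -3.0)
same_with_a0 = fitted(a0, b, v0, obs) == base

print(same_with_a4, same_with_a0)
```

Both comparisons hold for any shift c, which is why some constraint has to be imposed before a and the v_i can be reported separately, while b is unaffected by the choice.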

The constraint xtreg, fe places on the system is computationally more difficult:

            n
           Sum  v   =   0
           i=1   i

Since the constraint we choose is arbitrary, we chose a constraint that makes interpreting results a little more convenient. The random-effects estimator proceeds under the *ASSUMPTION* that E(v_i)=0 and is hence able to estimate an intercept. We parameterize the fixed-effects estimator so that it proceeds under the *CONSTRAINT* average(v_i)=0. This constraint has no implication since we had to choose some constraint anyway.

The primary advantage of this constraint is that if you estimate some model and then obtain the predictions

        . xtreg y x1 x2 x3, fe
        . xtpred yhat

then the average value of yhat will equal the average value of y. In order to obtain estimates with the fixed-effects estimator, we had to impose an arbitrary constraint and had we instead constrained a=0, xtpred yhat would have produced yhat with average value 0. That would be the only difference; the predictions would differ by a constant (namely, by their respective values of a).

Using the constraint Sum v_i=0 has another advantage. Let us draw a distinction between models and estimators. The *MODEL* is

            y   = a + x  b + v  + e                                (1)
             it        it     i    it

Under the random-effects *MODEL*, it is assumed that E(v_i)=0 and that v_i and x_it are uncorrelated. From that model, we can derive the random-effects *ESTIMATOR*.

Under the fixed-effects *MODEL*, no assumptions are made about v_i except that they are fixed parameters. From that model, we can derive the fixed-effects *ESTIMATOR*.

Now, it turns out that the fixed-effects *ESTIMATOR* is an admissible estimator for the random-effects *MODEL*; it is merely less efficient than the random-effects *ESTIMATOR*. That is,

                        |   ----------------- model ---------------------
        Estimator       |   fixed effects             random effects
------------------------+---------------------------------------------------
         fixed effects  |    appropriate                appropriate
        random effects  |   inappropriate               appropriate
------------------------+---------------------------------------------------

When you use the fixed-effects *ESTIMATOR* for the random-effects *MODEL*, the intercept a reported by xtreg, fe is the appropriate estimate for the intercept of the random-effects model.

Derivation

The fixed-effects model is


            y   = a + x  b + v  + e                                (1)
             it        it     i    it


From which it follows that 

            _        _          _
            y  = a + x b + v  + e                                  (2)
             i        i     i    i

      _   _       _
where y , x , and e  are the averages of y  , x  , and e   within i.
       i   i       i                      it   it       it

Subtracting equation (2) from (1), we obtain

            _            _            _
       y  - y   = ( x  - x )b + (e  - e )                          (3)
        it   i       it   i       it   i

Equation (3) is the way many people think about the fixed-effects estimator. Notice that, in this formula, a remains unestimated: it has been subtracted out along with v_i. From equation (1), it also follows that


           =       =    _   =
           y = a + xb + v + e                                      (4)


      =  =  _      =  
where y, x, v, and e are the grand averages of y  , x  , v , and e  .
                                                it   it   i       it  
For instance,


          =       n   T_i
          y  = ( Sum  Sum  y_it ) / (total_number_of_observations)
                 i=1  t=1

Summing equations (3) and (4), we obtain

           _    =                _    =           _    _    =
     y   - y  + y  =  a + (x   - x  + x)b + (e  - e  + v) + e      (5)
      it    i               it    i           it   i

                                                            _
xtreg, fe estimates the above equation under the constraint v=0, which is to say, it estimates

           _    =                _    =         
     y   - y  + y  =  a + (x   - x  + x)b + noise 
      it    i               it    i

Thus, the left-hand-side variable is y_it minus the within-group means but with the grand mean added back in and the right-hand-side variables are x_it minus the within-group means but with the grand mean added back in. Obviously, adding in grand means to the left- and right-hand sides has no effect on the estimated b.
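
The transformed regression can be reproduced by hand. The following is a minimal Python sketch (plain Python rather than Stata) that uses the 11-observation, 4-group dataset listed in the Demonstration section. The variable names xt and yt are illustrative. It removes the within-group means, adds the grand means back in, and then fits a simple regression with an intercept.

```python
from collections import defaultdict

# (group, x, y) rows of the demonstration dataset.
data = [(1, 0, -5), (1, 8, 23), (1, 17, 44), (2, 10, 29), (2, 16, 26),
        (3, 4, 17), (3, 11, 17), (3, 5, 31), (4, 18, 50), (4, 5, 26),
        (4, 2, 17)]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Within-group means and grand means.
by_group = defaultdict(list)
for g, x, y in data:
    by_group[g].append((x, y))
xbar = {g: mean(x for x, _ in rows) for g, rows in by_group.items()}
ybar = {g: mean(y for _, y in rows) for g, rows in by_group.items()}
x_grand = mean(x for _, x, _ in data)
y_grand = mean(y for _, _, y in data)

# Transformed variables: group mean removed, grand mean added back.
xt = [x - xbar[g] + x_grand for g, x, _ in data]
yt = [y - ybar[g] + y_grand for g, _, y in data]

# Simple OLS of yt on xt with an intercept.
xt_mean, yt_mean = mean(xt), mean(yt)
sxy = sum((x - xt_mean) * (y - yt_mean) for x, y in zip(xt, yt))
sxx = sum((x - xt_mean) ** 2 for x in xt)
b = sxy / sxx
a = yt_mean - b * xt_mean

print(f"b = {b:.6f}, a = {a:.6f}")
```

The slope and intercept computed this way should agree with the coefficient on x and the _cons that xtreg, fe reports for the same data in the Demonstration section. Dropping the two grand means from xt and yt leaves b unchanged and only removes the intercept.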

Demonstration

Fixed-effects regression is supposed to produce the same coefficient estimates and standard errors as ordinary regression when indicator (dummy) variables are included for each of the groups. Since the fixed-effects model is

       y   = x  b + v  + e
        it    it     i    it

and v_i are fixed parameters to be estimated, this is the same as

      y   = x  b + v d1  + v d2  + ... + e
       it    it     1  i    2  i          it

where d1 is 1 when i=1 and 0 otherwise, d2 is 1 when i=2 and 0 otherwise, and so on. d1, d2, …, are just dummy variables indicating the groups and v_1, v_2, …, are their regression coefficients which we must estimate.

The problem is that we typically have lots of groups, perhaps thousands, and including that many dummy variables is too computationally expensive. So we look for a shortcut.

Nevertheless, we could take a little dataset with just a few groups and compare the methods. Here is my little dataset:

. list

         group          x          y  
  1.         1          0         -5  
  2.         1          8         23  
  3.         1         17         44  
  4.         2         10         29  
  5.         2         16         26  
  6.         3          4         17  
  7.         3         11         17  
  8.         3          5         31  
  9.         4         18         50  
 10.         4          5         26  
 11.         4          2         17

I’m going to show you

1. what regress with group dummies reports;
2. that xtreg, fe reports the same results;
3. that removing the within-group means and estimating a regression on the deviations without an intercept (as given in equation 3) produces the same coefficients but different standard errors.

How can method 3 be wrong? Because it fails to account for the fact that the means we removed are *ESTIMATES*. As a consequence, it understates standard errors.

1. What regress with group dummies reports

. xi: regress y x i.group
i.group               Igroup_1-4   (naturally coded; Igroup_1 omitted)

  Source |       SS       df       MS                  Number of obs =      11
---------+------------------------------               F(  4,     6) =    4.01
   Model |  1554.16667     4  388.541667               Prob > F      =  0.0643
Residual |  581.833333     6  96.9722222               R-squared     =  0.7276
---------+------------------------------               Adj R-squared =  0.5460
   Total |     2136.00    10      213.60               Root MSE      =  9.8474

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |          2   .5372223      3.723   0.010       .6854644    3.314536
Igroup_2 |       -2.5   9.332493     -0.268   0.798      -25.33579    20.33579
Igroup_3 |   4.333333   8.090107      0.536   0.611      -15.46245    24.12911
Igroup_4 |   10.33333   8.040407      1.285   0.246      -9.340835     30.0075
   _cons |          4   7.236455      0.553   0.600      -13.70697    21.70697
------------------------------------------------------------------------------
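
These coefficients can be cross-checked without a regression routine. The following Python sketch is an illustration, not what regress does internally. It relies on the standard result that the slope in a regression with a full set of group dummies equals the slope computed from within-group deviations. Each group's own intercept is then its mean of y minus b times its mean of x, and the dummy coefficients are differences from the omitted group 1.

```python
# (group, x, y) rows of the dataset listed above.
data = [(1, 0, -5), (1, 8, 23), (1, 17, 44), (2, 10, 29), (2, 16, 26),
        (3, 4, 17), (3, 11, 17), (3, 5, 31), (4, 18, 50), (4, 5, 26),
        (4, 2, 17)]
groups = sorted({g for g, _, _ in data})


def group_mean(g, col):
    """Mean of column col (1 = x, 2 = y) within group g."""
    vals = [row[col] for row in data if row[0] == g]
    return sum(vals) / len(vals)


xbar = {g: group_mean(g, 1) for g in groups}
ybar = {g: group_mean(g, 2) for g in groups}

# Slope from within-group deviations.
sxy = sum((x - xbar[g]) * (y - ybar[g]) for g, x, y in data)
sxx = sum((x - xbar[g]) ** 2 for g, x, _ in data)
b = sxy / sxx

# Each group's own intercept, then contrasts against the omitted group 1.
intercept = {g: ybar[g] - b * xbar[g] for g in groups}
cons = intercept[1]
dummies = {g: intercept[g] - cons for g in groups if g != 1}

print(f"x = {b:.6f}, _cons = {cons:.6f}")
for g, d in dummies.items():
    print(f"Igroup_{g} = {d:.6f}")
```

The printed values should line up with the Coef. column above: 2 for x, 4 for _cons, and -2.5, 4.333333 and 10.33333 for the three group dummies.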

2. xtreg, fe reports the same results

. xtreg y x, i(group) fe 

                                             Fixed-effects (within) regression
sd(u_group)                  =  5.621346               Number of obs =      11
sd(e_group_t)                =  9.847447                           n =       4
sd(e_group_t + u_group)      =  11.33895                       T-bar =    2.75

corr(u_group, Xb)            =   -0.1939               R-sq within   =  0.6979
                                                            between  =  0.1716
                                                            overall  =  0.6146

                                                       F(  1,     6) =   13.86
                                                            Prob > F =  0.0098

------------------------------------------------------------------------------
       y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |          2   .5372223      3.723   0.010       .6854644    3.314535
   _cons |   7.545455   5.549554      1.360   0.223      -6.033815    21.12472
------------------------------------------------------------------------------
   group |              F(3,6) =      0.830   0.524            (4 categories)

If you compare, you will find that regress with group dummies reported the same coefficient (2) and the same standard error (.5372223) for x as xtreg, fe just did. In both cases, the t statistic is 3.723.
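
The two reported intercepts (4 from regress, 7.545455 from xtreg, fe) are also consistent with each other. The following Python sketch takes the coefficients printed in the regress output and the group sizes from the data listing, and averages the implied group intercepts. Because the groups are not all the same size, the average is taken over observations, which matches how the grand averages in equation (4) are defined. The variable names are illustrative.

```python
# Coefficients copied from the regress output (group 1 is the omitted dummy).
cons = 4.0
dummy = {1: 0.0, 2: -2.5, 3: 4.333333, 4: 10.33333}

# Number of observations in each group, from the data listing.
n_obs = {1: 3, 2: 2, 3: 3, 4: 3}

# Intercept of each group in the dummy-variable parameterization.
group_intercept = {g: cons + d for g, d in dummy.items()}

# Observation-weighted average of the group intercepts.
total = sum(n_obs.values())
a = sum(n_obs[g] * group_intercept[g] for g in n_obs) / total

# Fixed effects measured relative to that average.
v = {g: group_intercept[g] - a for g in n_obs}
weighted_sum_v = sum(n_obs[g] * v[g] for g in n_obs)

print(f"a = {a:.5f}, weighted sum of v_i = {weighted_sum_v:.5f}")
```

Up to the rounding in the printed coefficients, a reproduces the _cons reported by xtreg, fe, and the v_i average to zero over the 11 observations. With equal group sizes this would reduce to the plain condition Sum v_i = 0.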

3. Estimating the deviation model reports incorrect standard errors

. egen double ybar = mean(y), by(group)

. egen double xbar = mean(x), by(group)

. gen yd = y-ybar

. gen xd = x-xbar

. reg yd xd, nocons

  Source |       SS       df       MS                  Number of obs =      11
---------+------------------------------               F(  1,    10) =   23.10
   Model |  1343.99999     1  1343.99999               Prob > F      =  0.0007
Residual |  581.833327    10  58.1833327               R-squared     =  0.6979
---------+------------------------------               Adj R-squared =  0.6677
   Total |  1925.83332    11  175.075756               Root MSE      =  7.6278

------------------------------------------------------------------------------
      yd |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
      xd |          2   .4161306      4.806   0.001       1.072803    2.927197
------------------------------------------------------------------------------

So, to summarize:

                                         x
                                 |  Coefficient     Std. Err.     t
        -------------------------+------------------------------------
        -regress- with dummies   |      2           .5372223    3.723
        -xtreg, fe-              |      2           .5372223    3.723
        removing the means       |      2           .4161306    4.806
        -------------------------+------------------------------------

regress with dummies definitionally calculates correct results.

xtreg, fe matches them.

Removing the means and estimating on the deviations with the noconstant option produces correct coefficients but incorrect standard errors. Why? Because we did not account for the fact that the means we removed from y and x were estimated.
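
The size of the understatement can be read off the two outputs: the regression on deviations divides the residual sum of squares by 10 degrees of freedom, whereas regress with dummies and xtreg, fe divide the same 581.83 by 6, because the 4 group means were estimated as well. The following Python sketch applies that degrees-of-freedom adjustment to the reported standard error. It is an arithmetic check under that reading of the outputs, not a Stata feature.

```python
import math

se_deviations = 0.4161306   # Std. Err. reported by: reg yd xd, nocons
n_total = 11                # observations
n_groups = 4                # group means that were estimated and removed
k = 1                       # regressors (just x)

df_naive = n_total - k                 # 10, used by the deviations regression
df_correct = n_total - n_groups - k    # 6, used by xtreg, fe and the dummies

# Same residual sum of squares, different divisor, so rescale the SE.
se_corrected = se_deviations * math.sqrt(df_naive / df_correct)

print(f"{se_corrected:.6f}")
```

The rescaled value agrees with the .5372223 reported by the other two methods to about six decimal places, the limit set by the rounding of the input.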
