**This page was adapted from a FAQ at the Stata Corp. FAQ page. We thank Stata
for their permission to adapt and distribute this page via our web site.**

The results that **xtreg, fe** reports have simply been reformulated so that
the reported intercept is the average value of the fixed effects.

### Intuition

One way of writing the fixed-effects model is

$$y_{it} = a + x_{it} b + v_i + e_{it} \qquad (1)$$

where **v_i** (i=1, …, n) are simply the fixed effects to be estimated. With no
further constraints, the parameters **a** and **v_i** do not have a unique solution.
You can see that by rearranging the terms in equation (1):

$$y_{it} = (a + v_i) + x_{it} b + e_{it}$$

Consider some solution that has, say, **a**=3. In that case, we could just as well
say that **a**=4 and subtract the value 1 from each of the estimated **v_i**.

Thus, before equation (1) can be estimated, we must place an additional constraint on
the system. Any constraint will do and the choice we make will have no effect on the
estimated **b**. One popular constraint is **a**=0 but it is important to understand
that we could just as well constrain **a**=3. Changing the value of **a** would
merely change the corresponding values of **v_i**. Nor do we have to constrain **a**;
we could place a constraint on **v_i**. We could, for instance, constrain **v_1**=0,
or **v_5**=3.
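
The same argument can be written for an arbitrary shift. In the sketch below, $c$ is a constant introduced here purely for illustration; it is not part of the original notation.

```latex
\begin{aligned}
y_{it} &= a + x_{it} b + v_i + e_{it} \\
       &= (a + c) + x_{it} b + (v_i - c) + e_{it}
          \qquad \text{for any constant } c .
\end{aligned}
```

Each of the constraints mentioned above (**a**=0, **a**=3, **v_1**=0, **v_5**=3) simply picks one value of $c$. The slope **b** and the sums **a** + **v_i** are the same in every case.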

The constraint **xtreg, fe** places on the system is computationally more
difficult:

$$\sum_{i=1}^{N} v_i = 0$$

Since the constraint we choose is arbitrary, we chose a constraint that makes
interpreting results a little more convenient. The random-effects estimator proceeds under
the *ASSUMPTION* that E**(v_i)**=0 and is hence able to estimate an intercept. We
parameterize the fixed-effects estimator so that it proceeds under the *CONSTRAINT*
average**(v_i)**=0. This constraint has no implication since we had to choose some
constraint anyway.

The primary advantage of this constraint is that if you estimate some model and then obtain the predictions

    . xtreg y x1 x2 x3, fe
    . xtpred yhat

then the average value of **yhat** will equal the average value of **y**. In
order to obtain estimates with the fixed-effects estimator, we had to impose an arbitrary
constraint and had we instead constrained **a**=0, **xtpred yhat** would have
produced **yhat** with average value 0. That would be the only difference; the
predictions would differ by a constant (namely, by their respective values of **a**).
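
A minimal sketch of this check in more recent Stata syntax. It rests on two assumptions that are not part of the original FAQ: that `predict` after `xtreg, fe` plays the role `xtpred` plays above, and that the 11-observation dataset from the Demonstration section stands in for the generic `y x1 x2 x3`. The variable names `yhat` and `vhat` are illustrative.

```stata
* Illustrative only: data borrowed from the Demonstration section;
* -predict- is assumed here in place of the older -xtpred-.
clear
input group x y
1  0 -5
1  8 23
1 17 44
2 10 29
2 16 26
3  4 17
3 11 17
3  5 31
4 18 50
4  5 26
4  2 17
end

* Declare the panel variable, then fit the fixed-effects model
xtset group
xtreg y x, fe

* Linear prediction a + x*b, using the reported intercept
predict double yhat, xb

* Estimated fixed effects v_i
predict double vhat, u

* Compare the means of y, yhat, and vhat
summarize y yhat vhat
```

If the constraint behaves as described, `summarize` should report the same mean for `y` and `yhat` (25 in this dataset) and a mean of zero for `vhat`.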

Using the constraint Sum **v_i**=0 has another advantage. Let us draw a distinction
between models and estimators. The *MODEL* is

$$y_{it} = a + x_{it} b + v_i + e_{it} \qquad (1)$$

Under the random-effects *MODEL*, it is assumed that E**(v_i)**=0 and that **v_i**
and **x_it** are uncorrelated. From that model, we can derive the random-effects
*ESTIMATOR*.

Under the fixed-effects *MODEL*, no assumptions are made about **v_i** except that
they are fixed parameters. From that model, we can derive the fixed-effects *ESTIMATOR*.

Now, it turns out that the fixed-effects *ESTIMATOR* is an admissible estimator for the random-effects *MODEL*; it is merely less efficient than the random-effects *ESTIMATOR*. That is,

                            | ----------------- model ---------------------
    Estimator               |    fixed effects         random effects
    ------------------------+---------------------------------------------------
    fixed effects           |    appropriate           appropriate
    random effects          |    inappropriate         appropriate
    ------------------------+---------------------------------------------------

When you use the fixed-effects *ESTIMATOR* for the random-effects *MODEL*, the
intercept **a** reported by **xtreg, fe** is the appropriate estimate for the
intercept of the random-effects model.

### Derivation

The fixed-effects model is

$$y_{it} = a + x_{it} b + v_i + e_{it} \qquad (1)$$

from which it follows that

$$\bar{y}_i = a + \bar{x}_i b + v_i + \bar{e}_i \qquad (2)$$

where $\bar{y}_i$, $\bar{x}_i$, and $\bar{e}_i$ are the averages of $y_{it}$, $x_{it}$, and $e_{it}$ within $i$.

Subtracting equation (2) from (1), we obtain

$$y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i) b + (e_{it} - \bar{e}_i) \qquad (3)$$

Equation (3) is the way many people think about the fixed-effects estimator. Notice
that **a** does not appear in this formula, so it remains unestimated. From equation (1),
it also follows that

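$$\bar{\bar{y}} = a + \bar{\bar{x}} b + \bar{v} + \bar{\bar{e}} \qquad (4)$$

where $\bar{\bar{y}}$, $\bar{\bar{x}}$, $\bar{v}$, and $\bar{\bar{e}}$ are the grand averages of $y_{it}$, $x_{it}$, $v_i$, and $e_{it}$. For instance,

$$\bar{\bar{y}} = \left( \sum_{i=1}^{n} \sum_{t=1}^{T_i} y_{it} \right) \Big/ (\text{total number of observations})$$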

Summing equations (3) and (4), we obtain

$$y_{it} - \bar{y}_i + \bar{\bar{y}} = a + (x_{it} - \bar{x}_i + \bar{\bar{x}}) b + (e_{it} - \bar{e}_i + \bar{v}) + \bar{\bar{e}} \qquad (5)$$

**xtreg, fe** estimates the above equation under the constraint $\bar{v} = 0$, which is to say, it estimates

$$y_{it} - \bar{y}_i + \bar{\bar{y}} = a + (x_{it} - \bar{x}_i + \bar{\bar{x}}) b + \text{noise}$$

Thus, the left-hand-side variable is **y_it** minus the within-group means but with
the grand mean added back in and the right-hand-side variables are **x_it** minus the
within-group means but with the grand mean added back in. Obviously, adding in grand means
to the left- and right-hand sides has no effect on the estimated **b**.
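
The transformation described above can be sketched directly in Stata. This is an illustration rather than a description of how **xtreg, fe** is implemented internally. It borrows the 11-observation dataset from the Demonstration section, and the variable names (`ybar_i`, `ybar_g`, `y_t`, and so on) and scalar names are hypothetical.

```stata
* Illustrative only: data borrowed from the Demonstration section
clear
input group x y
1  0 -5
1  8 23
1 17 44
2 10 29
2 16 26
3  4 17
3 11 17
3  5 31
4 18 50
4  5 26
4  2 17
end

* Within-group means and grand means
egen double ybar_i = mean(y), by(group)
egen double xbar_i = mean(x), by(group)
egen double ybar_g = mean(y)
egen double xbar_g = mean(x)

* Subtract the within-group mean, add the grand mean back in
generate double y_t = y - ybar_i + ybar_g
generate double x_t = x - xbar_i + xbar_g

* Ordinary regression, with an intercept, on the transformed variables
regress y_t x_t
scalar b_t = _b[x_t]
scalar a_t = _b[_cons]

* Fixed-effects fit for comparison
xtset group
xtreg y x, fe
display "transformed regression: b = " b_t "   a = " a_t
display "xtreg, fe:              b = " _b[x] "   a = " _b[_cons]
```

The two sets of coefficients should agree. The standard errors from the plain `regress` should not be expected to match those of **xtreg, fe**, because `regress` does not know that group means were removed and so uses different residual degrees of freedom.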

### Demonstration

Fixed-effects regression is supposed to produce the same coefficient estimates and standard errors as ordinary regression when indicator (dummy) variables are included for each of the groups. Since the fixed-effects model is

$$y_{it} = X_{it} b + v_i + e_{it}$$

and **v_i** are fixed parameters to be estimated, this is the same as

$$y_{it} = X_{it} b + v_1\,\mathit{d1}_i + v_2\,\mathit{d2}_i + \cdots + e_{it}$$

where **d1** is 1 when i=1 and 0 otherwise, **d2** is 1 when i=2 and 0 otherwise,
and so on. **d1**, **d2**, …, are just dummy variables indicating the groups and **v_1**,
**v_2**, …, are their regression coefficients which we must estimate.

The problem is that we typically have lots of groups, perhaps thousands, and including lots of dummy variables is too computationally expensive. So we look for a shortcut.

Nevertheless, we could take a little dataset with just a few groups and compare the methods. Here is my little dataset:

    . list

             group          x          y
      1.         1          0         -5
      2.         1          8         23
      3.         1         17         44
      4.         2         10         29
      5.         2         16         26
      6.         3          4         17
      7.         3         11         17
      8.         3          5         31
      9.         4         18         50
     10.         4          5         26
     11.         4          2         17

I’m going to show you

1. what **regress** with group dummies reports;
2. that **xtreg, fe** reports the same results;
3. that removing the within-group means and estimating a regression on the deviations without an intercept (as given in equation 3) produces the same coefficients but different standard errors.

How can method 3 be wrong? Because it fails to account for the fact that the means we removed are *ESTIMATES*. As a consequence, it understates standard errors.

### 1. What regress with group dummies reports

    . xi: regress y x i.group
    i.group           Igroup_1-4   (naturally coded; Igroup_1 omitted)

      Source |       SS       df       MS                  Number of obs =      11
    ---------+------------------------------               F(  4,     6) =    4.01
       Model |  1554.16667     4  388.541667               Prob > F      =  0.0643
    Residual |  581.833333     6  96.9722222               R-squared     =  0.7276
    ---------+------------------------------               Adj R-squared =  0.5460
       Total |     2136.00    10      213.60               Root MSE      =  9.8474

    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           x |          2   .5372223      3.723   0.010       .6854644    3.314536
    Igroup_2 |       -2.5   9.332493     -0.268   0.798      -25.33579    20.33579
    Igroup_3 |   4.333333   8.090107      0.536   0.611      -15.46245    24.12911
    Igroup_4 |   10.33333   8.040407      1.285   0.246      -9.340835     30.0075
       _cons |          4   7.236455      0.553   0.600      -13.70697    21.70697
    ------------------------------------------------------------------------------

### 2. xtreg, fe reports the same results

    . xtreg y x, i(group) fe

    Fixed-effects (within) regression
    sd(u_group)                =  5.621346          Number of obs =      11
    sd(e_group_t)              =  9.847447                      n =       4
    sd(e_group_t + u_group)    =  11.33895                  T-bar =    2.75

    corr(u_group, Xb)          =   -0.1939          R-sq within   =  0.6979
                                                         between  =  0.1716
                                                         overall  =  0.6146

                                                    F(  1,     6) =   13.86
                                                         Prob > F =  0.0098

    ------------------------------------------------------------------------------
           y |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
           x |          2   .5372223      3.723   0.010       .6854644    3.314535
       _cons |   7.545455   5.549554      1.360   0.223      -6.033815    21.12472
    ------------------------------------------------------------------------------
       group |          F(3,6) =    0.830                  0.524    (4 categories)

If you compare, you will find that **regress** with group dummies reported the same
coefficient (2) and the same standard error (.5372223) for x as **xtreg, fe** just did.
In both cases, the t statistic is 3.723.
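
The two reported intercepts differ (4 versus 7.545455) only because the two commands impose different constraints. The arithmetic below is a worked check using the dataset listed above, with $b = 2$. It writes $\bar{\bar{y}}$ and $\bar{\bar{x}}$ for the grand means and $\bar{y}_1$, $\bar{x}_1$ for the group-1 means; the subscripts on $a$ are labels introduced here for illustration.

```latex
\begin{aligned}
\bar{\bar{y}} &= 275/11 = 25, \qquad \bar{\bar{x}} = 96/11 \approx 8.727273, \\
a_{\text{xtreg}}   &= \bar{\bar{y}} - \bar{\bar{x}}\, b = 25 - 2\,(96/11) = 83/11 \approx 7.545455, \\
\bar{y}_1 &= 62/3, \qquad \bar{x}_1 = 25/3, \\
a_{\text{regress}} &= \bar{y}_1 - \bar{x}_1\, b = 62/3 - 2\,(25/3) = 4 .
\end{aligned}
```

With group 1 as the omitted dummy, **regress** in effect constrains **v_1**=0, so its intercept is the group-1 intercept and the remaining dummy coefficients are differences from group 1. **xtreg, fe** instead reports the intercept implied by grand means.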

### 3. Estimating the deviation model reports incorrect standard errors

    . egen double ybar = mean(y), by(group)
    . egen double xbar = mean(x), by(group)
    . gen yd = y-ybar
    . gen xd = x-xbar
    . reg yd xd, nocons

      Source |       SS       df       MS                  Number of obs =      11
    ---------+------------------------------               F(  1,    10) =   23.10
       Model |  1343.99999     1  1343.99999               Prob > F      =  0.0007
    Residual |  581.833327    10  58.1833327               R-squared     =  0.6979
    ---------+------------------------------               Adj R-squared =  0.6677
       Total |  1925.83332    11  175.075756               Root MSE      =  7.6278

    ------------------------------------------------------------------------------
          yd |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
          xd |          2   .4161306      4.806   0.001       1.072803    2.927197
    ------------------------------------------------------------------------------

So, to summarize:

                           x |  Coefficient    Std. Err.       t
    -------------------------+------------------------------------
     -regress- with dummies  |       2          .5372223     3.723
     -xtreg, fe-             |       2          .5372223     3.723
     removing the means      |       2          .4161306     4.806
    -------------------------+------------------------------------

**regress** with dummies definitionally calculates correct results.

**xtreg, fe** matches them.

Removing the means and estimating on the deviations with the **noconstant** option
produces correct coefficients but incorrect standard errors. Why? Because we did not
account for the fact that the means we removed from y and x were estimated.
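
A sketch that reruns the three methods in more recent Stata syntax and then rescales the understated standard error. Several things here are assumptions rather than part of the original FAQ: factor-variable notation (`i.group`) is used in place of `xi:`, `xtset` in place of the `i(group)` option, and the scalar names are hypothetical. The rescaling follows from the residual degrees of freedom: the deviation regression uses 11 - 1 = 10, whereas accounting for the four estimated group means leaves 11 - 4 - 1 = 6.

```stata
* Illustrative only: the dataset listed in the Demonstration section
clear
input group x y
1  0 -5
1  8 23
1 17 44
2 10 29
2 16 26
3  4 17
3 11 17
3  5 31
4 18 50
4  5 26
4  2 17
end

* 1. Regression with group dummies (factor-variable notation)
regress y x i.group
scalar se_dummies = _se[x]

* 2. Fixed-effects (within) regression
xtset group
xtreg y x, fe
scalar se_fe = _se[x]

* 3. Regression on deviations from group means, no intercept
egen double ybar = mean(y), by(group)
egen double xbar = mean(x), by(group)
generate double yd = y - ybar
generate double xd = x - xbar
regress yd xd, noconstant
scalar se_naive = _se[xd]

* Rescale from the naive residual df (10) to the df that accounts for
* the 4 estimated group means (11 - 4 - 1 = 6); the 4 is hard-coded
scalar se_fixed = se_naive * sqrt(e(df_r) / (e(N) - 4 - 1))

display "regress with dummies : " se_dummies
display "xtreg, fe            : " se_fe
display "deviations, naive    : " se_naive
display "deviations, rescaled : " se_fixed
```

With the figures reported above, .4161306 multiplied by the square root of 10/6 gives approximately .5372223, so the rescaled value should agree with the first two methods.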
