**Version info**: Code for this page was tested in Stata 12.

Robust regression is an alternative to least squares regression when data is contaminated with outliers or influential observations and it can also be used for the purpose of detecting influential observations.

**
Please note:** The purpose of this page is to show how to use various data
analysis commands. It does not cover all aspects of the research process which
researchers are expected to do. In particular, it does not cover data
cleaning and checking, verification of assumptions, model diagnostics or
potential follow-up analyses.

## Introduction

Let’s begin our discussion on robust regression with some terms in linear regression.

**Residual**: The difference between the predicted value (based on
the regression equation) and the actual, observed value.

**Outlier**: In linear regression, an outlier is an observation with
large residual. In other words, it is an observation whose
dependent-variable value is unusual given its value on the predictor variables.
An outlier may indicate a sample peculiarity or may indicate a data entry error
or other problem.

**Leverage**: An observation with an extreme value on a predictor
variable is a point with high leverage. Leverage is a measure of how far
an independent variable deviates from its mean. High leverage points can
have a great amount of effect on the estimate of regression coefficients.

**Influence**: An observation is said to be influential if removing
the observation substantially changes the estimate of the regression coefficients.
Influence can be thought of as the product of leverage and outlierness.

**Cook’s distance** (or Cook’s D): A measure that combines the information
of leverage and residual of the observation.

Robust regression can be used in any situation in which you would use least squares regression. When fitting a least squares regression, we might find some outliers or high leverage data points. We have decided that these data points are not data entry errors, neither they are from a different population than most of our data. So we have no compelling reason to exclude them from the analysis. Robust regression might be a good strategy since it is a compromise between excluding these points entirely from the analysis and including all the data points and treating all them equally in OLS regression. The idea of robust regression is to weigh the observations differently based on how well behaved these observations are. Roughly speaking, it is a form of weighted and reweighted least squares regression.

Stata’s **rreg** command implements a version of robust regression. It
first runs the OLS regression, gets the Cook’s
D for each observation, and then drops any observation with Cook’s distance
greater than 1. Then iteration process begins in which weights are calculated based on
absolute residuals. The iterating stops when the maximum change between
the weights from one iteration to the next is below
tolerance. Two types of weights are used. In Huber weighting,
observations with small residuals get a weight of 1, the larger the residual,
the smaller the weight. With biweighting, all
cases with a non-zero residual get down-weighted at least a little. The two different kinds of weight are used because Huber weights can have
difficulties with severe outliers, and biweights can have difficulties
converging or may yield multiple solutions. Using the Huber weights first
helps to minimize problems with the biweights. You can see the iteration
history of both types of weights at the top of the robust regression output. Using the Stata defaults, robust regression is about 95% as efficient as OLS
(Hamilton, 1991).
In short, the most influential points are dropped, and then cases with large
absolute residuals are down-weighted.

## Description of the data

For our data analysis below, we will use the **crime** data set. This dataset appears in *
Statistical Methods for Social Sciences, Third Edition* by Alan Agresti and
Barbara Finlay (Prentice Hall, 1997). The variables are state id (**sid**),
state name (**state**), violent crimes per 100,000 people (**crime**),
murders per 1,000,000 (**murder**), the percent of the population living in
metropolitan areas (**pctmetro**), the percent of the population that is
white (**pctwhite**), percent of population with a high school education or
above (**pcths**), percent of population living under poverty line (**poverty**),
and percent of population that are single parents (**single**). It has
51 observations. We are going to use **poverty** and **single** to predict
**crime**.

use http://www.ats.ucla.edu/stat/stata/dae/crime, clearsummarize crime poverty singleVariable | Obs Mean Std. Dev. Min Max -------------+-------------------------------------------------------- crime | 51 612.8431 441.1003 82 2922 poverty | 51 14.25882 4.584242 8 26.4 single | 51 11.32549 2.121494 8.4 22.1

## Robust regression analysis

In most cases, we begin by running an OLS regression and doing some
diagnostics. We will begin by running an OLS regression. The **lvr2plot** is used to create a graph showing the
leverage versus the squared residuals, and the **mlabel** option is used to
label the points on the graph with the two-letter abbreviation for each state.

regress crime poverty singleSource | SS df MS Number of obs = 51 -------------+------------------------------ F( 2, 48) = 57.96 Model | 6879872.44 2 3439936.22 Prob > F = 0.0000 Residual | 2848602.3 48 59345.8813 R-squared = 0.7072 -------------+------------------------------ Adj R-squared = 0.6950 Total | 9728474.75 50 194569.495 Root MSE = 243.61 ------------------------------------------------------------------------------ crime | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- poverty | 6.787359 8.988529 0.76 0.454 -11.28529 24.86001 single | 166.3727 19.42291 8.57 0.000 127.3203 205.425 _cons | -1368.189 187.2052 -7.31 0.000 -1744.59 -991.7874 ------------------------------------------------------------------------------lvr2plot, mlabel(state)

As we can see, DC, Florida and Mississippi have either high leverage or large residuals.
Let’s compute Cook’s D and display the observations that have relatively large
values of Cook’s D. To this end, we use the **predict** command with the **
cooksd** option to create a new variable called **d1** containing the values of Cook’s D.
Another conventional cut-off
point is **4/n**, where **n** is the number of observations in the
data set. We will use this criterion to select the values to display.

predict d1, cooksd clist state crime poverty single d1 if d1>4/51, noobsstate crime poverty single d1 ak 761 9.1 14.3 .125475 fl 1206 17.8 10.6 .1425891 ms 434 24.7 14.7 .6138721 dc 2922 26.4 22.1 2.636252

Since DC has a Cook’s D larger than 1, **rreg** will assign a missing
weight to it so it will be excluded from the robust regression analysis. We
probably should drop DC to begin with since it is not even a state. We include
it in the analysis just to show that it has large Cook’s D and will be dropped
by **rreg**. Now we will look at the residuals. We will again use the **predict**
command, this time with the **rstandard** option. We will generate a new
variable called **absr1**, which is the absolute value of the standardized residuals
(because the sign of the residual doesn’t matter). The **gsort**
command is used to sort the data by descending order.

predict r1, rstandard gen absr1 = abs(r1) gsort -absr1 clist state absr1 in 1/10, noobsstate absr1 ms 3.56299 fl 2.902663 dc 2.616447 vt 1.742409 mt 1.460884 me 1.426741 ak 1.397418 nj 1.354149 il 1.338192 md 1.287087

Now let’s run our robust regression and we will make use of the **generate** option to have Stata save the
final weights to a new variable which we call **weight** in the data set.

rreg crime poverty single, gen(weight)Huber iteration 1: maximum difference in weights = .66846346 Huber iteration 2: maximum difference in weights = .11288069 Huber iteration 3: maximum difference in weights = .01810715 Biweight iteration 4: maximum difference in weights = .29167992 Biweight iteration 5: maximum difference in weights = .10354281 Biweight iteration 6: maximum difference in weights = .01421094 Biweight iteration 7: maximum difference in weights = .0033545 Robust regression Number of obs = 50 F( 2, 47) = 31.15 Prob > F = 0.0000 ------------------------------------------------------------------------------ crime | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- poverty | 10.36971 7.629288 1.36 0.181 -4.978432 25.71786 single | 142.6339 22.17042 6.43 0.000 98.03276 187.235 _cons | -1160.931 224.2564 -5.18 0.000 -1612.076 -709.7849 ------------------------------------------------------------------------------

Comparing the OLS regression and robust regression models, we can see that
the results are fairly different, especially with respect to the coefficients of **
single. **You will also notice that no R-squared, adjusted R-squared or root
MSE from **rreg** output.

Notice that the number of observations in the robust regression analysis is 50, instead of 51. This is because observation for DC has been dropped since its Cook’s D is greater than 1. We can also see that it is being dropped by looking at the final weight.

clist state weight if state =="dc", noobsstate weight dc .

Now let’s look at other observations with relatively small weight.

sort weight clist sid state weight absr1 d1 in 1/10, noobssid state weight absr1 d1 25 ms .02638862 3.56299 .6138721 9 fl .11772218 2.902663 .1425891 46 vt .59144513 1.742409 .0427155 26 mt .66441582 1.460884 .016755 20 md .67960728 1.287087 .0356962 14 il .69124917 1.338192 .0126569 21 me .69766511 1.426741 .0223313 31 nj .74574796 1.354149 .0222918 19 ma .75392127 1.198541 .016399 5 ca .80179038 1.015206 .0123064

Roughly, as the residual goes down, the weight goes up. In other words, cases with a large residuals tend to be down-weighted, and the values of Cook’s D don’t closely correspond to the weights. This output shows us that the observation for Mississippi will be down-weighted the most. Florida will also be substantially down-weighted. In OLS regression, all cases have a weight of 1. Hence, the more cases in the robust regression that have a weight close to one, the closer the results of the OLS and robust regressions. We can also visualize this relationship by graphing the data points with the weight information as the size of circles.

twoway (scatter crime single [weight=weight], msymbol(oh)) if state !="dc"

Many post-estimation commands are available after running **rreg**, such
as **test** command and **margins** command. For example, we can get
the predicted values with respect to a set of values of variable **single**
holding **poverty** at its mean.

margins, at(single=(8(2)22)) vsquishPredictive margins Number of obs = 50 Expression : Fitted values, predict() 1._at : single = 8 2._at : single = 10 3._at : single = 12 4._at : single = 14 5._at : single = 16 6._at : single = 18 7._at : single = 20 8._at : single = 22 ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _at | 1 | 125.4825 74.88788 1.68 0.094 -21.29505 272.26 2 | 410.7503 38.20604 10.75 0.000 335.8678 485.6328 3 | 696.0181 35.2623 19.74 0.000 626.9053 765.1309 4 | 981.2859 70.42285 13.93 0.000 843.2596 1119.312 5 | 1266.554 112.2833 11.28 0.000 1046.482 1486.625 6 | 1551.821 155.5247 9.98 0.000 1246.999 1856.644 7 | 1837.089 199.25 9.22 0.000 1446.567 2227.612 8 | 2122.357 243.1982 8.73 0.000 1645.697 2599.017 ------------------------------------------------------------------------------

This table shows that as the percent of single parents increases so does the predicted crime rate.

## Things to consider

- There are other Stata commands for robust regression written by Stata
users. For example,
**mregress**and**mmregress**by Verardi and Corux implemented more effective M-estimator and MM-estimator. One can get these command via "**search mregress**" or "**search mmregress**". - You might be interested in getting R-squared and other model fit
indices, such as AIC or BIC after running
**rreg**. Here is a page that describes a user-written program for obtaining these measures. **rreg**does not address issues of heterogeneity of variance. This problem can be addressed by using the**robust**option with the**regress**command.

## See also

- Stata 11 Manuals
- Regression with Graphics: A Second Course in Applied Statistics by Lawrence Hamilton

## References

- Li, G. 1985.
*Robust regression*. In*Exploring Data Tables, Trends, and Shapes*, ed. D. C. Hoaglin, F. Mosteller, and J. W. Tukey, Wiley. - Verardi, V and Croux, C. 2009.
*Robust regression in Stata.*The Stata Journal, Vol 9. No 3 - John Fox,
*Applied regression analysis, linear models, and related models*, Sage publications, Inc, 1997