**1**. The following data set consists of the measured weight, measured height,
reported weight and reported height of some 200 people. You can get it from
within Stata by typing **use http://www.ats.ucla.edu/stat/stata/webbooks/reg/davis**.
We tried to build a model to predict measured weight from reported weight, reported height and measured height. We ran lvr2plot after the regression, and here is what we have. Explain what you see in the graph and use other Stata commands to identify the problematic observation(s). What do you think the problem is, and
what is your solution?

    use http://www.ats.ucla.edu/stat/stata/webbooks/reg/davis
    regress measwt measht reptwt reptht

      Source |       SS       df       MS                  Number of obs =     181
    ---------+------------------------------               F(  3,   177) = 1640.88
       Model |  40891.9594     3  13630.6531               Prob > F      =  0.0000
    Residual |   1470.3279   177  8.30693727               R-squared     =  0.9653
    ---------+------------------------------               Adj R-squared =  0.9647
       Total |  42362.2873   180  235.346041               Root MSE      =  2.8822

    ------------------------------------------------------------------------------
      measwt |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
      measht |  -.9607757   .0260189    -36.926   0.000      -1.012123   -.9094285
      reptwt |    1.01917   .0240778     42.328   0.000        .971654    1.066687
      reptht |   .8184156   .0419658     19.502   0.000       .7355979    .9012334
       _cons |    24.8138   4.888302      5.076   0.000       15.16695    34.46065
    ------------------------------------------------------------------------------

    lvr2plot
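One possible starting point for locating the problematic observation(s) is to compute the usual case diagnostics and list the largest ones. This is only a sketch, not the intended answer: the variable names `obsno`, `rstu`, `lev` and `cook` are made up for illustration (pick others if they clash with variables already in the file), and the choice to list the top five cases is arbitrary.

```stata
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/davis, clear

* hypothetical case identifier, created before any sorting
generate obsno = _n

regress measwt measht reptwt reptht

* leverage versus squared residual plot, with points labeled by case number
lvr2plot, mlabel(obsno)

* case diagnostics from the fitted model
predict rstu, rstudent
predict lev, leverage
predict cook, cooksd

* list the five cases with the largest Cook's distance
gsort -cook
list obsno measwt measht reptwt reptht rstu lev cook in 1/5
```

Looking at the raw values of the listed cases, and not only at the diagnostics, is usually what reveals what went wrong with an observation.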

**2**. Using the data from the last exercise, what measure would you use if
you want to know how much an observation changes the coefficient
of a predictor? For
example, show how much the coefficient of the predictor **reptht**
would change if we omitted observation 12 from our regression analysis. What other
measures would you use to assess the influence of an observation on
the regression? What are their cut-off values?
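The commands below sketch one way to approach this question. The names `dfb_reptht`, `cook` and `dfit` are illustrative, and the sketch assumes the data are still in their original order, so that "observation 12" is the twelfth row of the file.

```stata
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/davis, clear
regress measwt measht reptwt reptht

* DFBETA for reptht: change in its coefficient when a case is dropped,
* expressed in units of the coefficient's standard error
predict dfb_reptht, dfbeta(reptht)

* overall influence measures
predict cook, cooksd
predict dfit, dfits

list dfb_reptht cook dfit in 12

* the raw change can be seen by refitting without observation 12
* and comparing the reptht coefficient with the full-sample fit
regress measwt measht reptwt reptht if _n != 12
```

The `predict` calls come before the second `regress` on purpose: `predict` always uses the most recently fitted model.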

**3**. The following data file is
called **bbwt.dta** and it is from Weisberg’s Applied Regression Analysis.
You can obtain it from within Stata by typing **use http://www.ats.ucla.edu/stat/stata/webbooks/reg/bbwt**.
It consists of the body weights and brain weights of some 60 animals. We want to predict the brain weight by body
weight, that is, a simple linear regression of brain weight against body
weight. Show what you have to do to verify the linearity assumption.
If you think that it violates the linearity assumption, show some possible remedies that you
would consider.

    use http://www.ats.ucla.edu/stat/stata/webbooks/reg/bbwt, clear
    regress brainwt bodywt

      Source |       SS       df       MS                  Number of obs =      62
    ---------+------------------------------               F(  1,    60) =  411.12
       Model |  46067326.8     1  46067326.8               Prob > F      =  0.0000
    Residual |  6723217.18    60   112053.62               R-squared     =  0.8726
    ---------+------------------------------               Adj R-squared =  0.8705
       Total |  52790543.9    61  865418.753               Root MSE      =  334.74

    ------------------------------------------------------------------------------
     brainwt |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
      bodywt |   .9664599   .0476651     20.276   0.000       .8711155    1.061804
       _cons |   91.00865   43.55574      2.089   0.041       3.884201    178.1331
    ------------------------------------------------------------------------------
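A graphical check is a natural first step for the linearity question. The following is a sketch under assumptions: it uses the `twoway` graph syntax (Stata 8 or later), the variable names `lbrain` and `lbody` are invented for illustration, and the log transformation is only one remedy you might consider, not necessarily the best one.

```stata
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/bbwt, clear

* scatterplot with the linear fit and a lowess smoother overlaid
twoway (scatter brainwt bodywt) (lfit brainwt bodywt) (lowess brainwt bodywt)

* residuals against fitted values for the original model
regress brainwt bodywt
rvfplot, yline(0)

* one candidate remedy: take logs of both variables and check again
generate lbrain = ln(brainwt)
generate lbody  = ln(bodywt)
twoway (scatter lbrain lbody) (lfit lbrain lbody) (lowess lbrain lbody)
regress lbrain lbody
rvfplot, yline(0)
```

If the smoother tracks the straight line closely and the residual plot shows no pattern, the linear form is plausible on that scale.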

**4**. We did a regression analysis using the data file **elemapi2** in chapter 2. Continuing that analysis, we ran avplot
as shown here. Explain what an avplot is and what type of information you would
get from the plot. If the variable **full** were put in the model, would it be a
significant predictor?

    use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
    regress api00 meals ell emer

      Source |       SS       df       MS                  Number of obs =     400
    ---------+------------------------------               F(  3,   396) =  673.00
       Model |  6749782.75     3  2249927.58               Prob > F      =  0.0000
    Residual |  1323889.25   396  3343.15467               R-squared     =  0.8360
    ---------+------------------------------               Adj R-squared =  0.8348
       Total |  8073672.00   399  20234.7669               Root MSE      =   57.82

    ------------------------------------------------------------------------------
       api00 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
       meals |  -3.159189   .1497371    -21.098   0.000      -3.453568   -2.864809
         ell |  -.9098732   .1846442     -4.928   0.000      -1.272878   -.5468678
        emer |  -1.573496    .293112     -5.368   0.000      -2.149746   -.9972456
       _cons |   886.7033    6.25976    141.651   0.000       874.3967    899.0098
    ------------------------------------------------------------------------------

    avplot full, mlabel(snum)

**5**. The data set **wage.dta** is from a national sample of 6000 households
with a male head earning less than $15,000 annually in 1966. You can get this
data file from within Stata by typing **use http://www.ats.ucla.edu/stat/stata/webbooks/reg/wage**.
The data were classified
into 39 demographic groups for analysis.
We tried to predict the average hours worked from the average age of respondents and the average yearly non-earned income.

    use http://www.ats.ucla.edu/stat/stata/webbooks/reg/wage, clear
    regress HRS AGE NEIN

      Source |       SS       df       MS                  Number of obs =      39
    ---------+------------------------------               F(  2,    36) =   39.72
       Model |  107205.109     2  53602.5543               Prob > F      =  0.0000
    Residual |  48578.1222    36  1349.39228               R-squared     =  0.6882
    ---------+------------------------------               Adj R-squared =  0.6708
       Total |  155783.231    38   4099.5587               Root MSE      =  36.734

    ------------------------------------------------------------------------------
         HRS |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
         AGE |  -8.281632   1.603736     -5.164   0.000      -11.53416   -5.029104
        NEIN |   .4289202   .0484882      8.846   0.000       .3305816    .5272588
       _cons |    2321.03   57.55038     40.330   0.000       2204.312    2437.748
    ------------------------------------------------------------------------------

Both predictors are significant. Now if we add ASSET to our list of predictors, neither NEIN nor ASSET is significant.

    regress HRS AGE NEIN ASSET

      Source |       SS       df       MS                  Number of obs =      39
    ---------+------------------------------               F(  3,    35) =   25.83
       Model |   107317.64     3  35772.5467               Prob > F      =  0.0000
    Residual |  48465.5908    35  1384.73117               R-squared     =  0.6889
    ---------+------------------------------               Adj R-squared =  0.6622
       Total |  155783.231    38   4099.5587               Root MSE      =  37.212

    ------------------------------------------------------------------------------
         HRS |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
         AGE |  -8.007181    1.88844     -4.240   0.000      -11.84092   -4.173443
        NEIN |   .3338277    .337171      0.990   0.329      -.3506658    1.018321
       ASSET |   .0044232    .015516      0.285   0.777       -.027076    .0359223
       _cons |   2314.054   63.22636     36.600   0.000       2185.698    2442.411
    ------------------------------------------------------------------------------

Can you explain why?
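A reasonable first check when standard errors inflate after adding a predictor is how strongly the predictors are related to one another. The sketch below uses the `estat` form of the variance inflation factor command; this assumes Stata 9 or later, while earlier releases use the bare command `vif` after `regress`.

```stata
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/wage, clear

* pairwise correlation between the two predictors in question
correlate NEIN ASSET

* variance inflation factors for the three-predictor model
regress HRS AGE NEIN ASSET
estat vif
```

Comparing the standard error of NEIN in the two regressions above with the output of these commands should suggest an explanation.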

**6.** Continue to use the previous data set.
This time we want to predict the average hourly wage from the average percent of white
respondents.
Carry out the regression analysis and list the Stata commands that you can use to check for
heteroscedasticity. Explain the result of your test(s).

Now we want to build another model to predict the average percent of white respondents from the average hours worked. Repeat the analysis you performed on the previous regression model. Explain your results.
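The checks for both models can follow the same pattern, sketched below. Note the assumptions: the exercise does not name the variables, so `RATE` (average hourly wage) and `RACE` (average percent of white respondents) are guesses that should be confirmed with `describe` before running anything else; `HRS` appears in the regressions of the previous exercise. The `estat` commands assume Stata 9 or later.

```stata
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/wage, clear

* confirm the variable names first; RATE and RACE below are assumed
describe

* first model: average hourly wage on percent white
regress RATE RACE
rvfplot, yline(0)
estat hettest
estat imtest, white

* second model: percent white on average hours worked
regress RACE HRS
rvfplot, yline(0)
estat hettest
estat imtest, white
```

The residual-versus-fitted plot gives a visual impression, while `estat hettest` (Breusch-Pagan / Cook-Weisberg) and `estat imtest, white` (White's test) give formal tests of constant error variance.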

**7**. We have a data set that consists of volume, diameter and height
of some objects. Someone did a regression of volume on diameter and height.

    use http://www.ats.ucla.edu/stat/stata/webbooks/reg/tree, clear
    regress vol dia height

      Source |       SS       df       MS                  Number of obs =      31
    ---------+------------------------------               F(  2,    28) =  254.97
       Model |  7684.16254     2  3842.08127               Prob > F      =  0.0000
    Residual |  421.921306    28  15.0686181               R-squared     =  0.9480
    ---------+------------------------------               Adj R-squared =  0.9442
       Total |  8106.08385    30  270.202795               Root MSE      =  3.8818

    ------------------------------------------------------------------------------
         vol |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
         dia |   4.708161   .2642646     17.816   0.000       4.166839    5.249482
      height |   .3392513   .1301512      2.607   0.014       .0726487    .6058538
       _cons |  -57.98766   8.638225     -6.713   0.000      -75.68226   -40.29306
    ------------------------------------------------------------------------------

Explain what tests you can use to detect model specification errors and, if there are any, how you would correct them.
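Two commonly used specification checks can be run as sketched below. The squared term `dia2` is a hypothetical illustration of one kind of correction to try, not a claim about the right model for these data, and `estat ovtest` assumes Stata 9 or later (earlier releases use `ovtest`).

```stata
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/tree, clear
regress vol dia height

* link test: a significant _hatsq term suggests misspecification
linktest

* Ramsey RESET test for omitted variables; this reruns the original
* model first, because linktest fits its own auxiliary regression
quietly regress vol dia height
estat ovtest

* one candidate correction: add a squared diameter term and retest
generate dia2 = dia^2
regress vol dia dia2 height
linktest
```

If the tests stop signalling a problem after the model is changed, that supports the revised specification, although subject-matter reasoning about how volume relates to diameter and height should guide the choice.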