Sometimes your research hypothesis predicts that the size of a
regression coefficient varies across groups. For example, you might believe that the
regression coefficient of **height** predicting **weight** would
differ across three age groups (young, middle aged, senior citizen). Below, we have a data file
with 10 fictional young people, 10 fictional middle-aged people, and 10 fictional
senior citizens, along with their **height** in inches and their **weight**
in pounds. The variable **age** indicates the age group and is coded 1 for
young people, 2 for middle aged, and 3 for senior citizens.

    DATA htwt;
      INPUT id age height weight ;
    CARDS;
     1 1 56 140
     2 1 60 155
     3 1 64 143
     4 1 68 161
     5 1 72 139
     6 1 54 159
     7 1 62 138
     8 1 65 121
     9 1 65 161
    10 1 70 145
    11 2 56 117
    12 2 60 125
    13 2 64 133
    14 2 68 141
    15 2 72 149
    16 2 54 109
    17 2 62 128
    18 2 65 131
    19 2 65 131
    20 2 70 145
    21 3 64 211
    22 3 68 223
    23 3 72 235
    24 3 76 247
    25 3 80 259
    26 3 62 201
    27 3 69 228
    28 3 74 245
    29 3 75 241
    30 3 82 269
    ;
    RUN;

We analyze the data separately for each age group using **proc reg** with a **by** statement, as shown below.

    PROC REG DATA=htwt;
      BY age ;
      MODEL weight = height ;
    RUN;
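BY-group processing expects the input data set to be sorted (or indexed) by the BY variable. The data above happen to be entered in age order already, but for data that are not, a sort step along the following lines would be needed first. This is a minimal sketch that assumes the **htwt** data set created above.

```sas
/* Assumes the htwt data set created above.
   BY-group processing expects the data to be sorted by the BY variable. */
PROC SORT DATA=htwt;
  BY age;
RUN;

/* Same by-group regression as above, now safe for unsorted input. */
PROC REG DATA=htwt;
  BY age;
  MODEL weight = height;
RUN;
```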

The parameter estimates (coefficients) for the young,
middle aged, and senior citizens are shown below. The results do seem to suggest
that **height** is a stronger predictor of **weight** for
seniors (3.19) than for the middle aged (2.10). The results also seem to suggest that
**height**
does not predict **weight** as strongly for the young (-0.38) as for the
middle aged and seniors. However, we would need to perform specific significance tests to
be able to make claims about the differences among these regression coefficients.

    AGE=1

                   Parameter      Standard    T for H0:
    Variable  DF    Estimate         Error   Parameter=0   Prob > |T|
    INTERCEP   1  170.166445   49.43018216         3.443       0.0088
    HEIGHT     1   -0.376831    0.77433413        -0.487       0.6396

    AGE=2

                   Parameter      Standard    T for H0:
    Variable  DF    Estimate         Error   Parameter=0   Prob > |T|
    INTERCEP   1   -2.397470    7.05327189        -0.340       0.7427
    HEIGHT     1    2.095872    0.11049098        18.969       0.0001

    AGE=3

                   Parameter      Standard    T for H0:
    Variable  DF    Estimate         Error   Parameter=0   Prob > |T|
    INTERCEP   1    5.601677    8.93019669         0.627       0.5480
    HEIGHT     1    3.189727    0.12323669        25.883       0.0001

We can compare the regression coefficients among these three age groups to test the null hypothesis

$H_0: B_1 = B_2 = B_3$

where $B_1$ is the regression coefficient for the young, $B_2$ is the regression coefficient for the middle aged, and $B_3$ is the regression coefficient for senior citizens. To do this analysis, we first make a dummy variable called **age1** that is coded 1 if young (age=1), 0 otherwise, and **age2** that is coded 1 if middle aged (age=2), 0 otherwise. We also create **age1ht** that is **age1** times **height**, and **age2ht** that is **age2** times **height**.

    data htwt2;
      set htwt;
      age1 = . ;
      age2 = . ;
      IF age = 1 then age1 = 1; ELSE age1 = 0 ;
      IF age = 2 then age2 = 1; ELSE age2 = 0 ;
      age1ht = age1*height ;
      age2ht = age2*height ;
    RUN;

We can now use **age1**, **age2**, **height**, **age1ht**, and **age2ht** as predictors
in the regression equation in **proc reg** below. In the **proc reg**
we use the

TEST age1ht=0, age2ht=0;

statement to test the null hypothesis

$H_0: B_1 = B_2 = B_3$

This test will have two degrees of freedom because it compares among three regression coefficients.
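To see why setting the two product terms to zero is the same as equal slopes, it may help to write out the model implied by the dummy coding. The coefficient names $b_0, \dots, b_5$ are notation introduced here for illustration; they do not appear in the SAS output.

$$
\text{weight} = b_0 + b_1\,\text{age1} + b_2\,\text{age2} + b_3\,\text{height} + b_4\,\text{age1ht} + b_5\,\text{age2ht} + e
$$

Substituting the 0/1 codes for each group gives the slope of **height** within that group:

$$
B_1 = b_3 + b_4, \qquad B_2 = b_3 + b_5, \qquad B_3 = b_3
$$

so $B_1 = B_2 = B_3$ holds exactly when $b_4 = 0$ and $b_5 = 0$. Because this model gives each group its own intercept and slope, the by-group estimates shown earlier imply what the product-term coefficients should be:

$$
b_4 = B_1 - B_3 = -0.376831 - 3.189727 = -3.566558, \qquad
b_5 = B_2 - B_3 = 2.095872 - 3.189727 = -1.093855
$$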

    PROC REG DATA=htwt2 ;
      MODEL weight = age1 age2 height age1ht age2ht ;
      TEST age1ht=0, age2ht=0 ;
    RUN;

The output below shows that the null hypothesis

$H_0: B_1 = B_2 = B_3$

can be rejected (**F=17.29, p = 0.0001**). This means that the
regression coefficients of **height** predicting **weight** do
indeed significantly differ across the 3 age groups (young, middle aged, senior citizen).

    Model: MODEL1
    Dependent Variable: WEIGHT

                         Analysis of Variance

                            Sum of          Mean
    Source          DF     Squares        Square    F Value    Prob>F
    Model            5  69595.35464  13919.07093    220.261    0.0001
    Error           24   1516.64536     63.19356
    C Total         29  71112.00000

    Root MSE     7.94944    R-square    0.9787
    Dep Mean   171.00000    Adj R-sq    0.9742
    C.V.         4.64879

                         Parameter Estimates

                   Parameter      Standard    T for H0:
    Variable  DF    Estimate         Error   Parameter=0   Prob > |T|
    INTERCEP   1    5.601677   29.48853690         0.190       0.8509
    AGE1       1  164.564768   41.55490307         3.960       0.0006
    AGE2       1   -7.999147   41.55490307        -0.192       0.8490
    HEIGHT     1    3.189727    0.40694172         7.838       0.0001
    AGE1HT     1   -3.566558    0.61316088        -5.817       0.0001
    AGE2HT     1   -1.093855    0.61316088        -1.784       0.0871

    Dependent Variable: WEIGHT
    Test:   Numerator:    1092.7718   DF:    2   F value:  17.2925
            Denominator:   63.19356   DF:   24   Prob>F:    0.0001
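The overall test does not say which slopes differ. As a hedged sketch, pairwise comparisons of the slopes could be requested in the same **proc reg** step by adding one **test** statement per comparison. The statement labels (yng_sen, mid_sen, yng_mid) are names made up here, and no adjustment for multiple comparisons is applied.

```sas
/* Assumes htwt2 (with age1, age2, age1ht, age2ht) as created above.
   Seniors are the reference group, so each product term is a slope
   difference from seniors. Labels before TEST are arbitrary. */
PROC REG DATA=htwt2;
  MODEL weight = age1 age2 height age1ht age2ht;
  yng_sen: TEST age1ht = 0;            /* young vs. senior slope        */
  mid_sen: TEST age2ht = 0;            /* middle aged vs. senior slope  */
  yng_mid: TEST age1ht - age2ht = 0;   /* young vs. middle aged slope   */
RUN;
```

The first two tests are equivalent to the t tests for AGE1HT and AGE2HT in the parameter estimates above. The third comparison is not available from that table and needs its own **test** statement.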

It is also possible to run such an analysis in **proc glm**, using syntax as shown below. Instead of using a
**test**
statement, the **contrast** statement is used to test the null hypothesis

$H_0: B_1 = B_2 = B_3$

The contrast statement uses the comma to join together what would have been two separate one degree of freedom tests into a single two degree of freedom test that tests the null hypothesis above.

    PROC GLM DATA=htwt2 ;
      CLASS age ;
      MODEL weight = age height age*height / SOLUTION ;
      CONTRAST 'test equal slopes' age*height 1 -1  0,
                                   age*height 0  1 -1 ;
    RUN;

If you compare the **contrast** output from **proc glm** (labeled **test equal slopes**, found below) with the
output of the **test** statement from **proc reg** above, you will see that the F
values and p values are the same. This is because these two tests are equivalent.

    General Linear Models Procedure
    Class Level Information

    Class    Levels    Values
    AGE           3    1 2 3

    Number of observations in data set = 30

    General Linear Models Procedure

    Dependent Variable: WEIGHT
                                 Sum of            Mean
    Source             DF       Squares          Square   F Value   Pr > F
    Model               5  69595.354644    13919.070929    220.26   0.0001
    Error              24   1516.645356       63.193557
    Corrected Total    29  71112.000000

    R-Square        C.V.     Root MSE    WEIGHT Mean
    0.978672    4.648794    7.9494375      171.00000

    Source             DF     Type I SS     Mean Square   F Value   Pr > F
    AGE                 2  64350.600000    32175.300000    509.15   0.0001
    HEIGHT              1   3059.211075     3059.211075     48.41   0.0001
    HEIGHT*AGE          2   2185.543569     1092.771784     17.29   0.0001

    Source             DF   Type III SS     Mean Square   F Value   Pr > F
    AGE                 2  1395.9046778     697.9523389     11.04   0.0004
    HEIGHT              1  2597.0189017    2597.0189017     41.10   0.0001
    HEIGHT*AGE          2  2185.5435689    1092.7717845     17.29   0.0001

    Contrast           DF   Contrast SS     Mean Square   F Value   Pr > F
    test equal slopes   2  2185.5435689    1092.7717845     17.29   0.0001

                                      T for H0:    Pr > |T|   Std Error of
    Parameter           Estimate    Parameter=0                 Estimate
    INTERCEPT          5.6016771 B         0.19      0.8509    29.48853690
    AGE        1     164.5647676 B         3.96      0.0006    41.55490307
               2      -7.9991472 B        -0.19      0.8490    41.55490307
               3       0.0000000 B          .         .          .
    HEIGHT             3.1897275 B         7.84      0.0001     0.40694172
    HEIGHT*AGE 1      -3.5665584 B        -5.82      0.0001     0.61316088
               2      -1.0938553 B        -1.78      0.0871     0.61316088
               3       0.0000000 B          .         .          .

    NOTE: The X'X matrix has been found to be singular and a generalized inverse
          was used to solve the normal equations. Estimates followed by the
          letter 'B' are biased, and are not unique estimators of the parameters.
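The solution above reports the senior slope (HEIGHT) and the other groups' slopes only as differences from it. As a sketch, the slope for each age group could be requested directly with **estimate** statements, each adding the HEIGHT coefficient to that group's age*height term. The labels are arbitrary names chosen here.

```sas
/* Assumes htwt2 as created above. ESTIMATE labels are arbitrary.
   Each slope = HEIGHT coefficient + that group's age*height coefficient;
   the three age*height values follow the level order 1, 2, 3. */
PROC GLM DATA=htwt2;
  CLASS age;
  MODEL weight = age height age*height / SOLUTION;
  ESTIMATE 'slope: young'       height 1 age*height 1 0 0;
  ESTIMATE 'slope: middle aged' height 1 age*height 0 1 0;
  ESTIMATE 'slope: senior'      height 1 age*height 0 0 1;
RUN;
QUIT;
```

Because the model fits a separate intercept and slope for every group, these estimates should reproduce the by-group slopes reported earlier (about -0.38, 2.10, and 3.19), although their standard errors will differ because the error variance is pooled across groups.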

You might notice that the null hypothesis that we are testing

$H_0: B_1 = B_2 = B_3$

is similar to the null hypothesis that you might test using ANOVA to compare the means of the three groups,

$H_0: \mu_1 = \mu_2 = \mu_3$

In ANOVA, you can get an overall F test testing the null hypothesis. In addition to
that overall test, you could perform planned comparisons among the three groups.
So far we have seen how to do an overall test of the equality of the three regression
coefficients, and now we will test planned comparisons among the regression coefficients.
Below, we show how you can perform two such tests using the **contrast**
statement in **proc glm**. The first **contrast** compares the
regression coefficients of the **middle aged** vs. **senior**.

$H_0: B_2 = B_3$

The second **contrast** compares the regression coefficients of the **young**
vs. **middle aged** and **seniors**.

$H_0: B_1 = (B_2 + B_3)/2$

    PROC GLM DATA=htwt2 ;
      CLASS age ;
      MODEL weight = age height age*height ;
      CONTRAST 'Mid Age vs. Sen.  ' age*height  0 1 -1 ;
      CONTRAST 'Yng vs (Mid & Sen)' age*height -2 1  1 ;
    RUN;
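Each set of contrast coefficients is a set of weights applied to the three slopes, in the order of the **age** levels (1, 2, 3). Because the weights in each contrast sum to zero, the common HEIGHT term cancels and only the slope differences are tested. Written out for the two contrasts above:

$$
(0,\ 1,\ -1):\quad 0 \cdot B_1 + 1 \cdot B_2 - 1 \cdot B_3 = 0 \iff B_2 = B_3
$$

$$
(-2,\ 1,\ 1):\quad -2 B_1 + B_2 + B_3 = 0 \iff B_1 = (B_2 + B_3)/2
$$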

The output from the **contrast** statements indicates that the
regression coefficients for the middle aged and seniors do not significantly
differ (F=3.18, p=0.0871). The second contrast was significant (F=29.96,
p=0.0001), indicating that the regression coefficient for the young differs from the
average of the coefficients for the middle aged and seniors.

    General Linear Models Procedure
    Class Level Information

    Class    Levels    Values
    AGE           3    1 2 3

    Number of observations in data set = 30

    General Linear Models Procedure

    Dependent Variable: WEIGHT
                                 Sum of            Mean
    Source             DF       Squares          Square   F Value   Pr > F
    Model               5  69595.354644    13919.070929    220.26   0.0001
    Error              24   1516.645356       63.193557
    Corrected Total    29  71112.000000

    R-Square        C.V.     Root MSE    WEIGHT Mean
    0.978672    4.648794    7.9494375      171.00000

    Source             DF     Type I SS     Mean Square   F Value   Pr > F
    AGE                 2  64350.600000    32175.300000    509.15   0.0001
    HEIGHT              1   3059.211075     3059.211075     48.41   0.0001
    HEIGHT*AGE          2   2185.543569     1092.771784     17.29   0.0001

    Source             DF   Type III SS     Mean Square   F Value   Pr > F
    AGE                 2  1395.9046778     697.9523389     11.04   0.0004
    HEIGHT              1  2597.0189017    2597.0189017     41.10   0.0001
    HEIGHT*AGE          2  2185.5435689    1092.7717845     17.29   0.0001

    Contrast            DF   Contrast SS     Mean Square   F Value   Pr > F
    Mid Age vs. Sen.     1   201.1146303     201.1146303      3.18   0.0871
    Yng vs (Mid & Sen)   1  1893.2074903    1893.2074903     29.96   0.0001

We can do the exact same analysis in **proc reg**
by coding **age1** and **age2** like the coding shown in the **contrast**
statements above. We will create **age1** that will be:

     0 for young
     1 for middle age
    -1 for senior

and we will create **age2** that will be:

    -2 for young
     1 for middle age
     1 for senior

The significance tests in **proc reg**
below for **age1ht** and **age2ht** will correspond to the
**contrast**
statements we used in **proc glm** above.

    data htwt3;
      set htwt;
      age1 = . ;
      age2 = . ;
      IF age = 1 then age1 = 0;
      IF age = 2 then age1 = 1;
      IF age = 3 then age1 = -1;
      IF age = 1 then age2 = -2;
      IF age = 2 then age2 = 1;
      IF age = 3 then age2 = 1;
      age1ht = age1*height ;
      age2ht = age2*height ;
    RUN;

    PROC REG DATA=htwt3 ;
      MODEL weight = age1 age2 height age1ht age2ht ;
    RUN;

The results below correspond to the **proc glm**
results above, except that the **proc glm** results are reported as F values
and the **proc reg** results are reported as t values. We can square the t
values to make them comparable to the F values. Indeed, for the comparison of middle aged
vs. seniors, the t value of -1.784 when squared becomes 3.183, which matches the F value
of 3.18 from **proc glm**. Likewise, for the comparison of young vs. middle aged and
seniors, the t value from **proc reg** is 5.473 and when squared becomes
29.954, which matches the F value of 29.96 from **proc glm** up to rounding of the printed t value.

    Model: MODEL1
    Dependent Variable: WEIGHT

                         Analysis of Variance

                            Sum of          Mean
    Source          DF     Squares        Square    F Value    Prob>F
    Model            5  69595.35464  13919.07093    220.261    0.0001
    Error           24   1516.64536     63.19356
    C Total         29  71112.00000

    Root MSE     7.94944    R-square    0.9787
    Dep Mean   171.00000    Adj R-sq    0.9742
    C.V.         4.64879

                         Parameter Estimates

                   Parameter      Standard    T for H0:
    Variable  DF    Estimate         Error   Parameter=0   Prob > |T|
    INTERCEP   1   57.790217   16.94450462         3.411       0.0023
    AGE1       1   -3.999574   20.77745154        -0.192       0.8490
    AGE2       1  -56.188114   11.96726393        -4.695       0.0001
    HEIGHT     1    1.636256    0.25524084         6.411       0.0001
    AGE1HT     1   -0.546928    0.30658044        -1.784       0.0871
    AGE2HT     1    1.006544    0.18389498         5.473       0.0001
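With this contrast coding, the coefficients themselves are scaled versions of the comparisons rather than raw slope differences. As an illustrative check, write $b_3$, $b_4$, and $b_5$ (notation introduced here, not part of the SAS output) for the coefficients of **height**, **age1ht**, and **age2ht**. Substituting the codes for each group gives

$$
B_1 = b_3 - 2 b_5, \qquad B_2 = b_3 + b_4 + b_5, \qquad B_3 = b_3 - b_4 + b_5
$$

and solving for the coefficients, then plugging in the by-group slopes reported earlier:

$$
b_3 = \frac{B_1 + B_2 + B_3}{3} = \frac{-0.376831 + 2.095872 + 3.189727}{3} \approx 1.636256
$$

$$
b_4 = \frac{B_2 - B_3}{2} = \frac{2.095872 - 3.189727}{2} \approx -0.546928
$$

$$
b_5 = \frac{1}{3}\left(\frac{B_2 + B_3}{2} - B_1\right) = \frac{2.642800 + 0.376831}{3} \approx 1.006544
$$

These agree with the HEIGHT, AGE1HT, and AGE2HT estimates above. The rescaling changes the size of each coefficient and its standard error by the same factor, so the t values and p values still test the two planned comparisons.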