## 1.0 Stata commands in this unit

| Command | Description |
|---|---|
| `ttest` | t-test |
| `anova` | Analysis of variance |
| `xi` | Creates dummy variables during model estimation |
| `regress` | Regression |
| `predict` | Predicts after model estimation |
| `kdensity` | Kernel density estimates and graphs |
| `pnorm` | Graphs a standardized normal plot |
| `qnorm` | Graphs a quantile plot |
| `rvfplot` | Graphs a residual versus fitted plot |
| `test` | Tests linear hypotheses after model estimation |
| `logit` | Logistic regression |
| `tabulate` | Crosstabs with chi-square test |
| `signtest` | Tests the equality of matched pairs of data |
| `signrank` | Wilcoxon matched-pairs signed-rank test |
| `ranksum` | Mann-Whitney two-sample test |
| `kwallis` | Nonparametric analog to the one-way anova |

## 2.0 Demonstration and explanation

use hs1, clear

## 2.1 chi-square test of frequencies

Here is the `tabulate` command for a crosstabulation, with an option to compute the chi-square test of independence and measures of association.

tabulate prgtype ses, all

Here is the command with an option to display expected frequencies so that one can check for cells with very small expected values.

tabulate prgtype ses, all expected
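The `all` option prints several statistics at once. As a minimal sketch, assuming `hs1` is still loaded, the commands below request only the Pearson chi-square, show how the stored results can be reused, and fall back to Fisher's exact test, which is one common choice when some expected counts are small.

```stata
* Pearson chi-square only, with expected frequencies in each cell
tabulate prgtype ses, chi2 expected

* tabulate leaves the test statistic and p-value in r()
display "chi2 = " r(chi2) "   p = " r(p)

* Fisher's exact test; this can be slow for large tables
tabulate prgtype ses, exact
```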

## 2.2 t-tests

This is the one-sample t-test, testing whether the sample of writing scores was drawn from a population with a mean of 50.

ttest write = 50

This is the paired t-test, testing whether or not the mean of `write` equals the mean of `read`.

ttest write = read

This is the two-sample independent t-test with pooled (equal) variances.

ttest write, by(female)

This is the two-sample independent t-test with separate (unequal) variances.

ttest write, by(female) unequal
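The sketch below illustrates two points about the tests above, assuming `hs1` is loaded. The paired t-test is the same as a one-sample t-test on the differences, and `sdtest` offers one way to compare the group variances before choosing between the pooled and unequal-variance forms. The variable name `wr_diff` is made up for this illustration.

```stata
* Paired t-test expressed as a one-sample test on the differences
* (wr_diff is a hypothetical name used only here)
generate wr_diff = write - read
ttest wr_diff == 0

* Variance-ratio test for the two groups defined by female
sdtest write, by(female)

* ttest leaves its results in r(), so they can be displayed or reused
ttest write, by(female)
display "t = " r(t) "   df = " r(df_t) "   two-sided p = " r(p)
```

The first `ttest` should report the same t statistic and p-value as `ttest write = read`.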

## 2.3 Analysis of Variance

The `anova` command, unsurprisingly, performs analysis of variance (ANOVA). Here is an example of a one-way analysis of variance.

anova write prog

In this example, the `anova` command is used to perform a two-way factorial analysis of variance.

anova write prog female prog*female

Here is an example of an analysis of covariance (ANCOVA) using the `anova` command.

anova write prog female prog*female read, continuous(read)
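The `prog*female` and `continuous()` forms above are the older `anova` syntax. Stata 11 and later use factor-variable notation instead, so on a newer release the same three models could be written roughly as follows. This is a sketch that assumes `prog` and `female` are numeric variables coded with nonnegative integers.

```stata
* One-way ANOVA (unchanged)
anova write prog

* Two-way factorial ANOVA: # denotes the interaction term
anova write prog female prog#female

* Shorthand: ## expands to both main effects plus the interaction
anova write prog##female

* ANCOVA: the c. prefix marks read as continuous
anova write prog##female c.read
```

On newer releases the older commands should still run under version control, for example by prefixing them with `version 10:`.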

## 2.4 Regression

Plain vanilla OLS linear regression.

regress write read female

In the example below, we run the regression with robust standard errors. These are useful when the variance of the errors is not constant (heteroskedasticity). The option changes the standard errors and the test statistics based on them, but it does not affect the estimates of the regression coefficients.

regress write read female, robust
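To see that only the standard errors change, the two fits can be stored and shown side by side. This is a small sketch; the names `ols` and `rob` are arbitrary labels chosen here, and `vce(robust)` is the longer spelling of the `robust` option.

```stata
* Ordinary standard errors
regress write read female
estimates store ols

* Robust standard errors
regress write read female, vce(robust)
estimates store rob

* Coefficients should match across columns; standard errors may differ
estimates table ols rob, b(%9.4f) se(%9.4f)
```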

The `predict` command calculates predictions, residuals, influence statistics, and the like after an estimation command. The default shown here is to calculate the predicted scores.

predict p

With the `resid` option, the `predict` command calculates the residuals.

predict r, resid

The `list` command displays the values of the variables that we have generated. The `in 1/20` qualifier stipulates that only the first 20 observations be displayed.

list math p r in 1/20
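A residual is the observed outcome minus the predicted score. The sketch below checks this directly after refitting the model. The variable names `yhat`, `ehat`, and `ehat_check` are made up here so that they do not clash with `p` and `r` above, and `double` storage is used to keep rounding error negligible.

```stata
regress write read female

* Fitted values (the default) and residuals, stored in double precision
predict double yhat
predict double ehat, residuals

* Recompute the residual by hand: observed minus fitted
generate double ehat_check = write - yhat

* Stops with an error if the two differ within the estimation sample
assert abs(ehat - ehat_check) < 1e-8 if e(sample)

list write yhat ehat in 1/5
```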

The `kdensity` command with the `normal` option displays a density graph of the residuals with a normal distribution superimposed on the graph. This is particularly useful for checking whether the residuals are approximately normally distributed, which is an important assumption behind the significance tests in regression.

kdensity r, normal

The `pnorm` command produces a normal probability plot, and it is another method of checking whether the residuals from the regression are normally distributed.

pnorm r

The `qnorm` command produces a normal quantile plot. It is yet another method for checking whether the residuals are normally distributed. The `qnorm` plot is more sensitive to deviations from normality in the tails of the distribution, whereas the `pnorm` plot is more sensitive to deviations near the mean of the distribution.

qnorm r
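The three plots are visual checks. If a formal test is wanted as a complement, Stata provides several; the sketch below assumes the residual variable `r` created by `predict r, resid` above is still in memory.

```stata
* Shapiro-Wilk test of normality
swilk r

* Skewness and kurtosis test of normality
sktest r
```

With large samples these tests can flag departures too small to matter in practice, so they are best read alongside the plots rather than instead of them.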

`rvfplot` is a convenience command that generates a plot of the residuals versus the fitted values; it is used after `regress` or `anova`.

rvfplot
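A horizontal reference line at zero makes patterns in the plot easier to see, and a Breusch-Pagan test is one formal complement to it. In this sketch the model is refit without the `robust` option, since `estat hettest` is not available after robust estimation.

```stata
regress write read female

* Residual-versus-fitted plot with a reference line at zero
rvfplot, yline(0)

* Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat hettest
```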

**Creating dummy variables by using the xi command**

The `xi` prefix is used to dummy code categorical variables such as `prog`. The predictor `prog` has three levels and requires two dummy-coded variables. The `test` command is used to test the collective effect of the two dummy-coded variables; in other words, it tests the main effect of `prog`.

xi: regress write read i.prog

describe _I*

test _Iprog_2 _Iprog_3

The `xi` prefix can also be used to create dummy variables for `prog` and for the interaction of `prog` and `read`. The first `test` command tests the overall interaction and the second `test` command tests the main effect of `prog`.

xi: regress write i.prog*read

describe _I*

test _IproXread_2 _IproXread_3

test _Iprog_2 _Iprog_3
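In Stata 11 and later, factor-variable notation makes the `xi` prefix unnecessary for models like these, and `testparm` tests a group of indicator coefficients without naming each generated variable. A sketch of the equivalent commands, assuming `prog` is a numeric variable with three levels:

```stata
* Main-effects model: i.prog creates the indicators on the fly
regress write read i.prog
testparm i.prog

* Interaction model: ## gives main effects plus the prog-by-read interaction
regress write i.prog##c.read

* Joint test of the interaction terms
testparm i.prog#c.read

* Joint test of the prog main-effect terms
testparm i.prog
```

No `_I*` variables are added to the dataset with this approach.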

## 2.5 Logistic regression

In order to demonstrate the logistic regression commands, we will create a dichotomous variable called `honcomp` (honors composition) to use as our dependent variable. This is purely for illustrative purposes.

gen honcomp = write >= 60

tab honcomp
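One caution with this kind of expression: Stata treats missing values as larger than any number, so `write >= 60` evaluates to 1 when `write` is missing. If `write` has any missing values, a guarded version such as the sketch below keeps them missing. The name `honcomp2` is made up here to avoid clashing with `honcomp`.

```stata
* Leave the indicator missing wherever write is missing
generate honcomp2 = write >= 60 if !missing(write)

* The missing option adds a row for missing values, if there are any
tabulate honcomp2, missing
```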

The `logistic` command defaults to producing the output in odds ratios but can display the coefficients if the `coef` option is used. The exact same results can be obtained by using the `logit` command, which produces coefficients as the default but will display the odds ratios if the `or` option is used.

logit honcomp read female

logit, or
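For comparison, here is the same model fit with `logistic`, followed by predicted probabilities from `predict`. This is a sketch that assumes `honcomp` was created as above; `phat` is a made-up variable name.

```stata
* Same model, reported as odds ratios by default
logistic honcomp read female

* Redisplay the same fit as coefficients
logistic, coef

* Predicted probability of honcomp == 1 (the default after logit/logistic)
predict phat
summarize phat
```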

## 2.6 Non-Parametric Tests

The `signtest` command is the nonparametric analog of the single-sample t-test.

signtest write = 50

The `signrank` command computes a Wilcoxon signed-rank test, the nonparametric analog of the paired t-test.

signrank write = read

The `ranksum` test is the nonparametric analog of the independent two-sample t-test and is known as the Mann-Whitney or Wilcoxon rank-sum test.

ranksum write, by(female)

The `kwallis` command computes a Kruskal-Wallis test, the nonparametric analog of the one-way ANOVA.

kwallis write, by(prog)
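Like the other tests in this unit, `kwallis` leaves its results in `r()`. As a small sketch, the ties-adjusted statistic and its p-value can be recovered as follows.

```stata
kwallis write, by(prog)

* Ties-adjusted chi-square and its degrees of freedom
display "chi2 (ties-adjusted) = " r(chi2_adj) "   df = " r(df)

* Upper-tail chi-square probability for that statistic
display "p = " chi2tail(r(df), r(chi2_adj))
```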

## 3.0 For more information

- **Statistics with Stata 10**: Chapters 5, 6, 7, 9, 10
- **Stata Web Books**
  - **Regression with Stata Webbook**: Includes such topics as diagnostics, categorical predictors, testing interactions and testing contrasts
- **Regression Models For Categorical Dependent Variables, Second Edition** by Long and Freese: Shows how to optimize Stata’s capabilities for analyzing logistic regression
- **Frequently Asked Questions**: Covers many topics, including ANOVA and linear regression