#### 1.0 SAS statements and procs in this unit

| Procedure | Use in this unit |
|---|---|
| proc ttest | t-tests, including one-sample, two-sample and paired |
| proc freq | Used here for chi-squared tests |
| proc reg | Simple and multiple regression |
| proc glm | Used here for ANOVA models |
| proc logistic | Logistic regression |
| proc npar1way | Non-parametric analyses |
| proc univariate | Used here for sign and signrank tests |

#### 2.0 Demonstration and explanation

#### 2.1 Chi-squared test

Below we use **proc freq** to perform a chi-squared test (**chisq**) and to show the expected frequencies (**expected**) used to compute the test statistic.

proc freq data=in.hs1; table prgtype*ses / chisq expected; run;
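For reference, the expected frequency printed in each cell by **expected** and the Pearson chi-squared statistic reported by **chisq** follow the standard textbook formulas. The symbols below are notation introduced here for illustration, not labels from the SAS output:

$$
E_{ij} = \frac{R_i \, C_j}{N}, \qquad \chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
$$

Here $O_{ij}$ is the observed count in row $i$ and column $j$, $R_i$ is the total for row $i$, $C_j$ is the total for column $j$, and $N$ is the overall number of observations. For a table with $r$ rows and $c$ columns, the statistic has $(r-1)(c-1)$ degrees of freedom.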

#### 2.2 T-tests

This is the one-sample t-test, testing whether the sample of writing scores was drawn from a population with a mean of 50.

proc ttest data=in.hs1 H0=50; var write; run;

This is the paired t-test, testing whether or not the mean of **write** equals the mean of **read**.

proc ttest data=in.hs1; paired write*read; run;
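A paired t-test is equivalent to a one-sample t-test on the within-pair differences, tested against a mean of 0. The sketch below illustrates this; the dataset name **hs1d** and the variable name **diff** are made up for this example and are not part of the original unit.

```sas
/* Sketch: reproduce the paired t-test as a one-sample t-test on differences. */
/* hs1d and diff are illustrative names chosen for this example.              */
data hs1d;
  set in.hs1;
  diff = write - read;   /* same direction as paired write*read */
run;

proc ttest data=hs1d h0=0;
  var diff;
run;
```

The t statistic, degrees of freedom and p-value should match those from the **paired** statement above.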

This is the two-sample independent t-test. The output includes the t-test for both equal and unequal variances. The **class** statement is necessary in order to indicate which groups/categories are to be compared on the mean of **write**.

proc ttest data=in.hs1; class female; var write; run;

#### 2.3 ANOVA

SAS has a procedure called **proc anova**, but it is only used when there is an equal number of observations in each of the ANOVA cells (which is called a balanced design). **Proc glm** is a much more general procedure that will work with any balanced or unbalanced design (unbalanced meaning an unequal number of observations in each cell).

In this example we are using **proc glm** to perform a one-way analysis of variance. As with **proc ttest**, the **class** statement is used to indicate that **prog** is a categorical variable. We use the **ss3** option to indicate that we are only interested in looking at the Type III sums of squares, which are the sums of squares that are appropriate for an unbalanced design.

proc glm data=in.hs1; class prog; model write=prog / ss3; run; quit;

Here **proc glm** performs an analysis of covariance (ANCOVA). In this example, **prog** is the categorical predictor and **read** is the continuous covariate.

proc glm data=in.hs1; class prog; model write = read prog / ss3; run; quit;

#### 2.4 Regression

In this example we will demonstrate how to set up an Ordinary Least Squares (OLS) regression model. **Proc reg** is a very powerful and versatile procedure. In the following examples we will illustrate just a few of the many uses of **proc reg**. *Note* that this procedure does not support a **class** statement. If you need to specify a variable as categorical, use **proc glm** instead.

proc reg data=in.hs1; model write = female read; run; quit;
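As a minimal sketch of the **proc glm** alternative for a categorical predictor, the model above could be extended with **prog**, assuming it is the same categorical program-type variable used in section 2.3. The **solution** option asks **proc glm** to print the parameter estimates, which **proc reg** shows by default.

```sas
/* Sketch: regression with a categorical predictor via proc glm.      */
/* Assumes prog is the categorical variable from the ANOVA examples. */
proc glm data=in.hs1;
  class prog;
  model write = female read prog / solution;
run;
quit;
```

With this parameterization, the last level of **prog** serves as the reference category, so its estimate is set to zero and the other levels are compared against it.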

If you are using SAS 9.3 or earlier, specifying **plots=diagnostics** on the **proc reg** statement produces a number of diagnostic graphs; version 9.4 provides these diagnostic plots by default. The **output** statement creates a new dataset, called **temp**, which includes the predicted values (by using the **p=** option) and the residuals (by using the **r=** option). The **proc print** step displays the values of selected variables for the first 20 observations of the **temp** dataset.

proc reg data=in.hs1 plots=diagnostics; model math = write socst; output out=temp p=predict r=resid; run; quit;

proc print data=temp (obs=20); var math predict resid; run;

#### 2.5 Logistic regression

In order to demonstrate logistic regression, we will create a dichotomous variable called **honcomp** (honors composition), which will be equal to 1 when the logical test **write** >= 60 is true and equal to 0 when it is not. This variable is created purely for illustration.

data hs2; set in.hs1; honcomp = (write >= 60); run;
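One caveat with this shortcut: SAS treats a missing numeric value as smaller than any number, so a missing **write** would make the test false and set **honcomp** to 0 rather than to missing. Whether **write** has missing values in this dataset is not stated here, so the following is only a defensive sketch, together with a quick frequency check of the new variable.

```sas
/* Sketch: create honcomp but leave it missing when write is missing. */
data hs2;
  set in.hs1;
  if not missing(write) then honcomp = (write >= 60);
run;

/* Check the coding; the missing option lists missing values as a level. */
proc freq data=hs2;
  tables honcomp / missing;
run;
```

If **write** has no missing values, this produces the same **hs2** dataset as the one-line version above.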

**Proc logistic** performs a logistic regression. By default, it models the probability of the lower-ordered response value, which is 0 for a variable coded 0/1. When 1 represents the event whose probability is being modeled, include the **descending** option so that the coefficients and odds ratios are calculated for the comparison of interest.

proc logistic data=hs2 descending; model honcomp = female read; run;
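An alternative to **descending** is to name the event level directly in the **model** statement with the **event=** response option. The sketch below should fit the same model as the example above, and it makes the modeled outcome explicit in the code.

```sas
/* Sketch: model the probability that honcomp = 1 by naming the event level. */
proc logistic data=hs2;
  model honcomp(event='1') = female read;
run;
```

Either way, the "Response Profile" section of the output states which level's probability is being modeled, which is worth checking before interpreting the odds ratios.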

#### 2.6 Nonparametric tests

The **sign** test is the nonparametric analog of the one-sample t-test. It is part of the output of the tests for location in **proc univariate**. The value that is being tested is specified by the **mu0** option on the **proc univariate** statement.

proc univariate data=in.hs1 mu0=50; var write; run;

The **signrank** test is the nonparametric analog of the paired t-test. To obtain this test, it is necessary to first compute the difference between the variables to be compared in a separate data step. Then the new difference variable is tested in **proc univariate**. The **signrank** test is found in the section of the output called “Tests for Location”.

data hs1c; set in.hs1; diff = read - write; run; proc univariate data=hs1c; var diff; run;

The Wilcoxon **rank-sum** test (also known as the Mann-Whitney test) is the nonparametric analog of the independent two-sample t-test.

proc npar1way data=in.hs1; class female; var write; run;

The **Kruskal-Wallis** test is the nonparametric analog of the one-way ANOVA.

proc npar1way data=in.hs1; class ses; var write; run;
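Without options, **proc npar1way** prints several different analyses. As a sketch, adding the **wilcoxon** option to the **proc npar1way** statement restricts the output to the analysis of Wilcoxon scores, which is where both tests above are found: the two-sample rank-sum test when the **class** variable has two levels, and the Kruskal-Wallis test in either case.

```sas
/* Sketch: request only the Wilcoxon scores analysis. */

/* Two groups: Wilcoxon rank-sum test */
proc npar1way data=in.hs1 wilcoxon;
  class female;
  var write;
run;

/* More than two groups: Kruskal-Wallis test */
proc npar1way data=in.hs1 wilcoxon;
  class ses;
  var write;
run;
```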

#### 3.0 For more information

- The Little SAS Book, Fifth Edition
  - Chapter 9
- SAS Statistics by Example
  - Chapters 4-12
- **Regression and ANOVA: An Integrated Approach Using SAS Software**
- **SAS System for Linear Models, Fourth Edition**
- **Logistic Regression Using the SAS System: Theory and Application**
- **Logistic Regression Examples Using the SAS System**
- **Choosing the Correct Statistical Test**: Includes guidelines for choosing the correct non-parametric test
- **Data Analysis Examples**: Gives examples of common analyses and interpretation of the output
- **Annotated Output**: Fully annotates the output from common statistical procedures
- **SAS Frequently Asked Questions**: Covers many different topics including, among others, ANOVA, Generalized Linear Models (GLM), linear regression and logistic regression
- **SAS Regression Webbook**: Includes such topics as diagnostics, categorical predictors, testing interactions and testing contrasts