Below we show the SAS code and the output for **proc freq**, using the hsb2
data set. We have made a two-way table with a three-level categorical
variable (**ses**) and a two-level categorical variable (**female**).
Remember that you do not want to use a continuous variable in a **proc freq**,
because every distinct value of the variable gets its own row or column and
the output can become very long.

    proc freq data = "D:\hsb2";
      tables ses*female / expected chisq;
    run;

The FREQ Procedure

Table of ses by female

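Rows are levels of **ses**; columns are levels of **female**. Each cell shows Frequency, Expected, Percent, Row Pct and Col Pct.

| ses | Statistic | female = 0 | female = 1 | Total |
|---|---|---|---|---|
| 1 | Frequency | 15 | 32 | 47 |
| | Expected | 21.385 | 25.615 | |
| | Percent | 7.50 | 16.00 | 23.50 |
| | Row Pct | 31.91 | 68.09 | |
| | Col Pct | 16.48 | 29.36 | |
| 2 | Frequency | 47 | 48 | 95 |
| | Expected | 43.225 | 51.775 | |
| | Percent | 23.50 | 24.00 | 47.50 |
| | Row Pct | 49.47 | 50.53 | |
| | Col Pct | 51.65 | 44.04 | |
| 3 | Frequency | 29 | 29 | 58 |
| | Expected | 26.39 | 31.61 | |
| | Percent | 14.50 | 14.50 | 29.00 |
| | Row Pct | 50.00 | 50.00 | |
| | Col Pct | 31.87 | 26.61 | |
| Total | Frequency | 91 | 109 | 200 |
| | Percent | 45.50 | 54.50 | 100.00 |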

Statistics for Table of ses by female

| Statistic | DF | Value | Prob |
|---|---|---|---|
| Chi-Square | 2 | 4.5765 | 0.1014 |
| Likelihood Ratio Chi-Square | 2 | 4.6789 | 0.0964 |
| Mantel-Haenszel Chi-Square | 1 | 3.1098 | 0.0778 |
| Phi Coefficient | | 0.1513 | |
| Contingency Coefficient | | 0.1496 | |
| Cramer's V | | 0.1513 | |

Sample Size = 200
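If the hsb2 file is not at hand, the same table can be rebuilt from the six cell counts shown above. The following is a minimal sketch, assuming a **weight** statement is used to treat each input row as a cell count. The data set name `ses_female` and the variable name `count` are hypothetical and are not part of the original example.

```sas
/* Hypothetical data set: one row per cell, with the observed count. */
data ses_female;
  input ses female count;
  datalines;
1 0 15
1 1 32
2 0 47
2 1 48
3 0 29
3 1 29
;
run;

/* weight tells proc freq that each row stands for "count" observations. */
proc freq data = ses_female;
  weight count;
  tables ses*female / expected chisq;
run;
```

Because the cell counts are the same, this should give the same frequencies, expected counts and chi-square statistics as the output above.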

## Table of frequencies

The FREQ Procedure

Table of ses by female^{a}

Rows are levels of **ses**; columns are levels of **female**.

| ses | Statistic | female = 0 | female = 1 | Total^{g} |
|---|---|---|---|---|
| 1 | Frequency^{b} | 15 | 32 | 47 |
| | Expected^{c} | 21.385 | 25.615 | |
| | Percent^{d} | 7.50 | 16.00 | 23.50 |
| | Row Pct^{e} | 31.91 | 68.09 | |
| | Col Pct^{f} | 16.48 | 29.36 | |
| 2 | Frequency | 47 | 48 | 95 |
| | Expected | 43.225 | 51.775 | |
| | Percent | 23.50 | 24.00 | 47.50 |
| | Row Pct | 49.47 | 50.53 | |
| | Col Pct | 51.65 | 44.04 | |
| 3 | Frequency | 29 | 29 | 58 |
| | Expected | 26.39 | 31.61 | |
| | Percent | 14.50 | 14.50 | 29.00 |
| | Row Pct | 50.00 | 50.00 | |
| | Col Pct | 31.87 | 26.61 | |
| Total^{g} | Frequency | 91 | 109 | 200 |
| | Percent | 45.50 | 54.50 | 100.00 |

a. **Table of** – This is the title of the table. The first
variable listed will be the row variable and the second variable will be the
column variable.

b. **Frequency** – This is the observed cell frequency. It is
also called count. For example, there are 15 males (female=0) in the low
socioeconomic status group. The observed cell frequencies and the expected
cell frequencies are used to test if the row and the column variables are
independent.

c. **Expected** – This is the cell frequency expected under the null
hypothesis that the row and column variables are independent. This number
is produced by using the option **expected** in the **tables** statement.
Comparing the expected cell frequency with the observed frequency gives a
first impression of whether the row variable is independent of the column
variable: the larger the gaps, the less plausible independence is.
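As a worked check of the expected count for the first cell (males in the low socioeconomic status group), here is the usual independence formula: row total times column total, divided by the table total. The symbols $E_{ij}$, $n_{i\cdot}$, $n_{\cdot j}$ and $N$ are notation introduced here for illustration and do not appear in the SAS output.

$$
E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{N}, \qquad E_{11} = \frac{47 \times 91}{200} = 21.385
$$

This matches the 21.385 reported in the first cell of the table.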

d. **Percent** – This is the percent of the total observations
represented by the cell frequency. In the table above, we see that there
are 15 males (**female**=0) in the low socioeconomic status group (**ses**=1).
That represents 7.5% of the total number of observations. You can suppress
this output by using the **nopercent** option on the **tables** statement.

e. **Row Pct** – This gives the percent of observations in the row.
In the table above, we see that there are 15 males (**female**=0) and 32
females (**female**=1) in the low socioeconomic status group. So the row
percent for the first cell is 15/47*100=31.91. You can suppress this
output by using the option **norow** in the **tables** statement.

f. **Col Pct** – This gives the percent of observations in the
column. In the table above, we see that there are 91 males in total, 15 of
whom are in the low socioeconomic status group. So the column percent for the
first cell is 15/91*100=16.48. You can suppress this output by using the option
**nocol** in the **tables** statement.
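The three suppression options described above can be combined on a single **tables** statement. This is a sketch, assuming the same hsb2 file as in the original call. It uses **nocol**, the option name given in the SAS documentation for suppressing column percentages.

```sas
/* Keep only the observed and expected frequencies in each cell. */
proc freq data = "D:\hsb2";
  tables ses*female / expected chisq nopercent norow nocol;
run;
```

With these options each cell should show just the frequency and the expected frequency, which makes the table shorter when only the test of independence is of interest.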

g. **Total** – This is the number of valid observations (N) for the row,
the column, or the whole table. The total number of observations in the data
set is the sum of N and the number of missing values. If the sample size is
not large enough, tests of independence for contingency tables, such as the
chi-square test, may not be accurate.

## Statistics

Statistics for Table of ses by female

| Statistic^{h} | DF | Value | Prob |
|---|---|---|---|
| Chi-Square^{i} | 2 | 4.5765 | 0.1014 |
| Likelihood Ratio Chi-Square^{j} | 2 | 4.6789 | 0.0964 |
| Mantel-Haenszel Chi-Square^{k} | 1 | 3.1098 | 0.0778 |
| Phi Coefficient^{l} | | 0.1513 | |
| Contingency Coefficient^{m} | | 0.1496 | |
| Cramer's V^{n} | | 0.1513 | |

Sample Size = 200

h. **Statistic** – This part of the output is produced by SAS by
using the option **chisq** on the **tables** statement. It consists
of chi-square tests and statistics. They test the null hypothesis that
there is no association between the row variable and the column variable.
For measures of association, you can use the **measures** option on the
**tables** statement.

i. **Chi-square** – This is also known as the Pearson chi-square test.
It compares the observed frequencies with the expected frequencies collectively,
taking the degrees of freedom into account. The degrees of freedom for the
chi-square test are (R-1)*(C-1), where R is the number of rows and C the number
of columns of the table (in other words, the number of levels of each of the
variables). A large chi-square statistic corresponds to a small p-value.
If the p-value is small enough (say < 0.05), then we reject the null hypothesis
that the two variables are independent and conclude that there is an association
between the row and the column variables. Here the p-value is 0.1014, so we do
not reject the null hypothesis at the 0.05 level.
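To make the comparison of observed and expected frequencies concrete, the Pearson statistic can be written out for this table. Here $O_{ij}$ and $E_{ij}$ are notation introduced for the observed and expected counts in row $i$ and column $j$. The six terms are listed in the order the cells appear in the table and are rounded to four decimals.

$$
\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\approx 1.9064 + 1.5916 + 0.3297 + 0.2752 + 0.2581 + 0.2155 \approx 4.5765,
\qquad df = (3-1)(2-1) = 2
$$

For example, the first term is $(15 - 21.385)^2 / 21.385 \approx 1.9064$. The sum agrees with the Chi-Square value of 4.5765 in the output.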

j. **Likelihood Ratio Chi-Square** – This involves the ratio
between the observed and the expected frequencies, whereas the ordinary
chi-square test involves the difference between the two. This method was
developed more recently than the chi-square test and is the second most widely
used after the chi-square test. It is directly related to log-linear
analysis and logistic regression. When the row and column variables are
independent, the likelihood-ratio chi-square has an approximate chi-square
distribution with (R-1)*(C-1) degrees of freedom, where R is the number of rows
and C the number of columns of the table.
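The ratio form mentioned above can be written in its usual textbook form as follows. The symbol $G^2$ and the notation $O_{ij}$, $E_{ij}$ for observed and expected counts are introduced here for illustration and are not labels from the SAS output.

$$
G^2 = 2 \sum_{i,j} O_{ij} \ln\frac{O_{ij}}{E_{ij}}
$$

Evaluating this sum with the six observed and expected counts in the table gives approximately 4.68, in line with the Likelihood Ratio Chi-Square of 4.6789 in the output.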

k. **Mantel-Haenszel Chi-Square** – It is also called the Mantel-Haenszel
test for linear association. Unlike the ordinary and likelihood ratio
chi-square tests, it is an ordinal measure of significance. It is defined as
**(N-1)r^{2}**, where N is the sample size and **r** is the Pearson correlation
between the row variable and the column variable. It is preferred when testing
the significance of a linear relationship between two ordinal variables. If
the test is significant, we say that increases in one variable are associated
with increases (or decreases, for negative relationships) in the other variable
beyond what would be expected by chance. Like other chi-square statistics, the
Mantel-Haenszel chi-square should not be used for tables with small cell counts.
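Since the statistic is defined as (N-1) times the squared correlation, the size of the correlation can be backed out from the reported value. This is only a rough check: the symbol $Q_{MH}$ is notation introduced here, and the sign of $r$ cannot be recovered from the statistic alone.

$$
Q_{MH} = (N-1)\, r^2 \;\Rightarrow\; r^2 = \frac{3.1098}{199} \approx 0.0156, \qquad |r| \approx 0.125
$$

A correlation of about 0.125 in absolute value is weak, consistent with the non-significant p-value of 0.0778.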

l. **Phi Coefficient** – This is a measure of association based on
adjusting the chi-square statistic to factor out sample size. Its range is
between -1 and 1 for 2-by-2 tables, and between 0 and min(sqrt(R-1),
sqrt(C-1)) for larger tables. Computationally, phi is the square root of
(chi-square divided by n), where n is the sample size. The phi coefficient is
often used as a measure of association in 2-by-2 tables formed by true
dichotomies.

m. **Contingency Coefficient** – The contingency coefficient is an
adjustment to the phi coefficient, intended to adapt it to tables larger than
2-by-2. The contingency coefficient is computed as the square root of
chi-square divided by (chi-square plus n), where n is the sample size. The
contingency coefficient is always less than 1 and approaches 1.0 only for
large tables. The larger the contingency coefficient, the stronger the
association. Some researchers recommend it only for 5-by-5 tables or
larger. For smaller tables it will underestimate the level of
association.

n. **Cramer’s V** – Cramer’s V is the most popular of the
chi-square-based measures of nominal association because it is designed so that
the attainable upper limit is always 1. Cramer’s V equals the square root
of chi-square divided by (n times m), where n is the sample size and m is the
smaller of (rows – 1) and (columns – 1).
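The three chi-square-based measures above can be checked by hand from the reported chi-square of 4.5765 and the sample size of 200. This is a minimal sketch using a data step that only writes to the log. The variable names (`chisq`, `n`, `m`, `phi`, `cc`, `v`) are chosen here for illustration and are not produced by **proc freq**.

```sas
/* Recompute phi, the contingency coefficient and Cramer's V */
/* from the chi-square value and sample size in the output.  */
data _null_;
  chisq = 4.5765;              /* Pearson chi-square from the output */
  n     = 200;                 /* sample size */
  m     = min(3 - 1, 2 - 1);   /* smaller of (rows - 1) and (columns - 1) */

  phi = sqrt(chisq / n);
  cc  = sqrt(chisq / (chisq + n));
  v   = sqrt(chisq / (n * m));

  put phi= 6.4 cc= 6.4 v= 6.4;
run;
```

The three values should round to 0.1513, 0.1496 and 0.1513, matching the output. Because the table is 3-by-2, m is 1, which is why Cramer’s V equals the phi coefficient here.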