THE MANTEL-HAENSZEL STATISTIC FOR 2x2xK TABLES

David P. Nichols
Senior Support Statistician
SPSS, Inc.

From SPSS Keywords, Volume 54, 1994

One of the more common applications in statistical analysis is to assess the degree of relationship between two variables while controlling for one or more nuisance or control variables. A situation that has received a good deal of attention, particularly in the medical research community but lately also in numerous other areas such as psychometrics, is that of a relationship between two dichotomous variables controlling for one or more categorical factors. When there is only one control variable, or when the 2x2 relationship is examined within each combination of the levels of the control variables, the result is a 2x2xK cross classification, where the K levels of the control variable (or combinations of control variables) are often referred to as strata.

A case of interest to researchers in many areas occurs when there is no three way interaction present in the 2x2xK layout, that is, when the odds ratio relating the two main variables is the same in every stratum. In this case a question of common interest is whether there is any relationship between the two main variables after controlling for the stratification variable. An example might be the relationship between administration of a drug and remediation of disease effects, controlling for gender of patient, mode of administration, and/or other factors.

A common way to assess relationships in 2x2 tables is through the odds ratio. In our drug example, if one group is given a drug and another group a placebo, and all patients are then assessed for recovery, the odds ratio measures the increase (or decrease) in the odds of recovery for patients given the active drug relative to those given the placebo. An odds ratio of 1 represents no effect, while a ratio greater than 1 indicates that the drug increases the odds of recovery and a ratio less than 1 indicates that it diminishes the odds of recovery.
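To make the definition concrete, here is a minimal Python sketch (not part of the SPSS material; the function name odds_ratio and the dictionary STRATA are hypothetical). It assumes a 2x2 table with cells a and b in the first row and c and d in the second, so the odds of the first-column outcome are a/b in row 1 and c/d in row 2, and their ratio is ad/bc. Applied to each geographic area of the human rights data given later in Figure 2, it yields the stratum-specific estimates, one table at a time.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table with a, b in the first row and c, d in the second.

    The odds of the first-column outcome are a/b in row 1 and c/d in row 2,
    so their ratio is (a/b)/(c/d) = (a*d)/(b*c). Assumes b and c are nonzero.
    """
    return (a * d) / (b * c)


# Figure 2 counts per area: rows ICCPR Yes/No, columns CAT Yes/No.
# Each tuple is (a, b, c, d) = (Yes/Yes, Yes/No, No/Yes, No/No).
STRATA = {
    "Africa and the Middle East": (8, 21, 2, 35),
    "The Americas": (11, 10, 2, 13),
    "Australasia and the Pacific": (4, 7, 1, 22),
    "Europe": (19, 7, 2, 4),
}

for area, cells in STRATA.items():
    print(f"{area}: {odds_ratio(*cells):.3f}")
```

The four estimates are 6.667, 7.150, 12.571 and 5.429. A common odds ratio estimator pools such stratum-specific values into a single figure.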
In the SPSS CROSSTABS procedure, this odds ratio can be obtained for a 2x2 table as the Case Control Relative Risk estimate. However, this measure is given only separately for each 2x2 table. The most popular estimator of the common odds ratio across the K strata was suggested by Mantel and Haenszel, who also provided a chi-squared test of the null hypothesis that this common odds ratio is 1. The common odds ratio estimate is given at the top of page 236 of Alan Agresti's _Categorical Data Analysis_, while the test statistic commonly referred to as the Mantel-Haenszel chi-squared test for 2x2xK tables is given by equation 7.8 on page 231 (Agresti refers to it as the Cochran-Mantel-Haenszel statistic).

The SPSS macro introduced here provides an easy way to produce both of these quantities, along with the significance level for the test statistic (under the null hypothesis, the statistic is approximately distributed as a chi-squared random variable with one degree of freedom). The source code for the macro in SPSS syntax is given below in Figure 1; it is also available from SPSS via anonymous ftp to spss.com or through the SPSS forum on CompuServe, under the name mh.sps.
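For reference, the two quantities can be written out the way the macro in Figure 1 computes them. The notation here is an assumption of this summary rather than Agresti's: a_k, b_k, c_k and d_k are the four cells of stratum k (a_k and b_k in the first row, c_k and d_k in the second) and n_k is the stratum total. E(a_k) and Var(a_k) are the mean and variance of a_k under the null hypothesis, given the margins of that stratum's table. The 0.5 subtracted in the numerator of MH is a continuity correction applied by the macro; presentations of the statistic differ on whether to include it.

```latex
\hat{\theta}_{MH} = \frac{\sum_{k=1}^{K} a_k d_k / n_k}{\sum_{k=1}^{K} b_k c_k / n_k},
\qquad n_k = a_k + b_k + c_k + d_k

E(a_k) = \frac{(a_k + b_k)(a_k + c_k)}{n_k}, \qquad
\operatorname{Var}(a_k) = \frac{(a_k + b_k)(c_k + d_k)(a_k + c_k)(b_k + d_k)}{n_k^{2}\,(n_k - 1)}

MH = \frac{\left( \left| \sum_{k=1}^{K} a_k - \sum_{k=1}^{K} E(a_k) \right| - 0.5 \right)^{2}}{\sum_{k=1}^{K} \operatorname{Var}(a_k)}
```

In the macro, the variables m and v hold E(a_k) and Var(a_k), o1 and o2 hold the summands of the numerator and denominator of the estimate, and sig = 2(1 - Φ(√MH)) is the upper-tail probability of a one-degree-of-freedom chi-squared variable.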
Figure 1  The MH.SPS Macro Code
-----------------------------------------------------------------------------
preserve
set printback=off mprint=off
define mh (!positional !tokens(1)
          /!positional !tokens(1)
          /!positional !tokens(1))
preserve
set printback=off mprint=off
save outfile='mh__tmp1.sav'
autorecode !1 !2 /into row__var col__var
aggregate outfile=*
  /break=row__var col__var !3
  /n__=n
numeric a__ b__ c__ d__
vector cell=a__ to d__
compute index=(row__var-1)*2+col__var
compute cell(index)=n__
aggregate outfile=*
  /break=!3
  /a__ b__ c__ d__=max(a__ b__ c__ d__)
recode a__ b__ c__ d__ (sysmis=0)
compute a=a__
compute b=b__
compute c=c__
compute d=d__
compute m=((a+b)*(a+c))/sum(a to d)
compute v=((a+b)*(c+d)*(a+c)*(b+d))/
  ((sum(a to d)**2)*(sum(a to d)-1))
compute o1=(a*d)/sum(a to d)
compute o2=(b*c)/sum(a to d)
compute constant=1
aggregate outfile=*
  /break=constant
  /sumn=sum(a)
  /summ=sum(m)
  /sumv=sum(v)
  /sumo1=sum(o1)
  /sumo2=sum(o2)
compute odds=sumo1/sumo2
compute mhs=((abs(sumn-summ)-.5)**2)/sumv
compute sig=2*(1-cdfnorm(sqrt(mhs)))
formats odds(f10.5) mhs(f8.4) sig(f6.5)
variable labels odds 'Odds Ratio'
  mhs 'Mantel-Haenszel Statistic'
  sig 'Significance'
report format=list automatic align(center)
  /variables=odds mhs sig
  /title "Estimate of Common Odds Ratio and "
  + "Mantel-Haenszel Statistic"
get file='mh__tmp1.sav'
restore
!enddefine
restore
-----------------------------------------------------------------------------
End Figure 1

The mh.sps macro is most easily used by keeping it as a text file in your working directory and executing the following SPSS syntax:

INCLUDE MH.SPS.
MH rowvar colvar stratvar.

where rowvar is the name of one of the two primary variables, colvar is the other primary variable and stratvar is the stratification variable. As usual, a working data file must be defined when invoking the macro. The data may be either individual cases or weighted aggregated data, just as with most SPSS procedures.
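As a cross-check of the arithmetic in Figure 1, here is a minimal Python sketch that follows the same steps. It is not part of mh.sps, and the names autorecode, mh_from_records and FIGURE2 are hypothetical. It recodes the row and column values to 1 and 2 in ascending order (AUTORECODE's default), places each count in cell (row-1)*2+col of its stratum (so a is low/low, b low/high, c high/low, d high/high), sets absent combinations to 0 as the macro's RECODE does, and then forms the same sums and final formulas. One assumption goes beyond the macro: strata with fewer than two cases are skipped, since they add nothing to the sums but would make the variance formula divide by zero. The significance uses math.erfc(sqrt(x/2)), which equals the macro's 2*(1-cdfnorm(sqrt(mhs))).

```python
import math
from collections import defaultdict


def autorecode(values):
    """Code distinct values 1, 2, ... in ascending order (AUTORECODE's default order)."""
    return {v: code for code, v in enumerate(sorted(set(values)), start=1)}


def mh_from_records(records):
    """Follow the steps of mh.sps on (row_value, col_value, stratum, count) records.

    Returns (common odds ratio, continuity-corrected MH statistic, significance).
    """
    records = list(records)
    row_code = autorecode(r for r, _, _, _ in records)
    col_code = autorecode(c for _, c, _, _ in records)
    if len(row_code) != 2 or len(col_code) != 2:
        raise ValueError("row and column variables must each take exactly two values")

    # cells[stratum] holds [a, b, c, d]; combinations absent from the data stay 0,
    # like the macro's recode of system-missing cells to 0.
    cells = defaultdict(lambda: [0, 0, 0, 0])
    for r, c, stratum, count in records:
        index = (row_code[r] - 1) * 2 + col_code[c]  # 1..4, the macro's formula
        cells[stratum][index - 1] += count

    sum_a = sum_m = sum_v = sum_o1 = sum_o2 = 0.0
    for a, b, c, d in cells.values():
        n = a + b + c + d
        if n < 2:
            continue  # a one-case stratum adds nothing and would divide by zero
        sum_a += a
        sum_m += (a + b) * (a + c) / n  # expected value of a (variable m)
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))  # variance (v)
        sum_o1 += a * d / n
        sum_o2 += b * c / n

    odds = sum_o1 / sum_o2
    mhs = (abs(sum_a - sum_m) - 0.5) ** 2 / sum_v
    sig = math.erfc(math.sqrt(mhs / 2))  # equals 2*(1-cdfnorm(sqrt(mhs)))
    return odds, mhs, sig


# Figure 2 as (AREA, ICCPR, CAT, COUNT)
FIGURE2 = [
    (1, 1, 1, 8), (1, 1, 0, 21), (1, 0, 1, 2), (1, 0, 0, 35),
    (2, 1, 1, 11), (2, 1, 0, 10), (2, 0, 1, 2), (2, 0, 0, 13),
    (3, 1, 1, 4), (3, 1, 0, 7), (3, 0, 1, 1), (3, 0, 0, 22),
    (4, 1, 1, 19), (4, 1, 0, 7), (4, 0, 1, 2), (4, 0, 0, 4),
]

if __name__ == "__main__":
    # Analogue of: MH iccpr cat area.
    odds, mhs, sig = mh_from_records(
        (iccpr, cat, area, count) for area, iccpr, cat, count in FIGURE2
    )
    print(f"{odds:.5f} {mhs:.4f} {sig:.5f}")
```

Run on the human rights data of Figure 2, this sketch agrees with the macro output shown in Figure 3 (7.18023, 18.6741 and .00002) to the displayed precision. Because the values are only sorted, "No"/"Yes" strings give the same result as 0/1 codes.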
The macro first saves your working data file to a file named mh__tmp1.sav. The double underscore in the file name is intended to make it unlikely that an existing file will be overwritten. The same convention is used later in the macro when creating new variables, in an attempt to avoid duplicating existing variable names; such duplication would cause the macro to fail. (The variables used in the final calculations, such as index, a, b, c, d, m and v, do not follow this convention, but they are created only after the data have been aggregated, so they can conflict only with a stratification variable of the same name.) The SET commands are used to minimize output; if you have problems running the macro, they may be changed or removed to help identify the source of the problem.

The macro should function on any SPSS release offering macro support. It uses commands and procedures from the Base System exclusively, so, unlike the official SPSS macros released with Windows versions of SPSS, it does not require the Advanced Statistics module. It consists essentially of some data manipulation, some basic calculations and a report of the results.

The row and column variables can be either numeric or string, since they are processed with AUTORECODE. Each should take only two distinct values in the data, since together they define a 2x2 cross classification; more than two distinct values on either or both of these variables will cause the macro to fail. The stratification variable can also be either string or numeric; each unique value defines a stratum.

Let's look at an example. The following data were compiled from the _Amnesty International 1990 Report_. The variables forming the 2x2 cross classification are 0-1 variables indicating whether or not a government has ratified an international human rights agreement (1=Yes, 0=No). ICCPR is the International Covenant on Civil and Political Rights. CAT is the Convention against Torture and Other Cruel, Inhuman or Degrading Treatment or Punishment. AREA is the geographic location of the state (1=Africa and the Middle East, 2=The Americas, 3=Australasia and the Pacific, 4=Europe).
The COUNT variable gives the number of observations for each combination of the AREA, ICCPR and CAT variables, which allows us to represent 168 observations with only 16 lines of data. Ratification status is as of December 31, 1989.

Figure 2  Human Rights Data
-----------------------------------------------------------------------------
 AREA   ICCPR   CAT   COUNT
    1       1     1       8
    1       1     0      21
    1       0     1       2
    1       0     0      35
    2       1     1      11
    2       1     0      10
    2       0     1       2
    2       0     0      13
    3       1     1       4
    3       1     0       7
    3       0     1       1
    3       0     0      22
    4       1     1      19
    4       1     0       7
    4       0     1       2
    4       0     0       4
-----------------------------------------------------------------------------
End Figure 2

Note that, strictly speaking, we have no need for inferential statistics with these data, as they represent the entire population of governments. However, for some purposes it may be of interest to treat this group of governments as a sample from a population of potential governments. The substantive questions of interest for our purposes are whether the odds ratio comparing the odds of ratifying one agreement for governments that have ratified the other with the odds for governments that have not is constant across geographical areas, and, assuming this to be true, what the common odds ratio is.

The first step in this analysis is to test the null hypothesis of no three way interaction. This was done using the LOGLINEAR procedure, fitting a model excluding only the three way term. The likelihood ratio chi-squared value was .32 on 3 degrees of freedom, with a significance of .96, indicating that the three way interaction term is not needed. Removal of any of the two way interaction terms would produce an unacceptably large increase in the likelihood ratio statistic. Thus our data appear to exhibit exactly the type of structure for which the Mantel-Haenszel common odds ratio estimate and chi-squared test are appropriate. Figure 3 contains the relevant output from the mh.sps macro.
The odds ratio of 7.18 bears out the commonsense assumption that governments that have ratified the ICCPR are more likely to have ratified the CAT than are governments that have not ratified the ICCPR. A Mantel-Haenszel statistic as large as 18.67 would be very unlikely to occur in random samples from populations with a common odds ratio of 1 (the significance level is .00002).

Figure 3  Statistical Output from MH.SPS Macro
-----------------------------------------------------------------------------
        Estimate of Common Odds Ratio and Mantel-Haenszel Statistic

                          Mantel-Haenszel
            Odds Ratio       Statistic       Significance
            __________    _______________    ____________

               7.18023        18.6741            .00002
-----------------------------------------------------------------------------
End Figure 3

Note that the macro uses AUTORECODE to process the values of the row and column variables, and effectively constructs the 2x2 tables so that the low/low and high/high combinations are on the main (top left to lower right) diagonal. If the variables are coded so that the lower value means something different on one variable than on the other (such as 0=No, 1=Yes for the row variable and 1=Yes, 2=No for the column variable), then the odds ratio will be the reciprocal of the one produced by consistent coding. The Mantel-Haenszel statistic and its significance will not be affected.
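The reciprocal relationship can be checked numerically with a short Python sketch. It uses a hypothetical helper, mh_summary, that repeats the macro's sums on per-stratum cells (a, b, c, d), and a hypothetical reverse_column_coding that mimics inconsistent coding of the column variable by swapping the two columns of every stratum, (a, b, c, d) to (b, a, d, c). The Figure 2 counts are entered in the order the macro would arrange them, with 0 (No) before 1 (Yes) on both ICCPR and CAT.

```python
def mh_summary(tables):
    """Common odds ratio and continuity-corrected MH statistic, summed as in mh.sps.

    tables: list of (a, b, c, d), one per stratum, a = low/low cell after recoding.
    """
    sum_a = sum_m = sum_v = sum_o1 = sum_o2 = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        sum_a += a
        sum_m += (a + b) * (a + c) / n
        sum_v += (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
        sum_o1 += a * d / n
        sum_o2 += b * c / n
    return sum_o1 / sum_o2, (abs(sum_a - sum_m) - 0.5) ** 2 / sum_v


def reverse_column_coding(tables):
    """Swap the two columns of every stratum: (a, b, c, d) -> (b, a, d, c)."""
    return [(b, a, d, c) for a, b, c, d in tables]


# Figure 2 counts as the macro arranges them: rows ICCPR 0/1, columns CAT 0/1.
TABLES = [(35, 2, 21, 8), (13, 2, 10, 11), (22, 1, 7, 4), (4, 2, 7, 19)]

odds, mhs = mh_summary(TABLES)
odds_rev, mhs_rev = mh_summary(reverse_column_coding(TABLES))
print(f"odds ratio {odds:.5f} vs {odds_rev:.5f} (product {odds * odds_rev:.6f})")
print(f"MH statistic {mhs:.4f} vs {mhs_rev:.4f}")
```

With the columns reversed, the estimate becomes about 0.139 (1/7.18), while the statistic stays at 18.6741. Reversing the columns negates each stratum's deviation of the low/low cell from its expected value and leaves its variance unchanged, so the absolute difference in the numerator is the same.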