How can I use multiply imputed data sets in SUDAAN?

One of the new features of SUDAAN 9 is its ability to use multiply imputed data sets. According the SUDAAN 9 Language Manual (page 91), SUDAAN does not accept the data sets stacked into a single data set as the SAS proc mi creates. Rather, SUDAAN accepts multiply imputed data sets in two forms: either the individually imputed data sets or a single data set with variables in the same file with different numeric suffixes. For our examples below, we will use the NHANES III data. The NHANES III multiply imputed data sets can be found at http://www.cdc.gov/nchs/nhanes.htm about half way down the page. You need to have each of the imputed data sets sorted by strata and PSU just as you have to have a non-imputed data set sorted. You can use the macro shown below (with minor changes for the names of the data sets) to sort the data sets, or you can have multiple calls to proc sort. Once the data sets are sorted, you need to either add one more option, mi_count, to the proc statement, or a mi_file statement. If the data sets are sequentially numbers, such as nh3mi1, nh3mi2, etc., you can use the mi_count option and indicate the number of imputed data sets. On the data option, specify the first of the imputed data sets. Remember that the variables in each of the imputed data sets need to be in the same order, of the same type, etc. SUDAAN will issue a warning if the number of cases differs between the imputed data sets.

NOTE: The examples of the use of these options and statements in the SUDAAN 9 Manual (page 91) show the use of quotes around a file path specification. This will work only for stand-alone SUDAAN, not the SAS-callable version.

%MACRO srt(NUMBER);
PROC SORT DATA=nh3mi&NUMBER;
by sdpstra6 sdppsu6;
run;
%mend srt;

%srt(1);
%srt(2);
%srt(3);
%srt(4);
%srt(5);

proc descript data = NH3MI1 filetype = sas mi_count = 5 design = wr;
nest sdpstra6 sdppsu6 / missunit;
weight WTPFQX6 ;
var TCPMI;
setenv colwidth = 19;
setenv decwidth = 3;
print nsum wsum mean semean / nohead;
run;

Variance Estimation Method: Taylor Series (WR) Using Multiply Imputed Data
Results for Summary Over All Imputations
by: Variable, One.

------------------------------------------------------------
|                 |                  |
| Variable        |                  | One
|                 |                  | 1                   |
------------------------------------------------------------
|                 |                  |                     |
| Serum           | Sample Size      |           28012.000 |
| cholesterol     | Weighted Size    |       235771269.750 |
| (mg/dL)         | Mean             |             194.357 |
|                 | SE Mean          |               0.577 |
------------------------------------------------------------

In the example below, the mi_files statement is used instead of the mi_count option. As before, the first of the imputed data sets is listed on the data option on the proc statement. The rest of the files are listed on the mi_files statement.

proc regress data = nh3mi1 filetype = sas design = wr;
nest sdpstra6 sdppsu6 / missunit;
weight WTPFQX6 ;
class HAN6SRMI ;
model BMPWSTMI =  HAM5MI HAN6SRMI HSSEX;
mi_files  nh3mi2 nh3mi3 nh3mi4 nh3mi5 ;
run;

Frequencies and Values for CLASS Variables
Results for Summary Over All Imputations
by: Beer/wine/liquor (recode).

--------------------------------------
Beer/wine/l-
  iquor
  (recode)          Frequency    Value
--------------------------------------
Ordered
  Position:
  1                 11230.000        1
Ordered
  Position:
  2                  5546.600        2
Ordered
  Position:
  3                  3273.400        3
--------------------------------------

Variance Estimation Method: Taylor Series (WR) Using Multiply Imputed Data
SE Method: Robust (Binder, 1983)
Working Correlations: Independent
Link Function: Identity
Response variable BMPWSTMI: Waist circumference (cm)
Results for Summary Over All Imputations
by: Independent Variables and Effects.

-------------------------------------------------------------------------------------
Independent
  Variables and        Beta                      Lower 95%    Upper 95%
  Effects              Coeff.          SE Beta   Limit Beta   Limit Beta   T-Test B=0
-------------------------------------------------------------------------------------
Intercept                   44.05         4.54        34.90        53.20         9.71
How tall are you
  without shoes-
  inchs                      0.74         0.06         0.62         0.86        12.41
Beer/wine/liquor
  (recode)
  1                          4.23         0.38         3.46         4.99        11.13
  2                          0.91         0.41         0.07         1.75         2.20
  3                          0.00         0.00         0.00         0.00          .
Sex                         -2.84         0.49        -3.83        -1.84        -5.75
-------------------------------------------------------------------------------------

----------------------------------------
Independent            P-value
  Variables and        T-Test     DDF
  Effects              B=0        Beta
----------------------------------------
Intercept                0.0000   43.516
How tall are you
  without shoes-
  inchs                  0.0000   43.551
Beer/wine/liquor
  (recode)
  1                      0.0000   45.861
  2                      0.0342   37.224
  3                       .       49.000
Sex                      0.0000   43.755
----------------------------------------

Variance Estimation Method: Taylor Series (WR) Using Multiply Imputed Data
SE Method: Robust (Binder, 1983)
Working Correlations: Independent
Link Function: Identity
Response variable BMPWSTMI: Waist circumference (cm)
Results for Summary Over All Imputations
by: Contrast.

-------------------------------------------------------

Contrast               Degrees
                       of                      P-value
                       Freedom        Wald F   Wald F
-------------------------------------------------------
OVERALL MODEL                 5     40326.90     0.0000
MODEL MINUS
  INTERCEPT                   4       182.83     0.0000
INTERCEPT                     .          .        .
HAM5MI                        1       153.97     0.0000
HAN6SRMI                      2        62.84     0.0000
HSSEX                         1        33.04     0.0000
-------------------------------------------------------

In the two examples below, we show that you can use either method of correcting the standard errors, strata/PSUs or replicate weights.

proc crosstabs data = NH3MI1 filetype = sas mi_count = 5 design = wr;
nest sdpstra6 sdppsu6 / missunit;
weight WTPFQX6 ;
subgroups  DMARETHN HAE7;
levels 2 2;
tables  DMARETHN*HAE7;
setenv colwidth = 12;
run;

Variance Estimation Method: Taylor Series (WR) Using Multiply Imputed Data
Results for Summary Over All Imputations
by: Race-ethnicity, Ever told had high cholesterol.

-----------------------------------------------------------------------------------
|                 |                  |
| Race-ethnicity  |                  | Ever told had high cholesterol
|                 |                  | Total        | 1            | 2            |
-----------------------------------------------------------------------------------
|                 |                  |              |              |              |
| Total           | Sample Size      |         7830 |         2548 |         5282 |
|                 | Weighted Size    |  94129643.85 |  31162209.08 |  62967434.77 |
|                 | Tot Percent      |       100.00 |        33.11 |        66.89 |
|                 | Col Percent      |       100.00 |       100.00 |       100.00 |
|                 | SE Col Percent   |         0.00 |         0.00 |         0.00 |
|                 | Row Percent      |       100.00 |        33.11 |        66.89 |
|                 | SE Row Percent   |         0.00 |         0.82 |         0.82 |
-----------------------------------------------------------------------------------
|                 |                  |              |              |              |
| 1               | Sample Size      |         5378 |         1856 |         3522 |
|                 | Weighted Size    |  84795202.29 |  28668926.11 |  56126276.18 |
|                 | Tot Percent      |        90.08 |        30.46 |        59.63 |
|                 | Col Percent      |        90.08 |        92.00 |        89.14 |
|                 | SE Col Percent   |         0.69 |         0.62 |         0.79 |
|                 | Row Percent      |       100.00 |        33.81 |        66.19 |
|                 | SE Row Percent   |         0.00 |         0.90 |         0.90 |
-----------------------------------------------------------------------------------
|                 |                  |              |              |              |
| 2               | Sample Size      |         2452 |          692 |         1760 |
|                 | Weighted Size    |   9334441.56 |   2493282.97 |   6841158.59 |
|                 | Tot Percent      |         9.92 |         2.65 |         7.27 |
|                 | Col Percent      |         9.92 |         8.00 |        10.86 |
|                 | SE Col Percent   |         0.69 |         0.62 |         0.79 |
|                 | Row Percent      |       100.00 |        26.71 |        73.29 |
|                 | SE Row Percent   |         0.00 |         1.00 |         1.00 |
-----------------------------------------------------------------------------------

proc crosstabs data = NH3MI1 filetype = sas mi_count = 5 design = brr;
repwgt WTPQRP1 - WTPQRP52 / adjfay = 1.7;
weight WTPFQX6 ;
subgroups  DMARETHN HAE7;
levels 2 2;
tables  DMARETHN*HAE7;
setenv colwidth = 12;
print nsum wsum totper colper secol rowper serow;
run;

Variance Estimation Method: BRR Using Multiply Imputed Data
Results for Summary Over All Imputations
by: Race-ethnicity, Ever told had high cholesterol.

-----------------------------------------------------------------------------------
|                 |                  |
| Race-ethnicity  |                  | Ever told had high cholesterol
|                 |                  | Total        | 1            | 2            |
-----------------------------------------------------------------------------------
|                 |                  |              |              |              |
| Total           | Sample Size      |         7830 |         2548 |         5282 |
|                 | Weighted Size    |  94129643.85 |  31162209.08 |  62967434.77 |
|                 | Tot Percent      |       100.00 |        33.11 |        66.89 |
|                 | Col Percent      |       100.00 |       100.00 |       100.00 |
|                 | SE Col Percent   |         0.00 |         0.00 |         0.00 |
|                 | Row Percent      |       100.00 |        33.11 |        66.89 |
|                 | SE Row Percent   |         0.00 |         0.64 |         0.64 |
-----------------------------------------------------------------------------------
|                 |                  |              |              |              |
| 1               | Sample Size      |         5378 |         1856 |         3522 |
|                 | Weighted Size    |  84795202.29 |  28668926.11 |  56126276.18 |
|                 | Tot Percent      |        90.08 |        30.46 |        59.63 |
|                 | Col Percent      |        90.08 |        92.00 |        89.14 |
|                 | SE Col Percent   |         0.23 |         0.35 |         0.29 |
|                 | Row Percent      |       100.00 |        33.81 |        66.19 |
|                 | SE Row Percent   |         0.00 |         0.70 |         0.70 |
-----------------------------------------------------------------------------------
|                 |                  |              |              |              |
| 2               | Sample Size      |         2452 |          692 |         1760 |
|                 | Weighted Size    |   9334441.56 |   2493282.97 |   6841158.59 |
|                 | Tot Percent      |         9.92 |         2.65 |         7.27 |
|                 | Col Percent      |         9.92 |         8.00 |        10.86 |
|                 | SE Col Percent   |         0.23 |         0.35 |         0.29 |
|                 | Row Percent      |       100.00 |        26.71 |        73.29 |
|                 | SE Row Percent   |         0.00 |         0.92 |         0.92 |
-----------------------------------------------------------------------------------

To illustrate the use of the multiple imputed variables in a single data file, we will create a small example data set and then use the mi_vars statement.

data temp;
input x x1 x2 x3 y;
cards;
1 1 1 1 7
3 3 3 3 8
. 2 1 3 5
. 1 5 4 8
4 4 4 4 9
6 6 6 6 7
. 7 5 4 9
;
run;

proc regress data = temp filetype = sas design = wr;
weight _one_;
nest _one_;
model y = x;
mi_vars x1 x2 x3;
run;