This page illustrates some of the "peculiarities" often encountered when using SUDAAN. For more information about a specific procedure or option, please see the SUDAAN help available through SAS or consult the SUDAAN manuals. All examples use the CHIS adult data set.
Missing data/dummy variables How do I use categorical variables in SUDAAN?
Dummy variables
Suppose that you are running a linear or logistic regression and, instead of getting the results that you expect, you find that SUDAAN considers a large portion of your data to be missing. You use proc print or a similar procedure to inspect your data set, and it looks fine to you. What has happened? You may have included in your model, and on the subgroup statement, dummy variables that are coded 0/1.
While this is the standard way of coding dummy variables, SUDAAN considers the cases coded as 0 to be missing for variables that are listed on the subgroup statement. In other words, to SUDAAN, non-positive values in variables that are used as categorical independent variables are considered to be missing. Hence, when SUDAAN does a listwise deletion of missing data, a large portion of your cases may be deleted, possibly to the point of making the model inestimable. (Please see pages 165-166 of the SUDAAN manual for a complete description of the subgroup statement, including valid values for subgroups, and see below for an example using the subgroup statement.)
Consider the example below, in which srsex is coded 1/2 and newvar1 is coded 0/1. As you can see, warnings are printed in the log, and the number of cases used in the analysis is roughly 4050 fewer than it should be (these are the cases coded 0 in the data step).
You have several ways of dealing with this problem. Perhaps the easiest is to not list the 0/1 variable on the subgroup statement. In many ways the subgroup statement in SUDAAN is like the class statement in SAS. In the same way that you would not list a 0/1 variable on the class statement in SAS, you do not list a 0/1 variable on the subgroup statement in SUDAAN. Another solution is to recode the 0/1 variable to be a 1/2 variable. A variable that is coded 0/1/2 needs to be recoded as well, so that no category has the value 0. You can do this in a data step before running the procedure.
data temp01;
  set temp1;
  newvar1 = 0;
  if _n_ ge 4050 then newvar1 = 1;
run;

proc regress data=temp01 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ab1 = srsex newvar1;
  subgroup srsex newvar1;
  levels 2 2;
run;
The following messages are displayed in the log.
Opened SAS data file TEMP01 for reading.
DATA WARNING: The matrix for estimable parameters is singular. The model may be overspecified. You should reduce the number of variables on the right-hand side and refit the model before attempting to draw any conclusions.
DATA WARNING : Degrees of freedom for OVERALL contrast are less than maximum number of estimable parameters
You may wish to rerun this job with a tolerance (TOL) of 1.000000e-007 and 1.000000e-005
The erroneous output is shown below.
Number of observations read       : 55428    Weighted count: 23847415
Observations used in the analysis : 51339    Weighted count: 22067131
Denominator degrees of freedom    : 80
Maximum number of estimable parameters for the model is 3
Weighted mean response is 2.504337
Multiple R-Square for the dependent variable AB1: 0.001315

Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AB1: AB1
----------------------------------------------------------------------
Independent                                              P-value
Variables and          Beta                              T-Test
Effects                Coeff.    SE Beta    T-Test B=0   B=0
----------------------------------------------------------------------
Intercept               2.54      0.01       323.02      0.0000
SRSEX
  MALE                 -0.08      0.01        -6.56      0.0000
  FEMALE                0.00      0.00          .          .
NEWVAR1
  1                     0.00      0.00          .          .
  2                     0.00      0.00          .          .
----------------------------------------------------------------------

-------------------------------------------------------
Contrast                 Degrees of               P-value
                         Freedom       Wald F     Wald F
-------------------------------------------------------
OVERALL MODEL               2         75483.96    0.0000
MODEL MINUS INTERCEPT       1            43.01    0.0000
INTERCEPT                   .              .        .
SRSEX                       1            43.01    0.0000
NEWVAR1                     .              .        .
-------------------------------------------------------
This problem is caused by the dummy variable newvar1. If you compare the number of cases used by SUDAAN in the analysis above (51339) with the number used in the corrected analysis below (55383), you will see that the cases coded as 0 in the data step above were treated as missing. Although in the example below we have recoded the problem variable in a data step, you could also use the recode statement in SUDAAN to temporarily recode the variable. If you have many variables that need to be recoded, you may want to use an array in a data step. These options are perhaps most useful when you really want to have the dummy variable listed on the subgroup statement, such as when you are using proc crosstab. As mentioned above, you could also list only the categorical variables coded with non-zero values on the subgroup statement.
data temp01a;
  set temp01;
  if newvar1 = 0 then newvar2 = 1;
  if newvar1 = 1 then newvar2 = 2;
run;

proc regress data=temp01a filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ab1 = srsex newvar2;
  subgroup srsex newvar2;
  levels 2 2;
run;
Number of observations read       : 55428    Weighted count: 23847415
Observations used in the analysis : 55383    Weighted count: 23829382
Denominator degrees of freedom    : 80
Maximum number of estimable parameters for the model is 3
Weighted mean response is 2.502603
Multiple R-Square for the dependent variable AB1: 0.001221

Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AB1: AB1
----------------------------------------------------------------------
Independent                                              P-value
Variables and          Beta                              T-Test
Effects                Coeff.    SE Beta    T-Test B=0   B=0
----------------------------------------------------------------------
Intercept               2.54      0.01       326.26      0.0000
SRSEX
  MALE                 -0.07      0.01        -6.39      0.0000
  FEMALE                0.00      0.00          .          .
NEWVAR2
  1                    -0.02      0.02        -0.97      0.3343
  2                     0.00      0.00          .          .
----------------------------------------------------------------------

-------------------------------------------------------
Contrast                 Degrees of               P-value
                         Freedom       Wald F     Wald F
-------------------------------------------------------
OVERALL MODEL               3         60042.62    0.0000
MODEL MINUS INTERCEPT       2            20.71    0.0000
INTERCEPT                   .              .        .
SRSEX                       1            40.85    0.0000
NEWVAR2                     1             0.94    0.3343
-------------------------------------------------------
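If several 0/1 dummy variables need the same treatment, the array approach mentioned above might look like the following sketch. Only newvar1 and temp01 come from this page; dummy2 and dummy3, the output data set temp01b, and the names of the recoded variables are hypothetical placeholders for your own variables.

```sas
/* Sketch only: dummy2 and dummy3 are hypothetical 0/1 variables */
data temp01b;
  set temp01;
  array zero_one{3} newvar1 dummy2 dummy3;        /* existing 0/1 dummies   */
  array one_two{3}  newvar1r dummy2r dummy3r;     /* new 1/2 versions       */
  do i = 1 to dim(zero_one);
    /* 0 becomes 1 and 1 becomes 2; missing values are left missing */
    if not missing(zero_one{i}) then one_two{i} = zero_one{i} + 1;
  end;
  drop i;
run;
```

The recoded variables can then be listed on the subgroup statement with 2 as their entry on the levels statement, in the same way as newvar2 above.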
The subgroup statement
In this example we have a 0/1 variable (newvar1), and we are not using it on the subgroup statement. If you want to have the table broken out by the values of newvar1, then you need to recode it to be a 1/2 variable, include it on the subgroup statement, and add its number of levels to the levels statement.
proc descript data=temp01 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  var srsex racehpra newvar1;
  subgroup srsex racehpra;
  levels 2 2;
run;
S U D A A N Software for the Statistical Analysis of Correlated Data Copyright Research Triangle Institute January 2003 Release 8.0.2 Number of observations read : 55428 Weighted count : 23847415 Denominator degrees of freedom : 80
Variance Estimation Method: Replicate Weight Jackknife by: Variable, Self-reported gender. ----------------------------------------------------------------------------------- | | | | Variable | | Self-reported gender | | | Total | MALE | FEMALE | ----------------------------------------------------------------------------------- | | | | | | | Self-reported | Sample Size | 55428 | 23002 | 32426 | | gender | Weighted Size | 23847415.32 | 11631728.37 | 12215686.95 | | | Total | 36063102.27 | 11631728.37 | 24431373.90 | | | Mean | 1.51 | 1.00 | 2.00 | | | SE Mean | 0.00 | 0.00 | 0.00 | ----------------------------------------------------------------------------------- | | | | | | | Race - UCLA | Sample Size | 9677 | 4084 | 5593 | | CHPR Definition | Weighted Size | 5705917.88 | 2866894.01 | 2839023.87 | | | Total | 5767889.98 | 2897175.85 | 2870714.13 | | | Mean | 1.01 | 1.01 | 1.01 | | | SE Mean | 0.00 | 0.00 | 0.00 | ----------------------------------------------------------------------------------- | | | | | | | NEWVAR1 | Sample Size | 55428 | 23002 | 32426 | | | Weighted Size | 23847415.32 | 11631728.37 | 12215686.95 | | | Total | 22084052.10 | 10772176.06 | 11311876.04 | | | Mean | 0.93 | 0.93 | 0.93 | | | SE Mean | 0.00 | 0.00 | 0.00 | -----------------------------------------------------------------------------------
----------------------------------------------------------------------------------- | | | | Variable | | Race - UCLA CHPR Definition | | | Total | LATINO | PACIFIC | | | | | | ISLANDER | ----------------------------------------------------------------------------------- | | | | | | | Self-reported | Sample Size | 9677 | 9458 | 219 | | gender | Weighted Size | 5705917.88 | 5643945.79 | 61972.10 | | | Total | 8544941.75 | 8451279.40 | 93662.35 | | | Mean | 1.50 | 1.50 | 1.51 | | | SE Mean | 0.01 | 0.01 | 0.04 | ----------------------------------------------------------------------------------- | | | | | | | Race - UCLA | Sample Size | 9677 | 9458 | 219 | | CHPR Definition | Weighted Size | 5705917.88 | 5643945.79 | 61972.10 | | | Total | 5767889.98 | 5643945.79 | 123944.19 | | | Mean | 1.01 | 1.00 | 2.00 | | | SE Mean | 0.00 | 0.00 | 0.00 | ----------------------------------------------------------------------------------- | | | | | | | NEWVAR1 | Sample Size | 9677 | 9458 | 219 | | | Weighted Size | 5705917.88 | 5643945.79 | 61972.10 | | | Total | 5275702.50 | 5218781.91 | 56920.60 | | | Mean | 0.92 | 0.92 | 0.92 | | | SE Mean | 0.00 | 0.00 | 0.03 | -----------------------------------------------------------------------------------
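To break the table out by the dummy variable as well, a call along the following lines could be used. This is an untested sketch that assumes the data set temp01a and the 1/2 variable newvar2 created in the data step in the dummy variables section; newvar2 is added to the subgroup statement and a third entry is added to the levels statement.

```sas
/* Sketch: newvar2 is the 1/2 recode of newvar1 stored in temp01a */
proc descript data=temp01a filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  var srsex racehpra;
  subgroup srsex racehpra newvar2;   /* recoded dummy listed as a subgroup */
  levels 2 2 2;                      /* one entry per subgroup variable    */
run;
```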
Creating interaction terms on the model statement
In proc regress, proc rlogist and proc survival, you can use a * between two variables (such as two categorical variables or one categorical and one continuous variable) to create an interaction term on the model statement. However, you cannot do this with two continuous variables; you need to create the interaction term in a data step before running the model.
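For two continuous variables, the data-step approach could be sketched as follows. The pairing of ae14 and ab23 is chosen only for illustration and may not be a sensible analysis; the data set name temp1b and the variable name ae14_ab23 are hypothetical.

```sas
/* Sketch: build the product term before calling SUDAAN */
data temp1b;
  set temp1;
  /* the product is missing whenever either component is missing */
  ae14_ab23 = ae14 * ab23;
run;

proc regress data=temp1b filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  /* main effects plus the precomputed interaction, all treated as continuous */
  model ae13 = ae14 ab23 ae14_ab23;
run;
```

Because none of the three predictors is categorical, no subgroup or levels statement is included.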
Date/time
To suppress the printing of the time and/or date at the top of your results, you can use the notime and/or nodate option on the print statement in all of the analysis procedures.
proc descript data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  var ab1;
  catlevel 1;
  setenv colwidth=12;
  print / notime nodate;
run;
Research Triangle Institute Page : 1 The DESCRIPT Procedure Table : 1 Variance Estimation Method: Replicate Weight Jackknife by: Variable, One. ----------------------------------------------------- | | | | Variable | | One | | | 1 | ----------------------------------------------------- | | | | | AB1: EXCELLENT | Sample Size | 55383 | | | Weighted Size | 23829382.24 | | | Total | 4639091.67 | | | Percent | 19.47 | | | SE Percent | 0.23 | -----------------------------------------------------
Limiting the number of observations
If you are working with a very large data set and you find that running procedures takes a while, you can use the maxobs = option on the proc statement of all analysis procedures to limit the number of observations that are read in. This can be very useful when you are debugging a program. Just remember to delete that option once the program is working correctly. Compare the results of the two proc regress calls below.
proc regress data=temp1 filetype=sas design = jackknife maxobs = 1000;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ae13 = ae14;
run;
S U D A A N Software for the Statistical Analysis of Correlated Data Copyright Research Triangle Institute January 2003 Release 8.0.2 Number of observations read : 1000 Weighted count: 431947 Observations used in the analysis : 591 Weighted count: 242364 Denominator degrees of freedom : 80 Maximum number of estimable parameters for the model is 2 Weighted mean response is 2.262239 Multiple R-Square for the dependent variable AE13: 0.216196
Variance Estimation Method: Replicate Weight Jackknife Working Correlations: Independent Link Function: Identity Response variable AE13: AE13 ---------------------------------------------------------------------- Independent P-value Variables and Beta T-Test Effects Coeff. SE Beta T-Test B=0 B=0 ---------------------------------------------------------------------- Intercept 1.96 0.11 17.83 0.0000 AE14 0.32 0.08 3.78 0.0003 ---------------------------------------------------------------------- ------------------------------------------------------- Contrast Degrees of P-value Freedom Wald F Wald F ------------------------------------------------------- OVERALL MODEL 2 197.90 0.0000 MODEL MINUS INTERCEPT 1 14.29 0.0003 INTERCEPT 1 317.85 0.0000 AE14 1 14.29 0.0003 -------------------------------------------------------
proc regress data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ae13 = ae14;
run;
S U D A A N Software for the Statistical Analysis of Correlated Data Copyright Research Triangle Institute January 2003 Release 8.0.2 Number of observations read : 55428 Weighted count: 23847415 Observations used in the analysis : 32538 Weighted count: 13783845 Denominator degrees of freedom : 80 Maximum number of estimable parameters for the model is 2 Weighted mean response is 2.188590 Multiple R-Square for the dependent variable AE13: 0.241897
Variance Estimation Method: Replicate Weight Jackknife Working Correlations: Independent Link Function: Identity Response variable AE13: AE13 ---------------------------------------------------------------------- Independent P-value Variables and Beta T-Test Effects Coeff. SE Beta T-Test B=0 B=0 ---------------------------------------------------------------------- Intercept 1.88 0.01 152.15 0.0000 AE14 0.34 0.01 25.47 0.0000 ----------------------------------------------------------------------
------------------------------------------------------- Contrast Degrees of P-value Freedom Wald F Wald F ------------------------------------------------------- OVERALL MODEL 2 12818.28 0.0000 MODEL MINUS INTERCEPT 1 648.71 0.0000 INTERCEPT 1 23150.59 0.0000 AE14 1 648.71 0.0000 -------------------------------------------------------
Stars instead of numbers
If you see stars in your output where numbers should be, the column is too narrow to display the value. Increase the column width (which is specified with colwidth on the setenv statement) until it is wide enough to display the results correctly. In the example below, the column width (colwidth = 10) is still not large enough, even though it is set higher than the default.
proc descript data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  var srsex racehpra racehpra racehpra;
  catlevel 1 1 2 3;
  setenv colwidth=10;
run;
S U D A A N Software for the Statistical Analysis of Correlated Data Copyright Research Triangle Institute January 2003 Release 8.0.2 Number of observations read : 55428 Weighted count : 23847415 Denominator degrees of freedom : 80
Variance Estimation Method: Replicate Weight Jackknife by: Variable, One. --------------------------------------------------- | | | | Variable | | One | | | 1 | --------------------------------------------------- | | | | | SRSEX: MALE | Sample Size | 55428 | | | Weighted Size | ********** | | | Total | ********** | | | Percent | 48.78 | | | SE Percent | 0.00 | --------------------------------------------------- | | | | | RACEHPRA: | Sample Size | 55428 | | LATINO | Weighted Size | ********** | | | Total | 5643945.79 | | | Percent | 23.67 | | | SE Percent | 0.12 | --------------------------------------------------- | | | | | RACEHPRA: | Sample Size | 55428 | | PACIFIC | Weighted Size | ********** | | ISLANDER | Total | 61972.10 | | | Percent | 0.26 | | | SE Percent | 0.02 | --------------------------------------------------- | | | | | RACEHPRA: AIAN | Sample Size | 55428 | | | Weighted Size | ********** | | | Total | 85146.30 | | | Percent | 0.36 | | | SE Percent | 0.02 | ---------------------------------------------------
In the example below, the setenv statement has been modified so that the colwidth is 12 instead of 10, and now the results are displayed properly.
proc descript data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  var srsex racehpra racehpra racehpra;
  catlevel 1 1 2 3;
  setenv colwidth=12;
run;
S U D A A N Software for the Statistical Analysis of Correlated Data Copyright Research Triangle Institute January 2003 Release 8.0.2 Number of observations read : 55428 Weighted count : 23847415 Denominator degrees of freedom : 80
Variance Estimation Method: Replicate Weight Jackknife by: Variable, One. ----------------------------------------------------- | | | | Variable | | One | | | 1 | ----------------------------------------------------- | | | | | SRSEX: MALE | Sample Size | 55428 | | | Weighted Size | 23847415.32 | | | Total | 11631728.37 | | | Percent | 48.78 | | | SE Percent | 0.00 | ----------------------------------------------------- | | | | | RACEHPRA: | Sample Size | 55428 | | LATINO | Weighted Size | 23847415.32 | | | Total | 5643945.79 | | | Percent | 23.67 | | | SE Percent | 0.12 | ----------------------------------------------------- | | | | | RACEHPRA: | Sample Size | 55428 | | PACIFIC | Weighted Size | 23847415.32 | | ISLANDER | Total | 61972.10 | | | Percent | 0.26 | | | SE Percent | 0.02 | ----------------------------------------------------- | | | | | RACEHPRA: AIAN | Sample Size | 55428 | | | Weighted Size | 23847415.32 | | | Total | 85146.30 | | | Percent | 0.36 | | | SE Percent | 0.02 | -----------------------------------------------------
Proc records crashing
The records procedure will not work properly if your data set has formats that include negative values, as the CHIS data set does. SUDAAN has been notified of this problem.
Reference categories for categorical independent variables and how to change them
By default, the last category (i.e., the highest numbered category) is used as the reference category when you have categorical predictors in a regression model. In this example, srsex is coded 1 = male and 2 = female, and racehpra is coded 1 = Latino, 2 = Pacific Islander, 3 = AIAN and 4 = Asian.
proc regress data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ab1 = srsex racehpra;
  subgroup srsex racehpra;
  levels 2 4;
run;
S U D A A N Software for the Statistical Analysis of Correlated Data Copyright Research Triangle Institute January 2003 Release 8.0.2 Number of observations read : 55428 Weighted count: 23847415 Observations used in the analysis : 14400 Weighted count: 8414669 Denominator degrees of freedom : 80 Maximum number of estimable parameters for the model is 5 Weighted mean response is 2.749634 Multiple R-Square for the dependent variable AB1: 0.026720
Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AB1: AB1
----------------------------------------------------------------------
Independent                                              P-value
Variables and          Beta                              T-Test
Effects                Coeff.    SE Beta    T-Test B=0   B=0
----------------------------------------------------------------------
Intercept               2.57      0.02       120.45      0.0000
SRSEX
  MALE                 -0.11      0.03        -4.51      0.0000
  FEMALE                0.00      0.00          .          .
RACEHPRA
  LATINO                0.35      0.02        14.48      0.0000
  PACIFIC ISLANDER      0.01      0.11         0.06      0.9562
  AIAN                  0.23      0.06         3.56      0.0006
  ASIAN                 0.00      0.00          .          .
----------------------------------------------------------------------

-------------------------------------------------------
Contrast                 Degrees of               P-value
                         Freedom       Wald F     Wald F
-------------------------------------------------------
OVERALL MODEL               5         12285.21    0.0000
MODEL MINUS INTERCEPT       4            56.79    0.0000
INTERCEPT                   .              .        .
SRSEX                       1            20.36    0.0000
RACEHPRA                    3            71.68    0.0000
-------------------------------------------------------
To change the reference category, you can use the reflevel statement. In this example, we have changed the reference category for both variables.
proc regress data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  reflevel racehpra = 2 srsex = 1;
  model ab1 = srsex racehpra;
  subgroup srsex racehpra;
  levels 2 4;
run;
S U D A A N Software for the Statistical Analysis of Correlated Data Copyright Research Triangle Institute January 2003 Release 8.0.2 Number of observations read : 55428 Weighted count: 23847415 Observations used in the analysis : 14400 Weighted count: 8414669 Denominator degrees of freedom : 80 Maximum number of estimable parameters for the model is 5 Weighted mean response is 2.749634 Multiple R-Square for the dependent variable AB1: 0.026720
Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AB1: AB1
----------------------------------------------------------------------
Independent                                              P-value
Variables and          Beta                              T-Test
Effects                Coeff.    SE Beta    T-Test B=0   B=0
----------------------------------------------------------------------
Intercept               2.46      0.11        22.88      0.0000
SRSEX
  MALE                  0.00      0.00          .          .
  FEMALE                0.11      0.03         4.51      0.0000
RACEHPRA
  LATINO                0.35      0.11         3.27      0.0016
  PACIFIC ISLANDER      0.00      0.00          .          .
  AIAN                  0.22      0.12         1.91      0.0598
  ASIAN                -0.01      0.11        -0.06      0.9562
----------------------------------------------------------------------

-------------------------------------------------------
Contrast                 Degrees of               P-value
                         Freedom       Wald F     Wald F
-------------------------------------------------------
OVERALL MODEL               5         12285.21    0.0000
MODEL MINUS INTERCEPT       4            56.79    0.0000
INTERCEPT                   .              .        .
SRSEX                       1            20.36    0.0000
RACEHPRA                    3            71.68    0.0000
-------------------------------------------------------
Using only some of the categories in a categorical variable
You can specify just some of the levels of a categorical variable by listing only the desired levels on the catlevel statement.
proc descript data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  var srsex racehpra racehpra racehpra;
  catlevel 1 1 2 3;
  setenv colwidth=12;
run;
S U D A A N Software for the Statistical Analysis of Correlated Data Copyright Research Triangle Institute January 2003 Release 8.0.2 Number of observations read : 55428 Weighted count : 23847415 Denominator degrees of freedom : 80
Variance Estimation Method: Replicate Weight Jackknife by: Variable, One. ----------------------------------------------------- | | | | Variable | | One | | | 1 | ----------------------------------------------------- | | | | | SRSEX: MALE | Sample Size | 55428 | | | Weighted Size | 23847415.32 | | | Total | 11631728.37 | | | Percent | 48.78 | | | SE Percent | 0.00 | ----------------------------------------------------- | | | | | RACEHPRA: | Sample Size | 55428 | | LATINO | Weighted Size | 23847415.32 | | | Total | 5643945.79 | | | Percent | 23.67 | | | SE Percent | 0.12 | ----------------------------------------------------- | | | | | RACEHPRA: | Sample Size | 55428 | | PACIFIC | Weighted Size | 23847415.32 | | ISLANDER | Total | 61972.10 | | | Percent | 0.26 | | | SE Percent | 0.02 | ----------------------------------------------------- | | | | | RACEHPRA: AIAN | Sample Size | 55428 | | | Weighted Size | 23847415.32 | | | Total | 85146.30 | | | Percent | 0.36 | | | SE Percent | 0.02 | -----------------------------------------------------
The following examples show what will happen if you use a dichotomous variable coded as 1/2 as the dependent variable in a logistic regression in SUDAAN. As you can see, SUDAAN, like other statistical packages, requires that the dependent variable in a logistic regression be coded as 0/1.
proc freq data = temp1;
  tables ab24;
  format ab24;
run;

The FREQ Procedure

                     Taking insulin

                                 Cumulative    Cumulative
AB24    Frequency     Percent     Frequency      Percent
---------------------------------------------------------
   1         878       23.23           878        23.23
   2        2902       76.77          3780       100.00

Frequency Missing = 51648

proc rlogist data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ab24 = srsex racehpra ab23;
  subgroup srsex racehpra;
  levels 2 2;
run;
Opened SAS data file TEMP1 for reading.
WARNING: There is only 1 nonzero cell in the frequency table for: (Intercept) * Taking insulin
Deleting all observations corresponding to (Intercept) before computing the remaining betas
WARNING: There is only 1 nonzero cell in the frequency table for: (Self-reported gender = MALE) * Taking insulin
Deleting all observations corresponding to (Self-reported gender = MALE) before computing the remaining betas
WARNING: There is only 1 nonzero cell in the frequency table for: (Self-reported gender = FEMALE) * Taking insulin
Deleting all observations corresponding to (Self-reported gender = FEMALE) before computing the remaining betas
WARNING: There is only 1 nonzero cell in the frequency table for: (Race - UCLA CHPR Definition = LATINO) * Taking insulin
Deleting all observations corresponding to (Race - UCLA CHPR Definition = LATINO) before computing the remaining betas
WARNING: There is only 1 nonzero cell in the frequency table for: (Race - UCLA CHPR Definition = PACIFIC ISLANDER) * Taking insulin
Deleting all observations corresponding to (Race - UCLA CHPR Definition = PACIFIC ISLANDER) before computing the remaining betas
SUMMARY: Deleting a total of 701 observations due to infinite betas before computing remaining betas
DATA ERROR : There are no records with dependent variable AB24=1
SUDAAN processing halted.
Below is a small SAS data step that corrects the problem of the dependent variable being coded 1/2. As you can see, proc rlogist now runs correctly.
data temp1a;
  set temp1;
  ab24a = ab24 - 1;
run;

proc rlogist data=temp1a filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ab24a = srsex racehpra ab23;
  subgroup srsex racehpra;
  levels 2 2;
run;
S U D A A N Software for the Statistical Analysis of Correlated Data Copyright Research Triangle Institute January 2003 Release 8.0.2 Number of zero responses : 147 Number of non-zero responses : 554 Independence parameters have converged in 5 iterations Number of observations read : 55428 Weighted count: 23847415 Observations used in the analysis : 701 Weighted count: 338818 Denominator degrees of freedom : 80 Maximum number of estimable parameters for the model is 4 Sample and Population Counts for Response Variable AB24A 0: Sample Count 147 Population Count 68944 1: Sample Count 554 Population Count 269875 R-Square for dependent variable AB24A (Cox & Snell, 1989): 0.004016 -2 * Normalized Log-Likelihood with Intercepts Only : 708.28 -2 * Normalized Log-Likelihood Full Model : 705.46 Approximate Chi-Square (-2 * Log-L Ratio) : 2.82 Degrees of Freedom : 3 Note: The approximate Chi-Square is not adjusted for clustering. Refer to hypothesis test table for adjusted test.
Variance Estimation Method: Replicate Weight Jackknife Working Correlations: Independent Link Function: Logit Response variable AB24A: AB24A ---------------------------------------------------------------------- Independent P-value Variables and Beta T-Test Effects Coeff. SE Beta T-Test B=0 B=0 ---------------------------------------------------------------------- Intercept 0.97 0.94 1.04 0.3023 Self-reported gender MALE -0.05 0.27 -0.19 0.8525 FEMALE 0.00 0.00 . . Race - UCLA CHPR Definition LATINO -0.03 0.83 -0.03 0.9731 PACIFIC ISLANDER 0.00 0.00 . . Age first told by doctor that have diabetes or sugar diabetes 0.01 0.01 1.11 0.2719 ----------------------------------------------------------------------
------------------------------------------------------- Contrast Degrees of P-value Freedom Wald F Wald F ------------------------------------------------------- OVERALL MODEL 4 39.02 0.0000 MODEL MINUS INTERCEPT 3 0.43 0.7312 INTERCEPT . . . SRSEX 1 0.03 0.8525 RACEHPRA 1 0.00 0.9731 AB23 1 1.22 0.2719 -------------------------------------------------------
----------------------------------------------------------- Independent Variables and Lower 95% Upper 95% Effects Odds Ratio Limit OR Limit OR ----------------------------------------------------------- Intercept 2.65 0.41 17.13 Self-reported gender MALE 0.95 0.55 1.63 FEMALE 1.00 1.00 1.00 Race - UCLA CHPR Definition LATINO 0.97 0.18 5.11 PACIFIC ISLANDER 1.00 1.00 1.00 Age first told by doctor that have diabetes or sugar diabetes 1.01 0.99 1.03 -----------------------------------------------------------
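Before fitting the model, it can be worth confirming that the recode did what was intended. One possible check, sketched here with standard proc freq options, cross-tabulates the original variable against the recoded one in the temp1a data set created above.

```sas
/* Sketch: each value of ab24 should map to exactly one value of ab24a */
proc freq data=temp1a;
  tables ab24*ab24a / missing list;   /* missing keeps missing values in the table */
  format ab24;                        /* show raw codes instead of formatted labels */
run;
```

With the recode above, the listing should pair ab24 = 1 with ab24a = 0 and ab24 = 2 with ab24a = 1, and missing values of ab24 should remain missing in ab24a.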
Using the recode statement
You can use the recode statement with all procedures in SUDAAN (except proc records). This statement is especially useful when you need to create a categorical variable from a continuous variable. The original continuous variable is recoded "on the fly", and the recoded variable is not added to your data set; rather, it exists only for the duration of the procedure. In the first example below, a 0/1 variable is created from the continuous variable ab23. A cut-off value of 50 is given, so in the recoded variable, values less than 50 will be coded 0 and values greater than or equal to 50 will be coded 1. Please see page 164 of the SUDAAN manual for more information regarding the recode statement.
Note that proc descript does not consider 0 to be a missing value. On the var statement, you need to specify the variable one time for each level of that variable that appears on the catlevel statement. On the catlevel statement, you need to specify the value of each level of the variable that you want displayed in the output.
proc descript data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  recode ab23 = (50);
  var ab23 ab23;
  catlevel 0 1;
run;
S U D A A N Software for the Statistical Analysis of Correlated Data Copyright Research Triangle Institute January 2003 Release 8.0.2 Number of observations read : 55428 Weighted count : 23847415 Denominator degrees of freedom : 80
Variance Estimation Method: Replicate Weight Jackknife by: Variable, One. ----------------------------------------------------- | | | | Variable | | One | | | 1 | ----------------------------------------------------- | | | | | Age first told | Sample Size | 3709 | | by doctor that | Weighted Size | 1380250.55 | | have diabetes | Total | 709750.87 | | or sugar | Percent | 51.42 | | diabetes: 0 - | SE Percent | 1.03 | | HIGH | | | ----------------------------------------------------- | | | | | Age first told | Sample Size | 3709 | | by doctor that | Weighted Size | 1380250.55 | | have diabetes | Total | 670499.68 | | or sugar | Percent | 48.58 | | diabetes: 0 - | SE Percent | 1.03 | | HIGH | | | -----------------------------------------------------
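In the example below, the recode statement is used with three cut-off values to create a multi-level variable from the continuous variable ab23, and the first three levels are displayed. Following the same rule as in the previous example, values less than 20 will be coded as 0, values from 20 up to (but not including) 30 will be coded as 1, and values from 30 up to (but not including) 70 will be coded as 2. Values of 70 or more fall into a further category that is not requested on the catlevel statement, which is why the percentages shown do not sum to 100.
proc descript data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  recode ab23 = (20 30 70);
  var ab23 ab23 ab23;
  catlevel 0 1 2;
run;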
----------------------------------------------------- | | | | Variable | | One | | | 1 | ----------------------------------------------------- | | | | | Age first told | Sample Size | 3709 | | by doctor that | Weighted Size | 1380250.55 | | have diabetes | Total | 76494.78 | | or sugar | Percent | 5.54 | | diabetes: 0 - | SE Percent | 0.52 | | HIGH | | | ----------------------------------------------------- | | | | | Age first told | Sample Size | 3709 | | by doctor that | Weighted Size | 1380250.55 | | have diabetes | Total | 113516.38 | | or sugar | Percent | 8.22 | | diabetes: 0 - | SE Percent | 0.58 | | HIGH | | | ----------------------------------------------------- | | | | | Age first told | Sample Size | 3709 | | by doctor that | Weighted Size | 1380250.55 | | have diabetes | Total | 1055360.64 | | or sugar | Percent | 76.46 | | diabetes: 0 - | SE Percent | 0.88 | | HIGH | | | -----------------------------------------------------
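If you would rather keep the categorized variable in your data set, a data step can mimic the recode. The sketch below assumes that each cut-off value belongs to the higher category, as in the example with the cut-off of 50; the data set name temp1c and the variable name ab23cat are hypothetical.

```sas
/* Sketch: data-step analogue of recode ab23 = (20 30 70), under the assumed rule */
data temp1c;
  set temp1;
  if missing(ab23) then ab23cat = .;    /* SAS treats missing as smaller than any number */
  else if ab23 < 20 then ab23cat = 0;
  else if ab23 < 30 then ab23cat = 1;   /* 20 up to, but not including, 30 */
  else if ab23 < 70 then ab23cat = 2;   /* 30 up to, but not including, 70 */
  else ab23cat = 3;                     /* 70 and above */
run;
```

The explicit test for missing values matters because, without it, missing values of ab23 would satisfy ab23 < 20 and be placed in category 0.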
The example below shows how you can use the recode statement to recode a 0/1 variable into a 1/2 variable. The recoding is not needed for proc descript, but that procedure is used here because its output shows the recoded values so clearly.
data temp01;
  set temp1;
  newvar1 = 0;
  if _n_ ge 4050 then newvar1 = 1;
run;

proc descript data=temp01 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  recode newvar1 = (0 1);
  var newvar1 newvar1;
  catlevel 1 2;
run;
S U D A A N
Software for the Statistical Analysis of Correlated Data
Copyright Research Triangle Institute January 2003
Release 8.0.2

Number of observations read    : 11000    Weighted count : 4801259
Denominator degrees of freedom : 80
Variance Estimation Method: Replicate Weight Jackknife by: Variable, One.
-----------------------------------------------------
|                |               |                  |
| Variable       |               | One              |
|                |               | 1                |
-----------------------------------------------------
|                |               |                  |
| NEWVAR1: 1     | Sample Size   |            11000 |
|                | Weighted Size |       4801258.85 |
|                | Total         |       1763363.22 |
|                | Percent       |            36.73 |
|                | SE Percent    |             0.58 |
-----------------------------------------------------
|                |               |                  |
| NEWVAR1: 2     | Sample Size   |            11000 |
|                | Weighted Size |       4801258.85 |
|                | Total         |       3037895.63 |
|                | Percent       |            63.27 |
|                | SE Percent    |             0.58 |
-----------------------------------------------------
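The same 0/1 to 1/2 recoding can also be done permanently in a data step before running the procedure. The sketch below assumes the temp01 data set created above; the names temp01b and newvar1r are hypothetical.

```sas
data temp01b;                         /* hypothetical data set name */
  set temp01;
  /* Shift 0/1 to 1/2; cases with a missing newvar1 keep a missing newvar1r. */
  if not missing(newvar1) then newvar1r = newvar1 + 1;
run;
```

With the variable coded 1/2, it could then be listed on the subgroup statement with a matching entry of 2 on the levels statement, and no cases would be dropped because of a 0 code.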
The example below shows how you can use the recode statement to recode a 1/2 variable into a 0/1 variable. According to the SUDAAN website, you cannot use the recode statement to recode a value of 2 to 0 (2 = 0).
data temp01a;
  set temp01;
  if newvar1 = 0 then newvar2 = 1;
  if newvar1 = 1 then newvar2 = 2;
run;

proc descript data=temp01a filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  recode newvar2 = (2 3);
  var newvar2 newvar2;
  catlevel 0 1;
run;
-----------------------------------------------------
|                |               |                  |
| Variable       |               | One              |
|                |               | 1                |
-----------------------------------------------------
|                |               |                  |
| NEWVAR2: 0     | Sample Size   |            55428 |
|                | Weighted Size |      23847415.32 |
|                | Total         |       1763363.22 |
|                | Percent       |             7.39 |
|                | SE Percent    |             0.15 |
-----------------------------------------------------
|                |               |                  |
| NEWVAR2: 1     | Sample Size   |            55428 |
|                | Weighted Size |      23847415.32 |
|                | Total         |      22084052.10 |
|                | Percent       |            92.61 |
|                | SE Percent    |             0.15 |
-----------------------------------------------------
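The values on the recode statement temporarily recode newvar2 to be a 0/1 variable; the recoding applies only within the procedure in which it appears. In the example below, the same recode statement lets newvar2 serve as the 0/1 response variable in proc rlogist.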
proc rlogist data=temp01a filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  recode newvar2 = (2 3);
  model newvar2 = racehpra ae21a ab23;
  subgroup racehpra;
  levels 4;
run;
S U D A A N
Software for the Statistical Analysis of Correlated Data
Copyright Research Triangle Institute January 2003
Release 8.0.2

Number of zero responses     : 37
Number of non-zero responses : 404

Independence parameters have converged in 8 iterations

Number of observations read       : 55428    Weighted count: 23847415
Observations used in the analysis : 441      Weighted count: 221626
Denominator degrees of freedom    : 80

Maximum number of estimable parameters for the model is 6

Sample and Population Counts for Response Variable NEWVAR2
  0: Sample Count  37    Population Count  27513
  1: Sample Count 404    Population Count 194113

R-Square for dependent variable NEWVAR2 (Cox & Snell, 1989): 0.022326

-2 * Normalized Log-Likelihood with Intercepts Only : 330.83
-2 * Normalized Log-Likelihood Full Model           : 320.87
Approximate Chi-Square (-2 * Log-L Ratio)           : 9.96
Degrees of Freedom                                  : 4

Note: The approximate Chi-Square is not adjusted for clustering.
      Refer to hypothesis test table for adjusted test.
Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Logit
Response variable NEWVAR2: NEWVAR2

----------------------------------------------------------------------
Independent                                                   P-value
  Variables and                Beta                  T-Test    T-Test
  Effects                    Coeff.    SE Beta          B=0       B=0
----------------------------------------------------------------------
Intercept                      0.45       0.96         0.47    0.6399
Race - UCLA CHPR
  Definition
  LATINO                       0.50       0.44         1.12    0.2642
  PACIFIC ISLANDER              inf          .            .         .
  AIAN                         2.54       0.81         3.13    0.0024
  ASIAN                        0.00       0.00            .         .
Minutes doing walking
  or bicycling                 0.01       0.01         1.52    0.1329
Age first told by
  doctor that have
  diabetes or sugar
  diabetes                     0.02       0.02         0.94    0.3491
----------------------------------------------------------------------
-------------------------------------------------------
Contrast                Degrees of             P-value
                           Freedom    Wald F    Wald F
-------------------------------------------------------
OVERALL MODEL                    5     33.64    0.0000
MODEL MINUS INTERCEPT            4      3.16    0.0182
INTERCEPT                        .         .         .
RACEHPRA                         2      4.91    0.0097
AE21A                            1      2.30    0.1329
AB23                             1      0.89    0.3491
-------------------------------------------------------
-----------------------------------------------------------
Independent
  Variables and                       Lower 95%  Upper 95%
  Effects                 Odds Ratio   Limit OR   Limit OR
-----------------------------------------------------------
Intercept                       1.57       0.23      10.63
Race - UCLA CHPR
  Definition
  LATINO                        1.64       0.68       3.94
  PACIFIC ISLANDER                 .          .          .
  AIAN                         12.64       2.52      63.27
  ASIAN                         1.00       1.00       1.00
Minutes doing walking
  or bicycling                  1.01       1.00       1.03
Age first told by
  doctor that have
  diabetes or sugar
  diabetes                      1.02       0.98       1.05
-----------------------------------------------------------
The effects statement
Below is an example showing how to use the effects statement. Note that the variables listed on the effects statement must be in the same order as those listed on the model statement.
proc regress data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ae13 = ae14 srsex racehpra;
  subgroup srsex racehpra;
  levels 2 4;
  effects racehpra = (1 0 0 -1) / name = "This is my contrast";
run;
S U D A A N
Software for the Statistical Analysis of Correlated Data
Copyright Research Triangle Institute January 2003
Release 8.0.2

Number of observations read       : 55428    Weighted count: 23847415
Observations used in the analysis : 6727     Weighted count: 3976584
Denominator degrees of freedom    : 80

Maximum number of estimable parameters for the model is 6

Weighted mean response is 2.665200

Multiple R-Square for the dependent variable AE13: 0.262197
Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AE13: Number of drinks on the days drinking alcohol

----------------------------------------------------------------------
Independent                                                   P-value
  Variables and                Beta                  T-Test    T-Test
  Effects                    Coeff.    SE Beta          B=0       B=0
----------------------------------------------------------------------
Intercept                      1.00       0.06        16.58    0.0000
Self-reported gender
  MALE                         0.87       0.07        12.45    0.0000
  FEMALE                       0.00       0.00            .         .
Race - UCLA CHPR
  Definition
  LATINO                       1.03       0.07        14.95    0.0000
  PACIFIC ISLANDER             0.67       0.38         1.78    0.0793
  AIAN                         0.73       0.19         3.91    0.0002
  ASIAN                        0.00       0.00            .         .
Number of times having
  5 or more drinks in
  past month                   0.39       0.04        10.41    0.0000
----------------------------------------------------------------------
-------------------------------------------------------
Contrast                Degrees of             P-value
                           Freedom    Wald F    Wald F
-------------------------------------------------------
OVERALL MODEL                    6    921.40    0.0000
MODEL MINUS INTERCEPT            5     90.79    0.0000
INTERCEPT                        .         .         .
SRSEX                            1    155.04    0.0000
RACEHPRA                         3     82.09    0.0000
AE14                             1    108.44    0.0000
This is my contrast              1    223.63    0.0000
-------------------------------------------------------
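The contrast above, (1 0 0 -1), puts its nonzero coefficients on the first and fourth levels of racehpra, which the output labels LATINO and ASIAN. As a further sketch, the same syntax could be used to compare the first and third levels instead. The code below has not been run against the CHIS data, the contrast name is made up for illustration, and the level order (LATINO, PACIFIC ISLANDER, AIAN, ASIAN) is assumed from the order in which the levels appear in the output above.

```sas
proc regress data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ae13 = ae14 srsex racehpra;
  subgroup srsex racehpra;
  levels 2 4;
  /* One coefficient per level of racehpra, in level order (assumed: */
  /* LATINO, PACIFIC ISLANDER, AIAN, ASIAN); this picks levels 1 and 3. */
  effects racehpra = (1 0 -1 0) / name = "Latino vs AIAN";
run;
```

The contrast should then appear as an extra row at the bottom of the Wald F table under the name given on the effects statement, just as "This is my contrast" does above.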
The contrast statement
A new page will be developed describing this statement.
The subpopn statement
Below is an example of the subpopn statement. This statement should be used whenever you want to analyze only a subpopulation in your data. You should NOT subset your data in a data step before running the analysis, as this can cause a wide variety of problems, from incorrect results to difficulties running the procedure at all. See pages 166-169 of the SUDAAN manual for more information regarding the subpopn statement, how to use it, and how missing values are handled. See especially the note in the middle of page 169 for a more complete explanation of why the subpopn statement should be used instead of subsetting the data first.
proc regress data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ae13 = ae14 racehpra;
  subpopn srsex = 1;
  subgroup racehpra;
  levels 4;
run;
S U D A A N
Software for the Statistical Analysis of Correlated Data
Copyright Research Triangle Institute January 2003
Release 8.0.2

Number of observations read       : 55428    Weighted count: 23847415
Observations in subpopulation     : 23002    Weighted count: 11631728
Observations used in the analysis : 3744     Weighted count: 2522055
Denominator degrees of freedom    : 80

Maximum number of estimable parameters for the model is 5

Weighted mean response is 3.133033

Multiple R-Square for the dependent variable AE13: 0.231226
Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AE13: Number of drinks on the days drinking alcohol
For Subpopulation: SRSEX = 1

----------------------------------------------------------------------
Independent                                                   P-value
  Variables and                Beta                  T-Test    T-Test
  Effects                    Coeff.    SE Beta          B=0       B=0
----------------------------------------------------------------------
Intercept                      1.71       0.07        24.92    0.0000
Number of times having
  5 or more drinks in
  past month                   0.38       0.04         9.67    0.0000
Race - UCLA CHPR
  Definition
  LATINO                       1.29       0.11        12.31    0.0000
  PACIFIC ISLANDER             0.84       0.59         1.44    0.1543
  AIAN                         0.54       0.24         2.20    0.0307
  ASIAN                        0.00       0.00            .         .
----------------------------------------------------------------------
-------------------------------------------------------
Contrast                Degrees of             P-value
                           Freedom    Wald F    Wald F
-------------------------------------------------------
OVERALL MODEL                    5    618.86    0.0000
MODEL MINUS INTERCEPT            4     63.04    0.0000
INTERCEPT                        .         .         .
AE14                             1     93.52    0.0000
RACEHPRA                         3     50.72    0.0000
-------------------------------------------------------
In this example, we have two conditions on the subpopn statement.
proc regress data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ae13 = ae14;
  subpopn srsex = 1 and racehpra = 2;
run;
S U D A A N
Software for the Statistical Analysis of Correlated Data
Copyright Research Triangle Institute January 2003
Release 8.0.2

Number of observations read       : 55428    Weighted count: 23847415
Observations in subpopulation     : 101      Weighted count: 30282
Observations used in the analysis : 69       Weighted count: 17998
Denominator degrees of freedom    : 80

Maximum number of estimable parameters for the model is 2

Weighted mean response is 3.607368

Multiple R-Square for the dependent variable AE13: 0.068544
Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AE13: Number of drinks on the days drinking alcohol
For Subpopulation: SRSEX = 1 AND RACEHPRA = 2

----------------------------------------------------------------------
Independent                                                   P-value
  Variables and                Beta                  T-Test    T-Test
  Effects                    Coeff.    SE Beta          B=0       B=0
----------------------------------------------------------------------
Intercept                      3.05       0.63         4.86    0.0000
Number of times having
  5 or more drinks in
  past month                   0.20       0.13         1.60    0.1145
----------------------------------------------------------------------
-------------------------------------------------------
Contrast                Degrees of             P-value
                           Freedom    Wald F    Wald F
-------------------------------------------------------
OVERALL MODEL                    2     19.02    0.0000
MODEL MINUS INTERCEPT            1      2.55    0.1145
INTERCEPT                        1     23.64    0.0000
AE14                             1      2.55    0.1145
-------------------------------------------------------
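The subpopn statement is not tied to proc regress. As a hedged sketch, the code below requests the mean of ae13 among males from proc descript, with the full data set still passed to the procedure. It has not been run against the CHIS data, and it assumes that subpopn accepts the same syntax in proc descript as in the proc regress examples above.

```sas
proc descript data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  /* Keep every record in the data set; restrict the analysis here instead. */
  subpopn srsex = 1;
  var ae13;
run;
```

The point of the design is the same as in the regression examples: all records stay in the input data set, and the restriction to the subpopulation is made by the procedure instead of by a data step.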
The test statement
You can use the test statement to obtain different types of chi-squared tests. Please see pages 278-279 of the manual for a description of the tests available in proc crosstab. The nose option on the proc crosstab statement suppresses the display of the standard errors; we have used it here to make the output more readable.
* the test statement is not available in proc rlogist???;
proc crosstab data=temp1 filetype=sas design = jackknife nose;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  tables srsex*racehpra;
  test chisq llchisq cmh;
  subgroup srsex racehpra;
  levels 2 3;
run;
Number of observations read    : 55428    Weighted count : 23847415
Denominator degrees of freedom : 80
Variance Estimation Method: Replicate Weight Jackknife
by: SRSEX, RACEHPRA.

------------------------------------------------------------------------------------------
|            |               |                                                           |
| SRSEX      |               | RACEHPRA                                                  |
|            |               | Total        | LATINO       | PACIFIC      | AIAN         |
|            |               |              |              | ISLANDER     |              |
------------------------------------------------------------------------------------------
|            |               |              |              |              |              |
| Total      | Sample Size   |        10458 |         9458 |          219 |          781 |
|            | Weighted Size |   5791064.18 |   5643945.79 |     61972.10 |     85146.30 |
|            | Row Percent   |       100.00 |        97.46 |         1.07 |         1.47 |
|            | Col Percent   |       100.00 |       100.00 |       100.00 |       100.00 |
|            | Tot Percent   |       100.00 |        97.46 |         1.07 |         1.47 |
------------------------------------------------------------------------------------------
|            |               |              |              |              |              |
| MALE       | Sample Size   |         4435 |         3983 |          101 |          351 |
|            | Weighted Size |   2911702.72 |   2836612.17 |     30281.84 |     44808.71 |
|            | Row Percent   |       100.00 |        97.42 |         1.04 |         1.54 |
|            | Col Percent   |        50.28 |        50.26 |        48.86 |        52.63 |
|            | Tot Percent   |        50.28 |        48.98 |         0.52 |         0.77 |
------------------------------------------------------------------------------------------
|            |               |              |              |              |              |
| FEMALE     | Sample Size   |         6023 |         5475 |          118 |          430 |
|            | Weighted Size |   2879361.46 |   2807333.62 |     31690.25 |     40337.59 |
|            | Row Percent   |       100.00 |        97.50 |         1.10 |         1.40 |
|            | Col Percent   |        49.72 |        49.74 |        51.14 |        47.37 |
|            | Tot Percent   |        49.72 |        48.48 |         0.55 |         0.70 |
------------------------------------------------------------------------------------------
-------------------------------------------------
|             |                  |              |
-------------------------------------------------
|             |                  |              |
|             | ChiSq            |         1.02 |
|             | P-value ChiSq    |       0.6019 |
|             | Degrees of       |              |
|             | Freedom ChiSq    |            2 |
|             | LLChiSq          |         1.03 |
|             | P-value LLChiSq  |       0.5987 |
|             | Degrees of       |              |
|             | Freedom LLChiSq  |            2 |
-------------------------------------------------
-------------------------------------------------
|             |                  |              |
|             |                  | 1            |
-------------------------------------------------
|             |                  |              |
|             | Cochran-Mantel-  |              |
|             | Haenszel Chi-    |              |
|             | Square           |       1.0217 |
|             | Degrees of       |              |
|             | Freedom CMH      |            2 |
|             | P-value CMH Test |       0.6019 |
-------------------------------------------------
The test statement can be used in proc regress to produce several kinds of Wald tests. Please see page 481 of the manual for more details. Also, proc rlogist does not seem to have a test statement.
proc regress data=temp1 filetype=sas design = jackknife;
  weight rakedw0;
  jackwgts rakedw1--rakedw80 / adjjack=1;
  model ae13 = ae14 srsex;
  test waldchi adjwaldf;
  subgroup srsex;
  levels 2;
run;
Number of observations read       : 55428    Weighted count: 23847415
Observations used in the analysis : 32538    Weighted count: 13783845
Denominator degrees of freedom    : 80

Maximum number of estimable parameters for the model is 3

Weighted mean response is 2.188590

Multiple R-Square for the dependent variable AE13: 0.259603
Variance Estimation Method: Replicate Weight Jackknife
Working Correlations: Independent
Link Function: Identity
Response variable AE13: AE13

----------------------------------------------------------------------
Independent                                                   P-value
  Variables and                Beta                  T-Test    T-Test
  Effects                    Coeff.    SE Beta          B=0       B=0
----------------------------------------------------------------------
Intercept                      1.61       0.01       116.51    0.0000
AE14                           0.32       0.01        24.90    0.0000
SRSEX
  MALE                         0.52       0.03        19.77    0.0000
  FEMALE                       0.00       0.00            .         .
----------------------------------------------------------------------
-------------------------------------------------------------------------------
Contrast                Degrees               P-value                 P-value
                             of              Adj Wald                    Wald
                        Freedom  Adj Wald F         F    Wald ChiSq     ChiSq
-------------------------------------------------------------------------------
OVERALL MODEL                 3     9802.58    0.0000      30161.79    0.0000
MODEL MINUS INTERCEPT         2      522.32    0.0000       1057.86    0.0000
INTERCEPT                     .           .         .             .         .
AE14                          1      619.78    0.0000        619.78    0.0000
SRSEX                         1      390.70    0.0000        390.70    0.0000
-------------------------------------------------------------------------------