Discrete-time survival analysis treats time not as a
continuous variable but as a sequence of discrete units or periods. Discrete-time data can be
analyzed using logistic regression with an indicator variable for each of the
time periods. We will illustrate discrete-time
survival analysis using the **cancer.dta** dataset.

## Cancer Example

After reading in the dataset, we will **describe** the variables, tabulate the discrete time
variable **distime**, summarize **age**, and **list** several variables for patients 5, 10 and 20.

    use http://www.ats.ucla.edu/stat/stata/library/cancer
    describe

    Contains data from cancer.dta
      obs:            48                          Patient Survival in Drug Trial
     vars:             7                          2 Jan 1904 13:58
     size:         1,248 (99.1% of memory free)
    -------------------------------------------------------------------------------
                  storage  display     value
    variable name   type   format      label      variable label
    -------------------------------------------------------------------------------
    id              float  %9.0g
    studytime       int    %8.0g                  Months to death or end of exp.
    died            int    %8.0g                  1 if patient died
    drug            float  %9.0g
    age             int    %8.0g                  Patient's age at start of exp.
    distime         float  %9.0g
    censor          float  %9.0g
    -------------------------------------------------------------------------------

    tab distime

        distime |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         11       22.92       22.92
              2 |         13       27.08       50.00
              3 |          6       12.50       62.50
              4 |          8       16.67       79.17
              5 |          4        8.33       87.50
              6 |          6       12.50      100.00
    ------------+-----------------------------------
          Total |         48      100.00

    univar age

                                            -------------- Quantiles --------------
    Variable       n     Mean     S.D.      Min      .25      Mdn      .75      Max
    -------------------------------------------------------------------------------
         age      48    55.88     5.66    47.00    50.50    56.00    60.00    67.00
    -------------------------------------------------------------------------------

    list distime drug age died censor if id==5

           distime       drug        age       died     censor
      5.         1          0         56          1          0

    list distime drug age died censor if id==10

           distime       drug        age       died     censor
     10.         2          0         58          0          1

    list distime drug age died censor if id==20

           distime       drug        age       died     censor
     20.         4          0         52          1          0

Patient 5 (56 years old, no drug treatment) was observed for one time period and died, so the observation for this patient was not censored. Patient 10 (58, no drug) was observed for two time periods and did not die, i.e., the observation was censored. Finally, patient 20 (52, no drug) was observed for four time periods and died (not censored).

In this dataset there is one observation for each patient. To do discrete-time survival analysis, we need as many observations per patient as there are time periods in which that patient was observed. For patients who die, we need a response variable that is zero until the last time period, when it is coded one. For patients who do not die, the response variable will be zero for every observation.
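The restructuring described above can be sketched with built-in Stata commands. This is only an illustration of the idea, not necessarily how the user-written command introduced below works internally. The variable names `period` and `y` are made up for this sketch, and it assumes **cancer.dta** is in memory with one row per patient.

```stata
* Illustrative sketch only: build person-period data by hand.
* Assumes one row per patient with id, distime, and censor in memory.
preserve                                  // keep the original data safe
expand distime                            // distime copies of each patient row
bysort id: generate period = _n           // hypothetical name: 1, 2, ..., distime
* hypothetical response: 1 only in the last period of an uncensored patient
bysort id: generate y = (_n == _N) & (censor == 0)
count                                     // 143 rows, the sum of distime
list id period y if inlist(id, 5, 10, 20), sepby(id)
restore                                   // return to one row per patient
```

The row count of 143 follows from the **distime** frequencies (11×1 + 13×2 + 6×3 + 8×4 + 4×5 + 6×6) and matches the number of observations that **logit** reports later on.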

A collection of Stata commands written by Alexis Dinno (Harvard School of Public
Health) will help us with the analysis. You can download this
family of commands from within Stata by typing **search dthaz** (see the FAQ
"How can I use the search command to search for programs and get additional
help?" for more information about using **search**).

The command that we are interested in is
**prsnperd**, which creates the type of dataset that we
want. **prsnperd** requires a variable that indicates whether the observation is censored,
which in our dataset is the variable **censor**.
**prsnperd** creates the following variables: **_period**, the time period;
**_Y**, the response variable; and **_d1** through **_d6**, the dummy-coded time periods.
Here is what the result looks like.

    prsnperd id distime censor

    list id _period _Y if id==5

                id    _period         _Y
      5.         5          1          1

    list id _period _Y if id==10

                id    _period         _Y
     11.        10          1          0
     12.        10          2          0

    list id _period _Y if id==20

                id    _period         _Y
     35.        20          1          0
     36.        20          2          0
     37.        20          3          0
     38.        20          4          1

Now we can do the discrete-time survival analysis using the **logit** command.
We will run **logit** with and without the **cluster** option, which adjusts the standard
errors for the repeated observations on each patient. Both runs use the
**nocons** option so that the dummy variables for all six time periods can
be included.
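Written out, the model being fit can be sketched as follows. The notation is introduced here for illustration and is not from the original output: $h(t_j)$ is the probability of dying in period $j$ given survival to the start of that period, and $D_1,\dots,D_6$ stand for the indicators **_d1** through **_d6**.

$$
\operatorname{logit} h(t_j) = \ln\frac{h(t_j)}{1-h(t_j)} = \alpha_1 D_1 + \alpha_2 D_2 + \dots + \alpha_6 D_6 + \beta_1\,\mathrm{drug} + \beta_2\,\mathrm{age}
$$

Exactly one of the six indicators equals one in every row, so they sum to the constant term. This is why the constant has to be dropped (**nocons**) if all six are to be kept.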

    logit _Y drug age _d1-_d6, cluster(id) nocons

    Logit estimates                                   Number of obs   =        143
                                                      Wald chi2(8)    =      45.39
    Log likelihood = -55.65503                        Prob > chi2     =     0.0000

                                   (standard errors adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |               Robust
              _Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            drug |  -3.024052   .6859866    -4.41   0.000    -4.368561   -1.679543
             age |   .1607128   .0497324     3.23   0.001      .063239    .2581866
             _d1 |  -9.309867   2.754574    -3.38   0.001    -14.70873   -3.911001
             _d2 |  -8.335442   2.641359    -3.16   0.002    -13.51241   -3.158473
             _d3 |  -8.326742   2.533321    -3.29   0.001    -13.29196   -3.361525
             _d4 |  -7.071942   2.564526    -2.76   0.006    -12.09832   -2.045564
             _d5 |   -7.19799   2.490689    -2.89   0.004    -12.07965   -2.316328
             _d6 |  -7.622593   2.722941    -2.80   0.005    -12.95946   -2.285726
    ------------------------------------------------------------------------------

    logit _Y drug age _d1-_d6, nocons

    Logit estimates                                   Number of obs   =        143
                                                      LR chi2(8)      =          .
    Log likelihood = -55.65503                        Prob > chi2     =          .

    ------------------------------------------------------------------------------
              _Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            drug |  -3.024052   .6347086    -4.76   0.000    -4.268058   -1.780046
             age |   .1607128    .051414     3.13   0.002     .0599433    .2614823
             _d1 |  -9.309867   2.922645    -3.19   0.001    -15.03815   -3.581589
             _d2 |  -8.335442   2.780394    -3.00   0.003    -13.78491   -2.885969
             _d3 |  -8.326742   2.823744    -2.95   0.003    -13.86118   -2.792306
             _d4 |  -7.071942   2.734906    -2.59   0.010    -12.43226   -1.711624
             _d5 |   -7.19799   2.811519    -2.56   0.010    -12.70847   -1.687513
             _d6 |  -7.622593   2.988678    -2.55   0.011    -13.48029   -1.764892
    ------------------------------------------------------------------------------

    logit, or

    Logit estimates                                   Number of obs   =        143
                                                      LR chi2(8)      =          .
    Log likelihood = -55.65503                        Prob > chi2     =          .

    ------------------------------------------------------------------------------
              _Y | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            drug |   .0486039   .0308493    -4.76   0.000      .014009    .1686304
             age |   1.174348   .0603779     3.13   0.002     1.061776    1.298854
             _d1 |   .0000905   .0002646    -3.19   0.001     2.94e-07    .0278315
             _d2 |   .0002399   .0006669    -3.00   0.003     1.03e-06    .0558007
             _d3 |    .000242   .0006832    -2.95   0.003     9.55e-07    .0612797
             _d4 |   .0008486   .0023208    -2.59   0.010     3.99e-06    .1805723
             _d5 |   .0007481   .0021033    -2.56   0.010     3.03e-06     .184979
             _d6 |   .0004893   .0014623    -2.55   0.011     1.40e-06    .1712052
    ------------------------------------------------------------------------------
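The odds ratios reported by **logit, or** are the exponentiated coefficients from the previous table. As a quick check, the lines below use only the built-in **display** command and the `exp()` function, with the coefficient values copied from the output above.

```stata
* Odds ratios are exp(coefficient); values copied from the logit output.
display exp(-3.024052)     // drug: about 0.0486
display exp(.1607128)      // age:  about 1.1743 per additional year of age
```

So, holding age and time period fixed, the odds of dying in a given period for a patient on drug therapy are roughly 5% of the odds for a patient who is not.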

Both **drug** and **age** are significant: older patients are more likely to
die, and those on drug therapy are less likely to die. It is useful to look at the hazard function
(and survival function) to see how the risk changes over time. The **dthaz** command (from Dinno)
will produce a table with hazard and survival values for each time period. We will
use the **specify** option to evaluate
these functions at drug=1 (drug therapy) and age=56 (the median age).

    dthaz drug age, specify(1 56)

    Discrete-Time Estimation of Conditional Hazard and Survival Probabilities
    ------------------------------------------------------------------------------
    Time Parameterization: Fully Discrete

    Additional predictors specified as:
    drug = 1
    age = 56

    -----------------------------------------
     Period     p(Hazard)    p(Survival)
     (T_j)      ^H(T_j)      ^S(T_j)
    -----------------------------------------
       0          --           1
       1        0.0344       0.9656
       2        0.0863       0.8822
       3        0.0870       0.8055
       4        0.2505       0.6037
       5        0.2276       0.4663
       6        0.1616       0.3910
    -----------------------------------------
    Logit Link (assumes proportional odds)

Notice that the hazard peaks at time period four (0.2505) and then declines.
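The hazard and survival columns can be checked by hand. Under the logit link, the hazard in period j is the inverse logit of the linear predictor for that period, and survival is the running product of one minus the hazard. The sketch below is an illustration, not part of **dthaz**. It assumes the person-period data created by **prsnperd** are still in memory, refits the unclustered model so that `_b[]` holds its coefficients, and uses made-up scalar names `haz` and `surv`.

```stata
* Sketch: recompute hazard and survival at drug = 1, age = 56.
quietly logit _Y drug age _d1-_d6, nocons
scalar surv = 1                           // survival before period 1
forvalues j = 1/6 {
    * hazard = invlogit(period effect + drug effect + 56 * age effect)
    scalar haz = invlogit(_b[_d`j'] + _b[drug]*1 + _b[age]*56)
    * survival = previous survival times (1 - hazard)
    scalar surv = scalar(surv) * (1 - scalar(haz))
    display "period `j'" _col(12) %6.4f scalar(haz) _col(22) %6.4f scalar(surv)
}
```

For period 1, for example, the linear predictor is -9.309867 - 3.024052 + 56 × 0.1607128 ≈ -3.334, and the inverse logit of that value is about 0.0344, which agrees with the first row of the table.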