How can I use PROC TRANSREG to generate splines in SAS?

Proc transreg performs transformation regression in which both the outcome and predictor(s) can be transformed and splines can be fit. Splines are piecewise polynomials that can be used to estimate relationships that are difficult to fit with a single function.

On this page, we will walk through an example using some of the most commonly used options of proc transreg. For more information on the options available, see the SAS Online Documentation.

We begin by creating a dataset with an outcome Y and a predictor X. This example data comes from the SAS documentation examples for proc transreg.


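data a;
  x=-0.000001;                 /* start just below 0; x is incremented before each output */
  do i=0 to 199;
    if mod(i,50)=0 then do;    /* every 50 observations, start a new segment */
      c=((x/2)-5)**2;
      if i=150 then c=c+5;     /* offset the last segment by 5 */
      y=c;                     /* reset y to the new segment's starting level */
      end;
    x=x+0.1;
    y=y-sin(x-c);              /* y accumulates a sine term within each segment */
    output;
    end;
run;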

proc gplot data = a;
  plot y*x;
run;

Image transreg1

Clearly, no single simple function relates Y to X. The relationship does not appear random, but it does appear to change with X. Thus it makes sense to try to fit this with splines. We will start with the SAS defaults and then show how you can specify the number of polynomials (pieces) and the degree of the polynomials you wish to fit. Before running proc transreg, we can see that our data contains four variables:

proc print data = a (obs = 5); run;

Obs       X       I       C          Y

  1    0.10000    0    25.0000    24.7694
  2    0.20000    1    25.0000    24.4427
  3    0.30000    2    25.0000    24.0234
  4    0.40000    3    25.0000    23.5155
  5    0.50000    4    25.0000    22.9241

In the proc transreg step, the model statement indicates that we wish to predict variable y without transformation by specifying identity(y). If we wished to model a transformed version of y (the log or rank of y, for example), we would indicate the transformation here. To predict y, we indicate that we wish to use piecewise polynomial functions of x with pspline(x). There are many other options available that will not be covered on this page. We chose pspline because it is the most commonly used and easy-to-understand option. We also opted to output a dataset, a2, containing predicted values from the model; the pprefix = p option names the predicted variable py.

proc transreg data=a;
   model identity(y) = pspline(x);
   output out = a2 pprefix = p;
run;

The TRANSREG Procedure

     TRANSREG Univariate Algorithm Iteration History for Identity(Y)
Iteration    Average    Maximum                Criterion
   Number     Change     Change    R-Square       Change    Note
-------------------------------------------------------------------------
        1    0.00000    0.00000     0.46884                 Converged

We can see in the output above that the model converged and has an R-squared value of 0.47. Let's look at the dataset output by proc transreg.

proc print data = a2 (obs = 5); run;

Obs  _TYPE_  _NAME_     Y        TY       pY    Intercept    X_1      X_2

  1  SCORE    ROW1   24.7694  24.7694  24.1144      1      0.10000  0.01000
  2  SCORE    ROW2   24.4427  24.4427  23.4722      1      0.20000  0.04000
  3  SCORE    ROW3   24.0234  24.0234  22.8424      1      0.30000  0.09000
  4  SCORE    ROW4   23.5155  23.5155  22.2249      1      0.40000  0.16000
  5  SCORE    ROW5   22.9241  22.9241  21.6195      1      0.50000  0.25000
  
  Obs    X_3      TIntercept      TX_1       TX_2       TX_3        X

  1  0.00100         1        0.10000    0.01000    0.00100    0.10000
  2  0.00800         1        0.20000    0.04000    0.00800    0.20000
  3  0.02700         1        0.30000    0.09000    0.02700    0.30000
  4  0.06400         1        0.40000    0.16000    0.06400    0.40000
  5  0.12500         1        0.50000    0.25000    0.12500    0.50000

In addition to the predicted values, py, the dataset contains a new variable, ty, holding the "transformed" value of y (since our transformation was the identity, these values are the same as y), and three variables (x_1, x_2, x_3) holding the first three powers of x. Transformed versions of these three variables and of the intercept are also included, indicated with a "T" prefix. We can see that, by default, SAS fits a third-degree polynomial in x to y. We can plot the predicted values to see how closely they match the original data.

legend label=none value=('y' 'predicted y') position=(bottom left inside) mode=share down = 2; 
proc gplot data = a2;
   plot (y py)*x / overlay legend = legend;
run;

Image transreg2
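Because the default pspline(x) fit uses no knots and the first three powers of x, it should be equivalent to an ordinary cubic polynomial regression. As a rough check, and assuming the dataset a created earlier, the sketch below fits the same cubic with proc glm; its R-square should agree with the 0.46884 that proc transreg reported.

```sas
/* Sketch: ordinary least squares on x, x**2 and x**3.          */
/* Assumes dataset a from the data step earlier on this page.   */
proc glm data=a;
   model y = x x*x x*x*x;   /* same terms as X_1, X_2, X_3 in a2 */
run;
quit;
```

If the two R-square values match, the "spline" at this stage is simply a single cubic; the knots introduced next are what make it piecewise.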

If we want SAS to fit more than one polynomial, we can indicate that by specifying a number of "knots". A knot is a point at which one polynomial ends and another begins. Similarly, we can indicate the degree of the polynomials to be fit. Generally, as we increase the number of knots or the degree of the polynomials, we are able to generate functions that more closely fit the data. This improved fit comes at the cost of estimating more parameters.

Let us look at how we can specify degrees and knots to achieve different types of models.

One might believe that x and y are linearly related, but that the slope of the line changes at some point in x. In such a situation, you can fit one first-degree (straight line) polynomial up to the given point in x (the "knot") and another first-degree polynomial from that point on. The proc transreg code for this model and a plot of the results are below.

proc transreg data=a;
   model identity(Y) = pspline(X / nknots=1 degree = 1);
   output out = k1d1 PPREFIX =k1d1;
run;

proc gplot data = k1d1;
   plot (y k1d1y)*x / overlay legend = legend;
run;

Image transreg3
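To see the parameters being estimated for this one-knot, first-degree model, one option is to request regression results from proc transreg itself. The sketch below assumes the ss2 option on the model statement, which should add an analysis of variance table and parameter estimates to the output; check the option against your SAS release before relying on it.

```sas
/* Sketch: same one-knot, first-degree model, with regression results. */
/* Assumes dataset a from above and the ss2 model option.              */
proc transreg data=a;
   model identity(y) = pspline(x / nknots=1 degree=1) / ss2;
run;
```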

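We can also fit two second-degree polynomials, which gives a curved fit rather than two straight line segments. As the table further down shows, this does not raise the R-squared here (0.46467 versus 0.47545 for the one-knot, first-degree model):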

proc transreg data=a;
   model identity(Y) = pspline(X / nknots=1 degree = 2);
   output out = k1d2 PPREFIX =k1d2;
run;

proc gplot data = k1d2;
   plot (y k1d2y)*x / overlay legend = legend;
run;

Image transreg4
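The examples above use nknots=, which leaves the knot locations to SAS. When you have reason to place knots at particular values of x, the knots= option takes an explicit list instead. In this example data, a new segment starts every 50 observations, that is, near x = 5, 10 and 15, so the sketch below places knots there. The output dataset name kx and the prefix kx are made up for illustration.

```sas
/* Sketch: cubic pieces with knots placed by hand at x = 5, 10, 15. */
/* Assumes dataset a from above; kx is an illustrative name.        */
proc transreg data=a;
   model identity(y) = pspline(x / knots=5 10 15 degree=3);
   output out = kx pprefix = kx;
run;

proc gplot data = kx;
   plot (y kxy)*x / overlay;   /* kxy = prefix kx + variable y */
run;
```

Note that this fit still joins the pieces smoothly at each knot, so it will not follow the abrupt jumps in the data exactly.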

The table below shows how R² varies with the number of knots and the degree. The fit tends to improve as either increases, though not in every case.

Knots	Degrees	R²
0	1	0.10061
0	2	0.40720
0	3	0.46884
1	1	0.47545
1	2	0.46467
2	1	0.41828
2	2	0.51827
2	3	0.55391
3	1	0.50651
3	2	0.53603
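A table like this can be put together by refitting the model for each combination and reading the R-square from each iteration history. The macro below is one minimal way to do that, assuming the dataset a from above; the macro name fitgrid and its parameters are made up for illustration, and it runs every combination up to three knots and degree three, a few more than are listed in the table.

```sas
/* Sketch: refit the model for each knots/degree combination.   */
/* fitgrid, maxknots, maxdegree and knotopt are illustrative.   */
%macro fitgrid(maxknots=3, maxdegree=3);
   %local k d knotopt;
   %do k = 0 %to &maxknots;
      %do d = 1 %to &maxdegree;
         /* leave nknots= off entirely for the zero-knot fits */
         %if &k > 0 %then %let knotopt = nknots=&k;
         %else %let knotopt = ;
         title "pspline(x): knots=&k degree=&d";
         proc transreg data=a;
            model identity(y) = pspline(x / degree=&d &knotopt);
         run;
      %end;
   %end;
   title;   /* clear the title when done */
%mend fitgrid;

%fitgrid()
```

Each run prints its own iteration history under a title naming the combination, so the R-square values can be copied into a table by hand.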

These are the most basic examples of proc transreg using polynomial splines. SAS offers many examples that employ other options in its online documentation.