We often examine data with the aim of making predictions. Spatial data
analysis is no exception. Given measurements of a variable at a set of points
in a region, we might like to extrapolate to points in the region where the
variable was not measured or, possibly, to points outside the region that we
believe will behave similarly. We can base these predictions on our measured
values alone by *kriging* or we can incorporate covariates and make
predictions using a regression model.

In SAS, **proc mixed** allows the user to fit a regression model in which the
outcome and the expected errors are spatially autocorrelated. The spatial
autocorrelation can take several different forms, and the most appropriate
form for a given dataset can be assessed by looking at the shape of the
variogram of the data and choosing from the available options.
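
As a rough illustration of how the empirical variogram might be obtained, here is a minimal sketch using **proc variogram**. It assumes a dataset named thick with coordinate variables east and north, as used in the models below. The lag distance and number of lags are illustrative assumptions, not values taken from this analysis, and should be tuned to the spacing of the data.

```sas
/* Sketch: empirical semivariogram of thick.                     */
/* lagd= and maxlag= are assumed values chosen for illustration. */
proc variogram data = thick outv = outv;
  compute lagd = 7 maxlag = 10;       /* lag width and number of lags */
  coordinates xc = east yc = north;   /* spatial coordinates          */
  var thick;                          /* variable to analyze          */
run;

/* Inspect the semivariance at each lag to judge its shape */
proc print data = outv;
run;
```

A semivariogram that rises slowly near the origin and then levels off is the shape usually associated with a Gaussian form; a steep, roughly linear rise near the origin points more toward an exponential or spherical form.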

We will again be using the thick dataset provided in the SAS
documentation for **proc variogram**, which includes the measured thickness
of coal seams at different coordinates (we have converted this to a .csv file
for easy use in R). To this dataset, we have added a covariate called **soil**
measuring the soil quality. We wish to predict thickness (**thick**) with
soil quality (**soil**) in a regression model that incorporates the spatial
autocorrelation of our data.

We can first run a model that treats our observations as independent of each other and predicts **thick** with **soil**.
We will do this using **proc mixed**.

    proc mixed data = thick;
      model thick = soil / solution;
      repeated / subject = intercept;
    run;

    The Mixed Procedure

    [ ... output omitted ... ]

                Fit Statistics

    -2 Res Log Likelihood           333.6
    AIC (smaller is better)         335.6
    AICC (smaller is better)        335.7
    BIC (smaller is better)         337.9

      Null Model Likelihood Ratio Test

        DF    Chi-Square      Pr > ChiSq
         0          0.00          1.0000

                       Solution for Fixed Effects

                             Standard
    Effect       Estimate       Error      DF    t Value    Pr > |t|
    Intercept     31.9420      3.1570       0      10.12       .
    soil           2.2552      0.8656      73       2.61      0.0111

            Type 3 Tests of Fixed Effects

                  Num     Den
    Effect         DF      DF    F Value    Pr > F
    soil            1      73       6.79    0.0111

Next, we can run the same model with a spatial correlation structure. Let’s
assume that we have determined, for example from the variogram, that our
outcome **thick** appears to have a Gaussian spatial correlation form. We can
specify such a structure with the **type = sp(gau)** option in the
**repeated** statement, followed in parentheses by the variables we wish to
use to measure distance (here **east** and **north**).
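
For reference, the Gaussian structure as documented for **proc mixed** lets the covariance between the errors at two locations decay with the squared distance between them. The symbols below are our own notation, not labels from the SAS output:

$$
\operatorname{Cov}(e_i, e_j) = \sigma^2 \exp\left(-\frac{d_{ij}^2}{\rho^2}\right),
\qquad
d_{ij} = \sqrt{(\mathrm{east}_i - \mathrm{east}_j)^2 + (\mathrm{north}_i - \mathrm{north}_j)^2}
$$

Here $\sigma^2$ is the error variance and $\rho$ is a range parameter, both estimated from the data. Observations that are close together ($d_{ij}$ small relative to $\rho$) are strongly correlated, and the correlation falls toward zero as the distance grows.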

    proc mixed data = thick;
      model thick = soil / solution;
      repeated / subject = intercept type = sp(gau)(east north);
    run;

    The Mixed Procedure

    [ ... output omitted ... ]

                Fit Statistics

    -2 Log Likelihood                81.5
    AIC (smaller is better)          89.5
    AICC (smaller is better)         90.1
    BIC (smaller is better)          98.8

      Null Model Likelihood Ratio Test

        DF    Chi-Square      Pr > ChiSq
         1        252.81          <.0001

                       Solution for Fixed Effects

                             Standard
    Effect       Estimate       Error      DF    t Value    Pr > |t|
    Intercept     40.3280      0.5799       0      69.55       .
    soil         0.003479     0.01582      73       0.22      0.8266

            Type 3 Tests of Fixed Effects

                  Num     Den
    Effect         DF      DF    F Value    Pr > F
    soil            1      73       0.05    0.8266

In this example, incorporating the Gaussian correlation structure both
improved the model fit as judged by the reported fit statistics (AIC fell from
335.6 to 89.5) and changed the nature of the regression model. Without the
spatial structure, **soil** is a statistically significant predictor of
**thick** (p = 0.0111). With the spatial structure, this relationship is no
longer significant (p = 0.8266). This suggests that after accounting for
location through the assumed correlation structure, **soil** does not add much
new information.
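
The two outputs above label their likelihoods differently (-2 Res Log Likelihood for the first model, -2 Log Likelihood for the second), which suggests they may not have been produced by the same estimation method. Fit statistics are only directly comparable when both models are fit the same way. One way to put the two fits on the same footing is sketched below. The **method = ml** option is our addition for illustration and is not part of the analysis shown above.

```sas
/* Sketch: fit both models by maximum likelihood so that   */
/* AIC, AICC and BIC can be compared directly.             */
/* method = ml is an assumption added for this comparison. */

/* Independent errors */
proc mixed data = thick method = ml;
  model thick = soil / solution;
run;

/* Gaussian spatial correlation, same estimation method */
proc mixed data = thick method = ml;
  model thick = soil / solution;
  repeated / subject = intercept type = sp(gau)(east north);
run;
```

The same pattern could be used to compare candidate spatial forms against one another, for example by swapping **sp(gau)** for **sp(exp)** or **sp(sph)** and keeping the estimation method fixed.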

## References

- Littell, Ramon, George Milliken, Walter Stroup, Russell Wolfinger, and Oliver Schabenberger. *SAS System for Mixed Models, Second Edition*.
- Cressie, Noel. *Statistics for Spatial Data*. John Wiley & Sons, Inc.: New York, 1991.