The interpretation of coefficients in an ordinal logistic regression varies by the software you use. In this FAQ page, we will focus on the interpretation of the coefficients in R, but the results generalize to Stata, SPSS and Mplus. For a detailed description of how to analyze your data using R, refer to *R Data Analysis Examples: Ordinal Logistic Regression*.

## Definitions

First let’s establish some notation and review the concepts involved in ordinal logistic regression. Let $Y$ be an ordinal outcome with $J$ categories. Then $P(Y \le j)$ is the cumulative probability of $Y$ being less than or equal to a specific category $j = 1, \cdots, J-1$. Note that $P(Y \le J) =1.$ The odds of being less than or equal to a particular category can be defined as

$$\frac{P(Y \le j)}{P(Y>j)}$$

for $j=1,\cdots, J-1$ since $P(Y > J) = 0$ and dividing by zero is undefined. Alternatively, you can write $P(Y >j) = 1 - P(Y \le j)$. The **log odds** is also known as the **logit**, so that

$$log \frac{P(Y \le j)}{P(Y>j)} = logit (P(Y \le j)).$$
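As a quick numerical check of these definitions, the following sketch computes the cumulative probabilities, odds and logits for a three-category outcome. The category probabilities are hypothetical values chosen for illustration, not estimates from the data used on this page.

```r
# Hypothetical category probabilities for J = 3 ordered categories
p <- c(unlikely = 0.5, somewhat = 0.3, very = 0.2)

# Cumulative probabilities P(Y <= j) for j = 1, ..., J - 1
cum_p <- cumsum(p)[-length(p)]

# Odds of being at or below category j: P(Y <= j) / P(Y > j)
odds <- cum_p / (1 - cum_p)

# The log odds equal the logit; qlogis() is base R's logit function
log_odds <- log(odds)
print(rbind(cum_p, odds, log_odds, logit = qlogis(cum_p)))
```

The last category is dropped because $P(Y \le J) = 1$ would put zero in the denominator of the odds.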

## Ordinal Logistic Regression Model

The ordinal logistic regression model can be defined as

$$logit (P(Y \le j)) = \beta_{j0} + \beta_{j1}x_1 + \cdots + \beta_{jp} x_p$$ for $j=1, \cdots, J-1$ and $p$ predictors. Due to the **parallel lines assumption**, the intercepts are different for each category but the slopes are constant across categories, which simplifies the equation above to

$$logit (P(Y \le j)) = \beta_{j0} + \beta_{1}x_1 + \cdots + \beta_{p} x_p.$$

### How R parameterizes the ordinal regression model

In Stata and R (`polr`), the ordinal logistic regression model is parameterized as

$$logit (P(Y \le j)) = \beta_{j0} - \eta_{1}x_1 - \cdots - \eta_{p} x_p$$

where $\eta_i = -\beta_i.$

Suppose we want to see whether a binary predictor, parental education (`pared`), predicts an ordinal outcome: whether students are *unlikely*, *somewhat likely* or *very likely* to apply to a college (`apply`).

Due to the parallel lines assumption, even though we have three categories, the coefficient of parental education (`pared`) stays the same across the two equations ($j=1,2$). The two equations for `pared = 1` and `pared = 0` are

$$ \begin{eqnarray} logit (P(Y \le j | x_1=1)) & = & \beta_{j0} - \eta_{1} \\ logit (P(Y \le j | x_1=0)) & = & \beta_{j0} \end{eqnarray} $$

Then $logit (P(Y \le j | x_1=1)) - logit (P(Y \le j | x_1=0)) = - \eta_{1}.$

To run an ordinal logistic regression in R, first load the following libraries:

    library(foreign)  # read.dta() for reading Stata files
    library(MASS)     # polr() for ordinal logistic regression

Now read in the data and run the analysis using `polr`:

    dat <- read.dta("https://stats.idre.ucla.edu/stat/data/ologit.dta")
    m <- polr(apply ~ pared, data = dat)  # ordinal logistic regression of apply on pared
    summary(m)

The shortened output looks like the following:

    Coefficients:
          Value Std. Error t value
    pared 1.127     0.2634    4.28

    Intercepts:
                                Value   Std. Error t value
    unlikely|somewhat likely     0.3768  0.1103     3.4152
    somewhat likely|very likely  2.4519  0.1826    13.4302

The output shows that for students whose parents attended college, the log odds of being unlikely to apply to college (versus somewhat or very likely) is actually $-\hat{\eta}_1=-1.13$, or $1.13$ points *lower* than for students whose parents did not attend college. Recall that $-\eta_i = \beta_i$, and that the model applies to $j=1,2$ only, since $logit (P(Y \le 3))$ is undefined. So the equations for the first and second categories become:

$$ \begin{eqnarray} logit (P(Y \le 1)) & = & 0.377 - 1.13 x_1 \\ logit (P(Y \le 2)) & = & 2.45 - 1.13 x_1 \end{eqnarray} $$
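The two fitted equations can be turned back into category probabilities by applying the inverse logit. The sketch below hard-codes the rounded estimates from the output above instead of refitting the model, so its results agree with the fitted `polr` object only up to rounding. The names `zeta`, `eta` and `category_probs` are chosen here for illustration.

```r
# Rounded estimates copied from the polr output (assumed inputs, not refitted)
zeta <- c(0.3768, 2.4519)  # intercepts for j = 1 and j = 2
eta  <- 1.127              # coefficient of pared as reported by polr

# Category probabilities implied by logit(P(Y <= j)) = zeta_j - eta * x1
category_probs <- function(x1) {
  cum <- plogis(zeta - eta * x1)  # inverse logit gives P(Y <= 1), P(Y <= 2)
  probs <- diff(c(0, cum, 1))     # successive differences give P(Y = j)
  names(probs) <- c("unlikely", "somewhat likely", "very likely")
  probs
}

probs <- rbind(pared0 = category_probs(0), pared1 = category_probs(1))
print(round(probs, 3))
```

Each row sums to one, and a larger $x_1$ shifts probability toward the higher categories because $\eta_1$ enters the cumulative logit with a negative sign.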

To see the connection between the parallel lines assumption and the **proportional odds** assumption, exponentiate both sides of the equations above and use the property that $exp(a-b) = exp(a)/exp(b)$ to calculate the odds at each level of `pared` for each level of `apply`.

$$ \begin{eqnarray} \frac{P(Y \le 1 | x_1=1)}{P(Y \gt 1 | x_1=1)} & = & exp(0.377)/exp(1.13) \\ \frac{P(Y \le 1 | x_1=0)}{P(Y \gt 1 | x_1=0)} & = & exp(0.377) \\ \frac{P(Y \le 2 | x_1=1)}{P(Y \gt 2 | x_1=1)} & = & exp(2.45)/exp(1.13) \\ \frac{P(Y \le 2 | x_1=0)}{P(Y \gt 2 | x_1=0)} & = & exp(2.45) \end{eqnarray} $$

From the odds at each level of `pared`, we can calculate the *odds ratio* of `pared` for each level of `apply`.

$$ \begin{eqnarray} \frac{P(Y \le 1 | x_1=1)}{P(Y \gt 1 | x_1=1)} / \frac{P(Y \le 1 | x_1=0)}{P(Y \gt 1 | x_1=0)} & = & 1/exp(1.13) = exp(-1.13) \\ \frac{P(Y \le 2 | x_1=1)}{P(Y \gt 2 | x_1=1)} / \frac{P(Y \le 2 | x_1=0)}{P(Y \gt 2 | x_1=0)} & = & 1/exp(1.13) = exp(-1.13) \end{eqnarray} $$

The proportional odds assumption ensures that the *odds ratios* across all $J-1$ categories are the same. In our example, the proportional odds assumption means that the odds ratio (`pared = 1` vs. `pared = 0`) of being unlikely versus somewhat or very likely to apply ($j=1$) is the same as the odds ratio of being unlikely or somewhat likely versus very likely to apply ($j=2$).
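The same point can be checked numerically. The following sketch recomputes the four odds from the rounded estimates in the output (hard-coded here as an assumption, not taken from the fitted object). It shows that the odds differ between $j=1$ and $j=2$ while the odds ratio does not.

```r
# Rounded estimates copied from the polr output (assumed inputs)
zeta <- c(j1 = 0.3768, j2 = 2.4519)  # intercepts for j = 1 and j = 2
eta  <- 1.127                        # coefficient of pared

# Odds of Y <= j at each level of pared
odds_pared1 <- exp(zeta - eta)
odds_pared0 <- exp(zeta)

# Odds ratio (pared = 1 vs. pared = 0) at each j
odds_ratio <- odds_pared1 / odds_pared0

print(rbind(odds_pared0, odds_pared1, odds_ratio))
```

The intercepts cancel in the ratio, which is why the result is $exp(-\eta_1)$ for every $j$.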

## Interpreting the odds ratio

The proportional odds assumption is not simply that the odds are the same but that the *odds ratios* are the same across categories. These odds ratios can be derived by exponentiating the coefficients (in the log-odds metric), but the interpretation is a bit unexpected. Recall that the coefficient $-\eta_{1}$ represents the difference in the log odds of being in category $j$ or below between students whose parents went to college and students whose parents did not:

$$logit (P(Y \le j|x_1=1)) - logit (P(Y \le j|x_1=0)) = - \eta_{1}.$$

Since the exponential function is the inverse of the log, we can simply exponentiate both sides of this equation, and by using the property that $log(b)-log(a) = log(b/a)$,

$$\frac{P(Y \le j |x_1=1)}{P(Y>j|x_1=1)} / \frac{P(Y \le j |x_1=0)}{P(Y>j|x_1=0)} = exp( -\eta_{1}).$$

For simplicity of notation and by the proportional odds assumption, let $\frac{P(Y \le j |x_1=1)}{P(Y>j|x_1=1)} = p_1 / (1-p_1) $ and $\frac{P(Y \le j |x_1=0)}{P(Y>j|x_1=0)} = p_0 / (1-p_0).$ Then the odds ratio is defined as

$$\frac{p_1 / (1-p_1) }{p_0 / (1-p_0)} = exp( -\eta_{1}).$$

However, as we will see in the output, this is *not* what we actually obtain from R!

### R

To obtain the odds ratio in R, simply exponentiate the coefficient or log-odds of `pared`. The following code uses `cbind` to combine the odds ratio with its confidence interval. First store the confidence interval in object `ci`,

    (ci <- confint(m))
        2.5 %    97.5 %
    0.6131222 1.6478130

Then bind the transpose of the `ci` object with `coef(m)` and exponentiate the values,

    exp(cbind(coef(m), t(ci)))
                      2.5 %   97.5 %
    pared 3.087899 1.846187 5.195605

In our example, $exp(-1.127) = 0.324$, which means that students whose parents attended college have 67.6% lower odds of being less likely to apply to college than students whose parents did not. However, this does not correspond to the odds ratio from the output! Let’s see why.

Since $exp(-\eta_{1}) = \frac{1}{exp(\eta_{1})}$,

$$exp(\eta_{1}) = \frac{p_0 / (1-p_0) }{p_1 / (1-p_1)}.$$

From the output, $\hat{\eta}_1=1.127$, which means the odds ratio $exp(\hat{\eta}_1)=3.086$ is actually $\frac{p_0 / (1-p_0) }{p_1 / (1-p_1)}.$ This suggests that students whose parents did *not* go to college have higher odds of being *less* likely to apply.
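The reciprocal relationship between the two odds ratios is easy to confirm with the rounded coefficient. This is a minimal sketch that hard-codes $\hat{\eta}_1 = 1.127$ from the output; the variable names are illustrative.

```r
eta_hat <- 1.127  # coefficient of pared as printed by summary(m)

# Odds ratio of Y <= j for pared = 1 vs. pared = 0
or_model <- exp(-eta_hat)

# Odds ratio obtained by exponentiating the reported coefficient:
# Y <= j for pared = 0 vs. pared = 1
or_reported <- exp(eta_hat)

# Percent change in the odds of Y <= j when pared goes from 0 to 1
pct_change <- (or_model - 1) * 100

print(round(c(or_model = or_model, or_reported = or_reported,
              pct_change = pct_change), 3))
```

Because the two odds ratios multiply to one, reporting either of them conveys the same information. Only the reference group in the comparison changes.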

### Another way to look at the odds ratio

Double negation can be logically confusing. Suppose we wanted to interpret the odds of being *more* likely to apply to college. We can perform a slight manipulation of our original odds ratio:

$$ \begin{eqnarray} exp(-\eta_{1}) & = & \frac{p_1 / (1-p_1)}{p_0/(1-p_0)} \\ & = & \frac{p_1 (1-p_0)}{p_0(1-p_1)} \\ & = & \frac{(1-p_0)/p_0}{(1-p_1)/p_1} \\ & = & \frac{P (Y >j | x_1=0)/P(Y \le j|x_1=0)}{P(Y > j | x_1=1)/P(Y \le j | x_1=1)}. \end{eqnarray} $$

Since $exp(-\eta_{1}) = \frac{1}{exp(\eta_{1})}$,

$$\frac{P (Y >j | x_1=1)/P(Y \le j|x_1=1)}{P(Y > j | x_1=0)/P(Y \le j | x_1=0)} = exp(\eta_{1}).$$

Instead of interpreting the odds of being in the $j$th category or less, we can interpret the odds of being greater than the $j$th category by exponentiating $\eta_1$ itself. In our example, $exp(\hat{\eta}_1) = exp(1.127) = 3.086$ means that students whose parents went to college have 3.086 times the odds of being very likely to apply (vs. somewhat likely or unlikely) compared to students whose parents did not go to college. This result is consistent with our intuition because it removes the double negative. As a general rule, it is easier to interpret the odds ratio of $x_1=1$ vs. $x_1=0$ by simply exponentiating $\eta_1$ itself than to interpret the odds ratio of $x_1=0$ vs. $x_1=1$ by exponentiating $-\eta_1$. However, by doing so, we flip the interpretation of the outcome by placing $P (Y >j)$ in the numerator.

## Verifying both interpretations of the odds ratio using predicted probabilities

To verify that indeed the odds ratio of 3.08 can be interpreted in two ways, let’s derive them from the predicted probabilities in R.

After storing the `polr` object in object `m`, pass this object as well as a dataset with the levels of `pared` into the `predict` function. Specify `type="p"` for predicted probabilities.

    newdat <- data.frame(pared = c(0, 1))
    (phat <- predict(object = m, newdat, type = "p"))
       unlikely somewhat likely very likely
    1 0.5931114       0.3275856  0.07930294
    2 0.3206801       0.4692269  0.21009300

The rows represent the first level ($x_1=0$) and second level ($x_1=1$) of `pared`, and each column represents a category $j=1,2,3$ of the outcome `apply`.

### Interpretation 1

The first interpretation is that, for students whose parents did *not* attend college, the odds of being *unlikely* versus somewhat or very likely (i.e., *less* likely) to apply is 3.08 times that of students whose parents did go to college.

To verify this interpretation, we arbitrarily calculate the odds ratio for the first level of `apply`, which we know by the proportional odds assumption is equivalent to the odds ratio for the second level of `apply`. Since we are looking at `pared = 0` vs. `pared = 1` for $P(Y \le 1 | x_1=x)/P(Y > 1 | x_1=x)$, the respective probabilities are $p_0=.593$ and $p_1=.321$. Then

$$\frac{p_0 / (1-p_0) }{p_1 / (1-p_1)} = \frac{0.593 / (1-0.593) }{0.321 / (1-0.321)} =\frac{1.457}{0.473} =3.08.$$

### Interpretation 2

The second interpretation is that, for students whose parents *did* attend college, the odds of being *very* or *somewhat* likely versus unlikely (i.e., *more* likely) to apply is 3.08 times that of students whose parents did not go to college.

Here we are looking at `pared = 1` vs. `pared = 0` for $P(Y > 1 | x_1=x)/P(Y \le 1 | x_1=x)$. Then for the first level of `apply`, $P(Y>1 | x_1 = 1) =0.469+0.210 = 0.679$ and $P(Y \le 1 | x_1 = 1) = 0.321$. Similarly, $P(Y>1 | x_1 = 0) =0.328+0.079= 0.407$ and $P(Y \le 1 | x_1 = 0) = 0.593.$ Taking the ratio of the two odds gives us the odds ratio,

$$ \frac{P(Y>1 | x_1 = 1) /P(Y \le 1 | x_1=1)}{P(Y>1 | x_1 = 0) /P(Y \le 1 | x_1=0)} = \frac{0.679/0.321}{0.407/0.593} = \frac{2.115}{0.686}=3.08.$$

The odds ratios for both interpretations match the output of R (3.088) up to rounding of the probabilities.
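Both hand calculations can be repeated with the unrounded predicted probabilities, and for both $j=1$ and $j=2$. The sketch below types in the `predict` output shown earlier as a matrix (so it runs without the fitted model). The row labels `pared0` and `pared1` and the other object names are chosen here for illustration.

```r
# Predicted probabilities copied from the predict() output above
phat <- matrix(c(0.5931114, 0.3275856, 0.07930294,
                 0.3206801, 0.4692269, 0.21009300),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("pared0", "pared1"),
                               c("unlikely", "somewhat likely", "very likely")))

# Cumulative probabilities P(Y <= j) for j = 1, 2 (one row per level of pared)
cum_le <- t(apply(phat, 1, cumsum))[, 1:2]

odds_le <- cum_le / (1 - cum_le)  # odds of Y <= j
odds_gt <- 1 / odds_le            # odds of Y > j

# Interpretation 1: pared = 0 vs. pared = 1, odds of being less likely to apply
or_interp1 <- odds_le["pared0", ] / odds_le["pared1", ]

# Interpretation 2: pared = 1 vs. pared = 0, odds of being more likely to apply
or_interp2 <- odds_gt["pared1", ] / odds_gt["pared0", ]

print(round(rbind(or_interp1, or_interp2), 3))
```

The two rows are identical by construction, and the two columns agree because the model imposes proportional odds.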

## Summary

In general, to obtain the odds ratio it is easier to exponentiate the coefficient itself rather than its negative because this is what is output directly from R (`polr`). The researcher must then decide which of the two interpretations to use:

- For students whose parents did *not* attend college, the odds of being *less* likely to apply is 3.08 times that of students whose parents did go to college.
- For students whose parents *did* attend college, the odds of being *more* likely to apply is 3.08 times that of students whose parents did not go to college.

The second interpretation is easier because it avoids double negation.

### References

Bilder, C. R., & Loughin, T. M. (2014). *Analysis of categorical data with R*. Chapman and Hall/CRC.