This page shows an example of a multinomial logistic regression analysis with
footnotes explaining the output. The data were collected on 200 high school
students and are scores on various tests, including a video game and a
puzzle. The outcome measure in this analysis is the preferred flavor of ice
cream (vanilla, chocolate or strawberry), and we are going to see what
relationships exist with video game scores (**video**), puzzle scores (**puzzle**)
and gender (**female**). Our response variable, **ice_cream**, is going to
be treated as categorical under the assumption that the levels of **ice_cream**
have *no* natural ordering, and we are going to allow Stata to choose the
referent group. By default, Stata chooses the most frequently occurring
group to be the referent group; in our example, this will be vanilla. The first half of this page interprets the
coefficients in terms of multinomial log-odds (logits). These will be close
to but not equal to the log-odds achieved in a logistic regression with two levels
of the outcome variable. The second half interprets the coefficients in
terms of relative risk ratios.

    use http://www.ats.ucla.edu/stat/stata/output/mlogit, clear

Before running the regression, tabulating the frequency of each ice cream flavor in the data can inform the selection of a reference group.

    tab ice_cream

       favorite |
      flavor of |
      ice cream |      Freq.     Percent        Cum.
    ------------+-----------------------------------
      chocolate |         47       23.50       23.50
        vanilla |         95       47.50       71.00
     strawberry |         58       29.00      100.00
    ------------+-----------------------------------
          Total |        200      100.00

Vanilla is the most frequently occurring ice cream flavor and will be the reference group in this example.

    mlogit ice_cream video puzzle female

    Iteration 0:   log likelihood = -210.58254
    Iteration 1:   log likelihood = -194.75041
    Iteration 2:   log likelihood = -194.03782
    Iteration 3:   log likelihood = -194.03485
    Iteration 4:   log likelihood = -194.03485

    Multinomial logistic regression                   Number of obs   =        200
                                                      LR chi2(6)      =      33.10
                                                      Prob > chi2     =     0.0000
    Log likelihood = -194.03485                       Pseudo R2       =     0.0786

    ------------------------------------------------------------------------------
       ice_cream |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    chocolate    |
           video |  -.0235647   .0209747    -1.12   0.261    -.0646744     .017545
          puzzle |  -.0389243   .0195165    -1.99   0.046    -.0771759   -.0006726
          female |   .8166202   .3909813     2.09   0.037      .050311    1.582929
           _cons |   1.912256   1.127256     1.70   0.090    -.2971258    4.121638
    -------------+----------------------------------------------------------------
    strawberry   |
           video |    .022922   .0208718     1.10   0.272    -.0179861    .0638301
          puzzle |   .0430036   .0198894     2.16   0.031     .0040211     .081986
          female |   -.032862   .3500153    -0.09   0.925    -.7188793    .6531553
           _cons |  -4.057323   1.222939    -3.32   0.001     -6.45424   -1.660407
    ------------------------------------------------------------------------------
    (ice_cream==vanilla is the base outcome)

## Iteration Log^{a}

    Iteration 0:   log likelihood = -210.58254
    Iteration 1:   log likelihood = -194.75041
    Iteration 2:   log likelihood = -194.03782
    Iteration 3:   log likelihood = -194.03485
    Iteration 4:   log likelihood = -194.03485

a. **Iteration Log** – This is a listing of the log likelihoods at each
iteration. Remember that multinomial logistic regression, like binary and
ordered logistic regression, uses maximum likelihood estimation, which is an
iterative procedure. The first iteration (called iteration 0) is the log
likelihood of the "null" or "empty" model; that is, a model with no predictors.
At the next iteration, the predictor(s) are included in the model. At each
iteration, the log likelihood increases because the goal is to maximize the log
likelihood. When the difference between successive iterations is very small, the
model is said to have "converged", the iterating stops, and the results are
displayed. For more information on this process for binary outcomes, see
*Regression Models for Categorical and Limited Dependent Variables* by J.
Scott Long (pages 52-61).
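
The shrinking improvements described above can be checked by differencing the log likelihoods in the iteration log. The following is a minimal sketch using Stata's `display`, with the values copied by hand from the output. The scalar names are our own, and Stata's actual stopping rule depends on its own tolerance settings, which are not shown on this page.

```stata
* Log likelihoods copied from the iteration log (scalar names are illustrative)
scalar ll_it0 = -210.58254
scalar ll_it1 = -194.75041
scalar ll_it2 = -194.03782
scalar ll_it3 = -194.03485
scalar ll_it4 = -194.03485

* Improvement in the log likelihood at each step
display "Iteration 0 -> 1: " %9.5f (ll_it1 - ll_it0)
display "Iteration 1 -> 2: " %9.5f (ll_it2 - ll_it1)
display "Iteration 2 -> 3: " %9.5f (ll_it3 - ll_it2)
display "Iteration 3 -> 4: " %9.5f (ll_it4 - ll_it3)
```

The gains fall from roughly 15.8 to 0.7 to 0.003, and the last step shows no change at the displayed precision, which is the point at which the iterations stop.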

## Model Summary

    Multinomial logistic regression                   Number of obs^{c} =        200
                                                      LR chi2(6)^{d}    =      33.10
                                                      Prob > chi2^{e}   =     0.0000
    Log likelihood = -194.03485^{b}                   Pseudo R2^{f}     =     0.0786

b. **Log Likelihood** – This is the log likelihood of the fitted model. It
is used in the Likelihood Ratio Chi-Square test of whether all predictors’
regression coefficients in the model are simultaneously zero and in tests of
nested models.

c. **Number of obs** – This is the number of observations used in the
multinomial logistic regression. It may be less than the number of cases in the
dataset if there are missing values for some variables in the equation. By
default, Stata does a listwise deletion of incomplete cases.

d. **LR chi2(6)** – This is the Likelihood Ratio (LR) Chi-Square test that,
across both equations (chocolate relative to vanilla and strawberry relative to
vanilla), at least one of the predictors’ regression coefficients is not
equal to zero. The number in the parentheses indicates the degrees of freedom of
the Chi-Square distribution used to test the LR Chi-Square statistic and is
defined by the number of models estimated (2) times the number of predictors in
the model (3). The LR Chi-Square statistic can be calculated by -2*(L(null
model) - L(fitted model)) = -2*((-210.583) - (-194.035)) = 33.096, where L(null
model) is the log likelihood with just the response variable in the model
(Iteration 0) and L(fitted model) is the log likelihood from the final iteration
(assuming the model converged) with all the parameters.
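
The calculation above can be reproduced by hand in Stata. This is a small sketch that types in the two log likelihoods from the output; the scalar names are illustrative and not part of the original page.

```stata
* Log likelihoods from Iteration 0 (null model) and the final iteration (fitted model)
scalar ll_null   = -210.58254
scalar ll_fitted = -194.03485

* LR chi-square statistic: -2 times the difference in log likelihoods
scalar lr_chi2 = -2*(ll_null - ll_fitted)

* Degrees of freedom: 2 equations times 3 predictors
scalar lr_df = 2*3

display "LR chi2(" lr_df ") = " %6.2f lr_chi2
```

If the model has just been fit, the same quantities should also be available as stored results (for example `e(ll_0)`, `e(ll)` and `e(chi2)`), so typing the numbers in is only needed when working from printed output.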

e. **Prob > chi2** – This is the probability of getting a LR test
statistic as extreme as, or more so, than the observed statistic under the null
hypothesis; the null hypothesis is that all of the regression coefficients
across both models are simultaneously equal to zero. In other words, this is the
probability of obtaining this chi-square statistic (33.10) or one more extreme
if there is in fact no effect of the predictor variables. This p-value is
compared to a specified alpha level, our willingness to accept a type I error,
which is typically set at 0.05 or 0.01. The small p-value from the LR test,
<0.0001, would lead us to conclude that at least one of the regression
coefficients in the model is not equal to zero. The parameter of the chi-square
distribution used to test the null hypothesis is defined by the degrees of
freedom in the prior line, **chi2(6)**.
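
The p-value can be recovered from the test statistic and its degrees of freedom with Stata's `chi2tail()` function, which returns the upper-tail probability of a chi-square distribution. The sketch below uses the rounded statistic from the output (33.10); the scalar name is our own.

```stata
* Upper-tail probability of a chi-square(6) distribution at the observed statistic
scalar p_lr = chi2tail(6, 33.10)
display "Prob > chi2 = " %12.8f p_lr
```

The result is on the order of 0.00001, which Stata rounds to 0.0000 in the model summary.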

f. **Pseudo R2** – This is McFadden’s pseudo R-squared. Logistic
regression does not have an equivalent to the R-squared that is found in OLS
regression; however, many people have tried to come up with one. There are a
wide variety of pseudo-R-square statistics. Because this statistic does not
mean what R-square means in OLS regression (the proportion of variance of the
response variable explained by the predictors), we suggest interpreting this
statistic with great caution.
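
McFadden's pseudo R-squared is one minus the ratio of the fitted-model log likelihood to the null-model log likelihood. As a quick check, the following sketch reproduces the reported value from the two log likelihoods in the output; the scalar names are illustrative.

```stata
* Log likelihoods from the iteration log
scalar ll_null   = -210.58254
scalar ll_fitted = -194.03485

* McFadden's pseudo R-squared: 1 - ll(fitted)/ll(null)
scalar pseudo_r2 = 1 - ll_fitted/ll_null
display "Pseudo R2 = " %6.4f pseudo_r2
```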

## Parameter Estimates

    ------------------------------------------------------------------------------
    ice_cream^{g} |  Coef.^{h}  Std. Err.^{j}  z^{k}  P>|z|^{k}  [95% Conf. Interval]^{l}
    -------------+----------------------------------------------------------------
    chocolate    |
           video |  -.0235647   .0209747    -1.12   0.261    -.0646744     .017545
          puzzle |  -.0389243   .0195165    -1.99   0.046    -.0771759   -.0006726
          female |   .8166202   .3909813     2.09   0.037      .050311    1.582929
           _cons |   1.912256   1.127256     1.70   0.090    -.2971258    4.121638
    -------------+----------------------------------------------------------------
    strawberry   |
           video |    .022922   .0208718     1.10   0.272    -.0179861    .0638301
          puzzle |   .0430036   .0198894     2.16   0.031     .0040211     .081986
          female |   -.032862   .3500153    -0.09   0.925    -.7188793    .6531553
           _cons |  -4.057323   1.222939    -3.32   0.001     -6.45424   -1.660407
    ------------------------------------------------------------------------------
    (ice_cream==vanilla is the base outcome)^{i}

g. **ice_cream** – This is the response variable in the multinomial
logistic regression. Underneath **ice_cream** are two replicates of the
predictor variables, representing the two models that are estimated: chocolate
relative to vanilla and strawberry relative to vanilla.

h and i. **Coef.** and **referent group** – These are the estimated
multinomial logistic regression coefficients and the referent level,
respectively, for the model. An important feature of the multinomial logit model
is that it estimates *k-1* models, where *k* is the number of levels
of the outcome variable. In this instance, Stata, by default, set vanilla as
the referent group, and therefore estimated a model for chocolate relative to
vanilla and a model for strawberry relative to vanilla. Since the parameter
estimates are relative to the referent group, the standard interpretation of the
multinomial logit is that for a unit change in the predictor variable, the logit
of outcome *m* relative to the referent group is expected to change by its
respective parameter estimate (which is in log-odds units) given the other variables
in the model are held constant.
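
For the model on this page, the two estimated equations can be written out explicitly. The coefficient symbols *b* with subscripts are notation introduced here for illustration; they are not labels that appear in the Stata output.

$$
\ln\left(\frac{P(\text{ice\_cream}=\text{chocolate})}{P(\text{ice\_cream}=\text{vanilla})}\right) = b_{10} + b_{11}\,\text{video} + b_{12}\,\text{puzzle} + b_{13}\,\text{female}
$$

$$
\ln\left(\frac{P(\text{ice\_cream}=\text{strawberry})}{P(\text{ice\_cream}=\text{vanilla})}\right) = b_{20} + b_{21}\,\text{video} + b_{22}\,\text{puzzle} + b_{23}\,\text{female}
$$

Each *b* corresponds to one entry in the **Coef.** column: the first equation to the chocolate block and the second to the strawberry block, with the intercepts given by the **_cons** rows.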

**chocolate relative to vanilla**

**video** – This is the multinomial logit estimate for a one unit
increase in **video** score for chocolate relative to vanilla, given the
other variables in the model are held constant. If a subject were to increase
his **video** score by one point, the multinomial log-odds for preferring
chocolate to vanilla would be expected to decrease by 0.024 unit while holding
all other variables in the model constant.

**puzzle** – This is the multinomial logit estimate for a one unit
increase in **puzzle** score for chocolate relative to vanilla, given the
other variables in the model are held constant. If a subject were to increase
his **puzzle** score by one point, the multinomial log-odds for preferring
chocolate to vanilla would be expected to decrease by 0.039 unit while holding
all other variables in the model constant.

**female** – This is the multinomial logit estimate comparing females
to males for chocolate relative to vanilla, given the other variables in the
model are held constant. The multinomial logit for females relative to males is
0.817 unit higher for preferring chocolate to vanilla, given all other predictor
variables in the model are held constant. In other words, females are more
likely than males to prefer chocolate to vanilla.

**_cons** – This is the multinomial logit estimate for chocolate
relative to vanilla when the predictor variables in the model are evaluated at
zero. For males (the variable **female** evaluated at zero) with zero
**video** and **puzzle** scores, the logit for preferring chocolate to
vanilla is 1.912. Note that evaluating **video** and **puzzle** at zero is
out of the range of plausible scores. If the scores were mean-centered, the
intercept would have a natural interpretation: log odds of preferring chocolate
to vanilla for a male with average **video** and **puzzle** scores.

**strawberry relative to vanilla**

**video** – This is the multinomial logit estimate for a one unit
increase in **video** score for strawberry relative to vanilla, given the
other variables in the model are held constant. If a subject were to increase
his **video** score by one point, the multinomial log-odds for preferring
strawberry to vanilla would be expected to increase by 0.023 unit while holding
all other variables in the model constant.

**puzzle** – This is the multinomial logit estimate for a one unit
increase in **puzzle** score for strawberry relative to vanilla, given the
other variables in the model are held constant. If a subject were to increase
his **puzzle** score by one point, the multinomial log-odds for preferring
strawberry to vanilla would be expected to increase by 0.043 unit while holding
all other variables in the model constant.

**female** – This is the multinomial logit estimate comparing females
to males for strawberry relative to vanilla, given the other variables in the
model are held constant. The multinomial logit for females relative to males is
0.033 unit lower for preferring strawberry to vanilla, given all other predictor
variables in the model are held constant. In other words, males are more likely
than females to prefer strawberry ice cream to vanilla ice cream.

**_cons** – This is the multinomial logit estimate for strawberry
relative to vanilla when the predictor variables in the model are evaluated at
zero. For males (the variable **female** evaluated at zero) with zero
**video** and **puzzle** scores, the logit for preferring strawberry to
vanilla is -4.057.
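
Taken together, the two sets of coefficients determine a predicted probability for each flavor, because the logit for the referent group (vanilla) is fixed at zero. The sketch below works this through for a hypothetical profile that is not taken from the source: a female with **video** and **puzzle** scores of 50, assuming such scores fall within the observed range. The scalar names are our own and are prefixed so they do not clash with variables in the dataset.

```stata
* Hypothetical covariate profile (an assumption, not from the data)
scalar s_video  = 50
scalar s_puzzle = 50
scalar s_female = 1

* Log-odds relative to vanilla, using the coefficients from the table above
scalar xb_choc  =  1.912256 - .0235647*s_video - .0389243*s_puzzle + .8166202*s_female
scalar xb_straw = -4.057323 + .022922*s_video  + .0430036*s_puzzle - .032862*s_female

* Vanilla is the base outcome, so its linear predictor is 0 and exp(0) = 1
scalar denom   = 1 + exp(xb_choc) + exp(xb_straw)
scalar p_van   = 1/denom
scalar p_choc  = exp(xb_choc)/denom
scalar p_straw = exp(xb_straw)/denom

display "P(vanilla)    = " %6.4f p_van
display "P(chocolate)  = " %6.4f p_choc
display "P(strawberry) = " %6.4f p_straw
```

The three probabilities sum to one by construction. Changing `s_female` to 0 or moving the scores shows how the coefficients translate into shifts between flavors.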

j. **Std. Err.** – These are the standard errors of the individual
regression coefficients for the two respective models estimated. They are used
in both the calculation of the **z** test statistic, superscript k, and the
confidence interval of the regression coefficient, superscript l.

k. **z** and **P>|z|** – The test statistic **z** is the ratio of
the **Coef.** to the **Std. Err.** of the respective predictor, and the
p-value **P>|z|** is the probability the **z** test statistic (or a more
extreme test statistic) would be observed under the null hypothesis. For a
given alpha level, **z** and **P>|z|** determine whether or not the null
hypothesis that a particular predictor’s regression coefficient is zero, given
that the rest of the predictors are in the model, can be rejected. If **P>|z|**
is less than alpha, then the null hypothesis can be rejected and the
parameter estimate is considered significant at that alpha level. The **z**
value follows a standard normal distribution which is used to test against a
two-sided alternative hypothesis that the **Coef.** is not equal to zero. In
multinomial logistic regression, the interpretation of a parameter estimate’s
significance is limited to the model in which the parameter estimate was
calculated. For example, the significance of a parameter estimate in the
chocolate relative to vanilla model cannot be assumed to hold in the strawberry
relative to vanilla model.
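
As a worked example of the ratio and the two-sided p-value, the sketch below recomputes **z** and **P>|z|** for **puzzle** in the chocolate relative to vanilla model. It uses Stata's `normal()` function (the standard normal cumulative distribution) and values typed in from the table; the scalar names are illustrative.

```stata
* Coefficient and standard error for puzzle (chocolate relative to vanilla)
scalar b_puzzle  = -.0389243
scalar se_puzzle =  .0195165

* z is the coefficient divided by its standard error
scalar z_puzzle = b_puzzle/se_puzzle

* Two-sided p-value: twice the lower-tail area beyond -|z|
scalar p_puzzle = 2*normal(-abs(z_puzzle))

display "z     = " %6.2f z_puzzle
display "P>|z| = " %6.3f p_puzzle
```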

**chocolate relative to vanilla**

For chocolate relative to vanilla, the **z** test statistic for the
predictor **video** (-0.024/0.021) is -1.12 with an associated p-value of
0.261. If we set our alpha level to 0.05, we would fail to reject the null
hypothesis and conclude that for chocolate relative to vanilla, the regression
coefficient for **video** has not been found to be statistically different
from zero given **puzzle** and **female** are in the model.

For chocolate relative to vanilla, the **z** test statistic for
the predictor **puzzle** (-0.039/0.020) is -1.99 with an associated p-value
of 0.046. If we again set our alpha level to 0.05, we would reject the null
hypothesis and conclude that the regression coefficient for **puzzle** has
been found to be statistically different from zero for chocolate relative
to vanilla given that **video** and **female** are in the model.

For chocolate relative to vanilla, the **z** test statistic for
the predictor **female** (0.817/0.391) is 2.09 with an associated p-value of
0.037. If we again set our alpha level to 0.05, we would reject the null
hypothesis and conclude that the difference between males and females has been
found to be statistically significant for chocolate relative to vanilla given that
**video** and **puzzle** are in the model.

For chocolate relative to vanilla, the **z** test statistic for
the intercept, **_cons** (1.912/1.127) is 1.70 with an associated p-value of
0.090. With an alpha level of 0.05, we would fail to reject the null hypothesis
and conclude that a) the multinomial logit for males (the variable **female**
evaluated at zero) with zero **video** and **puzzle** scores in
chocolate relative to vanilla is not found to be statistically different
from zero; or b) for males with zero **video** and **puzzle** scores, you
are statistically uncertain whether they are more likely to be classified as
preferring chocolate or vanilla. We can make the second interpretation
when we view the **_cons** as a specific covariate profile (males with zero
**video** and **puzzle** scores). Based on the direction and significance
of the coefficient, the **_cons** indicates whether the profile would have a
greater propensity to be classified in one level of the outcome variable than
the other level.

**strawberry relative to vanilla**

For strawberry relative to vanilla, the **z** test statistic
for the predictor **video** (0.023/0.021) is 1.10 with an associated p-value
of 0.272. If we set our alpha level to 0.05, we would fail to reject the null
hypothesis and conclude that for strawberry relative to vanilla, the regression
coefficient for **video** has not been found to be statistically different
from zero given **puzzle** and **female** are in the model.

For strawberry relative to vanilla, the **z** test statistic for the
predictor **puzzle** (0.043/0.020) is 2.16 with an associated p-value of
0.031. If we again set our alpha level to 0.05, we would reject the null
hypothesis and conclude that the regression coefficient for **puzzle** has
been found to be statistically different from zero for strawberry
relative to vanilla given that **video** and **female** are in the model.

For strawberry relative to vanilla, the **z** test statistic for the
predictor **female** (-0.033/0.350) is -0.09 with an associated p-value of
0.925. If we again set our alpha level to 0.05, we would fail to reject the null
hypothesis and conclude that for strawberry relative to vanilla, the regression
coefficient for **female** has not been found to be statistically different
from zero given **puzzle** and **video** are in the model.

For strawberry relative to vanilla, the **z** test statistic for the
intercept, **_cons** (-4.057/1.223) is -3.32 with an associated p-value of
0.001. With an alpha level of 0.05, we would reject the null hypothesis and
conclude that a) the multinomial logit for males (the variable **female**
evaluated at zero) with zero **video** and **puzzle** scores in
strawberry relative to vanilla is statistically different from zero; or b) for
males with zero **video** and **puzzle** scores, there is a statistically
significant difference between the likelihood of being classified as preferring
strawberry or preferring vanilla. Such a male would be more likely to be
classified as preferring vanilla to strawberry. We can make the second
interpretation when we view the **_cons** as a specific covariate profile
(males with zero **video** and **puzzle** scores). Based on the direction
and significance of the coefficient, the **_cons** indicates whether the
profile would have a greater propensity to be classified in one level of the
outcome variable than the other level.

l. **[95% Conf. Interval]** – This is the Confidence Interval (CI) for an
individual multinomial logit regression coefficient given the other predictors
are in the model for outcome *m* relative to the referent group. For a
given predictor with a level of 95% confidence, we’d say that we are 95%
confident that the "true" population multinomial logit regression coefficient
lies between the lower and upper limit of the interval for outcome *m*
relative to the referent group. It is calculated as the **Coef.** ± (z_{α/2})*(**Std. Err.**),
where z_{α/2} is a critical value on the standard normal distribution.
The CI leads to the same conclusion as the **z** test: if the CI includes zero,
we’d fail to reject the null hypothesis that a particular regression coefficient
is zero given the other predictors are in the model. An advantage of a CI is
that it is illustrative; it provides a range where the "true" parameter may
lie.
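
To make the formula concrete, the sketch below rebuilds the 95% interval for **female** in the chocolate relative to vanilla model. It uses Stata's `invnormal()` function to obtain the critical value (about 1.96 for a 95% interval) and values typed in from the table; the scalar names are our own.

```stata
* Coefficient and standard error for female (chocolate relative to vanilla)
scalar b_female  = .8166202
scalar se_female = .3909813

* Critical value z_{alpha/2} for a 95% interval
scalar z_crit = invnormal(0.975)

* Lower and upper limits: Coef. -/+ z_crit * Std. Err.
scalar ci_lo = b_female - z_crit*se_female
scalar ci_hi = b_female + z_crit*se_female

display "95% CI: [" %9.6f ci_lo ", " %9.6f ci_hi "]"
```

The limits agree with the table (.050311 and 1.582929), and because the interval excludes zero, it matches the conclusion of the **z** test at the 0.05 level.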

## Relative Risk Ratio Interpretation

The following is the interpretation of the multinomial logistic regression in
terms of relative risk ratios and can be obtained by **mlogit, rrr** after
running the multinomial logit model or by specifying the **rrr** option when
the full model is specified. This part of the interpretation applies to the
output below.

    mlogit ice_cream video puzzle female, rrr

    Iteration 0:   log likelihood = -210.58254
    Iteration 1:   log likelihood = -194.75041
    Iteration 2:   log likelihood = -194.03782
    Iteration 3:   log likelihood = -194.03485
    Iteration 4:   log likelihood = -194.03485

    Multinomial logistic regression                   Number of obs   =        200
                                                      LR chi2(6)      =      33.10
                                                      Prob > chi2     =     0.0000
    Log likelihood = -194.03485                       Pseudo R2       =     0.0786

    ------------------------------------------------------------------------------
       ice_cream |   RRR^{a}   Std. Err.      z    P>|z|     [95% Conf. Interval]^{b}
    -------------+----------------------------------------------------------------
    chocolate    |
           video |   .9767108   .0204862    -1.12   0.261     .9373726      1.0177
          puzzle |   .9618236   .0187714    -1.99   0.046      .925727    .9993276
          female |   2.262839   .8847276     2.09   0.037     1.051598    4.869199
    -------------+----------------------------------------------------------------
    strawberry   |
           video |   1.023187   .0213558     1.10   0.272     .9821747    1.065911
          puzzle |   1.043942   .0207633     2.16   0.031     1.004029    1.085441
          female |   .9676721      .3387    -0.09   0.925     .4872981    1.921595
    ------------------------------------------------------------------------------
    (ice_cream==vanilla is the base outcome)

a. **Relative Risk Ratio** – These are the relative risk ratios for the
multinomial logit model shown earlier. They can be obtained by exponentiating
the multinomial logit coefficients, e^{coef}, or by specifying
the **rrr** option when the **mlogit**
command is issued. Recall that the multinomial logit model estimates k-1 models,
each of which compares one outcome level to the referent group. The RRR of
a coefficient indicates how the risk of the outcome falling in the comparison
group compared to the risk of the outcome falling in the referent group changes
with the variable in question. An RRR > 1 indicates that the risk of the
outcome falling in the comparison group relative to the risk of the outcome
falling in the referent group increases as the variable increases. In
other words, the comparison outcome becomes more likely relative to the
referent outcome. An RRR < 1
indicates that the risk of the outcome falling in the comparison group relative
to the risk of the outcome falling in the referent group decreases as the
variable increases. See the interpretations of the relative risk ratios below
for examples. In general, if the RRR < 1, an increase in the variable shifts
the outcome toward the referent group.
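
The link between the two tables can be verified by exponentiating entries from the coefficient table. The sketch below does this for **female** in the chocolate relative to vanilla model: the RRR is the exponentiated coefficient, the RRR confidence limits are the exponentiated coefficient limits, and the reported standard error matches the RRR multiplied by the coefficient's standard error. The scalar names are illustrative.

```stata
* Coefficient-scale results for female (chocolate relative to vanilla)
scalar b_female  = .8166202
scalar se_female = .3909813
scalar lo_female = .050311
scalar hi_female = 1.582929

* Relative risk ratio and its confidence limits
scalar rrr_female    = exp(b_female)
scalar rrr_lo_female = exp(lo_female)
scalar rrr_hi_female = exp(hi_female)

* Standard error on the RRR scale: RRR times the coefficient's standard error
scalar rrr_se_female = rrr_female*se_female

display "RRR       = " %9.6f rrr_female
display "Std. Err. = " %9.6f rrr_se_female
display "95% CI    = [" %9.6f rrr_lo_female ", " %9.6f rrr_hi_female "]"
```

Note that the **z** and **P>|z|** columns are identical in the two tables, since the test is still of the coefficient being zero (equivalently, the RRR being one).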

**chocolate relative to vanilla**

**video** – This is the relative risk ratio for a one unit increase in
**video** score for preferring chocolate to vanilla, given that the other
variables in the model are held constant. If a subject were to increase her
**video** score by one unit, the relative risk for preferring chocolate
to vanilla would be expected to decrease by a factor of 0.977 given the other
variables in the model are held constant. So, given a one unit increase in
**video**, the relative risk of being in the chocolate group rather than the
vanilla group would be multiplied by 0.977 (a decrease of about 2.3%) when the
other variables in the model are held constant.
More generally, we can say that if a subject were to increase her **video**
score, we would expect her to be more likely to prefer vanilla ice cream over
chocolate ice cream.
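
Because the RRR is multiplicative, it can be restated as a percent change, and the effect of a larger change in the predictor is the RRR raised to that power. The sketch below illustrates both for **video** in the chocolate relative to vanilla model; the 10-point increase is a hypothetical example, and the scalar names are our own.

```stata
* RRR for a one unit increase in video (chocolate relative to vanilla)
scalar rrr_video = exp(-.0235647)

* Percent change in the relative risk for a one unit increase
scalar pct_video = 100*(rrr_video - 1)

* Multiplier for a hypothetical 10 unit increase: exp(10*coef) = RRR^10
scalar rrr_video_10 = exp(10*(-.0235647))

display "RRR (1 unit)    = " %6.4f rrr_video
display "Percent change  = " %6.2f pct_video
display "RRR (10 units)  = " %6.4f rrr_video_10
```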

**puzzle** – This is the relative risk ratio for a one unit increase
in **puzzle** score for preferring chocolate to vanilla, given that the other
variables in the model are held constant. If a subject were to increase her
**puzzle** score by one unit, the relative risk for preferring chocolate
to vanilla would be expected to decrease by a factor of 0.962 given the other
variables in the model are held constant. More generally, we can say that if two
subjects have identical **video** scores and are both female (or both male),
the subject with the higher **puzzle** score is more likely to prefer vanilla
ice cream over chocolate ice cream than the subject with the lower **puzzle**
score.

**female** – This is the relative risk ratio comparing females to
males for preferring chocolate to vanilla, given that the other variables in the
model are held constant. For females relative to males, the relative risk for
preferring chocolate relative to vanilla would be expected to increase by a
factor of 2.263 given the other variables in the model are held constant. In
other words, females are more likely than males to prefer chocolate ice cream
over vanilla ice cream.

**strawberry relative to vanilla**

**video** – This is the relative risk ratio for a one unit increase in
**video** score for preferring strawberry to vanilla, given that the other
variables in the model are held constant. If a subject were to increase her
**video** score by one unit, the relative risk for strawberry relative to
vanilla would be expected to increase by a factor of 1.023 given the other
variables in the model are held constant. More generally, we can say that if a
subject were to increase her **video** score, we would expect her to be more likely
to prefer strawberry ice cream over vanilla ice cream.

**puzzle** – This is the relative risk ratio for a one unit increase
in **puzzle** score for preferring strawberry to vanilla, given that the
other variables in the model are held constant. If a subject were to increase
her **puzzle** score by one unit, the relative risk for strawberry
relative to vanilla would be expected to increase by a factor of 1.043 given the
other variables in the model are held constant. More generally, we can say that
if two subjects have identical **video** scores and are both female (or both
male), the subject with the higher **puzzle** score is more likely to prefer
strawberry ice cream to vanilla ice cream than the subject with the lower
**puzzle** score.

**female** – This is the relative risk ratio comparing females to
males for strawberry relative to vanilla, given that the other variables
in the model are held constant. For females relative to males, the relative risk
for preferring strawberry to vanilla would be expected to decrease by a factor
of 0.968 given the other variables in the model are held constant. In other
words, females are less likely than males to prefer strawberry ice cream to
vanilla ice cream.

b. **[95% Conf. Interval]** – This is the CI for the relative risk ratio
given the other predictors are in the model. For a given predictor with a level
of 95% confidence, we’d say that we are 95% confident that the "true" population
relative risk ratio comparing outcome *m* to the referent group lies
between the lower and upper limit of the interval. An advantage of a CI is that
it is illustrative; it provides a range where the "true" relative risk ratio may
lie.