Multinomial Logistic Regression | Stata Annotated Output

This page shows an example of a multinomial logistic regression analysis with footnotes explaining the output. The data were collected on 200 high school students and are scores on various tests, including science, math, reading and social studies. The outcome measure in this analysis is socio-economic status (ses), with levels low, middle and high, and we are going to see what relationships exist with science test scores (science), social studies test scores (socst) and gender (female). Our response variable, ses, is going to be treated as categorical under the assumption that the levels of ses have no natural ordering, and we are going to allow Stata to choose the referent group, middle ses. The first half of this page interprets the coefficients in terms of multinomial log-odds (logits) and the second half interprets the coefficients in terms of relative risk ratios.

use https://stats.idre.ucla.edu/stat/data/hsb2, clear

mlogit ses science socst female

Iteration 0:   log likelihood = -210.58254
Iteration 1:   log likelihood = -194.75041
Iteration 2:   log likelihood = -194.03782
Iteration 3:   log likelihood = -194.03485
Iteration 4:   log likelihood = -194.03485

Multinomial logistic regression                   Number of obs   =        200
                                                  LR chi2(6)      =      33.10
                                                  Prob > chi2     =     0.0000
Log likelihood = -194.03485                       Pseudo R2       =     0.0786

------------------------------------------------------------------------------
         ses |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
low          |
     science |  -.0235647   .0209747    -1.12   0.261    -.0646744     .017545
       socst |  -.0389243   .0195165    -1.99   0.046    -.0771759   -.0006726
      female |   .8166202   .3909813     2.09   0.037      .050311    1.582929
       _cons |   1.912256   1.127256     1.70   0.090    -.2971258    4.121638
-------------+----------------------------------------------------------------
high         |
     science |    .022922   .0208718     1.10   0.272    -.0179861    .0638301
       socst |   .0430036   .0198894     2.16   0.031     .0040211     .081986
      female |   -.032862   .3500153    -0.09   0.925    -.7188793    .6531553
       _cons |  -4.057323   1.222939    -3.32   0.001     -6.45424   -1.660407
------------------------------------------------------------------------------
(ses==middle is the base outcome)

Iteration Log^a

Iteration 0:   log likelihood = -210.58254
Iteration 1:   log likelihood = -194.75041
Iteration 2:   log likelihood = -194.03782
Iteration 3:   log likelihood = -194.03485
Iteration 4:   log likelihood = -194.03485

a. This is a listing of the log likelihoods at each iteration. Remember that multinomial logistic regression, like binary and ordered logistic regression, uses maximum likelihood estimation, which is an iterative procedure. The first iteration (called iteration 0) is the log likelihood of the "null" or "empty" model; that is, a model with no predictors. At the next iteration, the predictor(s) are included in the model. At each iteration, the log likelihood increases (here from -210.58 to -194.03) because the goal is to maximize the log likelihood. When the difference between successive iterations is very small, the model is said to have "converged", the iterating stops, and the results are displayed. For more information on this process for binary outcomes, see Regression Models for Categorical and Limited Dependent Variables by J. Scott Long (pages 52-61).

Model Summary

Multinomial logistic regression                   Number of obs^c   =        200
                                                  LR chi2(6)^d      =      33.10
                                                  Prob > chi2^e     =     0.0000
Log likelihood = -194.03485^b                      Pseudo R2^f       =     0.0786

b. Log Likelihood – This is the log likelihood of the fitted model. It is used in the Likelihood Ratio Chi-Square test of whether all predictors’ regression coefficients in the model are simultaneously zero and in tests of nested models.
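
As an illustration of a nested-model test, the sketch below compares the fitted model with a reduced model that drops female. The stored-estimate names full and nofemale are our own labels, not part of the original analysis, and the choice of female as the dropped predictor is only an example.

```stata
use https://stats.idre.ucla.edu/stat/data/hsb2, clear

* Fit the full model and keep its results under a hypothetical name.
quietly mlogit ses science socst female
estimates store full

* Fit the nested model without female and store it as well.
quietly mlogit ses science socst
estimates store nofemale

* LR test of the two female coefficients (one per equation) being zero.
lrtest full nofemale
```

Because female appears in both equations, this test has two degrees of freedom.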

c. Number of obs – This is the number of observations used in the multinomial logistic regression. It may be less than the number of cases in the dataset if there are missing values for some variables in the equation. By default, Stata does a listwise deletion of incomplete cases.
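
A quick way to check whether listwise deletion removed any cases is to count the rows with a missing value on any model variable and compare the estimation sample size with the dataset size. This is a minimal sketch; for these data we would expect the count to be zero, since all 200 cases were used.

```stata
use https://stats.idre.ucla.edu/stat/data/hsb2, clear

* Rows that would be dropped because at least one model variable is missing.
count if missing(ses, science, socst, female)

* Number of observations actually used in estimation.
quietly mlogit ses science socst female
display e(N)
```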

d. LR chi2(6) – This is the Likelihood Ratio (LR) Chi-Square test that, across both equations (low ses relative to middle ses and high ses relative to middle ses), at least one of the predictors’ regression coefficients is not equal to zero. The number in the parentheses indicates the degrees of freedom of the Chi-Square distribution used to test the LR Chi-Square statistic and is defined by the number of models estimated (2) times the number of predictors in the model (3). The LR Chi-Square statistic can be calculated by -2*(L(null model) - L(fitted model)) = -2*((-210.583) - (-194.035)) = 33.096, where L(null model) is the log likelihood with just the response variable in the model (Iteration 0) and L(fitted model) is the log likelihood from the final iteration (assuming the model converged) with all the parameters.
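
The calculation can be reproduced in Stata either from the log likelihoods printed in the iteration log or from the results that mlogit stores. The sketch below shows both routes; each should agree with the LR chi2(6) of 33.10 in the output, up to rounding.

```stata
* By hand, from the iteration log (Iteration 0 and the final iteration).
display -2*((-210.58254) - (-194.03485))

* From the stored results after fitting the model.
use https://stats.idre.ucla.edu/stat/data/hsb2, clear
quietly mlogit ses science socst female
display -2*(e(ll_0) - e(ll))   // null vs. fitted log likelihood
display e(chi2)                // the statistic Stata reports
display e(df_m)                // model degrees of freedom, 2 equations x 3 predictors
```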

e. Prob > chi2 – This is the probability of getting an LR test statistic as extreme as, or more extreme than, the one observed under the null hypothesis; the null hypothesis is that all of the regression coefficients across both models are simultaneously equal to zero. In other words, this is the probability of obtaining a chi-square statistic of 33.10 or larger if there is in fact no effect of the predictor variables. This p-value is compared to a specified alpha level, our willingness to accept a type I error, which is typically set at 0.05 or 0.01. The small p-value from the LR test, <0.0001 (displayed as 0.0000), would lead us to conclude that at least one of the regression coefficients in the model is not equal to zero. The parameter of the Chi-Square distribution used to test the null hypothesis is defined by the degrees of freedom in the prior line, chi2(6).
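
Because the output rounds the p-value to 0.0000, it can be useful to display it with more digits. A minimal sketch using the upper-tail chi-square function and the stored result:

```stata
* Upper-tail probability of a chi-square with 6 df at the reported statistic.
display chi2tail(6, 33.10)

* The same p-value as stored by mlogit, without rounding to four decimals.
use https://stats.idre.ucla.edu/stat/data/hsb2, clear
quietly mlogit ses science socst female
display e(p)
```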

f. Pseudo R2 – This is McFadden’s pseudo R-squared. Logistic regression does not have an equivalent to the R-squared that is found in OLS regression; however, many people have tried to come up with one. There are a wide variety of pseudo-R-square statistics. Because this statistic does not mean what R-square means in OLS regression (the proportion of variance for the response variable explained by the predictors), we suggest interpreting this statistic with great caution.
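
McFadden’s pseudo R-squared is one minus the ratio of the fitted log likelihood to the null log likelihood. The sketch below recomputes it from the iteration log and from the stored results; both should agree with the 0.0786 shown in the output.

```stata
* By hand: 1 - L(fitted model)/L(null model).
display 1 - (-194.03485)/(-210.58254)

* From the stored results after fitting the model.
use https://stats.idre.ucla.edu/stat/data/hsb2, clear
quietly mlogit ses science socst female
display 1 - e(ll)/e(ll_0)
display e(r2_p)
```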

Parameter Estimates

------------------------------------------------------------------------------
         ses^g |      Coef.^h   Std. Err.^j      z^k    P>|z|^k     [95% Conf. Interval]^l
-------------+----------------------------------------------------------------
low          |
     science |  -.0235647   .0209747    -1.12   0.261    -.0646744     .017545
       socst |  -.0389243   .0195165    -1.99   0.046    -.0771759   -.0006726
      female |   .8166202   .3909813     2.09   0.037      .050311    1.582929
       _cons |   1.912256   1.127256     1.70   0.090    -.2971258    4.121638
-------------+----------------------------------------------------------------
high         |
     science |    .022922   .0208718     1.10   0.272    -.0179861    .0638301
       socst |   .0430036   .0198894     2.16   0.031     .0040211     .081986
      female |   -.032862   .3500153    -0.09   0.925    -.7188793    .6531553
       _cons |  -4.057323   1.222939    -3.32   0.001     -6.45424   -1.660407
------------------------------------------------------------------------------
(ses==middle is the base outcome)^i

g. ses – This is the response variable in the multinomial logistic regression. Underneath ses are two replicates of the predictor variables, representing the two models that are estimated: low ses relative to middle ses and high ses relative to middle ses.

h and i. Coef. and referent group – These are the estimated multinomial logistic regression coefficients and the referent level, respectively, for the model. An important feature of the multinomial logit model is that it estimates k-1 models, where k is the number of levels of the dependent variable. In this instance, Stata, by default, set middle ses as the referent group and therefore estimated a model for low ses relative to middle ses and a model for high ses relative to middle ses. Since the parameter estimates are relative to the referent group, the standard interpretation of the multinomial logit is that for a unit change in the predictor variable, the logit of outcome m relative to the referent group is expected to change by its respective parameter estimate given the other variables in the model are held constant.
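
If a different referent group is wanted, the default can be overridden with the baseoutcome() option. The sketch below assumes ses is coded 1 = low, 2 = middle, 3 = high; that coding is our assumption, so check it with the tabulations before choosing a value.

```stata
use https://stats.idre.ucla.edu/stat/data/hsb2, clear

* Inspect the labels and the numeric codes behind them.
tabulate ses
tabulate ses, nolabel

* Refit with the category coded 1 (assumed here to be low ses) as the base.
mlogit ses science socst female, baseoutcome(1)
```

Changing the base outcome changes which comparisons are reported, but not the fit of the model: the log likelihood is the same.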

low ses relative to middle ses

science – This is the multinomial logit estimate for a one unit increase in science test score for low ses relative to middle ses given the other variables in the model are held constant. If a subject were to increase his science test score by one point, the multinomial log-odds for low ses relative to middle ses would be expected to decrease by 0.024 unit while holding all other variables in the model constant.

socst – This is the multinomial logit estimate for a one unit increase in socst test score for low ses relative to middle ses given the other variables in the model are held constant. If a subject were to increase his socst test score by one point, the multinomial log-odds for low ses relative to middle ses would be expected to decrease by 0.039 unit while holding all other variables in the model constant.

female – This is the multinomial logit estimate comparing females to males for low ses relative to middle ses given the other variables in the model are held constant. The multinomial logit for females relative to males is 0.817 unit higher for being in low ses relative to middle ses given all other predictor variables in the model are held constant.

_cons – This is the multinomial logit estimate for low ses relative to middle ses when the predictor variables in the model are evaluated at zero. For males (the variable female evaluated at zero) with zero science and socst test scores, the logit for being in low ses versus middle ses is 1.912. Note, evaluating science and socst at zero is out of the range of plausible test scores and if the test scores were mean-centered, the intercept would have a natural interpretation: log odds of being in low ses versus middle ses for a male with average science and socst test score.
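
The mean-centering mentioned above can be sketched as follows. The variable names science_c and socst_c are hypothetical names we introduce for the centered scores.

```stata
use https://stats.idre.ucla.edu/stat/data/hsb2, clear

* Center each test score at its sample mean.
quietly summarize science
generate science_c = science - r(mean)
quietly summarize socst
generate socst_c = socst - r(mean)

* Slopes are unchanged; each _cons is now the logit for a male with
* average science and socst scores.
mlogit ses science_c socst_c female
```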

high ses relative to middle ses

science – This is the multinomial logit estimate for a one unit increase in science test score for high ses relative to middle ses given the other variables in the model are held constant. If a subject were to increase his science test score by one point, the multinomial log-odds for high ses relative to middle ses would be expected to increase by 0.023 unit while holding all other variables in the model constant.

socst – This is the multinomial logit estimate for a one unit increase in socst test score for high ses relative to middle ses given the other variables in the model are held constant. If a subject were to increase his socst test score by one point, the multinomial log-odds for high ses relative to middle ses would be expected to increase by 0.043 unit while holding all other variables in the model constant.

female – This is the multinomial logit estimate comparing females to males for high ses relative to middle ses given the other variables in the model are held constant. The multinomial logit for females relative to males is 0.033 unit lower for being in high ses relative to middle ses given all other predictor variables in the model are held constant.

_cons – This is the multinomial logit estimate for high ses relative to middle ses when the predictor variables in the model are evaluated at zero. For males (the variable female evaluated at zero) with zero science and socst test scores, the logit for being in high ses relative to middle ses is -4.057.

j. Std. Err. – These are the standard errors of the individual regression coefficients for the two respective models estimated. They are used in both the calculation of the z test statistic, superscript k, and the confidence interval of the regression coefficient, superscript l.

k. z and P>|z| – These are the test statistic and p-value, respectively, for testing the null hypothesis that, within a given model, an individual predictor’s regression coefficient is zero given that the rest of the predictors are in the model. The test statistic z is the ratio of the Coef. to the Std. Err. of the respective predictor. Under the null hypothesis, the z value follows a standard normal distribution, which is used to test against a two-sided alternative hypothesis that the Coef. is not equal to zero. The probability that a particular z test statistic is as extreme as, or more extreme than, what has been observed under the null hypothesis is given by P>|z|. The worked interpretations below cover only the first equation, low ses relative to middle ses. The interpretation for the second equation, high ses relative to middle ses, follows the same pattern.

For low ses relative to middle ses, the z test statistic for the predictor science (-0.024/0.021) is -1.12 with an associated p-value of 0.261. If we set our alpha level to 0.05, we would fail to reject the null hypothesis and conclude that for low ses relative to middle ses, the regression coefficient for science has not been found to be statistically different from zero given socst and female are in the model.

For low ses relative to middle ses, the z test statistic for the predictor socst (-0.039/0.020) is -1.99 with an associated p-value of 0.046. If we again set our alpha level to 0.05, we would reject the null hypothesis and conclude that the regression coefficient for socst has been found to be statistically different from zero for low ses relative to middle ses given that science and female are in the model.

For low ses relative to middle ses, the z test statistic for the predictor female (0.817/0.391) is 2.09 with an associated p-value of 0.037. If we again set our alpha level to 0.05, we would reject the null hypothesis and conclude that the difference between males and females has been found to be statistically different from zero for low ses relative to middle ses given that science and socst are in the model.

For low ses relative to middle ses, the z test statistic for the intercept, _cons (1.912/1.127) is 1.70 with an associated p-value of 0.090. With an alpha level of 0.05, we would fail to reject the null hypothesis and conclude, a) that the multinomial logit for males (the variable female evaluated at zero) with zero science and socst test scores in low ses relative to middle ses is found not to be statistically different from zero; or b) for males with zero science and socst test scores, we are statistically uncertain whether they are more likely to be classified as low ses or middle ses. We can make the second interpretation when we view the _cons as a specific covariate profile (males with zero science and socst test scores). Based on the direction and significance of the coefficient, the _cons tells whether the profile would have a greater propensity to fall in one of the levels of the dependent variable.
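
The z statistic and its two-sided p-value can be recomputed directly from a coefficient and its standard error. The sketch below uses the science row of the low ses equation; the scalar name zstat is our own. The results should agree with the -1.12 and 0.261 in the table.

```stata
* z is the coefficient divided by its standard error.
scalar zstat = -.0235647/.0209747
display zstat

* Two-sided p-value from the standard normal distribution.
display 2*normal(-abs(zstat))
```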

l. [95% Conf. Interval] – This is the Confidence Interval (CI) for an individual multinomial logit regression coefficient given the other predictors are in the model for outcome m relative to the referent group. For a given predictor with a level of 95% confidence, we’d say that we are 95% confident that the "true" population multinomial logit regression coefficient lies between the lower and upper limit of the interval for outcome m relative to the referent group. It is calculated as the Coef. ± (z_α/2)*(Std.Err.), where z_α/2 is a critical value on the standard normal distribution. The CI is equivalent to the z test statistic: if the CI includes zero, we’d fail to reject the null hypothesis that a particular regression coefficient is zero given the other predictors are in the model. An advantage of a CI is that it is illustrative; it provides a range where the "true" parameter may lie.
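
As a worked check of the formula, the sketch below rebuilds the 95% interval for science in the low ses equation from its coefficient and standard error. The limits should reproduce the -.0646744 and .017545 in the table, up to rounding.

```stata
* Critical value of the standard normal for a 95% interval (about 1.96).
display invnormal(0.975)

* Lower and upper limits: Coef. -/+ critical value * Std. Err.
display -.0235647 - invnormal(0.975)*.0209747
display -.0235647 + invnormal(0.975)*.0209747
```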

Relative Risk Ratio Interpretation

The following is the interpretation of the multinomial logistic regression in terms of relative risk ratios and can be obtained by mlogit, rrr after running the multinomial logit model or by specifying the rrr option when the full model is specified. This part of the interpretation applies to the output below.

mlogit ses science socst female, rrr

Iteration 0:   log likelihood = -210.58254
Iteration 1:   log likelihood = -194.75041
Iteration 2:   log likelihood = -194.03782
Iteration 3:   log likelihood = -194.03485
Iteration 4:   log likelihood = -194.03485

Multinomial logistic regression                   Number of obs   =        200
                                                  LR chi2(6)      =      33.10
                                                  Prob > chi2     =     0.0000
Log likelihood = -194.03485                       Pseudo R2       =     0.0786

------------------------------------------------------------------------------
         ses |        RRR^a   Std. Err.      z    P>|z|     [95% Conf. Interval]^b
-------------+----------------------------------------------------------------
low          |
     science |   .9767108   .0204862    -1.12   0.261     .9373726      1.0177
       socst |   .9618236   .0187714    -1.99   0.046      .925727    .9993276
      female |   2.262839   .8847276     2.09   0.037     1.051598    4.869199
-------------+----------------------------------------------------------------
high         |
     science |   1.023187   .0213558     1.10   0.272     .9821747    1.065911
       socst |   1.043942   .0207633     2.16   0.031     1.004029    1.085441
      female |   .9676721      .3387    -0.09   0.925     .4872981    1.921595
------------------------------------------------------------------------------
(ses==middle is the base outcome)

a. Relative Risk Ratio – These are the relative risk ratios for the multinomial logit model shown earlier. They can be obtained by exponentiating the multinomial logit coefficients, e^coef., or by specifying the rrr option. Recall that the multinomial logit model estimates k-1 models, each comparing one outcome to the referent group. Suppose the model is written out in exponentiated form, with the predictor of interest evaluated at x + δ and at x for outcome m relative to the referent group, where δ is the change in the predictor we are interested in (δ is traditionally set to one) and the other variables in the model are held constant. Each exponentiated expression is a ratio of two probabilities, the probability of outcome m over the probability of the referent outcome, which is the relative risk. If we then take the ratio of the two expressions, everything cancels except e^(coef.*δ). In this sense, the exponentiated multinomial logit coefficient provides an estimate of the relative risk ratio. However, the exponentiated coefficients are commonly interpreted as odds ratios. The standard interpretation of the relative risk ratios is that for a unit change in the predictor variable, the relative risk of outcome m relative to the referent group is expected to change by a factor of the respective parameter estimate given the other variables in the model are held constant.
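
The link between the two tables can be checked by exponentiating a coefficient from the logit output. The sketch below uses science and female from the low ses equation. The standard error line relies on the delta-method approximation, exp(coef.) times the logit-scale standard error, which is how we understand Stata to transform standard errors for exponentiated coefficients.

```stata
* RRR = exp(coef.); compare with .9767108 and 2.262839 in the rrr table.
display exp(-.0235647)
display exp(.8166202)

* Delta-method standard error of the science RRR; compare with .0204862.
display exp(-.0235647)*.0209747
```

The z statistics and p-values are the same in both tables because the test is carried out on the logit scale.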

low ses relative to middle ses

science – This is the relative risk ratio for a one unit increase in science score for low ses relative to middle ses given that the other variables in the model are held constant. If a subject were to increase her science test score by one unit, the relative risk for low ses relative to middle ses would be expected to be multiplied by 0.977, a decrease of about 2.3%, given the other variables in the model are held constant. More generally, we can say that if a subject were to increase her science test score, she would be expected to be more likely to fall into middle ses as compared to low ses.

socst – This is the relative risk ratio for a one unit increase in socst score for low ses relative to middle ses level given that the other variables in the model are held constant. If a subject were to increase her socst test score by one unit, the relative risk for low ses relative to middle ses would be expected to decrease by a factor of 0.962 given the other variables in the model are held constant.

female – This is the relative risk ratio comparing females to males for low ses relative to middle ses level given that the other variables in the model are held constant. For females relative to males, the relative risk for low ses relative to middle ses would be expected to increase by a factor of 2.263 given the other variables in the model are held constant.

high ses relative to middle ses

science – This is the relative risk ratio for a one unit increase in science score for high ses relative to middle ses level given that the other variables in the model are held constant. If a subject were to increase her science test score by one unit, the relative risk for high ses relative to middle ses would be expected to increase by a factor of 1.023 given the other variables in the model are held constant.

socst – This is the relative risk ratio for a one unit increase in socst score for high ses relative to middle ses level given that the other variables in the model are held constant. If a subject were to increase their socst test score by one unit, the relative risk for high ses relative to middle ses would be expected to increase by a factor of 1.043 given the other variables in the model are held constant.

female – This is the relative risk ratio comparing females to males for high ses relative to middle ses level given that the other variables in the model are held constant. For females relative to males, the relative risk for high ses relative to middle ses would be expected to decrease by a factor of 0.968 given the other variables in the model are held constant.

b. [95% Conf. Interval] – This is the CI for the relative risk ratio given the other predictors are in the model. For a given predictor with a level of 95% confidence, we’d say that we are 95% confident that the "true" population relative risk ratio comparing outcome m to the referent group lies between the lower and upper limit of the interval. An advantage of a CI is that it is illustrative; it provides a range where the “true” relative risk ratio may lie.
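
The limits of this interval are the exponentiated limits of the interval on the logit scale, which is why the interval is not symmetric around the relative risk ratio. A minimal sketch for science in the low ses equation, to be compared with .9373726 and 1.0177 in the table:

```stata
* Exponentiate the lower and upper limits of the logit-scale interval.
display exp(-.0646744)
display exp(.017545)
```

An interval that includes one on this scale corresponds to a logit-scale interval that includes zero.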
