I'm not sure how to interpret the coefficients of a variable that has more than two levels. Note that my model contains explanatory variables that are numeric, binary, and categorical with multiple levels.

My response variable is coded as $$0 = \text{no late debt payment}, \quad 1 = \text{has late debt payment},$$ and one of the x variables in my model is **education level**, coded as:

$$
\begin{aligned}
1 &= \text{no high school diploma/GED} \\
2 &= \text{has high school diploma/GED} \\
3 &= \text{some college education} \\
4 &= \text{college education}
\end{aligned}
$$
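
By default, R's `glm` turns an unordered factor like this into dummy variables using treatment contrasts (`contr.treatment`). The first level is the reference category and gets no coefficient of its own, and every other level gets a 0/1 indicator, which is why the output shows EDCL2, EDCL3 and EDCL4 but no EDCL1. Here is a minimal Python sketch of that coding, assuming the variable is stored as a factor with levels 1 to 4. The helper `treatment_dummies` is hypothetical and not part of any library:

```python
# Hypothetical helper (not part of any library): mimics R's default
# treatment coding for a factor with levels 1-4, where level 1 is the
# reference category and every other level gets a 0/1 indicator.
def treatment_dummies(values, levels=(1, 2, 3, 4), reference=1):
    non_reference = [lv for lv in levels if lv != reference]
    # One row per observation; columns correspond to EDCL2, EDCL3, EDCL4.
    return [[1 if v == lv else 0 for lv in non_reference] for v in values]


education = [1, 2, 3, 4]
for v, row in zip(education, treatment_dummies(education)):
    print(v, row)
# 1 [0, 0, 0]
# 2 [1, 0, 0]
# 3 [0, 1, 0]
# 4 [0, 0, 1]
```

A group-1 respondent has all three indicators equal to 0. As a result, each EDCL coefficient compares its own group with group 1, not with the neighbouring level.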

In the R `glm` output (`family = "binomial"`), the coefficients for the dummy variables are:

$$
\begin{aligned}
\text{EDCL2} &= 0.48430 \\
\text{EDCL3} &= 0.89571 \\
\text{EDCL4} &= 0.45851 \\
&\text{…}
\end{aligned}
$$

After exponentiating them, they are:

$$
\begin{aligned}
\text{EDCL2} &= 1.56 \\
\text{EDCL3} &= 2.36 \\
\text{EDCL4} &= 1.38 \\
&\text{…}
\end{aligned}
$$
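
In R this step is typically `exp(coef(fit))`, where `fit` is a hypothetical name for the fitted `glm` object. The same arithmetic is shown below as a minimal Python sketch, using the three log-odds coefficients listed above:

```python
import math

# Log-odds coefficients for the education dummies, as listed in the
# question (reference level: 1 = no high school diploma/GED).
coefs = {"EDCL2": 0.48430, "EDCL3": 0.89571, "EDCL4": 0.45851}

# Exponentiating a logistic-regression coefficient gives an odds ratio
# relative to the reference level.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio = {ratio:.2f}")
```

Each result is an odds ratio comparing that education group with group 1, holding the other variables in the model fixed.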

So my interpretation is as follows:

EDCL2: Implies that a respondent who has completed high school is about 1.56 times as likely to have a late debt payment as a respondent who has NOT completed high school.

EDCL3: Implies that a respondent who has some college education is about 2.36 times as likely to have a late debt payment as a respondent who has NOT completed high school.

EDCL4: Implies that a respondent who has a college education is about 1.38 times as likely to have a late debt payment as a respondent who has NOT completed high school.

Is this interpretation correct? I suspect it may be more complex than that; if so, what is the right way to interpret these coefficients? Any help is appreciated. Thank you!


#### Best Answer

The original coefficients are additive on the log-odds scale, so the exponentiated coefficients ARE multiplicative, but on the odds scale, not the probability scale. "… (T)imes as likely" describes a ratio of probabilities, so it is not accurate here.
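
Writing the model out for two groups shows why. The notation below is assumed for illustration and is not taken from the output. Here $p_k$ is the probability of a late payment for a respondent in education group $k$, $\beta_0$ is the intercept, $\beta_2$ is the EDCL2 coefficient, and $\eta$ is the combined contribution of the other explanatory variables, held at the same values for both respondents:

$$
\begin{aligned}
\log\frac{p_1}{1-p_1} &= \beta_0 + \eta, \\
\log\frac{p_2}{1-p_2} &= \beta_0 + \eta + \beta_2, \\
\frac{p_2/(1-p_2)}{p_1/(1-p_1)} &= e^{\beta_2}, \\
\frac{p_2}{p_1} &= \frac{1 + e^{-(\beta_0 + \eta)}}{1 + e^{-(\beta_0 + \eta + \beta_2)}}.
\end{aligned}
$$

The odds ratio depends only on $\beta_2$. The probability ratio also depends on $\beta_0 + \eta$.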

For example, the odds that a six-sided die will come up "1" on the next roll are 1:5, or 0.2, whereas the odds that it will come up either "1" or "2" are 2:4, or 0.5. The odds have more than doubled (a factor of 2.5), even though the probability has only doubled, from 1/6 to 2/6.
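
The die arithmetic can be checked exactly with Python's `fractions` module. A minimal sketch follows; the helper `odds` is a name chosen here for illustration:

```python
from fractions import Fraction


def odds(p):
    """Convert a probability p to odds p / (1 - p)."""
    return p / (1 - p)


p_one = Fraction(1, 6)         # P(roll is "1")
p_one_or_two = Fraction(2, 6)  # P(roll is "1" or "2")

print("probability ratio:", p_one_or_two / p_one)       # prints 2
print("odds:", odds(p_one), odds(p_one_or_two))          # prints 1/5 1/2
print("odds ratio:", odds(p_one_or_two) / odds(p_one))   # prints 5/2
```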

For another example, let's say the prevalence of a rare disease is 1 in 1 million people. Then the odds that a person has the disease are 1:999,999. If some condition increased a person's odds by a factor of 10, their odds would be 10:999,999 and their chance of having the disease would be 10 in 1,000,009. That is nearly ten times as likely, but not quite.
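
The same check for the rare-disease example converts odds back to a probability with $p = o/(1+o)$. A minimal sketch follows; the helper name `odds_to_prob` is illustrative:

```python
from fractions import Fraction


def odds_to_prob(o):
    """Convert odds o = p / (1 - p) back to a probability o / (1 + o)."""
    return o / (1 + o)


base_odds = Fraction(1, 999_999)   # odds of 1:999,999
raised_odds = 10 * base_odds       # odds multiplied by 10

p_base = odds_to_prob(base_odds)       # 1 in 1,000,000
p_raised = odds_to_prob(raised_odds)   # 10 in 1,000,009
print(p_base, p_raised)
print(float(p_raised / p_base))        # slightly less than 10
```

For a rare outcome like this one, the odds ratio and the probability ratio nearly coincide. For a common outcome, as in the die example, they can differ substantially.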

These two examples show that the baseline, set by the intercept (together with the other covariates), makes a difference here. These coefficients alone don't allow us to say how much more likely a late debt payment is in groups 2 through 4 than in group 1. It would be valid to say that the estimated log-odds are 0.48 larger in group 2 than in group 1 and (equivalently) that the estimated odds are 1.56 times larger in group 2 than in group 1, holding the other variables in the model fixed.
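
The following minimal Python sketch shows how the baseline changes the answer. It applies the EDCL2 coefficient from the question to several baseline log-odds values. These baselines are made up for illustration; in the real model they would come from the intercept plus the other covariates. The helper names `inv_logit` and `ratios` are hypothetical:

```python
import math


def inv_logit(x):
    """Convert log-odds x to a probability, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def ratios(baseline, coef):
    """Return (odds ratio, probability ratio) of group 2 versus group 1.

    `baseline` is the group-1 log-odds (intercept plus the other
    covariates' contribution); `coef` is the group-2 dummy coefficient.
    """
    p1 = inv_logit(baseline)
    p2 = inv_logit(baseline + coef)
    odds_ratio = (p2 / (1 - p2)) / (p1 / (1 - p1))
    return odds_ratio, p2 / p1


b_edcl2 = 0.48430  # EDCL2 coefficient from the question

# Hypothetical baseline log-odds values, made up for illustration.
for baseline in (-6.0, -2.0, 0.0, 2.0):
    odds_ratio, prob_ratio = ratios(baseline, b_edcl2)
    print(f"baseline log-odds {baseline:+.1f}: "
          f"P(late) in group 1 = {inv_logit(baseline):.3f}, "
          f"odds ratio = {odds_ratio:.3f}, probability ratio = {prob_ratio:.3f}")
```

The odds ratio is the same, $e^{0.48430}$, in every row. The probability ratio shrinks toward 1 as the baseline probability of a late payment grows.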