Solved – How does quantile regression compare to logistic regression with the variable split at the quantile

I googled a bit but didn't find anything on this.

Suppose you do a quantile regression on the qth quantile of the dependent variable.

Then you split the DV at the qth quantile and label the result 0 and 1. Then you do logistic regression on the categorized DV.

I'm looking for any Monte-Carlo studies of this or reasons to prefer one over the other etc.

For simplicity, assume you have a continuous dependent variable Y and a continuous predictor variable X.

Logistic Regression

If I understand your post correctly, your logistic regression will categorize Y into 0 and 1 based on the quantile of the (unconditional) distribution of Y. Specifically, the q-th quantile of the distribution of observed Y values will be computed and Ycat will be defined as 0 if Y is strictly less than this quantile and 1 if Y is greater than or equal to this quantile.

If the above captures your intent, then the logistic regression will model the odds of Y exceeding or being equal to the (observed) q-th quantile of the (unconditional) Y distribution as a function of X.
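As a rough sketch of that first procedure (not a definitive implementation), the snippet below dichotomizes a simulated Y at its observed q-th quantile and fits a one-predictor logistic regression by Newton-Raphson. The data-generating coefficients, the seed, and the helper names `empirical_quantile` and `fit_logistic` are assumptions made up for illustration; in practice you would use an established statistics package.

```python
import math
import random

def empirical_quantile(values, q):
    """Smallest observed value with at least a fraction q of the data at or below it."""
    ordered = sorted(values)
    k = max(math.ceil(q * len(ordered)) - 1, 0)
    return ordered[k]

def fit_logistic(x, y, iterations=25):
    """Fit logit P(y = 1 | x) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # negative Hessian entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (negative Hessian) * delta = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Synthetic data; the coefficients below are made up for illustration only.
random.seed(0)
n, q = 500, 0.75
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

cutoff = empirical_quantile(y, q)                # one cutoff, computed ignoring x
ycat = [1 if yi >= cutoff else 0 for yi in y]    # 1 = at or above the cutoff
b0, b1 = fit_logistic(x, ycat)
print(f"cutoff={cutoff:.2f} share_of_ones={sum(ycat) / n:.3f} slope={b1:.3f}")
```

Note that the cutoff is a single number computed from Y alone; X only enters afterwards, through the fitted odds.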

Quantile Regression

On the other hand, if you are performing a quantile regression of Y on X, you are focusing on modelling how the q-th quantile of the conditional distribution of Y given X changes as a function of X.
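For comparison, here is a minimal sketch of a linear quantile regression with one predictor, fitted by minimizing the check (pinball) loss. It relies on the fact that some minimizer of that loss passes through two of the data points, so it simply scans all pairs; this brute-force search is an illustrative shortcut for toy-sized data, not how production software solves the problem. The simulated data and the function names are assumptions for illustration.

```python
import itertools
import random

def pinball_loss(residuals, q):
    """Check loss: q * r for r >= 0 and (q - 1) * r for r < 0, summed."""
    return sum(q * r if r >= 0 else (q - 1) * r for r in residuals)

def fit_quantile_line(x, y, q):
    """Return (a, b) minimizing the pinball loss of y - (a + b * x).

    Scans every line through two data points; O(n^3), so toy data only.
    """
    best = None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid regression line
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = pinball_loss([yk - (a + b * xk) for xk, yk in zip(x, y)], q)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]

# Synthetic data; the coefficients below are made up for illustration only.
random.seed(0)
n, q = 80, 0.75
x = [random.uniform(0, 10) for _ in range(n)]
y = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

a, b = fit_quantile_line(x, y, q)
# Share of points on or below the fitted line (small tolerance for rounding).
share_at_or_below = sum(yi <= a + b * xi + 1e-9 for xi, yi in zip(x, y)) / n
print(f"intercept={a:.3f} slope={b:.3f} share_at_or_below={share_at_or_below:.3f}")
```

Here the fitted line itself plays the role of the cutoff, and it moves with X: roughly a fraction q of the observations fall on or below it across the range of X.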

Logistic Regression versus Quantile Regression

It seems to me that these two procedures have totally different aims, since the first procedure (i.e., logistic regression) focuses on the q-th quantile of the unconditional distribution of Y, whereas the second procedure (i.e., quantile regression) focuses on the q-th quantile of the conditional distribution of Y.

The unconditional distribution of Y is the distribution of Y values (hence it ignores any information about the X values). The conditional distribution of Y given X is the distribution of those Y values for which the values of X are the same.
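One way to write the two targets down side by side is sketched here. The notation (c_q, the alphas and betas) is my own shorthand rather than anything from the question, and the linear forms assume the simplest one-predictor specification for both models.

```latex
% Unconditional q-th quantile of Y: a single number, computed without X
c_q = \inf\{\, y : P(Y \le y) \ge q \,\}

% Logistic regression on the dichotomized Y: log-odds of reaching the fixed cutoff c_q
\log \frac{P(Y \ge c_q \mid X = x)}{P(Y < c_q \mid X = x)} = \beta_0 + \beta_1 x

% Quantile regression: the conditional q-th quantile itself is modelled as a function of x
Q_{Y \mid X}(q \mid x) = \alpha_0 + \alpha_1 x,
\qquad
P\bigl(Y \le Q_{Y \mid X}(q \mid x) \mid X = x\bigr) = q \quad \text{(continuous } Y\text{)}
```

In the first model the cutoff is fixed and the probability varies with x; in the second the probability q is fixed and the cutoff varies with x.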

Illustrative Example

For illustration purposes, let's say Y = cholesterol and X = body weight.

Then logistic regression is modelling the odds of having a 'high' cholesterol value (i.e., greater than or equal to the q-th quantile of the observed cholesterol values) as a function of body weight, where the definition of 'high' has no relation to body weight. In other words, the marker for what constitutes a 'high' cholesterol value is independent of body weight. What changes with body weight in this model is the odds that a cholesterol value would exceed this marker.

On the other hand, quantile regression is looking at how the 'marker' cholesterol values vary as a function of body weight, where each marker is the value that a fraction q of the subjects with the same body weight in the underlying population fall at or below (so a fraction 1 - q of them have a higher cholesterol value). You can think of these cholesterol values as markers for identifying what cholesterol values are 'high'; in this case, however, each marker depends on the corresponding body weight. Furthermore, the markers are assumed to change in a predictable fashion as the value of X changes (e.g., the markers tend to increase as X increases).
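To put numbers on the cholesterol example, here is a small sketch under a purely hypothetical population model: body weight is normal, and cholesterol given body weight is normal with a mean that is linear in weight. Every number below is made up for illustration. With these assumptions both quantities can be computed exactly using `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Hypothetical population, all numbers made up for illustration:
# weight X ~ Normal(80, 15) and cholesterol Y | X = x ~ Normal(100 + 1.2 * x, 25).
q = 0.9
a, b, s = 100.0, 1.2, 25.0
mean_x, sd_x = 80.0, 15.0

# Unconditional distribution of Y implied by the two assumptions above.
y_marginal = NormalDist(a + b * mean_x, (b**2 * sd_x**2 + s**2) ** 0.5)
fixed_marker = y_marginal.inv_cdf(q)  # same 'high' cutoff for every body weight

def prob_high(weight):
    """P(Y >= fixed_marker | X = weight): the quantity the dichotomized model targets."""
    return 1.0 - NormalDist(a + b * weight, s).cdf(fixed_marker)

def conditional_marker(weight):
    """q-th quantile of Y given X = weight: the quantity quantile regression targets."""
    return NormalDist(a + b * weight, s).inv_cdf(q)

for weight in (60, 80, 100):
    print(weight, round(prob_high(weight), 3), round(conditional_marker(weight), 1))
```

The first printed column after the weight is a probability that rises with body weight while the cutoff stays put; the second is a weight-specific cutoff, each with exactly a fraction q of that weight group at or below it. Under these normal assumptions the exceedance probability follows a probit rather than a logit curve, so a logistic fit would only approximate it.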
