I googled a bit but didn't find anything on this.

Suppose you do a quantile regression on the q-th quantile of the dependent variable (DV).

Then you split the DV at its q-th quantile, label the two groups 0 and 1, and run a logistic regression on the dichotomized DV.

I'm looking for any Monte Carlo studies comparing the two, or any reasons to prefer one approach over the other.


#### Best Answer

For simplicity, assume you have a continuous dependent variable Y and a continuous predictor variable X.

**Logistic Regression**

If I understand your post correctly, your logistic regression will categorize Y into 0 and 1 based on the quantile of the (unconditional) distribution of Y. Specifically, the q-th quantile of the distribution of observed Y values will be computed and Ycat will be defined as 0 if Y is strictly less than this quantile and 1 if Y is greater than or equal to this quantile.

If the above captures your intent, then the logistic regression models the log-odds of Y being greater than or equal to the (observed) q-th quantile of the (unconditional) Y distribution as a function of X.
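A minimal sketch of this dichotomize-then-fit procedure, assuming one continuous predictor and simulated data. The quantile convention used here (the smallest observed value with at least a fraction q of observations at or below it, sometimes called the inverse-ECDF or "type 1" quantile) is one of several in use, and the helper names (`empirical_quantile`, `dichotomize`, `fit_logistic`) are hypothetical. The toy gradient-ascent fitter only illustrates the model; in practice you would use a proper solver such as `statsmodels`' `Logit` or R's `glm(..., family = binomial)`.

```python
import math
import random


def empirical_quantile(values, q):
    """Smallest observed value v such that at least a fraction q of the
    observations are <= v (inverse-ECDF convention; others exist)."""
    ys = sorted(values)
    # round() absorbs float noise such as 0.1 * 30 == 3.0000000000000004
    k = math.ceil(round(q * len(ys), 9))
    return ys[max(k, 1) - 1]


def dichotomize(y, q):
    """Ycat = 1 if Y >= the q-th quantile of the observed (unconditional) Y, else 0."""
    marker = empirical_quantile(y, q)
    return [1 if v >= marker else 0 for v in y], marker


def fit_logistic(x, ycat, lr=0.1, iters=3000):
    """Fit logit P(Ycat = 1 | X) = b0 + b1 * X by plain gradient ascent on the
    mean log-likelihood (toy fitter for illustration only)."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(iters):
        g0 = g1 = 0.0
        for xi, yi in zip(x, ycat):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += yi - p
            g1 += (yi - p) * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Simulated data: Y increases with X plus Gaussian noise.
rng = random.Random(0)
x = [rng.uniform(-2, 2) for _ in range(200)]
y = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]

ycat, marker = dichotomize(y, 0.75)  # one fixed cut-off for every X
b0, b1 = fit_logistic(x, ycat)
print(f"cut-off={marker:.2f}, b0={b0:.2f}, b1={b1:.2f}")
```

Note that `marker` is computed once from all Y values, so the cut-off defining "high" does not depend on X; only the fitted probability of reaching it does.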

**Quantile Regression**

On the other hand, if you are performing a quantile regression of Y on X, you are focusing on modelling how the q-th quantile of the conditional distribution of Y given X changes as a function of X.
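For comparison, a minimal sketch of what a linear quantile regression of Y on X computes, assuming a single predictor: it minimizes the summed check (pinball) loss rho_q(u) = u * (q - 1[u < 0]) over intercept and slope. Because the underlying linear program has an optimal solution passing through p observations (here p = 2), this toy version simply tries every line through two points with distinct X values. It is O(n^3) and only meant for small illustrative data; the function names are hypothetical, and real analyses would use a dedicated solver such as `statsmodels`' `QuantReg` or R's `quantreg::rq`.

```python
import itertools
import random


def check_loss(u, q):
    """Check (pinball) loss: rho_q(u) = u * (q - 1[u < 0])."""
    return u * (q - (1.0 if u < 0 else 0.0))


def quantile_regression_1d(x, y, q):
    """Fit Q_q(Y | X = x) = a + b * x by exhaustive search over lines through
    two observations with distinct x (exact, but only practical for small n)."""
    best = None
    for (x1, y1), (x2, y2) in itertools.combinations(zip(x, y), 2):
        if x1 == x2:
            continue
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = sum(check_loss(yi - (a + b * xi), q) for xi, yi in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


# Simulated data whose spread grows with X, so conditional quantiles fan out.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(60)]
y = [1 + 0.5 * xi + (0.2 + 0.3 * xi) * rng.gauss(0, 1) for xi in x]

for q in (0.1, 0.5, 0.9):
    a, b = quantile_regression_1d(x, y, q)
    print(f"q={q}: intercept={a:.2f}, slope={b:.2f}")
```

Each q gets its own intercept and slope: the fitted line is itself the X-dependent "q-th quantile of Y given X".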

**Logistic Regression versus Quantile Regression**

It seems to me that these two procedures have quite different aims: the first procedure (logistic regression) focuses on the q-th quantile of the unconditional distribution of Y, whereas the second procedure (quantile regression) focuses on the q-th quantile of the conditional distribution of Y given X.

The unconditional distribution of Y is the distribution of all Y values (hence it ignores any information about the X values). The conditional distribution of Y given X is the distribution of those Y values that share the same value of X.
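In symbols, a sketch assuming the simplest specifications (a logistic model linear in X and a linear quantile regression), with $F_Y$ the CDF of Y and $F_{Y\mid X}$ the conditional CDF of Y given X:

```latex
\begin{aligned}
Q_Y(q) &= \inf\{\, y : F_Y(y) \ge q \,\} && \text{(unconditional $q$-th quantile)}\\
Q_{Y\mid X}(q \mid x) &= \inf\{\, y : F_{Y\mid X}(y \mid x) \ge q \,\} && \text{(conditional $q$-th quantile)}\\[4pt]
\text{logistic:}\quad \log\frac{P\{Y \ge Q_Y(q) \mid X = x\}}{P\{Y < Q_Y(q) \mid X = x\}} &= \beta_0 + \beta_1 x\\
\text{quantile regression:}\quad Q_{Y\mid X}(q \mid x) &= \gamma_0 + \gamma_1 x
\end{aligned}
```

In the logistic model the threshold $Q_Y(q)$ (in practice, the observed sample quantile) is a single number shared by every $x$; in quantile regression the threshold itself is what is modelled as a function of $x$.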

**Illustrative Example**

For illustration purposes, let's say Y = cholesterol and X = body weight.

Then logistic regression is modelling the odds of having a 'high' cholesterol value (i.e., greater than or equal to the q-th quantile of the observed cholesterol values) as a function of body weight, where the definition of 'high' has no relation to body weight. In other words, the marker for what constitutes a 'high' cholesterol value is independent of body weight. What changes with body weight in this model is the odds that a cholesterol value reaches or exceeds this marker.

On the other hand, quantile regression is looking at how the *'marker' cholesterol values for which 100(1 − q)% of the subjects with the same body weight in the underlying population have a higher cholesterol value* vary as a function of body weight. You can think of these cholesterol values as markers for identifying which cholesterol values are 'high', but in this case each marker depends on the corresponding body weight; furthermore, the markers are assumed to change in a predictable fashion as the value of X changes (e.g., the markers tend to increase as X increases).
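The contrast in the cholesterol example can be made concrete with a small, hand-constructed (hypothetical, not real) data set of three body-weight groups. This sketch assumes the same inverse-ECDF quantile convention as a choice of illustration, with q = 0.8: the "logistic" view computes one pooled marker and the share of each group at or above it, while the "quantile regression" view computes one marker per body weight.

```python
import math


def empirical_quantile(values, q):
    """Smallest observed value v with at least a fraction q of the
    observations <= v (inverse-ECDF convention; others exist)."""
    ys = sorted(values)
    k = math.ceil(round(q * len(ys), 9))  # round() absorbs float noise
    return ys[max(k, 1) - 1]


# Hypothetical data: body weight (kg) -> cholesterol values in that group.
data = {
    60: [160, 170, 180, 190, 200],
    80: [200, 210, 220, 230, 240],
    100: [240, 250, 260, 270, 280],
}
q = 0.8

# Logistic-regression view: a single marker from the pooled (unconditional) values.
all_y = [v for vals in data.values() for v in vals]
fixed_marker = empirical_quantile(all_y, q)
share_high = {w: sum(v >= fixed_marker for v in vals) / len(vals)
              for w, vals in data.items()}

# Quantile-regression view: one marker per body weight (conditional quantiles).
cond_markers = {w: empirical_quantile(vals, q) for w, vals in data.items()}

print("fixed marker:", fixed_marker)                 # 250
print("share >= fixed marker:", share_high)          # {60: 0.0, 80: 0.0, 100: 0.8}
print("conditional markers:", cond_markers)          # {60: 190, 80: 230, 100: 270}
```

The pooled marker (250) is the same for every weight, and what varies is the share of each group reaching it. The conditional markers instead move with weight, and in this constructed example they happen to lie on a straight line (rising 2 units per kg), the kind of pattern a linear quantile regression is designed to capture.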

### Similar Posts:

- Solved – Should I include an interaction term for a covariate if I expect it to be correlated with one or more of the variables
- Solved – Why use quantile regression instead of splitting the data in quantiles and calculating multiple linear regressions
- Solved – Human body organs growth graph or data
- Solved – Interpret coefficients from a multivariate regression
- Solved – the difference between linear regression and logistic regression