I am trying to create a regression model where the (continuous) outcome is multimodal:
The outcome is the retail price of a certain product, and prices tend to cluster around distinct amounts (750, 1000, 1250, 1500, etc.). There are, however, a few prices in between, so the outcome is not strictly discrete.
I have run a linear model with satisfactory results, though the extra prices between the modes give me pause. I also tried binning the prices into a few groups representing the modes, and that works somewhat well.
Is there a better or worse way to model this? Is there a better or worse methodology for binning the outcome?
OLS regression does not assume that the dependent variable is normally distributed, nor even unimodal. It makes assumptions about the error term, as estimated by the residuals.
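To illustrate the point, here is a minimal sketch using simulated data, not the asker's data. It assumes a hypothetical predictor (`tiers`) that clusters at four levels and plain Gaussian noise with a standard deviation of 30. The resulting prices are strongly multimodal, yet the residuals from a simple OLS fit are well behaved.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: a predictor that takes four tier values drives the price,
# so the price is multimodal even though the noise is ordinary Gaussian noise.
tiers = [random.choice([0, 1, 2, 3]) for _ in range(2000)]
prices = [750 + 250 * t + random.gauss(0, 30) for t in tiers]

# Simple OLS of price on tier, using the closed-form slope and intercept.
mean_x = statistics.fmean(tiers)
mean_y = statistics.fmean(prices)
sxx = sum((x - mean_x) ** 2 for x in tiers)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tiers, prices))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

residuals = [y - (intercept + slope * x) for x, y in zip(tiers, prices)]

# Share of prices falling in a gap between two modes (850 to 900).
gap_share = sum(1 for p in prices if 850 <= p < 900) / len(prices)

print("slope:", round(slope, 1), "intercept:", round(intercept, 1))
print("sd of prices:", round(statistics.stdev(prices), 1))
print("sd of residuals:", round(statistics.stdev(residuals), 1))
print("share of prices in the 850-900 gap:", gap_share)
```

The spread of the prices is dominated by the clustered predictor, while the residuals have roughly the spread of the simulated noise. With real data, the thing to inspect is a plot of the residuals, not a histogram of the outcome.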
Many variables exhibit "clumping" at certain round numbers, and this is not necessarily problematic for ordinary regression.
Categorizing, or binning, continuous data is very rarely a good idea. However, if there are very few prices between the round numbers, this may be a case where it does make sense. If you do bin the outcome, you should no longer use OLS; use ordinal logistic regression (or some other ordinal model) instead.
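One way to judge whether "very few" prices lie between the round numbers is to snap each price to its nearest mode and count how many sit more than some tolerance away. The sketch below is only an illustration: the `prices` list, the `modes`, and the tolerance of 25 are made-up assumptions, and the helper names are hypothetical.

```python
# Hypothetical round-number modes and a small made-up sample of prices.
modes = [750, 1000, 1250, 1500]
prices = [750, 760, 1000, 1000, 1120, 1250, 1499, 1500, 880, 1250]


def nearest_mode(price, modes):
    """Return the mode closest to the given price."""
    return min(modes, key=lambda m: abs(price - m))


def ordinal_codes(prices, modes):
    """Map each price to the index (0, 1, 2, ...) of its nearest mode."""
    return [modes.index(nearest_mode(p, modes)) for p in prices]


def share_between(prices, modes, tolerance):
    """Fraction of prices farther than `tolerance` from every mode."""
    off = sum(1 for p in prices if abs(p - nearest_mode(p, modes)) > tolerance)
    return off / len(prices)


codes = ordinal_codes(prices, modes)
off_mode = share_between(prices, modes, tolerance=25)

print("ordinal codes:", codes)
print("share of prices between modes:", off_mode)
```

If the share between modes is small, the ordered codes could serve as the outcome in an ordinal model. If it is not small, binning discards real information and the continuous model is the safer choice.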