Alternative to linear regression

I need to run hundreds of linear regression models with the same set of independent variables but different dependent variables. I have checked normality for a few dozen of them. Some are normally distributed and some are not.

For practical reasons, I intend to write a macro that runs this automatically and stores the p-values of the final model (I will use stepwise or similar methods) and the association between the predictor variables and the predicted variables. My question is: since I can't use linear regression for all models, can I simply use robust regression for all of them, without checking for normality? Or maybe loess regression?
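Because every model shares the same predictors, the "many regressions" part does not need one independent fit per response: the normal equations (X'X)B = X'Y have the same left-hand side for all responses, so they can be solved once for every response column together. The sketch below assumes plain ordinary least squares (no stepwise selection, no robust weighting, no p-values) and uses only the Python standard library; `fit_many_responses` and `residuals` are hypothetical helper names for illustration.

```python
from typing import List

Matrix = List[List[float]]


def fit_many_responses(X: Matrix, Y: Matrix) -> Matrix:
    """OLS fit of every response column of Y on the same predictors X.

    X: n rows of p predictor values (include a 1.0 column for the intercept).
    Y: n rows of k response values, one column per dependent variable.
    Returns a p x k matrix B; column j holds the coefficients for response j.
    """
    n, p, k = len(X), len(X[0]), len(Y[0])
    # Normal equations (X'X) B = X'Y: X'X is shared by all k responses.
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)] for i in range(p)]
    xty = [[sum(X[r][i] * Y[r][j] for r in range(n)) for j in range(k)] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    # [X'X | X'Y] solves for all k coefficient columns in one pass.
    aug = [xtx[i] + xty[i] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # crude singularity check
            raise ValueError("X'X is singular: predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def residuals(X: Matrix, Y: Matrix, B: Matrix) -> Matrix:
    """Observed minus fitted values (n x k), one residual column per response."""
    p, k = len(B), len(B[0])
    return [[y[j] - sum(x[i] * B[i][j] for i in range(p)) for j in range(k)]
            for x, y in zip(X, Y)]


if __name__ == "__main__":
    # Toy data: intercept column plus two predictors, two response columns.
    X = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0], [1.0, 2.0, 1.0]]
    Y = [[1.0, -1.0], [3.0, -0.5], [4.0, -2.0], [6.0, -1.5], [8.0, -1.0]]
    B = fit_many_responses(X, Y)
    print(B)
    print(residuals(X, Y, B))
```

The normal-equations route is chosen here only because it is short; for real data a QR-based least-squares solver is numerically safer, and `numpy.linalg.lstsq` accepts a 2-D right-hand side, so it can fit all response columns in a single call as well.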

There are a lot of misunderstandings here, mostly pointed out in the comments, so I will summarize them.

  1. You should not use stepwise methods in any form; they lead to invalid inferences, because the reported p-values and standard errors ignore the fact that the model was selected from the same data. There are many questions on this site about that; a good one is Algorithms for automatic model selection, whose answers explain why it is a bad idea.
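  2. If you have many variables and need some model reduction, consider lasso or ridge regression instead (lasso and elastic net can shrink coefficients exactly to zero, while ridge only shrinks them toward zero). See Ridge, lasso and elastic net.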
  3. Linear regression does not assume that the response variable has a normal (or any other) distribution. It is the error term that should be normal (if you want to use the usual normal-based inference), and that can be checked by plotting the distribution of the residuals, not of the response. See Why do we use residuals to test the assumptions on errors in regression? and Does the assumption of Normal errors imply that Y is also Normal?
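Point 3 can be illustrated with a small simulation. This is a sketch under stated assumptions, in plain Python with the standard library only: the predictor is a hypothetical 0/1 group indicator with a large effect, so the response is bimodal and clearly not normal, while the errors are normal by construction. Sample excess kurtosis is used here only as a crude numeric stand-in for looking at a plot, and `simulate`, `ols_residuals` and `excess_kurtosis` are illustrative names, not functions from any library.

```python
import random
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis: near 0 for normal data, strongly negative
    for a well-separated, balanced two-group mixture."""
    n = len(values)
    m = statistics.fmean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def simulate(n=2000, effect=10.0, seed=0):
    """Hypothetical data: y = effect * group + N(0, 1) error, group in {0, 1}.

    The marginal distribution of y is bimodal, but the errors are normal."""
    rng = random.Random(seed)
    x = [float(rng.random() < 0.5) for _ in range(n)]
    y = [effect * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


def ols_residuals(x, y):
    """Residuals from a simple least-squares fit y ~ a + b * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


if __name__ == "__main__":
    x, y = simulate()
    resid = ols_residuals(x, y)
    print(f"excess kurtosis of y:         {excess_kurtosis(y):.2f}")
    print(f"excess kurtosis of residuals: {excess_kurtosis(resid):.2f}")
```

Checking y alone would flag this model as violating normality, yet the residuals, which estimate the errors, look normal. In practice a normal Q-Q plot of the residuals is the usual way to make this check.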
