I need to run hundreds of linear regression models, with the same set of independent variables but with varying dependent variables. I have checked normality for a few dozen of them. Some are normally distributed and some are not.

My intention, for practical reasons, is to write a macro that will run this automatically and store the p-values of the final model (I will use stepwise or similar methods), and the association between the predicting variables and the predicted variables. My question is: since I can't use linear regression for all models, can I simply use robust regression for all models, without checking for normality? Maybe loess regression?

#### Best Answer

There are a lot of misunderstandings here, mostly pointed out in the comments, so I will summarize them.

- You should not use stepwise methods in any form; they lead to invalid inferences. There are many questions on this site about that. A good one is "Algorithms for automatic model selection", which has good answers explaining why it is a bad idea.
- If you have many variables and need some model reduction, consider lasso or ridge regression instead. See "Ridge, lasso and elastic net".
- Linear regression does not assume that the response variable has a normal (or any other) distribution. It is the error term that should be normal (if you want to use the usual normal-based inference), and that can be checked by plotting the distribution of the *residuals*, not the response. See "Why do we use residuals to test the assumptions on errors in regression?" and "Does the assumption of Normal errors imply that Y is also Normal?"
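
The point about residuals can be illustrated with a small simulation. This is only a minimal sketch in Python with NumPy, not something from the original question or answer: the helper names `fit_many` and `skewness`, the simulated data, and the use of sample skewness as a rough stand-in for a normality check are all assumptions made for illustration. It fits several responses on the same predictor in one least-squares call, then compares the skewness of each response with the skewness of its residuals.

```python
import numpy as np


def fit_many(X, Y):
    """Fit ordinary least squares for every column of Y on the same predictors.

    X has shape (n, p) without an intercept column; Y has shape (n, m).
    Returns the coefficients, shape (p + 1, m), and the residuals, shape (n, m).
    """
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    resid = Y - design @ coef
    return coef, resid


def skewness(a):
    """Sample skewness of each column (close to 0 for symmetric data)."""
    centered = a - a.mean(axis=0)
    return (centered**3).mean(axis=0) / (centered**2).mean(axis=0) ** 1.5


# Simulated data (hypothetical): one skewed predictor, three responses,
# all with normally distributed errors.
rng = np.random.default_rng(0)
n = 2000
x = rng.exponential(size=(n, 1))
slopes = np.array([3.0, 5.0, 0.0])
Y = x * slopes + rng.normal(size=(n, 3))

coef, resid = fit_many(x, Y)

# Responses driven by the skewed predictor are themselves clearly skewed,
# while the residuals of every model are roughly symmetric.
print("skewness of responses:", skewness(Y))
print("skewness of residuals:", skewness(resid))
```

Here the first two responses are far from normal even though the errors are normal by construction, so a normality check on the response would be misleading. In practice a histogram or Q-Q plot of each residual column is more informative than a single skewness number.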
