I have encountered some statisticians who never use models other than linear regression for prediction, because they believe that "ML models" such as random forests or gradient boosting are hard to explain or "not interpretable".

In linear regression, provided that its assumptions hold (normality of errors, homoskedasticity, no multicollinearity), t-tests provide a way to test the significance of individual variables. To my knowledge, such tests are not available for random forests or gradient boosting models.
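To make the t-test concrete, here is a minimal sketch in plain Python for the one-predictor case. The function name `slope_t_statistic` and the toy data are made up for illustration; a real analysis would normally use a statistics package and would also check the assumptions listed above.

```python
import math

def slope_t_statistic(x, y):
    """Return (slope, standard_error, t) for y = a + b*x fitted by OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se = math.sqrt(sigma2 / sxx)
    return slope, se, slope / se

# Toy data (an assumption for illustration): y = 1 + 2x plus small alternating noise.
x = list(range(10))
noise = [0.5, -0.5] * 5
y = [1 + 2 * xi + e for xi, e in zip(x, noise)]

slope, se, t = slope_t_statistic(x, y)
print(f"slope={slope:.3f}, se={se:.3f}, t={t:.1f}")
```

Under the usual assumptions, `t` is compared against a Student t distribution with `n - 2` degrees of freedom; with 10 observations the two-sided 5% critical value is roughly 2.31.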

Therefore, my question is: if I want to model a dependent variable with a set of independent variables, should I always use linear regression for the sake of interpretability?


#### Best Answer

First, it is hard for me to believe that you heard people saying this, because it would be a dumb thing to say. It's like saying that you use only a hammer for everything (including drilling holes and changing lightbulbs), because it's straightforward to use and gives predictable results.

Second, linear regression is not always "interpretable". If you have a linear regression model with many polynomial terms, or just a lot of features, it is hard to interpret. For example, say that you used the raw values of each of the 784 pixels from MNIST† as features. Would knowing that pixel 237 has a weight equal to -2311.67 tell you anything about the model? For image data, the activation maps of a convolutional neural network would be much easier to understand.
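As a small illustration of the polynomial case: once a model contains powers of a feature, no single coefficient describes "the effect" of that feature, because the effect changes with the feature's value. The sketch below uses made-up coefficients and a hypothetical helper `marginal_effect`; it is not taken from any fitted model.

```python
def marginal_effect(coefs, x):
    """Derivative of b0 + b1*x + b2*x**2 + ... with respect to x."""
    return sum(k * b * x ** (k - 1) for k, b in enumerate(coefs) if k >= 1)

# Hypothetical coefficients for y = 1 + 3*x - 0.5*x**2.
coefs = [1.0, 3.0, -0.5]

# The same model implies a different effect of x at each evaluation point.
effects = {x: marginal_effect(coefs, x) for x in (0.0, 3.0, 6.0)}
for x, effect in effects.items():
    print(f"x={x}: dy/dx={effect}")
```

Here the coefficient on `x` is positive, yet the effect of increasing `x` is positive, zero, or negative depending on where you evaluate it, so reading the weights alone can mislead.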

Finally, there are models that are equally interpretable, e.g. logistic regression, decision trees, naive Bayes algorithm, and many more.

† – *As noted by @Ingolifs in the comments, and as discussed in this thread, MNIST may not be the best example, since it is a very simple dataset. For most realistic image datasets, logistic regression would not work, and looking at the weights would not give any straightforward answers. However, if you look closer at the weights in the linked thread, their interpretation is not straightforward either: for example, the weights for predicting "5" or "9" do not show any obvious pattern (see image below, copied from the other thread).*
