## Solved – Root mean square vs average absolute deviation

Both root mean square and average absolute deviation seem to be measures of the magnitude of variability (especially when the variates are both positive and negative). What are the rules of thumb for choosing one over the other? Best Answer In theory, this should be determined by how important different-sized errors are … Read more
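To make the contrast concrete, here is a minimal sketch comparing the two measures on two made-up samples. It assumes the values are already deviations from zero (for example, errors); the function names `rms` and `mean_abs` and both samples are illustrative and not taken from the original question or answer.

```python
import math


def rms(values):
    """Root mean square: square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def mean_abs(values):
    """Average absolute value, i.e. average absolute deviation from zero."""
    return sum(abs(v) for v in values) / len(values)


# Hypothetical samples: same average absolute size, different spread of sizes.
even = [-2.0, 2.0, -2.0, 2.0]   # every deviation has the same magnitude
spiky = [0.0, 0.0, 0.0, -8.0]   # one large deviation, the rest zero

for name, sample in (("even", even), ("spiky", spiky)):
    print(name, "mean abs:", mean_abs(sample), "rms:", rms(sample))
```

Both samples have the same average absolute value, but the RMS of the second is twice as large, because squaring gives large deviations more weight.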

## Solved – What to Do When a Log-binomial Model’s Convergence Fails

There are times when one might want to estimate a prevalence ratio or relative risk (RR), in preference to an odds ratio (OR), for data with binary outcomes – say, if the outcome in question isn't rare, so the approximation RR ≈ OR doesn't hold. I've implemented a model in R to do that, as follows: uni.out … Read more
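The rare-outcome point can be checked with simple arithmetic on a 2×2 table. The sketch below is in Python rather than the R used in the question, and the counts and function names are hypothetical, chosen only so that the relative risk is 2 in both tables.

```python
def risk_ratio(a, b, c, d):
    """Relative risk from a 2x2 table.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds ratio from the same 2x2 table layout."""
    return (a / b) / (c / d)


# Hypothetical counts, 100 people per group.
rare = (2, 98, 1, 99)       # risks of 2% vs 1%
common = (60, 40, 30, 70)   # risks of 60% vs 30%

for name, table in (("rare", rare), ("common", common)):
    print(name, "RR:", round(risk_ratio(*table), 3),
          "OR:", round(odds_ratio(*table), 3))
```

With the rare outcome the odds ratio stays close to the relative risk of 2; with the common outcome it rises to 3.5 even though the relative risk is unchanged.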

## Solved – Overall significance of a categorical variable in logistic regression

I have seen two approaches in binary logistic regression with categorical independent variables (IV) with more than two levels. In one approach, a reference category for the IV is defined and the rest of the categories are tested against this reference category, thus obtaining p-values for each category compared to the reference category (which is what … Read more

## Solved – Model for continuous dependent variable bounded between 0 and 1

I'm attempting a multiple regression model where the predicted variable is runoff ratio – the ratio of watershed discharge to the precipitation input. This should generally be bounded within [0, 1]; however, due to measurement error, some values > 1 occur. Originally, I modeled this with the predicted variable untransformed, but logistic regression has been suggested to … Read more

## Solved – Determining trend significance in a time series

I have some time series data and want to test for the existence of and estimate the parameters of a linear trend in a dependent variable w.r.t. time, i.e. time is my independent variable. The time points cannot be considered IID under the null of no trend. Specifically, the error terms for points sampled near … Read more

## Solved – How to know which interaction terms to include in a regression model

I'm trying to fit a model with one response variable and 11 predictors. Of these 11 predictors: 5 are continuous, 4 are dichotomous and 3 are categorical (containing between 3-7 different categories, which I've coded with dummy variables). I'm having a difficult time trying to figure out which interaction terms to include in the maximum … Read more

## Solved – Cubic and linear relationships in multiple regression model

What is the correct way to fit a multiple regression model where I have a combination of cubic and linearly related independent variables? If I transform the variable showing cubic relationship, how do I transform back the forecasted variable, given that it's the result of both non-transformed and transformed data sets? Best Answer One way … Read more

## Solved – How do instrumental variables address selection bias

I'm wondering how an instrumental variable addresses selection bias in regression. Here's the example I'm chewing on: In Mostly Harmless Econometrics, the authors discuss an IV regression relating to military service and earnings later in life. The question is, "Does serving in the military increase or decrease future earnings?" They investigate this question in the … Read more

## Solved – Regression with Lots of Categorical Variables

I'm facing a regression task with many categorical and few numeric features. I encoded them into dummies and removed the first dummy column for each feature. I am not getting a very good R² at all. I am wondering if, aside from creating dummies, there are any special strategies in these situations related to having so … Read more

## Solved – Model that optimizes mean absolute error always gives same prediction

My gradient boosting regression model (GBM) is trained to minimize mean absolute error (MAE) but gives the same prediction for every record on my highly skewed dataset. I believe there is a quick fix to the immediate problem (use RMSE) but my situation is complicated, and I worry that using RMSE will lead to a … Read more
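As background on the two losses, here is a small sketch showing which single constant prediction each one prefers on a skewed sample. It does not reproduce the author's GBM or diagnose it; the data and the helper names `mae` and `rmse` are made up for illustration.

```python
import statistics


def mae(targets, constant):
    """Mean absolute error of predicting the same constant for every record."""
    return sum(abs(t - constant) for t in targets) / len(targets)


def rmse(targets, constant):
    """Root mean squared error of predicting the same constant for every record."""
    return (sum((t - constant) ** 2 for t in targets) / len(targets)) ** 0.5


# Hypothetical, highly skewed targets: mostly zeros with one large value.
targets = [0, 0, 0, 0, 1, 2, 100]

median = statistics.median(targets)
mean = statistics.mean(targets)

print("median:", median, "MAE:", mae(targets, median), "RMSE:", rmse(targets, median))
print("mean:  ", mean, "MAE:", mae(targets, mean), "RMSE:", rmse(targets, mean))
```

Among constant predictions, the median gives the lowest MAE and the mean gives the lowest RMSE, so on skewed data the two losses pull predictions toward quite different values.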