# Solved – What type of regression to use with negative values

I have a data frame (in R) containing `player_id`, `points`, `away`, `opponent_fact_1`, `opponent_fact_2`. `points` can be negative. `away` tells us whether the game was at home or away (0 or 1). `opponent_fact_1` and `opponent_fact_2` each give a stat about the opponent. `opponent_fact_1` is on a scale of about 0.0-5.5. `opponent_fact_2` is on a scale of about 70.0-95.0. The issue with the two facts is that there are fewer opponents as you reach the upper and lower bounds, so fewer data points exist at those levels.

How can I determine how much of an influence `away`, `opponent_fact_1`, and `opponent_fact_2` have on a player's `points`?

I asked someone online how to do this and he said to use Poisson regression, but didn't go into detail. Why would regression be helpful here? What is it? And I read that you can't use Poisson regression with negative values? Also, how do I deal with the fewer data points around the upper and lower bounds?

I'm using R, so any examples in R would be awesome. Explaining the output would be even better.

I hope this isn't asking for too much.

```
  player_id opponent_team_id away points opponent_fact_1 opponent_fact_2
1       695               22    0    0.0        2.888889           81.58
2       695               30    1    1.2        2.750000           81.58
3       695                4    1    3.0        3.714286           69.57
4       695               20    0   -3.0        3.000000           84.09
5       695               14    0    0.0        2.444444           72.97
```

Regular linear regression (e.g. the `lm` or `glm` functions in R) handles negative values just fine.
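To see the difference in practice, here is a minimal sketch on simulated data. The data frame `sim` and every value in it are made up for illustration and are not the data from the question. `lm` fits without complaint when the response contains negative numbers, whereas `glm` with `family = poisson` stops with an error, because the Poisson family is meant for counts.

```r
# Simulated data for illustration only; not the data from the question.
set.seed(1)
sim <- data.frame(
  away = rbinom(50, 1, 0.5),
  opponent_fact_1 = runif(50, 0, 5.5)
)
sim$points <- 1 - 0.5 * sim$away + 0.3 * sim$opponent_fact_1 + rnorm(50, sd = 2)
sim$points[1] <- -3  # make sure at least one response value is negative

# Ordinary least squares places no restriction on the sign of the response.
fit_lm <- lm(points ~ away + opponent_fact_1, data = sim)
print(coef(fit_lm))

# The Poisson family rejects a negative response, so capture the error message
# instead of letting it stop the script.
poisson_result <- tryCatch(
  glm(points ~ away + opponent_fact_1, family = poisson, data = sim),
  error = function(e) conditionMessage(e)
)
print(poisson_result)
```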

One model you could try would be:

```r
model1 <- lm(points ~ away + opponent_fact_1 + opponent_fact_2, data = my_data_frame)
summary(model1)
```
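Since the question asks how to read the output, here is a rough sketch of pulling the key numbers out of a fitted model of this form. It runs on simulated data: the data frame `sim` and the coefficients used to generate it are invented for illustration, so the numbers themselves say nothing about the real data.

```r
# Simulated stand-in for my_data_frame; values are invented for illustration.
set.seed(42)
n <- 200
sim <- data.frame(
  away = rbinom(n, 1, 0.5),
  opponent_fact_1 = runif(n, 0, 5.5),
  opponent_fact_2 = runif(n, 70, 95)
)
sim$points <- 2 - 0.8 * sim$away + 0.4 * sim$opponent_fact_1 -
  0.02 * sim$opponent_fact_2 + rnorm(n, sd = 2)

model1 <- lm(points ~ away + opponent_fact_1 + opponent_fact_2, data = sim)

# One row per term: estimate, standard error, t value, and p-value.
# Each estimate is the expected change in points for a one-unit increase
# in that predictor, holding the other predictors fixed.
coef_table <- coef(summary(model1))
print(coef_table)

# 95% confidence intervals for the same coefficients.
print(confint(model1))

# Share of the variance in points explained by the model.
print(summary(model1)$r.squared)
```

Keep in mind that each coefficient is per one unit of its predictor. One unit of `opponent_fact_1` covers a large part of its roughly 0.0-5.5 range, while one unit of `opponent_fact_2` is a small step on its roughly 70.0-95.0 range, so the raw coefficients are not directly comparable as measures of influence.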

If you've got a lot of data (and several rows per player and per opponent), you could also try this model:

```r
model2 <- lm(points ~ away + factor(player_id) + factor(opponent_team_id), data = my_data_frame)
summary(model2)
```

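This will give you a model that includes a coefficient for each player and for each `opponent_team_id`, apart from one reference level of each factor, which is absorbed into the intercept. Each coefficient represents how many more (or fewer) points are expected for that player, or against that opponent, relative to the reference level, with the other terms held fixed.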
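A small sketch of how R reports coefficients for a `factor` term may help when reading that output. The data set `toy` and its three player ids are hypothetical, chosen so the per-player averages are easy to check by hand, and the model has only the one factor to keep the arithmetic visible. It assumes R's default treatment contrasts.

```r
# Tiny made-up data set: three hypothetical players with known averages
# (player 101 averages 2, player 102 averages 5, player 103 averages 0).
toy <- data.frame(
  player_id = rep(c(101, 102, 103), each = 4),
  points = c(1, 2, 3, 2,   4, 5, 6, 5,   -1, 0, 1, 0)
)

fit <- lm(points ~ factor(player_id), data = toy)
group_means <- tapply(toy$points, toy$player_id, mean)

# The intercept is the mean for the reference player (101, the first level).
# The remaining coefficients are differences from that reference:
# 5 - 2 = 3 for player 102 and 0 - 2 = -2 for player 103.
print(coef(fit))
print(group_means)
```

To get a player's expected points back, add that player's coefficient to the intercept. With more terms in the model, as in `model2`, the same reading applies, but each difference is adjusted for the other predictors.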

Have you ever run a regression model before? What's the goal of this analysis?
