Solved – What type of regression to use with negative values

If possible, please explain things like I'm 5. I know very little about this subject, but would like to learn more.

I have a data frame (in R) containing player_id, points, away, opponent_fact_1, and opponent_fact_2. points can be negative. away tells us whether the game was at home or away (0 or 1). opponent_fact_1 and opponent_fact_2 each give a stat about the opponent. opponent_fact_1 is on a scale of about 0.0-5.5. opponent_fact_2 is on a scale of about 70.0-95.0. The issue with the two facts is that there are fewer opponents near the upper and lower bounds, so fewer data points exist at those levels.

How can I determine how much influence away, opponent_fact_1, and opponent_fact_2 have on a player's points?

I asked someone online how to do this and he said to use Poisson regression, but didn't go into detail. Why would regression be helpful here? What is it? And I read that you can't use Poisson regression with negative values? Also, how do I deal with the fewer data points around the upper and lower bounds?

I'm using R, so any examples in R would be awesome. Explaining the output would be even better.

I hope this isn't asking for too much.

EDIT: Added sample data

  player_id opponent_team_id away  points opponent_fact_1 opponent_fact_2
1       695               22    0     0.0        2.888889           81.58
2       695               30    1     1.2        2.750000           81.58
3       695                4    1     3.0        3.714286           69.57
4       695               20    0    -3.0        3.000000           84.09
5       695               14    0     0.0        2.444444           72.97

Regular linear regression (e.g. the lm or glm functions in R) handles negative values just fine.
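To see the difference for yourself, here is a small sketch using only the five sample rows from the question (far too few rows for real conclusions, so treat the numbers as a toy). It fits lm with a negative value in points, then tries a Poisson glm on the same data and captures the error R raises. The names toy, fit_lm and poisson_result are just illustrative.

```r
# Toy data: the five sample rows posted in the question.
toy <- data.frame(
  away   = c(0, 1, 1, 0, 0),
  points = c(0.0, 1.2, 3.0, -3.0, 0.0)
)

# Ordinary linear regression accepts a negative response without complaint.
fit_lm <- lm(points ~ away, data = toy)
print(coef(fit_lm))

# Poisson regression models non-negative counts, so R stops with an error
# when the response contains a negative value. tryCatch keeps the message
# instead of halting the script.
poisson_result <- tryCatch(
  glm(points ~ away, family = poisson, data = toy),
  error = function(e) conditionMessage(e)
)
print(poisson_result)
```

In this toy fit the intercept is the average points in the away = 0 rows, and the away coefficient is how much that average shifts in the away = 1 rows.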

One model you could try would be:

    model1 <- lm(points ~ away + opponent_fact_1 + opponent_fact_2, data = my_data_frame)
    summary(model1)
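Since you asked about reading the output, here is a rough sketch on simulated data. Everything about the data is an assumption for illustration: the 200 rows, the "true" effects of -1, 0.5 and -0.05, and the noise level are all made up, and only the column names and ranges come from your question. The point is just to show which parts of the summary to look at.

```r
set.seed(1)  # simulated data for illustration only, not real player data
n <- 200
my_data_frame <- data.frame(
  away            = rbinom(n, 1, 0.5),   # 0 or 1
  opponent_fact_1 = runif(n, 0, 5.5),    # range taken from the question
  opponent_fact_2 = runif(n, 70, 95)     # range taken from the question
)

# Made-up "true" effects, so we can check that lm roughly recovers them.
my_data_frame$points <- 2 - 1.0 * my_data_frame$away +
  0.5 * my_data_frame$opponent_fact_1 -
  0.05 * my_data_frame$opponent_fact_2 +
  rnorm(n, sd = 2)

model1 <- lm(points ~ away + opponent_fact_1 + opponent_fact_2,
             data = my_data_frame)

# The coefficient table is the core of summary(model1).
coef_table <- coef(summary(model1))
print(round(coef_table, 3))

# 95% confidence intervals for each coefficient.
print(confint(model1))
```

Each row of the table is one predictor. Estimate is the expected change in points when that predictor goes up by one unit (for away, going from 0 to 1) while the others stay the same. Std. Error says how uncertain that estimate is, and Pr(>|t|) is the p-value for the hypothesis that the true effect is zero. Keep in mind that opponent_fact_1 and opponent_fact_2 are on different scales, so their raw coefficients are not directly comparable in size.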

If you've got a lot of data (and several rows per player and per opponent), you could also try this model:

    model2 <- lm(points ~ away + factor(player_id) + factor(opponent_team_id), data = my_data_frame)
    summary(model2)

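This will give you a model that includes a coefficient for each player and for each opponent_team_id, apart from one reference player and one reference opponent that R folds into the intercept. Each coefficient represents how many more (or fewer) points are expected for that player, or against that opponent, compared with the reference one, with the other variables held fixed.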
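One detail that can be confusing when reading factor coefficients: with R's default coding, the first level of a factor becomes the baseline and the other coefficients are measured against it. A tiny sketch with made-up numbers (three hypothetical players, two games each) shows this:

```r
# Made-up points for three hypothetical players, two games each.
toy <- data.frame(
  player_id = c(1, 1, 2, 2, 3, 3),
  points    = c(1, 3, 4, 6, -2, 0)
)

fit <- lm(points ~ factor(player_id), data = toy)
print(coef(fit))

# Compare with the plain per-player averages.
player_means <- tapply(toy$points, toy$player_id, mean)
print(player_means)
```

Here the intercept equals player 1's average, and the coefficients for players 2 and 3 are their averages minus player 1's. Adding the intercept to a player's coefficient gets you back to that player's average. In a model with more predictors the same logic applies, except that the comparison is made with the other predictors held fixed.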

Have you ever run a regression model before? What's the goal of this analysis?
