If possible, please explain things like I'm 5. I know very little about this subject, but would like to learn more.

I have a data frame (in R) containing `player_id`, `points`, `away`, `opponent_fact_1`, and `opponent_fact_2`. `points` can be negative. `away` tells us whether the game was at home or away (0 or 1). `opponent_fact_1` and `opponent_fact_2` each give a stat about the opponent. `opponent_fact_1` is on a scale of about 0.0-5.5. `opponent_fact_2` is on a scale of about 70.0-95.0. The issue with the two facts is that there are fewer opponents as you reach the upper and lower bounds, so fewer data points exist at those levels.

How can I determine how much of an influence `away`, `opponent_fact_1`, and `opponent_fact_2` have on a player's `points`?

I asked someone online how to do this and he said to use Poisson regression, but didn't go into detail. Why would regression be helpful here? What is it? And I read that you can't use Poisson regression with negative values? Also, how do I deal with the fewer data points around the upper and lower bounds?

I'm using R, so any examples in R would be awesome. Explaining the output would be even better.

I hope this isn't asking for too much.

**EDIT: Added sample data**

      player_id opponent_team_id away points opponent_fact_1 opponent_fact_2
    1       695               22    0    0.0        2.888889           81.58
    2       695               30    1    1.2        2.750000           81.58
    3       695                4    1    3.0        3.714286           69.57
    4       695               20    0   -3.0        3.000000           84.09
    5       695               14    0    0.0        2.444444           72.97

#### Best Answer

Regular linear regression (e.g. the `lm` or `glm` functions in R) handles negative values just fine.
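
As a quick check of that claim, here is a minimal sketch using only the `away` and `points` columns from the five sample rows in the question. The data frame name `toy` is made up for illustration. `lm` fits without complaint, while `glm` with `family = poisson` stops with an error because of the negative `points` value.

```r
# Toy data: the away and points columns from the five sample rows.
toy <- data.frame(
  away   = c(0, 1, 1, 0, 0),
  points = c(0.0, 1.2, 3.0, -3.0, 0.0)
)

# Ordinary linear regression accepts negative outcomes.
fit_lm <- lm(points ~ away, data = toy)

# Poisson regression is meant for non-negative counts, so glm() raises an
# error here. tryCatch() captures the error message instead of halting.
poisson_msg <- tryCatch(
  {
    glm(points ~ away, family = poisson, data = toy)
    "no error"
  },
  error = function(e) conditionMessage(e)
)

print(coef(fit_lm))   # intercept and away coefficient from lm
print(poisson_msg)    # the error message raised by the Poisson fit
```

Five rows are far too few to learn anything from the coefficients; the point is only that the linear model runs and the Poisson model does not.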

One model you could try would be:

    model1 <- lm(points ~ away + opponent_fact_1 + opponent_fact_2, data = my_data_frame)
    summary(model1)
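
To show how to read the output, here is a sketch on simulated data. The data frame `sim` and the "true" effects used to generate it (-0.5 for `away`, 0.8 for `opponent_fact_1`, -0.05 for `opponent_fact_2`) are invented for illustration and say nothing about your real data. In the coefficient table, each `Estimate` is the expected change in `points` for a one-unit increase in that variable while the others are held fixed. The `Std. Error` column and the confidence intervals indicate how uncertain each estimate is.

```r
set.seed(42)  # make the simulated data reproducible
n <- 500

# Simulated stand-in for my_data_frame; all effects below are made up.
sim <- data.frame(
  away            = rbinom(n, 1, 0.5),
  opponent_fact_1 = runif(n, 0, 5.5),
  opponent_fact_2 = runif(n, 70, 95)
)
sim$points <- 5 - 0.5 * sim$away + 0.8 * sim$opponent_fact_1 -
  0.05 * sim$opponent_fact_2 + rnorm(n, sd = 1)

model1 <- lm(points ~ away + opponent_fact_1 + opponent_fact_2, data = sim)

# One row per term: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(model1))
print(round(coef_table, 3))

# 95% confidence intervals; a wider interval means a less certain estimate
print(confint(model1))
```

Because the data were generated with known effects, the estimates should land close to -0.5, 0.8 and -0.05. With real data you do not know the true values, so the confidence intervals are your guide to how far to trust each estimate.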

If you've got a lot of data (and several rows per player and per opponent), you could also try this model:

    model2 <- lm(points ~ away + factor(player_id) + factor(opponent_team_id), data = my_data_frame)
    summary(model2)

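This will give you a model that includes a coefficient for each player and for each `opponent_team_id`. Each coefficient is measured relative to a reference player or opponent (the first level of each factor, which R absorbs into the intercept), so it represents how many more or fewer points are expected for that player, or against that opponent, than for the reference.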
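
A small sketch of what the factor coefficients look like, again on simulated data. The player IDs 101 to 103, the four opponent IDs and the effect sizes are all hypothetical. Note that player 101 and opponent 1 get no coefficient of their own, because `lm` treats the first level of each factor as the baseline.

```r
set.seed(1)  # make the simulated data reproducible

# Hypothetical schedule: 3 players, 4 opponents, home and away, 5 games each
sim <- expand.grid(
  player_id        = c(101, 102, 103),
  opponent_team_id = 1:4,
  away             = 0:1,
  game             = 1:5
)

# Made-up player effects: 102 scores 2 more than 101, 103 scores 1 fewer
player_effect <- c("101" = 0, "102" = 2, "103" = -1)
sim$points <- 1 + unname(player_effect[as.character(sim$player_id)]) -
  0.5 * sim$away + rnorm(nrow(sim))

model2 <- lm(points ~ away + factor(player_id) + factor(opponent_team_id),
             data = sim)

# Coefficient names: no entry for player 101 or opponent 1 (the baselines)
print(names(coef(model2)))

# Estimated difference between player 102 and the baseline player 101
print(coef(model2)[["factor(player_id)102"]])
```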

Have you ever run a regression model before? What's the goal of this analysis?
