If possible, please explain things like I'm 5. I know very little about this subject, but would like to learn more.
I have a data frame (in R) containing player_id, points, away, opponent_fact_1, and opponent_fact_2. points can be negative. away lets us know if the game was at home or away (0 or 1). opponent_fact_1 and opponent_fact_2 each give us a stat about the opponent. opponent_fact_1 is on a scale of about 0.0-5.5. opponent_fact_2 is on a scale of about 70.0-95.0. The issue with the two facts is that there are fewer opponents as you reach the upper and lower bounds, so fewer data points exist at those levels.
How can I determine how much of an influence away, opponent_fact_1, and opponent_fact_2 have on a player's points?
I asked someone online how to do this and he said to use Poisson regression, but didn't go into detail. Why would regression be helpful here? What is it? And I read that you can't use Poisson regression with negative values? Also, how do I deal with the fewer data points around the upper and lower bounds?
I'm using R, so any examples in R would be awesome. Explaining the output would be even better.
I hope this isn't asking for too much.
EDIT: Added sample data
      player_id opponent_team_id away points opponent_fact_1 opponent_fact_2
    1       695               22    0    0.0        2.888889           81.58
    2       695               30    1    1.2        2.750000           81.58
    3       695                4    1    3.0        3.714286           69.57
    4       695               20    0   -3.0        3.000000           84.09
    5       695               14    0    0.0        2.444444           72.97
Best Answer
Regular linear regression (e.g. the lm or glm functions in R) handles negative values just fine.
One model you could try would be:
    model1 <- lm(points ~ away + opponent_fact_1 + opponent_fact_2, data = my_data_frame)
    summary(model1)
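Since the question asks what the output means, here is a rough sketch of how the coefficient table from a model like model1 can be read. The data is simulated, not the asker's: the sample size, the "true" effects, and the noise level are all made-up assumptions, chosen only so the example runs on its own.

```r
# Simulated stand-in for my_data_frame; every number here is invented.
set.seed(1)
n <- 200
my_data_frame <- data.frame(
  away            = rbinom(n, 1, 0.5),   # 0 = home, 1 = away
  opponent_fact_1 = runif(n, 0, 5.5),    # roughly the 0.0-5.5 scale
  opponent_fact_2 = runif(n, 70, 95)     # roughly the 70.0-95.0 scale
)

# Assumed effects, for illustration only; points is allowed to go negative.
my_data_frame$points <- 2 -
  0.5  * my_data_frame$away +
  0.8  * my_data_frame$opponent_fact_1 -
  0.03 * my_data_frame$opponent_fact_2 +
  rnorm(n, sd = 1)

model1 <- lm(points ~ away + opponent_fact_1 + opponent_fact_2,
             data = my_data_frame)

# One row per term. "Estimate" is the expected change in points for a
# one-unit increase in that predictor, holding the others fixed.
# "Pr(>|t|)" is the p-value for the hypothesis that the true effect is zero.
coef_table <- coef(summary(model1))
print(coef_table)

# 95% confidence intervals: a plausible range for each effect.
print(confint(model1))
```

For the 0/1 away column, the estimate is simply the average difference in points between away and home games after accounting for the two opponent stats. Wide confidence intervals are a sign that the data does not pin that effect down very well.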
If you've got a lot of data (and several rows per player and per opponent), you could also try this model:
    model2 <- lm(points ~ away + factor(player_id) + factor(opponent_team_id), data = my_data_frame)
    summary(model2)
This gives you a model with a coefficient for each player and each opponent_team_id, except for one reference level of each, which R folds into the intercept. Each coefficient estimates how many more (or fewer) points to expect for that player, or against that opponent, compared with the reference player or opponent.
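To make the player and opponent coefficients concrete, here is a small sketch on simulated data. The player ids, opponent ids, and effect sizes are hypothetical. It assumes R's default treatment coding for factors, where the first level of each factor is the reference and gets no coefficient of its own.

```r
# Hypothetical data: 3 players, 4 opponents, 20 games per pairing.
set.seed(2)
toy <- expand.grid(player_id        = c(101, 102, 103),
                   opponent_team_id = c(1, 2, 3, 4),
                   game             = 1:20)
toy$away <- rbinom(nrow(toy), 1, 0.5)

# Assumed player effects relative to player 101 (invented for illustration).
player_effect <- c("101" = 0, "102" = 2, "103" = -1)
toy$points <- unname(player_effect[as.character(toy$player_id)]) -
  0.5 * toy$away + rnorm(nrow(toy))

model2 <- lm(points ~ away + factor(player_id) + factor(opponent_team_id),
             data = toy)

# Player 101 and opponent 1 have no row: they are the reference levels.
# "factor(player_id)102" is player 102's expected points minus player 101's,
# with away and opponent held fixed.
model2_coefs <- coef(model2)
print(model2_coefs)
```

With many players and opponents this table gets long, and levels with only a few games will have noisy estimates, which is why this model needs several rows per player and per opponent.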
Have you ever run a regression model before? What's the goal of this analysis?