I have modelled a ticket {sold, unsold} process using a glm with family = binomial, as follows:
```r
> head(bob_new)
  ticketCount ticketsRemain artistRating artistVotes artistName paintingTitle date day
1           9            21         4.38         616 Stella Mandrak-Pagani #TeamAjaz Winter OWL in snow 2016-12-01 Thursday
2          10            10         4.23         401 Meg Burns Simi Cherry Blossoms 2016-12-01 Thursday
3          15            21         4.57         481 Veronica Stach Where the Wild Things Are 2016-12-01 Thursday
4          21            13         4.35         100 Christine "Chri" Lee Lust in the Wind 2016-12-01 Thursday
5          17             0         4.32         113 Nicole Pinder #TeamAjaz Seagull Beach 2016-12-01 Thursday
6          24             1         4.48         657 Monique Ra Brent Aurora on the River 2016-12-01 Thursday
     month   percent
1 December 0.3000000
2 December 0.5000000
3 December 0.4166667
4 December 0.6176471
5 December 1.0000000
6 December 0.9600000

> mod_new <- glm(cbind(ticketCount, ticketsRemain) ~ day + artistRating * artistVotes,
                 data = bob_new, family = binomial)
> summary(mod_new)

Call:
glm(formula = cbind(ticketCount, ticketsRemain) ~ day + artistRating * artistVotes,
    family = binomial, data = bob_new)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-5.611  -3.360  -1.961   1.374  15.680  

Coefficients:
                           Estimate Std. Error z value Pr(>|z|)    
(Intercept)              -2.962e+00  1.574e-01 -18.820  < 2e-16 ***
dayMonday                -3.492e-01  2.858e-02 -12.218  < 2e-16 ***
daySaturday               2.053e-02  2.658e-02   0.773 0.439783    
daySunday                 1.091e-01  2.812e-02   3.879 0.000105 ***
dayThursday              -8.847e-02  2.728e-02  -3.244 0.001180 ** 
dayTuesday               -4.237e-01  2.762e-02 -15.343  < 2e-16 ***
dayWednesday             -3.749e-01  2.875e-02 -13.037  < 2e-16 ***
artistRating              3.692e-01  3.522e-02  10.482  < 2e-16 ***
artistVotes               2.247e-03  4.153e-04   5.409 6.33e-08 ***
artistRating:artistVotes -4.686e-04  9.229e-05  -5.078 3.81e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 60681  on 3992  degrees of freedom
Residual deviance: 59649  on 3983  degrees of freedom
  (85 observations deleted due to missingness)
AIC: 66608

Number of Fisher Scoring iterations: 5
```
Now I am confused about how to interpret this output. When I learned the logistic GLM (see https://stats.idre.ucla.edu/r/dae/logit-regression/),
I understood that for a continuous covariate the log odds of success change by $\beta$ for a one-unit change in that covariate, and that for a factor variable each level shifts the log odds relative to the reference level. From there we can obtain an estimated probability-of-success curve: $\pi = \frac{e^{X\beta}}{1 + e^{X\beta}}$.
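In R this is just the inverse-logit transform applied to the linear predictor. A minimal sketch with made-up coefficient and covariate values (not taken from the model above):

```r
# Arbitrary illustrative values
b0 <- -3.0   # intercept
b1 <-  0.4   # slope for a continuous covariate x
x  <-  5

eta <- b0 + b1 * x                # linear predictor X %*% beta
pi  <- exp(eta) / (1 + exp(eta))  # estimated probability of success
pi
plogis(eta)                       # same value via R's built-in logistic CDF
```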
This makes sense to me when the response is a single binary random variable, such as admit:
```r
##   admit gre  gpa rank
## 1     0 380 3.61    3
## 2     1 660 3.67    3
## 3     1 800 4.00    1
## 4     1 640 3.19    4
## 5     0 520 2.93    4
## 6     1 760 3.00    2
```
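For reference, the binary-response fit from that tutorial looks roughly like this (a sketch only; the data URL and variable handling follow the UCLA page, so treat them as assumptions):

```r
# Graduate-admissions data used in the UCLA logit tutorial (URL assumed)
admit_df <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
admit_df$rank <- factor(admit_df$rank)

# Each row is a single 0/1 outcome, so the response is just the admit column
fit_admit <- glm(admit ~ gre + gpa + rank, data = admit_df, family = binomial)

# Predicted probability of admission for each applicant
head(predict(fit_admit, type = "response"))
```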
However, how do we proceed when the response variable is `cbind(ticketCount, ticketsRemain)` (refer to `head(bob_new)` above)?
What kind of predictions could I obtain here? In the UCLA admit example we can predict the probability of admission.
I appreciate any explanation or references!
Best Answer
This is one of the ways you can supply the data for logistic regression in R. In this case you are modelling the probability of a ticket being sold given the predictors, so you are still estimating probabilities. Moreover, your data are still (conditionally) binomial: you are modelling $k$ successes versus $n-k$ failures, provided as $(k, n-k)$ pairs, which is just another way of representing the same data.
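To see that the aggregated $(k, n-k)$ form fits exactly the same model as the expanded one-row-per-ticket 0/1 form, here is a small sketch with invented toy data:

```r
# Toy aggregated data: two lots, with sold / unsold counts and one predictor
agg <- data.frame(sold = c(9, 15), unsold = c(21, 21), x = c(4.38, 4.57))
fit_agg <- glm(cbind(sold, unsold) ~ x, data = agg, family = binomial)

# The same data expanded to one 0/1 row per ticket
long <- data.frame(
  y = c(rep(1, 9), rep(0, 21), rep(1, 15), rep(0, 21)),
  x = c(rep(4.38, 30), rep(4.57, 36))
)
fit_long <- glm(y ~ x, data = long, family = binomial)

coef(fit_agg)   # identical coefficients (up to numerical precision)
coef(fit_long)
```

And prediction works just as in the Bernoulli case: `predict(..., type = "response")` gives the estimated probability that a single ticket sells, which you can scale by the number of tickets on offer to get an expected number sold. A sketch using the model and column names from the question (the new-data values are invented for illustration):

```r
# Hypothetical new lots (values made up)
new_lots <- data.frame(
  day          = c("Saturday", "Tuesday"),
  artistRating = c(4.5, 4.1),
  artistVotes  = c(500, 150)
)

# Estimated probability that any one ticket for each lot sells
p_sell <- predict(mod_new, newdata = new_lots, type = "response")

# Expected tickets sold if, say, 30 are on offer for each lot
30 * p_sell
```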
Similar Posts:
- Solved – Poisson regression with strong pattern in residuals
- Solved – For a logistic regression of a 2 by 2 table using `glm` in `R`, is using `cbind` or using a full data matrix for the response the correct method