I would like to determine which variables in this sample data would be the best predictors of CallHandleTimeSeconds.

I'm thinking it would be a combination of CreditRating, EligibleForAssistance, TypeOfCall and AmtInArrears, but I'm unsure how to do this. I understand the process when all the variables are numeric, but categorical variables make my head spin! Please help: I learn best from examples, so that I can plug and play other categorical variables in the future.

For example, if CreditRating = Good, EligibleForAssistance = T, TypeOfCall = 2 and AmtInArrears = 21, then CallHandleTimeSeconds = 432?

```r
CreditRating = c("Poor", "Poor", "Good", "Good", "Average", "Poor", "Average")
EligibleForAssistance = c("T", "F", "T", "F", "T", "T", "T")
Season = c(1, 2, 1, 3, 2, 3, 4)
TypeOfCall = c(1, 1, 2, 3, 3, 1, 2)
NumberOfDaysAccountOpen = c(111, 2321, 33, 322, 2321, 343, 785)
AmtInArrears = c(0, 0, 0, 22, 232, 2, 0)
CallHandleTimeSeconds = c(123, 232, 543, 239, 230, 400, 210)
SampleData = data.frame(CreditRating, EligibleForAssistance, Season, TypeOfCall,
                        NumberOfDaysAccountOpen, AmtInArrears, CallHandleTimeSeconds)
```

What test would I run? Logistic Regression? Please help.


#### Best Answer

Let's simulate more data:

```r
> # rmultinom(100, k, prob) returns a 3 x 100 matrix of counts (0..k);
> # as.vector() flattens it to 300 values, which are then used as category codes
> CR <- factor(as.vector(rmultinom(100, 2, prob=c(0.1,0.2,0.8))) + 1, labels = c("Poor", "Average", "Good"))
> EFA <- as.logical(rbinom(300, 1, 0.7))
> S <- factor(as.vector(rmultinom(100, 3, prob=c(0.1,0.2,0.8))) + 1)
> TOC <- factor(as.vector(rmultinom(100, 2, prob=c(0.1,0.2,0.8))) + 1)
> NODAO <- trunc(runif(300, 200, 500))
> AIA <- rnbinom(300, 1, 0.05)
> CHTS <- as.integer(runif(300, 100, 600))
```

As you can see, the categorical variables (CreditRating, Season, TypeOfCall) are coded as *factors*, so with your own data you should do something like:

```r
> CR <- factor(CreditRating)
> S <- factor(Season)
```

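etc. (Logical variables such as EFA are fine as they are. In your sample data EligibleForAssistance is stored as the strings "T"/"F"; `lm()` treats a character column as a two-level factor, or you can convert it with `EligibleForAssistance == "T"`.)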
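To see what the factor coding does to your own data, here is a minimal sketch using the columns from the question. The level order given for CreditRating is an assumption (the first level becomes the reference category), and `X` is just an illustrative name. `model.matrix()` shows the dummy columns that `lm()` would build from the factors:

```r
# The asker's sample data, as posted in the question
SampleData <- data.frame(
  CreditRating          = c("Poor", "Poor", "Good", "Good", "Average", "Poor", "Average"),
  EligibleForAssistance = c("T", "F", "T", "F", "T", "T", "T"),
  TypeOfCall            = c(1, 1, 2, 3, 3, 1, 2),
  AmtInArrears          = c(0, 0, 0, 22, 232, 2, 0)
)

# Categorical columns become factors; the level order below is an assumption,
# chosen so that "Poor" is the reference level
SampleData$CreditRating <- factor(SampleData$CreditRating,
                                  levels = c("Poor", "Average", "Good"))
SampleData$TypeOfCall <- factor(SampleData$TypeOfCall)

# "T"/"F" strings become a proper logical variable
SampleData$EligibleForAssistance <- SampleData$EligibleForAssistance == "T"

# Each k-level factor expands into k - 1 dummy (0/1) columns
X <- model.matrix(~ CreditRating + EligibleForAssistance + TypeOfCall + AmtInArrears,
                  data = SampleData)
colnames(X)
```

Every non-reference level gets its own column (`CreditRatingAverage`, `CreditRatingGood`, `TypeOfCall2`, ...), which is why the coefficient table of a fitted model has one row per non-reference level rather than one row per variable.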

Then you can fit your model, e.g.

```r
> fit <- lm(CHTS ~ CR + EFA + S + TOC + NODAO + AIA)
> round(summary(fit)$coefficients, 2)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   358.65      41.30    8.68     0.00
CRAverage      -6.25      22.29   -0.28     0.78
CRGood         25.78      29.41    0.88     0.38
EFATRUE         5.81      18.13    0.32     0.75
S2              6.63      21.37    0.31     0.76
S3            -17.88      30.05   -0.60     0.55
S4            -47.76      33.88   -1.41     0.16
TOC2            7.62      21.36    0.36     0.72
TOC3           20.16      28.76    0.70     0.48
NODAO          -0.03       0.10   -0.34     0.73
AIA            -0.38       0.47   -0.81     0.42
```

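and you can interpret your results. Each factor coefficient is a shift relative to that factor's reference level (CR="Poor", EFA=FALSE, S=1, TOC=1), so the expected value of CallHandleTimeSeconds is: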

- if CR="Poor", EFA=FALSE, S=1, TOC=1, NODAO=0 and AIA=0: $358.65$ (the intercept)
- if CR="Average", everything else as above: $358.65 - 6.25 = 352.40$
- if CR="Average" and EFA=TRUE, everything else as above: $358.65 - 6.25 + 5.81 = 358.21$
- if CR="Good", EFA=TRUE and NODAO=500, everything else as above: $358.65 + 25.78 + 5.81 - 0.03 \times 500 = 375.24$

and so on.
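Rather than adding coefficients by hand, you can let `predict()` do the arithmetic for a scenario like the one in the question (CreditRating = Good, EligibleForAssistance = T, TypeOfCall = 2, AmtInArrears = 21). The sketch below is only illustrative: it simulates its own data with a simpler `sample()`-based scheme and an arbitrary seed, drops Season and NumberOfDaysAccountOpen for brevity, and the names `d`, `new_call` and `by_hand` are made up for this example.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 300

# Simulated data (an assumption for illustration, not real call data)
d <- data.frame(
  CR   = factor(sample(c("Poor", "Average", "Good"), n, replace = TRUE),
                levels = c("Poor", "Average", "Good")),
  EFA  = as.logical(rbinom(n, 1, 0.7)),
  TOC  = factor(sample(1:3, n, replace = TRUE)),
  AIA  = rnbinom(n, 1, 0.05),
  CHTS = as.integer(runif(n, 100, 600))
)

fit <- lm(CHTS ~ CR + EFA + TOC + AIA, data = d)

# One new call; factor columns must use the same levels as the training data
new_call <- data.frame(
  CR  = factor("Good", levels = levels(d$CR)),
  EFA = TRUE,
  TOC = factor("2", levels = levels(d$TOC)),
  AIA = 21
)
pred <- predict(fit, newdata = new_call)

# The same prediction assembled from the coefficients, as in the list above
b <- coef(fit)
by_hand <- b["(Intercept)"] + b["CRGood"] + b["EFATRUE"] + b["TOC2"] + 21 * b["AIA"]

all.equal(unname(pred), unname(by_hand))
```

The final line checks that `predict()` and the manual sum agree. Because the simulated response is pure noise here, the predicted value itself carries no meaning; the point is only the mechanics.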
