# Solved – Predicting based upon categorical data and one numeric datatype

I would like to determine which variables from this sample data would be the best predictors of CallHandleTimeSeconds.

I'm thinking it would be a combination of CreditRating, EligibleForAssistance, TypeOfCall and AmtInArrears, but I'm unsure how to do this. I understand the process when all the variables are numeric, but categorical variables make my head spin! Please help. I learn best from examples, so with one worked case I can plug in other categorical variables in the future.

For example: if CreditRating = Good, EligibleForAssistance = T, TypeOfCall = 2 and AmtInArrears = 21, then CallHandleTimeSeconds = 432?

```
CreditRating = c("Poor", "Poor", "Good", "Good", "Average", "Poor", "Average")
EligibleForAssistance = c("T", "F", "T", "F", "T", "T","T")
Season = c(1,2,1,3,2,3,4)
TypeOfCall = c(1,1,2,3,3,1,2)
NumberOfDaysAccountOpen = c(111,2321,33,322,2321,343,785)
AmtInArrears = c(0,0,0,22,232,2,0)
CallHandleTimeSeconds= c(123,232,543,239,230,400,210)

SampleData = data.frame(CreditRating, EligibleForAssistance, Season, TypeOfCall,
                        NumberOfDaysAccountOpen, AmtInArrears, CallHandleTimeSeconds)
```


Let's simulate more data:

```
> CR <- factor(as.vector(rmultinom(100, 2, prob=c(0.1,0.2,0.8))) + 1, labels = c("Poor", "Average", "Good"))
> EFA <- as.logical(rbinom(300, 1, 0.7))
> S <- factor(as.vector(rmultinom(100, 3, prob=c(0.1,0.2,0.8))) + 1)
> TOC <- factor(as.vector(rmultinom(100, 2, prob=c(0.1,0.2,0.8))) + 1)
> NODAO <- trunc(runif(300, 200, 500))
> AIA <- rnbinom(300, 1, 0.05)
> CHTS <- as.integer(runif(300, 100, 600))
```

As you can see, the categorical variables (CreditRating, Season, TypeOfCall) are coded as factors, so with your own data you should do something like:

```
> CR <- factor(CreditRating)
> S <- factor(Season)
```

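and so on for the other categorical variables. (Logical variables such as EligibleForAssistance are fine as they are. In your sample data it is stored as the strings "T" and "F", which `lm` will treat as a two-level factor.)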
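To see what "coded as factors" means for the regression, it can help to look at the design matrix R builds. The sketch below uses two columns from the question's sample data and `model.matrix` to show the 0/1 indicator columns. Putting "Poor" first in `levels` is an assumption made here so the reference level matches the simulated example; by default the levels are sorted alphabetically.

```r
# Toy data copied from the question
CreditRating <- c("Poor", "Poor", "Good", "Good", "Average", "Poor", "Average")
TypeOfCall   <- c(1, 1, 2, 3, 3, 1, 2)

# Assumption: "Poor" is chosen as the reference level by listing it first
CR  <- factor(CreditRating, levels = c("Poor", "Average", "Good"))
TOC <- factor(TypeOfCall)

# One 0/1 indicator column per non-reference level, plus the intercept
X <- model.matrix(~ CR + TOC)
print(X)
```

Each factor with k levels contributes k - 1 columns; the reference level is the row pattern where all of that factor's columns are 0.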

Then you can fit your model, e.g.

```
> fit <- lm(CHTS ~ CR + EFA + S + TOC + NODAO + AIA)
> round(summary(fit)$coefficients,2)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   358.65      41.30    8.68     0.00
CRAverage      -6.25      22.29   -0.28     0.78
CRGood         25.78      29.41    0.88     0.38
EFATRUE         5.81      18.13    0.32     0.75
S2              6.63      21.37    0.31     0.76
S3            -17.88      30.05   -0.60     0.55
S4            -47.76      33.88   -1.41     0.16
TOC2            7.62      21.36    0.36     0.72
TOC3           20.16      28.76    0.70     0.48
NODAO          -0.03       0.10   -0.34     0.73
AIA            -0.38       0.47   -0.81     0.42
```
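The coefficient table tests each factor level separately against its reference level. To judge a whole categorical variable at once, one option (not part of the answer above, just a possible approach) is a per-term F-test with `drop1`. The sketch below is self-contained and uses freshly simulated data with a reduced set of predictors; the seed and the use of `sample` are assumptions for illustration only.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
n <- 300

# Simulated stand-in data; the response is unrelated to the predictors
d <- data.frame(
  CR   = factor(sample(c("Poor", "Average", "Good"), n, replace = TRUE),
                levels = c("Poor", "Average", "Good")),
  TOC  = factor(sample(1:3, n, replace = TRUE)),
  AIA  = rnbinom(n, 1, 0.05),
  CHTS = as.integer(runif(n, 100, 600))
)

fit <- lm(CHTS ~ CR + TOC + AIA, data = d)

# One row per term: F-test for dropping that whole term from the model
tab <- drop1(fit, test = "F")
print(tab)
```

A three-level factor such as `CR` appears as a single row with 2 degrees of freedom, so its p-value refers to the variable as a whole rather than to one level.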

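You can then interpret the results. Each factor coefficient is the difference from that factor's reference level (here CR="Poor", EFA=FALSE, S=1 and TOC=1). Because CHTS was simulated independently of the predictors, none of the coefficients is significant here; the point is only how to read them. The expected value of CallHandleTimeSeconds is: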

• if CR="Poor", EFA=FALSE, S=1, TOC=1, NODAO=0 and AIA=0: $358.65$ (the intercept)
• if CR="Average" (everything else as above): $358.65 - 6.25 = 352.40$
• if CR="Average" and EFA=TRUE: $358.65 - 6.25 + 5.81 = 358.21$
• if CR="Good", EFA=TRUE and NODAO=500: $358.65 + 25.78 + 5.81 - 0.03 \times 500 = 375.24$

and so on.
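The same arithmetic can be left to `predict`, which is probably the easiest way to "plug in" new combinations. The sketch below simulates its own data (with `sample` rather than `rmultinom`, and an arbitrary seed, both assumptions for illustration), so the numbers will differ from the table above; it only checks that adding coefficients by hand agrees with `predict` for the last case in the list.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 300

# Simulated stand-in data with the same variable names as the answer
d <- data.frame(
  CR    = factor(sample(c("Poor", "Average", "Good"), n, replace = TRUE),
                 levels = c("Poor", "Average", "Good")),
  EFA   = as.logical(rbinom(n, 1, 0.7)),
  S     = factor(sample(1:4, n, replace = TRUE)),
  TOC   = factor(sample(1:3, n, replace = TRUE)),
  NODAO = trunc(runif(n, 200, 500)),
  AIA   = rnbinom(n, 1, 0.05),
  CHTS  = as.integer(runif(n, 100, 600))
)
fit <- lm(CHTS ~ CR + EFA + S + TOC + NODAO + AIA, data = d)

# New case: CR = "Good", EFA = TRUE, NODAO = 500, everything else at baseline
new_call <- data.frame(
  CR    = factor("Good", levels = levels(d$CR)),
  EFA   = TRUE,
  S     = factor(1, levels = levels(d$S)),
  TOC   = factor(1, levels = levels(d$TOC)),
  NODAO = 500,
  AIA   = 0
)
by_predict <- unname(predict(fit, newdata = new_call))

# The same prediction assembled from the coefficients, as in the list above
b <- coef(fit)
by_hand <- unname(b["(Intercept)"] + b["CRGood"] + b["EFATRUE"] + b["NODAO"] * 500)

print(c(by_predict = by_predict, by_hand = by_hand))
print(all.equal(by_predict, by_hand))
```

Giving the factors in `new_call` the same levels as in the training data keeps `predict` from misreading a new value as an unknown level.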
