Solved – How to handle a zero-inflated data set

How should I handle a data set with a high number of zeros? I am trying to predict automobile insurance claims, and most of them (about 90%) are zero. If claim == 1, then claim_amount is a positive integer; otherwise claim = 0 and claim_amount = 0. How can I develop a predictive model that predicts both claim and claim_amount?

I tried the zeroinfl function from the pscl R package on CRAN, but the accuracy was very low. The solution should use Python or R. The data has around 22,000 rows, with 7 predictor variables and 2 target variables (claim, claim_amount).

Zero inflation makes things difficult, and we don't know what accuracy you expect. A simple way to let the algorithm handle the zero inflation is a tree model (packages rpart or party) or an ensemble of many trees (packages randomForest or party). 22,000 rows is a lot; if 10% of them are non-zero and there are 7 predictor variables, this may even be enough for a sensible neural net.
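One possible way to apply the tree idea to both targets is to fit two trees: a classification tree for claim and a regression tree for claim_amount on the rows that have a claim. The sketch below is only an illustration under stated assumptions: the data are simulated, the predictors age and veh_type are hypothetical stand-ins for the 7 real predictors, and the cp and minbucket values are arbitrary tuning choices, not recommendations.

```r
library(rpart)

set.seed(1)
n <- 22000

# Hypothetical stand-in for the insurance data (column names are assumptions)
claims <- data.frame(
  age      = sample(18:80, n, TRUE),
  veh_type = factor(sample(c("small", "medium", "large"), n, TRUE))
)

# Simulated targets: roughly 10% of the rows carry a claim
p_true <- ifelse(claims$age < 25, 0.25, 0.08)
claims$claim <- rbinom(n, 1, p_true)
claims$claim_amount <- ifelse(claims$claim == 1,
                              round(rgamma(n, shape = 2, scale = 1500)) + 1,
                              0)
claims$claim_f <- factor(claims$claim)

# Part 1: classification tree for whether a claim occurs.
# With ~90% zeros the default cp may prune every split, so cp is lowered
# here and minbucket raised to keep the leaves reasonably large.
fit_claim <- rpart(claim_f ~ age + veh_type, data = claims, method = "class",
                   control = rpart.control(cp = 0, minbucket = 200))

# Part 2: regression tree for the amount, fitted on rows with a claim only
fit_amount <- rpart(claim_amount ~ age + veh_type,
                    data = claims[claims$claim == 1, ], method = "anova")

# Combine both parts into an expected claim amount per row
p_claim         <- predict(fit_claim, newdata = claims, type = "prob")[, "1"]
amount_if_claim <- predict(fit_amount, newdata = claims)
expected_amount <- p_claim * amount_if_claim

summary(expected_amount)
```

In practice the fit should be judged on held-out data rather than on the training rows used here.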

More R packages for machine learning are listed at https://CRAN.R-project.org/view=MachineLearning

The following simulated example shows how well a simple tree models both the different rules that generate zeros and the rule that generates the non-zero numbers:

    library(rpart)
    library(rpart.plot)

    # Simulated data: four predictors and an empty response column
    expl.data <- data.frame(A = sample(1:3, 22000, TRUE),
                            B = sample(1:3, 22000, TRUE),
                            C = sample(1:10, 22000, TRUE),
                            D = runif(22000, 0, 100),
                            response = rep(NA, 22000))

    # Several rules produce a zero; otherwise draw one value from (1:20) * A
    rules <- function(A, B, C, D) {
        if (D < 20) return(0)
        if (D > 80) return(0)
        if (C < 3) return(0)
        if (A == 1 && B == 1) return(0)
        return(sample(1:20 * A, 1))
    }

    for (i in 1:nrow(expl.data)) # this can be done faster, this is most readable
        expl.data$response[i] <- rules(expl.data$A[i],
                                       expl.data$B[i],
                                       expl.data$C[i],
                                       expl.data$D[i])

    # Fit a regression tree and plot it
    prp(rpart(response ~ ., data = expl.data))

This produces a plot of the decision tree, whose splits pick up the zero-generating rules as well as the rule that generates the non-zero values.
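Without the plot, the same point can be checked numerically. The following sketch regenerates the simulated data in vectorised form (the faster alternative to the loop) and compares the tree's mean prediction on rows that are zero by rule with its mean prediction on the remaining rows. The names sim, is_zero, fit and pred, and the seed, are introduced here for illustration only.

```r
library(rpart)

set.seed(42)
n <- 22000
sim <- data.frame(A = sample(1:3, n, TRUE),
                  B = sample(1:3, n, TRUE),
                  C = sample(1:10, n, TRUE),
                  D = runif(n, 0, 100))

# Vectorised version of the zero rules
is_zero <- with(sim, D < 20 | D > 80 | C < 3 | (A == 1 & B == 1))

# Non-zero rows: one draw from 1:20 times A, i.e. a draw from (1:20) * A
sim$response <- ifelse(is_zero, 0, sample(1:20, n, TRUE) * sim$A)

fit  <- rpart(response ~ ., data = sim)
pred <- predict(fit, sim)

# Mean prediction for non-zero rows (FALSE) and for rows that are zero by rule (TRUE)
print(tapply(pred, is_zero, mean))
```

If the tree has captured the zero rules, the mean prediction for the TRUE group should be clearly lower than for the FALSE group.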
