# Solved – Regression when the response is a proportion that can be 0 or 1

I have two datasets a training and a test dataset. The dependent variable is a proportion and there are 54 predictors which are positive and negative real numbers and another 7 predictors that are text.

There are three response variables. Total the normalized total number of hits. Treatment the normalized total number during treatment and a percent which is a ratio of the other two responses.

At the moment using lm on the percent prediction data I have a corolation of .4. 85% of the varibles are within 20% of their target. For the treatment response variable using glm in poisson mode i have a correlation of .6 percent but the variables do not match the target data at all.

I have two main issues I need advice on:

(1) it rejected the text predictors because it said factor has new level(s)
I would like it to ignore the information for those that have new level but not disregard it for those that have the correct information how do i do that?

(2) To make my dependent variable a real number, rather than a proportion bounded between 0 and 1, I was advised to transform the response using, for example, the logit transform or the Normal quantile function (`qnorm` in R). The problem is that these transformations (and others like it) will map 0 and 1 to non-finite values. How can I model these data in a regression setting when the response is a proportion that can be 0 or 1?

Using linear regression with outlier removal I am able to get 2239 of 2583 testing data within 20% of their actual value I would like to have that many within 10%.

Using the posson distribution glm the amount of treatment correlates with 69%.

Ignoring this second issue for the moment, I transform the y~x1+x2 such that y=log(y/(1-y)) the correlation of my predictions to actual data drops from 6% to 2%
This is what the data looks like after the logit transform

This is what the data looks like before the log distribution

Contents