# Solved – Selecting regression model for a non-negative integer response

I have a series of non-negative integers \$y=(y_1,y_2,\ldots,y_n)\$ and a regression model \$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2\$, where \$x_1\$ and \$x_2\$ are \$0\$ or \$1\$, \$x_1 x_2\$ is the interaction, and \$\beta_0, \ldots, \beta_3\$ are the parameters we want to estimate. For example, the data look like

```
y    x1    x2    x1*x2
10   0     0     0
23   0     1     0
18   1     1     1
19   1     0     0
25   0     1     0
...
```

I want to estimate the \$\beta_0\$, \$\beta_1\$, \$\beta_2\$ and \$\beta_3\$ coefficients and perform a test to see if any coefficient is nonzero.
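To make the setup concrete, here is a minimal sketch of how the design matrix for this model could be built in R. The data frame name `dat` is hypothetical and holds only the five example rows shown above; the formula `y ~ x1 * x2` expands to the two main effects plus their interaction, so the `x1*x2` column does not need to be stored by hand.

```r
# Hypothetical data frame holding only the five example rows from the question
dat <- data.frame(
  y  = c(10, 23, 18, 19, 25),
  x1 = c(0, 0, 1, 1, 0),
  x2 = c(0, 1, 1, 0, 1)
)

# y ~ x1 * x2 is shorthand for y ~ x1 + x2 + x1:x2,
# so the interaction column is generated automatically
X <- model.matrix(y ~ x1 * x2, data = dat)
print(X)
```

The columns of `X` correspond, in order, to the intercept, `x1`, `x2` and the interaction, matching the four coefficients to be estimated.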

There are several different regression models that might be applied to this case:

1. Simple linear regression: `lm`
2. Poisson regression (when \$y\$ follows a Poisson distribution): `glm` with family = poisson
3. Quasi-Poisson regression (when \$y\$ is over-dispersed, meaning \$\text{var}(y) > \text{mean}(y)\$): `glm` with family = quasipoisson
4. Negative binomial regression (when \$y\$ is over-dispersed, \$\text{var}(y) > \text{mean}(y)\$): `glm.nb`, in the MASS package.
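As a rough sketch, the four candidates listed above could be fitted side by side like this. The data are simulated purely for illustration (the sample size, coefficient values and `size` parameter are all assumptions, not taken from my dataset), and the names `fit_lm`, `fit_pois`, `fit_qpois` and `fit_nb` are hypothetical.

```r
library(MASS)  # provides glm.nb

set.seed(1)

# Simulated, hypothetical data: two binary predictors and over-dispersed counts
n  <- 200
x1 <- rbinom(n, 1, 0.5)
x2 <- rbinom(n, 1, 0.5)
mu <- exp(2.3 + 0.4 * x1 + 0.6 * x2 - 0.3 * x1 * x2)
dat <- data.frame(y = rnbinom(n, size = 5, mu = mu), x1 = x1, x2 = x2)

# Descriptive check: mean and variance of y within each (x1, x2) cell
cell_stats <- aggregate(y ~ x1 + x2, data = dat,
                        FUN = function(v) c(mean = mean(v), var = var(v)))
print(cell_stats)

# 1. Linear regression (identity link)
fit_lm <- lm(y ~ x1 * x2, data = dat)

# 2. Poisson regression (log link, variance equal to the mean)
fit_pois <- glm(y ~ x1 * x2, family = poisson, data = dat)

# 3. Quasi-Poisson regression (same mean model, variance = dispersion * mean)
fit_qpois <- glm(y ~ x1 * x2, family = quasipoisson, data = dat)

# 4. Negative binomial regression (variance = mean + mean^2 / theta)
fit_nb <- glm.nb(y ~ x1 * x2, data = dat)

# Estimated dispersion from the quasi-Poisson fit;
# values well above 1 suggest over-dispersion relative to Poisson
disp <- summary(fit_qpois)$dispersion
print(disp)
```

Note that `lm` works on the original scale while the three count models use a log link by default, so their coefficients are not directly comparable. The Poisson and quasi-Poisson fits give the same coefficient estimates and differ only in their standard errors.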

The questions I want to ask are:

1. How should I select the model for this dataset? Is there any way to choose the right model based on some descriptive statistics of my dataset?
2. How should I check and validate whether the selected model, once fitted, is appropriate for my data?