There are times when one might want to estimate a prevalence ratio or relative risk, in preference to an odds ratio, for data with binary outcomes – say, if the outcome in question isn't rare, so the RR ~ OR relationship doesn't hold.

I've implemented a model in R to do that, as follows:

`uni.out <- glm(Death ~ onset, family = binomial(link = log), data = data)`

But I keep running into convergence issues, even when I provide starting values (such as the coefficient estimates pulled from a logistic regression) or turn up the number of allowed iterations. I've also tried `glm2`, without any success.

The two ideas I have from here are either to fit a Poisson model to the same data, using a sandwich estimator for the variance, or to fit the model using MCMC and take the standard error of the posterior (this is being used alongside multiple imputation, so I can't just report the posterior). The problem is that I have no idea how to implement either of these in `R`, nor whether they're the best solution.
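For concreteness, the first idea as I understand it would look something like the sketch below, on simulated stand-in data (all names here are illustrative, not my actual `data`). The HC0 sandwich variance is computed by hand in base R; in practice `sandwich::vcovHC()` does the same thing:

```r
# Sketch of the "modified Poisson" approach: a Poisson GLM whose
# coefficients estimate log relative risks for a binary outcome,
# with a sandwich (robust) variance. Simulated stand-in data.
set.seed(1)
n <- 500
onset <- rnorm(n)
Death <- rbinom(n, 1, plogis(-1 + 0.5 * onset))
dat <- data.frame(Death, onset)

pois.out <- glm(Death ~ onset, family = poisson(link = "log"), data = dat)

# Model-based Poisson SEs are conservative for 0/1 outcomes, so compute
# an HC0 sandwich variance (sandwich::vcovHC(pois.out) is equivalent)
X     <- model.matrix(pois.out)
mu    <- fitted(pois.out)
bread <- solve(crossprod(X * sqrt(mu)))   # inverse of X' diag(mu) X
meat  <- crossprod(X * (dat$Death - mu))  # X' diag((y - mu)^2) X
robust.vcov <- bread %*% meat %*% bread
robust.se   <- sqrt(diag(robust.vcov))

exp(coef(pois.out)["onset"])  # estimated relative risk
```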

Additionally, while using a model like:

`glm(Death ~ age, family = binomial(link = log), start = c(-3.15, 0.03), data = data)`

I regularly get the error message "Error: cannot find valid starting values: please specify some", but not always. What is generating this message?


#### Best Answer

The Poisson approximation to the relative risk is a very good approach, with two small limitations: it can easily overpredict the risk (fitted risks above 1), and the Poisson mean-variance assumption may be unreasonable at moderately high risks. Neither invalidates the estimates when robust standard errors are used, but the predictions and their inference may be biased and/or conservative.

The log-binomial GLM is very poorly behaved: it fails to converge whenever it encounters overprediction. If you inspect the workhorse behind `glm`, it begins with the 0 vector as starting coefficients. For logistic regression this assigns a 50% risk to each observation, but for the log-binomial model it assigns a 100% risk, which derails the iterations almost every single time. I think future versions of R could stand to use more intelligent starting values.

Using `start = c(log(mean(y)), rep(0, np - 1))` will usually fix the problem ($n_p$ being the number of parameters in the model, including the intercept): every fitted risk then starts at the overall event rate, safely below 1. I wrote a little wrapper called `probratio` in the R package `epitools` to do this. Another thing it does is marginal standardization; a nice paper on this is Muller and MacLehose (2014).
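Applied to the second model in the question, the starting-value fix looks something like this (simulated stand-in data; names and values hypothetical):

```r
# With log(mean(y)) as the intercept start and zeros elsewhere, every
# fitted risk begins at the overall event rate, safely below 1.
set.seed(2)
n <- 500
age <- rnorm(n, 50, 10)
Death <- rbinom(n, 1, exp(-4 + 0.02 * age))  # true log-linear risk model
dat <- data.frame(Death, age)

np <- 2  # number of parameters: intercept + age
fit <- glm(Death ~ age, family = binomial(link = "log"),
           start = c(log(mean(dat$Death)), rep(0, np - 1)),
           data = dat)
exp(coef(fit)["age"])  # relative risk per year of age
```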

While odds ratios are biased estimators of the relative risk, the risk predictions from logistic regression are *not* biased. Using this, you can predict the risk for every observation in the model at the covariate's observed value, then predict the risk for every observation with the covariate set one unit higher. Average each set of risks and take their ratio: this is an estimate of the relative risk that has (arguably) the correct interpretation, whether or not it is mathematically equivalent to the actual relative risk (they are almost always very, very close). The sandwich estimator does not work here, but bootstrapping works brilliantly. I also implemented this in the `probratio` function, but I need to tweak it to implement bias-corrected and accelerated (BCa) bootstrapping.
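A minimal sketch of this marginal standardization, with a simple percentile bootstrap for the uncertainty (simulated stand-in data; `probratio` does this more carefully):

```r
# Marginally standardized relative risk from a logistic model:
# average predicted risk at age + 1 over average predicted risk at age.
set.seed(3)
n <- 500
age <- rnorm(n, 50, 10)
Death <- rbinom(n, 1, plogis(-4 + 0.05 * age))
dat <- data.frame(Death, age)

marg.rr <- function(d) {
  fit <- glm(Death ~ age, family = binomial, data = d)
  p0 <- predict(fit, newdata = d, type = "response")  # risk at observed age
  p1 <- predict(fit, newdata = transform(d, age = age + 1),
                type = "response")                    # risk at age + 1
  mean(p1) / mean(p0)                                 # standardized RR
}

rr <- marg.rr(dat)

# The sandwich does not apply here, so bootstrap the whole procedure
boot.rr <- replicate(200, marg.rr(dat[sample(n, replace = TRUE), ]))
sd(boot.rr)  # bootstrap standard error
```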

The third solution is to trick the Cox proportional hazards model into doing this for you. If everyone in the sample is assigned a follow-up time of 1 unit and the event indicator is taken to indicate failure or censoring, then the Cox model with the Efron method for ties estimates the relative risk. There is a bepress working paper from Thomas Lumley that brilliantly describes how to do this.
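A sketch of the trick using the `survival` package that ships with R (simulated stand-in data; all names illustrative):

```r
# Everyone gets follow-up time 1 and Death as the event indicator, so the
# "hazard ratio" reported by coxph() is a relative risk.
library(survival)

set.seed(4)
n <- 500
onset <- rbinom(n, 1, 0.5)
Death <- rbinom(n, 1, exp(-2.3 + 0.7 * onset))  # true RR = exp(0.7)
dat <- data.frame(Death, onset, time = 1)

cox.out <- coxph(Surv(time, Death) ~ onset, data = dat,
                 ties = "efron", robust = TRUE)  # robust (sandwich-type) SEs
exp(coef(cox.out))  # estimated relative risk
```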

A fourth solution is to directly maximize the binomial likelihood with a truncated risk function. The R code would look something like:

```r
negLogLik <- function(b) {
  risk <- pmin(1, exp(X %*% b))          # truncated risk function
  -sum(dbinom(y, 1, risk, log = TRUE))
}
fit <- nlm(negLogLik, p = c(log(mean(y)), 0, 0, 0), hessian = TRUE)
```
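A self-contained version of the same idea on simulated data, also showing how the Hessian that `nlm()` returns yields standard errors (all names and values illustrative):

```r
# Direct maximization of the binomial likelihood with a truncated risk,
# then approximate SEs from the inverse observed information.
set.seed(5)
n <- 500
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
X <- cbind(1, x1, x2)  # design matrix with intercept
y <- rbinom(n, 1, pmin(1, exp(-2 + 0.3 * x1 + 0.4 * x2)))

negLogLik <- function(b) {
  # truncate just below 1 so dbinom() never sees an impossible risk
  risk <- pmin(1 - 1e-8, exp(X %*% b))
  -sum(dbinom(y, 1, risk, log = TRUE))
}

fit <- nlm(negLogLik, p = c(log(mean(y)), 0, 0), hessian = TRUE)
est <- fit$estimate
se  <- sqrt(diag(solve(fit$hessian)))  # inverse observed information
exp(est)  # risk ratio estimates
```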