I'm running a binary logit regression where I know the dependent variable is miscoded in a small percentage of cases. So I'm trying to estimate $\beta$ in this model:

$\Pr(y_i = 1) = 1/(1 + e^{-z_i})$

$z_i = \alpha + X_i\beta$

But instead of the vector $Y$, I have $\tilde{Y}$, which includes some random errors (i.e. $y_i = 1$, but $\tilde{y}_i = 0$, or vice versa, for some $i$).

Is there a (reasonably) simple correction for this problem?

I know that logit has some nice properties in case-control studies. It seems likely that something similar applies here, but I haven't been able to find a good solution.

A few other constraints: this is a text-mining application, so the dimensions of $X$ are large (in the thousands or tens of thousands). This may rule out some computationally intensive procedures.

Also, I don't care about correctly estimating $\alpha$, only $\beta$.


#### Best Answer

This situation is often referred to as misclassification error (here, misclassification of the dependent variable). This paper may help you estimate $\beta$ correctly. EDIT: I found relevant-looking papers using http://www.google.com/search?q=misclassification+of+dependent+variable+logistic.
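
To make the idea concrete, here is a minimal sketch of one common way to handle it: write down the likelihood of the *observed* labels. If a true 0 is recorded as 1 with probability $\pi_0$ and a true 1 is recorded as 0 with probability $\pi_1$, then $\Pr(\tilde{y}_i = 1 \mid X_i) = \pi_0 + (1 - \pi_0 - \pi_1)/(1 + e^{-z_i})$, and you maximize the Bernoulli likelihood built from that probability. The function name `fit_logit_with_flips`, the simulated data, and the assumption that $\pi_0$ and $\pi_1$ are known in advance are mine, not from the question; in practice the flip rates may have to be estimated jointly or taken from a validation sample.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def fit_logit_with_flips(X, y_obs, pi0, pi1):
    """Maximum-likelihood fit of (alpha, beta) from noisy labels.

    pi0 = P(observed 1 | true 0), pi1 = P(observed 0 | true 1).
    Both are treated as known here, which is a simplifying assumption.
    Setting pi0 = pi1 = 0 gives the ordinary logit fit.
    """
    k = X.shape[1]
    scale = 1.0 - pi0 - pi1

    def nll_and_grad(theta):
        s = expit(theta[0] + X @ theta[1:])   # P(true y = 1 | x)
        p = pi0 + scale * s                   # P(observed y = 1 | x)
        p = np.clip(p, 1e-12, 1 - 1e-12)      # guard the logarithms
        nll = -np.sum(y_obs * np.log(p) + (1 - y_obs) * np.log(1 - p))
        # Chain rule: d(nll)/dz, then back through z = alpha + X beta.
        dz = -(y_obs / p - (1 - y_obs) / (1 - p)) * scale * s * (1 - s)
        grad = np.concatenate(([dz.sum()], X.T @ dz))
        return nll, grad

    res = minimize(nll_and_grad, np.zeros(k + 1), jac=True, method="L-BFGS-B")
    return res.x[0], res.x[1:]


# Simulated example (hypothetical data): 10% of labels flipped at random.
rng = np.random.default_rng(0)
n = 20000
X = rng.normal(size=(n, 3))
beta_true = np.array([1.5, -1.0, 0.5])
y_true = rng.binomial(1, expit(-0.3 + X @ beta_true))
flip = rng.random(n) < 0.10
y_obs = np.where(flip, 1 - y_true, y_true)

_, beta_naive = fit_logit_with_flips(X, y_obs, 0.0, 0.0)    # ignores the flips
_, beta_corr = fit_logit_with_flips(X, y_obs, 0.10, 0.10)   # accounts for them

print("naive    :", np.round(beta_naive, 2))
print("corrected:", np.round(beta_corr, 2))
```

Ignoring the flips tends to shrink the coefficients toward zero, while the corrected likelihood should recover something closer to the true $\beta$. Each iteration only needs the products `X @ theta` and `X.T @ dz`, so the cost per step is comparable to ordinary logit, which matters when $X$ has thousands of columns. Note that this likelihood is not guaranteed to be concave, so the starting point can matter.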