# Solved – How to test for autocorrelated errors in logistic regression

I'm doing a Bayesian logistic regression $Y \sim X$ where my predictor $X$ is a count observed over time. So $Y$ and $X$ are each $m \times n$ matrices, where $m$ is the number of subjects and $n$ is the number of observation years. $Y$ is filled with values in $\{0, 1\}$ and the values of $X$ are in $\{0, 1, 2, \dotsc\}$. For a given subject, $X$ is obviously monotonically increasing over time, and it's highly autocorrelated.

Is it a problem to regress on such an autocorrelated independent variable? I've read that as long as the errors are not autocorrelated, there is no problem. But in logistic regression there is no error term, since I model the probability of success directly, right? So how can I test for autocorrelated errors if there are no errors?

EDIT: Think of $Y$ as failures on thousands of systems; I'm trying to give a probability of failure for each of these systems from $X$, which is the cumulative count over the years of minor accidents that happened individually in the past to each of these systems. Presumably, these accidents act as precursors of the future failure. Example: system 1 has a cumulative count of 37 minor accidents, and the predicted probability of failure for this system is 1.2%. Lastly, failures can happen more than once on a given system (but I can assume that these individual failures are independent over time).
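To make the setup concrete, here is a minimal sketch of how such a model turns a cumulative count into a failure probability. The coefficients `beta0` and `beta1` are made up for illustration (the question gives no fitted values); they are chosen so that a count of 37 lands near the 1.2% mentioned above.

```python
import math

# Hypothetical coefficients, chosen only for illustration;
# real values would come from fitting the model to the data.
beta0, beta1 = -5.0, 0.015

def failure_prob(cum_count):
    """Logistic model: P(failure) = 1 / (1 + exp(-(beta0 + beta1 * x)))."""
    eta = beta0 + beta1 * cum_count
    return 1.0 / (1.0 + math.exp(-eta))

p = failure_prob(37)  # roughly 0.012 with these made-up coefficients
```

Because the cumulative count is monotone in time, the predicted failure probability for a given system can only increase from one year to the next under this model.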


So we have $Y$, binary variables representing the failure (or not). There seems to be one observation per year, and a covariate $X$ counting minor incidents cumulatively. So $X$ can maybe be seen as a measure of stress on the system, which is monotonically increasing. Here I will concentrate on the modeling of one series; the thousands of parallel series can then be seen as independent realizations, with the same distribution if the systems are exchangeable, or else with some parameters representing possible differences between systems, maybe a random-effects model with random intercepts/slopes. But for now I will concentrate on the model for one system.
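The random-intercepts idea can be sketched as a data-generating process. This is only a simulation under assumed parameter values (`b0`, `b1`, `sigma` are made up), not a fitting procedure; each system $i$ gets its own intercept $u_i \sim N(0, \sigma^2)$ on top of the shared coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)

m, n = 1000, 20  # systems, years
# Cumulative minor-accident counts, one row per system (monotone by construction).
x = np.cumsum(rng.poisson(2.0, size=(m, n)), axis=1)

# Random-intercepts logistic model: eta_ij = (b0 + u_i) + b1 * x_ij,
# where u_i ~ N(0, sigma^2) captures system-to-system differences.
b0, b1, sigma = -4.0, 0.03, 0.5
u = rng.normal(0.0, sigma, size=m)
eta = (b0 + u)[:, None] + b1 * x
p = 1.0 / (1.0 + np.exp(-eta))  # m x n matrix of failure probabilities
y = rng.binomial(1, p)          # simulated failures, one row per system
```

Fitting such a mixed model is a separate task (e.g. via a GLMM routine); the sketch only shows the hierarchical structure being described.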
Starting with logistic regression, the simplest model would be
$$\mathbb{P}(Y_j = 1 \mid X_j = x_j) = \frac{1}{1 + e^{-\eta(x_j)}}$$
where $\eta(x) = \beta_0 + \beta_1 x$ (or some generalization) is the linear predictor. Since we are conditioning on $X = x$, autocorrelation in the $x_j$'s is not a problem, but there could still be autocorrelation between the $Y_j$'s.

So how could we investigate that? We need a concept of residuals for logistic regression, and, as you say, logistic regression does not have an error term, so there is no obvious definition. But see Family of GLM represents the distribution of the response variable or residuals? for some discussion. Residuals can be defined in multiple ways, and a web search gives many interesting hits. There isn't much on this site, but see Is there i.i.d. assumption on logistic regression? and its links. Also some good ideas here: Diagnostics for logistic regression?.
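One way to act on this, sketched below with NumPy on simulated data (the real data isn't available, and all parameters are made up), is to fit the logistic regression, form Pearson residuals, and look at their lag-1 autocorrelation. Pearson residuals are only one of the possible definitions mentioned above; deviance residuals would work similarly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate one system: cumulative minor-accident counts over n years
# and conditionally independent binary failures (made-up parameters).
n = 200
x = np.cumsum(rng.poisson(2.0, size=n)).astype(float)
eta_true = -4.0 + 0.03 * x
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

# Standardize the covariate so the Newton steps are numerically stable.
z = (x - x.mean()) / x.std()
X = np.column_stack([np.ones(n), z])

# Fit the logistic regression by Newton-Raphson (IRLS).
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    beta = beta + np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))

# Pearson residuals: (y - p_hat) / sqrt(p_hat * (1 - p_hat)).
p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
resid = (y - p_hat) / np.sqrt(p_hat * (1.0 - p_hat))

# Lag-1 autocorrelation of the residuals; values far outside roughly
# +/- 2/sqrt(n) would hint at serial dependence between the Y_j's.
rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
```

Since the simulated failures here are conditionally independent given $x$, `rho` should come out close to zero; with real data, a clearly nonzero value would suggest moving to a model with serial dependence.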