Solved – Logistic Regression on Time-dependent Predictors

I would like to know whether techniques such as logistic regression can be applied to data whose predictors are 'indexed' by time. If not, which techniques are appropriate for this kind of data?

To give a clear picture of the problem: I have a dependent variable Y whose values are 0 or 1 (binary case) or 1, 2, 3, … (polytomous case).

I also have predictor variables that are 'indexed' by time, i.e., X1T1, X1T2, …, X1Tn, X2T1, X2T2, …, X2Tm, …, XpTk,

where

X1T1 = values of variable X1 at time 1 (T1)

X1T2 = values of variable X1 at time 2 (T2)

. . 

X1Tn = values of variable X1 at time n (Tn)

X2T1 = values of variable X2 at time 1 (T1)

X2T2 = values of variable X2 at time 2 (T2)

. . 

X2Tm = values of variable X2 at time m (Tm)

. . . 

XpTk = values of variable Xp at time k (Tk)

Here n, m, k = 1, 2, … are the time 'indices' (the number of time points can differ from one variable to the next), and p = 1, 2, … is the number of predictor variables.

For the data view, I have;

Obs   X1T1   . . .   X1Tn   X2T1   . . .   X2Tm   . . .   XpTk
1     .      . . .   .      .      . . .   .      . . .   .
2     .      . . .   .      .      . . .   .      . . .   .
.     .      . . .   .      .      . . .   .      . . .   .
.     .      . . .   .      .      . . .   .      . . .   .
.     .      . . .   .      .      . . .   .      . . .   .
N     .      . . .   .      .      . . .   .      . . .   .

Can I apply a technique such as logistic regression to this type of data (or other techniques for a multi-category response variable, such as tree-based methods)? If not, what is the appropriate technique? Thanks a lot!

I removed classification from your title and text because classification has absolutely nothing to do with this problem. You are interested in prediction/modeling of probabilities.

Once you stack the dataset tall and thin as suggested by @DJohnson, you have many options. An extremely flexible approach is called pooled logistic regression or repeated measures logistic regression. The method is so flexible that you can have $Y$ mean an event at an ultimate follow-up time (e.g., at 2 years from subject entry into a study) or $Y$ can be from a moving window of time that is always $t$ days later than the current row's observation.
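As a rough illustration of the "tall and thin" layout, here is a minimal Python sketch that turns one wide row per subject into one row per subject and time point. The column naming (`X1T2` for variable X1 at time 2), the `Obs` and `Y` keys, and the example values are assumptions made for illustration only. Missing measurements become `None`, since different variables may be observed at different numbers of time points, and the outcome is simply copied onto every row, which corresponds to the "event at an ultimate follow-up time" reading of $Y$.

```python
import re

# Hypothetical column pattern: a name such as "X1T2" means variable X1 at time 2.
COLUMN = re.compile(r"^(X\d+)T(\d+)$")

def wide_to_long(rows, id_key="Obs", outcome_key="Y"):
    """Turn one wide dict per subject into one dict per (subject, time)."""
    long_rows = []
    for row in rows:
        by_time = {}
        for name, value in row.items():
            match = COLUMN.match(name)
            if match:
                var, t = match.group(1), int(match.group(2))
                by_time.setdefault(t, {})[var] = value
        # All variables seen for this subject, so every long row has the same keys.
        variables = sorted({v for vals in by_time.values() for v in vals})
        for t in sorted(by_time):
            rec = {id_key: row[id_key], "time": t, outcome_key: row[outcome_key]}
            for var in variables:
                rec[var] = by_time[t].get(var)  # None if not measured at time t
            long_rows.append(rec)
    return long_rows

# Made-up example data: subject 2 has no X2 measurement at time 2.
wide = [
    {"Obs": 1, "Y": 0, "X1T1": 0.2, "X1T2": 0.4, "X2T1": 5.0, "X2T2": 5.5},
    {"Obs": 2, "Y": 1, "X1T1": 1.1, "X1T2": 0.9, "X2T1": 3.0},
]
long_rows = wide_to_long(wide)
for rec in long_rows:
    print(rec)
```

The subject identifier is kept on every stacked row because it defines the clusters needed later for valid standard errors.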

Parameter estimation by maximum likelihood proceeds in the usual way for binary or polytomous (multinomial) logistic regression. This amounts to a GEE working independence model: you stack all the data and ignore any intra-cluster correlation when estimating the coefficients. The ordinary standard errors will be wrong, however, because the usual calculation treats all rows as independent of each other. You can get valid standard errors by using the cluster sandwich covariance estimator or the cluster bootstrap.
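To make the working-independence idea concrete, the following is a minimal NumPy sketch, not the rms implementation. It fits an ordinary binary logistic model to stacked rows by Newton-Raphson and then compares the naive standard errors with a cluster sandwich estimate. The function name `pooled_logit_cluster`, the simulated data, and the absence of any small-sample correction are all assumptions made for illustration.

```python
import numpy as np

def pooled_logit_cluster(x, y, groups, n_iter=25):
    """Working-independence logistic fit; returns coefficients,
    naive SEs, and cluster-sandwich SEs (no small-sample correction)."""
    X = np.column_stack([np.ones(len(y)), x])        # add an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):                          # Newton-Raphson iterations
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    bread = np.linalg.inv(info)                      # naive covariance matrix
    scores = X * (y - p)[:, None]                    # per-row score contributions
    meat = np.zeros_like(info)
    for g in np.unique(groups):
        s = scores[groups == g].sum(axis=0)          # sum scores within a cluster
        meat += np.outer(s, s)
    robust = bread @ meat @ bread                    # sandwich covariance matrix
    return beta, np.sqrt(np.diag(bread)), np.sqrt(np.diag(robust))

# Simulated stacked data (an assumption): 200 subjects, 4 rows each, with a
# shared subject effect that induces intra-cluster correlation.
rng = np.random.default_rng(0)
n_subj, n_times = 200, 4
groups = np.repeat(np.arange(n_subj), n_times)
u = np.repeat(rng.normal(size=n_subj), n_times)
x = rng.normal(size=n_subj * n_times)
prob = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x + u)))
y = (rng.random(n_subj * n_times) < prob).astype(float)

beta, se_naive, se_robust = pooled_logit_cluster(x, y, groups)
print("coefficients:", beta)
print("naive SE:    ", se_naive)
print("cluster SE:  ", se_robust)
```

The coefficients are exactly those of an ordinary logistic fit on the stacked rows. Only the covariance matrix changes, because the score contributions are summed within each subject before forming the middle term of the sandwich.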

The R rms package can do all this for binary and ordinal $Y$ using the functions lrm, orm, robcov, and bootcov. You can reshape the data using the built-in R function reshape, among others.
