I would like to know whether I can apply techniques such as logistic regression to data whose predictor variables are 'indexed' by time, and if not, what techniques are appropriate for this kind of data.

To give you a clear picture of the problem, say I have a dependent variable Y whose values are 0 or 1 (binary case), or 1, 2, 3, … (polytomous case).

And I have predictor variables that are 'indexed' by time, i.e., X1T1, X1T2, …, X1Tn; X2T1, X2T2, …, X2Tm; …; XpTk,

where

X1T1 = value of variable X1 at time 1 (T1)

X1T2 = value of variable X1 at time 2 (T2)

⋮

X1Tn = value of variable X1 at time n (Tn)

X2T1 = value of variable X2 at time 1 (T1)

X2T2 = value of variable X2 at time 2 (T2)

⋮

X2Tm = value of variable X2 at time m (Tm)

⋮

XpTk = value of variable Xp at time k (Tk)

where n, m, k = 1, 2, … are the time 'indices' and p = 1, 2, … is the number of predictor variables.

In the data view, each observation is one row of the wide layout:

```
Obs   X1T1 ... X1Tn   X2T1 ... X2Tm   ...   XpTk
  1     .  ...   .      .  ...   .    ...     .
  2     .  ...   .      .  ...   .    ...     .
  .     .        .      .        .            .
  N     .  ...   .      .  ...   .    ...     .
```

Can I apply a technique such as logistic regression to these types of data (or other techniques for a 'multi'-category response variable, like tree-based methods)? If not, what is the appropriate technique to use? Thanks a lot!


#### Best Answer

I removed *classification* from your title and text because classification has absolutely nothing to do with this problem. You are interested in prediction/modeling of probabilities.

Once you stack the dataset tall and thin as suggested by @DJohnson, you have many options. An extremely flexible approach is called pooled logistic regression or repeated measures logistic regression. The method is so flexible that you can have $Y$ mean an event at an ultimate follow-up time (e.g., at 2 years from subject entry into a study) or $Y$ can be from a moving window of time that is always $t$ days later than the current row's observation.

Parameter estimation using maximum likelihood proceeds in the usual way for binary or polytomous (multinomial) logistic regression, using a GEE working independence model, which means you just stack all the data and ignore any intra-cluster correlations. The standard errors will all be wrong because the ordinary calculation treats all rows as independent of each other. You can get valid standard errors by using the cluster sandwich covariance estimator or the cluster bootstrap.

The R `rms` package can do all of this for binary and ordinal $Y$ using the functions `lrm`, `orm`, `robcov`, and `bootcov`. You can reshape the data using the built-in R function `reshape`, among others.
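For readers working in Python rather than R, the wide-to-long reshape that R's `reshape` performs can be sketched with pandas (the column names mirror the X1T1-style layout from the question; the data values are made up for illustration):

```python
# Sketch: reshape the wide X{p}T{t} layout from the question into the
# "tall and thin" long format, one row per (subject, time) observation.
# Python/pandas analogue of R's reshape(); values are illustrative only.
import pandas as pd

wide = pd.DataFrame({
    "Obs":  [1, 2, 3],
    "Y":    [0, 1, 0],          # outcome, constant within subject
    "X1T1": [0.2, 0.5, 0.1],
    "X1T2": [0.3, 0.4, 0.2],
    "X2T1": [1.0, 2.0, 3.0],
    "X2T2": [1.5, 2.5, 3.5],
})

# sep="T" tells pandas that X1T1 = stub "X1" + separator "T" + time "1".
long = pd.wide_to_long(
    wide, stubnames=["X1", "X2"], i="Obs", j="time", sep="T"
).reset_index()
print(long.sort_values(["Obs", "time"]))
```

Each subject now contributes one row per time point, with `Y` repeated down its rows, which is the stacked form the pooled logistic regression expects.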
