I'm relatively new to survival analysis and am trying to get my data into the right shape.

I have two tables, both concerning the observed individuals. If I used just one of the tables, I would have continuous information on each individual without any overlapping periods.

However, I also need the information stored in the other table, so I have to merge the two tables. After merging, the episodes overlap in some cases.

Here is an example for illustration:

Table 1:

`ID: 1 start: 2000-01-01 end: 2002-12-31 state: A `

Table 2:

`ID: 1 start: 2002-01-01 end: 2002-04-15 state: Z `

For survival analysis (in Stata or R), does it matter if there are overlaps?

If it does, do you have any suggestions on how to remove the overlaps?


#### Best Answer

Assuming that by "parametric model" the OP means *fully* parametric, this sounds like a question about the appropriate data structure for discrete time survival analysis (aka discrete time event history) models such as logit (1), probit (2), or complementary log-log (3) hazard models. In that case, the data typically need to be structured in a *person-period* format.

- (1) $h_{t} = \frac{e^{\mathbf{BX}}}{1 + e^{\mathbf{BX}}}$
- (2) $h_{t} = \Phi(\mathbf{BX})$
- (3) $h_{t} = 1 - e^{-e^{\mathbf{BX}}}$

where $\mathbf{BX}$ denotes the parameters and predictors in the model. Often discrete time survival analysis models will include *dummy variables* for each time period (see below) and also often include time period itself, or even functions of it, as a variable.
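As a minimal sketch of the three hazard functions above, the following Python snippet evaluates each one for a single linear predictor. The coefficient and covariate values are made up for illustration, and the function names are hypothetical, not part of any package.

```python
import math

def linear_predictor(coefs, x):
    """Return BX: the sum of coefficient * predictor products."""
    return sum(b * v for b, v in zip(coefs, x))

def logit_hazard(eta):
    """Equation (1): logistic hazard."""
    return math.exp(eta) / (1.0 + math.exp(eta))

def probit_hazard(eta):
    """Equation (2): standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def cloglog_hazard(eta):
    """Equation (3): complementary log-log hazard."""
    return 1.0 - math.exp(-math.exp(eta))

# Hypothetical values: intercept, x1, x2 for one person-period row.
coefs = [-2.0, 0.5, 0.1]
x = [1.0, 1.0, 3.0]
eta = linear_predictor(coefs, x)

for name, h in [("logit", logit_hazard),
                ("probit", probit_hazard),
                ("cloglog", cloglog_hazard)]:
    print(name, round(h(eta), 4))
```

All three functions map the linear predictor onto the interval (0, 1), which is what allows the result to be read as a per-period hazard.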

Here's what a person-period data set looks like:

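| ID | period | y | x1 | x2 | x3 | t1 | t2 | t3 | . . . | tT |
|----|--------|---|----|----|----|----|----|----|-------|----|
| 1 | 1 | 0 | 1 | 3 | 12 | 1 | 0 | 0 | . . . | 0 |
| 1 | 2 | 0 | 1 | 0 | 12 | 0 | 1 | 0 | . . . | 0 |
| 1 | 3 | 1 | 1 | 9 | 12 | 0 | 0 | 1 | . . . | 0 |
| 2 | 1 | 0 | 0 | 4 | 6 | 1 | 0 | 0 | . . . | 0 |
| 3 | 1 | 0 | 1 | 0 | 17 | 1 | 0 | 0 | . . . | 0 |
| 3 | 2 | 0 | 1 | 3 | 17 | 0 | 1 | 0 | . . . | 0 |
| 3 | 3 | 0 | 1 | 3 | 17 | 0 | 0 | 1 | . . . | 0 |

etc.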

First of all, notice both `ID` and `period`, which define the hierarchical structure of these data: periods of observation nested within persons. Also notice that `x2` is *time varying* (i.e. within the same individual it can take different values in different periods), and that `x1` and `x3` are static; understand that the model is agnostic as to whether predictors are time-varying or static. Finally, examine the relationship between `period` and the indicator variables for time/period (i.e. `t1` through `tT`).

Oftentimes you will receive data in a *person-time* format such as this:

`ID TimeToEvent Censored x1 x2t1 x2t2 . . . x2tT x3`

and will need to transform the data appropriately. Here `TimeToEvent` measures how many periods each subject was observed while in the study, and `Censored` indicates whether or not the subject left the study *without experiencing the event* (i.e. whether that subject was *right censored*). In your data `TimeToEvent` probably equals `end` - `start`, and `Censored` is certainly some function of `state`.
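The transformation described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function name `to_person_period` is hypothetical, the time-varying `x2` values are assumed to be stored as one list per subject, and the event indicator `y` is set to 1 only in the last observed period of an uncensored subject. The two example subjects mirror the rows for IDs 1 and 2 in the person-period table shown earlier.

```python
def to_person_period(subjects, n_periods):
    """Expand one record per subject into one row per subject-period.

    Assumes each subject dict has ID, TimeToEvent, Censored, x1, x3,
    and x2 as a list with one value per observed period.
    """
    rows = []
    for s in subjects:
        for period in range(1, s["TimeToEvent"] + 1):
            is_last = period == s["TimeToEvent"]
            row = {
                "ID": s["ID"],
                "period": period,
                # Event occurs only in the final period, and only if not censored.
                "y": 1 if (is_last and not s["Censored"]) else 0,
                "x1": s["x1"],
                "x2": s["x2"][period - 1],  # time-varying predictor
                "x3": s["x3"],
            }
            # Indicator (dummy) variables t1 .. tT for the current period.
            for t in range(1, n_periods + 1):
                row[f"t{t}"] = 1 if t == period else 0
            rows.append(row)
    return rows

subjects = [
    {"ID": 1, "TimeToEvent": 3, "Censored": 0, "x1": 1, "x2": [3, 0, 9], "x3": 12},
    {"ID": 2, "TimeToEvent": 1, "Censored": 1, "x1": 0, "x2": [4], "x3": 6},
]
person_period = to_person_period(subjects, n_periods=3)
for row in person_period:
    print(row)
```

Subject 1 contributes three rows, with `y = 1` only in the third; subject 2 is censored after one period and contributes a single row with `y = 0`.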

There are often tools available for transforming data such as these. For example, in Stata, see `net describe dthaz, from(http://alexisdinno.com/stata)`
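For the overlapping episodes in the question, one possible approach is to cut the timeline at every episode boundary from either table, so that each resulting interval carries one state from each table and no two intervals overlap. This is not something the answer above prescribes. The sketch below assumes inclusive daily start and end dates and a single individual; the function name `split_episodes` is hypothetical.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)

def state_on(table, day):
    """Return the state covering `day` in `table`, or None if no episode does."""
    for start, end, state in table:
        if start <= day <= end:
            return state
    return None

def split_episodes(table1, table2):
    """Merge two episode lists into non-overlapping intervals.

    Each episode is (start, end, state) with inclusive dates.
    Returns rows of (start, end, state_from_table1, state_from_table2).
    """
    # Every date at which the state in either table can change.
    cuts = set()
    for start, end, _ in table1 + table2:
        cuts.add(start)
        cuts.add(end + ONE_DAY)  # first day after the episode ends
    cuts = sorted(cuts)

    rows = []
    for lo, hi in zip(cuts, cuts[1:]):
        s1, s2 = state_on(table1, lo), state_on(table2, lo)
        if s1 is None and s2 is None:
            continue  # gap not covered by either table
        rows.append((lo, hi - ONE_DAY, s1, s2))
    return rows

# The example from the question.
table1 = [(date(2000, 1, 1), date(2002, 12, 31), "A")]
table2 = [(date(2002, 1, 1), date(2002, 4, 15), "Z")]
episodes = split_episodes(table1, table2)
for row in episodes:
    print(row)
```

For the example data this yields three consecutive intervals: state `A` alone up to 2001-12-31, `A` together with `Z` from 2002-01-01 to 2002-04-15, and `A` alone again until 2002-12-31.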
