Splitting Longitudinal Data into Training & Test Sets

I'm trying to find a simple way to split some longitudinal data into a training and a test set. I'm familiar with using the caret package in R to make stratified splits, but only with wide-format data. It looks like somebody has written a function to do this in Python, but I haven't learned that language yet.

In my case, I'd like to make a split stratified on an outcome class (which, incidentally, does not change over time) in a data set where each individual has more than one observation. The split must keep individuals intact: if an individual is in the training set or the test set, then all of their observations must be in that same set.

I'd like to avoid having to reshape the data to wide format, split it, and then reshape both the training and test sets back to long (longitudinal) format.

The only way I can think of to do this (so far) is the following code, which I built from various sources on this site (1, 2), but I'm not sure (a) whether it is 100% accurate or (b) whether there is a better solution.

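```r
library(dplyr)

set.seed(1)

# One row per individual with its (time-invariant) outcome, sample 80% of the
# individuals within each outcome class, then pull back all their observations
train <- data %>%
  select(ID, outcome) %>%
  distinct() %>%
  group_by(outcome) %>%
  sample_frac(0.8) %>%
  ungroup() %>%
  left_join(data, by = c("ID", "outcome"))

# Every observation whose ID was not sampled goes to the test set
test <- data[!(data$ID %in% train$ID), ]
```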
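On point (a), a few sanity checks can confirm that a split like this behaves as intended: no ID appears in both sets, every observation lands in exactly one set, and the outcome proportions among individuals are similar in both sets. The sketch below runs the question's pipeline on a small simulated data set; the data frame, its columns visit and x, and the 50-individual, 3-visit layout are hypothetical, chosen only to make the example self-contained.

```r
library(dplyr)

set.seed(1)

# Hypothetical long-format data: 50 individuals, 3 observations each,
# with an outcome class that is constant within each individual
n_id <- 50
data <- data.frame(
  ID      = rep(seq_len(n_id), each = 3),
  visit   = rep(1:3, times = n_id),
  outcome = rep(sample(c("case", "control"), n_id, replace = TRUE), each = 3),
  x       = rnorm(n_id * 3)
)

# The split from the question, with explicit ungroup() and join keys
train <- data %>%
  select(ID, outcome) %>%
  distinct() %>%
  group_by(outcome) %>%
  sample_frac(0.8) %>%
  ungroup() %>%
  left_join(data, by = c("ID", "outcome"))
test <- data[!(data$ID %in% train$ID), ]

# 1. No individual appears in both sets
no_overlap <- length(intersect(train$ID, test$ID)) == 0
# 2. Every observation lands in exactly one set
all_rows <- nrow(train) + nrow(test) == nrow(data)
# 3. Outcome proportions among individuals (not rows) in each set
prop_train <- prop.table(table(distinct(train, ID, outcome)$outcome))
prop_test  <- prop.table(table(distinct(test, ID, outcome)$outcome))

no_overlap
all_rows
prop_train
prop_test
```

Proportions are computed on one row per individual, so individuals with more observations do not weigh more heavily in the comparison.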

Just use sample() to choose some of the groups (individual IDs), after converting your ID to a factor, and then filter the data on the chosen IDs.

For example:

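```r
library(dplyr)
smp_size <- floor(0.80 * length(unique(iris$Species)))  # Species plays the role of the ID
train <- iris %>% filter(Species %in% sample(levels(Species), smp_size))
test <- iris %>% filter(!Species %in% train$Species)
```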
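The sample() approach keeps all rows of each chosen ID together, but it is not stratified: with only three species, for instance, it simply keeps two of them. One way to add stratification, sketched here in base R, is to run the same ID sampling separately within each outcome class. The function name split_by_id, its arguments, and the toy data are hypothetical; the sketch assumes, as in the question, that the outcome is constant within each individual.

```r
# Hypothetical helper: grouped, stratified train/test split in base R.
# Assumes id_col identifies individuals and strata_col is constant within each one.
split_by_id <- function(df, id_col, strata_col, p = 0.8) {
  # One row per individual with its outcome class
  ids <- unique(df[, c(id_col, strata_col)])
  if (anyDuplicated(ids[[id_col]])) stop("strata_col varies within an individual")

  # Within each outcome class, sample floor(p * n) individual IDs.
  # Index with sample.int() rather than sample(g, k), because sample()
  # treats a single numeric value g as the range 1:g.
  train_ids <- unlist(lapply(split(ids[[id_col]], ids[[strata_col]]), function(g) {
    g[sample.int(length(g), floor(p * length(g)))]
  }), use.names = FALSE)

  # All observations of a sampled individual go to train, the rest to test
  in_train <- df[[id_col]] %in% train_ids
  list(train = df[in_train, , drop = FALSE], test = df[!in_train, , drop = FALSE])
}

# Toy long-format data: 20 individuals with 4 observations each,
# 12 individuals in class "A" and 8 in class "B"
set.seed(1)
toy <- data.frame(
  ID      = rep(1:20, each = 4),
  outcome = rep(rep(c("A", "B"), times = c(12, 8)), each = 4),
  value   = rnorm(80)
)

parts <- split_by_id(toy, "ID", "outcome", p = 0.75)
```

With p = 0.75, class A contributes 9 of its 12 individuals and class B 6 of its 8, so the training set holds 15 individuals (60 rows) and the test set 5 (20 rows). Using floor() means a class contributes slightly less than p whenever p * n is not a whole number; round() is an alternative if staying closer to p matters more.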
