Given a data frame:

    d1 <- c("A", "B", "C", "A")
    d2 <- c("A", "V", "C", "F")
    d3 <- c("D", "B", "C", "A")
    d4 <- c("A", "B", "C", "A")
    data.frame(d1, d2, d3, d4)
    #   d1 d2 d3 d4
    # 1  A  A  D  A
    # 2  B  V  B  B
    # 3  C  C  C  C
    # 4  A  F  A  A
Each row may contain a unique pattern. For example, the sequence A, D, A (first row) is a pattern assigned to class 1, and F, A, A (last row) is a pattern assigned to class 4.
I would like to search the data frame for rows that contain such 'unique patterns' and add a new column that classifies them, with 0 marking rows that contain none of the patterns. A pattern has to occur exactly as indicated.
      d1 d2 d3 d4 class
    1  A  A  D  A     1
    2  B  V  B  B     0
    3  C  C  C  C     0
    4  A  F  A  A     4
I tried a SELECT statement with a concat qualifier using the sqldf package, but that did not lead to a useful approach.
I would appreciate ideas on how to perform the search, or pointers to packages that support this type of search.
Thank you
Best Answer
Suppose the entries of the data.frame are single uppercase letters. Suppose also that we have a vector containing the patterns and that each row can contain at most one pattern.
    d1 <- c("A", "B", "C", "A")
    d2 <- c("A", "V", "C", "F")
    d3 <- c("B", "V", "E", "F")
    d4 <- c("A", "B", "C", "A")
    dd <- data.frame(d1, d2, d3, d4)
    dd
    #   d1 d2 d3 d4
    # 1  A  A  B  A
    # 2  B  V  V  B
    # 3  C  C  E  C
    # 4  A  F  F  A

    pats <- c("ABA", "FFA")

    # Return the index (within pats) of the pattern found in row r, or 0 if none matches.
    pat.fun <- function(r, pats) {
      rr <- paste(r, collapse = "")  # collapse the row into one string, e.g. "AABA"
      hits <- sapply(pats, function(p) grepl(p, rr, fixed = TRUE))
      res <- unname(which(hits))
      if (length(res) == 0) res <- 0
      res
    }

    dd$class <- apply(dd, 1, pat.fun, pats = pats)
    dd
    #   d1 d2 d3 d4 class
    # 1  A  A  B  A     1
    # 2  B  V  V  B     0
    # 3  C  C  E  C     0
    # 4  A  F  F  A     2
This is only an example; the code is certainly not very efficient.
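If the row-by-row `apply` call becomes a bottleneck, one possible alternative is to collapse all rows into strings in a single step and loop over the (usually few) patterns instead. The sketch below is only an illustration under some assumptions: the data is taken from the table printed in the question, the names `pat_class` and `classify_rows` are hypothetical, each pattern is mapped to the class label the question asks for (1 and 4) rather than to its position in the vector, and when several patterns match a row the first one listed wins.

```r
# Data as printed in the question's table (assumed to be the intended input)
dd <- data.frame(
  d1 = c("A", "B", "C", "A"),
  d2 = c("A", "V", "C", "F"),
  d3 = c("D", "B", "C", "A"),
  d4 = c("A", "B", "C", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are patterns, values are class labels
pat_class <- c(ADA = 1, FAA = 4)

classify_rows <- function(df, pat_class) {
  # Paste the columns together element-wise: one string per row, e.g. "AADA"
  row_str <- do.call(paste0, df)
  cls <- numeric(length(row_str))  # 0 means "no pattern found"
  for (p in names(pat_class)) {
    # Literal substring match; only rows not yet classified are updated
    hit <- grepl(p, row_str, fixed = TRUE) & cls == 0
    cls[hit] <- pat_class[[p]]
  }
  cls
}

dd$class <- classify_rows(dd, pat_class)
# dd$class is now 1 0 0 4
```

Because `grepl` is vectorized over the row strings, the work per pattern is one call regardless of the number of rows. Note that this matches a pattern anywhere in the row; if a pattern must start at a specific column, the strings would need to be built from only those columns.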