I am creating a sample of 3 observations with 3 values each. I would then like to combine them into a single observation and find out which values are duplicated and how many times each one occurs. For example, "59741111" appears 3 times and "29611022" appears twice. Thank you very much.
set.seed(10231995)
df <- t(replicate(3, sample(id, 3, replace=FALSE, prob=NULL)))
df
Result
[,1] [,2] [,3]
[1,] "59741111" "73380703" "29611022"
[2,] "59741111" "70470420" "59741111"
[3,] "33360105" "29611022" "32080517"
Desired result
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
"59741111" "73380703" "29611022" "59741111" "70470420" "59741111" "33360105" "29611022" "32080517"
Value freq rel freq
"59741111" 3 .33
"29611022" 2 .22
Best Answer
The "id" variable is not defined in your reproducible example, so I cannot run your code as is. However, I can still help you out. Take a look at the code below:
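library(data.table)
set.seed(10231995)
id <- 1:5  # placeholder ids, since "id" is not defined in the question
df <- data.table(t(replicate(3, sample(id, 3, replace=FALSE, prob=NULL))))
df
# reshape to long format: one row per sampled value
# (measure.vars is given explicitly so character ids are not guessed to be id.vars)
df2 <- melt(df, measure.vars = names(df))
df2
# count how many times each value occurs
df2[, .N, by=value]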
The data.table package in R supports many SQL-like operations, which is what you need for group-by calculations such as counting how often each value occurs.
This data.table tutorial gives a good introduction to these calculations and briefly explains the syntax.
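To get the frequency and relative frequency table asked for in the question, the group-by count can be extended roughly as follows. This is only a sketch: it hard-codes the values from the "Result" matrix in the question (because "id" is not defined there), and the names m, long, counts, dups, freq and rel_freq are made up for illustration.

```r
library(data.table)

# Values copied from the "Result" matrix in the question
m <- matrix(
  c("59741111", "73380703", "29611022",
    "59741111", "70470420", "59741111",
    "33360105", "29611022", "32080517"),
  nrow = 3, byrow = TRUE
)

# Flatten the matrix row by row into one long column of 9 values
long <- data.table(value = as.vector(t(m)))

# Count each value, then add its share of all draws
counts <- long[, .(freq = .N), by = value]
counts[, rel_freq := round(freq / sum(freq), 2)]

# Keep only the values that occur more than once, most frequent first
dups <- counts[freq > 1][order(-freq)]
print(dups)
```

Here dups should contain only "59741111" (freq 3, rel_freq 0.33) and "29611022" (freq 2, rel_freq 0.22). Dropping the freq > 1 filter gives the counts for all values instead.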