I am trying to find a good technique to balance data in which the minority class is about 1% of the data.
As I understand it, the most common practice is matching.
What is the difference, though, between matching and stratified sampling? It seems that in both we try to find stratifying variables, like gender, and make sure that the distribution of our sample is the same as the minority class.
Best Answer
Indeed, stratified sampling and matching are related (if by 'matching' you mean sample matching). The main difference lies in the approach.
Sample matching starts by drawing a target sample by simple random sampling, followed by the matching of k known subjects from an available pool of subjects (e.g., members of the minority class) to each subject in the target sample. This method can be used to obtain a matched sample that is similar to a random sample from the target population and can be very useful if you already have a large pool of subjects available from which you can draw the matched sample.
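As a rough illustration of the two steps just described, here is a minimal Python sketch. It assumes greedy nearest-neighbour matching without replacement, which is only one of several possible matching rules; the function `sample_matching`, the `distance` measure, and the (gender, age) records are all hypothetical and made up for the example.

```python
import random

def sample_matching(population, pool, n_target, k, distance, seed=0):
    """Draw a simple random target sample, then match k pool subjects to each target subject."""
    rng = random.Random(seed)
    # Step 1: simple random sample from the target population.
    target = rng.sample(population, n_target)
    available = list(pool)
    matched = []
    for t in target:
        # Step 2: take the k closest pool subjects that are still available
        # (greedy matching without replacement; an assumption of this sketch).
        available.sort(key=lambda p: distance(t, p))
        matched.extend(available[:k])
        available = available[k:]
    return target, matched

def distance(a, b):
    # Hypothetical distance: a gender mismatch is penalized heavily,
    # otherwise subjects are compared on age.
    return (a[0] != b[0]) * 1000 + abs(a[1] - b[1])

# Hypothetical data: each subject is a (gender, age) tuple.
rng = random.Random(1)
population = [(rng.choice("FM"), rng.randint(18, 80)) for _ in range(1000)]
pool = [(rng.choice("FM"), rng.randint(18, 80)) for _ in range(300)]

target, matched = sample_matching(population, pool, n_target=20, k=2, distance=distance)
```

The matched sample has `n_target * k` subjects, all taken from the existing pool, and its distribution on the matching variables should resemble that of the random target sample.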
Stratified sampling takes a different approach. Instead of taking a simple random sample from the total population, the population is first divided into mutually exclusive strata (e.g., classes). Next, a simple random sample is drawn from each stratum, resulting in a stratified sample. The samples drawn from the strata can be of equal size or have sizes that are proportional to the stratum sizes in the target population (e.g., 1% for the minority class). In the former case, analyses must be weighted to obtain generalizable estimates; in the latter case, weighting is not necessary. This approach is mainly useful if there is known population heterogeneity.
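The difference between equal and proportional allocation can be sketched in a few lines of Python. This is only an illustration under assumed numbers: the population of 10,000 records with a 1% minority class, the helper `stratified_sample`, and the chosen sample sizes are all hypothetical. The weight used here is the usual design weight, stratum population size divided by stratum sample size.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, sizes, seed=0):
    """Draw a simple random sample of sizes[s] records from each stratum s."""
    rng = random.Random(seed)
    # Divide the population into mutually exclusive strata.
    strata = defaultdict(list)
    for r in records:
        strata[stratum_of(r)].append(r)
    # Simple random sample within each stratum.
    sample = {s: rng.sample(members, sizes[s]) for s, members in strata.items()}
    # Design weight: how many population members each sampled record represents.
    weights = {s: len(members) / sizes[s] for s, members in strata.items()}
    return sample, weights

# Hypothetical population: 100 minority records (1%) and 9,900 majority records.
records = [("minority", i) for i in range(100)] + [("majority", i) for i in range(9900)]

def stratum_of(record):
    return record[0]

# Equal allocation: 50 records per stratum, so weights differ between strata.
equal, w_equal = stratified_sample(records, stratum_of, {"minority": 50, "majority": 50})

# Proportional allocation: a 1% sample of each stratum, so all weights are equal.
prop, w_prop = stratified_sample(records, stratum_of, {"minority": 1, "majority": 99})

print(w_equal)
print(w_prop)
```

With equal allocation the minority records each stand for 2 population members and the majority records for 198, which is why unweighted estimates would not generalize. With proportional allocation every record stands for 100 members, so the sample is self-weighting.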
Depending on your situation, either method has advantages and disadvantages. If you have to start data collection from scratch, stratified sampling may be more useful. If you already have a large amount of (survey) data, sample matching may be more efficient.