I'm trying to understand the benefit of the data augmentation step in a classification algorithm.
I have a vector of hexadecimal strings and a column vector containing the label associated with the string in the same position. As an optional step in the classification algorithm, a data augmentation process is performed by splitting each string into pieces and replicating the associated label once for each piece produced.
What are the benefits of this process?
Best Answer
Overfitting tends to occur when you have too few records relative to the number of parameters (e.g., predictors or features). I'm not familiar with your data, but it sounds like the subsetting is creating additional records, which increases the size of the training set and so should help reduce overfitting.
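To make the step from the question concrete, here is a minimal sketch of how splitting the strings might produce extra records. The function name `augment_by_splitting`, the fixed `piece_len`, and the sample data are all assumptions for illustration; the question does not say how the pieces are actually chosen.

```python
def augment_by_splitting(strings, labels, piece_len):
    """Split each string into consecutive pieces of `piece_len` characters
    and repeat the original label once per piece.

    Hypothetical helper: the actual splitting rule used by the asker's
    algorithm is not known.
    """
    if len(strings) != len(labels):
        raise ValueError("strings and labels must have the same length")
    if piece_len <= 0:
        raise ValueError("piece_len must be positive")

    aug_strings, aug_labels = [], []
    for s, y in zip(strings, labels):
        # Consecutive, non-overlapping pieces; the last one may be shorter.
        pieces = [s[i:i + piece_len] for i in range(0, len(s), piece_len)]
        aug_strings.extend(pieces)
        # One copy of the label for every piece of this string.
        aug_labels.extend([y] * len(pieces))
    return aug_strings, aug_labels


# Made-up example data: two hexadecimal strings with labels 1 and 0.
strings = ["deadbeef", "0a1b2c"]
labels = [1, 0]

aug_strings, aug_labels = augment_by_splitting(strings, labels, piece_len=4)
print(len(strings), "records before,", len(aug_strings), "records after")
```

In this toy case two records become four, each piece keeping the label of the string it came from. Pieces cut from the same string are not independent of each other, so it is probably safer to split the data into training and test sets before augmenting rather than after.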