Solved – How does data augmentation reduce overfitting

I'm trying to understand the benefit of the data augmentation step in a classification algorithm.
I have a vector of hexadecimal strings and a column vector containing the label associated with the string in the same position. As an optional step in the classification algorithm, a data augmentation process is performed by splitting each string into pieces and repeating the associated label once for each piece produced by the split.

What are the benefits of this process?

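Overfitting tends to occur when you have too few records relative to the number of parameters the model has to fit (e.g., predictors or features). I'm not familiar with your data, but it sounds like the subsetting creates additional labelled records, which increases the number of training examples relative to those parameters.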
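
To make the "additional records" point concrete, here is a minimal Python sketch of the kind of splitting the question describes. The function name `augment`, the fixed piece length, the non-overlapping split, and the sample strings are all assumptions for illustration; the actual algorithm may split differently.

```python
def augment(strings, labels, piece_len):
    """Split each string into consecutive, non-overlapping pieces of
    piece_len characters and repeat its label once per piece.

    Hypothetical helper: the real augmentation step may use a different
    piece length, overlapping windows, or another splitting rule.
    """
    if len(strings) != len(labels):
        raise ValueError("strings and labels must have the same length")
    if piece_len <= 0:
        raise ValueError("piece_len must be positive")

    aug_strings, aug_labels = [], []
    for s, y in zip(strings, labels):
        # The last piece may be shorter if len(s) is not a multiple of piece_len.
        pieces = [s[i:i + piece_len] for i in range(0, len(s), piece_len)]
        aug_strings.extend(pieces)
        aug_labels.extend([y] * len(pieces))
    return aug_strings, aug_labels


# Made-up example data: two hexadecimal strings with one label each.
strings = ["4a6f686e", "deadbeefcafe"]
labels = [0, 1]

aug_strings, aug_labels = augment(strings, labels, piece_len=4)
print(len(strings), "records before,", len(aug_strings), "records after")
```

With these made-up inputs, the two original records become five labelled pieces, so the classifier sees more training examples for the same set of features.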
