Solved – Which statistical analysis to use to compare the level of similarity between two large samples

I'm writing a small speech recognition prototype as a side project, which matches pre-recorded words of the speaker. I'm now thinking of comparing two sets of data (the output of an FFT), which are two lists of approximately 7000-10000 values each. What would be an appropriate statistical analysis in this case? I want to find out how significant the similarity (or difference) between those two samples is, and whether it's significant enough to assume that they are the same or different. I'm not looking for anything too complicated, just a starting point maybe.

You could compute the mean of each of the two samples and the pooled standard deviation (S.D.). From these, compute a t or Z statistic for a test of significance on the difference between the means.
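A minimal sketch of that calculation in Python, using only the standard library. The function names `pooled_t_statistic` and `two_sided_p_normal` are made up for illustration, and the sketch assumes both lists hold plain numbers (for example FFT magnitudes) and that the two samples have similar variances. With several thousand values per list the t statistic is practically a Z statistic, so the p-value here uses the normal approximation. Note that this compares only the means of the two lists.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Return (t, pooled_sd) for two independent samples a and b.

    Illustrative helper: assumes equal variances in both samples.
    """
    n1, n2 = len(a), len(b)
    mean1, mean2 = statistics.fmean(a), statistics.fmean(b)
    # Sample variances (n - 1 in the denominator).
    var1, var2 = statistics.variance(a), statistics.variance(b)
    # Pooled standard deviation: variances weighted by degrees of freedom.
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    # Difference of means divided by its standard error.
    t = (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, pooled_sd


def two_sided_p_normal(z):
    """Two-sided p-value for z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Calling `pooled_t_statistic(list_a, list_b)` on the two FFT outputs gives the statistic, and `two_sided_p_normal(t)` turns it into an approximate p-value; a small p-value suggests the means differ.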
