I'm writing a small speech recognition prototype as a side project, which matches pre-recorded words of the speaker. I'm now thinking of comparing two sets of data (the output of an FFT), which are two lists of approximately 7000-10000 values each. What would be an appropriate statistical analysis in this case? I want to find out how significant the similarity (or difference) between those two samples is, and whether it is significant enough to assume that they are the same or different. I'm not looking for anything too complicated, just a starting point maybe.

#### Best Answer

You can compute the mean of each of the two samples and their pooled standard deviation (S.D.), then compute a t or Z statistic to test whether the difference between the two means is significant.
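As a rough starting point, the calculation could be sketched as follows. This is only an illustration using Python's standard library: the function names `pooled_t_statistic` and `approx_two_sided_p` are made up for this example, and the p-value uses the normal (Z) approximation, which assumes the samples are large (as they are with 7000-10000 values each).

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Return (t, pooled_sd, degrees_of_freedom) for two samples.

    Hypothetical helper for illustration; assumes both samples have
    at least two values and roughly equal variances.
    """
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    # statistics.variance is the sample variance (divides by n - 1).
    v1, v2 = statistics.variance(a), statistics.variance(b)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    # Difference in means divided by its standard error.
    t = (m1 - m2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, pooled_sd, n1 + n2 - 2


def approx_two_sided_p(t):
    """Two-sided p-value using the normal (Z) approximation for large samples."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))


if __name__ == "__main__":
    # Toy data standing in for two lists of FFT magnitudes.
    sample_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    sample_b = [2.0, 3.0, 4.0, 5.0, 6.0]
    t, sd, df = pooled_t_statistic(sample_a, sample_b)
    print(f"t = {t:.3f}, pooled SD = {sd:.3f}, df = {df}")
    print(f"approximate p-value = {approx_two_sided_p(t):.3f}")
```

Note that this test only compares the average levels of the two lists; two spectra with similar means but different shapes would not be distinguished by it.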
