# Solved – Why does chi square test seem to depend on sample size?

This question relates to my ENT anti-answer in another forum, but is based on a statistical perspective.

Traditionally, the chi-square test does not depend on how many samples you take, so it should be as good for 100 samples as for 2 billion. There is certainly no mention of a relationship in the Wikipedia article. This seems counterintuitive.

If you look at my graph of the distribution of byte values (0-255) in a random stream, the chi-square value drops (asymptotically?) towards zero as the sample size grows. You would expect this in the limit, as the distribution of random bytes should be totally flat. The implicit consequence is that any derived p-value (passing/failing a hypothesis) cannot be relied upon without consideration of sample size.

Why is this curve asymptotic? Clearly and typically, I'm confused. Judging by the long list of Similar Questions on my right, so are many others. There must be some perceptual issue here…

Update: This only (for me at least) seems to hold for random numbers generated by Java's SecureRandom. All the other generators I use seem to have chi-square values unrelated to the sample size, as the literature suggests. I had thought that perhaps the chi-square distribution was based on the number of histogram bins being the square root of the number of overall samples.


The chi-square distribution used in the common chi-square test is defined by a single parameter, the degrees of freedom (DF). The degrees of freedom is the number of values in the final calculation of a statistic that are free to vary. For instance, if you have already used the data to calculate the mean, and you then use the same data to calculate the variance (with this mean as an input), you have used up one degree of freedom on the mean, so DF = (number of observations) - 1. For a goodness-of-fit test on binned data the same logic applies to the bins: with 256 possible byte values and the total count fixed, DF = 256 - 1 = 255, whatever the sample size.
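To make the degrees-of-freedom point concrete, here is a minimal sketch of the Pearson statistic for a byte stream, computed at a few sample sizes. It assumes Python's seeded `random.Random` as a stand-in generator (not the SecureRandom source from the question), and the helper name `chi_square_bytes` is my own. If the bytes really are uniform, the statistic approximately follows a chi-square distribution with 255 degrees of freedom, whose mean is 255, so the printed values should scatter around 255 at every sample size rather than shrink towards zero.

```python
import random
from collections import Counter


def chi_square_bytes(data: bytes) -> float:
    """Pearson chi-square statistic of `data` against a uniform
    distribution over the 256 possible byte values (DF = 255)."""
    n = len(data)
    expected = n / 256  # expected count per bin under uniformity
    counts = Counter(data)
    return sum(
        (counts.get(value, 0) - expected) ** 2 / expected
        for value in range(256)
    )


# Assumption: a seeded Mersenne Twister stands in for the generator under test.
rng = random.Random(12345)
for n in (10_000, 100_000, 1_000_000):
    data = bytes(rng.getrandbits(8) for _ in range(n))
    print(n, round(chi_square_bytes(data), 1))
```

What does shrink with sample size is the relative deviation of each bin from its expected count, so a normalised quantity such as the statistic divided by the number of samples would tend towards zero. That may be worth checking against whatever is plotted on the graph's vertical axis.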

