As a follow-up to my question about the birth month of boxers, I am posing the fundamental question along with my hypothesis (testing whether there is any truth to a conclusion an astrologer might make): that good professional boxers would tend to be Taurus, Capricorn or Virgo (the 'earth signs', people of stamina, strength, earthiness, tenacity, power and control), and that we should see fewer 'ethereal' signs in pro boxing (that is, statistically significantly fewer Pisces, Aquarius and Libra; these tend to be more peaceful, docile, 'high minded', fairer, less brutal, less earthy).

I am specifically using boxers and not, for example, NFL players, because people might play football for different reasons and with different skills (speed is an obvious example, or the ability to throw a ball far, or the ability to read a defense, i.e. intelligence). But a professional boxer (particularly middleweights and above) must have strength, endurance, and power (lighter weights might get by on speed alone). Even very fast heavyweights like Ali are quite powerful and possess enormous determination.

In any event, any statistical difference between population birth date/sign averages and those of boxers would be interesting even if causality could not or would not be established. I might then look at other areas—for example, birth dates of serial killers, birth dates of song writers. Astrologers might say that 'Capricorns make good accountants' but how does one get the birth dates (reliably) for accountants? The birth dates for pro boxers and for serial killers and other targeted populations can be found.

My results are these (as mentioned in my previous question about birth months of boxers—I avoided mentioning astrology at that time because it makes some people moan and/or think you are crazy. I'm just curious and investigating).

n = 67. 27 fit into earth signs (my predicted Capricorn, Taurus or Virgo). Is that statistically significant? Is 27/67 statistically different from the predicted 25% of 67 (16.75)? (There are 4 sign categories so I expect 25% to be in any particular sign category).

I also find that 10/67 are the 'anti-boxers' that I predicted; that is, only 10 of 67 are Pisces, Aquarius or Libra. Again, is this statistically significant compared to the expected 16.75 (1/4 of 67)?

Based upon the previous answer to my question, it appears the chi-square test needs to be done, or does my posting here alter the best way to approach this?

I also want to update my list, getting closer to 100. I also may want to refine and separate out flyweights and anything below middleweight—however I think this will reduce the sample size down too far and perhaps it is making a bad assumption about the need for strength, power and stamina in lower weight class fighters.

Does chi-square remain the best way to test this data? Or is there a way to test it using a 'category' approach (12 categories, 1 per sign, or perhaps 4 categories—the first category is the earth sign, the other category is the Pisces/Libra/Aquarius category of 25% of all births)?
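For illustration, the 'category' approach asked about here can be sketched as a chi-square goodness-of-fit test on three groups: earth signs, the Pisces/Aquarius/Libra group, and everyone else. The "other" count of 30 is simply 67 - 27 - 10, and the variable names are made up for this sketch. Note that this test is two-sided (it flags any deviation from the 25% / 25% / 50% split, not just the predicted direction). With 2 degrees of freedom the chi-square survival function is exactly $e^{-x/2}$, so no statistics library is needed.

```python
from math import exp

# Observed counts from the question; "other" is derived as 67 - 27 - 10.
observed = {"earth": 27, "ethereal": 10, "other": 30}
n = sum(observed.values())

# Null hypothesis: each group of three signs gets 25% of births, the rest 50%.
probs = {"earth": 0.25, "ethereal": 0.25, "other": 0.50}
expected = {k: n * probs[k] for k in observed}

# Pearson chi-square statistic.
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

# Three categories -> 2 degrees of freedom; for df = 2 the upper-tail
# probability of the chi-square distribution is exactly exp(-x / 2).
p_value = exp(-chi2 / 2)

print(f"chi-square = {chi2:.3f}, df = 2, p = {p_value:.4f}")
```

Because the prediction here is directional, a one-sided test on each group uses the hypothesis more fully than this omnibus test does.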

#### Best Answer

**If your hypothesis was formulated a priori, then the data are quite strongly significant.**

Your *null hypothesis* is that astrology does not predict anything. This would mean that the probability that a boxer is born under an "earth" sign is $0.25$, and the same is true for the "ethereal" signs. I assume that you selected these signs *a priori*, before looking at the actual birth dates.

You want to disprove the null hypothesis, and you are interested in deviations in one particular direction: more boxers born under earth signs, and fewer under the ethereal signs (not vice versa). This means you can conduct *one-sided tests*. Here is how.

Consider earth signs. Under the null hypothesis, the expected number of boxers born under these signs out of the total of $67$ is $67/4 = 16.75$. But for any integer $x$ between $0$ and $67$ one can compute the probability that exactly that many boxers were born under these signs. This gives a function $p(x)$, known as the *binomial probability mass function*. You can then ask: what is the probability that $27$ or more boxers would be born under an earth sign? The answer is given by the sum $\sum_{x=27}^{67} p(x)$. Computing it gives $0.004$.
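Written out for $n = 67$ and a probability of $0.25$ under the null hypothesis, the binomial probabilities and the tail sum are:

$$p(x) = \binom{67}{x}\, 0.25^{x}\, 0.75^{\,67-x}, \qquad P(X \ge 27) = \sum_{x=27}^{67} p(x) \approx 0.004.$$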

This is known as a *p-value*: a probability that you could have observed your result, or an even more extreme result, under the null hypothesis. P-value of $p=0.004$ is pretty low, and most people would call it "significant", i.e. the data seem to speak against the null hypothesis.

We can do the same with the ethereal signs, arriving at a p-value of $p=0.03$ that $10$ or fewer boxers were born under them. This is also quite low.
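As a cross-check, both one-sided p-values can be reproduced with nothing but the Python standard library. This is a minimal sketch, not the code used for the answer; the helper name `binom_pmf` is made up for illustration.

```python
from math import comb

def binom_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 67, 0.25  # 67 boxers, 25% chance per sign group under the null

# Upper tail: 27 or more boxers born under earth signs.
p_earth = sum(binom_pmf(x, n, p) for x in range(27, n + 1))

# Lower tail: 10 or fewer boxers born under the ethereal signs.
p_ethereal = sum(binom_pmf(x, n, p) for x in range(0, 11))

print(f"P(earth >= 27)    = {p_earth:.4f}")
print(f"P(ethereal <= 10) = {p_ethereal:.4f}")
```

The two sums should agree with the rounded values quoted above (about $0.004$ and $0.03$).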

Note, however, that these two p-values are not independent: if more boxers were born under the earth signs, fewer boxers could have been born under the ethereal signs. I don't know how to compute the probability of observing $27$ or more and $10$ or fewer at the same time, but it is easy to simulate. Let's generate $1{,}000{,}000$ parallel worlds where the null hypothesis is true. We can then count the number of worlds where the number of earth-born boxers is $27$ or more; where the number of ethereal-born boxers is $10$ or fewer; and where both are true. This is called *a Monte Carlo simulation*.

Dividing the counts by $1{,}000{,}000$, I obtain: $0.004$, $0.03$, and $0.0008$. The first two numbers are identical to the ones obtained above. The last number is the most relevant one.
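The joint probability can also be obtained without simulation. Under the null hypothesis the counts (earth, ethereal, other) follow a trinomial distribution with probabilities $0.25$, $0.25$ and $0.5$, so one can sum the trinomial probabilities over all outcomes with at least $27$ earth-born and at most $10$ ethereal-born boxers. The sketch below is an addition for illustration (the helper name `trinomial_pmf` is made up), and should land close to the simulated $0.0008$.

```python
from math import factorial

n = 67
p_earth, p_eth, p_other = 0.25, 0.25, 0.50  # null-hypothesis probabilities

def trinomial_pmf(a, b):
    """P(exactly a earth-born and b ethereal-born boxers out of n)."""
    c = n - a - b  # boxers born under the remaining six signs
    coef = factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return coef * p_earth**a * p_eth**b * p_other**c

# Sum over every outcome with earth >= 27 and ethereal <= 10.
joint = sum(
    trinomial_pmf(a, b)
    for a in range(27, n + 1)
    for b in range(0, min(10, n - a) + 1)
)

print(f"P(earth >= 27 and ethereal <= 10) = {joint:.5f}")
```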

I would argue that $p=0.0008$ is low enough to think that maybe there is something interesting here! If one has some strong *a priori* reasons to doubt astrology, one would want to use a much more stringent criterion than the conventional threshold of $p<0.05$: *"extraordinary claims require extraordinary evidence"*. But $p=0.0008$ looks quite convincing (even though it can still be a fluke).

Finally, let me remind you that all of the above crucially depends on the fact that you selected your Zodiac signs before looking at the data and that you selected your boxers without looking at their birth dates. If that is not true, then the p-value can easily change to $\sim 0.05$, as nicely shown here by @whuber.

## Matlab code:

```matlab
N = 1e+6;                       % number of simulated "parallel worlds"
counts = [0 0 0];

n1 = binornd(67, 0.25, [N 1]);  % earth-born boxers in each world
n2 = binornd(67-n1, 1/3);       % ethereal-born among the remaining boxers

counts(1) = length(find(n1>=27));
counts(2) = length(find(n2<=10));
counts(3) = length(find(n1>=27 & n2<=10));

display(['Monte Carlo results: ' num2str(counts/N, 2)])
display(['Analytical solution: ' num2str(1-binocdf(26,67,0.25), 2)])
display(['Analytical solution: ' num2str(binocdf(10,67,0.25), 2)])
```

Running this takes 4.5 seconds on my laptop and results in

```
Monte Carlo results: 0.0043 0.034 0.00082
Analytical solution: 0.0042
Analytical solution: 0.034
```
