My problem is as follows. I have enrolment data giving the number of first preferences recorded by males and females for two broad fields of education: i) Information Technology and ii) Engineering. A "first preference" means that a person wrote down on paper that their most preferred degree was one in the field of Information Technology or one in the field of Engineering. The data does not reflect actual enrolments. The numbers are as follows:
Female: 266 (~13%)
Male: 1783 (~87%)
Female: 684 (~12.5%)
Male: 4773 (~87.5%)
The question I am trying to answer is: is there a statistically significant difference between the gender ratios of Engineering versus Information Technology, and at what level of significance?
I apologize if this is a noob question. I only have relatively basic background knowledge in statistics and tests of significance and this problem didn't seem to fit any of the methods I am familiar with. I'm hoping someone might be able to point me in the right direction as to which statistical test(s) would be appropriate given the dataset and any caveats I should be aware of.
You are looking to test for a "difference between differences," as James Jaccard would say: "Does the gender difference come out differently depending on whether the field is IT or Engineering?" (I wouldn't pursue the question by looking at ratios per se.) Such questions are typically addressed by testing for statistical interactions in a regression or ANOVA model. In this case you have a binary outcome (enrolled or didn't enroll), so logistic regression would be the natural choice.
With that said, it's hard to imagine too many people caring whether the gender-by-field interaction is statistically significant when it looks so insignificant in practical terms.
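For anyone who wants to see the numbers, here is a minimal sketch using only the four counts from the question. It rests on some assumptions of mine: the counts contain no "didn't enroll" category, so I treat gender (female vs male) as the binary outcome and field as the single predictor. In that saturated logistic model the field coefficient equals the log odds ratio of the 2x2 table, so its Wald test can be computed by hand. The question does not say which pair of counts belongs to which field, so the groups are labelled only by position, and the function names are mine.

```python
import math

# Counts copied from the question. Which block is IT and which is
# Engineering is not stated there, so the labels are positional only.
first_group = {"female": 266, "male": 1783}
second_group = {"female": 684, "male": 4773}


def log_odds_ratio_test(g1, g2):
    """Wald test of the log odds ratio in a 2x2 table of counts.

    This matches the test of the single slope in a logistic regression
    of gender on a binary field indicator (an assumed model setup).
    """
    a, b = g1["female"], g1["male"]
    c, d = g2["female"], g2["male"]
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return log_or, se, z, p


def pearson_chi_square(g1, g2):
    """Pearson chi-square test of independence (1 df, no continuity correction)."""
    a, b = g1["female"], g1["male"]
    c, d = g2["female"], g2["male"]
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


log_or, se, z, p_wald = log_odds_ratio_test(first_group, second_group)
chi2, p_chi2 = pearson_chi_square(first_group, second_group)

print(f"odds ratio = {math.exp(log_or):.3f}, z = {z:.2f}, p = {p_wald:.2f}")
print(f"chi-square = {chi2:.2f}, p = {p_chi2:.2f}")
```

Both tests give a p-value of roughly 0.6, so the difference is nowhere near any conventional significance level, which agrees with the practical reading above. With more predictors or individual-level data you would fit the logistic model with a proper statistics package rather than by hand.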