I am a PhD student having some difficulty choosing a test to analyse an experiment. I hope you can give me some help!
My experiment:
I am counting the number of a subset of cells in Drosophila over time.
I have a mutant and a wild type control. I dissect the flies at different time points: just eclosed, 1 day after eclosion (dae), 3 dae, 6 dae, 9 dae, 12 dae and 15 dae. For each time point I have an n of around 50 per sample. The count in each sample ranges from 0 to 9, depending on the day and on whether the fly is mutant or wild type.
The problem:
What happens with these cells is that in the wild type population the number of cells does not decrease over time, while in the mutants it does. Moreover, the wild type population has an average of around 6 (on all dissection days), while the mutant starts at 4 and goes down to 2. If they had started with the same number I could have used an ANOVA with a post hoc test (like Tukey) to show the difference between wild type and mutant for each day, but in this scenario an ANOVA would only tell me that every wild type group is different from the mutant dissected at the same dae (even at eclosion!) because of their baseline difference, or that, for example, mutant dae 15 is different from mutant dae 0, 1 and 3. Moreover, what I would like to test is not that the wild type is different from the mutant at a particular time point, but that the mutant numbers decrease in a different way compared to the control numbers.
Possible solutions:
One possibility would be to do an ANOVA on a normalized sample: I could divide every number (and I mean the individual counts, not the average) by the average for the eclosion day, and do an ANOVA on that ratio (so I would have normalised the two populations to the eclosion day and it would no longer matter that their raw numbers are different). But something tells me I would mess up all the statistics: I am not sure this approach is statistically correct.
Another possibility I was thinking of is to use some kind of correlation or regression analysis, but in the examples I found I never saw this approach applied to my kind of experiment, so I am not sure how to work this out. The various handbooks always show this kind of approach on smaller experiments, like comparing blood pressure and coffee consumption, and with far fewer data points, so I am not sure what I would end up with after plotting 600 different points together.
Another possibility would be to compare the slopes of the two lines, but I am not sure how I could do this using all the individual counts and not only the final averages.
Someone suggested doing a Kaplan-Meier test, but I cannot see how to apply this kind of test (logrank) using all the counts that make up one time point and not only the final average.
What I am going to do if I do not find a better solution is: just do an ANOVA (yes, each single set of counts is normally distributed) on the raw numbers and show that Wt Day0 and Day15 are not statistically different, while Day0 and Day15 from the mutant are statistically different (or even Day0 and Day12, etc.). Maybe this would be enough and I am just over-thinking the problem, but somehow it does not seem to fully describe what I am analysing.
If you have any idea how this problem could be solved, I would be grateful!
Best Answer
"Someone suggested doing a Kaplan-Meier test, but I cannot see how to apply this kind of test (logrank) using all the counts that make up one time point and not only the final average."
While your problem feels like a survival analysis problem, and you could show what you want to show with a set of KM curves, I think it may be a flawed approach. In order to do a survival analysis, one of two things would have to be true:
- You know all the cell counts for all flies at each sampling point. Since you have to dissect them, and only dissect a portion, this isn't true.
- You might be able to ignore this if you could credibly say "the undissected flies all have counts identical to the dissected ones", but I have no idea whether this is true, and if you have any variability in your data, it likely won't be.
I might indeed go with a regression approach, showing that the fitted line of cell number against time slopes downward for the mutants, but not for the wild type. Don't worry about having too much data – the folks here regularly run regression problems on thousands, tens of thousands or more data points. Your books likely don't cover it because it's really easy to see what's happening with small amounts of data, but the technique can handle plenty.
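To make the regression idea a bit more concrete, here is a minimal sketch in plain Python. It fits one least-squares line per genotype using every individual count (not the daily averages) and then compares the two slopes. The slope estimates are the same ones a single regression with a genotype-by-time interaction term would give. Everything numeric here is an assumption for illustration: the data are simulated (normal noise with SD 1, rounded and clipped to 0-9) to roughly mimic the pattern described in the question, and the helper names `simulate` and `fit_line` are made up. The final statistic is only an approximate z-type comparison, not a substitute for a proper model fit in a statistics package.

```python
import math
import random

DAYS = [0, 1, 3, 6, 9, 12, 15]  # days after eclosion, as in the question


def simulate(start, end, n_per_day, rng):
    """Return simulated (day, count) pairs; NOT real data.

    The mean moves linearly from `start` at day 0 to `end` at day 15,
    with assumed normal noise (SD 1), rounded and clipped to 0..9.
    """
    data = []
    for day in DAYS:
        mean = start + (end - start) * day / DAYS[-1]
        for _ in range(n_per_day):
            count = round(rng.gauss(mean, 1.0))
            data.append((day, min(9, max(0, count))))
    return data


def fit_line(data):
    """Ordinary least squares of count on day.

    Returns (slope, intercept, standard error of the slope).
    """
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in data)
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope_se


rng = random.Random(0)  # fixed seed so the sketch is reproducible
wild_type = simulate(6.0, 6.0, 50, rng)  # flat around 6 (assumed)
mutant = simulate(4.0, 2.0, 50, rng)     # declines from 4 to 2 (assumed)

slope_wt, intercept_wt, se_wt = fit_line(wild_type)
slope_mut, intercept_mut, se_mut = fit_line(mutant)

# Difference in slopes and an approximate z-type statistic for it.
slope_diff = slope_mut - slope_wt
z = slope_diff / math.sqrt(se_wt ** 2 + se_mut ** 2)

print(f"wild type slope: {slope_wt:.3f} cells/day")
print(f"mutant slope:    {slope_mut:.3f} cells/day")
print(f"difference:      {slope_diff:.3f} (approx. z = {z:.1f})")
```

Because each genotype gets its own intercept, the different starting levels (about 6 versus 4) do not get in the way: only the difference in slopes is being compared. With roughly 350 individual counts per genotype the fit is no harder than with a handful.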
My best advice is to find someone in your department, or a related department, that does regression on a regular basis, and talk to them about how to approach your problem, but it doesn't seem too difficult on the surface.