I have a set of data, $y$ and $x$. I would like to test the following hypothesis: there is a peak in $y$; that is, as $x$ increases, $y$ first increases and then decreases.
My first idea was to fit $x$ and $x^2$ in a linear regression. That is, if I find that the coefficient on $x$ is significantly positive and the coefficient on $x^2$ is significantly negative, then I have support for the hypothesis. However, this only checks for one type of relationship (quadratic) and may not necessarily capture the existence of the peak.
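To make the first idea concrete, here is a minimal sketch in plain Python. It assumes ordinary least squares on $1$, $x$, $x^2$, and the helper names `fit_quadratic` and `peak_location` are made up for illustration. It does no significance testing and ignores the fact that $y$ may be a count. What it does illustrate is that the signs of the coefficients alone are not enough: the fitted vertex $-b_1/(2b_2)$ should also fall inside the observed range of $x$, otherwise the fitted curve is monotone over the data.

```python
def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    # s[k] = sum of x^k, t[k] = sum of y * x^k
    s = [sum(xi ** k for xi in x) for k in range(5)]
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]
    # Augmented 3x4 system (X'X | X'y) for columns 1, x, x^2
    A = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][3] / A[i][i] for i in range(3)]


def peak_location(x, y):
    """Return the fitted vertex if it is an interior maximum, else None."""
    b0, b1, b2 = fit_quadratic(x, y)
    if b2 >= 0:
        return None  # fitted curve is linear or opens upward: no maximum
    vertex = -b1 / (2 * b2)
    # A maximum outside the data range means the fit is monotone over the data
    return vertex if min(x) < vertex < max(x) else None


x = list(range(11))
y = [-(xi - 4) ** 2 + 20 for xi in x]  # toy data with a peak at x = 4
print(peak_location(x, y))
```

In a real analysis the coefficient tests would come from a regression package, and for count data a Poisson-type model would probably be a better starting point than least squares.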
Then I thought of finding a region $b$ of the (sorted) values of $x$ such that $b$ lies between two other regions $a$ and $c$, each containing at least as many points as $b$, and such that $\bar{y}_b > \bar{y}_a$ and $\bar{y}_b > \bar{y}_c$ significantly. If the hypothesis is true, we should expect many such regions $b$. Thus, if the number of such regions $b$ is sufficiently large, there should be support for the hypothesis.
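One way to sketch the region comparison for a single, pre-chosen split is a permutation test. Everything here is an assumption for illustration: the function names, the statistic $\min(\bar{y}_b - \bar{y}_a,\ \bar{y}_b - \bar{y}_c)$, and the choice of a permutation null (that $y$ is unrelated to $x$, which is stronger than "there is no peak").

```python
import random
from statistics import mean


def peak_statistic(y_sorted, i, j):
    """min(mean_b - mean_a, mean_b - mean_c) for a=[:i], b=[i:j], c=[j:]."""
    a, b, c = y_sorted[:i], y_sorted[i:j], y_sorted[j:]
    return min(mean(b) - mean(a), mean(b) - mean(c))


def permutation_peak_test(x, y, i, j, n_perm=2000, seed=0):
    """One-sided permutation p-value for the middle block being highest."""
    rng = random.Random(seed)
    # Order y by the sorted values of x
    y_sorted = [yi for _, yi in sorted(zip(x, y), key=lambda p: p[0])]
    observed = peak_statistic(y_sorted, i, j)
    shuffled = y_sorted[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between x and y
        if peak_statistic(shuffled, i, j) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive
    return observed, (hits + 1) / (n_perm + 1)


x = list(range(30))
y = [max(0, 10 - abs(xi - 15)) for xi in x]  # toy counts peaking at x = 15
stat, p_value = permutation_peak_test(x, y, 10, 20)
print(stat, p_value)
```

Because the statistic takes the minimum of the two differences, a purely increasing or decreasing trend gives a negative value and is not flagged. Scanning over many candidate regions $b$, as proposed above, would need a correction for multiplicity, for example by taking the maximum of the statistic over all splits within each permutation.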
Do you think I am on the right track to find a suitable test for my hypothesis? Or am I reinventing the wheel, and is there an established method for this problem? I will greatly appreciate your input.
UPDATE. My dependent variable $y$ is a count (non-negative integer).
Best Answer
I was thinking of the smoothing idea also. But there is a whole area called response surface methodology that searches for peaks in noisy data (it primarily involves fitting local quadratics to the data), and I recall a famous paper with "Bump hunting" in the title. Here are some links to books on response surface methodology. Ray Myers's books are particularly well written. I will try to find the bump hunting paper.
Response Surface Methodology: Process and Product Optimization Using Designed Experiments
Response Surface Methodology And Related Topics
Empirical Model-Building and Response Surfaces
Although not the article I was looking for, here is a very relevant article by Jerry Friedman and Nick Fisher that deals with these ideas applied to high-dimensional data.
Here is an article with some online comments.
So I hope you at least appreciate my response. I think your ideas are good and on the right track, but yes, I do think you might be reinventing the wheel, and I hope you and others will look at these excellent references.