Can you see any S-curve in the scatter plot below?
If there is an S-curve, is it correct to use a scatter plot? If the linear correlation is 0.85, does that mean an S-curve is unlikely?
I am trying to understand the relationship between the two variables.
Best Answer
1) I see no suggestion of an S-curve in the left hand plot. If I saw that alone, my first instinct would have been 'close to linear, at least over the range we can see' (though the small sample size means it could really be almost anything).
However, if the variable on the Y-axis is necessarily bounded above and below (if it's a percentage, say), we could well argue that the apparent linearity can't continue: the curve must flatten out as it gets nearer to either boundary.
2) You're going to have to narrow the scope of this down a little; there are all kinds of things that could be said here, depending on what you're after.
3) I don't see how having an S-shape would prevent doing a scatter plot – people plot curved relationships all the time. However, if there are natural boundaries to the y-axis variable, plotting with a transformed Y or X may be more informative.
4) The correlation isn't meaningless, but ordinary Pearson correlation measures the strength of the linear relationship, and a high value does not by itself rule out a curve. If the actual relationship were monotonic but not linear, you might prefer a measure that captures the strength of that monotonicity, such as a Spearman or Kendall correlation.
5) The obvious thing to do is to fit a linear relationship and examine residuals, perhaps with a superimposed loess curve on a plot of residuals vs fitted values – if there's really an "S" curve that plot should clearly look like an S tipped on its side.
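To make points 4 and 5 concrete, here is a minimal sketch on made-up data. Everything in it is an assumption for illustration: the data are a noise-free logistic curve (not the asker's data), the helper names `pearson`, `ranks` and `spearman` are hypothetical, and a simple count of sign changes in the residuals stands in for the loess smooth suggested above.

```python
import math

def pearson(a, b):
    """Pearson correlation: strength of the linear relationship."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def ranks(values):
    """Ranks 1..n. Assumes no ties, which holds for the data below."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Synthetic, noise-free S-curve (an assumption, not the data in the question).
n = 24
x = [-6 + 12 * i / (n - 1) for i in range(n)]
y = [1 / (1 + math.exp(-v)) for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)

# Least-squares straight line and its residuals.
mx, my = sum(x) / n, sum(y) / n
slope = (sum((u - mx) * (v - my) for u, v in zip(x, y))
         / sum((u - mx) ** 2 for u in x))
intercept = my - slope * mx
residuals = [v - (intercept + slope * u) for u, v in zip(x, y)]

# x is sorted and the slope is positive, so this is also the order
# of the fitted values.
signs = "".join("+" if e > 0 else "-" for e in residuals)
sign_changes = sum(s != t for s, t in zip(signs, signs[1:]))

print(f"Pearson  = {r_pearson:.3f}")
print(f"Spearman = {r_spearman:.3f}")
print("residual signs, ordered by fitted value:", signs)
print("sign changes:", sign_changes)
```

On data like these the Pearson correlation stays high even though the relationship is a perfect S, while the residual signs run in long blocks rather than scattering at random. With real, noisy data the blocks would be less clean, which is where a smoother on the residual plot earns its keep.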
—
wanted to verify our analysis and saying even if there was an S-curve, linear regression cannot be incorrect
Well, of course fitting a linear model to a nonlinear relationship, if there is one, is 'not correct' in a quite direct sense. However, it would be very rare indeed for a real relationship to follow any particular functional form exactly, and that is especially true of linear relationships.
Over the range observed, a linear fit, even though it's wrong, may still be useful, provided that any conclusions or predictions you want to make stay within the range of the data.
If on the other hand, you have external knowledge about bounds on the variable, you should use it.
For data like those shown, the fitted values won't differ much whether you fit a straight line or a curve that respects the bounds; within the range of the data the two will look very similar. When it comes to prediction outside the range of the data, of course, things will look very different.
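A rough sketch of that comparison, again on made-up data. The assumptions here are mine, not the asker's: the response is a proportion strictly between 0 and 1, it is observed only in the middle of its range, and "using the bounds" is done by fitting a straight line on the logit scale and transforming back, which is only one of several ways to do it. The names `fit_line`, `predict_linear` and `predict_bounded` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares (intercept, slope) for ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((u - mx) * (v - my) for u, v in zip(xs, ys))
             / sum((u - mx) ** 2 for u in xs))
    return my - slope * mx, slope

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(z):
    return 1 / (1 + math.exp(-z))

# Synthetic proportions seen only in the middle of their range
# (an assumption for illustration, not the data in the question).
n = 21
x = [-1 + 2 * i / (n - 1) for i in range(n)]
y = [inv_logit(v) for v in x]

# Straight line on the raw scale, and straight line on the logit scale.
a_lin, b_lin = fit_line(x, y)
a_log, b_log = fit_line(x, [logit(p) for p in y])

def predict_linear(v):
    return a_lin + b_lin * v

def predict_bounded(v):
    # Back-transform, so predictions always stay inside (0, 1).
    return inv_logit(a_log + b_log * v)

# Within the observed range the two sets of fitted values nearly coincide.
max_gap_in_range = max(abs(predict_linear(v) - predict_bounded(v)) for v in x)

# Far outside the observed range they part company.
x_new = 6.0
print(f"largest in-range gap: {max_gap_in_range:.4f}")
print(f"linear  prediction at x = {x_new}: {predict_linear(x_new):.3f}")
print(f"bounded prediction at x = {x_new}: {predict_bounded(x_new):.3f}")
```

Inside the data the two fits are hard to tell apart; at the extrapolated point the straight line gives a "proportion" above 1, while the back-transformed fit cannot leave the interval.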
You could always humor the other person and do what they suggest and see how things compare.