Solved – How to correctly apply a linear trendline equation

I'm a programmer, and unfortunately not a good statistician. I am trying to apply a linear equation (y=mx+b) to display a trendline on a graph. Here's a sample of what I'm able to produce:

[Figure: percent-to-goal data by date (blue line), with the trendline drawn onward from the last day]

The Y-axis is a percentage relative to the goal (not a currency amount), and the X-axis is the date (shown as discrete points in this image). I've tweaked the graph so the trendline starts at the last day (today), so that the two lines visually join into one instead of running alongside each other.

From what I understand, we can calculate a trendline using two or more data points. And that's where my question comes in. I'm using the entire current dataset (July 2-24) to calculate my trendline. So in the image, that would be all data points on the blue line. For example, the first point (7/2/13) is represented as {1,0.28}; the second as {2,0.01}; and so forth.
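For reference, a minimal sketch of the usual way to get $m$ and $b$ from the whole dataset is an ordinary least-squares fit. The function name `fit_line` and the sample values below are hypothetical and only illustrate the calculation; they are not the poster's data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two x values and the same number of y values")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Spread of x around its mean; zero means every x is the same.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    # How x and y vary together around their means.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_mean - m * x_mean  # intercept: the line passes through (x_mean, y_mean)
    return m, b


# Synthetic example (hypothetical values): x = day index, y = fraction of goal.
xs = [1, 2, 3, 4]
ys = [0.10, 0.20, 0.25, 0.35]
m, b = fit_line(xs, ys)
print(f"trendline: y = {m:.4f}x + {b:.4f}")
```

Using every point, rather than just two, is what lets the fit average out day-to-day noise.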

Am I correctly applying the trendline equation by using that dataset?

As a follow-up: should I be using another form of trendline, such as a polynomial? And finally, could my linear equation be derived from two data points from the same time last year? For example, the forecast for July 26-27 would be computed from the historical data for the same dates a year earlier.

If you number your data points $1, 2, 3, \dots$, you are assuming that the dates are equally spaced. It would be better to use "days since start" or something like that. You also have to decide what to do about weekends; maybe they don't count.
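One way to build the "days since start" x values is sketched below, assuming Python's standard `datetime` module. The helper name `days_since_start` and its `skip_weekends` flag are hypothetical; whether weekends should count is a modelling choice, and the flag just makes both options easy to try.

```python
from datetime import date, timedelta


def days_since_start(dates, skip_weekends=False):
    """Map each date to an x value measured from the earliest date.

    With skip_weekends=True only Monday-Friday are counted, so Friday to
    the following Monday is one step instead of three.
    """
    start = min(dates)
    xs = []
    for d in dates:
        if not skip_weekends:
            xs.append((d - start).days)
            continue
        # Count weekdays strictly after start, up to and including d.
        count = 0
        day = start
        while day < d:
            day += timedelta(days=1)
            if day.weekday() < 5:  # 0 = Monday ... 4 = Friday
                count += 1
        xs.append(count)
    return xs


dates = [date(2013, 7, 2), date(2013, 7, 5), date(2013, 7, 8)]
print(days_since_start(dates))                      # [0, 3, 6]
print(days_since_start(dates, skip_weekends=True))  # [0, 3, 4]
```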

Normally the trendline is drawn through the data points: it will be above some and below others. You just need to evaluate $y = mx + b$ starting from the first data point. Extending it past the data is fine, but remember that "extrapolation is much more hazardous than interpolation".
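As a rough sketch of drawing the line from the start of the data and past it, the hypothetical helper `trendline_points` below evaluates $y = mx + b$ at each day and flags the points that lie beyond the last observation. The slope and intercept in the usage lines are illustrative values, not fitted to the poster's data.

```python
def trendline_points(m, b, x_first, x_last, extend=0):
    """Evaluate y = m*x + b at each integer x from x_first to x_last + extend.

    Returns (x, y, extrapolated) tuples; extrapolated is True for x values
    beyond the last data point, where the line is a guess, not a summary.
    """
    points = []
    for x in range(x_first, x_last + extend + 1):
        points.append((x, m * x + b, x > x_last))
    return points


# Hypothetical slope/intercept, data on days 1..4, extended two days further.
for x, y, extrapolated in trendline_points(0.08, 0.025, 1, 4, extend=2):
    label = "extrapolated" if extrapolated else "within data"
    print(f"day {x}: {y:.3f} ({label})")
```

Plotting the extrapolated points in a different style (dashed, for example) is a simple way to keep the interpolation/extrapolation distinction visible on the chart.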

You can use higher-order (polynomial), exponential, or logarithmic trendlines. Choosing among them is not a mathematical question: you might have some reason to think one is a better fit than another. An exponential, for example, would fit something growing by a fixed percentage every month.
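For the exponential case, a common shortcut is to fit a straight line to $\ln y$, since $y = a e^{kx}$ becomes $\ln y = \ln a + kx$. The sketch below assumes every y value is positive; the name `fit_exponential` and the synthetic data are hypothetical. Note that minimising error in $\ln y$ is not quite the same as minimising error in $y$, so this is an approximation of the best exponential fit.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(k * x) by least squares on ln(y); returns (a, k)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two x values and the same number of y values")
    if any(y <= 0 for y in ys):
        raise ValueError("an exponential fit needs strictly positive y values")
    log_ys = [math.log(y) for y in ys]
    x_mean = sum(xs) / n
    log_mean = sum(log_ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the rate is undefined")
    sxl = sum((x - x_mean) * (ly - log_mean) for x, ly in zip(xs, log_ys))
    k = sxl / sxx                        # continuous growth rate per unit of x
    a = math.exp(log_mean - k * x_mean)  # value of the curve at x = 0
    return a, k


# Synthetic data growing exactly 10% per step, so the fit should recover it.
xs = [0, 1, 2, 3]
ys = [2 * 1.1 ** x for x in xs]
a, k = fit_exponential(xs, ys)
print(f"y = {a:.3f} * exp({k:.4f} x), i.e. {math.exp(k) - 1:.1%} growth per step")
```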

One reason to use a trendline is that it tends to remove noise from the data. Your data around July 14, for example, seems rather low. Fitting a line through just two points is mathematically sound, but then any noise in those two points goes straight into the line. Compare the trendline you have with $y = 0$ for all time, which fits the July 4 and July 11 data points, for example.
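To illustrate the noise point, the sketch below compares a line through two points with a least-squares fit over all points. The data are synthetic (a steady upward trend with one unusually low value) and the function names are hypothetical; the point is only that the two-point slope depends heavily on which pair you pick.

```python
def line_through_two_points(p, q):
    """Exact line through two points; returns (m, b)."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_mean - m * x_mean


# Synthetic data: upward trend with one unusually low value at x = 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [0.12, 0.14, 0.00, 0.18, 0.20, 0.22]

m2, b2 = line_through_two_points((xs[0], ys[0]), (xs[2], ys[2]))
m_ls, b_ls = fit_line(xs, ys)
print(f"two points (x=1, x=3): slope {m2:+.4f}")    # pulled negative by the low value
print(f"least squares (all 6): slope {m_ls:+.4f}")  # stays positive
```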

It looks like you are taking the slope of the trendline and then drawing it from the last data point. The trendline also has a constant term $b$, which is what puts the line through the middle of your points.
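For reference, these are the standard least-squares formulas for the slope and constant term, with $\bar{x}$ and $\bar{y}$ the means of the data. The second formula shows why the fitted line passes through the mean point $(\bar{x}, \bar{y})$:

$$m = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad b = \bar{y} - m\bar{x} \quad\Longrightarrow\quad m\bar{x} + b = \bar{y}.$$

Keeping the same slope but anchoring it at the last point $(x_n, y_n)$ gives

$$y = y_n + m(x - x_n) = mx + b + \bigl(y_n - (m x_n + b)\bigr),$$

so that line is the trendline shifted up or down by the last point's residual $y_n - (m x_n + b)$; the two coincide only when the last point happens to lie on the trendline.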

You might read chapter 15 of Numerical Recipes or some other numerical analysis text.
