This is my first question on stats; I'm just trying to learn the basics of time series analysis with `R`, so any suggestions for good learning resources will be highly appreciated, as well as an answer to the question.

For the data below (let's say it represents the number of website visits per day), I would like to find out:

- What the weekly pattern is (e.g. the highest number of visits occurs on Thursdays, the lowest on Fridays, etc.).
- How to automatically detect changes in that pattern (e.g. in 2008 most of the visits occur on Thursdays, but from 2009-01-04 on the pattern changes to something else).

Please let me know if I can provide more details.

```
> str(daily)
An ‘xts’ object on 2007-02-19 23:32:16/2013-05-05 15:09:17 containing:
  Data: num [1:2268, 1] 55 32 70 48 75 50 48 46 36 55 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr "cnt"
  Indexed by objects of class: [POSIXct,POSIXt] TZ:
  xts Attributes:
 NULL
> head(daily)
                    cnt
2007-02-19 23:32:16  55
2007-02-20 23:58:58  32
2007-02-21 23:40:41  70
2007-02-22 23:01:41  48
2007-02-23 23:53:06  75
2007-02-24 23:47:07  50
> plot(daily)
```

Full dataset: https://dl.dropboxusercontent.com/u/65347419/daily.csv
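As a rough starting point for the first question, here is a minimal sketch in Python (the same steps carry over to R) that averages the counts by weekday. It uses only the six rows printed by `head(daily)`. With a single week it simply echoes the data; with the full CSV the averages become meaningful. The helper name `weekday_profile` is hypothetical. Every reading is taken late in the evening, so I am assuming the calendar date of each timestamp is the day being counted.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# The six rows printed by head(daily): (timestamp, visit count).
ROWS = [
    ("2007-02-19 23:32:16", 55),
    ("2007-02-20 23:58:58", 32),
    ("2007-02-21 23:40:41", 70),
    ("2007-02-22 23:01:41", 48),
    ("2007-02-23 23:53:06", 75),
    ("2007-02-24 23:47:07", 50),
]

def weekday_profile(rows):
    """Return {weekday name: mean count}, keyed on the calendar date of each timestamp."""
    by_day = defaultdict(list)
    for stamp, count in rows:
        weekday = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").strftime("%A")
        by_day[weekday].append(count)
    return {day: mean(counts) for day, counts in by_day.items()}

profile = weekday_profile(ROWS)
print(max(profile, key=profile.get), min(profile, key=profile.get))  # Friday Tuesday
```

Running the same profile separately per year (or per rolling window) is one simple way to see whether the weekly pattern moves over time.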


#### Best Answer

You will need to consider 6 day-of-week dummies, 11 monthly dummies and your ~10-15 holiday dummy variables. You will NOT want to consider any ARIMA terms, as you want to rely on the deterministic variables already listed. You will also need to consider:

- trend (a time-index variable like 1, 2, 3, 4, 5, 6, etc.), and perhaps changes in trend, of which there could be several (e.g. 0, 0, 0, 0, 1, 2, 3, 4, 5, etc.);
- outliers;
- level shifts;
- changes in seasonality (i.e. seasonal pulses, as you very smartly point out that there are changes in the day-of-the-week pattern!);
- lead and lag impacts around the holidays.

There might also be day-of-the-month variables, but we see those more with datasets tied to cash, as payday is usually around the end and the middle of the month.
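To make that list of deterministic regressors concrete, here is a minimal Python sketch (standard library only) that builds one row of such a design matrix for a given date. The row holds 6 day-of-week dummies, 11 month dummies, a linear trend, and a broken trend that stays 0 until a chosen break period. The function name, the choice of Sunday and March as the omitted baselines, and the break period are illustrative assumptions. Holiday dummies, outliers and level shifts would be added as further columns.

```python
from datetime import date, timedelta

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def design_row(day, start, break_period):
    """Regressors for one day; Sunday and March are the omitted baselines (assumption)."""
    t = (day - start).days + 1                            # trend: 1, 2, 3, ...
    row = {f"dow_{d}": int(day.weekday() == i)            # 6 day-of-week dummies
           for i, d in enumerate(DAYS) if d != "Sun"}
    row.update({f"month_{m:02d}": int(day.month == m)     # 11 month dummies
                for m in range(1, 13) if m != 3})
    row["trend"] = t
    row["trend_break"] = max(0, t - break_period + 1)     # 0, ..., 0, 1, 2, 3, ... from break_period on
    return row

start = date(2010, 3, 1)
rows = [design_row(start + timedelta(days=k), start, break_period=5) for k in range(7)]
print([r["trend_break"] for r in rows])  # [0, 0, 0, 0, 1, 2, 3]
```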

You would then need to remove the dummy variables that are insignificant. When you are done, you can do a poor man's check by comparing each coefficient in the model with that category's percentage of the total, to see if they make sense. For example, if Monday contributes 50% of the overall volume, then your Monday dummy should be POSITIVE and much larger than the other coefficients.
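Here is a minimal sketch of that poor man's check, assuming the counts have already been grouped by weekday. It reports each day's share of the total and how far the day's mean sits from the overall mean. The function name and the toy numbers are hypothetical. The toy data is built so that Monday carries 50% of the volume, as in the example above. A regression coefficient is measured against the omitted baseline day rather than the overall mean, so expect only the sign and the ranking to line up, not the magnitudes.

```python
from statistics import mean

def poor_mans_check(counts_by_day):
    """For each day: (share of total volume, mean for that day minus the overall mean)."""
    total = sum(sum(v) for v in counts_by_day.values())
    overall = mean(x for v in counts_by_day.values() for x in v)
    return {day: (sum(v) / total, mean(v) - overall)
            for day, v in counts_by_day.items()}

# Toy data (hypothetical): Monday carries 50% of the volume.
toy = {"Mon": [300], "Tue": [50], "Wed": [50], "Thu": [50],
       "Fri": [50], "Sat": [50], "Sun": [50]}
check = poor_mans_check(toy)
print(check["Mon"])
```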

Feel free to post your data and I would be glad to look at it. Just make sure to state the date of the first observation and the country the data is from, so that the appropriate holidays can be brought in.

We have been working on time series since 1975 and on daily data since 1998.


Yes, I appreciate your goal of learning how to do this, but the best I can do is this. Maybe you can reverse-engineer it?

Sorry for the delay!

OK, we have analyzed your data and here are our findings. We reduced the data set to the last 1,162 observations, so the data begins on Monday 3/1/2010. The Monday start date is very important when interpreting the day-of-the-week variables. While 6 years of data can be helpful, in this case it is too much, as the counts are so small at the beginning.
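Here is a quick sanity check of those dates with Python's standard `datetime` module, assuming one row per calendar day with no gaps. It shows that 3/1/2010 is a Monday. It also shows that 1,162 rows from there end on 2013-05-05, the last timestamp in `str(daily)`. Finally, the 1,106 dropped rows plus the 1,162 kept ones add up to the 2,268 rows of the original series.

```python
from datetime import date, timedelta

first_row = date(2007, 2, 19)   # first timestamp in str(daily)
start = date(2010, 3, 1)        # first observation kept for the model
n_obs = 1162                    # observations kept

end = start + timedelta(days=n_obs - 1)
dropped = (start - first_row).days

print(start.strftime("%A"), end, dropped + n_obs)  # Monday 2013-05-05 2268
```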

Here is a summary of the average and holidays:

The average demand is 211.

Let's review the holidays. Demand starts to decrease 4 days before Christmas, by 48.30 on that first day. Thanksgiving has an impact on the day of and the day after. Most holidays have a negative impact, except St. Patrick's Day.

$$
\begin{aligned}
Y(T) ={}& 211.59 \\
&+ \left(-48.2988B^{-4} - 59.9340B^{-3} - 100.12B^{-2} - 238.07B^{-1} - 150.08 - 352.52B\right) X_{1}(T) \quad \mathrm{M\_CHRISTMAS} \\
&+ (-64.4805)\, X_{2}(T) \quad \mathrm{M\_CINCODEMAYO} \\
&+ (-15.8391)\, X_{3}(T) \quad \mathrm{M\_COLUMBUS} \\
&+ \left(-31.9900B^{-3} - 100.62 - 11.5661B - 27.8408B^{2} - 14.4484B^{4}\right) X_{4}(T) \quad \mathrm{M\_GOODFRIDAY} \\
&+ (-14.9771B)\, X_{5}(T) \quad \mathrm{M\_FATHERSDAY} \\
&+ (-36.0068B)\, X_{6}(T) \quad \mathrm{M\_HALLOWEEN} \\
&+ (-195.05 - 103.42B)\, X_{7}(T) \quad \mathrm{M\_JULY4TH} \\
&+ (-198.80 - 53.2202B)\, X_{8}(T) \quad \mathrm{M\_LABORDAY} \\
&+ (-28.3956 - 29.5278B)\, X_{9}(T) \quad \mathrm{M\_MARDIGRAS} \\
&+ (-81.8183)\, X_{10}(T) \quad \mathrm{M\_MARTINLKING} \\
&+ (-209.58 - 19.8445B)\, X_{11}(T) \quad \mathrm{M\_MEMORIALDAY} \\
&+ \left(-166.62B^{-4} - 82.5935B^{-3} - 73.4411B^{-2} - 218.53B^{-1} - 117.53 - 115.39B\right) X_{12}(T) \quad \mathrm{M\_NEWYEARS} \\
&+ (-113.83 - 12.5742B)\, X_{13}(T) \quad \mathrm{M\_PRESIDENTS} \\
&+ (+13.2287B)\, X_{14}(T) \quad \mathrm{M\_STPATRICKS} \\
&+ (-37.4732B)\, X_{15}(T) \quad \mathrm{M\_STVALENTINES} \\
&+ (-244.69 - 206.13B)\, X_{16}(T) \quad \mathrm{M\_THANKSGIVI} \\
&+ (-42.7715 + 17.9379B^{4} + 13.8972B^{5})\, X_{17}(T) \quad \mathrm{M\_VETERANSDAY} \\
&+ \cdots
\end{aligned}
$$
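To read these transfer functions, note that B is the backshift operator, so B^k X(T) = X(T-k). A coefficient on B^-4 therefore applies 4 days before the holiday, and one on B^1 applies the day after. Below is a minimal sketch that maps the Christmas weights onto calendar dates, using Christmas 2012 as the example. The helper name `holiday_effects` is hypothetical.

```python
from datetime import date, timedelta

# Christmas weights from the model: {k: coefficient} for each term c * B**k.
CHRISTMAS = {-4: -48.2988, -3: -59.9340, -2: -100.12, -1: -238.07, 0: -150.08, 1: -352.52}

def holiday_effects(holiday, weights):
    """Map each lead/lag weight onto a calendar date.

    B**k X(T) = X(T-k), so the weight on B**k lands k days after the holiday
    (negative k is a lead, i.e. days before the holiday).
    """
    return {holiday + timedelta(days=k): c for k, c in sorted(weights.items())}

for day, effect in holiday_effects(date(2012, 12, 25), CHRISTMAS).items():
    print(day, effect)
```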

Autobox searches for impacts when holidays land on a Monday or a Friday. The Monday after a holiday on a Friday is lower by 57.23. When there is a holiday on a Friday or a Monday, the weekend has lower demand, by 3.46.

`+[X18(T)][(- 57.2342)] MONDAY_AFTER +[X19(T)][(- 3.4639)] LONGWEEKEND`
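The answer does not define these two variables precisely. What follows is only a guess at plausible definitions, not Autobox's actual construction:

- `MONDAY_AFTER` flags a Monday whose preceding Friday was a holiday.
- `LONGWEEKEND` flags the Saturday and Sunday next to a Friday or Monday holiday.

The function names and the two example holidays (Good Friday 2012 and Memorial Day 2011) are illustrative.

```python
from datetime import date, timedelta

def monday_after(day, holidays):
    """1 on a Monday whose preceding Friday was a holiday (hypothetical definition)."""
    return int(day.weekday() == 0 and day - timedelta(days=3) in holidays)

def long_weekend(day, holidays):
    """1 on a Saturday or Sunday next to a Friday or Monday holiday (hypothetical definition)."""
    if day.weekday() not in (5, 6):
        return 0
    friday = day - timedelta(days=day.weekday() - 4)   # Saturday: 1 day back, Sunday: 2
    monday = friday + timedelta(days=3)
    return int(friday in holidays or monday in holidays)

holidays = {date(2012, 4, 6), date(2011, 5, 30)}   # Good Friday 2012, Memorial Day 2011
print(monday_after(date(2012, 4, 9), holidays), long_weekend(date(2011, 5, 29), holidays))  # 1 1
```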

The month-of-the-year pattern has February and March as the largest months and August as the lowest. March is the baseline absorbed in the intercept. February's dummy was not significant, so February sits at the same level as March.

```
 +[X20(T)][(- 19.7274)] MONTH_EFF04
 +[X21(T)][(- 47.0142)] MONTH_EFF05
 +[X22(T)][(- 78.4654)] MONTH_EFF06
 +[X23(T)][(- 88.7855)] MONTH_EFF07
 +[X24(T)][(- 91.1418)] MONTH_EFF08
 +[X25(T)][(- 84.4558)] MONTH_EFF09
 +[X26(T)][(- 75.2718)] MONTH_EFF10
 +[X27(T)][(- 65.9504)] MONTH_EFF11
 +[X28(T)][(- 47.3812)] MONTH_EFF12
 +[X29(T)][(- 14.9862)] MONTH_EFF01
```
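Here is a small sketch of reading those month coefficients, assuming every other term is held fixed. Each value is the shift relative to March. February is set to 0 here because its dummy was dropped as insignificant. Sorting the values puts August lowest and February/March highest.

```python
# Month effects relative to March (the omitted baseline); February's dummy
# was dropped as insignificant, so it is treated as 0 here.
MONTH_EFF = {1: -14.9862, 2: 0.0, 3: 0.0, 4: -19.7274, 5: -47.0142, 6: -78.4654,
             7: -88.7855, 8: -91.1418, 9: -84.4558, 10: -75.2718, 11: -65.9504, 12: -47.3812}

ranked = sorted(MONTH_EFF, key=MONTH_EFF.get)   # lowest effect first
print(ranked[0], ranked[-2:])  # 8 [2, 3]
```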

Saturdays are the lowest. Sundays are not shown, as Sunday is the baseline absorbed in the intercept of 211.59, so they sit at the average. Tuesdays and Wednesdays are the highest. Remember that Monday was the first day in the dataset, so the first variable reflects Monday.

```
 +[X30(T)][(+ 189.10 )] FIXED_EFF_N10107
 +[X31(T)][(+ 232.88 )] FIXED_EFF_N10207
 +[X32(T)][(+ 231.11 )] FIXED_EFF_N10307
 +[X33(T)][(+ 219.69 )] FIXED_EFF_N10407
 +[X34(T)][(+ 154.80 )] FIXED_EFF_N10507
 +[X35(T)][(- 30.4825)] FIXED_EFF_N10607
```
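The dummies are numbered from the first observation, so mapping them to weekday names depends on the start date. Below is a minimal sketch that attaches names to the six coefficients and adds the 211.59 intercept to get implied levels. It assumes period 1 is Monday 3/1/2010 and holds all other terms at zero.

```python
from datetime import date, timedelta

INTERCEPT = 211.59
# FIXED_EFF_N10107 ... N10607 in order, then the omitted seventh day at 0.
DAY_EFF = [189.10, 232.88, 231.11, 219.69, 154.80, -30.4825, 0.0]

start = date(2010, 3, 1)  # period 1; assumed to define day 1 of the weekly cycle
names = [(start + timedelta(days=i)).strftime("%a") for i in range(7)]
levels = {name: INTERCEPT + eff for name, eff in zip(names, DAY_EFF)}
print(max(levels, key=levels.get), min(levels, key=levels.get))  # Tue Sat
```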

There are two time trends. The first begins at period 1 and indicates an increase in volume of 0.752 per day. The second is negative, at -0.630 per day, and starts at period 583 (10/4/2011). In general the trend is still up (i.e. 0.752 - 0.630 = +0.122 per day).

```
 +[X36(T)][(+ .752)] :TIME TREND 1 1/ 1 3/ 1/2010 I~T00001__030110
 +[X37(T)][(- .630)] :TIME TREND 583 84/ 2 10/ 4/2011 I~T00583__030110
```
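Here is a sketch of how the two trend terms combine. It assumes the second trend is 0 before period 583 and then counts 1, 2, 3, ... from it. The exact origin Autobox uses is an assumption here, but it does not change the slopes. The function name `trend_effect` is hypothetical.

```python
def trend_effect(t, slope1=0.752, slope2=-0.630, break_at=583):
    """Combined deterministic trend at period t (t = 1 on 2010-03-01).

    Assumes the second trend is 0 before `break_at` and counts 1, 2, 3, ... from it.
    """
    return slope1 * t + slope2 * max(0, t - break_at + 1)

per_day_before = trend_effect(582) - trend_effect(581)
per_day_after = trend_effect(1000) - trend_effect(999)
print(round(per_day_before, 3), round(per_day_after, 3))  # 0.752 0.122
```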

There are 22 one-time (pulse) outliers, 4 level shifts (changes in the intercept) and 9 seasonal pulses reflecting a change in the day-of-the-week pattern.

It looks like days 6 and 7 (Saturday and Sunday) have shifted lower several times. Drops were found on Saturdays beginning 1/15/2011, 1/29/2011, 3/5/2011 and 2/11/2012. Sundays had similar drops, beginning 3/28/2010, 1/16/2011, 1/23/2011 and 1/8/2012. Day 2 (Tuesday) had an increase of +27.8687 beginning 5/15/2012.

Four level shifts occurred:

- a decrease of 4.44 beginning 10/25/2010;
- a decrease of 64.65 beginning 4/13/2011;
- a decrease of 32.45 beginning 8/13/2011;
- an increase of 35.43 beginning 3/19/2012.
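For reference, the three kinds of intervention variables can be written as 0/1 indicators. This is a minimal sketch under the usual definitions, which is an assumption about how Autobox codes them:

- A pulse is 1 at a single period.
- A level shift is 1 from its start onward.
- A seasonal pulse is 1 at its start and every 7 days after, so it changes the effect of one weekday from then on.

The example uses the Saturday seasonal pulse at period 321 (1/15/2011).

```python
def pulse(n, t0):
    """1 at period t0 only (a one-time outlier)."""
    return [int(t == t0) for t in range(1, n + 1)]

def level_shift(n, t0):
    """0 before t0, 1 from t0 on (a permanent change in the intercept)."""
    return [int(t >= t0) for t in range(1, n + 1)]

def seasonal_pulse(n, t0, period=7):
    """1 at t0 and every `period` days after it (a change in one weekday's effect)."""
    return [int(t >= t0 and (t - t0) % period == 0) for t in range(1, n + 1)]

# Periods flagged by the Saturday seasonal pulse starting at period 321.
print([t for t, v in enumerate(seasonal_pulse(340, 321), start=1) if v])  # [321, 328, 335]
```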

```
 +[X38(T)][(- 43.0325)] :SEASONAL PULSE 713 102/ 6 2/11/2012 I~S00713__030110tet
 +[X39(T)][(- 42.1592)] :SEASONAL PULSE 679 97/ 7 1/ 8/2012 I~S00679__030110tet
 +[X40(T)][(- 354.57 )] :PULSE 1031 148/ 2 12/25/2012 I~P01031__030110tet
 +[X41(T)][(- 348.37 )] :PULSE 1038 149/ 2 1/ 1/2013 I~P01038__030110tet
 +[X42(T)][(- 231.82 )] :PULSE 1033 148/ 4 12/27/2012 I~P01033__030110tet
 +[X43(T)][(+ 241.57 )] :PULSE 301 43/ 7 12/26/2010 I~P00301__030110tet
 +[X44(T)][(+ 85.0799)] :PULSE 1156 166/ 1 4/29/2013 I~P01156__030110tet
 +[X45(T)][(+ 240.85 )] :PULSE 689 99/ 3 1/18/2012 I~P00689__030110tet
 +[X46(T)][(+ 44.4059)] :PULSE 1159 166/ 4 5/ 2/2013 I~P01159__030110tet
 +[X47(T)][(- 50.5678)] :SEASONAL PULSE 329 47/ 7 1/23/2011 I~S00329__030110tet
 +[X48(T)][(- 32.8224)] :SEASONAL PULSE 28 4/ 7 3/28/2010 I~S00028__030110tet
 +[X49(T)][(- 26.9859)] :SEASONAL PULSE 370 53/ 6 3/ 5/2011 I~S00370__030110tet
 +[X50(T)][(+ 27.8687)] :SEASONAL PULSE 807 116/ 2 5/15/2012 I~S00807__030110tet
 +[X51(T)][(- 177.84 )] :PULSE 667 96/ 2 12/27/2011 I~P00667__030110tet
 +[X52(T)][(+ 35.4320)] :LEVEL SHIFT 750 108/ 1 3/19/2012 I~L00750__030110tet
 +[X53(T)][(- 64.6500)] :LEVEL SHIFT 409 59/ 3 4/13/2011 I~L00409__030110tet
 +[X54(T)][(- 4.4417)] :LEVEL SHIFT 239 35/ 1 10/25/2010 I~L00239__030110tet
 +[X55(T)][(+ 173.26 )] :PULSE 585 84/ 4 10/ 6/2011 I~P00585__030110tet
 +[X56(T)][(+ 179.15 )] :PULSE 690 99/ 4 1/19/2012 I~P00690__030110tet
 +[X57(T)][(- 107.75 )] :PULSE 24 4/ 3 3/24/2010 I~P00024__030110tet
 +[X58(T)][(+ 203.26 )] :PULSE 126 18/ 7 7/ 4/2010 I~P00126__030110tet
 +[X59(T)][(+ 208.55 )] :PULSE 664 95/ 6 12/24/2011 I~P00664__030110tet
 +[X60(T)][(- 27.9535)] :SEASONAL PULSE 335 48/ 6 1/29/2011 I~S00335__030110tet
 +[X61(T)][(- 74.6895)] :SEASONAL PULSE 322 46/ 7 1/16/2011 I~S00322__030110tet
 +[X62(T)][(- 106.79 )] :PULSE 1030 148/ 1 12/24/2012 I~P01030__030110tet
 +[X63(T)][(- 33.8918)] :PULSE 1158 166/ 3 5/ 1/2013 I~P01158__030110tet
 +[X64(T)][(- 32.0515)] :PULSE 1161 166/ 6 5/ 4/2013 I~P01161__030110tet
 +[X65(T)][(- 32.4514)] :LEVEL SHIFT 531 76/ 6 8/13/2011 I~L00531__030110tet
 +[X66(T)][(+ 137.78 )] :PULSE 1095 157/ 3 2/27/2013 I~P01095__030110tet
 +[X67(T)][(+ 168.20 )] :PULSE 1128 162/ 1 4/ 1/2013 I~P01128__030110tet
 +[X68(T)][(- 127.34 )] :PULSE 633 91/ 3 11/23/2011 I~P00633__030110tet
 +[X69(T)][(- 57.6397)] :SEASONAL PULSE 321 46/ 6 1/15/2011 I~S00321__030110tet
 +[X70(T)][(+ 208.15 )] :PULSE 671 96/ 6 12/31/2011 I~P00671__030110tet
 +[X71(T)][(- 124.65 )] :PULSE 1107 159/ 1 3/11/2013 I~P01107__030110tet
 +[X72(T)][(- 102.15 )] :PULSE 429 62/ 2 5/ 3/2011 I~P00429__030110tet
 + [A(T)]
```
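Each label reads as: period number, then week/day within the week, then the calendar date. Below is a minimal sketch that decodes them, assuming period 1 is Monday 3/1/2010 and there are no gaps in the data. The helper `decode_period` is hypothetical. It also shows that the level shift at period 531 falls on 8/13/2011.

```python
from datetime import date, timedelta

START = date(2010, 3, 1)  # period 1, a Monday

def decode_period(t, period=7):
    """Return (week number, day within week, calendar date) for period t."""
    week, day = divmod(t - 1, period)
    return week + 1, day + 1, START + timedelta(days=t - 1)

for t in (583, 1031, 321, 531):
    print(t, decode_period(t))
```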

If you do some simple math and look at the contribution of each month, or each day of the week, to the total, you will see that it lines up with the coefficients. See the XLS file for this check. It shouldn't match exactly, since it is directional in nature, and directionally it does agree.

We did not allow Autobox to search for ARIMA terms or for special days of the month, as this is not CASH demand tied to paydays. We also raised the maximum number of outliers to be searched for to 100 because of the large sample size.

You can see the output from the Autobox run, and the XLS file comparing the "poor man's" model with the Autobox coefficients, in Dropbox here: https://www.dropbox.com/sh/fyd0lvbnjrlbwoz/M0sH1FFhTu