Solved – the best way to determine if pageviews are trending upward or downward

Given the following dataset for two articles on my site:

Article 1
2/1/2010   100
2/2/2010   80
2/3/2010   60

Article 2
2/1/2010   20000
2/2/2010   25000
2/3/2010   23000

where column 1 is the date and column 2 is the number of pageviews for that article. What basic acceleration calculation could I use to determine whether an article is trending upward or downward over the days for which I have pageview data?

For example, looking at the numbers I can see that Article 1 is trending downward. How can that most easily be reflected in an algorithm?

thanks!
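One basic way to turn the question into numbers, assuming one observation per day with no gaps, is to treat the least-squares slope of pageviews against day number as the trend ("velocity") and the average second difference as the acceleration. This is a minimal sketch; the function names are illustrative and not from any library.

```python
def slope(values):
    """Ordinary least-squares slope of values against day index 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def acceleration(values):
    """Mean second difference: how much the day-to-day change is itself changing.
    Needs at least three days of data."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    second = [b - a for a, b in zip(diffs, diffs[1:])]
    return sum(second) / len(second)


articles = {
    "Article 1": [100, 80, 60],
    "Article 2": [20000, 25000, 23000],
}

for name, views in articles.items():
    s = slope(views)
    direction = "upward" if s > 0 else "downward" if s < 0 else "flat"
    print(f"{name}: slope {s:+.1f} views/day ({direction}), "
          f"acceleration {acceleration(views):+.1f}")
```

For the two articles above, this gives Article 1 a slope of -20 views/day (downward) and Article 2 a slope of +1500 views/day (upward), while Article 2's acceleration of -7000 shows that its growth reversed on the third day. With only three days of data, treat these numbers as a rough signal rather than a firm trend.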

General thoughts about pageviews

I think there is a fair amount of domain-specific knowledge that can be brought to bear on pageviews. From examining the Google Analytics statistics for particular posts on my blog, I observe the following characteristics:

  • Initial spike. There is a large spike in pageviews when an article is first posted, driven by hits from RSS feeds, links from syndication sites, prominence on the home page, and the novelty and social-media attention that come with a new post. This effect tends to decline rapidly, but it still seems to provide some boost for a few weeks.
  • Day-of-the-week effects. At least on my statistics blog, I get a consistent day-of-the-week effect: there is a lull on the weekend. The implication is that if I were trying to understand meaningful trends in an article, I would look at changes from week to week rather than from day to day.
  • Seasonal effects. I also see more subtle seasonal effects, presumably related to when people are working or on holiday and, for some posts more than others, to when university students are studying. For example, the week between Christmas and New Year's is very quiet.
  • Traffic sources. After the initial spike, I find most traffic is driven by Google searches, although a few posts derive considerable traffic from links on other blogs or websites. Links from social media and blog posts tend to produce abrupt spikes in pageviews and, depending on the medium, may or may not lead to a consistent stream over time.
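To act on the day-of-the-week and initial-spike points, one option is to discard the first few weeks after posting and sum the remaining daily counts into complete ISO weeks before comparing. This is a minimal sketch: the `skip_days` cutoff of 21 days is a hypothetical stand-in for "a few weeks", and the sample data is synthetic.

```python
from datetime import date, timedelta


def weekly_totals(daily, skip_days=21):
    """Sum daily pageviews into complete ISO weeks.

    `daily` is a list of (date, views) pairs sorted by date and starting on the
    posting date. The first `skip_days` days (a hypothetical cutoff for the
    launch spike) are ignored, and weeks with fewer than 7 remaining days are
    dropped so that every total covers the same mix of weekdays and weekends.
    """
    if not daily:
        return []
    posted = daily[0][0]
    totals, day_counts = {}, {}
    for day, views in daily:
        if (day - posted).days < skip_days:
            continue
        key = tuple(day.isocalendar()[:2])  # (ISO year, ISO week number)
        totals[key] = totals.get(key, 0) + views
        day_counts[key] = day_counts.get(key, 0) + 1
    return [(key, totals[key]) for key in sorted(totals) if day_counts[key] == 7]


def week_over_week(weekly):
    """Change in total pageviews from each complete week to the next."""
    return [b[1] - a[1] for a, b in zip(weekly, weekly[1:])]


# Synthetic example: 8 weeks from Monday 2010-02-01, weekdays 100, weekends 40.
start = date(2010, 2, 1)
daily = []
for i in range(56):
    day = start + timedelta(days=i)
    daily.append((day, 40 if day.weekday() >= 5 else 100))

weekly = weekly_totals(daily)
print(weekly)
print(week_over_week(weekly))
```

With these synthetic numbers, every complete week after the cutoff sums to 580 and the week-over-week changes are all zero, even though individual days swing between 40 and 100.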

Implications for identifying upward or downward trends in a page

  • The analysis above provides a general model that I use to understand pageviews on my own blog posts: a theory of some of the major factors that influence pageviews, at least on my site and in my experience. I think having a model like this, or something similar, helps to refine the research question.

  • For instance, presumably you are interested in only some forms of upward and downward trends. Site-wide trends, such as day-of-the-week and seasonal effects, are probably not the main focus. Likewise, the initial spike in pageviews after posting and the subsequent decline are relatively obvious and may not be of interest (or maybe they are).
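If site-wide day-of-the-week and seasonal swings are not the focus, one possible approach (not part of the strategy described in this answer) is to track each page's share of total site pageviews per period, so that a drop affecting the whole site cancels out. A minimal sketch with made-up numbers:

```python
def share_of_site(page_views, site_views):
    """Each period's page pageviews as a percentage of total site pageviews.
    Periods with no site traffic yield None."""
    return [100 * p / s if s else None for p, s in zip(page_views, site_views)]


# Made-up weekly totals: the page's raw views drop in week 2, but so does the site's.
page = [500, 400, 450]
site = [10000, 8000, 9000]
print(share_of_site(page, site))
```

Here the page's raw pageviews fall and recover, but its share stays at 5% in every week, which suggests the change was site-wide rather than specific to the page.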

  • There is also the question of the time frame and functional form of a trend. A page's weekly pageviews may increase gradually as it improves its ranking in Google search results or as the topic of the post becomes more popular. Alternatively, a post may experience an abrupt increase after being linked to by a high-profile website.

  • Another issue is the threshold for calling something a trend. This involves both statistical significance and effect size: is the trend statistically distinguishable from the random variation you would expect anyway, and is the change large enough to be worth your attention?

Simple strategy for detecting interesting trends in pageviews

I'm not an expert in time series analysis, but here are a few thoughts about how I might implement such a tool.

  • I'd compute a table that compares pageviews for the most recent 28 days with pageviews for the 28 days before that. You could make this more advanced by making the time frame a variable (e.g., 7, 14, or 56 days). The more popular the page (and the site in general), the more likely you are to have enough pageviews in a period to make meaningful comparisons. Each row of the table would be a page on your site. You'd start with three columns: page title, current pageviews, and comparison pageviews.
  • Filter out pages that did not exist for the entire comparison period.
  • Add columns that help assess the effect size and the statistical significance of any change. A simple summary statistic is the percentage change from comparison to current; you could also include the raw change. Perhaps a chi-square test could provide a rough quantification of the significance of any change (although I'm aware that the assumption of independent observations is often violated, which also raises the question of whether you are using pageviews or unique pageviews).
  • I'd then create a composite of the effect size and the significance test to represent "interestingness".
  • You could also adopt a cut-off for when a change is sufficiently interesting, and of course classify it as upward or downward.
  • You could then apply sorting and filtering tools to answer particular questions.
  • In terms of implementation, this could all be done using R and data exported from tools like Google Analytics. There are also some interfaces between R and Google Analytics, but I haven't personally tried them.
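The strategy above could be sketched roughly as follows. Python is used here only for illustration; the answer itself suggests R. The sketch assumes daily counts per page, oldest first, starting on the publication date. The chi-square statistic treats the two 28-day totals as counts expected to be equal if nothing changed, and the `alpha` and `min_pct` cutoffs and the `interestingness` rule are hypothetical choices, not values from the answer. The independence caveat still applies, so read the p-value as a rough guide only.

```python
import math


def chi_square_two_counts(current, comparison):
    """1-df chi-square test of the null hypothesis that two equal-length
    periods share the same pageview rate. Returns (statistic, p-value)."""
    total = current + comparison
    if total == 0:
        return 0.0, 1.0
    stat = (current - comparison) ** 2 / total
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))


def trend_table(daily, window=28, alpha=0.01, min_pct=20.0):
    """One row per page comparing the most recent `window` days with the
    `window` days before them. `daily` maps page title -> daily pageviews."""
    rows = []
    for title, counts in daily.items():
        # Skip pages that did not exist for the entire comparison period.
        if len(counts) < 2 * window:
            continue
        current = sum(counts[-window:])
        comparison = sum(counts[-2 * window:-window])
        if comparison == 0:
            continue  # percentage change is undefined
        raw_change = current - comparison
        pct_change = 100 * raw_change / comparison
        stat, p_value = chi_square_two_counts(current, comparison)
        # Hypothetical composite: the effect size counts only when significant.
        interestingness = abs(pct_change) if p_value < alpha else 0.0
        if interestingness >= min_pct:
            direction = "up" if raw_change > 0 else "down"
        else:
            direction = "none"
        rows.append({
            "title": title, "current": current, "comparison": comparison,
            "raw_change": raw_change, "pct_change": pct_change,
            "chi2": stat, "p_value": p_value,
            "interestingness": interestingness, "direction": direction,
        })
    # Most interesting changes first.
    rows.sort(key=lambda r: r["interestingness"], reverse=True)
    return rows


# Made-up daily counts for illustration.
example = {
    "falling page": [100] * 28 + [60] * 28,
    "steady page": [50] * 28 + [52] * 28,
    "new page": [30] * 10,
    "rising page": [10] * 28 + [20] * 28,
}
for row in trend_table(example):
    print(row["title"], row["direction"], round(row["pct_change"], 1))
```

With this made-up data, "new page" is filtered out for lacking 56 days of history, the 4% rise on "steady page" is not significant at the chosen `alpha`, and "rising page" (+100%) sorts ahead of "falling page" (-40%).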
