Not too long ago (prior to Sitecore 7.5) we had a requirement to sort Sitecore items by Most Viewed and Most Shared. We got to thinking about the best way to implement something like this. In the past we might have created a custom database to track this information: on each page view or share, we would write a row to the database, and that data would then be available for sorting. That approach carries the cost of maintaining an extra database and an API. Could we take a simpler approach? We were already using Google Analytics to track page views and social interactions, and Google provides APIs for its products, so we should be able to retrieve the data we need. We decided to use Google Analytics to provide the data required for sorting by Most Viewed and Most Shared.
At the most basic level, Google Analytics provides the ability to track page views. This is achieved by including a tracking code on every page you want to track. The tracking code calls out to the Google Analytics servers, registering information about the page view. Beyond page views, Google Analytics can also track Social Interactions. Social Interactions are user interactions with social buttons and widgets. Clicking the Facebook “Like” button would register a Social Interaction. Generally, event handlers need to be manually wired to the buttons or widgets to register the Social Interaction. Social Activities take Social Interactions to the next level: they track Social Interactions on other sites that reference your site. This allows a reference to a blog post on another site to be reported back to the blog owner.
For this implementation, we will only be addressing Page Views and Social Interactions. I just wanted to mention Social Activities to show what else can be achieved.
As I mentioned before, Google Analytics was already in use on the site where we wanted to add sorting by Page Views and Social Interactions. Going through the Analytics Core Reporting API, we could query for whatever data we were interested in. The Dimensions & Metrics Reference is a good guide to what data is available through the API. Using the reference, we see that we want the ga:pageviews and ga:socialInteractions metrics to get the number of page views and social interactions. The dimensions within the Page Tracking section give us information about the page that was requested. We could use ga:pagePath to figure out what the related Item was, but this would require resolving which Item lives at each URL or path, and that felt like a lot of overhead. If we could include the Item ID as part of the page title, then we could quickly parse out the Item ID without needing to look it up. For this, we can use the ga:pageTitle dimension.
How do we get the item ID into the page title? Google Analytics provides Tracking Code for page views. This code needs to live on each page that is tracked. The code typically looks like this:
We can extend the above code adding the Item ID. To do so, we can use the set command.
We are essentially keeping the existing title and appending the Item ID with a pipe separator. This is still human readable if viewing in the Google Analytics reports and it is also parsable. The GUID should be replaced with the current Sitecore Item ID. The approach for getting the Item ID varies based on MVC, Web Forms, and/or the implementation. Now that we have set the title, we should have the Item ID available to us for all Page Views and Social Interactions.
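As a rough illustration of the `set` approach described above, here is a minimal sketch in JavaScript. It assumes the analytics.js command queue (`ga`) is already loaded on the page; the helper names `buildTrackedTitle` and `trackPageView` are hypothetical and exist only for this example, and the Item ID would be rendered into the page by Sitecore.

```javascript
// Separator between the human-readable title and the Sitecore Item ID.
const TITLE_SEPARATOR = '|';

// Keeps the existing title and appends the Item ID after a pipe.
function buildTrackedTitle(pageTitle, itemId) {
  return pageTitle + TITLE_SEPARATOR + itemId;
}

// `ga` is the analytics.js command function. It is passed in as a
// parameter (an assumption of this sketch) so the logic can also be
// exercised outside a browser.
function trackPageView(ga, pageTitle, itemId) {
  // Override the title field before the page view hit is sent.
  ga('set', 'title', buildTrackedTitle(pageTitle, itemId));
  ga('send', 'pageview');
}

// In a page, after the standard tracking snippet has created the tracker:
//   trackPageView(window.ga, document.title, '<current Item ID>');
```

Because the title is set on the tracker before any hit is sent, later hits from the same page, including social interaction hits, should carry the same title.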
I am not going to cover how to add the Social Interaction script for Google Analytics because it is dependent on the implementation. See here for more details on how to implement.
Now that the data is being collected correctly with the associated Item ID, we need to be able to query for the results. Google provides a convenient Client Library for interacting with Google Analytics data. You can add this client library through NuGet. The library is named Google.Apis.Analytics.v3.
The first thing we need to do is provide configuration settings for access to the data within your Google Analytics account. These settings include the Profile Id, Service Account Email, and the Key File Path. You can learn more about where to obtain the values for these settings from the Readme.md on the GitHub repository. The last setting is the Window in Days. This is the number of days to look back when retrieving analytics data. In our particular case we only want to look at recent analytics data, so we generally use the past 30 days.
Here is the code to get the settings:
Now that we have the settings, we can use the Google Analytics API to retrieve the data. First, we need to obtain a reference to the Analytics Service. To do this we need the P12 file path and Service Account Email from the settings.
Once we have the reference to the service, we can build the request object. The request includes the profile id from the settings. It also includes the date range we want to retrieve data for. Lastly, it includes the Metrics and Dimensions we want to retrieve: ga:pageviews, ga:socialInteractions, and ga:pageTitle.
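To make the shape of the request concrete, here is a hedged sketch of the Core Reporting API (v3) query parameters, written in JavaScript purely for illustration; the site-side code in this post uses the .NET client library. The function name `buildQuery` is hypothetical, and the sketch assumes the profile id is stored without the `ga:` prefix.

```javascript
// Builds the query parameters for a Core Reporting API (v3) request.
// profileId and windowInDays correspond to the settings described
// earlier; the assumption here is that profileId has no "ga:" prefix.
function buildQuery(profileId, windowInDays) {
  return {
    'ids': 'ga:' + profileId,
    // The v3 API accepts relative dates such as "30daysAgo" and "today".
    'start-date': windowInDays + 'daysAgo',
    'end-date': 'today',
    'metrics': 'ga:pageviews,ga:socialInteractions',
    'dimensions': 'ga:pageTitle',
  };
}

// Example with a made-up profile id and a 30 day window:
const exampleQuery = buildQuery('12345678', 30);
```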
We can then execute the request. Google may return multiple pages of data. Because of this, we need to build in a way to retrieve the results page by page. We want to return the results as a dictionary, where the key is the Item ID and the value holds the number of page views and social interactions.
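The page-by-page retrieval can be sketched roughly as follows. This is an illustration in JavaScript, not the post's .NET implementation. `fetchPage` is a hypothetical callback that performs one request for the given start index and page size and returns an object with `totalResults` and `rows`, mirroring the fields of a v3 response. It is synchronous here only to keep the sketch short.

```javascript
// Collects every row of a paged result set.
// fetchPage(startIndex, maxResults) is a hypothetical callback that
// returns { totalResults: number, rows: string[][] | undefined }.
function fetchAllRows(fetchPage, pageSize) {
  const allRows = [];
  let startIndex = 1;   // start-index is 1-based in the v3 API
  let total = Infinity; // unknown until the first response arrives

  while (startIndex <= total) {
    const page = fetchPage(startIndex, pageSize);
    total = page.totalResults;
    const pageRows = page.rows || []; // rows may be absent when empty
    if (pageRows.length === 0) break; // guard against looping forever
    allRows.push(...pageRows);
    startIndex += pageRows.length;
  }
  return allRows;
}
```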
Now that we have the results, we need to parse them. Remember, the page title contains not only the actual page title but also the Item ID. The results are returned as column headers and rows, so we need to figure out which column contains which Dimension or Metric.
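The parsing step might look something like the sketch below, again in JavaScript for illustration only. It assumes the v3 response shape, where `columnHeaders` is a list of objects with a `name` and each row is an array of strings in the same column order. The function names are hypothetical, and summing rows that share an Item ID (for example, when a page title changed during the window) is an assumption of this sketch.

```javascript
// Extracts the Item ID that was appended after the last pipe.
// Returns null when the title carries no Item ID.
function parseItemId(pageTitle) {
  const idx = pageTitle.lastIndexOf('|');
  if (idx < 0) return null;
  const id = pageTitle.slice(idx + 1).trim();
  return id.length > 0 ? id : null;
}

// Turns column headers and rows into a Map keyed by Item ID.
function toItemStats(columnHeaders, rows) {
  // Look up each column by name rather than assuming a fixed order.
  const names = columnHeaders.map((h) => h.name);
  const titleCol = names.indexOf('ga:pageTitle');
  const viewsCol = names.indexOf('ga:pageviews');
  const socialCol = names.indexOf('ga:socialInteractions');

  const stats = new Map();
  for (const row of rows) {
    const itemId = parseItemId(row[titleCol]);
    if (itemId === null) continue; // page was tracked without an Item ID

    const entry = stats.get(itemId) || { pageViews: 0, socialInteractions: 0 };
    // Metric values arrive as strings, so convert before adding.
    entry.pageViews += parseInt(row[viewsCol], 10);
    entry.socialInteractions += parseInt(row[socialCol], 10);
    stats.set(itemId, entry);
  }
  return stats;
}
```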
The above code retrieves the analytics data for an entire site covering a 30 day period. It would be beneficial not to retrieve this data each time we need to use it. Analytics data is not that time sensitive, so caching it for a short period should not have much impact on the overall results. If we refresh the cache once an hour, users should see reasonable results. We can use Sitecore’s Task Scheduler to run a task once an hour to update the Analytics data cache.
First we need to create a class that manages the cache.
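As a language-neutral illustration of what such a cache manager does, here is a minimal JavaScript sketch. It is not the Sitecore cache class from the repository; the class name `AnalyticsCache` and its methods are hypothetical. The idea is that the hourly task builds a complete new dictionary and swaps it in all at once, so readers never see a half-filled cache.

```javascript
// Holds the most recent analytics results keyed by Item ID.
class AnalyticsCache {
  constructor() {
    this.stats = new Map();
    this.updatedAt = null; // time of the last successful refresh
  }

  // Called by the scheduled task with a freshly built Map.
  // The whole Map is replaced in one assignment.
  replace(stats, now = new Date()) {
    this.stats = stats;
    this.updatedAt = now;
  }

  // Items with no recorded traffic report zero for both metrics.
  get(itemId) {
    return this.stats.get(itemId) || { pageViews: 0, socialInteractions: 0 };
  }
}
```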
Now we just need to create the Task and have the task call all of the helper methods we went over above.
Lastly we need to build out our configuration file. This file not only includes the settings for the Google Analytics API, but also the scheduled task.
To make things easier to use, we will create a helper service with several overloads.
Using the above helper methods, we can easily use the Analytics data to sort. To obtain the number of page views and social interactions for an Item, you can do the following:
As you can see, it is very simple to get the number of page views and social interactions for an item. These methods pull from cache and therefore perform well. We can expand this example to sorting a list of Sitecore Items based on the number of views.
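The sorting idea can be sketched as follows, in JavaScript for illustration only. The items are plain objects with a hypothetical `id` property standing in for Sitecore Items, and `stats` is a Map keyed by Item ID like the dictionary described earlier. Items with no analytics data are treated as having zero views, which is an assumption of this sketch.

```javascript
// Returns a new array ordered from most viewed to least viewed.
// items: objects with an `id` property (stand-ins for Sitecore Items).
// stats: Map of Item ID -> { pageViews, socialInteractions }.
function sortByMostViewed(items, stats) {
  const views = (item) => (stats.get(item.id) || { pageViews: 0 }).pageViews;
  // Copy first so the caller's list is not reordered in place.
  return [...items].sort((a, b) => views(b) - views(a));
}
```

Sorting by Most Shared would follow the same pattern, comparing `socialInteractions` instead of `pageViews`.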
As you can see, it is quite simple to get Google Analytics Page View and Social Interaction data for use in Sitecore. All of the code for these examples as well as additional information is located here: https://github.com/onenorth/social-sort
For a version of this post with inline examples, please see: mskutta.github.io/.../