Rebuilding the Sitecore Analytics Index

My last post ended with a question that you'll probably ask after you understand how the analytics index is populated: how is it rebuilt?

Why would you want to do this? The obvious answer is "if the index gets corrupted". But I think the more useful, everyday answer is "it allows you to add data directly to the analytics database and have that data indexed". 

(You would only want to do this for certain types of testing or for educational purposes. There are reasons why you shouldn't add production data directly to the analytics database in this way. I will cover bulk data loading into the analytics database at some point in the future.)

This post explains how to rebuild the analytics index manually.

The Sitecore client includes various tools that allow you to interact with indexes. In Content Editor you can rebuild indexes in the Developer tab. The Indexing Manager - available in the Control Panel - allows you to see statistics about each index in addition to being able to rebuild each index.

But the analytics index is not listed. This is because the Sitecore client tools exclude any index that belongs to the experience group. The group is specified as a parameter on the index itself. You can see this setting on line 11 of the following excerpt from the file Sitecore.ContentSearch.Lucene.Index.Analytics.config:

01.<?xml version="1.0" encoding="utf-8" ?>
02.<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
03.  <sitecore>
04.    <contentSearch>
05.      <configuration type = "Sitecore.ContentSearch.ContentSearchConfiguration, Sitecore.ContentSearch">
06.        <indexes hint="list:AddIndex">
07.          <index id ="sitecore_analytics_index" type = "Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
08.            <param desc="name">$(id)</param>
09.            <param desc="folder">$(id)</param>
10.            <param desc="propertyStore" ref = "contentSearch/indexConfigurations/databasePropertyStore" param1="$(id)" />
11.            <param desc="group">experience</param>
12.            <configuration ref = "contentSearch/indexConfigurations/defaultLuceneIndexConfiguration">
13.              <fieldMap ref = "contentSearch/indexConfigurations/defaultLuceneIndexConfiguration/fieldMap">
14....

If you remove the parameter that sets the group, the analytics index will appear in the Sitecore client.

However, you don't want to do this! The analytics index was excluded for a reason.

The crawlers configured for the analytics index are observers. Data is pushed to them. The rebuilding process - as it is implemented in the Sitecore client tools - depends on crawlers that can locate their own data.

As I explained in my previous post, the analytics crawlers are waiting to be given data to index. Using the Sitecore client tools to rebuild the analytics index will result in the data currently in the index being removed, but not replaced. You will end up with an empty index.
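To make the difference concrete, here is a minimal sketch of the two crawler styles and a generic "reset, then ask each crawler to repopulate" rebuild. Every type here (ToyIndex, IToyCrawler, PullCrawler, ObserverCrawler, ToyRebuild) is a hypothetical stand-in I made up for illustration. None of them are Sitecore classes, and the real rebuild logic is certainly more involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy index: just a list of documents.
public class ToyIndex
{
    private readonly List<string> _documents = new List<string>();
    public int Count => _documents.Count;
    public void Add(string document) => _documents.Add(document);
    public void Reset() => _documents.Clear();
}

// Hypothetical crawler contract: a rebuild asks each crawler to repopulate the index.
public interface IToyCrawler
{
    void RebuildFromRoot(ToyIndex index);
}

// Pull style: the crawler knows where its data lives and can re-read it on demand.
public class PullCrawler : IToyCrawler
{
    private readonly IEnumerable<string> _source;
    public PullCrawler(IEnumerable<string> source) { _source = source; }

    public void RebuildFromRoot(ToyIndex index)
    {
        foreach (var document in _source) index.Add(document);
    }
}

// Push style (observer): the crawler only indexes what it is handed.
public class ObserverCrawler : IToyCrawler
{
    private readonly ToyIndex _index;
    public ObserverCrawler(ToyIndex index) { _index = index; }

    // Data arrives here from the outside, one document at a time.
    public void OnNext(string document) => _index.Add(document);

    public void RebuildFromRoot(ToyIndex index)
    {
        // Nothing to do: this crawler has no source of its own to read from.
    }
}

public static class ToyRebuild
{
    // Generic rebuild: clear the index, then let every crawler re-add its data.
    public static void Rebuild(ToyIndex index, IEnumerable<IToyCrawler> crawlers)
    {
        index.Reset();
        foreach (var crawler in crawlers) crawler.RebuildFromRoot(index);
    }
}
```

Running ToyRebuild.Rebuild against an index fed only by an ObserverCrawler clears the index and adds nothing back, which is the empty-index outcome described above. An index backed by a PullCrawler is repopulated from its source.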

I can think of 2 solutions to this problem:

Option 1. Extend the client tools

Extend the client tools so the analytics index is included and ensure the proper logic is used to rebuild the analytics index. This is the elegant solution, but not one I want to implement. It's not worth the effort since I'm sure this functionality will be added to the product soon enough.

Option 2. Create a quick-and-dirty workaround

This fits the main criteria for a solution I'm developing on a Saturday afternoon while my wife is at a college football game (8 words I never thought I'd use in that particular combination): it's quick and easy.

The Solution!

The solution is to do the following:

  1. Reset the analytics index - This removes all of the current data in the index.
  2. Read all of the interactions from the analytics database - This involves reading the documents from the interactions collection in MongoDB.
  3. Add each interaction to the processing pool - When the visitor's session ends, the interaction that represents the visit is added to the tracking database in MongoDB. This usually happens when the submitSessionContext pipeline runs. I don't need to run the entire pipeline. I just need this step in order for the aggregation process to pick up the interaction.

The following code can be run from LINQPad (using the LINQPad Driver for Sitecore) or from an ASPX page:

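// Step 1: remove all of the current data from the analytics index.
ContentSearchManager.GetIndex("sitecore_analytics_index").Reset();

// Resolve the live processing pool from configuration.
var poolPath = "aggregationProcessing/processingPools/live";
var pool = Factory.CreateObject(poolPath, true) as ProcessingPool;

// Step 2: read every interaction from the analytics database.
var driver = MongoDbDriver.FromConnectionString("analytics");
var visitorData = driver.Interactions.FindAllAs<VisitData>();

// Step 3: add each interaction's key to the pool so aggregation picks it up.
var keys = visitorData.Select(data => new InteractionKey(data.ContactId, data.InteractionId));
foreach(var key in keys)
{
    var poolItem = new ProcessingPoolItem(key.ToByteArray());
    pool.Add(poolItem);
}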

For readability I omitted the namespaces from the code above. So be sure to include the following:

  • Sitecore.ContentSearch
  • Sitecore.Configuration
  • Sitecore.Analytics.Processing.ProcessingPool
  • Sitecore.Analytics.Data.DataAccess.MongoDb
  • Sitecore.Analytics.Model

Conclusion

This logic could be packaged into something like a custom button that is added to the Sitecore client. I would urge you to think carefully before doing something like that, however. As I mentioned earlier, I'm sure this functionality will be added to Sitecore soon. You might want to accept this solution as a short-term workaround until that happens.

More importantly, I think, this information allows you to add data directly to xDB and get it indexed. This should be useful if you need to generate test data.

Enjoy!

  • Sitecore Product Support informed me that there's a supported way to rebuild the analytics index: by rebuilding the reporting database. This is mentioned in the release notes.  Hopefully there's still some value in understanding how the index-building process works!

  • I found this code to be incredibly useful for an analytics customization I am working on. Thank you for sharing your work!

  • Hi Adam, this was a terrific post.  I have a background agent that needs to update organizational data we are storing in a custom contact facet.  The organizational data can be updated at any time and I need to force the contact to get reindexed after the contact is updated.  The code below works in that it causes the contact to get reindexed but I'm not sure if there are unintended consequences by using it.  Any advice?  Thanks in advance!

    void Main()
    {
        var driver = MongoDbDriver.FromConnectionString("analytics");
        IMongoQuery query = Query<MongoContact>.EQ<Guid>(data => data._id, new Guid("<contact ID goes here>"));
        var contactData = driver.Contacts.FindOneAs<MongoContact>(query, ExceptionBehavior.ThrowException);
        contactData.Identifiers.Identifier.Dump();

        var poolPath = "aggregationProcessing/processingPools/contact";
        var pool = Factory.CreateObject(poolPath, true) as ProcessingPool;
        if (contactData != null && pool != null)
        {
            var poolItem = new ProcessingPoolItem(contactData._id.ToByteArray());
            poolItem.Properties.Add("Reason", "Updated");
            pool.Add(poolItem);
        }
    }

    public class MongoContact
    {
        public Guid _id { get; set; }
    }