Sitecore 7 Commit Policies

The indexing process in Sitecore 7 passes through many stages, and one of the most important is committing documents to disk. Without a commit phase, documents would either remain in memory or sit on disk in a state that is not yet persistent. Both Solr and Lucene.NET support atomic commits with full rollback support.

In fact, both providers are ACID compliant:

Atomicity: when you make changes (adding, removing documents) in an IndexWriter session, and then commit, either all (if the commit succeeds) or none (if the commit fails) of your changes will be visible, never something in-between. Some methods have their own atomic behavior: if you call updateDocument, which is implemented as a delete followed by an add, you'll never see the delete without the add, even if you open a near-real-time (NRT) reader or commit from a separate thread. Similarly if you add a block of documents, using the relatively new addDocuments method, you'll see either none or all of the documents in any reader you obtain.

Consistency: if the computer or OS crashes, or the JVM crashes or is killed, or power is lost, your index will remain intact (ie, not corrupt). Note that other problems, such as bad RAM, a bit-flipping CPU or file system corruption, can still easily corrupt the index!

Isolation: while IndexWriter is making changes, nothing is visible to any IndexReader searching the index, until you commit or open a new NRT reader. Only one IndexWriter instance at a time can change the index.

Durability: once commit returns, all changes have been written to durable storage (assuming your I/O system correctly implements fsync). If the computer or OS crashes, or the JVM crashes or is killed, or power is lost to the computer, all changes will still be present in the index.

Source: http://blog.mikemccandless.com/2012/03/transactional-lucene.html

Committing (also known as a hard commit) is about persistence. It flushes deletes, ensures that data is on stable storage, and performs a few other operations. It also flushes the update log.

Flushing is about visibility: it writes to disk what the search provider could commit, but it does not flush deletes or call fsync. Flushing is part of what makes near-real-time (NRT) indexing possible.

A commit policy tells the provider when to finally commit documents into a persistent state, so that if the application crashes or the computer powers down, those documents can still be searched in the index once the application starts again. Common policies commit by document count, elapsed time, amount of RAM buffered, or document size, but in essence you have full control over the logic that decides when a commit happens.

<commitPolicy hint="raw:SetCommitPolicy">
  <policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
</commitPolicy>
<commitPolicyExecutor hint="raw:SetCommitPolicyExecutor">
  <policyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch" />
</commitPolicyExecutor>
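
Swapping in a different policy should then be a matter of changing the type attribute on the policy element. The fragment below is only a sketch: the class name MySite.Search.DocumentCountCommitPolicy and the assembly name MySite.Search are hypothetical placeholders for your own ICommitPolicy implementation, and the executor is left at the default shown above.

```xml
<commitPolicy hint="raw:SetCommitPolicy">
  <!-- Hypothetical custom type and assembly; substitute your own ICommitPolicy implementation. -->
  <policy type="MySite.Search.DocumentCountCommitPolicy, MySite.Search" />
</commitPolicy>
<commitPolicyExecutor hint="raw:SetCommitPolicyExecutor">
  <policyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch" />
</commitPolicyExecutor>
```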

Two parts make up a commit: the policy and the executor. The policy simply signals the executor when to do its work.

public class TimeIntervalCommitPolicy : ICommitPolicy, IDisposable, ISearchIndexInitializable
{
    private int count;
    private ISearchIndex index;

    //Commit Every 3 Minutes
    private TimeSpan interval   = new TimeSpan(0, 0, 3, 0);
    private DateTime lastCommit = DateTime.Now;

    public void Committed()
    {
        this.lastCommit = DateTime.Now;
        this.count      = 0;
    }

    public void Dispose()
    {
    }

    public void IndexModified(IndexOperation operation)
    {
        this.count++;
    }

    public void Initialize(ISearchIndex searchIndex)
    {
        this.index = searchIndex;
    }

    //The Executor is listening for this to be true
    public bool ShouldCommit
    {
        get
        {
            if (this.count > 0)
            {
                bool flag = this.lastCommit.Add(this.interval) <= DateTime.Now;
                if (flag)
                {
                    CrawlingLog.Log.Info(string.Format("[Index={0}] TimeIntervalCommitPolicy.ShouldCommit - Time Limit Exceeded, lastCommit={1}, count={2}", this.index.Name, this.lastCommit, this.count), null);
                }

                return flag;
            }

            this.lastCommit = DateTime.Now;
            return false;
        }
    }
}
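
For comparison, a policy that commits by document count could follow the same shape. This is a minimal sketch rather than Sitecore code: the class name DocumentCountCommitPolicy and its Limit property are hypothetical, and the enum and interface are local stand-ins that mirror only the members used in the listing above so that the example compiles on its own.

```csharp
using System;

// Stand-ins for the Sitecore.ContentSearch types used in the listing above.
// Assumption: the real definitions may carry additional members.
public enum IndexOperation { Add, Update, Delete }

public interface ICommitPolicy
{
    bool ShouldCommit { get; }
    void IndexModified(IndexOperation operation);
    void Committed();
}

// Hypothetical policy: request a commit once a fixed number of index
// operations has accumulated since the last commit.
public class DocumentCountCommitPolicy : ICommitPolicy
{
    private int count;

    public DocumentCountCommitPolicy()
    {
        // Illustrative default only; tune it per index.
        this.Limit = 1000;
    }

    // Number of pending operations that triggers a commit.
    public int Limit { get; set; }

    // The executor checks this after every index modification.
    public bool ShouldCommit
    {
        get { return this.count >= this.Limit; }
    }

    public void IndexModified(IndexOperation operation)
    {
        this.count++;
    }

    // Called once the commit has happened, so counting starts again.
    public void Committed()
    {
        this.count = 0;
    }
}
```

With Limit set to 3, ShouldCommit stays false for the first two modifications, turns true on the third, and returns to false after Committed() is called.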

Now let's look at the commit executor:

public class CommitPolicyExecutor : ICommitPolicyExecutor
{
    public void Committed(IProviderUpdateContext context)
    {
        context.CommitPolicy.Committed();
    }

    public void IndexModified(IProviderUpdateContext context, IndexOperation operation)
    {
        context.CommitPolicy.IndexModified(operation);
        lock (this)
        {
            if (context.CommitPolicy.ShouldCommit)
            {
                context.Commit();
            }
        }
    }
}

Sitecore 7 sets the time-based commit policy for all indexes by default, simply because both providers recommend it as best practice. Currently you can set only one commit policy per index, but you could easily create a policy that depends on a combination of time and document count. One thing to design into your index architecture is that you may want a separate index for the part of your media library that holds large files, alongside other sharded indexes that commit on a time-based schedule. This could be useful if you are storing very large files such as ISO files.

Although it is not available out of the box with Sitecore 7, you may even want to integrate the rules engine to determine when to commit documents to an index. This could involve building conditions such as:

  • Commit at certain time intervals
  • Commit at a certain part of the day
  • Commit if there is an upcoming scheduled server shutdown
  • Commit using Global.asax events such as Application_End

The actions would be something like:

  • Commit
  • Merge and Commit
  • Commit and Optimize

Our advice is to choose the commit policy that makes sense for each index rather than assume that the default is best for your requirements. For example, one of our MVPs worked on a site that was indexing files of 1 GB and larger, where a policy that commits on file size makes much more sense than one based on document count.
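
As an illustration of a custom policy, the combination of time and document count mentioned earlier could be sketched as follows. Everything here is an assumption made for the example: the class name TimeOrCountCommitPolicy, its Interval, MaxPendingOperations and Clock properties, and the local stand-in enum and interface, which mirror only the members used by the time-based policy shown above. The injectable Clock exists purely so that the behaviour can be exercised without waiting.

```csharp
using System;

// Stand-ins for the Sitecore.ContentSearch types, declared so the sketch
// compiles on its own. Assumption: the real types may have more members.
public enum IndexOperation { Add, Update, Delete }

public interface ICommitPolicy
{
    bool ShouldCommit { get; }
    void IndexModified(IndexOperation operation);
    void Committed();
}

// Hypothetical policy: commit when either the interval has elapsed or
// enough operations are pending, whichever happens first.
public class TimeOrCountCommitPolicy : ICommitPolicy
{
    private int count;
    private DateTime lastCommit;

    public TimeOrCountCommitPolicy()
    {
        // Illustrative defaults only.
        this.Interval = TimeSpan.FromMinutes(3);
        this.MaxPendingOperations = 500;
        this.Clock = () => DateTime.UtcNow;
        this.lastCommit = this.Clock();
    }

    public TimeSpan Interval { get; set; }
    public int MaxPendingOperations { get; set; }

    // Source of the current time; replaceable for testing.
    public Func<DateTime> Clock { get; set; }

    public bool ShouldCommit
    {
        get
        {
            if (this.count == 0)
            {
                // Nothing pending: restart the window, as the time-based
                // policy above does, and do not request a commit.
                this.lastCommit = this.Clock();
                return false;
            }

            bool countReached = this.count >= this.MaxPendingOperations;
            bool intervalElapsed = this.Clock() - this.lastCommit >= this.Interval;
            return countReached || intervalElapsed;
        }
    }

    public void IndexModified(IndexOperation operation)
    {
        this.count++;
    }

    public void Committed()
    {
        this.lastCommit = this.Clock();
        this.count = 0;
    }
}
```

A size-based policy would need the size of each document passed in by whatever calls the policy, which the stand-in interface here does not provide, so it is left out of this sketch.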

Dev Team