
Sitecore 7: Index Update Strategies

This blog post contains information about index update strategies in version 7 of the Sitecore ASP.NET web Content Management System (CMS). You can configure and implement indexing strategies to control what causes Sitecore to update data in each search index. Before you read this blog post, please read the Sitecore 7: Introduction blog post linked in the list of resources at the end of this page.

Index update strategies provide a transparent and diverse model for index maintenance. You can apply multiple update strategies to each index, although the default configuration applies a single strategy to each index. Avoid configuring multiple similar update strategies for a single index. Most importantly, because rebuilding an index consumes significant processing resources, avoid rebuilding indexes more frequently than needed, which can happen if you combine the wrong set of strategies and take no steps to prevent frequent rebuilds. Initialization and processing messages from indexing strategies appear in the crawling log.

The implementations of the default strategies exist in the Sitecore.ContentSearch.dll assembly within the Sitecore.ContentSearch.Maintenance.Strategies namespace. The configurations exist within the /configuration/sitecore/contentSearch/indexUpdateStrategies element in the Web.config file (technically, the /App_Config/Include/Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config Web.config include file).

Child elements of the /configuration/sitecore/contentSearch/indexUpdateStrategies element define the available indexing strategies. You can get additional details about each indexing strategy from the comments above its definition, as well as from the parameter values passed through the configuration factory to each strategy and for each index.

  • IntervalAsynchronousStrategy (intervalAsyncCore, intervalAsyncMaster): At the configured interval, this strategy checks the history engine for updated data to index. At some volume, rebuilding an index can be more efficient than updating it. The default configuration sets the CheckForThreshold property to true, causing a full rebuild of the index if the number of affected items determined from the history engine exceeds the value specified by the ContentSearch.FullRebuildItemCountThreshold setting in the Web.config file. If this setting is absent (as per the standard configuration), its default value is 100,000. The default configuration for the sitecore_core_index index of the Core database (configured in the /App_Config/Include/Sitecore.ContentSearch.Lucene.Index.Core.config Web.config include file) uses this strategy with an interval of one minute.
  • ManualStrategy (manual): This strategy disables automatic index updating. Any index that uses this strategy requires manual or programmatic updating, although you can update any index manually or programmatically regardless of its strategy. The default configuration does not apply this strategy to any indexes. You should not combine this strategy with any other index update strategy. The intent of this strategy is for specific cases, such as when the entire indexing process occurs on a separate, dedicated instance, meaning that the local instance does not need to perform any indexing operations.
  • OnPublishEndAsynchronousStrategy (onPublishEndAsync): Triggered by the publish:end and publish:end:remote events, this strategy uses the event queue to determine updated data to index incrementally. If there are no entries in the history engine for the database with timestamps after that of the index’s last update, this strategy takes no action. By default, Sitecore 7 enables the event queue as required to use this strategy. The default configuration sets the CheckForThreshold setting of this strategy to true, causing a full index rebuild if the history table indicates more than the configured number of items updated. The default configuration for the sitecore_web_index index of the default publishing target database named web (configured in the /App_Config/Include/Sitecore.ContentSearch.Lucene.Index.Web.config Web.config include file) applies this strategy, which is appropriate for publishing target databases. You should not combine this strategy with the SynchronousStrategy or the IntervalAsynchronousStrategy. You may use this strategy with indexes that use the SwitchOnRebuildLuceneIndex implementation, which indexes to a temporary directory to avoid impacting uses of the index during indexing.
  • RebuildAfterFullPublishStrategy (rebuildAfterFullPublish): This strategy performs a full rebuild of the index after a site publish or any other full publishing event. For a single index, you should not use this strategy in conjunction with the SynchronousStrategy, though you may combine it with others. If you use this strategy in conjunction with the OnPublishEndAsynchronousStrategy, be sure to register the RebuildAfterFullPublishStrategy before the OnPublishEndAsynchronousStrategy, because Sitecore evaluates the strategies in the order configured. With this order, Sitecore uses the efficient incremental strategy when possible (after small publishing operations), but not immediately after a full index rebuild.
  • RemoteRebuildStrategy (remoteRebuild): Indexes managed on remote hosts can use this strategy to perform full index rebuilds after full rebuilds complete on other hosts. For example, an index in a content delivery instance could use this strategy to force rebuilds when a user rebuilds an index in the content management environment through the user interface. This strategy subscribes to the indexing:end:remote event. The default configuration does not apply this strategy to any indexes.
  • SynchronousStrategy (syncMaster): This strategy re-indexes updated data immediately after various events. On initialization, this strategy attaches to events in the low-level data engine to provide almost real-time index updates. In single-instance environments, this strategy guarantees index updates immediately after data changes. In multi-instance environments, this strategy works with the event queue that broadcasts remote events that trigger indexing. This is the most expensive indexing strategy in terms of machine resources and should only be used in limited circumstances. This strategy is appropriate for content management environments, and most likely never for content delivery environments, unless real-time index updates are absolutely critical. You should not combine this strategy with any other strategy except the RemoteRebuildStrategy. The default configuration for the sitecore_master_index index of the Master database (configured in the /App_Config/Include/Sitecore.ContentSearch.Lucene.Index.Master.config Web.config include file) applies this strategy.
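
As a minimal sketch of the ordering advice given for the RebuildAfterFullPublishStrategy above, the list of strategies inside an index definition might look like the following. The fragment assumes the hint and ref attribute conventions used by the default Lucene index include files, and it omits the enclosing index element; check the element names against your own configuration before copying it.

```xml
<!-- Fragment of an index definition, for example that of sitecore_web_index. -->
<!-- Assumption: the enclosing <index> element and its other children are omitted. -->
<strategies hint="list:AddStrategy">
  <!-- Registered first, so a full publish leads to a full rebuild. -->
  <strategy ref="contentSearch/indexUpdateStrategies/rebuildAfterFullPublish" />
  <!-- Registered second, so smaller publishes use the incremental strategy. -->
  <strategy ref="contentSearch/indexUpdateStrategies/onPublishEndAsync" />
</strategies>
```

Each ref value points at one of the strategy definitions under the indexUpdateStrategies element, which is why the names in parentheses in the list above matter when you edit an index configuration.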

To implement a custom indexing strategy, in your Visual Studio project, create a class that implements the Sitecore.ContentSearch.Maintenance.Strategies.IIndexUpdateStrategy interface, add a definition for that strategy to the configuration, and update one or more index configurations to use that strategy. The IIndexUpdateStrategy interface requires that your class implement the Initialize() method accepting a single argument that implements the Sitecore.ContentSearch.ISearchIndex interface.
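
The configuration side of those steps can be sketched as a Web.config include file along the following lines. The element name myCustomStrategy, the class MyProject.Search.MyCustomStrategy, and the assembly MyProject are hypothetical placeholders for illustration, not names that ship with Sitecore.

```xml
<configuration>
  <sitecore>
    <contentSearch>
      <indexUpdateStrategies>
        <!-- Hypothetical definition: the class must implement IIndexUpdateStrategy. -->
        <myCustomStrategy type="MyProject.Search.MyCustomStrategy, MyProject" />
      </indexUpdateStrategies>
    </contentSearch>
  </sitecore>
</configuration>
```

An index would then reference the definition from its list of strategies, presumably with an entry such as <strategy ref="contentSearch/indexUpdateStrategies/myCustomStrategy" />, following the same pattern as the default strategies.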

Resources

  • Hi John.  I'm in the process of troubleshooting remoteRebuild strategy and wondering if you have any check list available?  I've simulated a CM/CD isolated environment locally with Core replicating to a secondary sql server instance.    For both Web and a sharded portion of Web, I have just the remoteRebuild strategy defined and am using the secondary folder setup for the index type.    My assumption was that indexing from the control panel would raise the remote index end event.  I don't see the strategy getting triggered.  I've even built a custom remote strategy (dotnetpeek-ified) and added additional logging but not seeing the strategy triggered.  Any ideas?

  • Hi John.  I figured it out however it's a little misleading... The "reindex" operation in Control Panel doesn't actually kick off a full reindex which is the requirement for remoteRebuild to actually rebuild an index on a CD web front end.  If however, one uses the Reindex operation from the developer toolbar, a full rebuild triggers and in turn triggers a rebuild on the remote CD front ends.  Is there a way to configure the index actions on the control panel to always initiate a full rebuild?  Thanks again!  -Tim

  • Hi John!  If I use IntervalAsynchronousStrategy in a multi-server environment, could the same index update operations be executed by multiple servers?  E.g. "Title" was changed to "Title2". Server A picks up this change, and at the same moment server B also picks it up, and both perform the update before they have saved the index updated date.  Thanks, Tamas

  • Hello,  I have to ask someone with more time and knowledge to look into these questions.  Regards,

  • Hi John  Thanks for the insight. We are having an issue in one of our projects, where we have 2 CDs. Published data is available in the web db, but it is not reflected in the UI. We are not using any cache. There are no errors in the log file. When we restart IIS, the data appears. We started having this issue when we started adding subsites. Kindly advise.  Our Sitecore version is 7.2

  • Hi Kishore, Were you able to find a solution or root cause of this issue? We are on Sitecore 7.5 and we are seeing similar issues in our multiserver environment where Lucene indexes are apparently getting out of sync. Thanks Faisal

  • Hi John, Kishore and Faisal, I am encountering the same issue on Sitecore 8. Can someone please suggest what needs to be done, or is it an existing bug?