Sitecore Output Cache Clearing Optimization (1/8): Introduction | John West | Sitecore Blog

This entry is the first in a series of blog posts about an approach that attempts to minimize the number of output caches cleared after publishing in the Sitecore ASP.NET web Content Management System (CMS) and Experience Platform (XP). For more information, see the Resources section at the end of this blog post.

Output caches contain the output of renderings, which can include markup, JavaScript, CSS, JSON, or any other type of data. If the cacheHtml property of a managed site (the cacheHtml attribute of its configuration/sitecore/sites/site element in the web.config file by default) is true, then Sitecore can maintain an output cache for that site. You must set the Cacheable property and potentially select the Clear on Index Update and/or VaryBy properties as appropriate for each rendering.

In general, the more output you can cache, the better your site will perform. The fewer and more general the criteria by which you cache that output, the less memory that cache will consume. You may even wish to implement most or all dynamic aspects of your solution with AJAX and other techniques rather than generating HTML dynamically. In large-scale solutions, output caching can reduce hardware requirements and hence licensing requirements.

By default, Sitecore uses event handlers to clear output caches. These handlers clear output caches after publishing completes and after search indexes rebuild. The disabled HtmlCacheClearAgent in web.config provides an alternative to this event-based approach, and as in this custom solution, you can invoke the relevant APIs to clear caches as needed.

Sitecore actually provides two handlers for the two different types of events. For the publish:end and publish:end:remote events, the HtmlCacheClearer event handler clears the output caches for the sites specified in the event handler definition. For the indexing:end and indexing:end:remote events, the IndexDependentHtmlCacheManager event handler trawls the output caches for all managed sites to remove entries with cache keys that contain "_#index", which Sitecore includes in cache keys when you set the Clear on Index Update property of a rendering.
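As a rough illustration of the index-dependent scavenging described above, the following Python sketch models an output cache as a plain dictionary and removes only the entries whose keys contain the "_#index" marker. It is not Sitecore API code: the dictionary, the function name, and the sample cache keys are hypothetical stand-ins, and only the marker itself comes from the description above.

```python
from typing import Dict

# Marker that appears in cache keys for renderings with
# Clear on Index Update selected (per the description above).
INDEX_MARKER = "_#index"


def scavenge_index_dependent(cache: Dict[str, str]) -> int:
    """Remove every entry whose key contains the marker; return how many were removed."""
    # Collect the keys first so the dictionary is not mutated while iterating over it.
    stale_keys = [key for key in cache if INDEX_MARKER in key]
    for key in stale_keys:
        del cache[key]
    return len(stale_keys)


# Hypothetical cache keys, for illustration only.
cache = {
    "/layouts/news.ascx_#lang:EN_#index": "<ul>...</ul>",
    "/layouts/footer.ascx_#lang:EN": "<footer>...</footer>",
}
removed = scavenge_index_dependent(cache)
```

Entries without the marker survive, which is what distinguishes this kind of scavenging from clearing a site's whole output cache.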

This implementation is somewhat inconsistent: in the case of publishing we must specify the sites; in the case of index rebuilds the handler processes output caches for all sites automatically. Additionally, there is room for optimization:

  • Publishing to one target database should not clear output caches for sites that use other publishing targets.
  • Publishing content in a language that is irrelevant to a managed site should not clear its output cache.
  • Publishing an item should not clear output caches associated with sites for which that item is irrelevant.
  • In solutions that use additional techniques such as scheduled agents to clear caches, nothing prevents excessively frequent clearing of output caches (once for each publishing event).
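The first two criteria in the list above amount to a filter over the managed sites. A minimal Python sketch of that filter follows, under stated assumptions: the Site class and its fields are hypothetical stand-ins for site definitions, and treating an empty language set as "every language is relevant" is an assumption of this sketch rather than something the prototype is known to do.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Site:
    """Hypothetical stand-in for a managed site definition."""
    name: str
    database: str              # publishing target database the site reads from
    languages: FrozenSet[str]  # relevant languages; empty means "all" (assumption)
    cache_html: bool = True    # mirrors the cacheHtml attribute


def sites_to_clear(sites: List[Site], target_database: str, language: str) -> List[str]:
    """Return the names of sites whose output cache a publish could have invalidated."""
    affected = []
    for site in sites:
        if not site.cache_html:
            continue  # this site keeps no output cache
        if site.database != target_database:
            continue  # site uses a different publishing target
        if site.languages and language not in site.languages:
            continue  # the published language is irrelevant to this site
        affected.append(site.name)
    return affected


sites = [
    Site("public-en", "web", frozenset({"en"})),
    Site("public-de", "web", frozenset({"de"})),
    Site("preview", "preview", frozenset()),
]
affected = sites_to_clear(sites, target_database="web", language="en")
```

Here a publish of English content to the web database touches only the public-en site; the German site and the preview site keep their caches.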

Especially considering concurrent publishing options introduced in Sitecore 7.2, it did not look very easy to intercept every possible point that can trigger publication. As a hedge, I am sorry to say that I implemented a static class used by the publishItem pipeline and a custom event handler. My untested prototype includes:

  • A replacement for the default publish:end, publish:end:remote, indexing:end, and indexing:end:remote event handlers, which reduces the number of output caches cleared and (in the case of publish:end) raises the custom:publish:end:remote event in order to pass custom parameters.
  • A publishItem pipeline processor that uses the static class to track information about the items published.
  • A clearOutputCaches pipeline, to implement output cache clearing.
  • A scavengeOutputCacheKey pipeline, to implement output cache scavenging.
  • A custom outputCacheMinimimInterval attribute for managed sites (/configuration/sitecore/sites/site elements in the web.config file) to specify the minimum interval between clearings of the output cache for that site.
  • A custom clearOutputCacheAfterPublishingLanguages attribute for managed sites (/configuration/sitecore/sites/site elements in the web.config file) to specify the languages relevant to the site as a pipe-separated list.
  • An initialize pipeline processor to configure remote event management.
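To make the two custom site attributes in the list above more concrete, here is a small Python sketch of the logic they imply: splitting a pipe-separated language list, and skipping a clear when the previous one happened less than the minimum interval ago. The ClearThrottle class, its method, and the sample values are hypothetical; the sketch does not read web.config and makes no claim about how the prototype stores its timestamps.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def parse_languages(attribute_value: str) -> List[str]:
    """Split a pipe-separated attribute value such as "en|de" into language names."""
    return [part.strip() for part in attribute_value.split("|") if part.strip()]


class ClearThrottle:
    """Remembers when each site's output cache was last cleared (hypothetical helper)."""

    def __init__(self) -> None:
        self._last_cleared: Dict[str, datetime] = {}

    def should_clear(self, site: str, minimum_interval: timedelta, now: datetime) -> bool:
        """Return True and record the time, unless the site was cleared too recently."""
        last: Optional[datetime] = self._last_cleared.get(site)
        if last is not None and now - last < minimum_interval:
            return False
        self._last_cleared[site] = now
        return True


throttle = ClearThrottle()
start = datetime(2020, 1, 1, 12, 0, 0)
interval = timedelta(minutes=5)
first = throttle.should_clear("website", interval, start)
second = throttle.should_clear("website", interval, start + timedelta(minutes=2))
third = throttle.should_clear("website", interval, start + timedelta(minutes=5))
languages = parse_languages("en|de| ja")
```

In this run the first and third requests clear the cache and the second is skipped. One consequence of skipping is that content published during the interval stays stale in the cache until a later clear, so a real implementation would need to decide whether to defer the clear rather than drop it.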

This solution depends on the site definitions in the content management environment matching those in the content delivery environment. Specifically, to determine the managed sites associated with an item, it matches the paths of published items against the attributes of the managed sites that indicate the start item in the publishing environment, as well as the cacheHtml attribute of those site definitions.
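The path matching described above can be sketched as a prefix test on whole path segments. The Python below is an illustration only: the function names and the site-to-path mapping are hypothetical, and it assumes that each site's start item has already been resolved to a full item path and that paths compare case-insensitively.

```python
from typing import Dict, List


def is_under(item_path: str, start_path: str) -> bool:
    """True if item_path is the start item or one of its descendants."""
    item = item_path.rstrip("/").lower()
    start = start_path.rstrip("/").lower()
    # Compare whole segments so ".../Home2" does not match a site rooted at ".../Home".
    return item == start or item.startswith(start + "/")


def sites_for_item(item_path: str, site_start_paths: Dict[str, str]) -> List[str]:
    """Return the names of the sites whose start item contains the published item."""
    return [name for name, start in site_start_paths.items() if is_under(item_path, start)]


# Hypothetical site names and start item paths.
site_start_paths = {
    "corporate": "/sitecore/content/Corporate/Home",
    "blog": "/sitecore/content/Blog/Home",
}
matched = sites_for_item("/sitecore/content/corporate/home/about", site_start_paths)
unmatched = sites_for_item("/sitecore/content/Corporate/Home2", site_start_paths)
```

A match on path alone misses items stored outside a site's start item (shared data folders, for example), which is presumably why the approach also allows an explicit association through a field on the item or its ancestors.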

The following diagram shows the solution in effect:

[Diagram: publishing events and handling]

On the publishing instance, the TrackPublishing processor intercepts the publishItem pipeline to maintain information about publishing in a ClearCacheOptions object exposed by a property of the ClearSiteOutputCaches static class. The publishing process then raises the publish:end event. The OutputCacheClearingEvent event handler on the publishing instance traps the publish:end event and passes the values from the ClearSiteOutputCaches class to create the custom:publish:end:remote event. The OutputCacheClearingEvent event handler on the other instances traps the custom:publish:end:remote event and invokes the clearOutputCaches pipeline, which can in turn call the scavengeOutputCacheKey pipeline. Meanwhile, the OutputCacheClearingEventHandler on the publishing instance has likely continued, invoking the same clearOutputCaches pipeline and then resetting the ClearSiteOutputCaches static class. Not shown are the even-less-tested indexing:end and indexing:end:remote events.
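The sequence on the publishing instance (accumulate while items publish, hand the accumulated values to the remote event, clear locally, then reset) can be modeled in a few lines. This Python sketch borrows only the ClearCacheOptions name from the description above; the PublishTracker class, its fields, and the two callbacks are hypothetical, and the sketch ignores the thread-safety questions that concurrent publishing would raise.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class ClearCacheOptions:
    """Values accumulated while items publish (fields are assumptions of this sketch)."""
    databases: Set[str] = field(default_factory=set)
    languages: Set[str] = field(default_factory=set)
    item_paths: Set[str] = field(default_factory=set)


class PublishTracker:
    """Hypothetical stand-in for the static class that tracks publishing."""

    def __init__(self) -> None:
        self.options = ClearCacheOptions()

    def track(self, database: str, language: str, item_path: str) -> None:
        """Called once per published item, as a publishItem processor would be."""
        self.options.databases.add(database)
        self.options.languages.add(language)
        self.options.item_paths.add(item_path)

    def on_publish_end(
        self,
        raise_remote: Callable[[ClearCacheOptions], None],
        clear_local: Callable[[ClearCacheOptions], None],
    ) -> None:
        """Pass the accumulated values on, clear locally, then start a fresh record."""
        accumulated = self.options
        raise_remote(accumulated)  # other instances receive the custom remote event
        clear_local(accumulated)   # the publishing instance clears its own caches
        self.options = ClearCacheOptions()  # reset for the next publish


tracker = PublishTracker()
tracker.track("web", "en", "/sitecore/content/Home")
tracker.track("web", "en", "/sitecore/content/Home/About")

remote_events: List[ClearCacheOptions] = []
local_clears: List[ClearCacheOptions] = []
tracker.on_publish_end(remote_events.append, local_clears.append)
```

Resetting by replacing the object, rather than emptying it in place, means the values already handed to the callbacks are not wiped out from under them.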

Conclusion

This approach could be particularly useful in organizations with many managed sites, especially when those sites use different languages and publishing targets. To summarize the features:

  • Avoid clearing or trawling output caches for managed sites associated with languages other than that published.
  • Avoid clearing or trawling output caches for managed sites associated with publishing targets other than that published.
  • Avoid clearing or trawling output caches for sites not associated with the item(s) published, whether explicitly in a field in those item(s) and their ancestors, or implicitly by matching the path to the start item of a managed site associated with such a cache.
  • Avoid clearing or trawling output caches cleared by this process more recently than the interval specified for the managed site associated with that cache.
  • Automatically process all output caches without the need to specify a list of site names as with the default handler that clears output caches for the publish:end and publish:end:remote events.

Additionally, this prototype demonstrates one way to pass custom parameters to custom remote events.

I have not tested this solution and do not expect to explore it further or maintain this code. If you have a chance to work with it, or have any suggestions or other feedback, please comment on this blog post. It would be especially interesting to hear if this improves or worsens performance or capacity in any way.

Series Index

Resources