Clear Output Caches Associated with Publishing Target Databases in the Sitecore ASP.NET CMS

This blog post provides an untested prototype for the Sitecore ASP.NET web Content Management System (CMS) that clears output caches only for the managed sites that use the target database of the publishing operation that raised the event.

Update 17.December.2012: For updates to the code provided in this blog post, see the last blog post linked at the end of this page.

Yesterday I blogged about some code for a publish:end and publish:end:remote event handler that uses logic to determine the managed sites for which to clear output caches rather than relying on data passed to the configuration factory (see the blog post linked at the bottom of this page for more information). Overnight I realized that the Sitecore approach to clearing output caches, including the code I posted, can clear output caches unnecessarily in solutions that involve multiple publishing target databases. This blog post provides an updated prototype for an event handler that avoids some of this unnecessary cache clearing.

I can think of three cases that could involve multiple publishing target databases:

  • To meet a significant load, some number of Sitecore instances in the load-balanced content delivery environment access one publishing target database while others access a different publishing target database. When such a solution publishes to both publishing targets, it raises cache clearing events twice, which clears the output cache unnecessarily once on each instance (and once more for each additional publishing target database).
  • To provide a User Acceptance Testing (UAT) or other pre-production environment, the solution publishes to a pre-production publishing target before final approval, after which it publishes the same content to the production Content Delivery (CD) publishing target. When such a solution publishes to only one of these publishing targets, Sitecore unnecessarily clears the output caches in both environments.
  • The Content Delivery (CD) environment uses separate publishing target databases for different managed sites. When such a solution publishes to one of those targets, Sitecore unnecessarily clears output caches for the managed sites associated with the other publishing target (and for each additional publishing target database).

Some solutions may implement more than one of these approaches, such as different publishing target databases for different managed sites and a pre-production publishing target.
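
To make the goal concrete, here is a minimal sketch of the selection rule that the handler later in this post aims for, written without any Sitecore types so that it stands alone. The SiteInfo class, the CacheClearingPlan class, and the site and database names are all hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a managed site definition; not a Sitecore type.
public sealed class SiteInfo
{
    public SiteInfo(string name, string database, bool cacheHtml)
    {
        Name = name;
        Database = database;
        CacheHtml = cacheHtml;
    }

    public string Name { get; private set; }
    public string Database { get; private set; }
    public bool CacheHtml { get; private set; }
}

public static class CacheClearingPlan
{
    // Returns the names of the sites whose output caches a publish to
    // targetDb should clear. A null targetDb means "unknown", in which
    // case no filtering by database applies.
    public static List<string> SitesToClear(IEnumerable<SiteInfo> sites, string targetDb)
    {
        return sites
            .Where(s => s.CacheHtml)
            .Where(s => targetDb == null || s.Database == null || s.Database == targetDb)
            .Select(s => s.Name)
            .ToList();
    }
}
```

For example, given a "corporate" site on a "web" database, an "intranet" site on a "web_intranet" database, and a "preview" site with output caching disabled, a publish to "web" would select only "corporate", while an unknown target would select both "corporate" and "intranet".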

The updated code that follows includes a method named GetTargetDatabase() that attempts to determine the name of the publishing target database associated with the event. For the publish:end event, Sitecore passes an instance of the Sitecore.Events.SitecoreEventArgs class to the event handler; for the publish:end:remote event, Sitecore passes an instance of the Sitecore.Data.Events.PublishEndRemoteEventArgs class. If for any reason this method cannot determine the name of the publishing target database, it returns null. If the GetTargetDatabase() method determines a database, updated logic in the ClearCaches() method clears output caches only for sites associated with that database.

namespace Sitecore.Sharedsource.Publishing
{
  using System;
  
  using SC = Sitecore;
  
  public class HtmlCacheClearer : SC.Publishing.HtmlCacheClearer
  {
    public void ClearCaches(object sender, EventArgs args)
    {
      SC.Diagnostics.Assert.ArgumentNotNull(sender, "sender");
      SC.Diagnostics.Assert.ArgumentNotNull(args, "args");
      string[] siteNames;
  
      if (this.Sites != null && this.Sites.Count > 0)
      {
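        // Sites is an ArrayList, so ToArray() alone would return object[],
        // and the cast to string[] would fail at runtime.
        siteNames = (string[])this.Sites.ToArray(typeof(string));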
      }
      else
      {
        siteNames = SC.Configuration.Factory.GetSiteNames();
      }
  
      string targetDb = this.GetTargetDatabase(args);
      string dbString = targetDb == null ? string.Empty : " for sites associated with " + targetDb;
      SC.Diagnostics.Log.Info(
          this + " : clearing HTML caches" + dbString + "; " + siteNames.Length + " possible sites.",
          this);
  
      foreach (string siteName in siteNames)
      {
        SC.Diagnostics.Assert.IsNotNullOrEmpty(siteName, "siteName");
        SC.Sites.SiteContext site = SC.Configuration.Factory.GetSite(siteName);
        SC.Diagnostics.Assert.IsNotNull(site, "siteName: " + siteName);
  
        if (!site.CacheHtml)
        {
          SC.Diagnostics.Log.Info(this + " : output caching not enabled for " + siteName, this);
          continue;
        }
  
        if (targetDb != null 
          && site.Database != null
          && targetDb != site.Database.Name)
        {
          SC.Diagnostics.Log.Info(this + " : " + targetDb + " not relevant to " + siteName, this);
          continue;
        }
  
        SC.Caching.HtmlCache htmlCache = SC.Caching.CacheManager.GetHtmlCache(
          site);
        SC.Diagnostics.Assert.IsNotNull(htmlCache, "htmlCache for " + siteName);
  
        if (htmlCache.InnerCache.Count < 1)
        {
          SC.Diagnostics.Log.Info(
              this + " : no entries in output cache for " + siteName,
              this);
          continue;
        }
  
        SC.Diagnostics.Log.Info(
            this + " clearing output cache for " + siteName,
            this);
        htmlCache.Clear();
      }
  
      SC.Diagnostics.Log.Info(this + " done.", this);
    }
  
    private string GetTargetDatabase(EventArgs args)
    {
      SC.Diagnostics.Assert.IsNotNull(args, "args");
      SC.Events.SitecoreEventArgs scArgs =
        args as SC.Events.SitecoreEventArgs;
  
      if (scArgs != null)
      {
        SC.Publishing.Publisher publisher = scArgs.Parameters[0] as SC.Publishing.Publisher;
  
        if (publisher != null
          && publisher.Options != null
          && publisher.Options.TargetDatabase != null
          && !string.IsNullOrEmpty(publisher.Options.TargetDatabase.Name))
        {
          return publisher.Options.TargetDatabase.Name;
        }
      }
      else
      {
        SC.Data.Events.PublishEndRemoteEventArgs pubArgs =
          args as SC.Data.Events.PublishEndRemoteEventArgs;
  
        if (pubArgs != null
          && !string.IsNullOrEmpty(pubArgs.TargetDatabaseName))
        {
          return pubArgs.TargetDatabaseName;
        }
      }
  
      return null;
    }
  }
}

You can use the Web.config include file provided in the previous blog post to enable this handler.
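
For readers who do not have that file to hand, the following is a rough sketch of what such an include file might look like, not a copy of the original. It assumes that you compile the class into an assembly named Sitecore.Sharedsource (a hypothetical name; substitute your own) and that you want to replace the default handlers rather than run both.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <events>
      <!-- Replace the default handler for local publishing events. -->
      <event name="publish:end">
        <handler
          patch:instead="handler[@type='Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel']"
          type="Sitecore.Sharedsource.Publishing.HtmlCacheClearer, Sitecore.Sharedsource"
          method="ClearCaches" />
      </event>
      <!-- Replace the default handler for events raised by remote instances. -->
      <event name="publish:end:remote">
        <handler
          patch:instead="handler[@type='Sitecore.Publishing.HtmlCacheClearer, Sitecore.Kernel']"
          type="Sitecore.Sharedsource.Publishing.HtmlCacheClearer, Sitecore.Sharedsource"
          method="ClearCaches" />
      </event>
    </events>
  </sitecore>
</configuration>
```

Because this sketch passes no list of sites to the handler, the Sites property stays empty and the handler falls back to the site names that the configuration factory returns.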

One known caveat to this solution is that the names of the database connections (the values of the id attributes in the /configuration/sitecore/databases/database elements in the Web.config file) must match between all environments. For example, this solution would not work if you name a publishing target database "pub" in the Content Management (CM) environment and name that same database "content" in the Content Delivery (CD) environment.
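
If you cannot align the names, one possible workaround is to translate the name before comparing it. The following is an untested sketch of that idea and is not part of the handler above; the TargetDatabaseNames class and its alias table are hypothetical, and the "pub" to "content" mapping simply reuses the example names from the previous paragraph.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of Sitecore or of the handler above.
public static class TargetDatabaseNames
{
    // Maps a database name used in the publishing environment to the
    // name that the local environment uses for the same database.
    private static readonly Dictionary<string, string> Aliases =
        new Dictionary<string, string>
        {
            { "pub", "content" }
        };

    // Returns the local name for a database, or the name unchanged
    // if no alias is registered for it.
    public static string Normalize(string name)
    {
        string mapped;
        return name != null && Aliases.TryGetValue(name, out mapped) ? mapped : name;
    }

    // Mirrors the check in ClearCaches(): when either name is unknown,
    // treat the site as relevant so that its cache is still cleared.
    public static bool IsRelevant(string targetDb, string siteDb)
    {
        if (targetDb == null || siteDb == null)
        {
            return true;
        }

        return string.Equals(Normalize(targetDb), siteDb, StringComparison.Ordinal);
    }
}
```

With something like this in place, ClearCaches() could call IsRelevant(targetDb, site.Database.Name) in place of comparing the two names directly, at the cost of maintaining the alias table in each environment.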

Resources

  • Hey John,  What about implementing a smart partial HTML cache clearer instead? It would be appreciated by many customers and their CDs.   Best Regards, Alen

  • In 6.3 and subsequent releases, you get partial cache clearing for the data caches by default; Sitecore only entirely clears the output caches. I have no objections to partial cache clearing for the output caches, but I don't see how you could implement it - how would you determine which entries to clear from the output caches? I mean, even if the event handler could somehow determine which items the publishing operation created, updated, deleted, renamed, moved, etc., how would it know which entries in the output caches depend on data in those items?

  • I'm not sure whether it is possible to do it as a custom event handler only, but it will definitely work if you hook into the rendering process to collect the IDs of the items involved and then store these IDs in the cache entry as well. It is not a piece of cake, of course, otherwise it would have been implemented years ago, but it is not as hard as everybody expects.

  • ...Continued.  I am not sure if there are limits on the length of the cache key, which I believe would require storage of the item IDs on which an output cache entry depends in some other place, such as a relational database. Then there is the overhead of looping through the items published and the cache keys (or the database records, etc.) to determine which cache entries to remove. At some point I expect such overhead would exceed the performance improvements achieved by caching.  It's quite possible that I am overcomplicating the picture and there is a solution, but I always try to consider obscure cases and look for ways to break solutions.

  • There are cases where it is straightforward, but cases where it may not be realistic, and any solution must cover all cases. I assume you want to put the IDs of the items on which any rendering depends into the cache keys for those renderings.  Remember that cached output contains not only data from items, but paths to other items. For example, consider a rendering that iterates over the children of some other item and creates links to each. If the user moves, renames, or deletes any of those items, the cache entry for that rendering is invalid. The same holds if they move or rename the grandparent of those items, which might not result in any record of a change to those individual linked children or their common parent. If the rendering uses the ID of the parent to determine which children to iterate, such a move would not affect the rendering itself, but only its output. It is unlikely that a partial cache clearing solution would have added the grandparent of that item to the cache key, or that it would contain logic to say "if any ancestor of an item on which a cache entry depends moves or experiences a name change, such output cache entries are invalid". I am not sure if such cases are obscure, and of course one solution is for the developer not to cache the output of such presentation components. But there are potential complexities to consider. The same holds for media - if you update one, it doesn't affect the output cache, but if you move one, it does (because links to the media could exist in cached <a> and <img> elements). And links and images can exist in RTE fields, so the IDs of those items also belong in the cache key. I believe you have the same issue with Image fields, File fields, Multilist fields, and even Droplink and Droplist fields (any fields that allow the CMS user to select another item). It would take some CPU cycles to determine and store those IDs, and again, the parents of those items could move or experience name updates.  Continued...

  • John,  It looks like Sitecore does a publish, and consequently a clear of the caches, for each language, which also results in the caches being cleared 3 times if you publish 3 languages. Is that correct?  Erwin

  • @Erwin:   Good point; I am pretty sure you are correct. There is probably a way to work around that, such as delaying the event that causes the clearing until after publishing all languages, or using some other technique to clear the caches. Unfortunately I will not have time to investigate in the foreseeable future.  Best regards,