Replacing Data During Publication with the Sitecore ASP.NET CMS

This blog post explains how you can use the publish replacer to change text in field values when you publish items in the Sitecore ASP.NET web Content Management System (CMS). Two threads on the Sitecore Developer Network (SDN) forums, "Replacers?" and "Search and Replace content during Publish", prompted me to write this blog post. For more information about publishing, see Sitecore Publishing Operations on SDN.

Overview of Replacers and Replacements

You can use the publishing replacer for any field values that should differ between the Master database and the publishing target database(s). For example, consider external links to other systems. You may want CMS users to insert links using hostnames that correspond to development, test, or other internal systems, and have the system transform those references to the production hostnames during publication to the production content delivery environment. In reality, I have more often seen replacers used to address various defects in Sitecore, most or all of which I think Sitecore subsequently resolved.

Before explaining how to use the publishing replacer, consider the difference between Sitecore's definitions of the terms replacers and replacements in this context:

  • Replacements are individual transformation configurations, such as mapping one hostname to another.
  • A replacer applies replacements; the publishing replacer is a single class that applies some number of replacements during publication.

Sitecore provides a default replacer (Sitecore.Text.Replacer) and two default types of replacements, each of which derives from the Sitecore.Text.Replacer.Replacement abstract base class:

  • Simple replacements transform one static value to another. The Sitecore.Text.Replacer.SimpleReplacement inner class implements default simple replacements.
  • Regex replacements replace tokens matching a given regular expression with a static value. The Sitecore.Text.Replacer.RegexReplacement inner class implements default regex replacements.
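
To make the relationship between a replacer and its replacements concrete, here is a minimal, language-neutral sketch in Python. It is not Sitecore's C# API: the class names only mirror the roles described above, and the hostnames, method names, and constructor parameters are assumptions made for illustration.

```python
import re
from abc import ABC, abstractmethod


class Replacement(ABC):
    """Models the role of the abstract Replacement base class (hypothetical)."""

    @abstractmethod
    def is_empty(self) -> bool:
        """Return True when there is nothing to match, so it can be skipped."""

    @abstractmethod
    def replace(self, text: str) -> str:
        """Return text with this replacement applied."""


class SimpleReplacement(Replacement):
    """Transforms one static value to another."""

    def __init__(self, find, replace_with, ignore_case=False):
        self.find = find
        self.replace_with = replace_with
        self.flags = re.IGNORECASE if ignore_case else 0

    def is_empty(self):
        return not self.find

    def replace(self, text):
        # re.escape makes the search text literal; the lambda keeps the
        # replacement text literal too (no backslash interpretation).
        return re.sub(re.escape(self.find), lambda _m: self.replace_with,
                      text, flags=self.flags)


class RegexReplacement(Replacement):
    """Replaces tokens matching a regular expression with a static value."""

    def __init__(self, find, replace_with, ignore_case=False):
        self.find = find
        self.replace_with = replace_with
        self.flags = re.IGNORECASE if ignore_case else 0

    def is_empty(self):
        return not self.find

    def replace(self, text):
        return re.sub(self.find, lambda _m: self.replace_with,
                      text, flags=self.flags)


class Replacer:
    """A single object that applies some number of replacements in order."""

    def __init__(self, replacements):
        # Skip empty replacements; an empty search string would match everywhere.
        self.replacements = [r for r in replacements if not r.is_empty()]

    def replace(self, text):
        for replacement in self.replacements:
            text = replacement.replace(text)
        return text


# Hypothetical hostnames: internal systems mapped to a production hostname.
replacer = Replacer([
    SimpleReplacement("test.example.com", "www.example.com", ignore_case=True),
    RegexReplacement(r"https?://dev\d*\.example\.com", "https://www.example.com"),
])
print(replacer.replace('<a href="http://TEST.example.com/a">'))
```

The point of the split is that the replacer knows nothing about how a replacement matches; it only walks its list and passes the field value through each entry.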

Sitecore invokes the publishItem pipeline to publish each item. For more information about pipelines, see the blog post All About Pipelines in the Sitecore ASP.NET CMS. For more information about the publishItem pipeline, see the blog post Intercept Item Publishing with the Sitecore ASP.NET CMS.

The PerformAction processor in the publishItem pipeline invokes the publishing replacer. Specifically, the constructor for the Sitecore.Publishing.PublishOptions class used by publication passes "publish" to the Sitecore.Configuration.Factory.GetReplacer() method, causing the configuration factory to create and configure an instance of the class specified by the type attribute of the /configuration/sitecore/replacers/replacer element in the Web.config file that has an id attribute with the value publish. Because the factory looks up replacers by id, it is possible to use replacers in contexts other than publication. For information about the configuration factory, see the blog post The Sitecore ASP.NET CMS Configuration Factory.

By default, the value of that type attribute is Sitecore.Text.Replacer, which is the default replacer class, and the value of the mode attribute of that same <replacer> element is off. To enable this replacer, change the value of the mode attribute to on.

The contents of that /configuration/sitecore/replacers/replacer element in the Web.config file specify any number of simple and regex replacements using <simple> and <regex> elements, respectively, nested within the <replacements> element.

For each <simple> element, the find attribute specifies text to match and the replaceWith attribute specifies characters with which to replace that text. The ignoreCase attribute controls whether Sitecore matches the find attribute with character case sensitivity.

For each <regex> element, the find attribute specifies a regular expression to match and the replaceWith attribute specifies characters with which to replace tokens that match that regular expression. The ignoreCase attribute controls whether Sitecore evaluates the regular expression with character case sensitivity. Because regular expressions can be expensive, if the <regex> element includes the simpleTest attribute, Sitecore uses System.String.IndexOf to check for that value before applying the regular expression replacement.
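
The attribute semantics above can be modeled in a few lines. The following Python sketch is an approximation, not Sitecore's implementation: the XML fragment is a hypothetical example shaped like the <replacements> element described above, the hostnames and paths are invented, and whether the simpleTest check honors ignoreCase is an assumption (the sketch uses a case-sensitive substring test).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical fragment; only the attribute names come from the text above.
CONFIG = r"""
<replacements>
  <simple find="intranet.example.com" replaceWith="www.example.com" ignoreCase="true" />
  <regex simpleTest="/staging/" find="/staging/v\d+/" replaceWith="/" ignoreCase="false" />
</replacements>
"""


def apply_element(element, text):
    """Apply one <simple> or <regex> element to a field value."""
    find = element.get("find", "")
    replace_with = element.get("replaceWith", "")
    ignore_case = element.get("ignoreCase", "false").lower() == "true"
    if not find:
        return text  # nothing to match
    if element.tag == "simple":
        pattern = re.escape(find)  # match the text literally
    else:
        simple_test = element.get("simpleTest")
        # Cheap substring check first (assumed case-sensitive here); skip the
        # more expensive regular expression when the value cannot match.
        if simple_test and simple_test not in text:
            return text
        pattern = find
    flags = re.IGNORECASE if ignore_case else 0
    return re.sub(pattern, lambda _m: replace_with, text, flags=flags)


def apply_all(config_xml, text):
    """Apply every configured replacement in document order."""
    for element in ET.fromstring(config_xml.strip()):
        text = apply_element(element, text)
    return text


print(apply_all(CONFIG, "http://Intranet.Example.com/staging/v12/page"))
```

The substring pre-check changes only the cost, not the result, provided that the simpleTest value appears in every string the regular expression can match.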

Both <regex> and <simple> elements support a forPublish attribute. If the value is true, Sitecore increments the Publishing.Replacements performance counter in the Sitecore.Jobs category.

Using Replacements

Other than understanding the attributes of the <regex> and <simple> elements, you do not need to understand much of the explanation in the previous section just to use publishing replacers. What you need to do is set the mode attribute of the /configuration/sitecore/replacers/replacer element in the Web.config file that has the value publish for its id attribute to on, and add your own <simple> and/or <regex> elements within the <replacements> element of that <replacer> element. You can do this with a Web.config include file such as this example, which enables the replacer and moves the default <simple> and <regex> examples to the include file to give you a starting place (be sure to remove any examples that you do not use). For more information about Web.config include files, see the blog post All About web config Include Files with the Sitecore ASP.NET CMS.

Implementing Your Own Replacements

The process is not exactly trivial because Sitecore apparently did not intend for it, but you can implement your own replacements. You might implement a replacement, for example, if you need to determine the string with which to replace the token at runtime rather than specifying it in the Web.config file.

Creating a replacement is simple:

  1. Write a class that inherits from the Sitecore.Text.Replacer.Replacement abstract base class.
  2. Implement a constructor that accepts a System.Xml.XmlNode that represents the element in the Web.config file that defines the replacement. This step is technically optional, but almost any replacement that is not entirely hard-coded requires some configuration.
  3. Implement the IsEmpty() method to indicate whether the system should process the replacement and the Replace() method to perform the replacement.

The challenge is that you need to add your replacement to the replacer, which means mapping the element you use to configure that type of replacement (similar to <simple> and <regex>) to the class that implements the replacement. Unfortunately, the default replacer uses a private variable to store the list of replacements, and hard-codes the mapping of element names to replacement classes. This requires you to override the replacer, such as by creating a class that inherits from the default implementation (Sitecore.Text.Replacer). In that class:

  1. Implement a constructor that accepts a single string argument and passes it to the corresponding constructor in the base class.
  2. Override the AddReplacement() method to add your type of replacements to your own list of replacements (typically by calling methods that add those types of replacements to your list) or to call the corresponding method in the base class for other replacement types.
  3. Implement the Replace() method to process your list of replacements and call the corresponding method in the base class.
  4. Implement the IsEmpty() method to return true if your list of replacements is empty and the value of the IsEmpty property in the base class is true.
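
The steps above describe a general subclassing pattern, which the following Python sketch models. It does not use Sitecore's API: BaseReplacer is a stand-in for a default replacer with a private list and a hard-coded element mapping, and the names CustomReplacer, RandomReplacement, and the find attribute on the <random> element are hypothetical.

```python
import random
import re
import xml.etree.ElementTree as ET


class BaseReplacer:
    """Stand-in for a default replacer: private list, hard-coded mapping."""

    def __init__(self, name):
        self.name = name
        self.__replacements = []  # name-mangled, so subclasses cannot extend it

    def add_replacement(self, node):
        if node.tag == "simple":  # the only element name this stand-in knows
            find, replace_with = node.get("find", ""), node.get("replaceWith", "")
            if find:
                self.__replacements.append(lambda t: t.replace(find, replace_with))

    def is_empty(self):
        return not self.__replacements

    def replace(self, text):
        for apply in self.__replacements:
            text = apply(text)
        return text


class RandomReplacement:
    """Replaces a token with a random number chosen at runtime."""

    def __init__(self, node):
        self.token = node.get("find", "")  # hypothetical attribute

    def is_empty(self):
        return not self.token

    def replace(self, text):
        return re.sub(re.escape(self.token),
                      lambda _m: str(random.randint(0, 9999)), text)


class CustomReplacer(BaseReplacer):
    def __init__(self, name):
        super().__init__(name)  # step 1: pass the name to the base class
        self._custom = []

    def add_replacement(self, node):
        # Step 2: keep our own element type, delegate everything else.
        if node.tag == "random":
            replacement = RandomReplacement(node)
            if not replacement.is_empty():
                self._custom.append(replacement)
        else:
            super().add_replacement(node)

    def replace(self, text):
        # Step 3: apply our replacements, then the base class replacements.
        for replacement in self._custom:
            text = replacement.replace(text)
        return super().replace(text)

    def is_empty(self):
        # Step 4: empty only if both lists are empty.
        return not self._custom and super().is_empty()


config = ('<replacements><random find="$random" />'
          '<simple find="dev." replaceWith="www." /></replacements>')
replacer = CustomReplacer("publish")
for node in ET.fromstring(config):
    replacer.add_replacement(node)
result = replacer.replace("http://dev.example.com/?cache=$random")
print(result)
```

Keeping a second list in the subclass is the workaround for the private list in the base class; the cost is that custom and default replacements can no longer be interleaved in a single ordered sequence.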

This untested example implements a replacement that transforms a token such as $random to a random number and a replacer that uses <random> elements to configure that type of replacement, and includes a Web.config include file to enable that replacement.

Miscellaneous Details

I did not confirm, but would assume that Sitecore invokes replacements in the order they appear in the Web.config file. This might be important if you have two replacements that transform the same value, or if one replacement generates values that another replacement might transform. This could also affect the way you implement a custom replacer. Because you cannot add your replacements to the default ordered list of replacements, you may want your replacer to process your replacements before the base class processes its replacements, or afterwards. Alternatively, you may want to override the components of the base class that define and use the private variable so that you can add all replacements to a single list. The example provided with this post applies its own types of replacements before the default replacements (its Replace() method applies its replacements and then calls Replace() in the base class).
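
A small sketch shows why ordering matters when one replacement produces a value that another one matches. This is plain Python with hypothetical hostnames, assuming only that replacements run one after another in list order; it says nothing about the order Sitecore actually uses.

```python
def apply_in_order(text, replacements):
    """Apply (find, replace_with) pairs one after another, in list order."""
    for find, replace_with in replacements:
        text = text.replace(find, replace_with)
    return text


dev_to_test = ("dev.example.com", "test.example.com")
test_to_www = ("test.example.com", "www.example.com")

# The output of the first replacement feeds the second one.
chained = apply_in_order("http://dev.example.com/", [dev_to_test, test_to_www])

# Reversed, the second replacement runs before its input exists.
reversed_order = apply_in_order("http://dev.example.com/", [test_to_www, dev_to_test])

print(chained)         # http://www.example.com/
print(reversed_order)  # http://test.example.com/
```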

It appears that the GetPublishedVersionOfItem processor in the filterItem pipeline invokes the replacer used by publishing. This affects managed sites for which the filterItems attribute in the corresponding /configuration/sitecore/sites/site element is true (also known as live mode). For more information about the filterItems attribute, see the comments above the /configuration/sitecore/sites element in the Web.config file. For information about managed sites, see the blog post Managed Web Sites in the Sitecore ASP.NET CMS. For information about live mode, see Live Mode on SDN, but note also the /App_Config/Include/LiveMode.config.example sample Web.config include file distributed with Sitecore CMS to easily enable live mode (rename without the .example extension). Note that I do not personally recommend live mode.

Replacements and replacers have no awareness of the context in which they run, such as the item or field they transform. They operate as simple filters on all field values without any knowledge of the fields that contain those values or the items that contain those fields.

According to the SDN forum thread HTML editor links - invalid xhtml, you may need to escape the ampersand character (&) in the find attribute of <regex> replacement elements with &#38; (the equivalent named entity is &amp;). You may need to escape other characters in a similar manner; I expect this applies to quote characters (" and ') and angle brackets (< and >).
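
The escaping requirement comes from XML itself rather than from the replacer, which a generic XML parser can demonstrate. The sketch below uses Python's standard library, not Sitecore's configuration loader, and the attribute values are invented examples.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# An escaped ampersand parses, and the attribute value contains a plain "&".
escaped = ET.fromstring('<regex find="a=1&#38;b=2" replaceWith="" />')
escaped_value = escaped.get("find")

# A raw ampersand makes the element malformed, so parsing fails.
try:
    ET.fromstring('<regex find="a=1&b=2" replaceWith="" />')
    raw_parses = True
except ET.ParseError:
    raw_parses = False

# quoteattr escapes &, < and > and chooses suitable quotes for an attribute.
value = 'a&b<c>"d"'
element = ET.fromstring("<regex find=%s />" % quoteattr(value))
round_trip = element.get("find")

print(escaped_value, raw_parses, round_trip == value)
```

Generating the attribute text with a helper such as quoteattr, rather than escaping by hand, avoids missing one of the special characters.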

Excessive and expensive replacements could affect publishing performance. Replacements work, so if you accidentally configure them to replace data that the system should not transform, you may experience unexpected results.

Alternatives to Replacements

Sitecore provides a number of facilities that support replacement and that may be more appropriate than the publishing replacer for various requirements. For example:

  • You can use events, pipeline processors, and the rules engine to alter field values when users save items in the Master database. For information about these approaches, see Intercepting Item Updates with Sitecore.
  • You can implement a publishItem pipeline processor to perform transformations.
  • You can use a renderField pipeline processor to transform field values at runtime. One disadvantage of this approach is that it works only for field values that you use the renderField pipeline to render, including the FieldRenderer web control. Another disadvantage of this approach is that it performs substitutions each time you access the field value rather than once on publication, though you can mitigate this cost somewhat by caching the output of such renderings. For examples of renderField pipeline processors, see the blog post Important Pipelines in the Sitecore ASP.NET CMS. For information about caching the output of renderings, see the blog post How the Sitecore ASP.NET CMS Caches Output.
  • To control URLs, which by default depend on item names rather than field values, you can use events, pipeline processors, and the rules engine to control item names. For an example of a solution that uses the rules engine to control item names, see Use the Sitecore Rules Engine to Control Item Names. You can also update the /configuration/sitecore/encodeNameReplacements section of the Web.config file to substitute characters in item names when generating URLs.