
Prevent the Sitecore ASP.NET CMS from Interpreting URL Path Prefixes as Language Names

This blog post describes a technique that you can use to prevent the Sitecore ASP.NET web Content Management System (CMS) and Experience Platform (XP) from interpreting the first step in the path in a requested URL as a language.

The StripLanguage processor in the preprocessRequest pipeline attempts to remove the language from the path and rewrite the URL. In some cases, the StripLanguage processor can interpret values that are actually item names as language names. For example, in a default configuration, if you create an item named "da" under the home item, and try to request that item with a URL such as http://instance/da, the StripLanguage processor will interpret "da" as the Danish language and remove it from the URL. When the ItemResolver processor in the httpRequestBegin pipeline later attempts to determine the context item, it will not find "da" in the requested path, and will set the context item to the home item rather than the child of the home item named "da".

Assuming a default configuration, there are at least two potential ways to avoid this problem:

  • Do not use anything that Sitecore could interpret as a language as the name of any child of a home item of any managed site. This could include relatively obvious language names such as "en", but also things that you might not know are languages. I would avoid any value that is two characters long or contains a single dash. This might not be acceptable for SEO or other reasons.
  • In the Web.config file, set the value attribute of the /configuration/sitecore/settings/setting element named Languages.AlwaysStripLanguage to false, and set the languageEmbedding attribute of the /configuration/sitecore/linkManager/providers/add element named sitecore to never. This might not be acceptable if you (sometimes) need to include language names in URL paths.
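As a rough sketch of the second option, a Web.config include file along the following lines could apply both changes without editing the Web.config file directly. It assumes a Sitecore version in which the two paths named above exist as described; the file itself is illustrative and untested.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Stop StripLanguage from removing a leading language from the path. -->
      <setting name="Languages.AlwaysStripLanguage">
        <patch:attribute name="value">false</patch:attribute>
      </setting>
    </settings>
    <linkManager>
      <providers>
        <!-- Stop the link provider from generating URLs that start with a language. -->
        <add name="sitecore">
          <patch:attribute name="languageEmbedding">never</patch:attribute>
        </add>
      </providers>
    </linkManager>
  </sitecore>
</configuration>
```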

Another solution could be to override the StripLanguage processor to specify the values that it should interpret as languages (or alternatively, values that StripLanguage should not interpret as languages).

namespace SitecoreJohn.Pipelines.PreprocessRequest
{
  using System.Collections;
  using Sitecore.Web;
 
  public class StripLanguage :
    Sitecore.Pipelines.PreprocessRequest.StripLanguage
  {
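    // Lowercase language names that StripLanguage may remove from the path.
    private ArrayList _validLanguages = new ArrayList();
 
    // Sitecore calls this once for each child element of the configuration
    // element that has hint="list:AddValidLanguage".
    public void AddValidLanguage(string language)
    {
      this._validLanguages.Add(language.ToLower());
    }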
 
    public override void Process(
      Sitecore.Pipelines.PreprocessRequest.PreprocessRequestArgs args)
    {
      if (args != null
        && args.Context != null
        && !string.IsNullOrWhiteSpace(args.Context.Request.FilePath))
      {
        string prefix = WebUtil.ExtractLanguageName(
          args.Context.Request.FilePath);
 
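        // If the prefix is not one of the configured languages, treat it as
        // part of the item path and skip the default stripping logic.
        if ((!string.IsNullOrWhiteSpace(prefix))
          && !this._validLanguages.Contains(prefix.ToLower()))
        {
          return;
        }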
      }
 
      base.Process(args);
    }
  }
}

You can use a Web.config include file such as the following to enable and configure this processor:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <preprocessRequest>
        <processor type="Sitecore.Pipelines.PreprocessRequest.StripLanguage, Sitecore.Kernel">
          <patch:attribute name="type">SitecoreJohn.Pipelines.PreprocessRequest.StripLanguage, SitecoreJohn</patch:attribute>
          <allowedLanguages hint="list:AddValidLanguage">
            <en>en</en>
          </allowedLanguages>
        </processor>
      </preprocessRequest>
    </pipelines>
  </sitecore>
</configuration>

To add an allowed language, add another element like <en>en</en>. The name of the element does not matter (but might as well be the name of the language); the text value within the element is the allowed value.
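For example, assuming the solution also needs Danish, the list element from the include file might become the following. The da entry is only an illustrative value, and adding it means that an item named "da" under the home item would again be treated as a language.

```xml
<allowedLanguages hint="list:AddValidLanguage">
  <en>en</en>
  <da>da</da>
</allowedLanguages>
```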

This approach has some disadvantages:

  • If you add a language, you must update configuration for the StripLanguage processor to allow that language.
  • In solutions that manage multiple logical sites, some languages may be valid for one site but invalid for another. The StripLanguage processor runs in the preprocessRequest pipeline, whereas Sitecore does not determine the context site until the SiteResolver processor in the httpRequestBegin pipeline, which runs after the preprocessRequest pipeline. The StripLanguage processor therefore should not access the context site, and we cannot use configuration to map managed site names to the lists of languages allowed for each. Because of the virtualPath attribute in site definitions, I am not sure that Sitecore could even determine the context site before stripping the language. Depending on requirements, you may be able to implement an alternative, such as mapping domain names (which are available to the StripLanguage processor) to lists of languages allowed for those domains. In such cases, you could pass XML to the StripLanguage processor rather than passing a simple list as in the example provided.

Since I know that *somebody* will eventually ask, I will try it the harder way (mapping domain names to allowed languages). This approach could have issues as well, specifically if some of the managed sites associated with a single domain should strip the language and others should not. The StripLanguage processor records the language stripped from the path in the Sitecore.Context.Data.FilePathLanguage property. I assume that in some cases, you could add that language back to the URL after the SiteResolver processor. I do not know for sure; maybe there is no issue, or maybe there is a better solution.

namespace SitecoreJohn.Pipelines.PreprocessRequest
{
  using System.Collections;
  using System.Xml;
 
  using Sitecore.Diagnostics;
  using Sitecore.Web;
 
  public class StripLanguage :
    Sitecore.Pipelines.PreprocessRequest.StripLanguage
  {
    // Maps a lowercase host name to an ArrayList of lowercase language names.
    private Hashtable _validLanguages;
 
    public void ConfigureValidLanguages(XmlNode config)
    {
      Assert.IsNotNull(config, "config");
      this._validLanguages = new Hashtable();
 
      foreach (XmlNode domainNode in config.SelectNodes("./*"))
      {
        ArrayList langs = new ArrayList();
 
        foreach (XmlNode langNode in domainNode.SelectNodes("./*"))
        {
          // The element name (for example, en) is the language; store it in lowercase.
          Assert.IsNotNull(langNode.Name, "langNode.Name");
          langs.Add(langNode.Name.ToLower());
        }
 
        this._validLanguages.Add(domainNode.Name.ToLower(), langs);
      }
    }
 
    public override void Process(
      Sitecore.Pipelines.PreprocessRequest.PreprocessRequestArgs args)
    {
      if (args != null
        && args.Context != null
        && !string.IsNullOrWhiteSpace(args.Context.Request.FilePath)
        && this._validLanguages != null)
      {
        string prefix = WebUtil.ExtractLanguageName(
          args.Context.Request.FilePath);
 
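        if (!string.IsNullOrWhiteSpace(prefix))
        {
          // Normalize the host to lowercase to match the keys stored by
          // ConfigureValidLanguages.
          string host = args.Context.Request.Url.Host.ToLower();
          Assert.IsTrue(
            this._validLanguages.Contains(host),
            "invalid configuration for " + host);

          // Languages are stored in lowercase, so compare in lowercase.
          if (!((ArrayList) this._validLanguages[host]).Contains(
            prefix.ToLower()))
          {
            return;
          }
        }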
      }
 
      base.Process(args);
    }
  }
}

The Web.config include file:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <preprocessRequest>
        <processor type="Sitecore.Pipelines.PreprocessRequest.StripLanguage, Sitecore.Kernel">
          <patch:attribute name="type">SitecoreJohn.Pipelines.PreprocessRequest.StripLanguage, SitecoreJohn</patch:attribute>
          <allowedLanguages hint="raw:ConfigureValidLanguages">
            <configuration>
              <sc150223>
                <en />
              </sc150223>
              <localhost>
                <en />
              </localhost>
            </configuration>
          </allowedLanguages>
        </processor>
      </preprocessRequest>
    </pipelines>
  </sitecore>
</configuration>

Note that without modification, this requires some duplication of configuration (for example, to support both domain.tld and www.domain.tld) and cannot support IP addresses, which are not valid as XML element names (you could store the domain name in an attribute rather than using the element name).
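To illustrate the attribute idea, here is a minimal sketch of the parsing logic on its own, without any Sitecore dependency. The DomainLanguageMap class, the domain element, and its name attribute are all hypothetical names chosen for this example; in a real processor the Parse logic would presumably live in a method like ConfigureValidLanguages.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of Sitecore. Expects XML such as:
//   <configuration>
//     <domain name="192.168.0.10"><en /></domain>
//     <domain name="www.domain.tld"><en /><da /></domain>
//   </configuration>
public static class DomainLanguageMap
{
  // Builds a case-insensitive map from host name to allowed language names.
  public static Dictionary<string, HashSet<string>> Parse(XmlNode config)
  {
    var map = new Dictionary<string, HashSet<string>>(
      StringComparer.OrdinalIgnoreCase);

    foreach (XmlNode domainNode in config.SelectNodes("./domain"))
    {
      // The host comes from the (assumed) name attribute, so IP addresses work.
      XmlAttribute name = domainNode.Attributes["name"];

      if (name == null || string.IsNullOrWhiteSpace(name.Value))
      {
        continue;
      }

      var langs = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

      foreach (XmlNode langNode in domainNode.SelectNodes("./*"))
      {
        // As in the processor above, the element name is the language.
        langs.Add(langNode.Name);
      }

      map[name.Value.Trim()] = langs;
    }

    return map;
  }

  // True only if the host is configured and lists the prefix as a language.
  public static bool IsAllowed(
    Dictionary<string, HashSet<string>> map, string host, string prefix)
  {
    HashSet<string> langs;
    return map.TryGetValue(host, out langs) && langs.Contains(prefix);
  }
}
```

Unlike the processor above, this sketch treats an unconfigured host as "no languages allowed" rather than asserting; which behavior is preferable depends on the solution.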

Of course you could do the opposite – rather than listing the allowed languages, you could list prefixes to ignore. My thought is that this would require more frequent configuration updates, as I expect to add languages infrequently, but could add top-level items at any time.
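A minimal sketch of that inverse check, again without any Sitecore dependency, might look like the following. The IgnoredPrefixes class and its method names are hypothetical; AddIgnoredPrefix is shaped so that a list: hint could call it, in the same way as AddValidLanguage in the first example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Sitecore.
public class IgnoredPrefixes
{
  // Path prefixes that are item names and must never be stripped as languages.
  private readonly HashSet<string> _ignored =
    new HashSet<string>(StringComparer.OrdinalIgnoreCase);

  public void AddIgnoredPrefix(string prefix)
  {
    if (!string.IsNullOrWhiteSpace(prefix))
    {
      this._ignored.Add(prefix.Trim());
    }
  }

  // True when the prefix is a configured item name and should stay in the path.
  public bool ShouldKeepInPath(string prefix)
  {
    return !string.IsNullOrWhiteSpace(prefix)
      && this._ignored.Contains(prefix.Trim());
  }
}
```

In an override of Process, a true result from ShouldKeepInPath would correspond to returning before the call to base.Process(args).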

Conclusion

As I tried to indicate above, I expect that this code could result in new issues, especially under various configurations. As always, do not just laugh at my code. Instead, refactor it, test it, and make it work for your solution. This is just a proof of concept.

Resources