Sitecore 7: Computed Index Fields

This blog post explains how you can add computed fields to search indexes in version 7 of the Sitecore ASP.NET web Content Management System (CMS). Computed fields allow you to index values calculated while indexing, such as the URL of each item. Before you read this blog post, please read the Sitecore 7: Introduction blog post linked in the list of resources at the end of this page.

Adding fields to an index can improve runtime performance by making data available in the index rather than requiring a visit to a data source, such as an item in a Sitecore database. One tradeoff involved in adding fields to an index is that each such index field increases the weight of that index, meaning the amount of resources required to generate and store the index (for example, processing time and disk space).

At this point it is worth distinguishing at least three meanings of the term field in the context of Sitecore development:

  • In .NET programming, a field is a variable of any type declared directly in a class or struct (structure).
  • In Sitecore, fields contain values that constitute most of the data that makes up an item.
  • In common search index terminology, a field is a discrete indexed value. In Sitecore, indexed documents usually correspond to items, although search indexes can also contain documents that do not correspond to items. For documents that do correspond to items, many index fields map directly to fields in those items, but some index fields have no relation to any item field.

It might also be worthwhile to mention that search indexes typically have no schema. In other words, you can think of a document as a flat list of named field values, where any document can contain any fields. This makes it very easy to add fields to the index, without the need to update a database schema or even a Sitecore data template.
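To make the "flat list of named field values" idea concrete, here is a minimal sketch in plain C# that does not use any Sitecore API. The class SchemalessIndexSketch and the sample field values are purely illustrative assumptions; the point is only that two documents in the same index can carry different sets of fields.

```csharp
using System.Collections.Generic;

// Illustrative only: models index documents as flat bags of named values.
// This is not how Sitecore or Lucene store documents internally.
public static class SchemalessIndexSketch
{
    public static List<Dictionary<string, object>> BuildDocuments()
    {
        // A document for a page item: carries a title and the hasurl field.
        var page = new Dictionary<string, object>
        {
            { "_name", "Home" },
            { "title", "Welcome" },
            { "hasurl", true }
        };

        // A document for a folder item: no title field at all.
        var folder = new Dictionary<string, object>
        {
            { "_name", "Settings" },
            { "hasurl", false }
        };

        return new List<Dictionary<string, object>> { page, folder };
    }

    // Adding a field touches only the document; there is no schema to update.
    public static void AddField(
        Dictionary<string, object> document, string name, object value)
    {
        document[name] = value;
    }
}
```

Calling SchemalessIndexSketch.AddField() on one document leaves every other document unchanged, which is the property that makes computed fields cheap to introduce.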

Sitecore 7 ships configured to index a number of fields. In fact, one objective of this version is to reduce the use of the Sitecore.Data.Items.Item class by allowing developers to retrieve data directly from the index. 

In content delivery environments, presentation components that use APIs to access search indexes often need to limit results by excluding items that do not have URLs. For example, a search results page should not contain links to items that Sitecore cannot render as pages. 

The code and configuration in this blog post add a field named hasurl to the index. That field contains a Boolean value that indicates whether a document (item) has a URL.

To use this code, your Visual Studio project should reference the new Sitecore.ContentSearch.dll assembly shipped with Sitecore 7 (in the Website/bin subdirectory of your Sitecore installation). Remember to set the Copy Local property of the reference to false. I assume your project already references the Sitecore.Kernel.dll assembly.

To code a computed field, create a class that implements the Sitecore.ContentSearch.ComputedFields.IComputedIndexField interface. This interface requires that your class implement simple string properties named FieldName and ReturnType, but more importantly, a method named ComputeFieldValue(). This method accepts an argument that implements the Sitecore.ContentSearch.IIndexable interface, which specifies the data to index, and returns an object, which represents the value for the field. In the case of Sitecore items, this interface abstracts the underlying Sitecore.Data.Items.Item object. We can retrieve the item (content, media, or other) from the Sitecore.ContentSearch.IIndexable argument passed to the ComputeFieldValue() method. 

Here is some sample code for adding a computed Boolean field to the index to indicate whether each document has a URL:

namespace Sitecore.Sharedsource.ContentSearch.ComputedFields
{
  using System.Linq;
 
  using Assert = Sitecore.Diagnostics.Assert;
  using Log = Sitecore.ContentSearch.Diagnostics.CrawlingLog;
 
  using SC = Sitecore;
 
  public class HasUrl : Sitecore.ContentSearch.ComputedFields.IComputedIndexField
  {
    public string FieldName { get; set; }
 
    public string ReturnType { get; set; }
 
    public object ComputeFieldValue(Sitecore.ContentSearch.IIndexable indexable)
    {
      Assert.ArgumentNotNull(indexable, "indexable");
      SC.ContentSearch.SitecoreIndexableItem scIndexable =
        indexable as SC.ContentSearch.SitecoreIndexableItem;
 
      if (scIndexable == null)
      {
        Log.Log.Warn(
          this + " : unsupported IIndexable type : " + indexable.GetType());
        return false;
      }
 
      SC.Data.Items.Item item = (SC.Data.Items.Item)scIndexable;
 
      if (item == null)
      {
        Log.Log.Warn(
          this + " : unsupported SitecoreIndexableItem type : " + scIndexable.GetType());
        return false;
      }
 
      // optimization to reduce indexing time
      // by skipping this logic for items in the Core database
      if (System.String.Compare(
        item.Database.Name,
        "core",
        System.StringComparison.OrdinalIgnoreCase) == 0)
      {
        return false;
      }
 
      if (item.Paths.IsMediaItem)
      {
        return item.TemplateID != SC.TemplateIDs.MediaFolder
          && item.ID != SC.ItemIDs.MediaLibraryRoot;
      }
 
      if (!item.Paths.IsContentItem)
      {
        return false;
      }
 
      // the item has a URL if any qualifying device has layout details for it
      return item.Database.Resources.Devices.GetAll()
        .Where(device => device.ID != SC.Syndication.FeedUtil.FeedDeviceId
          || !SC.Syndication.FeedUtil.IsFeed(item))
        .Any(device => item.Visualization.GetLayout(device) != null);
    }
  }
}

Different implementations may use different logic to determine whether an item has a URL (and therefore the logic probably belongs in a pipeline or provider).

Here is a sample Web.config include file (Sitecore.Sharedsource.IndexHasUrl.config in my case) to add this computed field to all of the new indexes:

<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <configuration>
        <DefaultIndexConfiguration>
          <fields hint="raw:AddComputedIndexField">
            <field fieldName="hasurl" storageType="no" indexType="tokenized">Sitecore.Sharedsource.ContentSearch.ComputedFields.HasUrl,Sitecore.Sharedsource</field>
          </fields>
        </DefaultIndexConfiguration>
      </configuration>
    </contentSearch>
  </sitecore>
</configuration>

As you can see from this configuration, in addition to the class that implements the logic to calculate the value to index, you can specify whether the index should store the indexed value (useful if you need to retrieve the value from the index as opposed to just using it for matching) and whether to tokenize that value (split a string containing multiple words into separate terms rather than treating it as a single phrase, for example matching "John" separately from "West" instead of only matching "John West"). In this example, there is no need to store a value for the hasurl field, as the default implementation of Boolean fields indicates their value.
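For contrast, a computed field whose value you want to read back from the index, and to match only as a whole, might be declared along the following lines. This is a sketch under assumptions: the field name itemurl and the class ItemUrl are hypothetical placeholders that this post does not provide, and the attribute values yes and untokenized are the counterparts of the values above as I understand the Sitecore 7 Lucene configuration, so check them against the default configuration shipped with your build.

```xml
<fields hint="raw:AddComputedIndexField">
  <!-- hypothetical field: store the value and index it as a single term -->
  <field fieldName="itemurl" storageType="yes" indexType="untokenized">Sitecore.Sharedsource.ContentSearch.ComputedFields.ItemUrl,Sitecore.Sharedsource</field>
</fields>
```

Storing values increases the weight of the index, so enable storage only for fields you actually need to retrieve.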

Because we placed this field definition within the <DefaultIndexConfiguration> element in the Web.config file, all indexes that inherit that configuration (which means all of the new Sitecore 7 indexes) inherit this computed field. Instead of putting logic in the code to ignore items in the Core database, we could configure the index for the Core database to exclude this field.

Remember that you must re-index the data to cause the new field to appear in the index. I know that the following could use more contextual information, but this blog post is already too long. The following class abstracts documents/items and exposes a HasUrl Boolean property based on this computed index field (Sitecore sets this property for us automatically based on the computed value indexed and the fact that the property name case-insensitively matches the name of the field in the index - field names in the index are lowercase by default):

namespace Sitecore.Sharedsource.ContentSearch.SearchTypes
{
  using SC = Sitecore;
 
  public class SearchResultItem : SC.ContentSearch.SearchTypes.SearchResultItem
  {
    public bool HasUrl { get; set; }
  }
}

You can use code such as the following to retrieve instances of this class representing all documents/items that have a URL in the default index associated with an item (normally you would include additional criteria to limit the results):

SC.Data.Items.Item item = Sitecore.Context.Item;
Assert.IsNotNull(item, "item");
Sitecore.ContentSearch.SitecoreIndexableItem sItem =
  new Sitecore.ContentSearch.SitecoreIndexableItem(item);
 
using (
  Sitecore.ContentSearch.IProviderSearchContext context =
    SC.ContentSearch.SearchManager.CreateSearchContext(sItem))
{
  foreach (SC.Sharedsource.ContentSearch.SearchTypes.SearchResultItem result
    in context.GetQueryable<SC.Sharedsource.ContentSearch.SearchTypes.SearchResultItem>().Where(x => x.HasUrl))
  {
    output.WriteLine(result.Path + " : " + result.HasUrl + "<br />");
  }
}

Resources

  • Hi John, I need to create more than one computed index field in Solr, and I also need to store all GUIDs (item ID, template ID) with "-" separators, as they are stored in Sitecore, because the template ID is the basis of my search query. The problem I'm facing is that the return statement completes only one piece of functionality, such as creating one computed index field. Can you please explain how I can achieve this in one ComputeFieldValue method? Thanks and kind regards, Gaurav

  • @Gaurav: I'm sorry, I don't understand the question. I would not try to override the storage format for any of the index fields provided by Sitecore, as that could interfere with core functionality. I would only try to add new computed index fields, and each should store only a single value. For each computed index field, I expect you would have a class, though they may all share some logic used by ComputeFieldValue() methods, maybe through a common base class or a helper class.

  • How can I attach or specify an Analyzer for computed fields from configuration? For example, the Keyword Analyzer, which emits the whole text as one token.

  • @Mrunal - In the AddFieldByFieldName part of the config, add your field here as well as the computed field part and then specify the Analyzer of choice like it is done with fields such as "parsedlanguage"

  • @Tim - Great thanks Tim, it works... :)

  • Hi John, I am trying to identify where to put my local custom fields in the index. I previously used DefaultIndexConfiguration, which adds them on a global scale.

  • Hi John, thanks for the great post! How can I store values instead of GUIDs in the index for Treelist fields? Does this require code, or can it be done in configuration (Sitecore.ContentSearch.Solr.Indexes.Config)? Thanks, G. Naresh Kumar

  • In Sitecore 7.2 I had to change the config to this:

    <configuration xmlns:patch="www.sitecore.net/.../" xmlns:set="www.sitecore.net/.../">
      <sitecore>
        <contentSearch>
          <indexConfigurations>
            <defaultLuceneIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
              <fields hint="raw:AddComputedIndexField">
                <field fieldName="hasurl" storageType="no" indexType="tokenized">Sitecore.Sharedsource.ContentSearch.ComputedFields.HasUrl,Sitecore.Sharedsource</field>
              </fields>
            </defaultLuceneIndexConfiguration>
          </indexConfigurations>
        </contentSearch>
      </sitecore>
    </configuration>

    Other than that - great post - exactly what I was after! Thanks, Owen

  • Hi Vikram, if your class inherits from SearchResultItem and has a PostID property, change the property as shown below and you will get results: [IndexField("postid_s")] public IEnumerable<String> PostID { get; set; } Thanks, G. Naresh Kumar

  • Hello,  I was actually thinking you should have the following:  [IndexField("post_id")] public IEnumerable<String> PostID { get; set; }  Because of the space in your field name.  Hope that works!   Owen  

  • I am trying to search media images through Solr. I have written the following computed field class for this:

    public class ComputedFieldRenderedImage : Sitecore.ContentSearch.ComputedFields.IComputedIndexField
    {
        public object ComputeFieldValue(IIndexable indexable)
        {
            Assert.ArgumentNotNull(indexable, "indexable");
            var indexableItem = indexable as SitecoreIndexableItem;

            if (indexableItem != null)
            {
                ImageField img = indexableItem.Item.Fields["Image"];
                return img == null || img.MediaItem == null ? null : MediaManager.GetMediaUrl(img.MediaItem);
            }
            else
            {
                Log.Warn(string.Format("{0} : unsupported IIndexable type : {1}", this, indexable.GetType()), this);
                return null;
            }
        }

        public string FieldName { get; set; }
        public string ReturnType { get; set; }
    }

    I have tried adding it in "Sitecore.ContentSearch.Solr.Indexes.config":

    <field fieldName="Image" returnType="string">Slb.Bluewater.Ocean.Library.SolrSearch.ComputedFieldRenderedImage,Slb.Bluewater.Ocean.Library.SolrSearch</field>

    But I am not getting the image URL; I am getting only the alt text of the image. Any pointer on this would be appreciated.

  • Hi Pavan, did you try to debug and see what MediaManager.GetMediaUrl(img.MediaItem) is returning? Also, are you checking the right field, "image_s", in the index? Thanks, G. Naresh Kumar

  • Hi Pavan,  Try to typecast field as ImageField as below  ImageField img = (Sitecore.Data.Fields.ImageField)indexableItem.Item.Fields["Image"];  Thanks, G. Naresh Kumar

  • Hi John, thanks for another helpful post. Your posts have helped me on several occasions in the past. I hope this one will too.

    In our application:
    - We need to show brand details on product pages.
    - Each product might belong to multiple businesses.
    - Multiple products can point to the same business.
    - Each product has a facet field, and we need to show a faceted search result list of the product page items.

    To achieve this, we have created product items (as page items) and business items (as content items) in the CMS. Product items have a multilist field to associate the businesses.

    My query: when a user searches for content that exists in business templates, the business items are returned. Instead of showing the business items, we need to show the URLs of the product items that reference them in the multilist. How can we achieve this? In the ComputeFieldValue method we can find out where the items are referenced and store the product URLs in the computed field of the business items. We can then show the same URLs in the search result list. But since business items do not have the facet fields, how will product facets be shown?

    Sorry if you find my query too long; I just wanted to make sure you understand my requirement. Thanks, Hemant

  • For some reason, the latest comment does not appear on this page. Try here instead:  www.sitecore.net/.../Sitecore 7 Computed Index Fields