Sitecore 7 Performance Tuning Part 3

Performance Tuning Part 3: Storing and Indexing

Sitecore 7 comes with an index that is tuned for the out-of-the-box fields. Over time you will add new fields and templates to your site, and you will want the index tuned for these additions as well. We made a design decision in the default configuration: if a field is not displayed in the UI, it is not stored in the index. However, not storing a field in the index does not mean that you cannot query by it. Enter STORED and INDEXED, two terms that will help you tune your Sitecore 7 solution for performance, index size and maintainability.

When setting up your field mappings in configuration you need to ask yourself two simple questions per field.

1: Do I want Sitecore to be able to retrieve the raw value of this field from the index, or do I plan on getting the value from the database?

2: Do I want to be able to query by this field?

Your answer to the first question determines whether you mark the field as STORED. There are two possible settings, and you can apply them at either the field name or the field type level. A setting at the field type level applies to every field based on that field type. A setting at the field name level overrides anything set at the field type level.

Store.Yes

  • Means that the value of the field will be stored in the index.
  • If you are using the new LINQ to Provider API in Sitecore then your values will be automatically mapped to your properties.
  • Your index will be larger in size than if you set this property to NO

Store.No

  • Means that the value of the field will NOT be stored in the index.
  • If you are using the new LINQ to Provider API in Sitecore then your values will not be automatically mapped to your properties and it will be left up to you to get and set them.
  • Your index will be smaller in size than if you set this property to YES
   //This will not store any values on any items for the title field BUT you will be able to query by the title field
   <field fieldName="title" storageType="NO"  indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />

   //This will store any values on any items for the text field and you will be able to query by the text field
   <field fieldName="text"  storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />
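
To make the difference between storing and indexing concrete, here is a minimal sketch in Python. It is a toy model, not Sitecore or Lucene code: the `ToyIndex` class, its method names and the whitespace tokenizer are all assumptions made for illustration. It only shows that the structure used for querying and the copy of the raw value used for retrieval are kept separately, which is why a field can be queryable without being stored.

```python
from collections import defaultdict


class ToyIndex:
    """Toy model: postings answer queries, stored values answer retrieval."""

    def __init__(self, field_config):
        # field_config maps field name -> {"store": bool, "index": bool}
        self.field_config = field_config
        self.postings = defaultdict(set)  # (field, term) -> set of document ids
        self.stored = defaultdict(dict)   # document id -> {field: raw value}

    def add(self, doc_id, fields):
        for name, value in fields.items():
            config = self.field_config[name]
            if config["index"]:
                # Stand-in for an analyzer: lowercase and split on whitespace.
                for term in value.lower().split():
                    self.postings[(name, term)].add(doc_id)
            if config["store"]:
                # Only stored fields keep a copy of the raw value.
                self.stored[doc_id][name] = value

    def search(self, field, term):
        return sorted(self.postings.get((field, term.lower()), set()))

    def stored_fields(self, doc_id):
        return dict(self.stored.get(doc_id, {}))


index = ToyIndex({
    "title": {"store": False, "index": True},  # comparable to storageType="NO"
    "text": {"store": True, "index": True},    # comparable to storageType="YES"
})
index.add(1, {"title": "Performance Tuning", "text": "Storing and indexing"})

hits = index.search("title", "tuning")  # title can still be queried
retrieved = index.stored_fields(1)      # but only text can be read back
print(hits, retrieved)
```

In this sketch the title terms still take up space in the postings, but the raw title string is never copied into the index. That missing copy is where the size saving of not storing a field comes from.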

By default, Sitecore 7 sets the storage setting of every single field type to "NO". Why? We do not want to pressure developers into using the new framework, so we ask developers to enable what they want when they are ready to use the new API. We have enabled storage for only some fields at the field name level, because the Sitecore Content Editor UI uses them.

There is another important decision to make: whether you plan on querying by a field. There are several options here, including:

Index.Tokenized

The field will be run through the designated Analyzer and in turn may be tokenized when it is indexed. There are many Analyzers available and it is up to you to specify which Analyzer is best for your fields. We have set some by default based on best practices.

Index.Un_Tokenized

The field will not be run through the designated Analyzer and will be indexed as a single value.

Index.No

The field will not be indexed and therefore cannot be queried by. You can use Index.No along with Store.Yes to store a value that you don't want to be queryable.

Index.No_Norms

Same as Index.Un_Tokenized, except that a few bytes are saved by not storing some normalization data. This data is what is used for boosting and field-length normalization.

** Original Source - http://stackoverflow.com/questions/650643/lucene-indexing-store-and-indexing-modes-explained
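
The four index options above can be compared side by side with a small Python sketch. This is a simplified toy, not the Lucene implementation: the function name `index_field`, the option labels and the whitespace "analyzer" are assumptions for illustration, and the option labels are not meant as the exact strings Sitecore expects in configuration. The norm value is loosely modeled on the boost divided by the square root of the number of terms, which is the usual shape of a field-length norm.

```python
import math


def index_field(value, index_type, boost=1.0):
    """Return (terms, norm) that a toy indexer would record for one field value.

    terms: what goes into the inverted index and can be matched by a query.
    norm:  boost and field-length information, or None when it is not kept.
    """
    if index_type == "NO":
        # Nothing is indexed, so the field cannot be queried.
        return [], None
    if index_type == "TOKENIZED":
        # Stand-in for an analyzer: lowercase and split on whitespace.
        terms = value.lower().split()
        norm = boost / math.sqrt(len(terms)) if terms else None
        return terms, norm
    if index_type == "UN_TOKENIZED":
        # The whole value is one term, so the length part of the norm is 1.
        return [value], boost
    if index_type == "NO_NORMS":
        # Same single term as UN_TOKENIZED, but no norm is recorded.
        return [value], None
    raise ValueError(f"unknown index type: {index_type}")


value = "Performance Tuning Part Three"
tokenized = index_field(value, "TOKENIZED")
untokenized = index_field(value, "UN_TOKENIZED")
no_norms = index_field(value, "NO_NORMS")
not_indexed = index_field(value, "NO")

for label, result in [("TOKENIZED", tokenized), ("UN_TOKENIZED", untokenized),
                      ("NO_NORMS", no_norms), ("NO", not_indexed)]:
    print(label, result)
```

With the tokenized option a query for a single word such as "tuning" can match. With the two untokenized options only the complete, unchanged value matches.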

   //This will not store any values on any items for the title field BUT you will be able to query by the title field. This field will not be Analyzed and hence will go into the inverted index as is.
   <field fieldName="title" storageType="NO"  indexType="UN-TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />

   //This will store any values on any items for the text field and you will be able to query by the text field. If you search by any token in this value then you will get results.
   <field fieldName="text"  storageType="YES" indexType="TOKENIZED" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />

   //This will store any values on any items for the text field and you will be able to query by the text field. This field will not be Analyzed and hence will go into the inverted index as is. It will also not store any normalization information such as boost levels.
   <field fieldName="text"  storageType="YES" indexType="NO_NORMS" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />

   //This will store any values on any items for the text field BUT you will NOT be able to query by the text field
   <field fieldName="text"  storageType="YES" indexType="NO" vectorType="NO" boost="1f" type="System.String" settingType="Sitecore.ContentSearch.LuceneProvider.LuceneSearchFieldConfiguration, Sitecore.ContentSearch.LuceneProvider" />

Please review parts 1 and 2 of this blog series before implementing the changes above. In particular, pay attention to the simple rule of "Do you actually need to performance tune?".

Dev Team

  • I would like one clarification. If we are using Store.No, does it basically mean that these fields are still stored in the index (I can see them in the index using Luke) but are not automatically mapped, so you need to get them yourself? If so, how does it decrease the size of the index?

  • Can someone clarify what the exact wording is for having the index type as untokenized please?  I've got an existing config file with lots of "UNTOKENIZED" but I see from this blog post and the linked StackOverflow answer that the string is "Un_Tokenized". To further add to the confusion, this blog post then gives an example with the string "UN-TOKENIZED". So which is it?  a) "UNTOKENIZED" b) "Un_Tokenized" c) "UN-TOKENIZED"  ?  Thanks