Solr – Schema vs. Schema-less

This post looks at a key difference in the Solr search provider implementation when compared to the default Lucene version.

Solr differs from Lucene in that it requires a schema in order to know how to process each field. The schema file (schema.xml, stored as part of the Solr server configuration) must be updated for each field you want to use. This gives you a great deal of flexibility in telling Solr how to analyse and process the information going into those fields.

In the Lucene provider, the Analyzer settings for each field can be defined as configuration options; for Solr, they are kept inside the Solr server itself.

The problem we faced in the development of version 7 is that Sitecore allows the user to add and remove fields at will within templates. It would be an inconvenient overhead to have to re-generate your schema file every time a new field was added.

To mitigate this issue we use Solr dynamic fields. These can be configured like regular fields but will index any field matching a wildcard pattern. An example of a dynamic field element in the Solr schema.xml is as follows:

<dynamicField name="*_t" type="text_general" indexed="true" stored="true" />

Any field whose name ends in "_t" will be indexed using the text_general analyzer.

Note: You can get Sitecore to create these dynamic field references for you using the Sitecore Solr Schema Generator found in the Control Panel.
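As a rough illustration of the wildcard idea (this is not Solr code, and Solr's own matching rules have more detail, such as precedence between overlapping patterns), the lookup can be sketched in Python. The DYNAMIC_FIELDS table and the match_dynamic_field function are hypothetical names; only the "*_t" pattern and text_general come from the schema line above.

```python
from fnmatch import fnmatchcase

# Hypothetical table of dynamicField patterns -> Solr field type.
# Only the "*_t" entry is taken from the schema.xml example.
DYNAMIC_FIELDS = {"*_t": "text_general"}


def match_dynamic_field(field_name):
    """Return the field type of the first matching pattern, or None."""
    for pattern, field_type in DYNAMIC_FIELDS.items():
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


print(match_dynamic_field("title_t"))  # matches "*_t"
print(match_dynamic_field("robot"))    # ends in "t" but not "_t": no match
```

The second call shows why the underscore matters: the pattern is the whole suffix "_t", not just the final letter.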

When an item is indexed and its fields read, the field type is analyzed and mapped to one of the dynamic fields, meaning that there is no need to re-generate the schema more than once unless you want to add any configuration or fields yourself.

Quick Example: Simple schema-less field

In our example we have a field defined in a Sitecore template called ‘title’ which is a single-line text field.

This field is not defined in the Solr schema so when the field is indexed we want it to use the ‘*_t’ dynamic field that has been set up in the Solr schema.xml (mentioned above).

In Sitecore we needed a way to bind field types and CLR types to these dynamic field extensions.

For this purpose the fieldMap section of the Solr configuration file contains an additional section called ‘AddTypeMatch’. This tells Sitecore how to match a type to a dynamic field, e.g.

<typeMatch typeName="text"   type="System.String"   fieldNameFormat="{0}_t" … />

The ‘typeName’ can then be referenced throughout the configuration file to give Sitecore hints about which extension to use. An example of this is mapping Sitecore field types to certain dynamic fields, e.g.

<fieldType fieldTypeName="html|rich text|single-line text"  returnType="text" />
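Read together, the two entries above behave like a two-step lookup: Sitecore field type to typeName, then typeName to a name format. The Python sketch below only models that idea; TYPE_MATCHES, FIELD_TYPES and index_field_name are hypothetical names, not part of the Sitecore API, and the tables contain just the values shown in the configuration lines.

```python
# typeMatch entries: typeName -> fieldNameFormat (values from the post).
TYPE_MATCHES = {"text": "{0}_t"}

# fieldType entries: Sitecore field type -> returnType (a typeName).
FIELD_TYPES = {
    "html": "text",
    "rich text": "text",
    "single-line text": "text",
}


def index_field_name(field_name, sitecore_field_type):
    """Model the lookup: field type -> typeName -> formatted index name."""
    type_name = FIELD_TYPES[sitecore_field_type]
    return TYPE_MATCHES[type_name].format(field_name)


print(index_field_name("title", "single-line text"))  # title_t
```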

Coming back to our example, as the item is processed, each field is read and its fieldType is used to determine which dynamic field should be used. In this case, the field is therefore stored in the Solr index as ‘title_t’.

What happens if I put this field in my schema?

Sitecore reads in the Solr schema when it starts up, so if the field in our example, ‘title’, were present in the schema, Sitecore would not add a dynamic field extension to it.
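The precedence described here can be sketched as a simple check. This is an illustrative model only: resolve_index_name is a hypothetical function, schema_fields stands in for the set of field names read from schema.xml at startup, and the "{0}_t" default is the format from the earlier typeMatch example.

```python
def resolve_index_name(field_name, schema_fields, name_format="{0}_t"):
    """Use the schema field as-is if it exists, else add the dynamic suffix."""
    if field_name in schema_fields:
        return field_name
    return name_format.format(field_name)


print(resolve_index_name("title", schema_fields={"title"}))  # title
print(resolve_index_name("title", schema_fields=set()))      # title_t
```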

Dynamic fields with LINQ and POCO objects

In the previous section we talked about what happens when a field is indexed; this section covers what happens when a field is queried. When using LINQ with POCO objects the type of the property is always known, so we can infer the usage and calculate the dynamic extension automatically. If we had a POCO class with a property:

public string Title { get; set; }

.. then for any LINQ query made against ‘Title’ (e.g. queryable.Where(x => x.Title == "Swan Danger")), Sitecore is able to infer the type (string) and know that we are actually querying ‘title_t’ in the index (as ‘title’ is not in the schema), and will adapt the query accordingly. Once it has a result it will then map the result back to the property.

Special case: Searching through the UI

Because of the free-text nature of Sitecore UI searches (everything is a string) and because the Solr provider needs a type to match to a dynamic field when querying, there can be an issue when searching for custom fields from the Sitecore UI.

Without any changes our example would be stored as title_t in the index, but a search for ‘title:Sitecore’ in the Sitecore UI would fail, as the field is not known (it isn’t in the schema) and its dynamic field cannot be inferred (as the type isn’t known).

For this scenario we must give Sitecore a hint by adding an entry to the ‘AddFieldByFieldName’ section of the Sitecore Solr configuration:

<fieldType fieldName="title"  returnType="text" />

.. after this is added the UI search will work as expected, as the type can now be inferred.
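To make the difference between the two query paths concrete, here is a minimal Python sketch of the resolution order as described in this post. It is a model, not the provider's implementation: query_field_name, CLR_TYPE_FORMATS, TYPE_NAME_FORMATS and the hints argument are all hypothetical, Python's str stands in for System.String, and returning None is simply this sketch's way of saying "cannot be resolved".

```python
# Known property type -> name format (str stands in for System.String).
CLR_TYPE_FORMATS = {str: "{0}_t"}

# typeName -> name format, as in the typeMatch example.
TYPE_NAME_FORMATS = {"text": "{0}_t"}


def query_field_name(field_name, schema_fields, value_type=None, hints=None):
    """Resolve the index field to query, or None if it cannot be resolved."""
    if field_name in schema_fields:
        return field_name  # explicit schema field: no extension
    if value_type is not None:
        # LINQ/POCO path: the property type is known.
        return CLR_TYPE_FORMATS[value_type].format(field_name)
    type_name = (hints or {}).get(field_name)
    if type_name is not None:
        # UI path: a field-name hint supplies the missing type.
        return TYPE_NAME_FORMATS[type_name].format(field_name)
    return None


print(query_field_name("title", set(), value_type=str))          # title_t
print(query_field_name("title", set()))                          # None
print(query_field_name("title", set(), hints={"title": "text"})) # title_t
```

The middle call corresponds to the failing UI search: with no type and no hint there is nothing to derive the "_t" extension from.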

With the Solr provider the aim is for one-to-one compatibility with the Lucene layer. The difference between a system that uses a schema (Solr) and one that is schema-less (Lucene) means small differences have emerged but the experience of switching between them should be as transparent as possible.

Sitecore handles the adding and removing of these dynamic field extensions but it is important to understand what it is doing ‘under the covers’ in case you need to debug any specific issues.

In the next post we will be looking at how Solr handles the indexing of multiple languages.

Dev Team