Sitecore 7: Why these results?

Why am I getting the results I am getting? This is a question we often ask ourselves when we are searching using a search provider. I can guarantee that you will at some point ask yourself the same question in Sitecore 7. This blog post aims to answer the most common questions that users will have when using the new search user interface. Let's start with a practical example.

Question 1: How can I search for the word "Android" in the Sitecore Media Library?

Let's see which of the following will work in Sitecore 7.

  • Andr - No
  • Andr* - Yes
  • And - No
  • Android - Yes
  • * d*i * - Yes
  • *droid - Yes
  • ?ndroid - Yes

Why? It has to do with Tokenization, a concept that developers will need to get used to when working with a search provider. Tokenization is the act of breaking up a stream of text into tokens, e.g. breaking up a sentence into individual words. The output of this is a list of "Tokens" that are now searchable. There are thousands of examples of this on the internet; however, just for context, if most search providers were given the following sentence they would most likely all produce a similar list of tokens.

The Android Phone is in competition with the IPhone Phone and the competition is fierce.

Although it may seem like a trivial task, breaking this sentence into tokens can involve extremely complex algorithms. If we take the generic Tokenizer then the output will look something like this:

  • android
  • phone
  • competition
  • iphone
  • fierce

So, starting with 15 words, the process broke the sentence down to 5 Tokens. It lowercased every word, stripped out StopWords such as "and", "the" and "is", and collapsed the repeated words, leaving a flattened list that is ready for an inverted index.
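To make the process concrete, here is a minimal sketch of that kind of tokenization in Python. It is not the tokenizer that Sitecore or its search provider actually uses; the stop-word list and the `tokenize` function are assumptions made purely for illustration.

```python
import re

# Illustrative stop-word list (an assumption); real analyzers ship their own.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to", "with"}

def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stop words and repeats."""
    tokens = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        if word not in STOP_WORDS and word not in tokens:
            tokens.append(word)
    return tokens

sentence = ("The Android Phone is in competition with the IPhone Phone "
            "and the competition is fierce.")
print(tokenize(sentence))
# ['android', 'phone', 'competition', 'iphone', 'fierce']
```

The 15 input words shrink to the same 5 tokens listed above because every stop word and every repeated word is discarded.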

Let's get back to the original question of "Why do some searches work and some not?".

  • Andr - This will NOT work because "Andr" is NOT a token. It is the first 4 letters of a token.
  • Andr* - This will work because the trailing * is a wildcard that says "anything can appear after this".
  • And - This will NOT work because "AND" is a StopWord and we strip that out as it typically does not help in a search.
  • Android - This is a token, this will work.
  • * d*i * - This will work as wildcards are allowed at the start or end or in-between characters.
  • *droid - This will work due to wildcards.
  • ?ndroid - This will work due to ? being a single character wildcard.
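The list above can be reproduced with a small sketch that matches each query against the tokens in the index rather than against the original text. It leans on Python's `fnmatch` module, whose `*` and `?` happen to behave like the wildcards described here; the `INDEX_TOKENS` list and the `matches` helper are hypothetical, and the multi-wildcard query is written without spaces as `*d*i*`.

```python
from fnmatch import fnmatchcase

# Tokens assumed to be in the index for the example sentence.
# "and" is absent because stop words were stripped at index time.
INDEX_TOKENS = ["android", "phone", "competition", "iphone", "fierce"]

def matches(query):
    """Return the indexed tokens that a single query term would hit."""
    pattern = query.lower()  # the query is lowercased, like the index
    return [token for token in INDEX_TOKENS if fnmatchcase(token, pattern)]

for query in ["Andr", "Andr*", "And", "Android", "*d*i*", "*droid", "?ndroid"]:
    print(f"{query:10} -> {'Yes' if matches(query) else 'No'}")
```

A query without wildcards has to equal a whole token, which is why "Andr" misses while "Andr*" hits.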

Question 2: Is search case sensitive?

Not by default. Case-sensitive search is possible using the right Analyzer in your configuration, but it is generally considered best practice to lowercase everything in your index, because it is rare that "case" adds to the relevance of your results. If it does, then please look into the configuration and switch to a KeywordAnalyzer for the fields that require it.
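The difference can be sketched as follows. This is a simplified model and not the real Analyzer classes: `lowercased` stands in for default-style analysis, while `keyword` mimics keyword-style analysis, which keeps the whole field value as a single token with its case intact. All three function names are made up for this example.

```python
def lowercased(value):
    """Default-style analysis: split into words and lowercase each one."""
    return value.lower().split()

def keyword(value):
    """Keyword-style analysis: keep the whole value as one token, case intact."""
    return [value]

def is_hit(field_value, query, analyze):
    """A query hits when every query token is among the field's tokens."""
    field_tokens = analyze(field_value)
    return all(token in field_tokens for token in analyze(query))

print(is_hit("Android Phone", "ANDROID", lowercased))     # True
print(is_hit("Android Phone", "android phone", keyword))  # False
print(is_hit("Android Phone", "Android Phone", keyword))  # True
```

Note that in this model a keyword-style field only matches the exact, complete value, so it is a trade-off rather than a free upgrade.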

Question 3: How do I bring back everything?

A common use case will be that you want to search within everything in a part of the content tree without actually entering any search text. Providers like Google do not really allow this; however, we see a strong use case for it within a CMS. Leaving the text box empty and pressing "Enter" on the keyboard, or clicking the search button or one of the views, will bring back all items under the current item. Entering a "*" will achieve the same result.

Question 4: Search for a display name yields no results, why?

We have a Google Hangout coming out soon to show you some tips and tricks through the UI; however, to answer this question here we need to talk about "Search Aliases". A "Search Alias" is a search written through the UI that is simply an alias for a search on another field or list of fields. A great example is the full-text query. When you type text such as "Android" into the text box and press "Enter" on the keyboard, this actually translates to the "text" search alias, which will look up the _name and _content fields. The _name field stores the original name of the item and the _content field is an aggregate field that stores all the internal fields in one big field. This is why we don't have to specify a search that runs over 50 different fields. However, the Display Name is an example of a field that is not aggregated into the _content field. To include the display name in your search criteria you can do one of two things.

1: You can type _displayname:somename in the text box and it will search that specific field only.

2: You can add display name to the text Search Alias by navigating to the /sitecore/system/Settings/Buckets/Search Types/Text item and then adding "_displayname" to the "Field" field. We have added this by default in Update 2 of Sitecore 7.
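Conceptually, a Search Alias is just a lookup from one name to a list of fields. The sketch below models that idea and both options above; the `SEARCH_ALIASES` table and the `expand` function are hypothetical and say nothing about how Sitecore implements aliases internally.

```python
# Hypothetical alias table mirroring the behaviour described above.
SEARCH_ALIASES = {"text": ["_name", "_content"]}

def expand(raw_query, aliases=SEARCH_ALIASES):
    """Turn text-box input into the (field, term) pairs that get searched."""
    field, separator, term = raw_query.partition(":")
    if not separator:  # plain text falls back to the "text" alias
        field, term = "text", raw_query
    return [(name, term) for name in aliases.get(field, [field])]

print(expand("Android"))
# [('_name', 'Android'), ('_content', 'Android')]

# Option 1: target the display name field directly.
print(expand("_displayname:somename"))
# [('_displayname', 'somename')]

# Option 2: add the display name to the "text" alias.
extended = {"text": SEARCH_ALIASES["text"] + ["_displayname"]}
print(expand("Android", extended))
# [('_name', 'Android'), ('_content', 'Android'), ('_displayname', 'Android')]
```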

Question 5: Why does it not act like Google?

Although we strive to offer a similar experience to Google when searching for content, we are not actually trying to solve the same problem as Google. It is really important to note that the front-end UI is also used by authors to determine concrete lists of items to work with. Google does a lot of things in the background to offer up fuzzy or suggested content based on things like spelling mistakes, common trends etc. In fact, these days it will sometimes assume you made a spelling mistake and show results for the corrected query instead. This works GREAT for Google, but for Sitecore, we do not want that experience. We would like authors who are searching for content to know exactly which content will be brought back. We want our authors to have complete control over what content is served up on their website, and this can only be achieved if we have strict rules over what a search will bring back.

Question 6: If I search in the Media Library, why does my content in the Layouts section not appear?

Context, Context, Context! Sitecore's searches are all context aware and location aware. If you are running a search from the media library node in the content tree then it will only show you results from the media library. You can actually change this behaviour by adding a "location" filter to your search and pointing it to another part of the content tree.
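As a rough model of that behaviour, the sketch below scopes a search to the item it is run from unless a location is supplied. The item paths, the `search` function and its `location` parameter are illustrative assumptions, not Sitecore's API.

```python
# Hypothetical items, identified by their content-tree paths.
ITEMS = [
    "/sitecore/media library/Images/android",
    "/sitecore/media library/Files/android-specs",
    "/sitecore/layout/Layouts/android-layout",
]

def search(term, context, location=None):
    """Return items under the search root whose name contains `term`.

    The root is the context item unless a location filter overrides it.
    """
    root = (location or context).lower().rstrip("/") + "/"
    return [
        path for path in ITEMS
        if path.lower().startswith(root)
        and term.lower() in path.rsplit("/", 1)[-1].lower()
    ]

media = "/sitecore/media library"
print(search("android", context=media))
# ['/sitecore/media library/Images/android',
#  '/sitecore/media library/Files/android-specs']
print(search("android", context=media, location="/sitecore/layout"))
# ['/sitecore/layout/Layouts/android-layout']
```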

Question 7: Why does "Wayne Smith" get a hit but "Wayne" doesn't?

Think Analyzers! As with the answer to question 1 above, tokens, tokenizers and analyzers are your new best friends. If you run into trouble like the one described in this question, use your log files to see exactly how we are parsing the search.
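One plausible cause, and it is only an assumption about your configuration, is that the field is indexed as a single untokenized value. The sketch below builds a tiny inverted index both ways to show the effect; `build_index`, `tokenized` and `whole_value` are hypothetical helpers.

```python
from collections import defaultdict

def build_index(docs, analyze):
    """Map each token produced by `analyze` to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for token in analyze(value):
            index[token].add(doc_id)
    return index

def tokenized(value):
    """One lowercased token per word."""
    return value.lower().split()

def whole_value(value):
    """The entire lowercased value as a single token."""
    return [value.lower()]

docs = {1: "Wayne Smith"}
by_word = build_index(docs, tokenized)
by_value = build_index(docs, whole_value)

print(sorted(by_word))    # ['smith', 'wayne']
print(sorted(by_value))   # ['wayne smith']
print("wayne" in by_word, "wayne" in by_value)   # True False
print("wayne smith" in by_value)                 # True
```

With the whole-value index, "Wayne" is only part of a token, so it misses for the same reason that "Andr" missed in question 1.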

Question 8: Why no spelling corrections?

This support will be coming as native functionality soon!

Question 9: Why no auto-suggest?

This support will be coming as native functionality soon! Auto-suggest is available for filter searches but not raw text searches.

Question 10: Why not zoidberg?