Identifying Search Engines and Search Terms with the Sitecore Customer Engagement Platform

This blog post describes how the Sitecore Customer Engagement Platform (CEP) identifies search engines and search terms to associate with tracked visits. CEP combines the Sitecore ASP.NET CMS 6.5 or later (currently available as a technical preview) with the Digital Marketing System (DMS), which replaces the Online Marketing Suite (OMS).

When it begins tracking a new visit, CEP invokes the parseReferrer pipeline to identify search engines and search terms. Technically, the StartAnalytics processor in the httpRequestBegin pipeline and a media:request event handler each invoke the startTracking pipeline, which eventually invokes the parseReferrer pipeline. The /App_Config/Include/Sitecore.Analytics.config file defines the startTracking pipeline, the parseReferrer pipeline, and the media:request event handler.
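To make the call chain concrete, here is a minimal model of it. Python serves only as a runnable, language-neutral notation. Every class and function name below is hypothetical and mirrors the pipeline and handler names in the paragraph above. None of it is the Sitecore API, and the other steps of startTracking are deliberately omitted.

```python
# Hypothetical model of the invocation chain; these are NOT Sitecore APIs.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineArgs:
    """Shared state handed from processor to processor (hypothetical)."""
    entry_point: str
    trace: List[str] = field(default_factory=list)


def run_pipeline(name: str,
                 processors: List[Callable[[PipelineArgs], None]],
                 args: PipelineArgs) -> None:
    """Record the pipeline name, then call each processor in configured order."""
    args.trace.append(name)
    for processor in processors:
        processor(args)


def parse_referrer(args: PipelineArgs) -> None:
    # Its processors (such as ParseGenericSearchEngine) are omitted here.
    run_pipeline("parseReferrer", [], args)


def start_tracking(args: PipelineArgs) -> None:
    # Only the step that eventually reaches parseReferrer is modeled.
    run_pipeline("startTracking", [parse_referrer], args)


def start_analytics_processor() -> PipelineArgs:
    """Stands in for StartAnalytics in the httpRequestBegin pipeline."""
    args = PipelineArgs(entry_point="httpRequestBegin")
    start_tracking(args)
    return args


def media_request_handler() -> PipelineArgs:
    """Stands in for the media:request event handler."""
    args = PipelineArgs(entry_point="media:request")
    start_tracking(args)
    return args
```

Both entry points converge on the same startTracking call. This is why page requests and media requests alike can start a tracked visit and reach parseReferrer.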

By default, the parseReferrer pipeline contains a single processor, ParseGenericSearchEngine:

<processor type="Sitecore.Analytics.Pipelines.ParseReferrer.ParseGenericSearchEngine,Sitecore.Analytics">
  <engines hint="raw:AddHostParameterName">
    <engine hostname="" parametername="q"/>
    <engine hostname="" parametername="p"/>
  </engines>
</processor>

The ParseGenericSearchEngine processor handles the most common search engines, which use a query string parameter to carry the search term. For example, Google uses the query string parameter q for the search term, while Yahoo uses p. When a user clicks a link, the browser sends the URL of the page containing that link to the server in the Referer [sic] HTTP header. When the user clicks a link on a search results page, the referrer is therefore the URL of that results page, including the query string parameter that holds the search term. The ParseGenericSearchEngine processor lets you specify the host names and query string parameters used by any number of search engines. Sitecore applies the first entry whose hostname matches the referrer of the request that initiates tracking of a new visit. The hostname values do not include top-level domains because these vary by region (for example, google.com and google.co.uk).
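The matching rule can be sketched as follows. The sketch assumes that a configured hostname "matches" when it appears anywhere in the referrer's host name, which would explain why the entries omit top-level domains. The google/q and yahoo/p pairs come from the examples above. The function name, and the choice to stop at the first matching entry even when that entry's parameter is missing, are assumptions of this sketch and not documented Sitecore behavior.

```python
# Sketch only: models the ParseGenericSearchEngine lookup described above.
# ASSUMPTION: a configured hostname "matches" if it occurs anywhere in the
# referrer's host name; this is not taken from Sitecore's implementation.
from typing import List, Optional, Tuple
from urllib.parse import parse_qs, urlparse

# (hostname, parametername) pairs from the Google/Yahoo examples in the text.
ENGINES: List[Tuple[str, str]] = [("google", "q"), ("yahoo", "p")]


def search_term_from_referrer(
    referrer: str, engines: List[Tuple[str, str]]
) -> Optional[str]:
    """Return the search term for the first engine whose hostname matches."""
    parts = urlparse(referrer)
    host = (parts.hostname or "").lower()
    for hostname, parameter in engines:
        if hostname.lower() in host:
            # ASSUMPTION: the first matching entry wins, even if the
            # referrer lacks its parameter (the result is then None).
            values = parse_qs(parts.query).get(parameter)
            return values[0] if values else None
    return None  # no configured engine matched this referrer
```

With these entries, a referrer of http://www.google.co.uk/search?q=sitecore+dms yields the term "sitecore dms" because parse_qs decodes + as a space. A referrer from a host that matches no entry yields None.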

You can add other search engines to the ParseGenericSearchEngine processor configuration, and you can implement your own parseReferrer pipeline processors. A parseReferrer pipeline processor class contains a method that accepts an argument of type Sitecore.Analytics.Pipelines.ParseReferrer.ParseReferrerArgs. If the processor identifies a search engine, it passes the search keywords to the Sitecore.Analytics.Tracker.Visitor.DataContext.GetKeywords() method and sets the Visit.Keywords property of the ParseReferrerArgs argument to the result. You don't have to do anything else to identify the search engine: if you set the Visit.Keywords property, Sitecore automatically identifies the domain in the referrer URL as a search engine.
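The processor contract can be modeled as follows. The Python classes are hypothetical stand-ins for the C# types named above (ParseReferrerArgs, Visit.Keywords, DataContext.GetKeywords()). In particular, two details are assumptions made only to illustrate the description: modeling GetKeywords() as returning one shared record per phrase, and the final step that records the referrer's host as the search engine. Neither reflects Sitecore's actual implementation.

```python
# Hypothetical stand-ins for ParseReferrerArgs, Visit.Keywords and
# DataContext.GetKeywords(); the real types are .NET classes in Sitecore.Analytics.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional
from urllib.parse import urlparse


@dataclass
class Keywords:
    text: str


@dataclass
class Visit:
    keywords: Optional[Keywords] = None
    search_engine: Optional[str] = None


@dataclass
class ParseReferrerArgs:
    referrer: str
    visit: Visit = field(default_factory=Visit)


class DataContext:
    """ASSUMPTION: models GetKeywords() as one shared record per phrase."""

    def __init__(self) -> None:
        self._records: Dict[str, Keywords] = {}

    def get_keywords(self, text: str) -> Keywords:
        return self._records.setdefault(text, Keywords(text))


class CustomProcessor:
    """Sets visit.keywords only when find_term recognizes a search term."""

    def __init__(self, context: DataContext,
                 find_term: Callable[[str], Optional[str]]) -> None:
        self.context = context
        self.find_term = find_term

    def process(self, args: ParseReferrerArgs) -> None:
        term = self.find_term(args.referrer)
        if term:
            args.visit.keywords = self.context.get_keywords(term)


def run_parse_referrer(processors: List[CustomProcessor],
                       args: ParseReferrerArgs) -> None:
    for processor in processors:
        processor.process(args)
    # Models the text: once keywords are set, the referrer's domain is
    # treated as a search engine with no extra work by the processor.
    if args.visit.keywords is not None:
        args.visit.search_engine = urlparse(args.referrer).hostname
```

The processor only decides whether a search term is present. Classifying the referring domain as a search engine happens after the processors run, which mirrors the division of labor described above.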

You can implement a solution based on this prototype of a parseReferrer pipeline processor that handles search engines that put the search term in the first token of the path (for example, http://domain.tld/searchterm/options). You could add this processor to the parseReferrer pipeline before or after the ParseGenericSearchEngine processor:

<processor type="Sitecore.Sharedsource.Analytics.Pipelines.ParseReferrer.ParsePath,assembly">
  <hostnames hint="list">
    <hostname unique="1">domain</hostname>
  </hostnames>
</processor>
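The path-based approach reduces to a small extraction function. This sketch makes two assumptions. First, a configured hostname such as domain matches when it equals one dot-separated label of the referrer's host, so the top-level domain can vary. Second, the search term is the first non-empty path segment, URL-decoded. The function name is hypothetical, and the actual prototype is a .NET class, not this code.

```python
# Sketch of a path-based search term parser (hypothetical names).
# ASSUMPTIONS: a configured hostname matches when it equals one dot-separated
# label of the referrer host, and the term is the first non-empty path segment.
from typing import Iterable, Optional
from urllib.parse import unquote, urlparse


def term_from_path(referrer: str, hostnames: Iterable[str]) -> Optional[str]:
    """Return the first path token of a referrer from a configured host, else None."""
    parts = urlparse(referrer)
    labels = (parts.hostname or "").lower().split(".")
    if not any(name.lower() in labels for name in hostnames):
        return None  # the referrer is not one of the configured search engines
    segments = [segment for segment in parts.path.split("/") if segment]
    if not segments:
        return None  # nothing after the host to treat as a search term
    return unquote(segments[0])  # decode escapes such as %20
```

For the example URL in the text, term_from_path("http://domain.tld/searchterm/options", ["domain"]) returns "searchterm". A referrer from an unlisted host returns None.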

For efficiency, Sitecore invokes the parseReferrer pipeline only if the browser transmits a referrer, and only on the first request in a visit. The prototype processor overrides that referrer, but only under those conditions. To test the functionality described in this solution in a browser, you must click a link; refreshing the page does not work, because the refreshed request is not the first request in the visit. I put the static HTML file included with the prototype in the document root of the Sitecore solution, load that file in the browser, click the link, and then close the browser so that I remember to start a new session for the next test.
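The two preconditions, and the reason a refresh does not help, can be modeled with a tiny session object. The class and the http://localhost/test.html URL are hypothetical. Real visits are governed by Sitecore's analytics session handling, which this sketch does not reproduce.

```python
# Hypothetical model of the two preconditions for running parseReferrer.
from typing import Optional


class BrowserSession:
    """One browser session; the first request of the session starts the visit."""

    def __init__(self) -> None:
        self.visit_started = False

    def request(self, referrer: Optional[str]) -> bool:
        """Return True if this request would trigger the parseReferrer pipeline."""
        is_first_request = not self.visit_started
        self.visit_started = True
        # Both conditions must hold: first request in the visit AND a referrer.
        return is_first_request and bool(referrer)


session = BrowserSession()
click = session.request("http://localhost/test.html")    # link in the static page
refresh = session.request("http://localhost/test.html")  # same visit, not first
```

Here click is True and refresh is False. Only a new session, such as one started after closing the browser, makes the next click the first request of a visit again.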