
Sitecore 7 Inbound and Outbound Filter Pipelines

Sitecore 7 introduces pipelines for controlling a global filter over what goes into the index and what comes out of the index.

Key Takeaways

  • Sitecore 7 contains an ApplyOutboundSecurityFilter for "nulling" items that the context user does not have access to. This skews the result counts: your results collection may contain 1000 items even though you actually received 1003 hits at the search provider level.
  • Use these filter pipelines to stop documents from going into the index globally. For example, you may want to only ever insert the latest version of a document into the index (for the web database this will happen anyway).
  • Both the inbound and outbound pipelines run at the API level, so the filters you place in these pipelines will affect the UI results as well as the results you get by calling the API directly.
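
The count skew in the first takeaway can be illustrated without Sitecore at all. The sketch below is a plain C# model and not the ContentSearch API: `Hit`, `ProviderHits`, and `VisibleResults` are hypothetical names, and the security check is reduced to a boolean flag on each hit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search hit; not a Sitecore type.
public sealed class Hit
{
    public Hit(int id, bool readable)
    {
        Id = id;
        Readable = readable;
    }

    public int Id { get; }
    public bool Readable { get; }
}

public static class OutboundCountDemo
{
    // Simulates the provider matching `total` documents, the first `denied`
    // of which the context user is not allowed to read.
    public static List<Hit> ProviderHits(int total, int denied)
    {
        return Enumerable.Range(0, total)
            .Select(i => new Hit(i, i >= denied))
            .ToList();
    }

    // Models the outbound security filter: denied hits never reach the caller,
    // but the provider-level total is not recalculated.
    public static List<Hit> VisibleResults(IEnumerable<Hit> providerHits)
    {
        return providerHits.Where(h => h.Readable).ToList();
    }
}
```

With `ProviderHits(1003, 3)` the provider-level list holds 1003 hits while `VisibleResults` returns 1000, so any paging calculation based on the provider total would overstate what the user can actually see.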

The pipelines are defined in the Sitecore.ContentSearch.config file. The default filters consist of a processor stub for the inbound filter and the ApplyOutboundSecurityFilter for "nulling" results that the user should not see. The inbound stub is only there to show that you can add a filter; it does not filter anything going into the index.

<indexing.filterIndex.inbound>
 <processor type="Sitecore.ContentSearch.Pipelines.IndexingFilters.ApplyInboundIndexFilter, Sitecore.ContentSearch"></processor>
</indexing.filterIndex.inbound>  

<indexing.filterIndex.outbound>
 <processor type="Sitecore.ContentSearch.Pipelines.IndexingFilters.ApplyOutboundSecurityFilter, Sitecore.ContentSearch"></processor>
</indexing.filterIndex.outbound>

Inbound Filters

Let's implement an inbound processor that will only allow the latest version of an IIndexable to be placed into the index. To show that an IIndexable can be of the type Item, I will also keep Standard Values out of the index, i.e. if the item is the Standard Values of a template, do not put it into the index.

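using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Pipelines.IndexingFilters;

public class ApplyInboundIndexVersionFilter : InboundIndexFilterProcessor
{
    public override void Process(InboundIndexFilterArgs args)
    {
        // Only Sitecore items carry versions; let any other IIndexable through.
        var item = args.IndexableToIndex as SitecoreIndexableItem;
        if (item == null)
        {
            return;
        }

        // Exclude every version except the latest one.
        if (!item.Item.Versions.IsLatestVersion())
        {
            args.IsExcluded = true;
        }
    }
}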

We would like each processor to be responsible for its own filter so that we can reuse the code in other projects if necessary, so I will split the version filter and the Standard Values filter into separate classes.

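using Sitecore.ContentSearch;
using Sitecore.ContentSearch.Pipelines.IndexingFilters;

public class ApplyInboundIndexStandardValuesFilter : InboundIndexFilterProcessor
{
    public override void Process(InboundIndexFilterArgs args)
    {
        // Only Sitecore items can be Standard Values; skip any other IIndexable.
        var item = args.IndexableToIndex as SitecoreIndexableItem;
        if (item == null)
        {
            return;
        }

        // Standard Values items are always named "__Standard Values".
        if (item.Item.Name == "__Standard Values")
        {
            args.IsExcluded = true;
        }
    }
}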

We now need to add these processors to our indexing.filterIndex.inbound pipeline:

 <processor type="Sitecore.Seven.ApplyInboundIndexVersionFilter, Sitecore.Seven"></processor>
 <processor type="Sitecore.Seven.ApplyInboundIndexStandardValuesFilter, Sitecore.Seven"></processor>
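
To make the effect of stacking the two processors concrete, here is a minimal sketch of how such a pipeline behaves. It is a simplified model, not Sitecore's pipeline runner: `FilterArgs`, `FilterProcessor`, `VersionFilter`, `StandardValuesFilter`, and `InboundPipelineModel` are hypothetical stand-ins, and the model assumes every processor runs in configured order and an item is indexed only if none of them sets IsExcluded.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for InboundIndexFilterArgs.
public sealed class FilterArgs
{
    public FilterArgs(string name, bool isLatestVersion)
    {
        Name = name;
        IsLatestVersion = isLatestVersion;
    }

    public string Name { get; }
    public bool IsLatestVersion { get; }
    public bool IsExcluded { get; set; }
}

// Hypothetical stand-in for InboundIndexFilterProcessor.
public abstract class FilterProcessor
{
    public abstract void Process(FilterArgs args);
}

public sealed class VersionFilter : FilterProcessor
{
    public override void Process(FilterArgs args)
    {
        if (!args.IsLatestVersion) args.IsExcluded = true;
    }
}

public sealed class StandardValuesFilter : FilterProcessor
{
    public override void Process(FilterArgs args)
    {
        if (args.Name == "__Standard Values") args.IsExcluded = true;
    }
}

public static class InboundPipelineModel
{
    // Runs each processor in order; the item is indexed only if no
    // processor has flagged it as excluded.
    public static bool ShouldIndex(FilterArgs args, IEnumerable<FilterProcessor> processors)
    {
        foreach (var processor in processors)
        {
            processor.Process(args);
        }
        return !args.IsExcluded;
    }
}
```

Because each processor only ever sets the flag to true, the order of the two filters does not change the outcome in this model, which is one reason keeping them as separate single-purpose classes is cheap.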

Outbound Filters

Let's implement an outbound processor that will only allow an IIndexable that sits in the final workflow state to be returned in the results of a search query.

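using Sitecore.ContentSearch.Pipelines.IndexingFilters;
using Sitecore.Data;

public class ApplyOutboundIndexWorkflowFilter : OutboundIndexFilterProcessor
{
    public override void Process(OutboundIndexFilterArgs args)
    {
        if (args.IndexableUniqueId == null)
        {
            return;
        }

        if (args.IndexableDataSource == "sitecore")
        {
            var itemUri  = new ItemUri(args.IndexableUniqueId);
            var database = Sitecore.Context.Database;
            var item     = database.GetItem(itemUri.ItemID);
            if (item == null)
            {
                return;
            }

            // Items without a workflow (or a workflow provider) stay in the results.
            var provider = database.WorkflowProvider;
            var workflow = provider == null ? null : provider.GetWorkflow(item);

            if (workflow != null && !workflow.IsApproved(item))
            {
                args.IsExcluded = true;
            }
        }
    }
}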

We now need to add this processor to our indexing.filterIndex.outbound pipeline:

 <processor type="Sitecore.Seven.ApplyOutboundIndexWorkflowFilter, Sitecore.Seven"></processor>

These pipelines are not designed as global filters for the LINQ to Provider layer but rather as a global filter over what goes into and comes out of the index. They were initially designed to solve the requirement of security in the index but, as you can see, they can be used to solve other requirements as well.

Dev Team

  • Do these filters also apply to searches performed within the shell interfaces?  For example, if I used that workflow filter above, would content search in the admin only return items in the final workflow state?

  • @Kam - Yes, this will only return items in the final workflow state but as mentioned in the post, it will give you a different number of results to the total hits at the provider level.

  • This helped me immensely. Thanks, Tim!

  • I'm trying to set up an Outbound Index Filter and I can only get it working if I replace the following line   var database = Sitecore.Context.Database;  with:  var database = Database.GetDatabase("master");  I've tried to do the search via the desktop Search app as well as just the search tab.  When I step into the code, the context database for either of these search apps shows as "Core" even when I've set my desktop database to master or web.    Is there a way around hard coding the database?

  • @Chirag Patel - in the backend the context database is Core since that's the UI database for the backend. You're probably looking for Sitecore.Context.ContentDatabase.  If you're using it both in the shell and frontend use Sitecore.Context.ContentDatabase ?? Sitecore.Context.Database

  • @Chirag - In 7.2 (haven't checked in earlier versions) you can use the following to get the database:

    var itemUri = new ItemUri(args.IndexableUniqueId);
    var database = Sitecore.Data.Database.GetDatabase(itemUri.DatabaseName);
    var item = database.GetItem(itemUri.ItemID);

  • It's worth noting that implementing the ApplyInboundIndexVersionFilter to ensure only the latest version goes into the index can cause problems that aren't apparent at first glance...

    As the WEB db only ever has the latest version, I imagine developers would want to add this inbound filter to their MASTER db index to prevent filling it up with old, irrelevant versions. This is why I implemented it, anyway.

    The problem my team found is as follows:

    1) Create an item. Version 1 goes into the index because it's the latest version.
    2) Add a new version. Version 2 goes into the index because it's now the latest version.
    3) Version 1 gets blocked by the inbound filter, meaning the index entry for version 1 DOESN'T GET UPDATED OR REMOVED. In the index it is still marked as the latest version. So is version 2.

    This means you have 2 versions in your index, both marked as the latest version. This is a disaster waiting to happen and completely defeats the purpose of the filter in the first place.

    You have to be very careful with inbound filters because they don't do as you might expect. I expected that if you set "args.IsExcluded" to true then it would REMOVE that entry from the index, but it doesn't - it ONLY ensures that nothing gets ADDED. That's a subtle but very crucial difference.

    Once we found this problem we quickly removed the inbound latest version filter. Now we just ensure all of our index queries have _latestversion == "1", thus ignoring the old versions.

  • @Owen I am facing that issue, any solution please?

  • @Shafaqat Ali, extend SitecoreItemCrawler, override DoAdd() and DoUpdate(), and inject a condition into the foreach loop at the end that iterates over versions.

    For DoAdd: only add to Operations.Add if the version is the latest version.
    For DoUpdate: only add to Operations.Add if the version is the latest version, and add to Operations.Delete otherwise.

    Note: indexes crawled over the web database will have all versions. I just reported an issue against SC8. The problem is that web only has a single version (the latest) and Sitecore will see it as isLatestVersion. Say you indexed version 1: it's the only one web has and it's the latest. Then you published version 2. Unless you reached the full rebuild threshold the update will be incremental, and while version 2 is the latest, version 1 is no longer in the web database so it won't even go through this code. For that reason, when I query items from indexes that crawled web, I sort them by hit.Document.ItemUri.Version.Number and pick Last().

  • Great stuff @Pavel. Ran into this issue and will be implementing your suggestion.

  • Hi. The inbound filter should return the same IsExcluded result regardless of item state (its field values). In other words, the example with LatestVersion was not the best one. As mentioned in previous comments, if the IsExcluded result for an indexable changes between indexing runs, the entry that was written while it was not excluded will persist in the index and won't be updated or deleted until the next full rebuild. You can still use the processor mentioned, but in that case you have to manually remove data from the index when necessary. For this purpose one can use the "indexing:excludedfromindex" event:

    1. Subscribe to the event:

        <event name="indexing:excludedfromindex">
          <handler type="Sitecore.Seven.HandleExcludedItems, Sitecore.Seven" method="RemoveExcludedItems"/>
        </event>

    2. Implement the handler logic that removes data from the index. Something similar to this (the example might be extended with custom logic to decide what should or should not be removed):

        public sealed class HandleExcludedItems
        {
            public void RemoveExcludedItems(object sender, EventArgs args)
            {
                var indexName = Sitecore.Events.Event.ExtractParameter<string>(args, 0);
                var version = Sitecore.Events.Event.ExtractParameter(args, 1) as IIndexableUniqueId;

                if (string.IsNullOrEmpty(indexName) || !ContentSearchManager.SearchConfiguration.Indexes.ContainsKey(indexName)) return;

                if (version == null) return;

                var index = ContentSearchManager.GetIndex(indexName);
                if (index == null) return;

                System.Threading.Tasks.Task.Run(() => index.Delete(version));
            }
        }
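
The stale-entry problem described in the comments above can be reproduced with a small in-memory model. This is only a sketch of the reported behaviour, not Sitecore code: `IndexEntry`, `ToyIndex`, and the `removeExcluded` switch are hypothetical, and the model assumes an excluded version is simply skipped unless something explicitly deletes it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical index entry; not a Sitecore type.
public sealed class IndexEntry
{
    public IndexEntry(string itemId, int version, bool markedLatest)
    {
        ItemId = itemId;
        Version = version;
        MarkedLatest = markedLatest;
    }

    public string ItemId { get; }
    public int Version { get; }
    public bool MarkedLatest { get; }
}

// Hypothetical in-memory index keyed by item id and version number.
public sealed class ToyIndex
{
    private readonly Dictionary<string, IndexEntry> entries =
        new Dictionary<string, IndexEntry>();

    // Models an incremental update of one item whose newest version is
    // `latestVersion`. Every version is offered to the index; the inbound
    // filter lets only the latest one through.
    public void Update(string itemId, int latestVersion, bool removeExcluded)
    {
        for (var version = 1; version <= latestVersion; version++)
        {
            var key = itemId + "|" + version;
            if (version == latestVersion)
            {
                entries[key] = new IndexEntry(itemId, version, true);
            }
            else if (removeExcluded)
            {
                // The remedy suggested in the comments: delete what is excluded.
                entries.Remove(key);
            }
            // Otherwise the excluded version is skipped and any entry already
            // written for it stays in the index unchanged.
        }
    }

    public int CountMarkedLatest(string itemId)
    {
        return entries.Values.Count(e => e.ItemId == itemId && e.MarkedLatest);
    }
}
```

Calling `Update("A", 1, false)` and then `Update("A", 2, false)` leaves two entries marked as latest, which is the situation reported above. Running the same two updates with `removeExcluded` set to true leaves one, which is roughly what the crawler override and the "indexing:excludedfromindex" handler are each trying to achieve.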