Data Integration using the Sitecore Publish Pipeline

Overview

I recently used the Sitecore Publish Pipeline to do a data integration with other systems in our network. While going through this process, it occurred to me that a “how-to” might be helpful to others trying to do the same.

We had a requirement where we needed published content to be synced with 3rd party systems in near real-time. The 3rd party system essentially needed the content from the web database. The best way to sync the data in near real-time was to plug into the process that Sitecore used to populate the web database. This would ensure that we would have access to the data as it was being published. Sitecore uses the publish pipeline to migrate published items from the master database to the web database. This would be the perfect spot to plug in our logic.

Performance and Durability

One thing to watch out for when adding logic to the publish pipeline, or to any other pipeline, is the impact on performance. Each step in a pipeline is synchronous, so adding one or more slow steps will slow down the overall process.

Typically when sending data to a 3rd party system, some sort of action needs to be performed to send the data to the system. This might be a call to a web service on the 3rd party system. Adding a call to a web service for each item during the publish process has the potential to really slow down publishing. Not only will there be the latency involved when calling the web service over the network, we may also have to wait for the 3rd party system to process the data before the web service call returns. What happens when the 3rd party system is not available and the web service is not responding? This will not only slow the system down as it waits for the web service calls to time out, but the 3rd party system will miss the data update and will get out of sync.

One way to resolve the issues of performance and durability is to use a service bus. Instead of calling the web service directly on the 3rd party system, we publish a message on the service bus. The 3rd party system then subscribes to the messages and processes them as they become available. Publishing a message is typically very fast, and we do not have to wait for the 3rd party system to process it. We simply publish the message onto the bus and move on. The 3rd party system will process the message asynchronously when it is ready. Messages on a bus will typically queue up if the subscriber (the 3rd party system) is not ready to receive them. Once the subscriber is ready, it can process the queued messages. This prevents the subscriber from losing any data.
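To make the queueing behaviour concrete, here is a minimal sketch in Python. It is not a real service bus: `InMemoryBus` is a hypothetical stand-in that only models the idea that publishing returns immediately and that messages wait until the subscriber is ready.

```python
from collections import deque


class InMemoryBus:
    """Hypothetical stand-in for a service bus: messages queue until consumed."""

    def __init__(self):
        self._queue = deque()

    def publish(self, message):
        # Publishing only enqueues; it never waits on the subscriber.
        self._queue.append(message)

    def drain(self, handler):
        # Called by the subscriber whenever it is ready.
        # Returns the number of messages that were handled.
        handled = 0
        while self._queue:
            handler(self._queue.popleft())
            handled += 1
        return handled


bus = InMemoryBus()

# The publisher sends three messages while the subscriber is offline.
for item_id in ("item-1", "item-2", "item-3"):
    bus.publish({"type": "Updated", "item_id": item_id})

# Later the subscriber comes back and processes the backlog in order.
received = []
handled_count = bus.drain(received.append)
```

A real bus adds persistence, retries and delivery guarantees on top of this, which is where the durability comes from.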

There are many Service Bus options available. Here are a few:

Implementation

The out of the box configuration for the publish pipeline is as follows:

Through discovery, I found the Sitecore.Publishing.Pipelines.PublishItem.PerformAction pipeline step did the actual work of updating the web database based on the data from the master database. For the most part all the information we need to send to the 3rd party system is available to us after the PerformAction pipeline step runs. At this point, we know the Operation that was performed on the data. Prior to this step, we do not know what the Operation is. The Operation is used to tell the 3rd party system what to do with the data. The information that is missing at this point is the actual item related to deletions. Deleted items no longer exist after the PerformAction pipeline step runs. If we want any information about items that are deleted as part of publishing, we need to tap in before the PerformAction pipeline step runs. Because of this, we need to create 2 PublishItem pipeline steps to capture all the information we need, one before PerformAction and one after.

As I mentioned before, we need a step before the PerformAction step to gather the information about items that are about to be deleted. We first need to determine if the PublishItemContext.Action is PublishAction.DeleteTargetItem. If it is, the item is about to be deleted. We can then attempt to retrieve the item that is about to be deleted from the publishing target. If it is not there, we can try the publishing source.

Note: When attempting to retrieve the item, if the item is not found, the item has already been deleted. This can occur when more than one language is published. The first language published will delete the item.

Once we have the item we can add it to the PublishItemContext.CustomData for use further down the pipeline. The PublishItemContext.CustomData is shared between all the pipeline steps. Here is the complete code:
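The logic of this step can be sketched as follows. This is a Python model with stand-in types, not Sitecore API code: the `PublishItemContext` class here only mimics the properties named above, the two dictionaries stand in for the target and source databases, and `DELETED_ITEM_KEY` is a hypothetical key name.

```python
from dataclasses import dataclass, field
from enum import Enum


class PublishAction(Enum):
    # Only the values this sketch needs; the real enum has more.
    PUBLISH_VERSION = "PublishVersion"
    DELETE_TARGET_ITEM = "DeleteTargetItem"


@dataclass
class PublishItemContext:
    """Stand-in for the pipeline context; databases are modelled as dicts."""
    item_id: str
    action: PublishAction
    target: dict
    source: dict
    custom_data: dict = field(default_factory=dict)


DELETED_ITEM_KEY = "deleted_item"  # hypothetical CustomData key


def capture_item_before_delete(context):
    """Runs before PerformAction: remember the item that is about to go away."""
    if context.action is not PublishAction.DELETE_TARGET_ITEM:
        return
    # Try the publishing target first, then fall back to the source.
    item = context.target.get(context.item_id)
    if item is None:
        item = context.source.get(context.item_id)
    # If neither has it, an earlier language already deleted the item.
    if item is not None:
        context.custom_data[DELETED_ITEM_KEY] = item
```

When the item cannot be found in either place, nothing is stored, which matches the multi-language case described in the note above.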

Now that we have a way to get deleted items, we can move on to the pipeline step that occurs after the PerformAction step. We don't want to send unnecessary data to the 3rd party system. If the PublishOperation is Skipped or None, that means the target (typically the web database) and the 3rd party system have already been updated. We don't want to send this data again. If a republish is performed, we always want to process the delete actions, and we don't want to rely on the PublishItemContext.Result.Operation for this. Republish can be used as a way to true-up the data in the target database and the 3rd party system. This may be necessary if the data becomes out of sync.

Note on deletes: The PublishOperation of Deleted will only come through once, for one of the published languages. The other languages will have a result of None. As a result, we only publish one Delete message regardless of the number of published languages.
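The decision described above boils down to a small predicate. Here is a sketch in Python with stand-in enums; the function name `should_send` and the `is_republish` flag are assumptions made for illustration, and how a republish is detected depends on the publish options in the real context.

```python
from enum import Enum


class PublishAction(Enum):
    # Only the values this sketch needs.
    PUBLISH_VERSION = "PublishVersion"
    DELETE_TARGET_ITEM = "DeleteTargetItem"


class PublishOperation(Enum):
    NONE = "None"
    CREATED = "Created"
    UPDATED = "Updated"
    DELETED = "Deleted"
    SKIPPED = "Skipped"


def should_send(action, operation, is_republish):
    """Decide whether this publish result is worth a message on the bus."""
    # On a republish, delete actions are always processed,
    # regardless of what the result operation says.
    if is_republish and action is PublishAction.DELETE_TARGET_ITEM:
        return True
    # Skipped or None means the target is already up to date.
    return operation not in (PublishOperation.SKIPPED, PublishOperation.NONE)
```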

Next, we want to get the item related to the Action and Operation. If the item is due to a delete, we need to pull it out of the PublishItemContext.CustomData from the previous step. Otherwise we get the current item from PublishItemContext.VersionToPublish. To prevent extra noise, you may want to only process items that are of a certain template type. We really only want to process items that the 3rd party system cares about. We also need to ignore the __Standard Values items. We can detect these if the TemplateID is the same as the ParentID.
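As a rough sketch of that selection and filtering, again in Python with stand-in types: `Item` models only the three IDs the checks need, `DELETED_ITEM_KEY` is a hypothetical CustomData key, and `allowed_templates` is an assumed set of template IDs that the 3rd party system cares about.

```python
from dataclasses import dataclass
from typing import Optional

DELETED_ITEM_KEY = "deleted_item"  # hypothetical key set by the earlier step


@dataclass(frozen=True)
class Item:
    """Stand-in for a Sitecore item; only the fields this sketch needs."""
    item_id: str
    template_id: str
    parent_id: str


def is_standard_values(item: Item) -> bool:
    # A __Standard Values item is a direct child of its own template.
    return item.template_id == item.parent_id


def select_item(is_delete, custom_data, version_to_publish,
                allowed_templates) -> Optional[Item]:
    """Return the item to send, or None if it should be ignored."""
    # Deleted items come from CustomData; everything else from the
    # version being published.
    item = custom_data.get(DELETED_ITEM_KEY) if is_delete else version_to_publish
    if item is None or is_standard_values(item):
        return None
    if item.template_id not in allowed_templates:
        return None
    return item
```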

We need to take the information from the item and the operation and publish it onto the service bus for consumption by the 3rd party system. First we need to define the messages that we will be publishing on the bus. There are three primary messages we will be publishing: Created, Updated and Deleted. These map to the publish operations that were performed. We want to include the data from the item that was impacted as part of the messages. All of this must be serializable for sending over the message bus. Because of this, we will be using Plain Old CLR Objects, or POCOs, to send the data. We can use a tool such as Glass Mapper to map the data from a Sitecore item to the model objects or entities. To start, we need to create the messages that will contain the model objects / entities.
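The shape of such messages can be sketched with plain data classes. This Python sketch plays the role of the POCOs; `ArticleEntity` and its fields, the three message class names, and the JSON envelope produced by `to_json` are all hypothetical and only show the idea of one serializable message type per operation.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ArticleEntity:
    """Hypothetical entity mapped from an item; plain data only."""
    id: str
    title: str
    language: str


@dataclass
class EntityCreated:
    entity: ArticleEntity


@dataclass
class EntityUpdated:
    entity: ArticleEntity


@dataclass
class EntityDeleted:
    entity: ArticleEntity


def to_json(message) -> str:
    # The message type travels with the payload so that subscribers
    # can decide what to do with the entity.
    return json.dumps({"type": type(message).__name__, "body": asdict(message)})


payload = to_json(EntityCreated(ArticleEntity("1", "Hello", "en")))
```

Keeping the entities free of any reference to the CMS item is what makes them safe to serialize and send.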

Now we need to use the messages and mappings to send the data over the service bus. The subscribers/consumers of the messages can take action based on the message type and the Entity stored in the message.
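Here is a minimal sketch of both sides of that exchange, in Python with stand-in types. `RecordingBus`, `Message`, `publish_result` and `handle` are hypothetical names, and treating every delete action as a Deleted message is an assumption that follows the republish rule described earlier.

```python
from dataclasses import dataclass
from enum import Enum


class PublishOperation(Enum):
    NONE = "None"
    CREATED = "Created"
    UPDATED = "Updated"
    DELETED = "Deleted"
    SKIPPED = "Skipped"


@dataclass
class Message:
    kind: str     # "Created", "Updated" or "Deleted"
    entity: dict  # plain data mapped from the item


class RecordingBus:
    """Hypothetical bus that just remembers what was published."""

    def __init__(self):
        self.sent = []

    def publish(self, message):
        self.sent.append(message)


def publish_result(bus, operation, is_delete_action, entity):
    """Publisher side: turn a publish result into a message on the bus."""
    if is_delete_action or operation is PublishOperation.DELETED:
        kind = "Deleted"
    elif operation in (PublishOperation.CREATED, PublishOperation.UPDATED):
        kind = operation.value
    else:
        return None  # Skipped or None: nothing to send
    message = Message(kind, entity)
    bus.publish(message)
    return message


def handle(message, store):
    """Subscriber side: act on the message type and the entity it carries."""
    if message.kind == "Deleted":
        store.pop(message.entity["id"], None)
    else:
        store[message.entity["id"]] = message.entity
```

The subscriber here keeps a simple dictionary as its copy of the content; a real consumer would update whatever store the 3rd party system uses.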

For completeness' sake, I have included the full source for the EvaluateResult pipeline step below. The _mapper and _bus dependencies depend on the implementation. These would typically be injected as dependencies. The code is included as a single method for readability. It is best to break the code up into smaller methods.

Lastly, we need to register the pipeline steps via configuration.
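A registration along these lines is typically done with a Sitecore patch file placed under App_Config/Include. The fragment below is a sketch only: the namespace `MyCompany.Integration`, the assembly name `MyCompany.Integration`, and the step name `CaptureDeletedItem` are hypothetical placeholders, while EvaluateResult is the step discussed above.

```xml
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <pipelines>
      <publishItem>
        <!-- Hypothetical step that runs before PerformAction to capture items about to be deleted -->
        <processor
          patch:before="processor[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']"
          type="MyCompany.Integration.CaptureDeletedItem, MyCompany.Integration" />
        <!-- Step that runs after PerformAction to evaluate the result and publish messages -->
        <processor
          patch:after="processor[@type='Sitecore.Publishing.Pipelines.PublishItem.PerformAction, Sitecore.Kernel']"
          type="MyCompany.Integration.EvaluateResult, MyCompany.Integration" />
      </publishItem>
    </pipelines>
  </sitecore>
</configuration>
```

Using patch:before and patch:after keeps the two steps positioned relative to PerformAction without editing the stock configuration.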

Conclusion

I hope someone finds this to be helpful when designing a near real-time integration.
