I like blobs. Specifically, I like Sitecore and how it uses blobs for storing media library assets. What I don't like, however, is that blobs can quickly inflate the size of your content database - especially when you start dealing with blobs numbering in the tens of thousands (or more). Wouldn't it be nice if you could retain all the nice features of storing blobs in your database, but not store blobs in your database? Well my friend, you can!
Before we get into the good stuff, let's walk through a primer on Sitecore blob storage.
Open the default Sitecore Master database and you'll see a table named Blobs. This table contains all blob data used in the Sitecore media library. There are two columns in the Blobs table worth noting: BlobId and Data. The BlobId column contains a GUID that uniquely identifies a blob. The Data column contains the binary data for the blob.
When you upload a file to the Sitecore media library, a Sitecore media item is created. That media item contains a field named Media, which is used to represent the blob associated with the media item. Behind the scenes, the aforementioned BlobId value is stored as the value of the Media field in a media item, thereby providing a way to reference a blob from a media item without directly storing the blob within one of the media item fields. This is an important concept, as it provides separation between blob storage and content storage.
Because Sitecore separates blob storage from content storage and allows us to override the default SQL Server data provider, we have the opportunity to roll our own blob storage container - without impacting standard media library functionality. For the purpose of this post, I will be demonstrating the use of Azure blob storage, but in theory you could use any storage container in which you can read and write data and uniquely identify a blob (e.g. the file system, a separate SQL Server instance, a NoSQL store, Azure Table storage, etc.).
Any time I need to override or extend existing Sitecore functionality, the first place I visit is the web.config file to look for potential integration points, then on to .NET Reflector to determine what needs to be done. In this case, I want to get as close to the data as possible so I can be sure all touch points related to blob storage end up filtering through my code - which naturally leads me to the main Sitecore SQL Server data provider (Sitecore.Data.SqlServer.SqlServerDataProvider).
Using Reflector to take a look under the hood, I can see that Sitecore.Data.SqlServer.SqlServerDataProvider contains 3 methods related to blob handling that override base class methods: BlobStreamExists, GetBlobStream, SetBlobStream.
Walking up the inheritance chain I see that the base class, Sitecore.Data.DataProviders.Sql.SqlDataProvider contains 1 virtual method related to blob handling (CleanupBlobs) and 1 overridden method related to blob handling (RemoveBlobStream).
Walking one step further up the inheritance chain, to the Sitecore.Data.DataProviders.DataProvider class, I don't see any other methods related to blob handling that need to be overridden. Therefore, I now have a list of methods to override in a custom data provider class - 5 methods, not too bad!
First, I created a new class that extends the Sitecore.Data.SqlServer.SqlServerDataProvider class.
public class AzureBlobStorageProvider : Sitecore.Data.SqlServer.SqlServerDataProvider
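To make the list of overrides concrete, here is a minimal skeleton of what such a class might look like. Treat it as a sketch under assumptions rather than the exact code: the constructor is assumed to mirror the connection-string constructor of SqlServerDataProvider, and CleanupBlobs is assumed to be protected, so check both in Reflector against your Sitecore version. Every override just delegates to the base class for now.

```csharp
using System;
using System.IO;
using Sitecore.Data.DataProviders;
using Sitecore.Data.SqlServer;

// Skeleton only: each override currently falls through to the default
// SQL Server behavior and is a placeholder for the Azure-specific logic.
public class AzureBlobStorageProvider : SqlServerDataProvider
{
    // Assumption: Sitecore hands the database connection string to the provider.
    public AzureBlobStorageProvider(string connectionString)
        : base(connectionString)
    {
    }

    public override bool BlobStreamExists(Guid blobId, CallContext context)
    {
        return base.BlobStreamExists(blobId, context);
    }

    public override Stream GetBlobStream(Guid blobId, CallContext context)
    {
        return base.GetBlobStream(blobId, context);
    }

    public override bool SetBlobStream(Stream stream, Guid blobId, CallContext context)
    {
        return base.SetBlobStream(stream, blobId, context);
    }

    public override bool RemoveBlobStream(Guid blobId, CallContext context)
    {
        return base.RemoveBlobStream(blobId, context);
    }

    // Assumption: declared as protected virtual on SqlDataProvider.
    protected override void CleanupBlobs(CallContext context)
    {
        base.CleanupBlobs(context);
    }
}
```

Starting from pass-through overrides like these makes it easy to set breakpoints and confirm that every blob operation really does flow through the custom provider before any behavior is changed.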
Next, I added some properties to provide convenient (and efficient) access to the Azure Storage account and blob container.
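private CloudStorageAccount _storageAccount;
private CloudBlobClient _blobClient;
private CloudBlobContainer _blobContainer;
private CloudStorageAccount StorageAccount { get { return _storageAccount ?? (_storageAccount = CloudStorageAccount.Parse(Configuration.Settings.Media.AzureBlobStorage.StorageConnectionString)); } }
private CloudBlobClient BlobClient { get { return _blobClient ?? (_blobClient = StorageAccount.CreateCloudBlobClient()); } }
private CloudBlobContainer BlobContainer { get { return _blobContainer ?? (_blobContainer = BlobClient.GetContainerReference(Configuration.Settings.Media.AzureBlobStorage.StorageContainerName)); } }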
I also created a convenience class for retrieving settings that are specific to the Azure blob storage provider.
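A settings class along these lines would support the Configuration.Settings.Media.AzureBlobStorage references used by the properties. This is only a sketch: the two setting names are hypothetical and would need matching entries in your Sitecore configuration, and the nesting simply mirrors the dotted path used in this post.

```csharp
namespace Configuration
{
    public static class Settings
    {
        public static class Media
        {
            public static class AzureBlobStorage
            {
                // Hypothetical setting name; define it in your Sitecore config.
                public static string StorageConnectionString
                {
                    get
                    {
                        return global::Sitecore.Configuration.Settings.GetSetting(
                            "Media.AzureBlobStorage.StorageConnectionString");
                    }
                }

                // Hypothetical setting name; define it in your Sitecore config.
                public static string StorageContainerName
                {
                    get
                    {
                        return global::Sitecore.Configuration.Settings.GetSetting(
                            "Media.AzureBlobStorage.StorageContainerName");
                    }
                }
            }
        }
    }
}
```

Reading the values through Sitecore's own settings API keeps the storage account details out of code, so they can differ per environment.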
I also created an extension method class for extending the Microsoft.WindowsAzure.StorageClient.CloudBlob object. Currently it implements only one extension method, which determines whether or not a blob object exists in the Azure storage account container. Note: there is an unpleasant smell to the code below due to exception-based logic, but it's functional. Extracted from this blog post - http://blog.smarx.com/posts/testing-existence-of-a-windows-azure-blob
if (e.ErrorCode == StorageErrorCode.ResourceNotFound)
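For context, the full extension method probably looks something like the sketch below, which follows the approach in the linked post. The class name CloudBlobExtensions is my own placeholder. The idea is to ask Azure for the blob's attributes and treat a "resource not found" error as "does not exist", rethrowing anything else.

```csharp
using Microsoft.WindowsAzure.StorageClient;

// Placeholder class name; only the Exists method matters here.
public static class CloudBlobExtensions
{
    public static bool Exists(this CloudBlob blob)
    {
        try
        {
            // Succeeds only if the blob is present in the container.
            blob.FetchAttributes();
            return true;
        }
        catch (StorageClientException e)
        {
            if (e.ErrorCode == StorageErrorCode.ResourceNotFound)
            {
                return false;
            }

            // Any other storage error is a real failure, so let it surface.
            throw;
        }
    }
}
```

With this in scope, a call such as BlobContainer.GetBlobReference(blobId.ToString()).Exists() reads naturally, at the cost of one round trip to Azure per check.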
And now onto the data provider methods...
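I won't go into the code for the CleanupBlobs override in detail, as it largely reuses the code from the Sitecore.Data.DataProviders.Sql.SqlDataProvider.CleanupBlobs method. However, the general algorithm is as follows: use SQL to find the blobIds in the Blobs table that no Sitecore item field references, then remove each of those orphaned blobs.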
SetBlobStream was actually a fairly easy method to implement until I started working on the CleanupBlobs operation. From an Azure standpoint, it's pretty simple: we get a blob reference for the blobId passed in to the SetBlobStream method, then upload the blob stream argument to Azure using that reference.
From a Sitecore standpoint, however, we also want to create an "empty" reference to the blob in the Blobs table, just without the blob data. During the CleanupBlobs operation, all Sitecore item field values are examined for references to blobIds stored in the Blobs table. If a blobId is orphaned (i.e. not in use by any Sitecore item fields), then the related blob should be deleted. If we didn't use SQL to generate a list of unused blobs, the alternative would be to retrieve the entire list of blobs from the Azure blob storage container, then iterate through that list to determine which blobs aren't in use within Sitecore and should be "cleaned up" (i.e. removed). That would be an expensive operation, especially as the number of blob items in your Azure storage container increases.
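public override bool SetBlobStream(Stream stream, Guid blobId, CallContext context)
{
    var blob = BlobContainer.GetBlobReference(blobId.ToString());
    blob.UploadFromStream(stream);
    //insert an empty reference to the BlobId into the SQL Blobs table to assist with the cleanup process: during cleanup, it's faster to query the database for the blobs that should be removed than to retrieve and parse a list from Azure.
    const string cmdText = "INSERT INTO [Blobs]( [Id], [BlobId], [Index], [Created], [Data] ) VALUES( NewId(), @blobId, @index, @created, @data)";
    using (var connection = new SqlConnection(Api.ConnectionString)) // assumes the base provider exposes Api.ConnectionString and CommandTimeout
    {
        connection.Open();
        var command = new SqlCommand(cmdText, connection) { CommandTimeout = (int)CommandTimeout.TotalSeconds };
        command.Parameters.AddWithValue("@blobId", blobId);
        command.Parameters.AddWithValue("@index", 0);
        command.Parameters.AddWithValue("@created", DateTime.UtcNow);
        command.Parameters.Add("@data", SqlDbType.Image, 0).Value = new byte[0]; // empty placeholder, no blob data
        command.ExecuteNonQuery();
    }
    return true;
}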
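The reason for keeping those empty rows is easier to see with a toy model of the cleanup decision. The sketch below is not Sitecore's implementation (which does this work in SQL); it only illustrates the set logic, and the class and method names are made up for the example. Given the blobIds recorded in the Blobs table and the blobIds referenced by item fields, the orphans are simply the difference between the two.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative only: models the cleanup decision with in-memory collections.
public static class BlobCleanupSketch
{
    // Returns the blobIds present in the Blobs table that no item field references.
    public static IList<Guid> FindOrphanedBlobIds(
        IEnumerable<Guid> blobIdsInBlobsTable,
        IEnumerable<Guid> blobIdsReferencedByFields)
    {
        return blobIdsInBlobsTable.Except(blobIdsReferencedByFields).ToList();
    }
}
```

Each orphaned blobId would then be handed to RemoveBlobStream. Because both inputs come from the database, Azure is only contacted for the blobs that actually need deleting.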
In BlobStreamExists, retrieve a reference to the blobId in question, then use the extension method mentioned earlier to return whether or not the blob exists.
public override bool BlobStreamExists(Guid blobId, CallContext context)
In GetBlobStream, retrieve a reference to the blobId in question. If the referenced blob doesn't exist in Azure storage, then return null. If it does exist, download the blob to a System.IO.MemoryStream object and return that stream.
public override Stream GetBlobStream(Guid blobId, CallContext context)
var memStream = new MemoryStream();
In RemoveBlobStream, first retrieve a reference to the blobId in question. Then use the Microsoft.WindowsAzure.StorageClient.CloudBlob.DeleteIfExists method to delete the blob if it exists in the Azure storage container. Lastly, call the base class Sitecore.Data.DataProviders.Sql.SqlDataProvider.RemoveBlobStream method. This ensures that any record in the Blobs table that references the blobId in question is also deleted.
public override bool RemoveBlobStream(Guid blobId, CallContext context)
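Taken together, the three methods described above might be sketched as follows. This is an approximation built from the descriptions, not the exact code. The abstract BlobContainer member is a stand-in for the container property described earlier, the class name is made up, and Exists() is the CloudBlob extension method described earlier in the post (it is not part of the StorageClient library, so that helper must be in scope for this to compile). Rewinding the stream before returning it is my assumption, on the basis that callers will read from the start.

```csharp
using System;
using System.IO;
using Microsoft.WindowsAzure.StorageClient;
using Sitecore.Data.DataProviders;
using Sitecore.Data.SqlServer;

// Hypothetical class name; abstract only so the sketch can leave the
// container lookup to the property described earlier.
public abstract class AzureBlobReadRemoveSketch : SqlServerDataProvider
{
    protected AzureBlobReadRemoveSketch(string connectionString)
        : base(connectionString)
    {
    }

    // Stand-in for the lazily created Azure container property.
    protected abstract CloudBlobContainer BlobContainer { get; }

    public override bool BlobStreamExists(Guid blobId, CallContext context)
    {
        return BlobContainer.GetBlobReference(blobId.ToString()).Exists();
    }

    public override Stream GetBlobStream(Guid blobId, CallContext context)
    {
        var blob = BlobContainer.GetBlobReference(blobId.ToString());
        if (!blob.Exists())
        {
            return null;
        }

        var memStream = new MemoryStream();
        blob.DownloadToStream(memStream);
        // Assumption: rewind so the caller reads from the first byte.
        memStream.Seek(0, SeekOrigin.Begin);
        return memStream;
    }

    public override bool RemoveBlobStream(Guid blobId, CallContext context)
    {
        // Delete from Azure first, then let the base class remove the
        // placeholder row from the Blobs table.
        BlobContainer.GetBlobReference(blobId.ToString()).DeleteIfExists();
        return base.RemoveBlobStream(blobId, context);
    }
}
```

One trade-off worth noting: buffering the whole blob in a MemoryStream is simple, but it holds the full file in memory for each request, which may matter for very large media items.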
I'll echo the words of developers everywhere - "It works in my environment". As such, your experience may vary and you would be wise to exercise caution if you choose to implement some version of the provider demonstrated in this article - especially if you're considering it for production use.
A few other considerations to keep in mind if you choose to use Azure/"the cloud" as a storage provider:
Hi Adam, It is a nice solution. However, when I tried deleting a media item and then permanently deleting it from the recycle bin, I didn't see the debugger hit the RemoveBlobStream override method. Since you say that doing the above will remove the media from the storage container, I would expect the debugger to hit this method. Any ideas?
Hi Adrian, As noted in the last section of the article: "When the recycle bin is enabled, blobs are only removed from blob storage when their referencing item is permanently removed from the recycle bin AND the database cleanup operation is performed." In other words, when you "permanently" delete media items from the recycle bin, only the item and blob reference are deleted. The actual blob will still remain in storage (either database or your custom storage) even after a permanent delete from the recycle bin. In order to remove the blob (and subsequently execute the RemoveBlobStream method), you need to run the database cleanup operation (via the Sitecore control panel). This is standard Sitecore behavior when the recycle bin is enabled and not specific to the provider example. If you want to make things more seamless and delete blobs when you permanently delete a media item in the recycle bin, then you'd likely need to explore extending the Sitecore.Data.Archiving.SqlArchive class - specifically the various "RemoveEntries" methods. The challenge will be in determining whether or not an item being permanently removed from the recycle bin contains any blob fields and then obtaining a reference to the blob to be deleted. Cheers, adam
Very useful. I'm looking to use something different to Azure, but still very valuable information. Thanks.
Hi Adam, I want to figure out the total size of all the media items stored as blobs in my Sitecore master DB. How can I generate this report?