How to import XML documents into Sitecore?

A while ago I came across a rather interesting problem. What if a client requires hundreds or even thousands of XML documents to be imported into Sitecore? How can we achieve this?

I researched the topic on various Sitecore MVP blogs but didn't find a convincing solution (or perhaps I didn't look hard enough). After more digging through the documentation on the Sitecore Developer Network (SDN), I was able to find an answer to this interesting question.

Import XML Files into Sitecore

Here it is; I would like to share it with the Sitecore community.

  • Create a File Crawler based on Sitecore.Search. The Sitecore.Search API acts as a wrapper for Lucene.Net. It provides flexible integration of Sitecore with Lucene and a set of .NET-friendly wrappers around the Lucene classes related to search.
  • Creating a File Crawler is explained clearly and in detail in the Sitecore Search and Indexing document on SDN (see pages 39 to 44).
  • “Sitecore.Search Crawler is a component in Sitecore that scans a specific storage system such as a database or file system, extracting information and storing it in a search index, making it available to Sitecore Search. It performs several roles – Indexer, Crawler and Monitor”. – Sitecore Search and Indexing document
  • Note: The above solution is only applicable to Sitecore CMS 6.6 and above.
    • If possible, run indexing at regular intervals rather than continuously in the background during the import; this speeds up the import even further. For example, set <setting name="Indexing.UpdateInterval" value="00:50:00" /> to raise the interval to 50 minutes, and use that window to import the files.
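Rather than editing web.config directly, the interval change can be kept in a separate include file. The fragment below is a minimal sketch, assuming the standard Sitecore config patching mechanism (a file placed under App_Config/Include); the file name, for example a hypothetical z.ImportIndexing.config, is an assumption, and the file should be removed once the import is finished so the default interval comes back.

```xml
<!-- Hypothetical include file, e.g. App_Config/Include/z.ImportIndexing.config -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <settings>
      <!-- Overrides only the value attribute of the existing setting -->
      <setting name="Indexing.UpdateInterval">
        <patch:attribute name="value">00:50:00</patch:attribute>
      </setting>
    </settings>
  </sitecore>
</configuration>
```

Keeping the override in its own file makes it easy to switch on for the import window and delete afterwards without touching the main configuration.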

Another option is to import the files into the web database rather than the master database. The web database is a lot faster, mainly because by default it is not connected to the History Engine, whereas the master database is.

However, as Kern Herskind Nightingale pointed out to me, items imported directly into the web database could be wiped out by a publish. It is also wise to turn off indexing while the data is being uploaded.

I am aware that this solution applies to Sitecore 6.6 and above, and that newer versions such as 7.0, 7.1, 7.2 and 7.5 are already out, with Sitecore 8 coming soon, but there are still clients out there using Sitecore 6.6.

Happy Sitecoring!