Has anyone tried running Druid with Azure Data Lake or Azure HDInsight? We are trying to build out a Druid cluster on Azure, and my searches turned up nothing on the subject. The only thing I found was azure-extensions, which makes use of Azure Blob Storage.
Any pointers are appreciated.
We (Bannerflow) have been running Druid on Azure for about half a year. I realize this reply is a bit late. I only saw the question today, but I think the topic is still relevant.
Our setup looks like this:
- Data comes in to Event Hub (EH).
- A Spark Streaming job reads from EH and pushes the data to Druid for near-realtime ingestion.
- A Spark job does hourly batch processing of the data from EH, writes it to Azure Blob Storage, and finally triggers a Druid Hadoop indexing job to ingest it from Blob Storage.
- Druid itself runs on custom VMs. (I noticed a couple of days ago that the latest HDInsight version provides a preview of Druid, so we may use that in the future.)
- We use an HDInsight cluster for Spark and Hadoop.
- We use Azure Blob Storage for storage. (Data Lake had only recently become available in our Druid region when we started, but it would be interesting to try now, since Blob Storage has some problems.)
To get it working we had to make some small patches to the Azure extension and to the Hadoop indexing; otherwise it wouldn't work with the wasb protocol used by Azure Blob Storage. Our Druid version is 0.9.1.1, and because of our custom changes we haven't had time to try upgrading to the latest version yet. I hope later versions will work out of the box without our patches.
In general, Druid itself has been very stable and has run without any problems. However, Spark Streaming from Event Hub has required a lot of work to run smoothly.
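For readers who want to picture the hourly batch step described above, here is a minimal sketch in Python of how a Hadoop indexing task pointing at a wasb path might be assembled. It is an illustration only, not the code from the setup described in this thread: the helper names `wasb_uri` and `hadoop_index_task`, the datasource, container, account, path, interval, and the `timestamp` column are all hypothetical placeholders. As noted above, on Druid 0.9.1.1 a task like this did not work against wasb without patches.

```python
import json


def wasb_uri(container, account, path):
    """Build a wasb:// URI in the container@account form used by Azure Blob Storage."""
    return "wasb://{}@{}.blob.core.windows.net/{}".format(
        container, account, path.lstrip("/"))


def hadoop_index_task(datasource, input_path, interval):
    """Return a minimal Druid index_hadoop task spec as a plain dict.

    The schema below (JSON rows, a 'timestamp' column, a single count
    metric, hourly segments) is an assumption made for illustration.
    """
    return {
        "type": "index_hadoop",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "parser": {
                    "type": "hadoopyString",
                    "parseSpec": {
                        "format": "json",
                        "timestampSpec": {"column": "timestamp", "format": "auto"},
                        "dimensionsSpec": {"dimensions": []},
                    },
                },
                "metricsSpec": [{"type": "count", "name": "count"}],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "hadoop",
                # Static input: the files the batch job wrote to Blob Storage.
                "inputSpec": {"type": "static", "paths": input_path},
            },
            "tuningConfig": {"type": "hadoop"},
        },
    }


if __name__ == "__main__":
    # All values here are placeholders, not real resources.
    path = wasb_uri("events", "myaccount", "hourly/2017-01-01T00")
    task = hadoop_index_task(
        "events", path, "2017-01-01T00:00:00Z/2017-01-01T01:00:00Z")
    print(json.dumps(task, indent=2))
```

The resulting JSON would be submitted to the Druid overlord's task endpoint (`/druid/indexer/v1/task`) with an HTTP POST. The Hadoop cluster also needs the storage account credentials configured so that it can resolve the wasb path.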
Can you explain the patches you made to the Azure extension and the Hadoop indexing?
We are working on something similar and are facing issues with the wasb protocol.
The fork with our changes is available here: https://github.com/nordicfactory/druid in the branch wasb-integration
Note that, as I wrote, it's based on Druid 0.9.1.1, and we haven't had time to look into what's needed for newer versions of Druid.