Druid - Spark - Tranquility

I want to be able to use Druid to insert data into historical nodes and realtime nodes, and later query the combined data on broker nodes.

I would like to know the right way to do that. I'm receiving data through a broker, I'm parsing it, and then I want to:

(1) - Insert that data into realtime nodes through Scala. Is Tranquility the right way to do that?

(2) - Later, I periodically want to write all of the same data to the historical layer. How can I do that through Scala? I suppose Tranquility doesn't write data to the historical layer.

(3) - At any time, I want to be able to query the broker nodes. I suppose Tranquility doesn't do that either? What is the alternative? I want to do that through Scala.

Thanks!

see inline

(1) I will give Storm a try.

(2) I intend to override the data from the realtime nodes when I insert data for the same timestamps, as in a lambda architecture. So I'm computing my own batch views, and then I want to insert them into the historical layer to override the realtime view. Am I not on the right track?

(3) Right!

Thanks!

(2) For the lambda architecture:

(a) You would set up Storm + Tranquility -> realtime nodes, so that you continue to send events to Druid as they arrive. This will keep generating segments inside Druid.

(b) After realtime ingestion is over for a particular interval, you can take the raw data for the same interval and ingest it into Druid via batch ingestion (http://druid.io/docs/latest/Batch-ingestion.html), which will create segments for that same interval again. The new segments will automatically override the old segments that were created via realtime ingestion.
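To make (b) concrete, here is a rough sketch of submitting a Hadoop batch ingestion task to the overlord from Scala by POSTing the task spec to the overlord's /druid/indexer/v1/task endpoint. The host, datasource, schema, interval, and HDFS path below are placeholders, and you should check the exact spec fields against the batch ingestion docs for your Druid version:

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

object SubmitBatchIngestionTask {
  // Hadoop index task spec. Datasource, schema, interval, and input path are placeholders;
  // verify the field names against the batch ingestion docs for your Druid version.
  val taskSpec =
    """{
      |  "type": "index_hadoop",
      |  "spec": {
      |    "dataSchema": {
      |      "dataSource": "events",
      |      "parser": {
      |        "type": "string",
      |        "parseSpec": {
      |          "format": "json",
      |          "timestampSpec": { "column": "timestamp", "format": "auto" },
      |          "dimensionsSpec": { "dimensions": ["page", "user"] }
      |        }
      |      },
      |      "metricsSpec": [ { "type": "longSum", "name": "added", "fieldName": "added" } ],
      |      "granularitySpec": {
      |        "type": "uniform",
      |        "segmentGranularity": "HOUR",
      |        "queryGranularity": "NONE",
      |        "intervals": ["2015-06-01T00:00:00Z/2015-06-02T00:00:00Z"]
      |      }
      |    },
      |    "ioConfig": {
      |      "type": "hadoop",
      |      "inputSpec": { "type": "static", "paths": "hdfs://namenode/path/to/raw/2015-06-01" }
      |    },
      |    "tuningConfig": { "type": "hadoop" }
      |  }
      |}""".stripMargin

  def main(args: Array[String]): Unit = {
    // Default overlord port is 8090; adjust host and port for your cluster.
    val url = new URL("http://overlord.example.com:8090/druid/indexer/v1/task")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(taskSpec.getBytes(StandardCharsets.UTF_8))

    // The overlord responds with the task id, e.g. {"task":"index_hadoop_events_..."}.
    val response = Source.fromInputStream(conn.getInputStream, "UTF-8").mkString
    println(s"Submitted task: $response")
    conn.disconnect()
  }
}
```

Once the task finishes, the newly built segments for that interval replace the realtime-built ones, which is exactly the batch-overrides-realtime behavior described above.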

– Himanshu

Hi Rafael, going back to the original question in the topic, were you thinking of using Spark Streaming and sending events to Druid? Tranquility currently supports Storm and Samza, but you can extend it to also use Spark Streaming.
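You also don't strictly need Storm or Samza to use Tranquility from Scala; the core Beam API can be called from any JVM code (for example from a Spark Streaming foreachRDD). The sketch below is adapted from the Tranquility README of that era. The ZooKeeper address, datasource, dimensions, and metric are placeholders, and the exact builder methods and class names (DruidLocation, DruidRollup, ClusteredBeamTuning) vary between Tranquility versions, so treat this as illustrative rather than copy-paste code:

```scala
import com.metamx.common.Granularity
import com.metamx.tranquility.beam.ClusteredBeamTuning
import com.metamx.tranquility.druid.{DruidBeams, DruidLocation, DruidRollup, SpecificDruidDimensions}
import io.druid.granularity.QueryGranularity
import io.druid.query.aggregation.LongSumAggregatorFactory
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.retry.BoundedExponentialBackoffRetry
import org.joda.time.{DateTime, Period}

object TranquilityProducerSketch {
  def main(args: Array[String]): Unit = {
    // All of these values are placeholders for this sketch.
    val zkConnect = "zk.example.com:2181"
    val indexService = "overlord"              // overlord service name in ZK discovery
    val firehosePattern = "druid:firehose:%s"  // firehose service pattern (older Tranquility versions)
    val dataSource = "events"
    val dimensions = IndexedSeq("page", "user")
    val aggregators = Seq(new LongSumAggregatorFactory("added", "added"))

    // Curator client for ZooKeeper-based discovery of the indexing service.
    val curator = CuratorFrameworkFactory.newClient(
      zkConnect,
      new BoundedExponentialBackoffRetry(100, 3000, 5)
    )
    curator.start()

    // Build a service that pushes events into Druid realtime ingestion via the
    // indexing service. Builder method names may differ in your Tranquility version.
    val druidService = DruidBeams
      .builder((event: Map[String, Any]) => new DateTime(event("timestamp").toString))
      .curator(curator)
      .discoveryPath("/druid/discovery")
      .location(DruidLocation(indexService, firehosePattern, dataSource))
      .rollup(DruidRollup(SpecificDruidDimensions(dimensions), aggregators, QueryGranularity.MINUTE))
      .tuning(ClusteredBeamTuning(Granularity.HOUR, new Period("PT0M"), new Period("PT10M"), 1, 1))
      .buildService()

    // Send one batch of events; the returned Future resolves to the number of events accepted.
    val numSentFuture = druidService(Seq(Map(
      "timestamp" -> new DateTime().toString,
      "page" -> "some-page",
      "user" -> "some-user",
      "added" -> 1
    )))
  }
}
```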

Tranquility is only used for realtime ingestion, not for batch ingestion. Druid supports a Hadoop-based batch ingestion method as the recommended way of loading large static files.

Druid’s default query language is JSON over HTTP. There are several libraries (http://druid.io/docs/latest/Libraries.html) for using other languages to query Druid.
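If you'd rather not pull in a client library, here is a sketch of querying the broker directly from Scala by POSTing a JSON query to its /druid/v2/ endpoint. The host, port (8082 is a common broker default, but older setups often ran the broker on 8080), datasource, metric, and interval below are placeholders:

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

object QueryBrokerSketch {
  // A simple timeseries query; datasource, metric, and interval are placeholders.
  val query =
    """{
      |  "queryType": "timeseries",
      |  "dataSource": "events",
      |  "granularity": "hour",
      |  "aggregations": [ { "type": "longSum", "name": "added", "fieldName": "added" } ],
      |  "intervals": ["2015-06-01/2015-06-02"]
      |}""".stripMargin

  def main(args: Array[String]): Unit = {
    // The broker merges results from historical and realtime nodes into one answer.
    val url = new URL("http://broker.example.com:8082/druid/v2/?pretty")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(query.getBytes(StandardCharsets.UTF_8))

    // Results come back as a JSON array of {timestamp, result} rows.
    val response = Source.fromInputStream(conn.getInputStream, "UTF-8").mkString
    println(response)
    conn.disconnect()
  }
}
```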