My Java application needs to send historical data to a remote Druid cluster for indexing. I need a solution that does not depend on Hadoop.
- Can I use Tranquility for this? The Tranquilizer API looks very appealing, but some older posts suggest that Tranquility does not (yet?) support batch ingestion: it does real-time ingestion and therefore drops events that fall outside its window. Is this information still correct?
- If Tranquility doesn’t work, the next best option according to the documentation seems to be the Index Task API. Since my application and Druid do not share a file system, I am thinking of using the EventReceiverFirehose endpoint (/druid/worker/v1/chat//push-events/). Some questions:
  - Can I make multiple calls to /druid/worker/v1/chat//push-events/? (Judging from DruidBeam.scala, it seems so.)
  - Does a call to /druid/worker/v1/chat//push-events/ block until the events are processed?
  - If such a call doesn’t block, how can I apply back pressure (i.e. make sure I don’t overwhelm Druid with requests)?
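To make the windowing concern concrete, here is a deliberately simplified Java model of the rule described above: an event is kept only if its timestamp lies within `windowPeriod` of the current time. This is an assumption for illustration, not Tranquility's actual code; the real behavior may also depend on other settings such as segment granularity. `WindowCheck` and `withinWindow` are hypothetical names, and the 10-minute window is just an example value.

```java
import java.time.Duration;
import java.time.Instant;

// Simplified model (assumption): accept an event only if its timestamp is within
// windowPeriod of "now" in either direction. Segment granularity is ignored here.
class WindowCheck {
    static boolean withinWindow(Instant eventTime, Instant now, Duration windowPeriod) {
        return !eventTime.isBefore(now.minus(windowPeriod))
                && !eventTime.isAfter(now.plus(windowPeriod));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2016-05-01T12:00:00Z");
        Duration window = Duration.ofMinutes(10); // example window, not a recommended value
        System.out.println(withinWindow(Instant.parse("2016-05-01T11:55:00Z"), now, window)); // true
        System.out.println(withinWindow(Instant.parse("2016-04-01T00:00:00Z"), now, window)); // false: historical event
    }
}
```

Under this model, any backfill of historical data falls outside the window by construction, which is why a real-time path would drop it.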
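One way to get back pressure without relying on whether the endpoint blocks is to cap the number of unanswered requests on the client side. The sketch below does that with a `Semaphore`: events are grouped into JSON-array batches, and the caller blocks whenever `maxInFlight` requests are still waiting for a response. It assumes the push-events endpoint accepts a POST whose body is a JSON array of event objects, and it takes the complete endpoint URL from the caller rather than building the path. `BoundedPusher`, `httpSender`, `toBatches` and `pushAll` are hypothetical names, not part of Druid or Tranquility; the HTTP part uses `java.net.http` (Java 11+).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical helper: pushes pre-serialized JSON events in batches while
// capping the number of requests that have not been answered yet.
class BoundedPusher {
    private final Function<String, CompletableFuture<Integer>> sender; // batch body -> HTTP status
    private final Semaphore inFlight;
    private final int batchSize;

    BoundedPusher(Function<String, CompletableFuture<Integer>> sender, int maxInFlight, int batchSize) {
        this.sender = sender;
        this.inFlight = new Semaphore(maxInFlight);
        this.batchSize = batchSize;
    }

    // Real sender. Assumption: the endpoint accepts a JSON array as the POST body.
    // pushEventsUrl is the full push-events URL, supplied by the caller.
    static Function<String, CompletableFuture<Integer>> httpSender(HttpClient client, String pushEventsUrl) {
        return body -> client.sendAsync(
                        HttpRequest.newBuilder(URI.create(pushEventsUrl))
                                .header("Content-Type", "application/json")
                                .POST(HttpRequest.BodyPublishers.ofString(body))
                                .build(),
                        HttpResponse.BodyHandlers.discarding())
                .thenApply(HttpResponse::statusCode);
    }

    // Groups single-event JSON objects into JSON-array bodies of at most batchSize events.
    static List<String> toBatches(List<String> jsonEvents, int batchSize) {
        List<String> batches = new ArrayList<>();
        for (int i = 0; i < jsonEvents.size(); i += batchSize) {
            List<String> slice = jsonEvents.subList(i, Math.min(i + batchSize, jsonEvents.size()));
            batches.add("[" + String.join(",", slice) + "]");
        }
        return batches;
    }

    // Sends every batch and returns how many failed (non-2xx status or exception).
    // acquire() blocks while maxInFlight requests are unanswered: client-side back pressure.
    int pushAll(List<String> jsonEvents) throws InterruptedException {
        List<CompletableFuture<Integer>> pending = new ArrayList<>();
        for (String batch : toBatches(jsonEvents, batchSize)) {
            inFlight.acquire();
            CompletableFuture<Integer> response;
            try {
                response = sender.apply(batch);
            } catch (RuntimeException e) {
                inFlight.release(); // don't leak the permit if the send itself throws
                throw e;
            }
            // Free the slot once the server has answered or the request has failed.
            pending.add(response.whenComplete((status, error) -> inFlight.release()));
        }
        int failures = 0;
        for (CompletableFuture<Integer> f : pending) {
            try {
                int status = f.join();
                if (status < 200 || status >= 300) failures++;
            } catch (CompletionException e) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        // Offline demo: a fake sender that "accepts" every batch after a short delay.
        Function<String, CompletableFuture<Integer>> fake = body -> CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(10); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return 200;
        });
        List<String> events = new ArrayList<>();
        for (int i = 0; i < 25; i++) events.add("{\"timestamp\":\"2015-01-01T00:00:00Z\",\"n\":" + i + "}");
        int failures = new BoundedPusher(fake, 2, 10).pushAll(events);
        System.out.println("failed batches: " + failures);
    }
}
```

Against a real cluster you would pass `BoundedPusher.httpSender(HttpClient.newHttpClient(), url)` as the sender. Because the cap counts unanswered requests, it limits load regardless of when the endpoint responds; failed batches are only counted here, so a retry policy would still be needed.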
Thanks a lot for any feedback!