How long does it take for a new datasource to be created from realtime events?

Hi guys,

We recently decided to try Druid for our data pipeline, so I am running it locally to evaluate it. I set up a realtime analytics scenario with Tranquility and Caravel. The pipeline looks like this:

Events ----> Tranquility ----> Druid ----> Caravel/Metabase

I find that Druid is slow to index the JSON data locally, which I understand is normal in a local environment. One problem is that Caravel tries to get the Druid datasource by requesting the URI /druid/coordinator/v1/metadata/datasources, and it can’t get the datasource until the indexing task has finished.

I am a little confused by this. Is the new datasource added only after indexing has finished? Or is there a config setting for a time range that controls when the datasource is created while indexing realtime events?

I also tried Metabase, which I found gets the Druid datasource information faster. It would be interesting to know why this is different.

So my actual question is: given that an event has been posted to Druid, when does Druid create the datasource and let it show up through /druid/coordinator/v1/metadata/datasources?

Thanks a lot.



Hi Bowen,

I have also noticed this behavior. I believe that the datasource metadata isn’t available until an indexing job has handed off data to the historical node (I haven’t checked the code, this is just from my observation). It is annoying, but in our case it doesn’t happen that frequently. What I usually do when adding a new datasource is create an indexing task with a short duration so that the datasource gets created more quickly, and then increase the duration to whatever I wanted it to be. Alternatively, you could run a batch ingestion job, which will create the datasource when it finishes.
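
To illustrate why a short task duration helps, here is a rough Python sketch of when a realtime task would stop accepting data and begin handoff. It assumes the usual Tranquility behavior that a task closes at the end of its segment interval plus the window period; the function name and the example values are hypothetical, and the actual handoff takes some additional time on top of this.

```python
from datetime import datetime, timedelta


def earliest_handoff_start(event_time, segment_granularity, window_period):
    """Estimate when the task covering event_time starts handing off.

    Assumption: segment buckets are aligned to the Unix epoch, which holds
    for fixed-size granularities such as minute or hour.
    """
    epoch = datetime(1970, 1, 1)
    # Index of the segment bucket that contains the event.
    bucket_index = (event_time - epoch) // segment_granularity
    # End of that bucket; the task keeps accepting late events for
    # window_period after this point.
    bucket_end = epoch + (bucket_index + 1) * segment_granularity
    return bucket_end + window_period


if __name__ == "__main__":
    first_event = datetime(2016, 6, 1, 12, 5)
    # Hourly segments with a 10 minute window.
    print(earliest_handoff_start(first_event, timedelta(hours=1), timedelta(minutes=10)))
    # Ten minute segments with a 1 minute window.
    print(earliest_handoff_start(first_event, timedelta(minutes=10), timedelta(minutes=1)))
```

Under these assumptions, an event at 12:05 with hourly segments would not start handoff before 13:10, while ten minute segments with a one minute window would start around 12:11, so the datasource should become visible much sooner.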


It shows up at that endpoint after handoff completes. FWIW, with Pivot, the segment metadata query is used to introspect data, so a new dimension/datasource/etc. shows up almost right away.
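
For anyone who wants to check both paths from a script, here is a minimal Python sketch. It polls the coordinator endpoint mentioned above until the datasource appears, and builds a segmentMetadata query that can be posted to the broker. The host and port values are assumptions for a default local setup, and the helper names (`wait_for_datasource`, `segment_metadata_query`, `post_query`) are made up for this example.

```python
import json
import time
import urllib.request

COORDINATOR = "http://localhost:8081"  # assumed local coordinator address
BROKER = "http://localhost:8082"       # assumed local broker address


def http_get_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_datasource(name, fetch=http_get_json, attempts=60, delay=10.0):
    """Poll the coordinator until `name` is listed, or give up.

    Returns True once the datasource shows up, False after `attempts` tries.
    """
    url = COORDINATOR + "/druid/coordinator/v1/metadata/datasources"
    for _ in range(attempts):
        # The endpoint returns a JSON array of datasource names.
        if name in fetch(url):
            return True
        time.sleep(delay)
    return False


def segment_metadata_query(name, interval):
    """Build a segmentMetadata query for one datasource and ISO interval."""
    return {
        "queryType": "segmentMetadata",
        "dataSource": name,
        "intervals": [interval],
    }


def post_query(query):
    """POST a query to the broker and decode the JSON response."""
    req = urllib.request.Request(
        BROKER + "/druid/v2/",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be something like `wait_for_datasource("events")` to wait for handoff, or `post_query(segment_metadata_query("events", "2016-06-01/2016-06-02"))` to introspect the data while it is still being indexed, where "events" and the interval are placeholder values.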