Ingest: when is my data queryable?

Hi,

We were using Druid 0.10, and recently upgraded to 0.11, but still see the same issue. We have a use case in which we (a) ingest data via HDFS indexing, then (b) wait until the data is available to be queried, then (c) ping a separate system, which then queries the data to do something with it. The problem is in step (b). At first we tried this:

  1. Poll the Overlord status endpoint /druid/indexer/v1/task/{taskId}/status until it returns SUCCESS or FAILED. This lets us know when the indexing job is complete.
  2. Poll the Coordinator loadstatus endpoint /druid/coordinator/v1/loadstatus until it returns 100% for our datasource.
But then we discovered that for a pre-existing datasource there is a period of time after the indexing job completes in which loadstatus returns 100%, before it starts to return something less than 100%. So we added another step:

1.5. Poll the Coordinator loadstatus endpoint every second until it returns non-100%.

This polling has to be every second because if the datasource is small it takes very little time to distribute through the cluster. If we poll every 10 seconds, we might miss the non-100% window entirely, and then we have no way of knowing whether the new data is available or not.
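For reference, the workaround described above can be sketched roughly as follows. The hosts, ports, datasource name, and task id are placeholders, and this is only a sketch of the kludge being discussed, not a supported or recommended API usage:

```python
import json
import time
from urllib.request import urlopen

# Placeholder hosts and names; substitute your own cluster's values.
OVERLORD = "http://overlord:8090"
COORDINATOR = "http://coordinator:8081"
DATASOURCE = "my_datasource"


def get_json(url):
    """Fetch a URL and parse the JSON response body."""
    with urlopen(url) as resp:
        return json.load(resp)


def is_terminal(state):
    """True once the Overlord reports the task as finished."""
    return state in ("SUCCESS", "FAILED")


def wait_for_task(task_id, poll=1.0):
    """Step 1: poll the Overlord until the indexing task completes."""
    while True:
        status = get_json(f"{OVERLORD}/druid/indexer/v1/task/{task_id}/status")
        state = status["status"]["status"]  # e.g. RUNNING, SUCCESS, FAILED
        if is_terminal(state):
            return state
        time.sleep(poll)


def wait_for_load(poll=1.0):
    """Steps 1.5 and 2: wait for loadstatus to dip below 100% (the new
    segments becoming visible to the Coordinator), then wait for it to
    climb back to 100%. The 1-second poll matters: a small datasource
    loads so quickly that a coarser poll can miss the non-100% window."""
    url = f"{COORDINATOR}/druid/coordinator/v1/loadstatus"
    # Step 1.5: wait for the stale 100% reading to drop.
    while get_json(url).get(DATASOURCE, 0) == 100.0:
        time.sleep(poll)
    # Step 2: wait for the new segments to finish loading.
    while get_json(url).get(DATASOURCE, 0) < 100.0:
        time.sleep(poll)
```

As the thread notes, step 1.5 is inherently racy: if the cluster distributes the new segments within one poll interval, the non-100% reading may never be observed at all, which is exactly the problem being raised here.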

This seems really kludgy - please tell me there’s a better way!

Thanks,

Dan

> 2. Poll the Coordinator loadstatus endpoint /druid/coordinator/v1/loadstatus until it returns 100% for our datasource.

Hm, this is what we do for our Imply quickstart “post-index-task” script. You bring up a good point, though:

> But then we discovered that for a pre-existing datasource there is a period of time after the indexing job completes in which loadstatus returns 100%, before it starts to return something less than 100%.

Can you please file an issue about this? If there isn’t a better way to determine when the data ingested by a task is fully available for querying than the scripting approach you described (I don’t know of one at the moment), this seems like it would be a useful feature to have.

Thanks,

Jon

I created issue #5721 for this. Thanks Jon.

Dan