Druid datasource is not fully available after indexing task is completed successfully


I’m using Hadoop batch indexing mechanism to index data into Druid. The index task takes about 10 mins to index 5GB of data and the task status says “SUCCESS”. I can even see the segments on HDFS deep storage.

However, the data source on the Overlord page has been stuck in a “<99% available” state ever since the indexing task completed.

A quick check of the MySQL metadata store shows the following:

“druid_dataSource” table is empty

“druid_pendingSegments” is empty

“druid_segments” table has segment info for the data source, and the entries correspond to the segment directories found on HDFS.
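One sanity check worth running against druid_segments is whether the rows are actually marked used=1 — the coordinator only assigns used segments to historicals, so used=0 rows will never show up as available. Here is a sketch of that query, run against an in-memory sqlite stand-in for MySQL (the table is simplified and the sample rows are illustrative, not from the cluster above):

```python
import sqlite3

# sqlite stand-in for the MySQL metadata store; simplified columns
# loosely mirroring Druid's druid_segments table (sample data is made up)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE druid_segments (id TEXT PRIMARY KEY, dataSource TEXT, used INTEGER)"
)
conn.executemany(
    "INSERT INTO druid_segments VALUES (?, ?, ?)",
    [
        ("seg-1", "my_datasource", 1),
        ("seg-2", "my_datasource", 1),
        ("seg-3", "my_datasource", 0),  # used=0: the coordinator will never load it
    ],
)

# Count used vs unused segments per datasource; the same SELECT works
# unchanged against the real MySQL metadata store.
rows = conn.execute(
    "SELECT dataSource, used, COUNT(*) FROM druid_segments GROUP BY dataSource, used"
).fetchall()
for datasource, used, count in rows:
    print(datasource, "used" if used else "unused", count)
```

If everything is used=1 and the counts match the HDFS directories, the metadata side looks healthy and the problem is more likely on the historical/coordinator side.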

There are no exceptions or errors in the indexing task logs, or in the overlord/coordinator logs.

None of the queries return any results either.
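A cheap way to confirm whether the broker sees any loaded segments at all is a native timeBoundary query. Here is a minimal sketch of the request body; the datasource name and the broker URL in the comments are placeholders for your own setup:

```python
import json
# import urllib.request  # uncomment to actually POST to a broker

# Minimal Druid native timeBoundary query -- it returns the min/max
# timestamps of served segments, or [] if nothing is loaded for the
# datasource. "my_datasource" is a placeholder name.
query = {
    "queryType": "timeBoundary",
    "dataSource": "my_datasource",
}
body = json.dumps(query).encode("utf-8")

# req = urllib.request.Request(
#     "http://broker-host:8082/druid/v2/",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read())  # [] means no segments are served

print(body.decode("utf-8"))
```

An empty result here, combined with healthy druid_segments rows, again points at segments that exist in deep storage but are not being served.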

Here’s our Druid setup:

Druid version: 0.9.1

1 overlord

1 coordinator

3 historical and middle manager nodes

1 broker

Deep storage - HDFS

Metadata store: MySQL

I've seen the same behavior (datasource <99% available) for a data source indexed with the Kafka indexing service as well.

Any pointers would really help.



Can you share the coordinator and historical logs?

It seems like the segments are created but not loaded.

It could be that the historicals cannot read from HDFS, or that they don't have enough disk space to load the segments.

ZooKeeper could be an issue as well.

Sharing the logs will help.
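To see which datasources are stuck, the coordinator's load-status endpoint (GET /druid/coordinator/v1/loadstatus) returns a map of datasource to the percentage of used segments currently served. A sketch, with the HTTP response hard-coded as a hypothetical sample so you can swap in a real call against your coordinator:

```python
# Sample of what GET http://coordinator-host:8081/druid/coordinator/v1/loadstatus
# returns: {datasource: percent of used segments currently served}.
# The datasource names below are made up for illustration.
load_status = {"fully_loaded_ds": 100.0, "stuck_ds": 0.0}

def unloaded(status):
    """Return the datasources whose segments are not fully served by historicals."""
    return {ds: pct for ds, pct in status.items() if pct < 100.0}

print(unloaded(load_status))
```

Any datasource that shows up here for a long time after ingestion usually means the historicals are failing to pull or announce its segments, which is where the logs come in.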

I am having the same issue: nothing in the druid_datasource table in MySQL after successfully ingesting (via Tranquility) and storing segments in HDFS.