Kafka indexing: no errors, but no data either :(


The quickstart tutorial from Imply 1.3 works fine, and I was able to run the wikiticker Kafka tutorial without issues. However, as soon as I try some real data with a modified version of supervisor.json from the Kafka tutorial, I run into problems.

The issue is that I can see the supervisor task in the indexer coordinator console, and it reports that it finished SUCCESSFULLY, but no segments are created and no data is processed. The task also finishes too soon. I tried a very small sample (about 20 rows) and still see the same problem. The task does connect to Kafka and seeks to the right offset, but nothing happens after that. The one notable error is a ZooKeeper NoNode error.

(I am able to load the same data fine with a traditional batch load. However, I really want to take advantage of this new type of load, because our data arrives with significant delays and stragglers from the collection process, which touch older segments. I don't want to bother tinkering with the window configuration for realtime loads, or with intervals/segments in a delta batch load configuration.)

Index task log: var/druid/task/index_kafka_realfeed_ce4f41348970e46_plffpncd/log

Hey Giri,

Can you post your supervisor spec and the full Overlord and task logs?

Many thanks, David! Please find attached a zip file with the logs. I have removed the MonitorScheduler messages, as they seem to overwhelm the logs. I have also attached a small sample of the dataset, which loads fine with the traditional batch load.

Thanks again,


index_kafka_realfeed.zip (394 KB)

Hey Giri,

Are you writing the data to Kafka first and then starting the supervisor? By default, the Kafka indexing service starts reading from the latest offset in Kafka, which is why it doesn't read your data. If you want it to start from the earliest offset stored in Kafka, set "useEarliestOffset": true (a Boolean) in the ioConfig section of realfeed.json.
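A minimal Python sketch of that change, assuming the supervisor spec is handled as a dict before it is submitted. Only the ioConfig field matters here: with_earliest_offset is a hypothetical helper, and the topic name, dataSource and bootstrap.servers values in the trimmed example spec are placeholders, not taken from the actual realfeed.json.

```python
import copy
import json


def with_earliest_offset(spec: dict) -> dict:
    """Return a copy of a Kafka supervisor spec that reads from the earliest offset."""
    updated = copy.deepcopy(spec)  # leave the caller's spec untouched
    # useEarliestOffset lives under ioConfig and is a Boolean (JSON true).
    updated.setdefault("ioConfig", {})["useEarliestOffset"] = True
    return updated


# Hypothetical, trimmed spec: topic, dataSource and bootstrap.servers are placeholders.
realfeed = {
    "type": "kafka",
    "dataSchema": {"dataSource": "realfeed"},
    "ioConfig": {
        "topic": "realfeed",
        "consumerProperties": {"bootstrap.servers": "localhost:9092"},
    },
}

print(json.dumps(with_earliest_offset(realfeed)["ioConfig"], indent=2))
```

Copying the spec rather than mutating it keeps the original file's contents available for comparison before resubmitting.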

Note that this only affects the starting offset for the first task. After the first task completes, future tasks will start from where the previous one left off.
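The starting-offset rule can be sketched as a simplified Python model. starting_offsets is a hypothetical function, not Druid's actual code, and it ignores details such as offset resets and offsets Kafka no longer retains. It does show that offsets stored by a previous task take precedence over useEarliestOffset, which is also why leftover entries in the metadata storage can affect where a new supervisor starts.

```python
from typing import Dict, Optional


def starting_offsets(
    stored: Optional[Dict[int, int]],
    earliest: Dict[int, int],
    latest: Dict[int, int],
    use_earliest_offset: bool = False,
) -> Dict[int, int]:
    """Simplified model: pick the offset each Kafka partition starts reading from."""
    start = {}
    for partition, latest_offset in latest.items():
        if stored is not None and partition in stored:
            # A previous task already recorded where it stopped: resume there.
            start[partition] = stored[partition]
        elif use_earliest_offset:
            # First task with useEarliestOffset=true: read everything Kafka retains.
            start[partition] = earliest[partition]
        else:
            # Default: skip anything written before the supervisor started.
            start[partition] = latest_offset
    return start


# 20 rows written to partition 0 *before* the supervisor was started:
print(starting_offsets(None, {0: 0}, {0: 20}))  # {0: 20}: existing rows skipped
print(starting_offsets(None, {0: 0}, {0: 20}, use_earliest_offset=True))  # {0: 0}
print(starting_offsets({0: 15}, {0: 0}, {0: 20}, use_earliest_offset=True))  # {0: 15}
```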

If it's still not working, you might have some entries in your metadata storage that need to be removed. The easiest way to do this if you're doing a PoC with IAP is to shut everything down and delete the var directory of your Imply installation (not the system /var).
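The reset step can be sketched in Python, assuming a local PoC where all services have already been stopped and imply_home points at the Imply distribution directory (the task log path earlier in the thread is relative to it). reset_local_state is a hypothetical helper; it refuses to delete the system-wide /var.

```python
import shutil
from pathlib import Path


def reset_local_state(imply_home: str) -> Path:
    """Delete <imply_home>/var for a local PoC; all services must already be stopped."""
    var_dir = Path(imply_home).resolve() / "var"
    # Guard against wiping the operating system's /var by mistake.
    if var_dir.resolve() == Path("/var").resolve():
        raise ValueError("refusing to delete the system /var directory")
    if var_dir.is_dir():
        shutil.rmtree(var_dir)  # removes local state, e.g. var/druid/task logs
    return var_dir
```

Call it with the path of your Imply directory, then start the services again and resubmit the supervisor spec.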

Hi David,

I deleted the var directory and restarted everything. I also made sure to write the data to Kafka after starting the supervisor. Now everything seems to work. Many thanks for your help :)