HELP: My Historicals are doing nothing with kafka-indexing-service ingestion using HDFS deep storage?

I'm ingesting data into Druid from a Kafka topic using the kafka-indexing-service. This works fine when my deep storage is Azure Blob.

When I change deep storage to HDFS, my Historicals act like they have nothing to do.

I also know my Druid connection to HDFS deep storage works, because I can store indexing logs and segments in HDFS when ingesting through Tranquility.

MiddleManager Log:

2019-02-17T17:00:56,818 INFO [forking-task-runner-0-[index_kafka_kafkadruidhdfs_dccb14b8a4ee2bb_ebikfbgo]] io.druid.indexing.overlord.ForkingTaskRunner - Process exited with status[0] for task$

2019-02-17T17:00:56,819 INFO [forking-task-runner-0] io.druid.storage.hdfs.tasklog.HdfsTaskLogs - Writing task log to: /druid/indexing-logs/index_kafka_kafkadruidhdfs_dccb14b8a4ee2bb_ebikfbgo

2019-02-17T17:00:56,852 INFO [forking-task-runner-0] io.druid.storage.hdfs.tasklog.HdfsTaskLogs - Wrote task log to: /druid/indexing-logs/index_kafka_kafkadruidhdfs_dccb14b8a4ee2bb_ebikfbgo

2019-02-17T17:00:56,852 INFO [forking-task-runner-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_kafkadruidhdfs_dccb14b8a4ee2bb_ebikfbgo] status changed to [SUCCESS].

2019-02-17T17:00:56,853 INFO [forking-task-runner-0] io.druid.indexing.overlord.ForkingTaskRunner - Removing task directory: /data/druid/persistent/task/index_kafka_kafkadruidhdfs_dccb14b8a4ee$

2019-02-17T17:00:56,860 INFO [WorkerTaskMonitor] io.druid.indexing.worker.WorkerTaskMonitor - Job’s finished. Completed [index_kafka_kafkadruidhdfs_dccb14b8a4ee2bb_ebikfbgo] with status [SUCCE$

2019-02-17T17:00:56,910 INFO [WorkerTaskMonitor] io.druid.indexing.worker.WorkerTaskMonitor - Submitting runnable for task[index_kafka_kafkadruidhdfs_dccb14b8a4ee2bb_knlfnimp]

2019-02-17T17:00:56,918 INFO [WorkerTaskMonitor] io.druid.indexing.worker.WorkerTaskMonitor - Affirmative. Running task [index_kafka_kafkadruidhdfs_dccb14b8a4ee2bb_knlfnimp]

2019-02-17T17:00:56,924 INFO [forking-task-runner-2] io.druid.indexing.overlord.ForkingTaskRunner - Running command: java -cp conf/druid/_common:conf/druid/middleManager:lib/derby-10.11.1.1.ja$

2019-02-17T17:00:56,926 INFO [forking-task-runner-2] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_kafkadruidhdfs_dccb14b8a4ee2bb_knlfnimp] location changed to [TaskLocation{h$

2019-02-17T17:00:56,926 INFO [forking-task-runner-2] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_kafkadruidhdfs_dccb14b8a4ee2bb_knlfnimp] status changed to [RUNNING].

2019-02-17T17:00:56,926 INFO [forking-task-runner-2] io.druid.indexing.overlord.ForkingTaskRunner - Logging task index_kafka_kafkadruidhdfs_dccb14b8a4ee2bb_knlfnimp output to: /data/druid/pers$

2019-02-17T17:00:56,926 INFO [WorkerTaskMonitor] io.druid.indexing.worker.WorkerTaskMonitor - Updating task [index_kafka_kafkadruidhdfs_dccb14b8a4ee2bb_knlfnimp] announcement with location [Ta$

Hi Chris:

What does your ingestion task log say when you try ingesting from Kafka using HDFS as deep storage?

Thanks

The task logs show starting, running, and success for the Kafka indexing service task, then another task with a new task id does the same, and on and on. It's almost as if there is no data in the topic.

Thank you to everyone who responded. It turns out the deep storage configuration was completely healthy; the problem was actually with the ioConfig in my Kafka supervisor spec.

I had set it to run two replicas but had only configured it to connect to one Kafka broker to obtain the topic data. A sketch of what the corrected spec looks like is below.
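For anyone who hits the same thing, here is a minimal sketch of the relevant ioConfig section of a Kafka supervisor spec, assuming a topic named "kafkadruidhdfs" (taken from the task ids in the logs above). The broker hostnames, task count, and duration are placeholders for illustration only; the point is that "replicas" and the brokers listed in "bootstrap.servers" need to be consistent with the Kafka cluster that actually serves the topic. The dataSchema and tuningConfig sections are omitted here.

    {
      "type": "kafka",
      "ioConfig": {
        "topic": "kafkadruidhdfs",
        "replicas": 2,
        "taskCount": 1,
        "taskDuration": "PT1H",
        "consumerProperties": {
          "bootstrap.servers": "your-broker-1:9092,your-broker-2:9092"
        }
      }
    }

Once the broker list matched what the supervisor expected, the Kafka indexing tasks started handing off segments to HDFS and the Historicals picked them up as usual.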