I ran the experiments a couple more times and here are the answers:
- Are you able to retrieve the logs of the successful tasks?
Yes, I’m able to retrieve the logs of the successful tasks. In fact, I realized that the logs for the failed tasks are also there; it’s just that they’re empty.
- Do you see the logs for the failed tasks in HDFS?
Yes, but the log files are empty.
- When you have a failed task and there aren’t any log files generated, are there any exceptions in the middle manager / overlord logs?
(a) I saw the following exception in the overlord log:
2016-11-01T21:11:31,561 WARN [KafkaSupervisor-crs_datasource_1-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Task [index_kafka_crs_datasource_1_227940bb80823fc_gllobjio] failed to return start time, killing task
java.lang.RuntimeException: java.net.ConnectException: Connection refused
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.indexing.kafka.KafkaIndexTaskClient.submitRequest(KafkaIndexTaskClient.java:328) ~[druid-kafka-indexing-service-0.9.1.1.jar:0.9.1.1]
(b) And the following in the middleManager log:
2016-11-01T21:04:50,048 INFO [forking-task-runner-10] io.druid.indexing.overlord.ForkingTaskRunner - Exception caught during execution
java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170) ~[?:1.8.0_101]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:291) ~[?:1.8.0_101]
Does this mean that the overlord is unable to connect to the Kafka cluster? That seems odd, because outside of Druid I’m able to connect to the Kafka cluster from those same overlord and middleManager nodes.
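One thing worth noting: the "Connection refused" above is thrown from KafkaIndexTaskClient, which (if I'm reading it right) talks to the indexing task's own HTTP endpoint rather than to the Kafka brokers, so broker reachability may not be the issue. To rule out basic TCP reachability either way, a small connect check run from the overlord/middleManager hosts can help. This is just a sketch; the hostnames and ports below are placeholders, not values from my setup:

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder endpoints -- substitute your actual Kafka broker, and the
# peon HTTP port range the middleManager assigns to its tasks.
print(can_connect("kafka-broker.example.com", 9092))   # Kafka broker
print(can_connect("middlemanager.example.com", 8100))  # task (peon) HTTP port
```

If the broker check succeeds but the peon port check fails, that would point at the overlord being unable to reach the task process rather than Kafka itself.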