Kafka Indexing Service -- index tasks hang

Hello,

I have a problem with the new Kafka indexing service.

Environment:

  • 1st Druid machine, running the broker, coordinator & historical nodes

  • 2nd Druid machine, running the overlord node in “local” mode (AWS EC2 c4.xlarge instance, 4 CPUs, 7.5 GB RAM)

Kafka topics:

  • custom_activity, with a lag of ~12 million messages

  • push_sent, with a lag of ~200 K messages

  • ~25 other topics, each with a lag of 50-100 K messages

When I start the overlord node, it successfully processes some tasks until it starts the tasks for the custom_activity and push_sent topics. Task durations are 3M and 30S.

Then the tasks hang and CPU usage drops. In the task logs I see “org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Marking the coordinator 2147483434 dead” records. The network between the Kafka broker and the overlord is fine: on the Kafka broker machine I can see connections from the overlord node (using netstat).

The task payload, task logs, overlord logs, overlord properties, and CPU & memory usage logs are attached.

Thanks in advance!

overlord_log.txt (4.35 KB)

peon_custom_activity_log.txt (584 KB)

index_kafka_custom_activity_96ab0093db13372_epdcbdip.json (1.98 KB)

overlord.properties (824 Bytes)

P.S. I’m using Druid 0.9.2-rc1.

Hey Aleksandr,

I can’t tell from your logs exactly what happened, but I suspect it’s an OOM. Try setting your maxRowsInMemory to something smaller than 1 000 000; maybe try 100 000 and see if that helps. It would also be helpful to log GC events (-XX:+PrintGCDetails -XX:+PrintGCTimeStamps) and see what’s happening there.
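As a rough sketch of those two changes (not taken from your attached files): the Python below caps maxRowsInMemory in a spec's tuningConfig and appends the GC logging flags to a JVM options string. The sample spec, the "-server -Xmx2g" options string, and the helper names lower_max_rows and with_gc_logging are all made up for illustration. It also assumes tuningConfig sits at the top level of your spec, and that you paste the resulting options into wherever your setup configures the task JVM options (e.g. druid.indexer.runner.javaOpts, if that is what you use).

```python
import copy
import json

# GC logging flags suggested above.
GC_FLAGS = ["-XX:+PrintGCDetails", "-XX:+PrintGCTimeStamps"]


def lower_max_rows(spec, max_rows=100_000):
    """Return a copy of spec with tuningConfig.maxRowsInMemory capped at max_rows."""
    updated = copy.deepcopy(spec)
    # Assumption: tuningConfig is a top-level key of the spec.
    tuning = updated.setdefault("tuningConfig", {"type": "kafka"})
    current = tuning.get("maxRowsInMemory")
    if current is None or current > max_rows:
        tuning["maxRowsInMemory"] = max_rows
    return updated


def with_gc_logging(java_opts):
    """Append the GC logging flags to a JVM options string, skipping duplicates."""
    opts = java_opts.split()
    for flag in GC_FLAGS:
        if flag not in opts:
            opts.append(flag)
    return " ".join(opts)


if __name__ == "__main__":
    # Hypothetical, heavily trimmed spec; a real one also has dataSchema and ioConfig.
    sample_spec = {
        "type": "kafka",
        "tuningConfig": {"type": "kafka", "maxRowsInMemory": 1000000},
    }
    print(json.dumps(lower_max_rows(sample_spec)["tuningConfig"], indent=2))
    # Hypothetical existing JVM options.
    print(with_gc_logging("-server -Xmx2g"))
```

The helper only lowers the value and never raises it, so a spec that already uses a smaller maxRowsInMemory is left alone.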

If you’re still seeing these issues, it would be helpful if you could post your full overlord log.