Data ingestion failing from realtime node

Hello,

We have an application that emits data to a Kafka topic, which needs to be handled by a real-time node (for ingestion, segmentation, and transferring the data to deep storage).

We were able to verify that there are messages in the Kafka topic, but there are no ingestion tasks when we open the coordinator console (and therefore no segmentation or data in deep storage).

Where do we start debugging the issue? Below are the configs we are using:

spec-file:

"ioConfig": {
  "type": "realtime",
  "firehose": {
    "type": "kafka-0.8",
    "consumerProps": {
      "zookeeper.connect": "localhost:2181",
      "zookeeper.connection.timeout.ms": "15000",
      "zookeeper.session.timeout.ms": "15000",
      "zookeeper.sync.time.ms": "5000",
      "group.id": "druid-sbs-test",
      "fetch.message.max.bytes": "1048586",
      "auto.offset.reset": "largest",
      "auto.commit.enable": "false"
    },
    "feed": "test"
  },
  "plumber": {
    "type": "realtime"
  }
},
"tuningConfig": {
  "type": "realtime",
  "maxRowsInMemory": 500000,
  "intermediatePersistPeriod": "PT15m",
  "windowPeriod": "PT30m",
  "basePersistDirectory": "/index/realtime",
  "rejectionPolicy": {
    "type": "serverTime"
  }
}
}

Hello,

Can I recommend that you take a look at the Kafka indexing service? http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html

Realtime nodes and the kafka-0.8 firehose are both considered legacy implementations now and are not that well supported.

The Kafka indexing service offers exactly-once ingestion from Kafka and can read arbitrarily timestamped data without window periods. By contrast, the Kafka firehose is best-effort and can only handle real-time messages.

Your Kafka brokers will need to be version 0.10 or later to be supported.
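
For reference, a minimal supervisor spec for the Kafka indexing service might look something like the sketch below. This is only a sketch with assumed values: it assumes a JSON-formatted topic named "test", a timestamp field called "timestamp", and a broker at localhost:9092; you would adjust the dataSource, dimensions, metrics, and consumer properties to match your data.

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "test",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": []
        }
      }
    },
    "metricsSpec": [
      { "type": "count", "name": "count" }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 500000
  },
  "ioConfig": {
    "topic": "test",
    "consumerProperties": {
      "bootstrap.servers": "localhost:9092"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}

The spec is submitted to the overlord (POST to /druid/indexer/v1/supervisor), and the supervisor then manages the Kafka indexing tasks for you instead of a standalone real-time node.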

Hope this helps,

David

Sure. Can you point me to a flow diagram of how the indexing service works? We see segments being created, but the ingestion tasks are failing.

This is a good document on how ingestion in general works in Druid: http://druid.io/docs/latest/ingestion/index.html

If you can provide more details on where you are seeing segments created (in deep storage, or loaded on historicals, i.e. in the coordinator console?) along with logs from the failing ingestion tasks, we might be able to find some hints about what is going on.