Questions about "auto.offset.reset" in Kafka-indexing-service

Hi all,
I’m using druid-0.9.1 with the Kafka indexing service. It’s really awesome!

My Imply cluster has one node running zk + broker + pivot + overlord + coordinator, one node running overlord + coordinator for HA, and eight nodes running middlemanager + historical.

My Kafka cluster has nine servers, and data retention is set to 2 hours.

In the spec for the Kafka indexing service, I added the option "auto.offset.reset": "latest" to "consumerProperties". When I restart the supervisor for my topic, I can see "auto.offset.reset=latest" in overlord.log, but the option is still set to "none" in the worker spec, so the worker throws an OffsetOutOfRangeException when it reads an expired offset in a partition. Many Kafka indexing tasks fail, and the number of segments for each hour drops a lot.

How can I fix this problem? Can anyone help?

"ioConfig": {
  "topic": "tsl",
  "consumerProperties": {
    "bootstrap.servers": "202.30.75.91:9092,202.30.75.92:9092,202.30.75.93:9092,202.30.75.94:9092,202.30.75.95:9092,202.30.75.96:9092,202.30.75.98:9092,202.30.75.99:9092,202.30.75.100:9092",
    "auto.offset.reset": "latest"
  }
}

Because of your 2-hour data retention, I guess you’re hitting a case where the Druid Kafka indexing tasks are trying to read offsets that have already been deleted. This causes problems with the exactly-once transaction handling scheme, which requires that all offsets be read in order, without skipping any. The GitHub issue https://github.com/druid-io/druid/issues/3195 is about making this better – basically you would have an option to reset the Kafka indexing to latest (this would involve resetting the ingestion metadata Druid stores for the datasource).

In the meantime, maybe it’s possible to make this happen less often by either extending your Kafka retention or setting your Druid taskDuration lower than the default of 1 hour.
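Roughly, that would look something like this in the supervisor spec’s ioConfig – the "PT20M" is just an example value, the other fields are carried over from your spec above, and the Kafka-side retention would be the topic’s retention.ms setting:

"ioConfig": {
  "topic": "tsl",
  "taskDuration": "PT20M",
  "consumerProperties": {
    "bootstrap.servers": "202.30.75.91:9092,202.30.75.92:9092,202.30.75.93:9092,202.30.75.94:9092,202.30.75.95:9092,202.30.75.96:9092,202.30.75.98:9092,202.30.75.99:9092,202.30.75.100:9092"
  }
}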

Thanks, Gian Merlino

On Sunday, July 17, 2016 at 1:04:08 AM UTC+8, Gian Merlino wrote:

Hi Gian,
I have set my Druid taskDuration to "PT20M", and it really works! But I still need to watch the status of the cluster for a few days. Thanks a lot!

I still don’t understand why setting the Druid taskDuration lower makes the loss of segments happen less often, though.

Thanks!

On Sunday, July 17, 2016 at 1:04:08 AM UTC+8, Gian Merlino wrote:

Using a shorter taskDuration makes Druid commit Kafka offsets more often, which, if you have really short retention, is probably going to be more stable. (You don’t want the most recently committed offsets to fall out of the retention window, since then you can’t fail over or retry tasks.)
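As a very rough illustration of the headroom (ignoring how long publishing and handoff take):

  margin ~ Kafka retention - taskDuration
         ~ 2 h - 1 h = 1 h (default PT1H)
         ~ 2 h - 20 min = 1 h 40 min (PT20M)

So with PT20M, a failed-over or retried task that has to re-read from the last committed offsets has a lot more time before those offsets age out of your 2-hour retention.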