Druid 0.11 kafka-indexing-service TimeoutException while getting current/latest offsets

The Kafka indexing task's final status was SUCCESS, but there is an exception in the Overlord log.

Any suggestions?

2019-10-30 15:49:30,741 INFO [KafkaIndexTaskClient-transaction_crm_merchant_org_ds-0] io.druid.indexing.kafka.KafkaIndexTaskClient - submitRequest failed for [http://druid-others-002:8100/druid/worker/v1/chat/index_kafka_transaction_crm_merchant_org_ds_a826167d93d45fc_fadhojad/offsets/current], with message [Connection refused (Connection refused)]

2019-10-30 15:49:35,941 WARN [KafkaSupervisor-transaction_crm_merchant_org_ds-Reporting-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Exception while getting current/latest offsets

java.util.concurrent.TimeoutException: Timeout waiting for task.

at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:269) ~[guava-16.0.1.jar:?]

at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96) ~[guava-16.0.1.jar:?]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.updateCurrentOffsets(KafkaSupervisor.java:1820) ~[?:?]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.lambda$updateCurrentAndLatestOffsets$12(KafkaSupervisor.java:1828) ~[?:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_171]

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_171]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_171]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_171]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

2019-10-30 15:49:40,162 ERROR [KafkaSupervisor-transaction_crm_store_city_bi] io.druid.indexing.kafka.supervisor.KafkaSupervisor - KafkaSupervisor[transaction_crm_store_city_bi] failed to handle notice: {class=io.druid.indexing.kafka.supervisor.KafkaSupervisor, exceptionType=class java.util.concurrent.TimeoutException, exceptionMessage=Timeout waiting for task., noticeClass=RunNotice}

java.util.concurrent.TimeoutException: Timeout waiting for task.

at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:269) ~[guava-16.0.1.jar:?]

at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96) ~[guava-16.0.1.jar:?]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.checkTaskDuration(KafkaSupervisor.java:1074) ~[druid-kafka-indexing-service-0.11.0.jar:0.11.0]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.runInternal(KafkaSupervisor.java:685) ~[druid-kafka-indexing-service-0.11.0.jar:0.11.0]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor$RunNotice.handle(KafkaSupervisor.java:520) ~[druid-kafka-indexing-service-0.11.0.jar:0.11.0]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor$2.run(KafkaSupervisor.java:338) [druid-kafka-indexing-service-0.11.0.jar:0.11.0]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_171]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_171]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

2019-10-30 15:49:40,165 INFO [KafkaSupervisor-transaction_crm_store_city_bi] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“alerts”,“timestamp”:“2019-10-30T07:49:40.165Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“severity”:“component-failure”,“description”:“KafkaSupervisor[transaction_crm_store_city_bi] failed to handle notice”,“data”:{“class”:“io.druid.indexing.kafka.supervisor.KafkaSupervisor”,“exceptionType”:“java.util.concurrent.TimeoutException”,“exceptionMessage”:“Timeout waiting for task.”,“exceptionStackTrace”:“java.util.concurrent.TimeoutException: Timeout waiting for task.\n\tat com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:269)\n\tat com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)\n\tat io.druid.indexing.kafka.supervisor.KafkaSupervisor.checkTaskDuration(KafkaSupervisor.java:1074)\n\tat io.druid.indexing.kafka.supervisor.KafkaSupervisor.runInternal(KafkaSupervisor.java:685)\n\tat io.druid.indexing.kafka.supervisor.KafkaSupervisor$RunNotice.handle(KafkaSupervisor.java:520)\n\tat io.druid.indexing.kafka.supervisor.KafkaSupervisor$2.run(KafkaSupervisor.java:338)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n”,“noticeClass”:“RunNotice”}}]

2019-10-30 15:50:10,540 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:10.540Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jetty/numOpenConnections”,“value”:15}]

2019-10-30 15:50:15,868 INFO [KafkaIndexTaskClient-transaction_crm_store_keeper_ds-0] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://druid-others-002:8104

2019-10-30 15:50:15,883 INFO [KafkaSupervisor-transaction_crm_store_keeper_ds-Reporting-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:15.883Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“ingest/kafka/lag”,“value”:12443,“dataSource”:“transaction_crm_store_keeper_ds”}]

2019-10-30 15:50:24,699 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.699Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/mem/max”,“value”:4294967296,“memKind”:“heap”}]

2019-10-30 15:50:24,699 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.699Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/mem/committed”,“value”:4294967296,“memKind”:“heap”}]

2019-10-30 15:50:24,699 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.699Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/mem/used”,“value”:1221591536,“memKind”:“heap”}]

2019-10-30 15:50:24,699 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.699Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/mem/init”,“value”:4294967296,“memKind”:“heap”}]

2019-10-30 15:50:24,699 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.699Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/mem/max”,“value”:-1,“memKind”:“nonheap”}]

2019-10-30 15:50:24,699 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.699Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/mem/committed”,“value”:3769831424,“memKind”:“nonheap”}]

2019-10-30 15:50:24,699 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.699Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/mem/used”,“value”:3077811080,“memKind”:“nonheap”}]


Hi Zhujian,

Which version of Druid are you using?

Can you try increasing druid.processing.numThreads and/or increasing the task timeout?

Thanks
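For reference, the task timeout and the supervisor's HTTP chat timeout live in the Kafka supervisor spec. A minimal sketch of the relevant ioConfig/tuningConfig fields (field names as documented for the Kafka indexing service; the values are only illustrative, so check the docs for your version):

"ioConfig": {
  "topic": "transaction_crm_merchant_org",
  "taskDuration": "PT1H",
  "completionTimeout": "PT30M"
},
"tuningConfig": {
  "type": "kafka",
  "httpTimeout": "PT30S",
  "chatRetries": 8
}

httpTimeout and chatRetries govern how long the supervisor waits on a task's chat endpoint (the /offsets/current calls that are failing in the log above), while taskDuration and completionTimeout control how long a task reads before it starts publishing and how long publishing may take.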

My configuration is:

# Processing threads and buffers

druid.processing.buffer.sizeBytes=536870912

#druid.processing.buffer.sizeBytes=1073741824

druid.processing.numThreads=14

druid.processing.numMergeBuffers=3

druid.processing.columnCache.sizeBytes=0

druid.query.search.maxSearchLimit=1000

The first exception is:

2019-10-30 11:56:57,018 WARN [KafkaSupervisor-transaction_crm_store_city_bi] io.druid.java.util.common.RetryUtils - Failed on try 1, retrying in 916ms.

org.skife.jdbi.v2.exceptions.UnableToObtainConnectionException: java.sql.SQLException: Cannot get a connection, general error

at org.skife.jdbi.v2.DBI.open(DBI.java:230) ~[jdbi-2.63.1.jar:2.63.1]

Then all my Kafka indexing tasks were blocked by the TimeoutException.

I reset the supervisor, but my data was lost.
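The UnableToObtainConnectionException indicates the Overlord could not get a connection to the metadata store at that moment. A quick check, assuming a MySQL metadata store (host and user below are placeholders), is whether the database is near its connection limit:

# Compare the MySQL connection limit with the current connection count (placeholders for host/user)
mysql -h <metadata-db-host> -u druid -p -e "SHOW VARIABLES LIKE 'max_connections'; SHOW STATUS LIKE 'Threads_connected';"

If Threads_connected is close to max_connections, either the database limit or the Overlord's metadata connection pool needs attention.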


More of the Overlord log from the same time window:

2019-10-30 15:50:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/init”,“value”:8,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.2.name: s1\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:50:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/count”,“value”:0,“gcGen”:[“old”],“gcName”:[“g1”]}]

2019-10-30 15:50:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/cpu”,“value”:0,“gcGen”:[“old”],“gcName”:[“g1”]}]

2019-10-30 15:50:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/max”,“value”:4294967304,“gcGen”:[“old”],“gcGenSpaceName”:“sun.gc.generation.1.space.0.name: space\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:50:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/capacity”,“value”:3619684360,“gcGen”:[“old”],“gcGenSpaceName”:“sun.gc.generation.1.space.0.name: space\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:50:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/used”,“value”:919601648,“gcGen”:[“old”],“gcGenSpaceName”:“sun.gc.generation.1.space.0.name: space\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:50:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:50:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/init”,“value”:4068474888,“gcGen”:[“old”],“gcGenSpaceName”:“sun.gc.generation.1.space.0.name: space\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:50:27,554 INFO [KafkaIndexTaskClient-transaction_crm_store_keeper_ds-1] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://druid-others-002:8104

2019-10-30 15:50:27,557 INFO [KafkaIndexTaskClient-transaction_crm_store_keeper_ds-0] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://druid-others-003:8101

2019-10-30 15:50:29,038 INFO [KafkaIndexTaskClient-transaction_crm_merchant_keeper_ds-1] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://druid-others-002:8102

2019-10-30 15:50:32,894 WARN [KafkaSupervisor-transaction_crm_store_org_ds-Reporting-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Exception while getting current/latest offsets

java.util.concurrent.TimeoutException: Timeout waiting for task.

at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:269) ~[guava-16.0.1.jar:?]

at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96) ~[guava-16.0.1.jar:?]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.updateCurrentOffsets(KafkaSupervisor.java:1820) ~[?:?]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.lambda$updateCurrentAndLatestOffsets$12(KafkaSupervisor.java:1828) ~[?:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_171]

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_171]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_171]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_171]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]

2019-10-30 15:51:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:51:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/max”,“value”:4294967304,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.2.name: s1\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:51:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:51:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/capacity”,“value”:81788936,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.2.name: s1\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:51:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:51:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/used”,“value”:81788928,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.2.name: s1\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:51:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:51:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/init”,“value”:8,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.2.name: s1\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:51:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:51:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/count”,“value”:0,“gcGen”:[“old”],“gcName”:[“g1”]}]

2019-10-30 15:51:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:51:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/cpu”,“value”:0,“gcGen”:[“old”],“gcName”:[“g1”]}]

2019-10-30 15:51:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:51:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/max”,“value”:4294967304,“gcGen”:[“old”],“gcGenSpaceName”:“sun.gc.generation.1.space.0.name: space\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:51:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:51:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/capacity”,“value”:3619684360,“gcGen”:[“old”],“gcGenSpaceName”:“sun.gc.generation.1.space.0.name: space\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:51:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:51:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/used”,“value”:919601648,“gcGen”:[“old”],“gcGenSpaceName”:“sun.gc.generation.1.space.0.name: space\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:51:24,701 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:51:24.701Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jvm/gc/mem/init”,“value”:4068474888,“gcGen”:[“old”],“gcGenSpaceName”:“sun.gc.generation.1.space.0.name: space\u0000 string [internal]”,“gcName”:[“g1”]}]

2019-10-30 15:51:36,315 WARN [KafkaSupervisor-transaction_crm_store_city_bi-Reporting-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Exception while getting current/latest offsets

java.util.concurrent.TimeoutException: Timeout waiting for task.

at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:269) ~[guava-16.0.1.jar:?]

at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96) ~[guava-16.0.1.jar:?]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.updateCurrentOffsets(KafkaSupervisor.java:1820) ~[?:?]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.lambda$updateCurrentAndLatestOffsets$12(KafkaSupervisor.java:1828) ~[?:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_171]

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_171]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_171]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_171]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]

2019-10-30 15:51:36,399 INFO [KafkaSupervisor-transaction_crm_store_city_bi-Reporting-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:51:36.399Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“ingest/kafka/lag”,“value”:0,“dataSource”:“transaction_crm_store_city_bi”}]

2019-10-30 15:51:51,677 INFO [KafkaIndexTaskClient-transaction_crm_store_org_ds-0] io.druid.indexing.kafka.KafkaIndexTaskClient - No TaskLocation available for task [index_kafka_transaction_crm_store_org_ds_88eaba9ecfbe7c5_kiajkjie], this task may not have been assigned to a worker yet or may have already completed

2019-10-30 15:51:51,679 INFO [Curator-PathChildrenCache-1] io.druid.indexing.overlord.MetadataTaskStorage - Deleting TaskLock with id[459178]: TaskLock{groupId=index_kafka_transaction_crm_store_org_ds, dataSource=transaction_crm_store_org_ds, interval=2019-10-30T07:00:00.000Z/2019-10-30T08:00:00.000Z, version=2019-10-30T07:00:17.045Z}

2019-10-30 15:51:56,239 INFO [KafkaIndexTaskClient-transaction_crm_merchant_org_ds-1] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://druid-others-003:8105

2019-10-30 15:52:02,100 INFO [KafkaIndexTaskClient-transaction_crm_merchant_keeper_ds-0] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://druid-others-003:8100

2019-10-30 15:52:02,119 INFO [KafkaSupervisor-transaction_crm_merchant_keeper_ds-Reporting-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:52:02.119Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“ingest/kafka/lag”,“value”:13353,“dataSource”:“transaction_crm_merchant_keeper_ds”}]

2019-10-30 15:52:10,541 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{“feed”:“metrics”,“timestamp”:“2019-10-30T07:52:10.540Z”,“service”:“druid/overlord”,“host”:“druid-others-002:8090”,“version”:“0.11.0”,“metric”:“jetty/numOpenConnections”,“value”:56}]

2019-10-30 15:52:16,000 INFO [KafkaSupervisor-transaction_crm_merchant_org_ds] io.druid.indexing.kafka.supervisor.KafkaSupervisor - {id=‘transaction_crm_merchant_org_ds’, generationTime=2019-10-30T07:52:16.000Z, payload={dataSource=‘transaction_crm_merchant_org_ds’, topic=‘transaction_crm_merchant_org’, partitions=10, replicas=1, durationSeconds=3600, active=[{id=‘index_kafka_transaction_crm_merchant_org_ds_3a0445703242d62_amldmdlg’, startTime=2019-10-30T07:19:53.167Z, remainingSeconds=1657}, {id=‘index_kafka_transaction_crm_merchant_org_ds_bb0b2ee2eeada84_ilfjgibf’, startTime=2019-10-30T07:19:53.612Z, remainingSeconds=1657}], publishing=[{id=‘index_kafka_transaction_crm_merchant_org_ds_a826167d93d45fc_fadhojad’, startTime=2019-10-30T06:19:29.748Z, remainingSeconds=0}]}}

2019-10-30 15:52:16,004 INFO [KafkaIndexTaskClient-transaction_crm_merchant_org_ds-1] io.druid.indexing.kafka.KafkaIndexTaskClient - submitRequest failed for [http://druid-others-002:8100/druid/worker/v1/chat/index_kafka_transaction_crm_merchant_org_ds_a826167d93d45fc_fadhojad/offsets/current], with message [Connection refused (Connection refused)]

2019-10-30 15:52:16,016 WARN [KafkaSupervisor-transaction_crm_merchant_org_ds-Reporting-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Exception while getting current/latest offsets

java.util.concurrent.TimeoutException: Timeout waiting for task.

at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:269) ~[guava-16.0.1.jar:?]

at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96) ~[guava-16.0.1.jar:?]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.updateCurrentOffsets(KafkaSupervisor.java:1820) ~[?:?]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.lambda$updateCurrentAndLatestOffsets$12(KafkaSupervisor.java:1828) ~[?:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_171]

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_171]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_171]

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_171]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]



Tijo Thomas

Solutions Architect | Imply, Bangalore, India



Is the cause the Historical's druid.processing.numThreads config?

After I reset the supervisor, the TaskLock was removed and all the segments were pushed after retries.

It seems that updating the supervisor causes the DBI connection exception, and the TaskLock blocks the task from pushing segments.

The only way I know to handle this problem is to reset the supervisor.
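For reference, the reset is just a POST to the Overlord's supervisor endpoint (sketch below; the supervisor id is a placeholder). Note that resetting clears the offsets the supervisor has stored, which is why data can be skipped:

# Reset a supervisor; this clears its stored offsets (supervisor id is a placeholder)
curl -X POST http://druid-others-002:8090/druid/indexer/v1/supervisor/<supervisorId>/reset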

My Druid version is 0.11.0, where the kafka-indexing-service feature is still experimental.

Is there any document that describes the process of pushing and loading segments?

On Wednesday, October 30, 2019 at 11:01:58 PM UTC+8, Tijo Thomas wrote:

Hi Zhujian,

Have you thought about upgrading to the latest Druid version (Druid 0.16)? Druid has evolved a lot since 0.10 and has gained many improvements, new features, and bug fixes, plus a fantastic unified console UI.

Also, the Druid Kafka indexing service is no longer an experimental feature; it is a fully fledged feature.

You can read the latest doc for Druid Kafka ingestion (Druid 0.16) here:
https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html

For older Druid versions, the earliest doc I found is for Druid 0.13:

https://druid.apache.org/docs/0.13.0-incubating/development/extensions-core/kafka-ingestion.html

Now, regarding your issue in Druid 0.11, could you attach the details below for one such occurrence:

  1. Kafka indexing task log

  2. MiddleManager (MM) log

  3. Overlord log

  4. Your supervisor spec (see the example after this list for how to pull it from the Overlord API)

  5. Your Druid configurations
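Items 1 and 4 can be pulled straight from the Overlord API; a sketch, with host, task id, and supervisor id as placeholders:

# Dump the spec of a running supervisor (item 4)
curl http://<overlord-host>:8090/druid/indexer/v1/supervisor/<supervisorId>

# Fetch a task's log through the Overlord (item 1)
curl http://<overlord-host>:8090/druid/indexer/v1/task/<taskId>/log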

Thanks,

Vaibhav

I will attach a detailed, timeline-ordered log after I have read through the code and collected the printed logs.

This cluster is in my company's production environment.

So we will try our best to find the cause of the issue.

In the meantime, I will arrange for my team to evaluate and test the latest 0.16.x release.

Any suggestions about the 0.16.x version?

On Thursday, October 31, 2019 at 1:15:52 AM UTC+8, Vaibhav Vaibhav wrote:

Hi Vaibhav,

Thanks a lot for your suggestions.

I’ll attach the logs and config later.

On Thursday, October 31, 2019 at 1:15:52 AM UTC+8, Vaibhav Vaibhav wrote:

Druid 0.16 is fantastic! You can just run it in micro-quickstart mode on your laptop and try it out.

Here are the links:
Release notes:
https://github.com/apache/incubator-druid/releases
QuickStart:
https://druid.apache.org/docs/latest/tutorials/index.html

Thanks,

Vaibhav

Thanks, we will test and deploy this version.

I should change the Overlord log level to DEBUG. I have read the Kafka supervisor code, specifically KafkaSupervisor.checkTaskDuration().

Some of the useful logging about the HTTP requests for the current/latest offsets is at DEBUG level.
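A sketch of the change I plan to make, assuming the stock log4j2.xml that ships with Druid (the logger package is taken from the stack traces above; the appender name is assumed to be the default Console):

<!-- In the Overlord's log4j2.xml: log the Kafka indexing service at DEBUG -->
<Logger name="io.druid.indexing.kafka" level="debug" additivity="false">
  <AppenderRef ref="Console"/>
</Logger>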

On Thursday, October 31, 2019 at 1:43:18 AM UTC+8, Vaibhav Vaibhav wrote:

Hi Zhujian,

We don't have a document describing these steps. I have attached a flowchart which may help you understand them.

Thanks to Muthu, who created this flowchart for the Druid community. You can also refer to the Kafka extension doc: https://druid.apache.org/docs/latest/development/extensions-core/kafka-ingestion.html

As Vaibhav suggested, there have been a lot of improvements in Druid since 0.11.0; could you please try the latest 0.16.0 version?

Regards

Kafka Ingestion Flow - Initial Flow.pdf (40.8 KB)

Hi Tijo,

Thanks a lot for your advice.

On Thursday, October 31, 2019 at 11:06:48 AM UTC+8, Tijo Thomas wrote: