Druid nodes not able to connect to ZooKeeper

Relates to Apache Druid 0.22.1

I am trying to deploy Druid on AWS EKS. After creating all the deployments for the Druid nodes and ZooKeeper, a few pods keep restarting. Below are the screenshot and the error message. Kindly help.

Logs from the Historical pod:


2022-03-29T15:42:56,333 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED

2022-03-29T15:42:56,333 WARN [NodeRoleWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeRoleWatcher - Ignored event type[CONNECTION_SUSPENDED] for node watcher of role[coordinator].

2022-03-29T15:42:56,503 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.502Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/mem/max”,“value”:8303607808,“memKind”:“heap”}

2022-03-29T15:42:56,503 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.503Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/mem/committed”,“value”:8303607808,“memKind”:“heap”}

2022-03-29T15:42:56,503 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.503Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/mem/used”,“value”:1268353224,“memKind”:“heap”}

2022-03-29T15:42:56,503 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.503Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/mem/init”,“value”:8589934592,“memKind”:“heap”}

2022-03-29T15:42:56,503 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.503Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/mem/max”,“value”:-1,“memKind”:“nonheap”}

2022-03-29T15:42:56,503 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.503Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/mem/committed”,“value”:61603840,“memKind”:“nonheap”}

2022-03-29T15:42:56,504 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.503Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/mem/used”,“value”:59716040,“memKind”:“nonheap”}

2022-03-29T15:42:56,504 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.504Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/mem/init”,“value”:2555904,“memKind”:“nonheap”}

2022-03-29T15:42:56,504 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.504Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/max”,“value”:251658240,“poolKind”:“nonheap”,“poolName”:“Code Cache”}

2022-03-29T15:42:56,504 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.504Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/committed”,“value”:11534336,“poolKind”:“nonheap”,“poolName”:“Code Cache”}

2022-03-29T15:42:56,504 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.504Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/used”,“value”:11452160,“poolKind”:“nonheap”,“poolName”:“Code Cache”}

2022-03-29T15:42:56,504 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.504Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/init”,“value”:2555904,“poolKind”:“nonheap”,“poolName”:“Code Cache”}

2022-03-29T15:42:56,504 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.504Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/max”,“value”:-1,“poolKind”:“nonheap”,“poolName”:“Metaspace”}

2022-03-29T15:42:56,504 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.504Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/committed”,“value”:44302336,“poolKind”:“nonheap”,“poolName”:“Metaspace”}

2022-03-29T15:42:56,504 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.504Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/used”,“value”:42922104,“poolKind”:“nonheap”,“poolName”:“Metaspace”}

2022-03-29T15:42:56,505 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.504Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/init”,“value”:0,“poolKind”:“nonheap”,“poolName”:“Metaspace”}

2022-03-29T15:42:56,505 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.505Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/max”,“value”:1073741824,“poolKind”:“nonheap”,“poolName”:“Compressed Class Space”}

2022-03-29T15:42:56,505 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.505Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/committed”,“value”:5767168,“poolKind”:“nonheap”,“poolName”:“Compressed Class Space”}

2022-03-29T15:42:56,505 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.505Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/used”,“value”:5352384,“poolKind”:“nonheap”,“poolName”:“Compressed Class Space”}

2022-03-29T15:42:56,505 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.505Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/init”,“value”:0,“poolKind”:“nonheap”,“poolName”:“Compressed Class Space”}

2022-03-29T15:42:56,505 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.505Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/max”,“value”:2290614272,“poolKind”:“heap”,“poolName”:“Eden Space”}

2022-03-29T15:42:56,505 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.505Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/committed”,“value”:2290614272,“poolKind”:“heap”,“poolName”:“Eden Space”}

2022-03-29T15:42:56,505 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.505Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/used”,“value”:1236936192,“poolKind”:“heap”,“poolName”:“Eden Space”}

2022-03-29T15:42:56,505 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.505Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/init”,“value”:2290614272,“poolKind”:“heap”,“poolName”:“Eden Space”}

2022-03-29T15:42:56,506 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.505Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/max”,“value”:286326784,“poolKind”:“heap”,“poolName”:“Survivor Space”}

2022-03-29T15:42:56,506 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.506Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/committed”,“value”:286326784,“poolKind”:“heap”,“poolName”:“Survivor Space”}

2022-03-29T15:42:56,506 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.506Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/used”,“value”:0,“poolKind”:“heap”,“poolName”:“Survivor Space”}

2022-03-29T15:42:56,506 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.506Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/init”,“value”:286326784,“poolKind”:“heap”,“poolName”:“Survivor Space”}

2022-03-29T15:42:56,506 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.506Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/max”,“value”:5726666752,“poolKind”:“heap”,“poolName”:“Tenured Gen”}

2022-03-29T15:42:56,506 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.506Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/committed”,“value”:5726666752,“poolKind”:“heap”,“poolName”:“Tenured Gen”}

2022-03-29T15:42:56,506 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.506Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/used”,“value”:31417032,“poolKind”:“heap”,“poolName”:“Tenured Gen”}

2022-03-29T15:42:56,506 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.506Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/pool/init”,“value”:5726666752,“poolKind”:“heap”,“poolName”:“Tenured Gen”}

2022-03-29T15:42:56,506 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.506Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/bufferpool/capacity”,“value”:302629720,“bufferpoolName”:“direct”}

2022-03-29T15:42:56,506 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.506Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/bufferpool/used”,“value”:302629720,“bufferpoolName”:“direct”}

2022-03-29T15:42:56,507 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.506Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/bufferpool/count”,“value”:46,“bufferpoolName”:“direct”}

2022-03-29T15:42:56,507 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.507Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/bufferpool/capacity”,“value”:0,“bufferpoolName”:“mapped”}

2022-03-29T15:42:56,507 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.507Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/bufferpool/used”,“value”:0,“bufferpoolName”:“mapped”}

2022-03-29T15:42:56,507 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.507Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/bufferpool/count”,“value”:0,“bufferpoolName”:“mapped”}

2022-03-29T15:42:56,507 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.507Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/count”,“value”:0,“gcGen”:[“young”],“gcName”:[“serial”]}

2022-03-29T15:42:56,507 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.507Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/cpu”,“value”:0,“gcGen”:[“young”],“gcName”:[“serial”]}

2022-03-29T15:42:56,507 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.507Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/max”,“value”:2290614272,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.0.name: eden string [internal]”,“gcName”:[“serial”]}

2022-03-29T15:42:56,507 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.507Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/capacity”,“value”:2290614272,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.0.name: eden string [internal]”,“gcName”:[“serial”]}

2022-03-29T15:42:56,507 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.507Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/used”,“value”:1236936192,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.0.name: eden string [internal]”,“gcName”:[“serial”]}

2022-03-29T15:42:56,507 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.507Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/init”,“value”:0,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.0.name: eden string [internal]”,“gcName”:[“serial”]}

2022-03-29T15:42:56,508 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.508Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/max”,“value”:286326784,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.1.name: s0 string [internal]”,“gcName”:[“serial”]}

2022-03-29T15:42:56,508 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.508Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/capacity”,“value”:286326784,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.1.name: s0 string [internal]”,“gcName”:[“serial”]}

2022-03-29T15:42:56,508 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.508Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/used”,“value”:0,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.1.name: s0 string [internal]”,“gcName”:[“serial”]}

2022-03-29T15:42:56,508 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.508Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/init”,“value”:0,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.1.name: s0 string [internal]”,“gcName”:[“serial”]}

2022-03-29T15:42:56,508 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.508Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/max”,“value”:286326784,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.2.name: s1 string [internal]”,“gcName”:[“serial”]}

2022-03-29T15:42:56,508 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.508Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/capacity”,“value”:286326784,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.2.name: s1 string [internal]”,“gcName”:[“serial”]}

2022-03-29T15:42:56,508 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.508Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/used”,“value”:0,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.2.name: s1 string [internal]”,“gcName”:[“serial”]}

2022-03-29T15:42:56,508 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.508Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/init”,“value”:0,“gcGen”:[“young”],“gcGenSpaceName”:“sun.gc.generation.0.space.2.name: s1 string [internal]”,“gcName”:[“serial”]}

2022-03-29T15:42:56,508 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.508Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/count”,“value”:0,“gcGen”:[“old”],“gcName”:[“MSC”]}

2022-03-29T15:42:56,509 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.508Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/cpu”,“value”:0,“gcGen”:[“old”],“gcName”:[“MSC”]}

2022-03-29T15:42:56,509 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.509Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/max”,“value”:5726666752,“gcGen”:[“old”],“gcGenSpaceName”:“sun.gc.generation.1.space.0.name: old string [internal]”,“gcName”:[“MSC”]}

2022-03-29T15:42:56,509 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.509Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/capacity”,“value”:5726666752,“gcGen”:[“old”],“gcGenSpaceName”:“sun.gc.generation.1.space.0.name: old string [internal]”,“gcName”:[“MSC”]}

2022-03-29T15:42:56,509 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.509Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/used”,“value”:31417032,“gcGen”:[“old”],“gcGenSpaceName”:“sun.gc.generation.1.space.0.name: old string [internal]”,“gcName”:[“MSC”]}

2022-03-29T15:42:56,509 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.509Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/gc/mem/init”,“value”:5726666752,“gcGen”:[“old”],“gcGenSpaceName”:“sun.gc.generation.1.space.0.name: old string [internal]”,“gcName”:[“MSC”]}

2022-03-29T15:42:56,509 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.509Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jvm/heapAlloc/bytes”,“value”:3128640}

2022-03-29T15:42:56,526 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.526Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“segment/scan/pending”,“value”:0}

2022-03-29T15:42:56,527 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {“feed”:“metrics”,“timestamp”:“2022-03-29T15:42:56.527Z”,“service”:“druid/historical”,“host”:“10.0.89.160:8083”,“version”:“0.22.1”,“metric”:“jetty/numOpenConnections”,“value”:0}

2022-03-29T15:42:58,274 INFO [main-SendThread(zk-cs.zk-druid.svc.cluster.local:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server zk-cs.zk-druid.svc.cluster.local/172.20.251.47:2181. Will not attempt to authenticate using SASL (unknown error)

2022-03-29T15:42:58,275 INFO [main-SendThread(zk-cs.zk-druid.svc.cluster.local:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /10.0.89.160:59658, server: zk-cs.zk-druid.svc.cluster.local/172.20.251.47:2181

2022-03-29T15:42:58,276 INFO [main-SendThread(zk-cs.zk-druid.svc.cluster.local:2181)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server zk-cs.zk-druid.svc.cluster.local/172.20.251.47:2181, sessionid = 0x37fd6175e270015, negotiated timeout = 30000

2022-03-29T15:42:58,276 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: RECONNECTED

2022-03-29T15:42:58,276 WARN [NodeRoleWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeRoleWatcher - Ignored event type[CONNECTION_RECONNECTED] for node watcher of role[coordinator].

2022-03-29T15:42:58,277 INFO [main-SendThread(zk-cs.zk-druid.svc.cluster.local:2181)] org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x37fd6175e270015, likely server has closed socket, closing socket connection and attempting reconnect

2022-03-29T15:42:58,277 ERROR [NodeRoleWatcher[COORDINATOR]] org.apache.curator.framework.recipes.cache.PathChildrenCache -

org.apache.zookeeper.KeeperException$UnimplementedException: KeeperErrorCode = Unimplemented for /druid/internal-discovery

at org.apache.zookeeper.KeeperException.create(KeeperException.java:106) ~[zookeeper-3.5.9.jar:3.5.9]

at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[zookeeper-3.5.9.jar:3.5.9]

at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1538) ~[zookeeper-3.5.9.jar:3.5.9]

at org.apache.curator.utils.ZKPaths.mkdirs(ZKPaths.java:351) ~[curator-client-4.3.0.jar:?]

at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:230) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:224) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:67) ~[curator-client-4.3.0.jar:?]

at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:81) ~[curator-client-4.3.0.jar:?]

at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:221) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:206) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:35) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.framework.imps.CuratorFrameworkImpl.createContainers(CuratorFrameworkImpl.java:265) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.framework.EnsureContainers.internalEnsure(EnsureContainers.java:69) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.framework.EnsureContainers.ensure(EnsureContainers.java:53) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.framework.recipes.cache.PathChildrenCache.ensurePath(PathChildrenCache.java:596) ~[curator-recipes-4.3.0.jar:4.3.0]

at org.apache.curator.framework.recipes.cache.PathChildrenCache.refresh(PathChildrenCache.java:492) [curator-recipes-4.3.0.jar:4.3.0]

at org.apache.curator.framework.recipes.cache.RefreshOperation.invoke(RefreshOperation.java:35) ~[curator-recipes-4.3.0.jar:4.3.0]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:808) [curator-recipes-4.3.0.jar:4.3.0]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_322]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_322]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_322]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_322]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]

at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]

2022-03-29T15:42:58,377 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED

2022-03-29T15:42:58,378 WARN [NodeRoleWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeRoleWatcher - Ignored event type[CONNECTION_SUSPENDED] for node watcher of role[coordinator].

Welcome @Sivakumar! How much memory and CPU do you have allocated for your pods?

Below are the configurations I used for the pods:

#broker
broker_replicas = 3
broker_requests_cpu = "512m"
broker_requests_memory = "8Gi"
broker_limits_cpu = "512m"
broker_limits_memory = "8Gi"
broker_port = "8082"

#coordinator
coordinator_replicas = 1
coordinator_requests_cpu = "256m"
coordinator_requests_memory = "2Gi"
coordinator_limits_cpu = "256m"
coordinator_limits_memory = "2Gi"
coordinator_port = "8081"

#historical
historical_replicas = 1
historical_requests_cpu = "512m"
historical_requests_memory = "8Gi"
historical_limits_cpu = "512m"
historical_limits_memory = "8Gi"
historical_port = "8083"

#middle manager
middlemanager_replicas = 1
middlemanager_requests_cpu = "512m"
middlemanager_requests_memory = "8Gi"
middlemanager_limits_cpu = "512m"
middlemanager_limits_memory = "8Gi"
middlemanager_port = "8084"

#overlord
overlord_replicas = 1
overlord_requests_cpu = "512m"
overlord_requests_memory = "2Gi"
overlord_limits_cpu = "512m"
overlord_limits_memory = "2Gi"
overlord_port = "8090"

#router
router_replicas = 1
router_requests_cpu = "128m"
router_requests_memory = "512Mi"
router_limits_cpu = "128m"
router_limits_memory = "512Mi"
router_port = "8888"

#zookeeper
zookeeper_namespace = "zk-druid"
zookeeper_requests_cpu = "512m"
zookeeper_requests_memory = "2Gi"
zookeeper_host = "zk-cs.zk-druid.svc.cluster.local"
zookeeper_replicas = 3

#postgres
create_postgres = true
postgres_namespace = "druid"
postgres_db = "druid"
postgres_host = "postgres-cs.druid.svc.cluster.local"
postgres_port = "5432"
postgres_user = "druid"
postgres_password = "druid"

Thanks for that. I’m still researching, but I want to share a discussion I came across in ASF Slack:

I am curious of folks wisdom here on how to deploy a stable Zookeeper in a dynamic cloud like EKS.

Nodes come and go, if you are unlucky, 2 ZK pods could be on 2 different bad nodes.

Should I deploy 5 instances? Would that be better?

For reliable ZooKeeper service, you should deploy ZooKeeper in a cluster known as an ensemble. As long as a majority of the ensemble are up, the service will be available. Because Zookeeper requires a majority, it is best to use an odd number of machines.
For example, with four machines ZooKeeper can only handle the failure of a single machine; if two machines fail, the remaining two machines do not constitute a majority. However, with five machines ZooKeeper can handle the failure of two machines.

So bumping the number of Zookeeper machines up to five might be worth a try.
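The majority rule quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names `quorum_size` and `tolerated_failures` are made up for illustration and are not part of ZooKeeper.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare the ensemble sizes mentioned in the thread.
for n in (3, 4, 5):
    print(f"{n} servers: quorum={quorum_size(n)}, can lose {tolerated_failures(n)}")
```

Three and four servers both survive only one failure, which is why an even count buys nothing; five survive two.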

I will try increasing the number of ZooKeeper pods and will let you know the results. Thanks, Mark.

Mark,

I increased the ZooKeeper pods to 5 and still get the same results. The MiddleManager, Overlord, and Router pods all fail with the same error. Please help me fix this.

2022-03-31T14:29:56,975 INFO [main] org.apache.druid.indexing.common.config.TaskConfig - Batch processing mode:[CLOSED_SEGMENTS]

2022-03-31T14:29:58,574 INFO [main] org.eclipse.jetty.util.log - Logging initialized @10902ms to org.eclipse.jetty.util.log.Slf4jLog

2022-03-31T14:29:58,669 INFO [main] org.apache.druid.server.initialization.jetty.JettyServerModule - Creating http connector with port [8084]

2022-03-31T14:29:58,969 WARN [main] org.eclipse.jetty.server.handler.gzip.GzipHandler - minGzipSize of 0 is inefficient for short content, break even is size 23

2022-03-31T14:29:58,982 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [INIT]

2022-03-31T14:29:58,982 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [NORMAL]

2022-03-31T14:29:58,982 INFO [main] org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting

2022-03-31T14:29:59,063 INFO [main] org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=zk-cs.zk-druid.svc.cluster.local sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@611b35d6

2022-03-31T14:29:59,070 INFO [main] org.apache.zookeeper.common.X509Util - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation

2022-03-31T14:29:59,074 INFO [main] org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 4194304 Bytes

2022-03-31T14:29:59,081 INFO [main] org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=

2022-03-31T14:29:59,089 INFO [main] org.apache.curator.framework.imps.CuratorFrameworkImpl - Default schema

2022-03-31T14:29:59,090 INFO [main] org.apache.druid.java.util.emitter.core.LoggingEmitter - Start: started [true]

2022-03-31T14:29:59,160 INFO [main] org.apache.druid.indexing.worker.WorkerCuratorCoordinator - WorkerCuratorCoordinator good to go sir. Server[10.0.174.224:8084]

2022-03-31T14:29:59,164 INFO [main-SendThread(zk-cs.zk-druid.svc.cluster.local:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server zk-cs.zk-druid.svc.cluster.local/172.20.250.204:2181. Will not attempt to authenticate using SASL (unknown error)

2022-03-31T14:29:59,169 INFO [main-SendThread(zk-cs.zk-druid.svc.cluster.local:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /10.0.174.224:39632, server: zk-cs.zk-druid.svc.cluster.local/172.20.250.204:2181

2022-03-31T14:29:59,177 INFO [main-SendThread(zk-cs.zk-druid.svc.cluster.local:2181)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server zk-cs.zk-druid.svc.cluster.local/172.20.250.204:2181, sessionid = 0x47fe060e6860003, negotiated timeout = 30000

2022-03-31T14:29:59,263 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED

2022-03-31T14:29:59,279 INFO [main-SendThread(zk-cs.zk-druid.svc.cluster.local:2181)] org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x47fe060e6860003, likely server has closed socket, closing socket connection and attempting reconnect

2022-03-31T14:29:59,365 ERROR [main] org.apache.druid.cli.CliMiddleManager - Error when starting up. Failing.

java.lang.reflect.InvocationTargetException: null

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_322]

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_322]

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_322]

at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_322]

at org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler.start(Lifecycle.java:446) ~[druid-core-0.22.1.jar:0.22.1]

at org.apache.druid.java.util.common.lifecycle.Lifecycle.start(Lifecycle.java:341) ~[druid-core-0.22.1.jar:0.22.1]

at org.apache.druid.guice.LifecycleModule$2.start(LifecycleModule.java:143) ~[druid-core-0.22.1.jar:0.22.1]

at org.apache.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:115) [druid-services-0.22.1.jar:0.22.1]

at org.apache.druid.cli.ServerRunnable.run(ServerRunnable.java:63) [druid-services-0.22.1.jar:0.22.1]

at org.apache.druid.cli.Main.main(Main.java:113) [druid-services-0.22.1.jar:0.22.1]

Caused by: org.apache.zookeeper.KeeperException$UnimplementedException: KeeperErrorCode = Unimplemented for /druid/indexer/tasks/10.0.174.224:8084

at org.apache.zookeeper.KeeperException.create(KeeperException.java:106) ~[zookeeper-3.5.9.jar:3.5.9]

at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[zookeeper-3.5.9.jar:3.5.9]

at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:1637) ~[zookeeper-3.5.9.jar:3.5.9]

at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1180) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.framework.imps.CreateBuilderImpl$17.call(CreateBuilderImpl.java:1156) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:67) ~[curator-client-4.3.0.jar:?]

at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:81) ~[curator-client-4.3.0.jar:?]

at org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:1153) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:607) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:597) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:51) ~[curator-framework-4.3.0.jar:4.3.0]

at org.apache.druid.curator.CuratorUtils.createIfNotExists(CuratorUtils.java:63) ~[druid-server-0.22.1.jar:0.22.1]

at org.apache.druid.indexing.worker.WorkerCuratorCoordinator.start(WorkerCuratorCoordinator.java:96) ~[druid-indexing-service-0.22.1.jar:0.22.1]

… 10 more
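Both stack traces end in KeeperException$UnimplementedException thrown from the zookeeper-3.5.9 client jar right after the server closes the socket. One thing worth ruling out (an assumption, since nothing in this thread states which ZooKeeper version the server pods run) is that the server is on an older release line than the 3.5.9 client. A minimal Python sketch for checking this with ZooKeeper's `srvr` four-letter command follows. The helper names are hypothetical, and it assumes `srvr` is allowed on the server and that the reply contains a "Zookeeper version:" line.

```python
import re
import socket


def fetch_srvr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'srvr' four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"srvr")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_zk_version(srvr_reply: str):
    """Return (major, minor, patch) from the version line, or None if absent."""
    match = re.search(r"Zookeeper version:\s*(\d+)\.(\d+)\.(\d+)", srvr_reply)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def server_line_older(server, client=(3, 5, 9)) -> bool:
    """True when the server's major.minor line is older than the client's."""
    return server[:2] < client[:2]


# Usage from a pod that can reach ZooKeeper (host name taken from the logs above):
#   reply = fetch_srvr("zk-cs.zk-druid.svc.cluster.local")
#   print(parse_zk_version(reply))
```

Because zk-cs is a Service in front of several pods, each call may land on a different server, so it is worth running the check a few times or against each pod directly.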

Thanks for trying that, and sorry to hear that you’re getting the same errors. I’m looking at your pod configurations and wondering if you might have too much memory for the MiddleManager? The docs generally recommend 128MiB.
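On the memory question, the only JVM figures posted in this thread are from the Historical pod, so here is a rough sanity check using those: the jvm/mem/init heap value (8589934592 bytes), the direct buffer pool usage (302629720 bytes), and the 8Gi limit from the configuration. This is just arithmetic on the posted numbers; the helper names are hypothetical, the parser handles only the Ki/Mi/Gi suffixes used above, and a negative result does not by itself prove the restarts are memory-related.

```python
def parse_k8s_memory(quantity: str) -> int:
    """Convert a Kubernetes memory quantity such as '8Gi' or '512Mi' to bytes."""
    units = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already plain bytes


def headroom_bytes(limit: str, heap_bytes: int, direct_bytes: int) -> int:
    """Bytes left under the pod limit after heap and direct buffers."""
    return parse_k8s_memory(limit) - heap_bytes - direct_bytes


# Historical pod: limit from the config, heap and direct values from the logs.
historical = headroom_bytes("8Gi", 8589934592, 302629720)
print(historical)  # -302629720
```

The heap alone is the same size as the pod limit here, so the direct buffers already put the JVM past it. The same check can be repeated for the MiddleManager once its heap and direct memory settings are known.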