All the processes in the cluster stopped suddenly

Dear friends in the Druid community,

I am deploying Druid in cluster mode on three nodes. I started the master server on node A, the data server on node B, and the broker server on node C.

Then I began ingesting streaming data into Druid. Everything seemed to be going well, and I could query the datasource I was ingesting.

But a few minutes later I encountered a problem: all the processes on the nodes stopped. I do not know how this could happen. Please help me. The Overlord log from node A follows.

2019-10-12T18:39:32,013 WARN [KafkaSupervisor-CrossDistrict] org.apache.druid.indexing.common.task.CompactionTask - keepSegmentGranularity is deprecated. Set a proper segmentGranularity instead

2019-10-12T18:39:32,022 WARN [KafkaSupervisor-FenceHistoryCross] org.apache.druid.indexing.common.task.CompactionTask - keepSegmentGranularity is deprecated. Set a proper segmentGranularity instead

2019-10-12T18:39:32,025 WARN [KafkaSupervisor-VirtualStation] org.apache.druid.indexing.common.task.CompactionTask - keepSegmentGranularity is deprecated. Set a proper segmentGranularity instead

2019-10-12T18:39:32,036 INFO [KafkaSupervisor-CrossDistrict] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [CrossDistrict] supervisor is running.

2019-10-12T18:39:32,036 INFO [KafkaSupervisor-CrossDistrict] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id=‘CrossDistrict’, generationTime=2019-10-12T10:39:32.036Z, payload=KafkaSupervisorReportPayload{dataSource=‘CrossDistrict’, topic=‘Cross_District’, partitions=3, replicas=2, durationSeconds=600, active=[{id=‘index_kafka_CrossDistrict_31f259431b24ba3_koghokai’, startTime=null, remainingSeconds=null}, {id=‘index_kafka_CrossDistrict_31f259431b24ba3_bebmfiod’, startTime=2019-10-12T10:32:27.839Z, remainingSeconds=175}], publishing=, suspended=false}}

2019-10-12T18:39:32,042 INFO [KafkaSupervisor-FenceHistoryCross] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [FenceHistoryCross] supervisor is running.

2019-10-12T18:39:32,042 INFO [KafkaSupervisor-FenceHistoryCross] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id=‘FenceHistoryCross’, generationTime=2019-10-12T10:39:32.042Z, payload=KafkaSupervisorReportPayload{dataSource=‘FenceHistoryCross’, topic=‘FENCE_HISTORY_CROSS’, partitions=3, replicas=2, durationSeconds=600, active=[{id=‘index_kafka_FenceHistoryCross_1baa2099ea05d2a_fmocbjbo’, startTime=2019-10-12T10:32:28.019Z, remainingSeconds=175}, {id=‘index_kafka_FenceHistoryCross_1baa2099ea05d2a_dabjnonh’, startTime=null, remainingSeconds=null}], publishing=, suspended=false}}

2019-10-12T18:39:32,049 INFO [KafkaSupervisor-VirtualStation] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [VirtualStation] supervisor is running.

2019-10-12T18:39:32,049 INFO [KafkaSupervisor-VirtualStation] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - {id=‘VirtualStation’, generationTime=2019-10-12T10:39:32.049Z, payload=KafkaSupervisorReportPayload{dataSource=‘VirtualStation’, topic=‘VIRTUAL_STATION’, partitions=3, replicas=2, durationSeconds=600, active=[{id=‘index_kafka_VirtualStation_960bb30c502a3f1_legcncch’, startTime=null, remainingSeconds=null}, {id=‘index_kafka_VirtualStation_960bb30c502a3f1_mdidmamo’, startTime=2019-10-12T10:33:58.286Z, remainingSeconds=266}], publishing=, suspended=false}}

2019-10-12T18:39:33,906 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] running shutdown hook

2019-10-12T18:39:33,906 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]

2019-10-12T18:39:33,908 INFO [Thread-61] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannouncing [DiscoveryDruidNode{druidNode=DruidNode{serviceName=‘druid/coordinator’, host=‘ecs-xsda-prod-mrs01’, bindOnHost=false, port=-1, plaintextPort=8081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType=‘OVERLORD’, services={}}].

2019-10-12T18:39:33,909 INFO [Thread-61] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/internal-discovery/OVERLORD/ecs-xsda-prod-mrs01:8081]

2019-10-12T18:39:33,921 INFO [Thread-61] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced [DiscoveryDruidNode{druidNode=DruidNode{serviceName=‘druid/coordinator’, host=‘ecs-xsda-prod-mrs01’, bindOnHost=false, port=-1, plaintextPort=8081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType=‘OVERLORD’, services={}}].

2019-10-12T18:39:33,921 INFO [Thread-61] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannouncing [DiscoveryDruidNode{druidNode=DruidNode{serviceName=‘druid/coordinator’, host=‘ecs-xsda-prod-mrs01’, bindOnHost=false, port=-1, plaintextPort=8081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType=‘COORDINATOR’, services={}}].

2019-10-12T18:39:33,921 INFO [Thread-61] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/internal-discovery/COORDINATOR/ecs-xsda-prod-mrs01:8081]

2019-10-12T18:39:33,921 INFO [NodeTypeWatcher[OVERLORD]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Node[ecs-xsda-prod-mrs01:8081:DiscoveryDruidNode{druidNode=DruidNode{serviceName=‘druid/coordinator’, host=‘ecs-xsda-prod-mrs01’, bindOnHost=false, port=-1, plaintextPort=8081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType=‘OVERLORD’, services={}}] disappeared.

2019-10-12T18:39:33,924 INFO [Thread-61] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced [DiscoveryDruidNode{druidNode=DruidNode{serviceName=‘druid/coordinator’, host=‘ecs-xsda-prod-mrs01’, bindOnHost=false, port=-1, plaintextPort=8081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType=‘COORDINATOR’, services={}}].

2019-10-12T18:39:33,924 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.announcement.Announcer.stop()] on object[org.apache.druid.curator.announcement.Announcer@7c421952].

2019-10-12T18:39:33,924 INFO [Thread-61] org.apache.druid.curator.announcement.Announcer - Stopping announcer

2019-10-12T18:39:33,925 INFO [NodeTypeWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Node[ecs-xsda-prod-mrs01:8081:DiscoveryDruidNode{druidNode=DruidNode{serviceName=‘druid/coordinator’, host=‘ecs-xsda-prod-mrs01’, bindOnHost=false, port=-1, plaintextPort=8081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType=‘COORDINATOR’, services={}}] disappeared.

2019-10-12T18:39:33,925 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]

2019-10-12T18:39:33,925 INFO [Thread-61] org.apache.druid.server.initialization.jetty.JettyServerModule - Stopping Jetty Server…

2019-10-12T18:39:33,932 INFO [Thread-61] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@6f289728{HTTP/1.1,[http/1.1]}{0.0.0.0:8081}

2019-10-12T18:39:33,932 INFO [Thread-61] org.eclipse.jetty.server.session - node0 Stopped scavenging

2019-10-12T18:39:33,934 INFO [Thread-61] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@1cc708a7{/,jar:file:/opt/app/apache-druid-0.15.1-incubating/lib/druid-console-0.15.1-incubating.jar!/org/apache/druid/console,UNAVAILABLE}

2019-10-12T18:39:33,937 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]

2019-10-12T18:39:33,938 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.discovery.DruidLeaderClient.stop()] on object[org.apache.druid.discovery.DruidLeaderClient@629aa21f].

2019-10-12T18:39:33,938 INFO [Thread-61] org.apache.druid.discovery.DruidLeaderClient - Stopped.

2019-10-12T18:39:33,938 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[org.apache.druid.curator.discovery.ServerDiscoverySelector@1bd3808b].

2019-10-12T18:39:33,940 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.overlord.TaskMaster.stop()] on object[org.apache.druid.indexing.overlord.TaskMaster@50f4b83d].

2019-10-12T18:39:33,940 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [task-master] stage [ANNOUNCEMENTS]

2019-10-12T18:39:33,940 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [task-master] stage [SERVER]

2019-10-12T18:39:33,940 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [task-master] stage [NORMAL]

2019-10-12T18:39:33,940 INFO [Thread-61] org.apache.druid.curator.discovery.CuratorServiceAnnouncer - Unannouncing service[DruidNode{serviceName=‘druid/overlord’, host=‘ecs-xsda-prod-mrs01’, bindOnHost=false, port=-1, plaintextPort=8081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}]

2019-10-12T18:39:33,944 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.overlord.helpers.OverlordHelperManager.stop()] on object[org.apache.druid.indexing.overlord.helpers.OverlordHelperManager@f5951e9].

2019-10-12T18:39:33,944 INFO [Thread-61] org.apache.druid.indexing.overlord.helpers.OverlordHelperManager - OverlordHelperManager is stopping.

2019-10-12T18:39:33,944 INFO [Thread-61] org.apache.druid.indexing.overlord.helpers.OverlordHelperManager - OverlordHelperManager is stopped.

2019-10-12T18:39:33,944 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.overlord.supervisor.SupervisorManager.stop()] on object[org.apache.druid.indexing.overlord.supervisor.SupervisorManager@29ccba45].

2019-10-12T18:39:33,944 INFO [Thread-61] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - Beginning shutdown of [KafkaSupervisor-CrossDistrict]

2019-10-12T18:39:33,944 INFO [Thread-61] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - Posting ShutdownNotice

2019-10-12T18:39:33,948 INFO [Thread-61] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - Shutdown notice handled

2019-10-12T18:39:33,948 INFO [Thread-61] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [KafkaSupervisor-CrossDistrict] has stopped

2019-10-12T18:39:33,948 INFO [Thread-61] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - Beginning shutdown of [KafkaSupervisor-FenceHistoryCross]

2019-10-12T18:39:33,948 INFO [Thread-61] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - Posting ShutdownNotice

2019-10-12T18:39:33,950 INFO [Thread-61] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - Shutdown notice handled

2019-10-12T18:39:33,950 INFO [Thread-61] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [KafkaSupervisor-FenceHistoryCross] has stopped

2019-10-12T18:39:33,950 INFO [Thread-61] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - Beginning shutdown of [KafkaSupervisor-VirtualStation]

2019-10-12T18:39:33,950 INFO [Thread-61] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - Posting ShutdownNotice

2019-10-12T18:39:33,953 INFO [Thread-61] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - Shutdown notice handled

2019-10-12T18:39:33,953 INFO [Thread-61] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [KafkaSupervisor-VirtualStation] has stopped

2019-10-12T18:39:33,953 INFO [Thread-61] org.apache.druid.indexing.overlord.supervisor.SupervisorManager - SupervisorManager stopped.

2019-10-12T18:39:33,953 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.overlord.TaskQueue.stop()] on object[org.apache.druid.indexing.overlord.TaskQueue@2bc63db7].

2019-10-12T18:39:33,953 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.overlord.RemoteTaskRunner.stop()] on object[org.apache.druid.indexing.overlord.RemoteTaskRunner@4c6c559e].

2019-10-12T18:39:33,953 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.TaskQueue - Interrupted, exiting!

2019-10-12T18:39:33,953 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [task-master] stage [INIT]

2019-10-12T18:39:33,955 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.indexing.overlord.MetadataTaskStorage.stop()] on object[org.apache.druid.indexing.overlord.MetadataTaskStorage@dc59ec2].

2019-10-12T18:39:33,955 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.server.coordinator.DruidCoordinator.stop()] on object[org.apache.druid.server.coordinator.DruidCoordinator@615b5480].

2019-10-12T18:39:33,956 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.discovery.DruidLeaderClient.stop()] on object[org.apache.druid.discovery.DruidLeaderClient@5618fc1f].

2019-10-12T18:39:33,956 INFO [Thread-61] org.apache.druid.discovery.DruidLeaderClient - Stopped.

2019-10-12T18:39:33,956 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[org.apache.druid.curator.discovery.ServerDiscoverySelector@113dcaf8].

2019-10-12T18:39:33,956 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider.stop()] on object[org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider@423f8a73].

2019-10-12T18:39:33,956 INFO [Thread-61] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - stopping

2019-10-12T18:39:33,956 INFO [Thread-61] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - stopped

2019-10-12T18:39:33,956 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.http.client.NettyHttpClient.stop()] on object[org.apache.druid.java.util.http.client.NettyHttpClient@1491cd6c].

2019-10-12T18:39:33,974 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.metadata.SQLMetadataRuleManager.stop()] on object[org.apache.druid.metadata.SQLMetadataRuleManager@57b9389f].

2019-10-12T18:39:33,975 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.metadata.SQLMetadataSegmentManager.stop()] on object[org.apache.druid.metadata.SQLMetadataSegmentManager@3751acd7].

2019-10-12T18:39:33,976 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.client.AbstractCuratorServerInventoryView.stop() throws java.io.IOException] on object[org.apache.druid.client.BatchServerInventoryView@6c8f4bc7].

2019-10-12T18:39:33,976 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.storage.hdfs.HdfsStorageAuthentication.stop()] on object[org.apache.druid.storage.hdfs.HdfsStorageAuthentication@687eb455].

2019-10-12T18:39:33,976 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.metrics.MonitorScheduler.stop()] on object[org.apache.druid.java.util.metrics.MonitorScheduler@e26a3df].

2019-10-12T18:39:33,976 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.java.util.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[ServiceEmitter{serviceDimensions={service=druid/coordinator, host=ecs-xsda-prod-mrs01:8081, version=0.15.1-incubating}, emitter=org.apache.druid.java.util.emitter.core.NoopEmitter@49fb0bbd}].

2019-10-12T18:39:33,976 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.common.config.ConfigManager.stop()] on object[org.apache.druid.common.config.ConfigManager@1471b98d].

2019-10-12T18:39:33,972 ERROR [main-EventThread] org.apache.curator.framework.imps.CuratorFrameworkImpl - Watcher exception

java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@14375ba3 rejected from java.util.concurrent.ThreadPoolExecutor@4f396d88[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 290]

at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) ~[?:1.8.0_222]

at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_222]

at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) ~[?:1.8.0_222]

at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) ~[?:1.8.0_222]

at org.apache.druid.server.coordinator.CuratorLoadQueuePeon.executeCallbacks(CuratorLoadQueuePeon.java:453) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.druid.server.coordinator.CuratorLoadQueuePeon.actionCompleted(CuratorLoadQueuePeon.java:314) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.druid.server.coordinator.CuratorLoadQueuePeon.entryRemoved(CuratorLoadQueuePeon.java:348) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.druid.server.coordinator.CuratorLoadQueuePeon.access$600(CuratorLoadQueuePeon.java:66) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.druid.server.coordinator.CuratorLoadQueuePeon$SegmentChangeProcessor.lambda$run$0(CuratorLoadQueuePeon.java:239) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:83) [curator-framework-4.1.0.jar:4.1.0]

at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531) [zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]

at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) [zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]

2019-10-12T18:39:33,978 ERROR [main-EventThread] org.apache.druid.curator.CuratorModule - Unhandled error in Curator Framework

java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@14375ba3 rejected from java.util.concurrent.ThreadPoolExecutor@4f396d88[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 290]

at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) ~[?:1.8.0_222]

at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_222]

at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) ~[?:1.8.0_222]

at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) ~[?:1.8.0_222]

at org.apache.druid.server.coordinator.CuratorLoadQueuePeon.executeCallbacks(CuratorLoadQueuePeon.java:453) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.druid.server.coordinator.CuratorLoadQueuePeon.actionCompleted(CuratorLoadQueuePeon.java:314) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.druid.server.coordinator.CuratorLoadQueuePeon.entryRemoved(CuratorLoadQueuePeon.java:348) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.druid.server.coordinator.CuratorLoadQueuePeon.access$600(CuratorLoadQueuePeon.java:66) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.druid.server.coordinator.CuratorLoadQueuePeon$SegmentChangeProcessor.lambda$run$0(CuratorLoadQueuePeon.java:239) ~[druid-server-0.15.1-incubating.jar:0.15.1-incubating]

at org.apache.curator.framework.imps.NamespaceWatcher.process(NamespaceWatcher.java:83) [curator-framework-4.1.0.jar:4.1.0]

at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:531) [zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]

at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506) [zookeeper-3.4.11.jar:3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0]

2019-10-12T18:39:33,978 INFO [main-EventThread] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] already stopped and stop was called. Silently skipping

2019-10-12T18:39:33,982 INFO [Thread-61] org.apache.druid.curator.CuratorModule - Stopping Curator

2019-10-12T18:39:33,982 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting

2019-10-12T18:39:33,986 INFO [Thread-61] org.apache.zookeeper.ZooKeeper - Session: 0x27000006ba4f32e1 closed

2019-10-12T18:39:33,986 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT]

2019-10-12T18:39:33,986 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x27000006ba4f32e1

2019-10-12T18:39:33,986 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.stop()] on object[org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@4ffced4e].

Hi Scoffi,

[1] Do you mean that all the Druid services were stopped, i.e.:

Overlord

Coordinator

Middle manager

Historical

Broker

?

[2] Or did you have an issue with your ingestion jobs only?

If [1] is the case, you may want to look into the Druid service logs to see what happened.

You can find the Druid service logs on the server nodes under:

<DRUID_HOME>/var/sv/

e.g.:

…/apache-druid-0.16.0-incubating/var/sv/
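A quick way to check every service log at once is to search for the shutdown-hook message that appears in the Overlord log posted above. The following is only a minimal Python sketch: the function name `find_shutdown_hooks` is made up for illustration, and it assumes the service logs in that directory end in `.log`.

```python
from pathlib import Path

# Marker text copied from the Overlord log quoted earlier in this thread.
SHUTDOWN_MARKER = "running shutdown hook"


def find_shutdown_hooks(log_dir):
    """Return (file name, line) pairs for every shutdown-hook line under log_dir.

    Assumption: the service logs are plain-text files ending in ".log".
    """
    hits = []
    for path in sorted(Path(log_dir).glob("*.log")):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with path.open(errors="replace") as handle:
            for line in handle:
                if SHUTDOWN_MARKER in line:
                    hits.append((path.name, line.rstrip("\n")))
    return hits
```

Calling `find_shutdown_hooks` with your own `var/sv` directory on each node shows which services logged a shutdown hook and at what time, which makes it easier to compare the nodes.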

Thanks,

Vaibhav

Hi Vaibhav,

Thank you for helping me. I mean that all the Druid services were stopped, including the Overlord, Coordinator, MiddleManager, Historical and Broker.

I am using version 0.15.0, and I have looked through all the logs, but I have not found a way to solve it.

Vaibhav Vaibhav vaibhav@imply.io wrote on Monday, October 14, 2019 at 11:17 AM:

So you mean you do not see anything in the server log? Could you please attach one log from the master server (Overlord log) and one from the data server (Historical log), along with the approximate time when your services went down?

Thanks,

Vaibhav

Hi Vaibhav,
I posted the Overlord log in the first message. I have looked at the Historical log, but I did not find any entries around the time when my services went down.

Vaibhav Vaibhav vaibhav@imply.io wrote on Monday, October 14, 2019 at 11:45 AM:

Hi Scoffi,

From the log you pasted, this looks like a graceful shutdown of the Overlord:

2019-10-12T18:39:33,906 INFO [Thread-61] org.apache.druid.java.util.common.lifecycle.Lifecycle - Lifecycle [module] running shutdown hook

2019-10-12T18:39:33,953 INFO [Thread-61] org.apache.druid.indexing.overlord.supervisor.SupervisorManager - SupervisorManager stopped.
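To tell a cause from a side effect, it can help to check whether any ERROR lines were logged before the shutdown hook started. In the log above, the two ERROR entries are stamped 18:39:33,972 and 18:39:33,978, after the hook began at 18:39:33,906, and the rejected executor already reports itself as Terminated, so they look like a consequence of the shutdown rather than its trigger. Below is a rough Python sketch of that check. The function names are made up for illustration, and it assumes each log entry starts with a timestamp and a level in the format shown above. It compares timestamps rather than line positions, because the entries in the pasted log are not strictly in time order.

```python
from datetime import datetime

# Timestamp layout seen in the pasted log, e.g. 2019-10-12T18:39:33,906
TS_FORMAT = "%Y-%m-%dT%H:%M:%S,%f"


def parse_line(line):
    """Return (timestamp, level) for a log entry, or None for stack-trace lines."""
    parts = line.split(" ", 2)
    if len(parts) < 3:
        return None
    try:
        return datetime.strptime(parts[0], TS_FORMAT), parts[1]
    except ValueError:
        return None


def errors_relative_to_shutdown(lines):
    """Split ERROR entries into those before and those at or after the first shutdown hook."""
    entries = [(parse_line(line), line) for line in lines]
    entries = [(meta, line) for meta, line in entries if meta is not None]
    hook_times = [meta[0] for meta, line in entries if "running shutdown hook" in line]
    if not hook_times:
        return [], []  # no shutdown hook found in these lines
    hook = min(hook_times)
    before = [line for meta, line in entries if meta[1] == "ERROR" and meta[0] < hook]
    after = [line for meta, line in entries if meta[1] == "ERROR" and meta[0] >= hook]
    return before, after
```

If the `before` list comes back empty for every service, the logs themselves probably do not contain the reason for the shutdown, and the trigger is more likely something outside Druid.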

Did someone stop or restart the service accidentally, by any chance?

Are you facing this issue often?

Could you attach the logs from the other services as well?

Thanks,

Vaibhav

Dear Vaibhav:

Thank you for helping me. You are so warmhearted that I am touched by your patience. No one stopped the service. I face this issue often.

Every time I start up all the services, they shut down a few minutes later. Could this be a ZooKeeper issue? I do not use the ZooKeeper on the master server; I use an external ZooKeeper cluster.

I started up all the servers a few hours ago. I am sorry, I am in Chengdu, China, so you may be asleep right now.

I have now attached the newest logs, including broker.log, coordinator.log, historical.log, router.log and middleManager.log.

I have also attached the configuration. I hope we can solve the problem. Thanks.

Vaibhav Vaibhav vaibhav@imply.io wrote on Monday, October 14, 2019 at 9:18 PM:

broker.log (105 KB)

coordinator-overlord.log (494 KB)

historical.log (120 KB)

router.log (103 KB)

middleManager.log (88.9 KB)

common.runtime.properties (4.72 KB)

Hi Vaibhav,

The picture above shows the properties of the Druid cluster.

I have attached the machine logs. Please help me, thank you.

The issue occurred between Oct 14 18:44:36 and Oct 14 18:48:07, so you can look for the corresponding entries in the messages and secure logs.

Vaibhav Vaibhav vaibhav@imply.io wrote on Monday, October 14, 2019 at 9:18 PM:
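To narrow the messages and secure logs down to that window, something like the following Python sketch may help. It assumes the usual syslog layout where every line starts with a `Mon DD HH:MM:SS` stamp and English month names. The function names are made up for illustration, and the year has to be supplied separately because this stamp format does not carry one.

```python
from datetime import datetime


def parse_syslog_time(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line; None if absent."""
    try:
        # The stamp occupies the first 15 characters of a classic syslog line.
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def lines_in_window(lines, start, end, year=2019):
    """Keep the syslog lines whose timestamp falls within [start, end], inclusive."""
    lo = parse_syslog_time(start, year)
    hi = parse_syslog_time(end, year)
    if lo is None or hi is None:
        raise ValueError("start and end must look like 'Oct 14 18:44:36'")
    kept = []
    for line in lines:
        stamp = parse_syslog_time(line, year)
        if stamp is not None and lo <= stamp <= hi:
            kept.append(line)
    return kept
```

For this thread the call would be `lines_in_window(lines, "Oct 14 18:44:36", "Oct 14 18:48:07")`, with `lines` read from the messages or secure file. Session and login events inside the window are worth a close look, since they can line up with the moment the services stopped.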

messages (23.7 KB)

secure (8.14 KB)

Starting the Druid services under Screen resolved this issue.

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!msg/druid-user/CFm9-vbp1fs/DkEWiUCdDQAJ
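The thread does not say why Screen helped. One plausible reading, offered here only as an assumption, is that the services were tied to the terminal session that launched them and were shut down when that session ended, which would fit the graceful shutdown hooks in the logs. For anyone who cannot use Screen, a different technique with a similar intent is to start the process in its own session. The Python sketch below uses `subprocess.Popen` with `start_new_session=True`, which is POSIX only. The function name `launch_detached` is made up, and the child here is just a placeholder that sleeps, not a real Druid start command.

```python
import os
import subprocess
import sys


def launch_detached(command):
    """Start command in its own session so it is not tied to the launching terminal."""
    return subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,  # placeholder; a real service would log to a file
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # runs setsid() in the child (POSIX only)
    )


# Placeholder child that just sleeps; substitute your own start command.
child = launch_detached([sys.executable, "-c", "import time; time.sleep(5)"])

# The child should report a different session ID than this script.
same_session = os.getsid(child.pid) == os.getsid(0)

# Clean up the placeholder child.
child.terminate()
child.wait()
```

This is not what was done in this thread, where Screen was the fix. It only illustrates the idea of separating a long-running service from the login session that started it.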

Thanks,

Vaibhav