[druid-user] Historical node not loading any segments

Hi everyone,

On druid 0.16, our production cluster is working fine, but on our test cluster I recently noticed an issue with batch index tasks changed data not showing up in queries, and the coordinator loadStatus was never 100% and the Druid UI showed 20 segments unavailable and 6 segments to drop.

I’ve tried restarting both the coordinator (only 1 coordinator in this cluster) and the historical node (only 1 historical node in cluster), which did not help. Then I blew away the historical node and brought up a new one, and now it is not loading any segments.

The UI is showing that the datasource has 1066 segments to load:

But the loadQueue is empty and the historical server isn’t loading any segments:

These numbers are all staying constant.

Here are coordinator logs:

2021-05-14T18:10:03,418 INFO [org.apache.druid.metadata.SQLMetadataSegmentManager-Exec–0] org.apache.druid.metadata.SQLMetadataSegmentManager - Polled and found 1,079 segments in the database
2021-05-14T18:10:06,909 INFO [KafkaSupervisor-campaignsummary] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [campaignsummary] supervisor is running.
2021-05-14T18:10:21,936 INFO [KafkaSupervisor-campaignsummary-Reporting-0] org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=kafka-supervisor-gbcmcgcn] Resetting offset for partition campaignsummary-0 to offset 1839.
2021-05-14T18:10:36,910 INFO [KafkaSupervisor-campaignsummary] org.apache.druid.indexing.seekablestream.supervisor.SeekableStreamSupervisor - [campaignsummary] supervisor is running.
2021-05-14T18:10:51,913 INFO [KafkaSupervisor-campaignsummary-Reporting-0] org.apache.kafka.clients.consumer.internals.Fetcher - [Consumer clientId=consumer-1, groupId=kafka-supervisor-gbcmcgcn] Resetting offset for partition campaignsummary-0 to offset 1839.
2021-05-14T18:11:01,278 INFO [DatabaseRuleManager-Exec–0] org.apache.druid.metadata.SQLMetadataRuleManager - Polled and found 2 rule(s) for 2 datasource(s)
2021-05-14T18:11:01,351 INFO [TaskQueue-StorageSync] org.apache.druid.indexing.overlord.TaskQueue - Synced 1 tasks from storage (0 tasks added, 0 tasks removed).
2021-05-14T18:11:03,239 INFO [LookupCoordinatorManager–0] org.apache.druid.server.lookup.cache.LookupCoordinatorManager - Not updating lookups because no data exists
2021-05-14T18:11:03,480 INFO [org.apache.druid.metadata.SQLMetadataSegmentManager-Exec–0] org.apache.druid.metadata.SQLMetadataSegmentManager - Polled and found 1,079 segments in the database

And on the historical server there are no logs at all after startup.

I also verified that both the historical and the coordinator have network access to zookeeper, using telnet to each of the 3 zookeeper nodes.

Can anyone help me figure out what is going on in this cluster?

Thank you,
Michael

1066 segments to load - there must be some connection to the Norman Invasion and the Battle of Hastings.
The historical must be King Harold…

No logs on historical server is odd. Does the druid user (that druid is running as) have permissions to write to the log directories,
and other necessary directories?

If so, then what is maxSegmentsToMove set to, and have you verified that the druid user can access the segments on deep storage?
But no logs makes me think there’s a permission issue on the historical.

Sorry, I should clarify, the Historical server is writing logs, for example on startup, and after the first couple queries when it creates buffers, but then nothing after that, so there are no issues writing logs:

2021-05-14T17:00:34,090 INFO [main] org.eclipse.jetty.util.log - Logging initialized @5565ms to org.eclipse.jetty.util.log.Slf4jLog
2021-05-14T17:00:34,103 INFO [main] org.apache.druid.server.initialization.jetty.JettyServerModule - Creating http connector with port [9081]
2021-05-14T17:00:34,220 INFO [main] org.apache.druid.guice.JsonConfigurator - Loaded class[class org.apache.druid.query.lookup.LookupListeningAnnouncerConfig] from props[druid.lookup.] as [ListeningAnnouncerConfig{listenersPath=’/druid_odyssey/listeners’}]
2021-05-14T17:00:34,225 INFO [main] org.apache.druid.guice.JsonConfigurator - Loaded class[class org.apache.druid.query.lookup.LookupConfig] from props[druid.lookup.] as [LookupConfig{snapshotWorkingDir=’’, enableLookupSyncOnStartup=true, numLookupLoadingThreads=1, coordinatorFetchRetries=3, lookupStartRetries=3, coordinatorRetryDelay=60000}]
2021-05-14T17:00:34,228 INFO [main] org.apache.druid.guice.JsonConfigurator - Loaded class[class org.apache.druid.guice.http.DruidHttpClientConfig] from props[druid.global.http.] as [org.apache.druid.guice.http.DruidHttpClientConfig@7015ebef]
2021-05-14T17:00:34,302 INFO [main] org.apache.druid.guice.JsonConfigurator - Loaded class[class org.apache.druid.client.coordinator.CoordinatorSelectorConfig] from props[druid.selectors.coordinator.] as [org.apache.druid.client.coordinator.CoordinatorSelectorConfig@73608eb0]
2021-05-14T17:00:34,322 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [INIT]
2021-05-14T17:00:34,322 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.start()] on object[org.apache.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@19ae2ee5].
2021-05-14T17:00:34,323 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [NORMAL]
2021-05-14T17:00:34,323 INFO [main] org.apache.druid.curator.CuratorModule - Starting Curator
2021-05-14T17:00:34,323 INFO [main] org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
2021-05-14T17:00:34,331 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
2021-05-14T17:00:34,331 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:host.name=ip-10-0-5-119.ec2.internal
2021-05-14T17:00:34,331 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.version=1.8.0_282
2021-05-14T17:00:34,331 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Red Hat, Inc.
2021-05-14T17:00:34,331 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.282.b08-1.amzn2.0.1.x86_64/jre
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.class.path=/usr/share/druid/conf/druid/cluster/_common:/usr/share/druid/conf/druid/cluster/data/historical:/usr/share/druid/lib/druid-services-0.16.0-incubating.jar:/usr/share/druid/lib/druid-core-0.16.0-incubating.jar:/usr/share/druid/lib/commons-io-2.6.jar:/usr/share/druid/lib/commons-lang-2.6.jar:/usr/share/druid/lib/commons-compress-1.18.jar:/usr/share/druid/lib/config-magic-0.9.jar:/usr/share/druid/lib/hibernate-validator-5.1.3.Final.jar:/usr/share/druid/lib/validation-api-1.1.0.Final.jar:/usr/share/druid/lib/jboss-logging-3.1.3.GA.jar:/usr/share/druid/lib/classmate-1.0.0.jar:/usr/share/druid/lib/javax.el-api-3.0.0.jar:/usr/share/druid/lib/javax.el-3.0.0.jar:/usr/share/druid/lib/jaxb-api-2.3.1.jar:/usr/share/druid/lib/javax.activation-api-1.2.0.jar:/usr/share/druid/lib/guava-16.0.1.jar:/usr/share/druid/lib/jackson-core-2.6.7.jar:/usr/share/druid/lib/jackson-annotations-2.6.7.jar:/usr/share/druid/lib/jackson-databind-2.6.7.jar:/usr/share/druid/lib/jackson-datatype-guava-2.6.7.jar:/usr/share/druid/lib/jackson-datatype-joda-2.6.7.jar:/usr/share/druid/lib/joda-time-2.10.2.jar:/usr/share/druid/lib/jackson-dataformat-smile-2.6.7.jar:/usr/share/druid/lib/guice-4.1.0.jar:/usr/share/druid/lib/javax.inject-1.jar:/usr/share/druid/lib/aopalliance-1.0.jar:/usr/share/druid/lib/guice-multibindings-4.1.0.jar:/usr/share/druid/lib/jdbi-2.63.1.jar:/usr/share/druid/lib/log4j-api-2.5.jar:/usr/share/druid/lib/log4j-core-2.5.jar:/usr/share/druid/lib/log4j-slf4j-impl-2.5.jar:/usr/share/druid/lib/slf4j-api-1.7.25.jar:/usr/share/druid/lib/log4j-jul-2.5.jar:/usr/share/druid/lib/log4j-1.2-api-2.5.jar:/usr/share/druid/lib/jcl-over-slf4j-1.7.12.jar:/usr/share/druid/lib/airline-0.7.jar:/usr/share/druid/lib/jackson-jq-0.0.10.jar:/usr/share/druid/lib/joni-2.1.27.jar:/usr/share/druid/lib/jcodings-1.0.43.jar:/usr/share/druid/lib/fastutil-8.2.3.jar:/usr/share/druid/lib/netty-buffer-4.1.30.Final.jar:/usr/share/druid/lib/netty-common-4.1.30.Final.jar:/usr/share/druid/lib/netty-codec-http-4.1.30.Final.jar:/usr/share/druid/lib/netty-codec-4.1.30.Final.jar:/usr/share/druid/lib/netty-transport-4.1.30.Final.jar:/usr/share/druid/lib/netty-resolver-4.1.30.Final.jar:/usr/share/druid/lib/opencsv-4.6.jar:/usr/share/druid/lib/commons-lang3-3.8.1.jar:/usr/share/druid/lib/commons-text-1.3.jar:/usr/share/druid/lib/commons-beanutils-1.9.3.jar:/usr/share/druid/lib/commons-logging-1.1.1.jar:/usr/share/druid/lib/commons-collections-3.2.2.jar:/usr/share/druid/lib/commons-collections4-4.2.jar:/usr/share/druid/lib/rhino-1.7.11.jar:/usr/share/druid/lib/xz-1.8.jar:/usr/share/druid/lib/zstd-jni-1.3.3-1.jar:/usr/share/druid/lib/json-path-2.3.0.jar:/usr/share/druid/lib/json-smart-2.3.jar:/usr/share/druid/lib/accessors-smart-1.2.jar:/usr/share/druid/lib/asm-7.1.jar:/usr/share/druid/lib/antlr4-runtime-4.5.1.jar:/usr/share/druid/lib/disruptor-3.3.6.jar:/usr/share/druid/lib/jsr305-2.0.1.jar:/usr/share/druid/lib/jna-4.5.1.jar:/usr/share/druid/lib/async-http-client-2.5.3.jar:/usr/share/druid/lib/async-http-client-netty-utils-2.5.3.jar:/usr/share/druid/lib/javax.activation-1.2.0.jar:/usr/share/druid/lib/netty-handler-4.1.30.Final.jar:/usr/share/druid/lib/netty-codec-socks-4.1.30.Final.jar:/usr/share/druid/lib/netty-handler-proxy-4.1.30.Final.jar:/usr/share/druid/lib/netty-transport-native-epoll-4.1.30.Final-linux-x86_64.jar:/usr/share/druid/lib/netty-transport-native-unix-common-4.1.30.Final.jar:/usr/share/druid/lib/netty-resolver-dns-4.1.30.Final.jar:/usr/share/druid/lib/netty-codec-dns-4.1.30.Final.jar:/usr/share/druid/lib/reactive-streams-1.0.2.jar:/usr/share/druid/lib/netty-reactive-streams-2.0.0.jar:/usr/share/druid/lib/sigar-1.6.5.132.jar:/usr/share/druid/lib/jvm-attach-api-1.5.jar:/usr/share/druid/lib/netty-3.10.6.Final.jar:/usr/share/druid/lib/error_prone_annotations-2.3.2.jar:/usr/share/druid/lib/druid-processing-0.16.0-incubating.jar:/usr/share/druid/lib/druid-hll-0.16.0-incubating.jar:/usr/share/druid/lib/extendedset-0.16.0-incubating.jar:/usr/share/druid/lib/RoaringBitmap-0.8.6.jar:/usr/share/druid/lib/shims-0.8.6.jar:/usr/share/druid/lib/compress-lzf-1.0.4.jar:/usr/share/druid/lib/commons-net-3.6.jar:/usr/share/druid/lib/icu4j-54.1.1.jar:/usr/share/druid/lib/lz4-java-1.6.0.jar:/usr/share/druid/lib/asm-commons-7.1.jar:/usr/share/druid/lib/asm-tree-7.1.jar:/usr/share/druid/lib/asm-analysis-7.1.jar:/usr/share/druid/lib/checker-qual-2.5.7.jar:/usr/share/druid/lib/maven-artifact-3.6.0.jar:/usr/share/druid/lib/plexus-utils-3.0.15.jar:/usr/share/druid/lib/druid-server-0.16.0-incubating.jar:/usr/share/druid/lib/druid-aws-common-0.16.0-incubating.jar:/usr/share/druid/lib/aws-java-sdk-ec2-1.11.199.jar:/usr/share/druid/lib/aws-java-sdk-core-1.11.199.jar:/usr/share/druid/lib/httpclient-4.5.3.jar:/usr/share/druid/lib/httpcore-4.4.11.jar:/usr/share/druid/lib/commons-codec-1.12.jar:/usr/share/druid/lib/ion-java-1.0.2.jar:/usr/share/druid/lib/jackson-dataformat-cbor-2.6.7.jar:/usr/share/druid/lib/jmespath-java-1.11.199.jar:/usr/share/druid/lib/aws-java-sdk-s3-1.11.199.jar:/usr/share/druid/lib/aws-java-sdk-kms-1.11.199.jar:/usr/share/druid/lib/druid-gcp-common-0.16.0-incubating.jar:/usr/share/druid/lib/google-api-client-1.22.0.jar:/usr/share/druid/lib/google-oauth-client-1.22.0.jar:/usr/share/druid/lib/google-http-client-1.22.0.jar:/usr/share/druid/lib/google-http-client-jackson2-1.22.0.jar:/usr/share/druid/lib/jackson-module-guice-2.6.7.jar:/usr/share/druid/lib/druid-console-0.16.0-incubating.jar:/usr/share/druid/lib/zookeeper-3.4.14.jar:/usr/share/druid/lib/jline-0.9.94.jar:/usr/share/druid/lib/audience-annotations-0.5.0.jar:/usr/share/druid/lib/curator-framework-4.1.0.jar:/usr/share/druid/lib/curator-client-4.1.0.jar:/usr/share/druid/lib/curator-x-discovery-4.1.0.jar:/usr/share/druid/lib/curator-recipes-4.1.0.jar:/usr/share/druid/lib/jackson-mapper-asl-1.9.13.jar:/usr/share/druid/lib/jackson-core-asl-1.9.13.jar:/usr/share/druid/lib/jackson-jaxrs-json-provider-2.6.7.jar:/usr/share/druid/lib/jackson-jaxrs-base-2.6.7.jar:/usr/share/druid/lib/jackson-module-jaxb-annotations-2.6.7.jar:/usr/share/druid/lib/jackson-jaxrs-smile-provider-2.6.7.jar:/usr/share/druid/lib/jersey-server-1.19.3.jar:/usr/share/druid/lib/jersey-core-1.19.3.jar:/usr/share/druid/lib/jsr311-api-1.1.1.jar:/usr/share/druid/lib/guice-servlet-4.1.0.jar:/usr/share/druid/lib/jersey-guice-1.19.3.jar:/usr/share/druid/lib/jersey-servlet-1.19.3.jar:/usr/share/druid/lib/jetty-server-9.4.10.v20180503.jar:/usr/share/druid/lib/javax.servlet-api-3.1.0.jar:/usr/share/druid/lib/jetty-http-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-util-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-io-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-proxy-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-client-9.4.10.v20180503.jar:/usr/share/druid/lib/tesla-aether-0.0.5.jar:/usr/share/druid/lib/aether-api-0.9.0.M2.jar:/usr/share/druid/lib/aether-spi-0.9.0.M2.jar:/usr/share/druid/lib/aether-util-0.9.0.M2.jar:/usr/share/druid/lib/aether-impl-0.9.0.M2.jar:/usr/share/druid/lib/aether-connector-file-0.9.0.M2.jar:/usr/share/druid/lib/aether-connector-okhttp-0.0.9.jar:/usr/share/druid/lib/okhttp-1.0.2.jar:/usr/share/druid/lib/wagon-provider-api-2.4.jar:/usr/share/druid/lib/maven-aether-provider-3.1.1.jar:/usr/share/druid/lib/maven-model-3.1.1.jar:/usr/share/druid/lib/maven-model-builder-3.1.1.jar:/usr/share/druid/lib/plexus-interpolation-1.19.jar:/usr/share/druid/lib/maven-repository-metadata-3.1.1.jar:/usr/share/druid/lib/maven-settings-builder-3.1.1.jar:/usr/share/druid/lib/maven-settings-3.1.1.jar:/usr/share/druid/lib/spymemcached-2.12.3.jar:/usr/share/druid/lib/jetty-servlet-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-security-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-servlets-9.4.10.v20180503.jar:/usr/share/druid/lib/jetty-continuation-9.4.10.v20180503.jar:/usr/share/druid/lib/derby-10.14.2.0.jar:/usr/share/druid/lib/derbynet-10.14.2.0.jar:/usr/share/druid/lib/derbyclient-10.14.2.0.jar:/usr/share/druid/lib/commons-math3-3.6.1.jar:/usr/share/druid/lib/caffeine-2.5.5.jar:/usr/share/druid/lib/commons-dbcp2-2.0.1.jar:/usr/share/druid/lib/commons-pool2-2.2.jar:/usr/share/druid/lib/druid-indexing-hadoop-0.16.0-incubating.jar:/usr/share/druid/lib/druid-indexing-service-0.16.0-incubating.jar:/usr/share/druid/lib/metrics-core-4.0.0.jar:/usr/share/druid/lib/druid-sql-0.16.0-incubating.jar:/usr/share/druid/lib/calcite-core-1.17.0.jar:/usr/share/druid/lib/avatica-core-1.12.0.jar:/usr/share/druid/lib/avatica-metrics-1.12.0.jar:/usr/share/druid/lib/protobuf-java-3.1.0.jar:/usr/share/druid/lib/calcite-linq4j-1.17.0.jar:/usr/share/druid/lib/esri-geometry-api-2.0.0.jar:/usr/share/druid/lib/aggdesigner-algorithm-6.0.jar:/usr/share/druid/lib/janino-2.7.6.jar:/usr/share/druid/lib/commons-compiler-2.7.6.jar:/usr/share/druid/lib/avatica-server-1.12.0.jar:/usr/share/druid/lib/jaxb-runtime-2.3.1.jar:/usr/share/druid/lib/txw2-2.3.1.jar:/usr/share/druid/lib/istack-commons-runtime-3.0.7.jar:/usr/share/druid/lib/stax-ex-1.8.jar:/usr/share/druid/lib/FastInfoset-1.2.15.jar
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:os.version=4.14.231-173.361.amzn2.x86_64
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:user.name=root
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:user.home=/root
2021-05-14T17:00:34,332 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/
2021-05-14T17:00:34,333 INFO [main] org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=10.0.1.85:2181,10.0.2.224:2181,10.0.5.75:2181 sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@d5e575c
2021-05-14T17:00:34,350 INFO [main] org.apache.curator.framework.imps.CuratorFrameworkImpl - Default schema
2021-05-14T17:00:34,350 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.java.util.emitter.core.HttpPostEmitter.start()] on object[HttpPostEmitter{config=HttpEmitterConfig{flushMillis=60000, flushCount=1000, flushTimeOut=9223372036854775807, basicAuthentication=‘null’, batchingStrategy=ARRAY, maxBatchSize=10383360, contentEncoding=null, batchQueueSizeLimit=100, httpTimeoutAllowanceFactor=8.0, minHttpTimeoutMillis=150, recipientBaseUrl=‘http://localhost:8000/’}}].
2021-05-14T17:00:34,352 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.java.util.emitter.service.ServiceEmitter.start()] on object[ServiceEmitter{serviceDimensions={service=druid/historical, host=ip-10-0-5-119.ec2.internal:9081, version=0.16.0-incubating}, emitter=HttpPostEmitter{config=HttpEmitterConfig{flushMillis=60000, flushCount=1000, flushTimeOut=9223372036854775807, basicAuthentication=‘null’, batchingStrategy=ARRAY, maxBatchSize=10383360, contentEncoding=null, batchQueueSizeLimit=100, httpTimeoutAllowanceFactor=8.0, minHttpTimeoutMillis=150, recipientBaseUrl=‘http://localhost:8000/’}}}].
2021-05-14T17:00:34,353 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.server.coordination.SegmentLoadDropHandler.start() throws java.io.IOException] on object[org.apache.druid.server.coordination.SegmentLoadDropHandler@12fcb2c3].
2021-05-14T17:00:34,353 INFO [main] org.apache.druid.server.coordination.SegmentLoadDropHandler - Starting…
2021-05-14T17:00:34,361 INFO [main] org.apache.druid.server.coordination.SegmentLoadDropHandler$BackgroundSegmentAnnouncer - Completed background segment announcing
2021-05-14T17:00:34,362 INFO [main] org.apache.druid.server.coordination.SegmentLoadDropHandler - Cache load took 9 ms
2021-05-14T17:00:34,362 INFO [main] org.apache.druid.server.coordination.CuratorDataSegmentServerAnnouncer - Announcing self[DruidServerMetadata{name=‘ip-10-0-5-119.ec2.internal:9081’, hostAndPort=‘ip-10-0-5-119.ec2.internal:9081’, hostAndTlsPort=‘null’, maxSize=60000000000, tier=’_default_tier’, type=historical, priority=0}] at [/druid_odyssey/announcements/ip-10-0-5-119.ec2.internal:9081]
2021-05-14T17:00:34,372 INFO [main-SendThread(ip-10-0-1-85.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server ip-10-0-1-85.ec2.internal/10.0.1.85:2181. Will not attempt to authenticate using SASL (unknown error)
2021-05-14T17:00:34,379 INFO [main-SendThread(ip-10-0-1-85.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to ip-10-0-1-85.ec2.internal/10.0.1.85:2181, initiating session
2021-05-14T17:00:34,383 INFO [main] org.apache.druid.server.coordination.SegmentLoadDropHandler - Started.
2021-05-14T17:00:34,383 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.java.util.metrics.MonitorScheduler.start()] on object[org.apache.druid.java.util.metrics.MonitorScheduler@5f395ce1].
2021-05-14T17:00:34,385 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.server.coordination.ZkCoordinator.start() throws java.io.IOException] on object[org.apache.druid.server.coordination.ZkCoordinator@78c262ba].
2021-05-14T17:00:34,385 INFO [main] org.apache.druid.server.coordination.ZkCoordinator - Starting zkCoordinator for server[ip-10-0-5-119.ec2.internal:9081]
2021-05-14T17:00:34,386 INFO [main-SendThread(ip-10-0-1-85.ec2.internal:2181)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server ip-10-0-1-85.ec2.internal/10.0.1.85:2181, sessionid = 0x1001943531b033e, negotiated timeout = 30000
2021-05-14T17:00:34,396 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
2021-05-14T17:00:34,422 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.java.util.http.client.NettyHttpClient.start()] on object[org.apache.druid.java.util.http.client.NettyHttpClient@8840c98].
2021-05-14T17:00:34,423 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider.start()] on object[org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider@273fa9e].
2021-05-14T17:00:34,423 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - starting
2021-05-14T17:00:34,423 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - started
2021-05-14T17:00:34,423 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.curator.discovery.ServerDiscoverySelector.start() throws java.lang.Exception] on object[org.apache.druid.curator.discovery.ServerDiscoverySelector@7ab2a07e].
2021-05-14T17:00:34,506 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.discovery.DruidLeaderClient.start()] on object[org.apache.druid.discovery.DruidLeaderClient@30f28b5].
2021-05-14T17:00:34,507 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - Creating NodeTypeWatcher for nodeType [COORDINATOR].
2021-05-14T17:00:34,511 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider - Created NodeTypeWatcher for nodeType [COORDINATOR].
2021-05-14T17:00:34,511 INFO [main] org.apache.druid.discovery.DruidLeaderClient - Started.
2021-05-14T17:00:34,511 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.query.lookup.LookupReferencesManager.start()] on object[org.apache.druid.query.lookup.LookupReferencesManager@2db86a7c].
2021-05-14T17:00:34,511 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - LookupExtractorFactoryContainerProvider is starting.
2021-05-14T17:00:34,579 INFO [NodeTypeWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Node[ip-10-0-2-142.ec2.internal:9082:DiscoveryDruidNode{druidNode=DruidNode{serviceName=‘druid/coordinator’, host=‘ip-10-0-2-142.ec2.internal’, bindOnHost=false, port=-1, plaintextPort=9082, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType=‘COORDINATOR’, services={}}] appeared.
2021-05-14T17:00:34,579 INFO [NodeTypeWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Received INITIALIZED in node watcher.
2021-05-14T17:00:34,611 WARN [main] org.apache.druid.query.lookup.LookupReferencesManager - No lookups found for tier [__default], response [org.apache.druid.java.util.http.client.response.StringFullResponseHolder@6075b369]
2021-05-14T17:00:34,612 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - Coordinator is unavailable. Loading saved snapshot instead
2021-05-14T17:00:34,612 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - No lookups to be loaded at this point
2021-05-14T17:00:34,612 INFO [main] org.apache.druid.query.lookup.LookupReferencesManager - LookupExtractorFactoryContainerProvider is started.
2021-05-14T17:00:34,612 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer.start()] on object[org.apache.druid.query.lookup.LookupResourceListenerAnnouncer@6b70d1fb].
2021-05-14T17:00:34,615 INFO [main] org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer - Announcing start time on [/druid_odyssey/listeners/lookups/__default/http:ip-10-0-5-119.ec2.internal:9081]
2021-05-14T17:00:34,615 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [SERVER]
2021-05-14T17:00:34,615 INFO [main] org.apache.druid.server.initialization.jetty.JettyServerModule - Starting Jetty Server…
2021-05-14T17:00:34,618 INFO [main] org.eclipse.jetty.server.Server - jetty-9.4.10.v20180503; built: 2018-05-03T15:56:21.710Z; git: daa59876e6f384329b122929e70a80934569428c; jvm 1.8.0_282-b08
2021-05-14T17:00:34,651 INFO [main] org.eclipse.jetty.server.session - DefaultSessionIdManager workerName=node0
2021-05-14T17:00:34,651 INFO [main] org.eclipse.jetty.server.session - No SessionScavenger set, using defaults
2021-05-14T17:00:34,654 INFO [main] org.eclipse.jetty.server.session - node0 Scavenging every 600000ms
2021-05-14T17:00:34,742 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider as a provider class
2021-05-14T17:00:34,743 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering com.fasterxml.jackson.jaxrs.smile.JacksonSmileProvider as a provider class
2021-05-14T17:00:34,743 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering org.apache.druid.server.initialization.jetty.CustomExceptionMapper as a provider class
2021-05-14T17:00:34,743 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering org.apache.druid.server.initialization.jetty.ForbiddenExceptionMapper as a provider class
2021-05-14T17:00:34,743 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering org.apache.druid.server.initialization.jetty.BadRequestExceptionMapper as a provider class
2021-05-14T17:00:34,744 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering org.apache.druid.server.StatusResource as a root resource class
2021-05-14T17:00:34,787 INFO [main] com.sun.jersey.server.impl.application.WebApplicationImpl - Initiating Jersey application, version ‘Jersey: 1.19.3 10/24/2016 03:43 PM’
2021-05-14T17:00:34,868 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.initialization.jetty.CustomExceptionMapper to GuiceManagedComponentProvider with the scope “Singleton”
2021-05-14T17:00:34,869 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.initialization.jetty.ForbiddenExceptionMapper to GuiceManagedComponentProvider with the scope “Singleton”
2021-05-14T17:00:34,870 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.initialization.jetty.BadRequestExceptionMapper to GuiceManagedComponentProvider with the scope “Singleton”
2021-05-14T17:00:34,870 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider to GuiceManagedComponentProvider with the scope “Singleton”
2021-05-14T17:00:34,877 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.smile.JacksonSmileProvider to GuiceManagedComponentProvider with the scope “Singleton”
2021-05-14T17:00:35,238 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.QueryResource to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,244 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.http.security.StateResourceFilter to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,248 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.http.HistoricalResource to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,253 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.http.SegmentListerResource to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,255 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.http.security.ConfigResourceFilter to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,259 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.query.lookup.LookupListeningResource to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,261 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.query.lookup.LookupIntrospectionResource to GuiceInstantiatedComponentProvider
2021-05-14T17:00:35,263 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.StatusResource to GuiceManagedComponentProvider with the scope “Undefined”
2021-05-14T17:00:35,290 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@30d5e37c{/,null,AVAILABLE}
2021-05-14T17:00:35,300 INFO [main] org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@26994c6a{HTTP/1.1,[http/1.1]}{0.0.0.0:9081}
2021-05-14T17:00:35,300 INFO [main] org.eclipse.jetty.server.Server - Started @6776ms
2021-05-14T17:00:35,300 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [ANNOUNCEMENTS]
2021-05-14T17:00:35,300 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking start method[public void org.apache.druid.curator.announcement.Announcer.start()] on object[org.apache.druid.curator.announcement.Announcer@3cd46491].
2021-05-14T17:00:35,300 INFO [main] org.apache.druid.curator.announcement.Announcer - Starting announcer
2021-05-14T17:00:35,336 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Announcing [DiscoveryDruidNode{druidNode=DruidNode{serviceName=‘druid/historical’, host=‘ip-10-0-5-119.ec2.internal’, bindOnHost=false, port=-1, plaintextPort=9081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType=‘HISTORICAL’, services={dataNodeService=DataNodeService{tier=’_default_tier’, maxSize=60000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier=’__default’}}}].
2021-05-14T17:00:35,354 INFO [main] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Announced [DiscoveryDruidNode{druidNode=DruidNode{serviceName=‘druid/historical’, host=‘ip-10-0-5-119.ec2.internal’, bindOnHost=false, port=-1, plaintextPort=9081, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType=‘HISTORICAL’, services={dataNodeService=DataNodeService{tier=’_default_tier’, maxSize=60000000000, type=historical, priority=0}, lookupNodeService=LookupNodeService{lookupTier=’__default’}}}].
2021-05-14T17:00:35,355 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Successfully started lifecycle [module]
2021-05-14T17:56:55,862 INFO [NodeTypeWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Node[ip-10-0-2-142.ec2.internal:9082:DiscoveryDruidNode{druidNode=DruidNode{serviceName=‘druid/coordinator’, host=‘ip-10-0-2-142.ec2.internal’, bindOnHost=false, port=-1, plaintextPort=9082, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType=‘COORDINATOR’, services={}}] disappeared.
2021-05-14T17:57:02,546 INFO [NodeTypeWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeTypeWatcher - Node[ip-10-0-2-142.ec2.internal:9082:DiscoveryDruidNode{druidNode=DruidNode{serviceName=‘druid/coordinator’, host=‘ip-10-0-2-142.ec2.internal’, bindOnHost=false, port=-1, plaintextPort=9082, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType=‘COORDINATOR’, services={}}] appeared.
[mlubavin@ip-10-0-5-119 ~]$

the coordinator appearing and disappearing at the end was from me restarting coordinator.

So… back to my original question - is it coordinator that’s not doing its job telling historical node to load segments, or historical not doing its job listening, or something else?

I’m not sure - but you might check that it can access the deep storage (although I’d expect an error if not), and maxSegmentsToMove, and other dynamic coordinator properties.

Hi Michael,
Can you share a longer window (30 mins) of logs for both coordinator and historical?
The coordinator is responsible for assigning the segments to historical and in version 0.16 that you are using, zookeeper is used for mediation.

OK, I think I have some more info and am getting closer. I found this error during coordinator startup:

2021-05-19T14:04:04,164 ERROR [LeaderSelector[/druid_odyssey/coordinator/_COORDINATOR]] org.apache.curator.framework.listen.ListenerContainer - Listener (org.apache.druid.curator.discovery.CuratorDruidLeaderSelector$1@c14e5a5) threw an exception
java.lang.ClassFormatError: Illegal field name “org.apache.druid.server.coordinator.DruidCoordinator$this” in class org/apache/druid/server/coordinator/DruidCoordinator$CoordinatorHistoricalManagerRunnable
at java.lang.ClassLoader.defineClass1(Native Method) ~[?:1.8.0_282]
at java.lang.ClassLoader.defineClass(ClassLoader.java:756) ~[?:1.8.0_282]
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) ~[?:1.8.0_282]
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) ~[?:1.8.0_282]
at java.net.URLClassLoader.access$100(URLClassLoader.java:74) ~[?:1.8.0_282]
at java.net.URLClassLoader$1.run(URLClassLoader.java:369) ~[?:1.8.0_282]
at java.net.URLClassLoader$1.run(URLClassLoader.java:363) ~[?:1.8.0_282]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_282]
at java.net.URLClassLoader.findClass(URLClassLoader.java:362) ~[?:1.8.0_282]

And my behavior (0 loaded segments) is the same as in [druid-user] Unable to run ingestion with specific java version
And I also found Historicals are not loading new segments – Imply Help Center
And I do have java version 1.8.0_282
Is there no patch for this, only that we have to downgrade Java?

I checked and my working cluster is on java 1.8.0_272

Confirming that downgrading java to 1.8.0_272 on the same coordinator machine that wasn’t working, is now working

Hi Nishant,

Thanks for your reply. Are you saying this is avoided in a different version of Druid?

Thanks,
Michael