Compaction not working correctly – How do I troubleshoot?

Relates to Apache Druid 0.21.1

Hi everyone, I have a question about why my compaction is going slowly.

My Setup

I’m running a medium-sized Druid 0.21.1 cluster with 1 master, 2 data servers, and 1 query server, using the example configs from the Druid Clustered Deployment article. Server sizes are mostly as recommended by that article, although my master server is slightly smaller, with only 16 GB of RAM.

I have a datasource which contains about 3.8 GB of data in around 67,000 segments, or 32 million rows. I receive around 20,000 to 50,000 new rows each day.

What’s the problem?

Over the last few weeks, with more and more data coming in, query performance has been suffering. My assumption is that this is because my data usually doesn’t arrive in order, so segments seem pretty fragmented. My average segment size is 475 rows.

To combat this, I have enabled auto compaction (configuration at the bottom of this post). However, compaction seems to be advancing very slowly. I’ve had auto compaction turned on for the last week or so, and according to the datasources page, only 0.03% of all segments have been compacted.

Looking at the tasks, a compact task seems to run about once an hour and takes about 15 to 20 seconds. About a fifth of the compaction tasks are marked as “failed” with no error message.

Here are my questions:

  • Is this slow speed normal? Are there any configuration changes I can make to speed up the compaction process?
  • Are the failed tasks cause for concern? Perusing the Historical logs, I haven’t found any error messages regarding compaction. (Am I looking at the correct log file?)

Thanks a lot in advance for your questions and comments,
Daniel

Compaction Configuration

{
  "dataSource": "datasource-name",
  "taskPriority": 25,
  "inputSegmentSizeBytes": 419430400,
  "maxRowsPerSegment": null,
  "skipOffsetFromLatest": "P3D",
  "tuningConfig": {
    "maxRowsInMemory": null,
    "maxBytesInMemory": null,
    "maxTotalRows": null,
    "splitHintSpec": null,
    "partitionsSpec": {
      "type": "hashed",
      "numShards": null,
      "partitionDimensions": [],
      "partitionFunction": "murmur3_32_abs",
      "maxRowsPerSegment": 5000000
    },
    "indexSpec": null,
    "indexSpecForIntermediatePersists": null,
    "maxPendingPersists": null,
    "pushTimeout": null,
    "segmentWriteOutMediumFactory": null,
    "maxNumConcurrentSubTasks": 4,
    "maxRetry": null,
    "taskStatusCheckPeriodMs": null,
    "chatHandlerTimeout": null,
    "chatHandlerNumRetries": null,
    "maxNumSegmentsToMerge": null,
    "totalNumMergeTasks": null,
    "forceGuaranteedRollup": true,
    "type": "index_parallel"
  },
  "taskContext": null
}

That does seem very slow. I would check the Coordinator logs for errors; I think the Coordinator handles compaction.

Good point! Here are excerpts from my coordinator-overlord.log:

I have lots of these, these seem to span the whole of my data until about a month ago:

2021-09-09T17:11:38,145 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.NewestSegmentFirstIterator - Candidate segment[telemetry-signals-tagged_2021-07-30T02:00:00.000Z_2021-07-30T03:00:00.000Z_2021-08-12T15:09:43.918Z_1] is not compacted yet. Needs compaction.
2021-09-09T17:11:38,146 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.NewestSegmentFirstIterator - Candidate segment[telemetry-signals-tagged_2021-07-30T01:00:00.000Z_2021-07-30T02:00:00.000Z_2021-08-12T15:09:43.671Z_1] is not compacted yet. Needs compaction.
2021-09-09T17:11:38,146 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.NewestSegmentFirstIterator - Candidate segment[telemetry-signals-tagged_2021-07-30T00:00:00.000Z_2021-07-30T01:00:00.000Z_2021-08-12T15:09:42.723Z_1] is not compacted yet. Needs compaction.
...
2021-09-09T17:11:38,215 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.NewestSegmentFirstIterator - Candidate segment[telemetry-signals-tagged_2020-10-06T21:00:00.000Z_2020-10-06T22:00:00.000Z_2021-08-12T09:25:12.367Z] is not compacted yet. Needs compaction.
2021-09-09T17:11:38,215 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.NewestSegmentFirstIterator - Candidate segment[telemetry-signals-tagged_2020-10-06T20:00:00.000Z_2020-10-06T21:00:00.000Z_2021-08-12T09:25:09.707Z] is not compacted yet. Needs compaction.
2021-09-09T17:11:38,215 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.NewestSegmentFirstIterator - Candidate segment[telemetry-signals-tagged_2020-10-06T19:00:00.000Z_2020-10-06T20:00:00.000Z_2021-08-12T09:25:06.262Z] is not compacted yet. Needs compaction.
2021-09-09T17:11:38,215 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.NewestSegmentFirstIterator - Candidate segment[telemetry-signals-tagged_2020-10-06T18:00:00.000Z_2020-10-06T19:00:00.000Z_2021-08-12T09:25:06.890Z] is not compacted yet. Needs compaction.

Then the very next thing is this, which confuses me, because I very much disagree:

2021-09-09T17:11:38,216 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.NewestSegmentFirstIterator - All segments look good! Nothing to compact

And then various entries that seem to me indications of normal behavior:

2021-09-09T17:11:46,988 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Task done: AbstractTask{id='partial_dimension_cardinality_telemetry-signals-tagged_kegmgcic_2021-09-09T17:11:42.633Z', groupId='coordinator-issued_compact_telemetry-signals-tagged_gadjaoak_2021-09-09T17:11:38.080Z', taskResource=TaskResource{availabilityGroup='partial_dimension_cardinality_telemetry-signals-tagged_kegmgcic_2021-09-09T17:11:42.633Z', requiredCapacity=1}, dataSource='telemetry-signals-tagged', context={storeCompactionState=true, appenderatorTrackingTaskId=coordinator-issued_compact_telemetry-signals-tagged_gadjaoak_2021-09-09T17:11:38.080Z, priority=25, forceTimeChunkLock=true}}
2021-09-09T17:11:46,988 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Task SUCCESS: AbstractTask{id='partial_dimension_cardinality_telemetry-signals-tagged_kegmgcic_2021-09-09T17:11:42.633Z', groupId='coordinator-issued_compact_telemetry-signals-tagged_gadjaoak_2021-09-09T17:11:38.080Z', taskResource=TaskResource{availabilityGroup='partial_dimension_cardinality_telemetry-signals-tagged_kegmgcic_2021-09-09T17:11:42.633Z', requiredCapacity=1}, dataSource='telemetry-signals-tagged', context={storeCompactionState=true, appenderatorTrackingTaskId=coordinator-issued_compact_telemetry-signals-tagged_gadjaoak_2021-09-09T17:11:38.080Z, priority=25, forceTimeChunkLock=true}} (4296 run duration)
2021-09-09T17:11:47,698 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.TaskLockbox - Task[coordinator-issued_compact_telemetry-signals-tagged_gadjaoak_2021-09-09T17:11:38.080Z] already present in TaskLock[coordinator-issued_compact_telemetry-signals-tagged_gadjaoak_2021-09-09T17:11:38.080Z]
2021-09-09T17:11:51,292 INFO [qtp1317052417-96] org.apache.druid.indexing.overlord.TaskLockbox - Task[coordinator-issued_compact_telemetry-signals-tagged_gadjaoak_2021-09-09T17:11:38.080Z] already present in TaskLock[coordinator-issued_compact_telemetry-signals-tagged_gadjaoak_2021-09-09T17:11:38.080Z]
2021-09-09T17:11:51,319 INFO [qtp1317052417-116] org.apache.druid.indexing.overlord.TaskLockbox - Task[coordinator-issued_compact_telemetry-signals-tagged_gadjaoak_2021-09-09T17:11:38.080Z] already present in TaskLock[coordinator-issued_compact_telemetry-signals-tagged_gadjaoak_2021-09-09T17:11:38.080Z]
2021-09-09T17:11:52,538 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.TaskQueue - Task done: AbstractTask{id='partial_index_generate_telemetry-signals-tagged_maaonbgj_2021-09-09T17:11:47.680Z', groupId='coordinator-issued_compact_telemetry-signals-tagged_gadjaoak_2021-09-09T17:11:38.080Z', taskResource=TaskResource{availabilityGroup='partial_index_generate_telemetry-signals-tagged_maaonbgj_2021-09-09T17:11:47.680Z', requiredCapacity=1}, dataSource='telemetry-signals-tagged', context={storeCompactionState=true, appenderatorTrackingTaskId=coordinator-issued_compact_telemetry-signals-tagged_gadjaoak_2021-09-09T17:11:38.080Z, priority=25, forceTimeChunkLock=true}}

Question: Since this all looks pretty normal except for the beginning, could I have a misconfiguration in my compaction spec?

One thing I’ve seen cause a problem is inputSegmentSizeBytes. My understanding (I could be wrong) is that it needs to be big enough to cover all the existing segments for a time chunk. You might try increasing it accordingly, and see whether that helps.
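
For example, in the auto-compaction spec you posted, that would mean bumping the top-level inputSegmentSizeBytes field, something along these lines (just a sketch showing the changed field with an illustrative 1 GiB value; all other fields stay as in your spec):

{
  "dataSource": "datasource-name",
  "inputSegmentSizeBytes": 1073741824
}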

Thanks, I’ll try setting that to a gigabyte and see if that helps.

Sooo, after 9 hours of letting it run with inputSegmentSizeBytes set to two gigabytes, here are the results:

  • A compact job ran 12 times
  • It only ran successfully 6 times, and failed 6 times
  • The percentage of compacted data actually went down, because more new data arrived than was compacted

So I think inputSegmentSizeBytes is not the culprit.

Might this be a memory issue? I am not very well versed in Java’s memory handling, heap sizes, and so on. But I took care to run this on machines with exactly as much RAM as in the clustered deployment guide, except for the master server, and I updated its heap size configuration accordingly.

I could also try reducing inputSegmentSizeBytes and see whether that yields any results.

Alternatively, I’d be super interested in any strategies to make compaction jobs get scheduled more frequently. That way I could experiment and troubleshoot more quickly. Any ideas?


Here are some log files, but they all look so … benign:

historical.log

2021-09-10T07:14:42,496 INFO [ZKCoordinator--1] org.apache.druid.server.coordination.ZkCoordinator - Completed request [DROP: telemetry-signals-tagged_2021-09-10T09:00:00.000Z_2021-09-10T10:00:00.000Z_2021-09-10T07:02:14.903Z]
2021-09-10T07:14:42,496 INFO [ZkCoordinator] org.apache.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/10.0.1.22:8083/telemetry-signals-tagged_2021-09-10T09:00:00.000Z_2021-09-10T10:00:00.000Z_2021-09-10T07:02:14.903Z] was removed
2021-09-10T07:15:12,496 INFO [SimpleDataSegmentChangeHandler-1] org.apache.druid.server.SegmentManager - Attempting to close segment telemetry-signals-tagged_2021-09-10T09:00:00.000Z_2021-09-10T10:00:00.000Z_2021-09-10T07:02:14.903Z
2021-09-10T07:15:12,496 INFO [SimpleDataSegmentChangeHandler-1] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[var/druid/segment-cache/telemetry-signals-tagged/2021-09-10T09:00:00.000Z_2021-09-10T10:00:00.000Z/2021-09-10T07:02:14.903Z/0]
2021-09-10T07:15:12,497 INFO [SimpleDataSegmentChangeHandler-1] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[var/druid/segment-cache/telemetry-signals-tagged/2021-09-10T09:00:00.000Z_2021-09-10T10:00:00.000Z/2021-09-10T07:02:14.903Z]
2021-09-10T07:42:42,322 INFO [ZKCoordinator--0] org.apache.druid.server.coordination.SegmentLoadDropHandler - Loading segment telemetry-signals-tagged_2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z_2021-09-10T07:41:50.332Z
2021-09-10T07:42:42,322 INFO [ZKCoordinator--0] org.apache.druid.storage.s3.S3DataSegmentPuller - Pulling index at path[CloudObjectLocation{bucket='deepstorage', path='druid/segments/telemetry-signals-tagged/2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z/2021-09-10T07:41:50.332Z/0/index.zip'}] to outDir[var/druid/segment-cache/telemetry-signals-tagged/2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z/2021-09-10T07:41:50.332Z/0]
2021-09-10T07:42:42,350 INFO [ZKCoordinator--0] org.apache.druid.storage.s3.S3DataSegmentPuller - Loaded 3346 bytes from [CloudObjectLocation{bucket='deepstorage', path='druid/segments/telemetry-signals-tagged/2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z/2021-09-10T07:41:50.332Z/0/index.zip'}] to [/home/druid/druid/var/druid/segment-cache/telemetry-signals-tagged/2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z/2021-09-10T07:41:50.332Z/0]
2021-09-10T07:42:42,353 INFO [ZKCoordinator--0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Announcing segment[telemetry-signals-tagged_2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z_2021-09-10T07:41:50.332Z] at existing path[/druid/segments/10.0.1.22:8083/10.0.1.22:8083_historical__default_tier_2021-09-10T06:08:11.488Z_22b5fd1b7bbb4647a2544fb3d275183d1444]
2021-09-10T07:42:42,356 INFO [ZkCoordinator] org.apache.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/10.0.1.22:8083/telemetry-signals-tagged_2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z_2021-09-10T07:41:50.332Z] was removed
2021-09-10T07:42:42,356 INFO [ZKCoordinator--0] org.apache.druid.server.coordination.ZkCoordinator - Completed request [LOAD: telemetry-signals-tagged_2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z_2021-09-10T07:41:50.332Z]
2021-09-10T07:43:47,647 INFO [ZKCoordinator--1] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[telemetry-signals-tagged_2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z_2021-09-10T06:48:03.743Z] at path[/druid/segments/10.0.1.22:8083/10.0.1.22:8083_historical__default_tier_2021-09-10T06:08:11.488Z_22b5fd1b7bbb4647a2544fb3d275183d1444]
2021-09-10T07:43:47,647 INFO [ZKCoordinator--1] org.apache.druid.server.coordination.SegmentLoadDropHandler - Completely removing [telemetry-signals-tagged_2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z_2021-09-10T06:48:03.743Z] in [30,000] millis
2021-09-10T07:43:47,651 INFO [ZKCoordinator--1] org.apache.druid.server.coordination.ZkCoordinator - Completed request [DROP: telemetry-signals-tagged_2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z_2021-09-10T06:48:03.743Z]
2021-09-10T07:43:47,651 INFO [ZkCoordinator] org.apache.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/10.0.1.22:8083/telemetry-signals-tagged_2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z_2021-09-10T06:48:03.743Z] was removed
2021-09-10T07:44:17,649 INFO [SimpleDataSegmentChangeHandler-0] org.apache.druid.server.SegmentManager - Attempting to close segment telemetry-signals-tagged_2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z_2021-09-10T06:48:03.743Z
2021-09-10T07:44:17,649 INFO [SimpleDataSegmentChangeHandler-0] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[var/druid/segment-cache/telemetry-signals-tagged/2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z/2021-09-10T06:48:03.743Z/0]
2021-09-10T07:44:17,649 INFO [SimpleDataSegmentChangeHandler-0] org.apache.druid.segment.loading.SegmentLoaderLocalCacheManager - Deleting directory[var/druid/segment-cache/telemetry-signals-tagged/2021-09-10T08:00:00.000Z_2021-09-10T09:00:00.000Z/2021-09-10T06:48:03.743Z]

middlemanager.log

r/druid/task/partial_index_generate_telemetry-signals-tagged_aoojedjb_2021-09-10T07:41:59.093Z
2021-09-10T07:42:03,361 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task [partial_index_generate_telemetry-signals-tagged_aoojedjb_2021-09-10T07:41:59.093Z] completed with status [SUCCESS].
2021-09-10T07:42:04,161 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task[partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z] started.
2021-09-10T07:42:04,163 INFO [forking-task-runner-0] org.apache.druid.indexing.overlord.ForkingTaskRunner - Running command: java -cp /home/druid/druid/conf/druid/cluster/data/middleManager:/home/druid/druid/conf/druid/cluster/data/_common:/home/druid/druid/conf/druid/cluster/data/_common/hadoop-xml:/home/druid/druid/conf/druid/cluster/data/../_common:/home/druid/druid/conf/druid/cluster/data/../_common/hadoop-xml:/home/druid/druid/bin/../lib/stax-ex-1.8.jar:/home/druid/druid/bin/../lib/json-smart-2.3.jar:/home/druid/druid/bin/../lib/commons-dbcp2-2.0.1.jar:/home/druid/druid/bin/../lib/aether-impl-0.9.0.M2.jar:/home/druid/druid/bin/../lib/okhttp-1.0.2.jar:/home/druid/druid/bin/../lib/javax.el-3.0.0.jar:/home/druid/druid/bin/../lib/jetty-proxy-9.4.40.v20210413.jar:/home/druid/druid/bin/../lib/jetty-rewrite-9.4.40.v20210413.jar:/home/druid/druid/bin/../lib/jersey-server-1.19.3.jar:/home/druid/druid/bin/../lib/aether-util-0.9.0.M2.jar:/home/druid/druid/bin/../lib/google-api-client-1.26.0.jar:/home/druid/druid/bin/../lib/asm-analysis-7.1.jar:/home/druid/druid/bin/../lib/datasketches-java-1.3.0-incubating.jar:/home/druid/druid/bin/../lib/commons-pool2-2.2.jar:/home/druid/druid/bin/../lib/guice-assistedinject-4.1.0.jar:/home/druid/druid/bin/../lib/aws-java-sdk-kms-1.11.199.jar:/home/druid/druid/bin/../lib/jetty-http-9.4.40.v20210413.jar:/home/druid/druid/bin/../lib/aggdesigner-algorithm-6.0.jar:/home/druid/druid/bin/../lib/maven-repository-metadata-3.1.1.jar:/home/druid/druid/bin/../lib/vavr-match-0.10.2.jar:/home/druid/druid/bin/../lib/log4j-slf4j-impl-2.8.2.jar:/home/druid/druid/bin/../lib/jetty-client-9.4.40.v20210413.jar:/home/druid/druid/bin/../lib/commons-collections4-4.2.jar:/home/druid/druid/bin/../lib/jdbi-2.63.1.jar:/home/druid/druid/bin/../lib/validation-api-1.1.0.Final.jar:/home/druid/druid/bin/../lib/jackson-jaxrs-json-provider-2.10.2.jar:/home/druid/druid/bin/../lib/netty-transport-native-unix-common-4.1.48.Final.jar:/home/druid/druid/bin/../lib/jetty-server-9.4.40.v20210413.jar:/home/druid/druid/bin/../lib/log4j-jul-2.8.2.jar:/home/druid/druid/bin/../lib/maven-settings-builder-3.1.1.jar:/home/druid/druid/bin/../lib/commons-logging-1.1.1.jar:/home/druid/druid/bin/../lib/derbynet-10.14.2.0.jar:/home/druid/druid/bin/../lib/curator-client-4.3.0.jar:/home/druid/druid/bin/../lib/jackson-module-guice-2.10.2.jar:/home/druid/druid/bin/../lib/netty-codec-socks-4.1.48.Final.jar:/home/druid/druid/bin/../lib/druid-sql-0.21.1.jar:/home/druid/druid/bin/../lib/jersey-servlet-1.19.3.jar:/home/druid/druid/bin/../lib/datasketches-memory-1.2.0-incubating.jar:/home/druid/druid/bin/../lib/jvm-attach-api-1.5.jar:/home/druid/druid/bin/../lib/istack-commons-runtime-3.0.7.jar:/home/druid/druid/bin/../lib/plexus-interpolation-1.19.jar:/home/druid/druid/bin/../lib/aether-api-0.9.0.M2.jar:/home/druid/druid/bin/../lib/guava-16.0.1.jar:/home/druid/druid/bin/../lib/jaxb-api-2.3.1.jar:/home/druid/druid/bin/../lib/protobuf-java-3.11.0.jar:/home/druid/druid/bin/../lib/classmate-1.1.0.jar:/home/druid/druid/bin/../lib/netty-common-4.1.48.Final.jar:/home/druid/druid/bin/../lib/metrics-core-4.0.0.jar:/home/druid/druid/bin/../lib/jakarta.xml.bind-api-2.3.2.jar:/home/druid/druid/bin/../lib/maven-settings-3.1.1.jar:/home/druid/druid/bin/../lib/plexus-utils-3.0.24.jar:/home/druid/druid/bin/../lib/netty-reactive-streams-2.0.0.jar:/home/druid/druid/bin/../lib/jsr311-api-1.1.1.jar:/home/druid/druid/bin/../lib/hibernate-validator-5.2.5.Final.jar:/home/druid/druid/bin/../lib/json-path-2.3.0.jar:/home/druid/druid/bin/../lib/guice
-servlet-4.1.0.jar:/home/druid/druid/bin/../lib/resilience4j-core-1.3.1.jar:/home/druid/druid/bin/../lib/druid-indexing-hadoop-0.21.1.jar:/home/druid/druid/bin/../lib/druid-services-0.21.1.jar:/home/druid/druid/bin/../lib/maven-aether-provider-3.1.1.jar:/home/druid/druid/bin/../lib/javax.activation-1.2.0.jar:/home/druid/druid/bin/../lib/jackson-jaxrs-smile-provider-2.10.2.jar:/home/druid/druid/bin/../lib/netty-handler-proxy-4.1.48.Final.jar:/home/druid/druid/bin/../lib/spymemcached-2.12.3.jar:/home/druid/druid/bin/../lib/jna-4.5.1.jar:/home/druid/druid/bin/../lib/avatica-metrics-1.17.0.jar:/home/druid/druid/bin/../lib/druid-indexing-service-0.21.1.jar:/home/druid/druid/bin/../lib/guice-multibindings-4.1.0.jar:/home/druid/druid/bin/../lib/druid-core-0.21.1.jar:/home/druid/druid/bin/../lib/jboss-logging-3.2.1.Final.jar:/home/druid/druid/bin/../lib/commons-compress-1.19.jar:/home/druid/druid/bin/../lib/vavr-0.10.2.jar:/home/druid/druid/bin/../lib/async-http-client-netty-utils-2.5.3.jar:/home/druid/druid/bin/../lib/janino-3.0.11.jar:/home/druid/druid/bin/../lib/netty-transport-4.1.48.Final.jar:/home/druid/druid/bin/../lib/commons-lang3-3.8.1.jar:/home/druid/druid/bin/../lib/commons-lang-2.6.jar:/home/druid/druid/bin/../lib/aws-java-sdk-ec2-1.11.199.jar:/home/druid/druid/bin/../lib/druid-hll-0.21.1.jar:/home/druid/druid/bin/../lib/curator-framework-4.3.0.jar:/home/druid/druid/bin/../lib/ion-java-1.0.2.jar:/home/druid/druid/bin/../lib/avatica-server-1.17.0.jar:/home/druid/druid/bin/../lib/config-magic-0.9.jar:/home/druid/druid/bin/../lib/druid-aws-common-0.21.1.jar:/home/druid/druid/bin/../lib/shims-0.9.0.jar:/home/druid/druid/bin/../lib/avatica-core-1.17.0.jar:/home/druid/druid/bin/../lib/aether-connector-okhttp-0.0.9.jar:/home/druid/druid/bin/../lib/derbyclient-10.14.2.0.jar:/home/druid/druid/bin/../lib/jsr305-2.0.1.jar:/home/druid/druid/bin/../lib/cron-scheduler-0.1.jar:/home/druid/druid/bin/../lib/icu4j-55.1.jar:/home/druid/druid/bin/../lib/druid-gcp-common-0.21.1.jar:/home/druid/druid/bin/../lib/commons-net-3.6.jar:/home/druid/druid/bin/../lib/jetty-continuation-9.4.40.v20210413.jar:/home/druid/druid/bin/../lib/lz4-java-1.7.1.jar:/home/druid/druid/bin/../lib/rhino-1.7.11.jar:/home/druid/druid/bin/../lib/antlr4-runtime-4.5.1.jar:/home/druid/druid/bin/../lib/error_prone_annotations-2.3.2.jar:/home/druid/druid/bin/../lib/aws-java-sdk-s3-1.11.199.jar:/home/druid/druid/bin/../lib/wagon-provider-api-2.4.jar:/home/druid/druid/bin/../lib/derby-10.14.2.0.jar:/home/druid/druid/bin/../lib/aether-connector-file-0.9.0.M2.jar:/home/druid/druid/bin/../lib/jetty-io-9.4.40.v20210413.jar:/home/druid/druid/bin/../lib/jersey-guice-1.19.3.jar:/home/druid/druid/bin/../lib/netty-resolver-4.1.48.Final.jar:/home/druid/druid/bin/../lib/jaxb-runtime-2.3.1.jar:/home/druid/druid/bin/../lib/jackson-module-jaxb-annotations-2.10.2.jar:/home/druid/druid/bin/../lib/tesla-aether-0.0.5.jar:/home/druid/druid/bin/../lib/netty-codec-4.1.48.Final.jar:/home/druid/druid/bin/../lib/google-http-client-1.26.0.jar:/home/druid/druid/bin/../lib/guice-4.1.0.jar:/home/druid/druid/bin/../lib/zookeeper-3.4.14.jar:/home/druid/druid/bin/../lib/commons-codec-1.13.jar:/home/druid/druid/bin/../lib/google-oauth-client-1.26.0.jar:/home/druid/druid/bin/../lib/xz-1.8.jar:/home/druid/druid/bin/../lib/jersey-core-1.19.3.jar:/home/druid/druid/bin/../lib/maven-model-3.1.1.jar:/home/druid/druid/bin/../lib/aether-spi-0.9.0.M2.jar:/home/druid/druid/bin/../lib/jackson-jaxrs-base-2.10.2.jar:/home/druid/druid/bin/../lib/opencsv-4.6.jar:/home/druid/druid/bin/../
lib/fastutil-8.2.3.jar:/home/druid/druid/bin/../lib/asm-7.1.jar:/home/druid/druid/bin/../lib/jackson-datatype-guava-2.10.2.jar:/home/druid/druid/bin/../lib/caffeine-2.8.0.jar:/home/druid/druid/bin/../lib/asm-commons-7.1.jar:/home/druid/druid/bin/../lib/commons-collections-3.2.2.jar:/home/druid/druid/bin/../lib/async-http-client-2.5.3.jar:/home/druid/druid/bin/../lib/jmespath-java-1.11.199.jar:/home/druid/druid/bin/../lib/asm-tree-7.1.jar:/home/druid/druid/bin/../lib/netty-handler-4.1.48.Final.jar:/home/druid/druid/bin/../lib/maven-model-builder-3.1.1.jar:/home/druid/druid/bin/../lib/jcodings-1.0.43.jar:/home/druid/druid/bin/../lib/accessors-smart-1.2.jar:/home/druid/druid/bin/../lib/jackson-core-2.10.2.jar:/home/druid/druid/bin/../lib/jackson-dataformat-smile-2.10.2.jar:/home/druid/druid/bin/../lib/checker-qual-2.5.7.jar:/home/druid/druid/bin/../lib/log4j-core-2.8.2.jar:/home/druid/druid/bin/../lib/netty-resolver-dns-4.1.48.Final.jar:/home/druid/druid/bin/../lib/jackson-datatype-joda-2.10.2.jar:/home/druid/druid/bin/../lib/aopalliance-1.0.jar:/home/druid/druid/bin/../lib/jcl-over-slf4j-1.7.12.jar:/home/druid/druid/bin/../lib/jetty-security-9.4.40.v20210413.jar:/home/druid/druid/bin/../lib/jetty-servlet-9.4.40.v20210413.jar:/home/druid/druid/bin/../lib/javax.activation-api-1.2.0.jar:/home/druid/druid/bin/../lib/log4j-1.2-api-2.8.2.jar:/home/druid/druid/bin/../lib/commons-text-1.3.jar:/home/druid/druid/bin/../lib/druid-console-0.21.1.jar:/home/druid/druid/bin/../lib/slf4j-api-1.7.25.jar:/home/druid/druid/bin/../lib/netty-3.10.6.Final.jar:/home/druid/druid/bin/../lib/druid-server-0.21.1.jar:/home/druid/druid/bin/../lib/netty-codec-dns-4.1.48.Final.jar:/home/druid/druid/bin/../lib/resilience4j-bulkhead-1.3.1.jar:/home/druid/druid/bin/../lib/extendedset-0.21.1.jar:/home/druid/druid/bin/../lib/aws-java-sdk-core-1.11.199.jar:/home/druid/druid/bin/../lib/commons-io-2.6.jar:/home/druid/druid/bin/../lib/compress-lzf-1.0.4.jar:/home/druid/druid/bin/../lib/FastInfoset-1.2.15.jar:/home/druid/druid/bin/../lib/txw2-2.3.1.jar:/home/druid/druid/bin/../lib/netty-buffer-4.1.48.Final.jar:/home/druid/druid/bin/../lib/curator-recipes-4.3.0.jar:/home/druid/druid/bin/../lib/esri-geometry-api-2.2.0.jar:/home/druid/druid/bin/../lib/log4j-api-2.8.2.jar:/home/druid/druid/bin/../lib/commons-beanutils-1.9.4.jar:/home/druid/druid/bin/../lib/jackson-annotations-2.10.2.jar:/home/druid/druid/bin/../lib/jetty-util-ajax-9.4.40.v20210413.jar:/home/druid/druid/bin/../lib/RoaringBitmap-0.9.0.jar:/home/druid/druid/bin/../lib/reactive-streams-1.0.2.jar:/home/druid/druid/bin/../lib/maven-artifact-3.6.0.jar:/home/druid/druid/bin/../lib/calcite-core-1.21.0.jar:/home/druid/druid/bin/../lib/jackson-dataformat-cbor-2.10.2.jar:/home/druid/druid/bin/../lib/airline-0.7.jar:/home/druid/druid/bin/../lib/audience-annotations-0.5.0.jar:/home/druid/druid/bin/../lib/commons-compiler-3.0.11.jar:/home/druid/druid/bin/../lib/javax.servlet-api-3.1.0.jar:/home/druid/druid/bin/../lib/netty-transport-native-epoll-4.1.48.Final-linux-x86_64.jar:/home/druid/druid/bin/../lib/druid-processing-0.21.1.jar:/home/druid/druid/bin/../lib/jline-0.9.94.jar:/home/druid/druid/bin/../lib/javax.inject-1.jar:/home/druid/druid/bin/../lib/jetty-util-9.4.40.v20210413.jar:/home/druid/druid/bin/../lib/jakarta.activation-api-1.2.1.jar:/home/druid/druid/bin/../lib/google-http-client-jackson2-1.26.0.jar:/home/druid/druid/bin/../lib/commons-math3-3.6.1.jar:/home/druid/druid/bin/../lib/netty-codec-http-4.1.48.Final.jar:/home/druid/druid/bin/../lib/j2objc-annotations-1.1.jar:/home
/druid/druid/bin/../lib/joda-time-2.10.5.jar:/home/druid/druid/bin/../lib/httpcore-4.4.11.jar:/home/druid/druid/bin/../lib/zstd-jni-1.3.3-1.jar:/home/druid/druid/bin/../lib/sigar-1.6.5.132.jar:/home/druid/druid/bin/../lib/calcite-linq4j-1.21.0.jar:/home/druid/druid/bin/../lib/jetty-servlets-9.4.40.v20210413.jar:/home/druid/druid/bin/../lib/javax.el-api-3.0.0.jar:/home/druid/druid/bin/../lib/disruptor-3.3.6.jar:/home/druid/druid/bin/../lib/httpclient-4.5.10.jar:/home/druid/druid/bin/../lib/joni-2.1.27.jar:/home/druid/druid/bin/../lib/jackson-databind-2.10.5.1.jar:/home/druid/druid/bin/../lib/curator-x-discovery-4.3.0.jar:/home/druid/druid/bin/../lib/jackson-jq-0.0.10.jar -server -Xms1g -Xmx1g -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Ddruid.s3.enablePathStyleAccess=true -Ddruid.indexer.logs.s3Bucket=logs -Ddruid.indexer.task.baseTaskDir=var/druid/task -Ddruid.host=10.0.1.22 -Ddruid.storage.bucket=deepstorage -Ddruid.metadata.storage.connector.password=<masked> -Ddruid.indexer.fork.property.druid.processing.numThreads=1 -Ddruid.emitter.logging.logLevel=info -Ddruid.storage.baseKey=druid/segments -Ddruid.emitter=noop -Ddruid.indexer.fork.property.druid.processing.buffer.sizeBytes=100MiB -Duser.timezone=UTC -Ddruid.indexer.logs.s3Prefix=druid/indexing-logs -Dfile.encoding.pkg=sun.io -Ddruid.selectors.coordinator.serviceName=druid/coordinator -Ddruid.selectors.indexing.serviceName=druid/overlord -Ddruid.indexing.doubleStorage=double -Ddruid.s3.protocol=http -Ddruid.lookup.enableLookupSyncOnStartup=false -Ddruid.server.http.numThreads=60 -Ddruid.worker.capacity=4 -Ddruid.s3.endpoint.url=http://10.0.0.8:9000 -Ddruid.s3.accessKey=minio_user -Ddruid.s3.secretKey=<masked> -Ddruid.service=druid/middleManager -Ddruid.metadata.storage.connector.user=druid -Ddruid.metadata.storage.type=postgresql -Ddruid.metadata.storage.connector.connectURI=jdbc:postgresql://10.0.0.7:5432/druid -Ddruid.plaintextPort=8091 -Djava.io.tmpdir=var/tmp -Ddruid.extensions.loadList=["druid-hdfs-storage", "druid-kafka-indexing-service", "druid-datasketches", "postgresql-metadata-storage", "druid-s3-extensions"] -Ddruid.sql.enable=true -Ddruid.startup.logging.logProperties=true -Ddruid.server.hiddenProperties=["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password"] -Ddruid.zk.service.host=10.0.0.6 -Ddruid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor"] -Ddruid.zk.paths.base=/druid -Dfile.encoding=UTF-8 -Ddruid.storage.type=s3 -Ddruid.indexer.task.hadoopWorkingPath=var/druid/hadoop-tmp -Ddruid.indexer.fork.property.druid.processing.numMergeBuffers=2 -Ddruid.indexer.logs.type=s3 -Ddruid.processing.numThreads=1 -Ddruid.processing.buffer.sizeBytes=100MiB -Ddruid.processing.numMergeBuffers=2 -Ddruid.metrics.emitter.dimension.dataSource=telemetry-signals-tagged -Ddruid.metrics.emitter.dimension.taskId=partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z -Ddruid.metrics.emitter.dimension.taskType=partial_index_generic_merge -Ddruid.host=10.0.1.22 -Ddruid.plaintextPort=8101 -Ddruid.tlsPort=-1 -Ddruid.task.executor.service=druid/middleManager -Ddruid.task.executor.host=10.0.1.22 -Ddruid.task.executor.plaintextPort=8091 -Ddruid.task.executor.enablePlaintextPort=true -Ddruid.task.executor.tlsPort=-1 -Ddruid.task.executor.enableTlsPort=false org.apache.druid.cli.Main internal peon 
var/druid/task/partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z/task.json var/druid/task/partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z/c2a142c8-aed5-4857-87e8-fba9cab35fa7/status.json var/druid/task/partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z/c2a142c8-aed5-4857-87e8-fba9cab35fa7/report.json
2021-09-10T07:42:04,165 INFO [forking-task-runner-0] org.apache.druid.indexing.overlord.ForkingTaskRunner - Logging task partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z output to: var/druid/task/partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z/log
2021-09-10T07:42:08,235 INFO [forking-task-runner-0-[partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z]] org.apache.druid.indexing.overlord.ForkingTaskRunner - Process exited with status[0] for task: partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z
2021-09-10T07:42:08,235 INFO [forking-task-runner-0] org.apache.druid.storage.s3.S3TaskLogs - Pushing task log var/druid/task/partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z/log to: druid/indexing-logs/partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z/log
2021-09-10T07:42:08,238 INFO [forking-task-runner-0] org.apache.druid.storage.s3.S3Utils - Pushing [var/druid/task/partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z/log] to bucket[logs] and key[druid/indexing-logs/partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z/log].
2021-09-10T07:42:08,253 INFO [forking-task-runner-0] org.apache.druid.indexing.overlord.ForkingTaskRunner - Removing task directory: var/druid/task/partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z
2021-09-10T07:42:08,265 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task [partial_index_generic_merge_telemetry-signals-tagged_bfcnapkm_2021-09-10T07:42:04.118Z] completed with status [SUCCESS].
2021-09-10T07:42:09,583 INFO [forking-task-runner-3-[coordinator-issued_compact_telemetry-signals-tagged_mgcnenol_2021-09-10T07:41:50.317Z]] org.apache.druid.indexing.overlord.ForkingTaskRunner - Process exited with status[0] for task: coordinator-issued_compact_telemetry-signals-tagged_mgcnenol_2021-09-10T07:41:50.317Z
2021-09-10T07:42:09,583 INFO [forking-task-runner-3] org.apache.druid.storage.s3.S3TaskLogs - Pushing task log var/druid/task/coordinator-issued_compact_telemetry-signals-tagged_mgcnenol_2021-09-10T07:41:50.317Z/log to: druid/indexing-logs/coordinator-issued_compact_telemetry-signals-tagged_mgcnenol_2021-09-10T07:41:50.317Z/log
2021-09-10T07:42:09,588 INFO [forking-task-runner-3] org.apache.druid.storage.s3.S3Utils - Pushing [var/druid/task/coordinator-issued_compact_telemetry-signals-tagged_mgcnenol_2021-09-10T07:41:50.317Z/log] to bucket[logs] and key[druid/indexing-logs/coordinator-issued_compact_telemetry-signals-tagged_mgcnenol_2021-09-10T07:41:50.317Z/log].
2021-09-10T07:42:09,609 INFO [forking-task-runner-3] org.apache.druid.indexing.overlord.ForkingTaskRunner - Removing task directory: var/druid/task/coordinator-issued_compact_telemetry-signals-tagged_mgcnenol_2021-09-10T07:41:50.317Z
2021-09-10T07:42:09,624 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task [coordinator-issued_compact_telemetry-signals-tagged_mgcnenol_2021-09-10T07:41:50.317Z] completed with status [SUCCESS].
2021-09-10T07:46:35,726 INFO [intermediary-data-manager-0] org.apache.druid.indexing.worker.shuffle.IntermediaryDataManager - Found [1] expired supervisor tasks
2021-09-10T07:46:35,734 INFO [intermediary-data-manager-0] org.apache.druid.indexing.worker.shuffle.IntermediaryDataManager - Cleaning up [var/tmp/intermediary-segments/coordinator-issued_compact_telemetry-signals-tagged_mgcnenol_2021-09-10T07:41:50.317Z]
2021-09-10T07:46:35,734 WARN [intermediary-data-manager-0] org.apache.druid.segment.loading.StorageLocation - File[var/tmp/intermediary-segments/coordinator-issued_compact_telemetry-signals-tagged_mgcnenol_2021-09-10T07:41:50.317Z/2021-09-10T08:00:00.000Z/2021-09-10T09:00:00.000Z/0/partial_index_generate_telemetry-signals-tagged_aoojedjb_2021-09-10T07:41:59.093Z] is not found under this location[var/tmp/intermediary-segments]

coordinator-overlord.log

2021-09-10T08:04:54,502 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.BalanceSegments - Found 2 active servers, 0 decommissioning servers
2021-09-10T08:04:54,502 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.BalanceSegments - Processing 4 segments for moving from decommissioning servers
2021-09-10T08:04:54,502 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.BalanceSegments - All servers to move segments from are empty, ending run.
2021-09-10T08:04:54,502 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.BalanceSegments - Processing 5 segments for balancing between active servers
2021-09-10T08:04:54,894 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.BalanceSegments - No good moves found in tier [_default_tier]
2021-09-10T08:04:54,894 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.BalanceSegments - [_default_tier]: Segments Moved: [0] Segments Let Alone: [5]
2021-09-10T08:04:54,894 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - [_default_tier] : Assigned 0 segments among 2 servers
2021-09-10T08:04:54,894 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - [_default_tier] : Dropped 0 segments among 2 servers
2021-09-10T08:04:54,894 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - [_default_tier] : Moved 0 segment(s)
2021-09-10T08:04:54,895 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - [_default_tier] : Let alone 5 segment(s)
2021-09-10T08:04:54,895 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - Load Queues:
2021-09-10T08:04:54,895 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - Server[10.0.1.22:8083, historical, _default_tier] has 0 left to load, 0 left to drop, 0 bytes queued, 3,784,943,302 bytes served.
2021-09-10T08:04:54,895 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.EmitClusterStatsAndMetrics - Server[10.0.1.21:8083, historical, _default_tier] has 0 left to load, 0 left to drop, 0 bytes queued, 3,784,943,302 bytes served.
2021-09-10T08:04:55,363 INFO [KafkaSupervisor-telemetry-signals-tagged-Reporting-0] org.apache.kafka.clients.consumer.internals.SubscriptionState - [Consumer clientId=consumer-kafka-supervisor-lhigmjmm-1, groupId=kafka-supervisor-lhigmjmm] Seeking to LATEST offset of partition telemetry-signals-tagged-0
2021-09-10T08:04:55,366 INFO [KafkaSupervisor-telemetry-signals-tagged-Reporting-0] org.apache.kafka.clients.consumer.internals.SubscriptionState - [Consumer clientId=consumer-kafka-supervisor-lhigmjmm-1, groupId=kafka-supervisor-lhigmjmm] Resetting offset for partition telemetry-signals-tagged-0 to position FetchPosition{offset=110318280, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.0.1.2:9092 (id: 0 rack: null)], epoch=0}}.
2021-09-10T08:04:55,393 INFO [KafkaSupervisor-server-logs-Reporting-0] org.apache.kafka.clients.consumer.internals.SubscriptionState - [Consumer clientId=consumer-kafka-supervisor-bdhdcnmn-2, groupId=kafka-supervisor-bdhdcnmn] Seeking to LATEST offset of partition server-logs-0
2021-09-10T08:04:55,396 INFO [KafkaSupervisor-server-logs-Reporting-0] org.apache.kafka.clients.consumer.internals.SubscriptionState - [Consumer clientId=consumer-kafka-supervisor-bdhdcnmn-2, groupId=kafka-supervisor-bdhdcnmn] Resetting offset for partition server-logs-0 to position FetchPosition{offset=3410316, offsetEpoch=Optional.empty, currentLeader=LeaderAndEpoch{leader=Optional[10.0.1.2:9092 (id: 0 rack: null)], epoch=0}}.
2021-09-10T08:04:56,958 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.duty.LogUsedSegments - Found [72,425] used segments.
2021-09-10T08:04:57,388 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.ReplicationThrottler - [_default_tier]: Replicant create queue is empty.

zk.log

2021-09-10T07:42:02,117 INFO [SyncThread:0] org.apache.zookeeper.server.ZooKeeperServer - Established session 0x1000035522e0129 with negotiated timeout 30000 for client /10.0.1.22:43174
2021-09-10T07:42:02,968 INFO [ProcessThread(sid:0 cport:2181):] org.apache.zookeeper.server.PrepRequestProcessor - Processed session termination for sessionid: 0x1000035522e0129
2021-09-10T07:42:02,970 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /10.0.1.22:43174 which had sessionid 0x1000035522e0129
2021-09-10T07:42:07,162 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] org.apache.zookeeper.server.NIOServerCnxnFactory - Accepted socket connection from /10.0.1.22:43260
2021-09-10T07:42:07,164 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] org.apache.zookeeper.server.ZooKeeperServer - Client attempting to establish new session at /10.0.1.22:43260
2021-09-10T07:42:07,166 INFO [SyncThread:0] org.apache.zookeeper.server.ZooKeeperServer - Established session 0x1000035522e012a with negotiated timeout 30000 for client /10.0.1.22:43260
2021-09-10T07:42:07,816 INFO [ProcessThread(sid:0 cport:2181):] org.apache.zookeeper.server.PrepRequestProcessor - Processed session termination for sessionid: 0x1000035522e012a
2021-09-10T07:42:07,818 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /10.0.1.22:43260 which had sessionid 0x1000035522e012a
2021-09-10T07:42:09,192 INFO [ProcessThread(sid:0 cport:2181):] org.apache.zookeeper.server.PrepRequestProcessor - Processed session termination for sessionid: 0x1000035522e0127
2021-09-10T07:42:09,208 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] org.apache.zookeeper.server.NIOServerCnxn - Closed socket connection for client /10.0.1.22:43042 which had sessionid 0x1000035522e0127

Wait, I can just disable auto compaction and run it manually as described in the compaction tutorial. I’m playing around with that now and will report back.
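
For reference, the manual compaction task I’m submitting to the Overlord (POST to /druid/indexer/v1/task) looks roughly like the one from the tutorial; the interval below is just an example, not one of my real intervals:

{
  "type": "compact",
  "dataSource": "telemetry-signals-tagged",
  "interval": "2020-10-01/2020-11-01",
  "tuningConfig": {
    "type": "index_parallel",
    "maxNumConcurrentSubTasks": 4
  }
}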

This compaction best practices article is leading me to believe that my problem is the number of available task slots, both those available for compaction specifically and those available in general.
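
If I read the article correctly, the share of worker slots the Coordinator will spend on compaction is governed by the Coordinator dynamic config; I believe the relevant fields are compactionTaskSlotRatio (default 0.1) and maxCompactionTaskSlots. Something like the following, posted to /druid/coordinator/v1/config, should raise that ceiling. The values below are only a sketch, and since I’m not sure whether unspecified fields keep their current values or fall back to defaults, I’d first GET the existing dynamic config from the same endpoint and re-POST the full object with just these two fields changed:

{
  "compactionTaskSlotRatio": 0.5,
  "maxCompactionTaskSlots": 8
}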

My total worker capacity is 8, since I have two data servers that both have their worker capacity set to 4. However, the MiddleManager configuration documentation says the default for druid.worker.capacity is “Number of CPUs on the machine - 1”, which clearly isn’t what this machine is using: it has 8 physical cores, or 16 with hyperthreading.

I’ve commented out the line with druid.worker.capacity in conf/druid/cluster/data/middleManager/runtime.properties and the data servers now report 15 worker slots each.

I’ll see if my compaction jobs run now! This looks promising.

Using this improved configuration, I was able to run manual compaction tasks for all segments up to about 2 weeks ago. At that point my manual compaction tasks ran into an issue where they would never start. The log file said:

Cannot create a new taskLockPosse for request[TimeChunkLockRequest{lockType=EXCLUSIVE, ... }] TaskLockPosse [ index ] have same or higher priorities

Which makes sense: rows arrive at this server very much out of order, sometimes days or weeks late, so the ingestion tasks have to lock intervals pretty far in the past to be able to write to segments.

Which also explains why my automatic compaction tasks have been failing: they were unable to acquire a lock on the newest segments (I had skipOffsetFromLatest set to P3D) and never got around to the older segments. I have set the offset to P30D now. Here’s hoping this will work in combination with the increased worker capacity.
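
Concretely, the only field I’m changing in the auto-compaction spec from my first post is the skip offset (sketch of just that field; everything else stays as posted above):

{
  "dataSource": "datasource-name",
  "skipOffsetFromLatest": "P30D"
}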

Thanks for coming along for the ride :slight_smile:
