Compaction always failing

Relates to Apache Druid version 0.22.1

All coordinator compaction tasks are failing with:
CompactionTask - indexSpec is not ready

My Setup

We have a cluster with 2 master servers, 1 query server, and 2 data servers, holding around 28 GB of data across roughly 800 segments.

I have 4 datasources set up, each with mostly the default auto-compaction configuration. Any deviation from the default values is from my attempts to get compaction working, which it never has.

The only reference I can find via Google for "indexSpec is not ready" is the Java code in the GitHub repository.

Any idea what could be causing this?

Compaction Config
{
	"dataSource"           : "datasource1",
	"taskPriority"         : 25,
	"inputSegmentSizeBytes": 419430400,
	"maxRowsPerSegment"    : null,
	"skipOffsetFromLatest" : "P3D",
	"tuningConfig"         : {
		"maxRowsInMemory"                 : null,
		"maxBytesInMemory"                : null,
		"maxTotalRows"                    : null,
		"splitHintSpec"                   : null,
		"partitionsSpec"                  : {"type": "dynamic", "maxRowsPerSegment": 25000000, "maxTotalRows": 50000000},
		"indexSpec"                       : null,
		"indexSpecForIntermediatePersists": null,
		"maxPendingPersists"              : null,
		"pushTimeout"                     : null,
		"segmentWriteOutMediumFactory"    : null,
		"maxNumConcurrentSubTasks"        : null,
		"maxRetry"                        : null,
		"taskStatusCheckPeriodMs"         : null,
		"chatHandlerTimeout"              : null,
		"chatHandlerNumRetries"           : null,
		"maxNumSegmentsToMerge"           : null,
		"totalNumMergeTasks"              : null,
		"forceGuaranteedRollup"           : false,
		"type"                            : "index_parallel"
	},
	"granularitySpec"      : null,
	"ioConfig"             : null,
	"taskContext"          : null
}

Full Log from the failed task

2022-07-21T21:03:24,926 INFO [NodeRoleWatcher[COORDINATOR]] org.apache.druid.discovery.BaseNodeRoleWatcher - Node[http://druid-master-02:8081] of role[coordinator] detected.
2022-07-21T21:03:24,927 INFO [NodeRoleWatcher[COORDINATOR]] org.apache.druid.discovery.BaseNodeRoleWatcher - Node[http://druid-master-01:8081] of role[coordinator] detected.
2022-07-21T21:03:24,927 INFO [NodeRoleWatcher[COORDINATOR]] org.apache.druid.discovery.BaseNodeRoleWatcher - Node watcher of role[coordinator] is now initialized.
2022-07-21T21:03:25,048 INFO [main] org.apache.druid.discovery.DruidLeaderClient - Request[http://druid-master-01:8081/druid/indexer/v1/action] received redirect response to location [http://druid-master-02:8081/druid/indexer/v1/action].
2022-07-21T21:03:25,097 INFO [main] org.apache.druid.indexing.common.task.AbstractBatchIndexTask - [forceTimeChunkLock] is set to true in task context. Use timeChunk lock
2022-07-21T21:03:25,123 INFO [main] org.apache.druid.segment.loading.SegmentLocalCacheManager - Using storage location strategy: [LeastBytesUsedStorageLocationSelectorStrategy]
2022-07-21T21:03:25,125 INFO [task-runner-0-priority-0] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Running task: coordinator-issued_compact_datasource1_npfkagdk_2022-07-21T21:03:19.969Z
2022-07-21T21:03:25,129 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [SERVER]
2022-07-21T21:03:25,134 INFO [main] org.eclipse.jetty.server.Server - jetty-9.4.40.v20210413; built: 2021-04-13T20:42:42.668Z; git: b881a572662e1943a14ae12e7e1207989f218b74; jvm 1.8.0_265-b01
2022-07-21T21:03:25,164 INFO [task-runner-0-priority-0] org.apache.druid.storage.google.GoogleDataSegmentPuller - Pulling index at bucket[rg-druid-coldstorage] path[dev/datasource1/2022-07-18T00:00:00.000-05:00_2022-07-19T00:00:00.000-05:00/2022-07-21T19:22:39.929Z/0/index.zip] to outDir[/var/druid/task/coordinator-issued_compact_datasource1_npfkagdk_2022-07-21T21:03:19.969Z/work/datasource1/2022-07-18T05:00:00.000Z_2022-07-19T05:00:00.000Z/2022-07-21T19:22:39.929Z/0]
2022-07-21T21:03:25,170 INFO [main] org.eclipse.jetty.server.session - DefaultSessionIdManager workerName=node0
2022-07-21T21:03:25,170 INFO [main] org.eclipse.jetty.server.session - No SessionScavenger set, using defaults
2022-07-21T21:03:25,172 INFO [main] org.eclipse.jetty.server.session - node0 Scavenging every 600000ms
2022-07-21T21:03:25,275 INFO [main] com.sun.jersey.server.impl.application.WebApplicationImpl - Initiating Jersey application, version 'Jersey: 1.19.3 10/24/2016 03:43 PM'
2022-07-21T21:03:26,139 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@26f5e45d{/,null,AVAILABLE}
2022-07-21T21:03:26,164 INFO [main] org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@1ad1be5d{HTTP/1.1, (http/1.1)}{0.0.0.0:8100}
2022-07-21T21:03:26,164 INFO [main] org.eclipse.jetty.server.Server - Started @6155ms
2022-07-21T21:03:26,164 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [ANNOUNCEMENTS]
2022-07-21T21:03:26,164 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Successfully started lifecycle [module]
2022-07-21T21:03:26,239 INFO [task-runner-0-priority-0] org.apache.druid.storage.google.GoogleDataSegmentPuller - Loaded 3417335 bytes from [dev/datasource1/2022-07-18T00:00:00.000-05:00_2022-07-19T00:00:00.000-05:00/2022-07-21T19:22:39.929Z/0/index.zip] to [/var/druid/task/coordinator-issued_compact_datasource1_npfkagdk_2022-07-21T21:03:19.969Z/work/datasource1/2022-07-18T05:00:00.000Z_2022-07-19T05:00:00.000Z/2022-07-21T19:22:39.929Z/0]
2022-07-21T21:03:26,410 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.CompactionTask - Generate compaction task spec with segments original query granularity [{type=period, period=P1D, timeZone=UTC, origin=null}]
2022-07-21T21:03:26,578 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.CompactionTask - Generated [1] compaction task specs
2022-07-21T21:03:26,629 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.AbstractBatchIndexTask - [forceTimeChunkLock] is set to true in task context. Use timeChunk lock
2022-07-21T21:03:26,638 WARN [task-runner-0-priority-0] org.apache.druid.indexing.common.task.CompactionTask - indexSpec is not ready: [{
  "type" : "index_parallel",
  "id" : "coordinator-issued_compact_datasource1_npfkagdk_2022-07-21T21:03:19.969Z",
  "groupId" : "coordinator-issued_compact_datasource1_npfkagdk_2022-07-21T21:03:19.969Z",
  "resource" : {
    "availabilityGroup" : "coordinator-issued_compact_datasource1_npfkagdk_2022-07-21T21:03:19.969Z",
    "requiredCapacity" : 1
  },
  "spec" : {
    "dataSchema" : {
      "dataSource" : "datasource1",
      "timestampSpec" : {
        "column" : "__time",
        "format" : "millis",
        "missingValue" : null
      },
      "dimensionsSpec" : {
        "dimensions" : [ {
          "type" : "string",
          "name" : "DomainGroup",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "CampaignSegment",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "List",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "NetBlockISP",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "DeliveryType",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "SendingDomain",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "UserCountry",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "CampaignSubSegment",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "EventType",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "long",
          "name" : "DeliveryDate",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : false
        }, {
          "type" : "long",
          "name" : "ListAddedDate",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : false
        }, {
          "type" : "string",
          "name" : "DomainGroupLimit",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "DomainGroupQuality",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "SendingIp",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "ListSource",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "ListCountry",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "UserListSegment",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "UserIpAddress",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "long",
          "name" : "DeliveryID",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : false
        }, {
          "type" : "long",
          "name" : "EmailListID",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : false
        }, {
          "type" : "long",
          "name" : "MtaProfileQueueRevID",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : false
        }, {
          "type" : "long",
          "name" : "NetBlockID",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : false
        }, {
          "type" : "string",
          "name" : "SourceVertical",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "Client",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "CampaignTitle",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "TriggerType",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "TriggerPurpose",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "IpAddressID",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "ClientID",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "MasterEmailListID",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "MtaProfileID",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "VirtualListID",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        }, {
          "type" : "string",
          "name" : "EmailTriggerID",
          "multiValueHandling" : "SORTED_ARRAY",
          "createBitmapIndex" : true
        } ],
        "dimensionExclusions" : [ "__time", "cnt", "Revenue" ]
      },
      "metricsSpec" : [ {
        "type" : "floatSum",
        "name" : "Revenue",
        "fieldName" : "Revenue",
        "expression" : null
      }, {
        "type" : "longSum",
        "name" : "cnt",
        "fieldName" : "cnt",
        "expression" : null
      } ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "DAY",
        "queryGranularity" : "DAY",
        "rollup" : true,
        "intervals" : [ "2022-07-18T05:00:00.000Z/2022-07-19T05:00:00.000Z" ]
      },
      "transformSpec" : {
        "filter" : null,
        "transforms" : [ ]
      }
    },
    "ioConfig" : {
      "type" : "index_parallel",
      "inputSource" : {
        "type" : "druid",
        "dataSource" : "datasource1",
        "interval" : "2022-07-18T05:00:00.000Z/2022-07-19T05:00:00.000Z"
      },
      "inputFormat" : null,
      "appendToExisting" : false,
      "dropExisting" : false
    },
    "tuningConfig" : {
      "type" : "compaction",
      "maxRowsPerSegment" : 25000000,
      "appendableIndexSpec" : {
        "type" : "onheap"
      },
      "maxRowsInMemory" : 1000000,
      "maxBytesInMemory" : 0,
      "skipBytesInMemoryOverheadCheck" : false,
      "maxTotalRows" : 50000000,
      "numShards" : null,
      "splitHintSpec" : null,
      "partitionsSpec" : {
        "type" : "dynamic",
        "maxRowsPerSegment" : 25000000,
        "maxTotalRows" : 50000000
      },
      "indexSpec" : {
        "bitmap" : {
          "type" : "roaring",
          "compressRunOnSerialization" : true
        },
        "dimensionCompression" : "lz4",
        "metricCompression" : "lz4",
        "longEncoding" : "longs",
        "segmentLoader" : null
      },
      "indexSpecForIntermediatePersists" : {
        "bitmap" : {
          "type" : "roaring",
          "compressRunOnSerialization" : true
        },
        "dimensionCompression" : "lz4",
        "metricCompression" : "lz4",
        "longEncoding" : "longs",
        "segmentLoader" : null
      },
      "maxPendingPersists" : 0,
      "forceGuaranteedRollup" : false,
      "reportParseExceptions" : false,
      "pushTimeout" : 0,
      "segmentWriteOutMediumFactory" : null,
      "maxNumConcurrentSubTasks" : 1,
      "maxRetry" : 3,
      "taskStatusCheckPeriodMs" : 1000,
      "chatHandlerTimeout" : "PT10S",
      "chatHandlerNumRetries" : 5,
      "maxNumSegmentsToMerge" : 100,
      "totalNumMergeTasks" : 10,
      "logParseExceptions" : false,
      "maxParseExceptions" : 2147483647,
      "maxSavedParseExceptions" : 0,
      "maxColumnsToMerge" : -1,
      "awaitSegmentAvailabilityTimeoutMillis" : 0,
      "partitionDimensions" : [ ]
    }
  },
  "context" : {
    "useLineageBasedSegmentAllocation" : true,
    "storeCompactionState" : true,
    "appenderatorTrackingTaskId" : "coordinator-issued_compact_datasource1_npfkagdk_2022-07-21T21:03:19.969Z",
    "priority" : 25,
    "forceTimeChunkLock" : true
  },
  "dataSource" : "datasource1"
}].
Trying the next indexSpec.
2022-07-21T21:03:26,638 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.CompactionTask - Ran [1] specs, [0] succeeded, [1] failed
2022-07-21T21:03:26,642 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "coordinator-issued_compact_datasource1_npfkagdk_2022-07-21T21:03:19.969Z",
  "status" : "FAILED",
  "duration" : 1513,
  "errorMsg" : "Ran [1] specs, [0] succeeded, [1] failed",
  "location" : {
    "host" : null,
    "port" : -1,
    "tlsPort" : -1
  }
}
2022-07-21T21:03:26,648 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
2022-07-21T21:03:26,649 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
2022-07-21T21:03:26,657 INFO [main] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@1ad1be5d{HTTP/1.1, (http/1.1)}{0.0.0.0:8100}
2022-07-21T21:03:26,658 INFO [main] org.eclipse.jetty.server.session - node0 Stopped scavenging
2022-07-21T21:03:26,659 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@26f5e45d{/,null,STOPPED}
2022-07-21T21:03:26,673 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]
2022-07-21T21:03:26,674 INFO [main] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[coordinator-issued_compact_datasource1_npfkagdk_2022-07-21T21:03:19.969Z].
2022-07-21T21:03:26,696 INFO [LookupExtractorFactoryContainerProvider-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited. Lookup notices are not handled anymore.
2022-07-21T21:03:26,703 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2022-07-21T21:03:26,810 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x205e5d1180d559b closed
2022-07-21T21:03:26,810 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x205e5d1180d559b
2022-07-21T21:03:26,824 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT]
Finished peon task

Welcome @cpriest! I’ve heard about this happening when historicals aren’t evenly balanced. Could that be the case here?

@cpriest (welcome to the Forum!!) I did a bit of digging around and I believe this error crops up when the task can’t get hold of all the segments it needs for the interval being compacted. Are you able to determine whether all the segments covered by this task’s interval (2022-07-18T05:00:00.000Z/2022-07-19T05:00:00.000Z) are fully loaded and available in the cluster?

And also, is that period being written to by another process? I believe the compaction task will try to take a lock on the entire period being compacted. (Incidentally, there’s an experimental switch available for locking – Task reference · Apache Druid.)
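
For what it’s worth, here’s a sketch (untested on my side) of how that switch could be passed through to the compaction tasks via the taskContext field that’s currently null in your auto-compaction config, leaving all your other fields as they are. Setting forceTimeChunkLock to false is what turns on the experimental segment locks instead of time-chunk locks:

{
	"dataSource"          : "datasource1",
	"skipOffsetFromLatest": "P3D",
	"taskContext"         : {
		"forceTimeChunkLock": false
	}
}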

I wonder if you might try reducing inputSegmentSizeBytes so that auto-compaction only goes for segments smaller than, say, 100000000 bytes rather than 419430400. Then you could work your way up, maybe?
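
Something like this, keeping the rest of your config as it is now (the 100000000 is just a starting point to experiment with, not a recommendation):

{
	"dataSource"           : "datasource1",
	"taskPriority"         : 25,
	"inputSegmentSizeBytes": 100000000,
	"skipOffsetFromLatest" : "P3D"
}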

If I understand auto-compaction correctly, it keeps up with the most recent data; in our use case, the most recent data is overwritten about every 5 minutes. We’re using P1D segment granularity so that our segments aren’t just a few MB in size.

I’ll try adjusting inputSegmentSizeBytes, look into that experimental switch, and see what I get.

As you have "skipOffsetFromLatest": "P3D", just checking that you know this means the coordinator will only look for data to compact that is older than 3 days. But I think you know that already… ?? :smiley: just thought I’d check :smiley:

Out of interest, what mechanism are you using to do this?

Batch import from cloud bucket exports

I did not know that; now it’s confusing why this would fail due to locking, since three-day-old data should be pretty static.

So I’m just trying a manual compaction task with this:

{
	"type"        : "compact",
	"dataSource"  : "esp-rejects-prod",
	"ioConfig"    : {
		"type"     : "compact",
		"inputSpec": {
			"type"    : "interval",
			"interval": "2022-07-01T05:00:00Z/2022-07-02T05:00:00Z",
			"timeZone": "CST6CDT"
		}
	},
	"tuningConfig": {
		"type"                    : "index_parallel",
		"maxRowsPerSegment"       : 50000000,
		"maxRowsInMemory"         : 1000000000,
		"partitionsSpec"          : {
			"type"             : "dynamic",
			"maxRowsPerSegment": 50000000,
			"maxTotalRows"     : 922337203685477580
		},
		"maxNumConcurrentSubTasks": 1
	},
	"context": {
		"useLineageBasedSegmentAllocation": true,
		"storeCompactionState"            : true,
		"priority"                        : 25
	}
}

This is getting rejected, I believe because of this line in the overlord log, which is super confusing:

2022-07-22T16:34:21,240 INFO [qtp1296450832-103] org.apache.druid.indexing.overlord.TaskLockbox - Cannot create a new taskLockPosse for request[TimeChunkLockRequest{lockType=EXCLUSIVE, groupId='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', dataSource='esp-rejects-prod', interval=2022-07-01T00:00:00.000Z/2022-07-03T00:00:00.000Z, preferredVersion='null', priority=25, revoked=false}] because existing locks[[TaskLockPosse{taskLock=TimeChunkLock{type=EXCLUSIVE, groupId='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', dataSource='esp-rejects-prod', interval=2022-07-01T05:00:00.000Z/2022-07-02T05:00:00.000Z, version='2022-07-22T16:34:14.999Z', priority=25, revoked=false}, taskIds=[compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z]}]] have same or higher priorities

That makes it seem like it’s refusing the lock because the same task already has the lock?

Full Log:

2022-07-22T16:34:14,986 INFO [qtp1296450832-94] org.apache.druid.indexing.overlord.MetadataTaskStorage - Inserting task compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z with status: TaskStatus{id=compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z, status=RUNNING, duration=-1, errorMsg=null}
2022-07-22T16:34:14,996 INFO [qtp1296450832-94] org.apache.druid.indexing.overlord.TaskLockbox - Adding task[compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z] to activeTasks
2022-07-22T16:34:14,999 INFO [TaskQueue-Manager] org.apache.druid.indexing.common.task.AbstractBatchIndexTask - [forceTimeChunkLock] is set to true in task context. Use timeChunk lock
2022-07-22T16:34:14,999 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.TaskLockbox - Added task[compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z] to TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', dataSource='esp-rejects-prod', interval=2022-07-01T05:00:00.000Z/2022-07-02T05:00:00.000Z, version='2022-07-22T16:34:14.999Z', priority=25, revoked=false}]
2022-07-22T16:34:14,999 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.MetadataTaskStorage - Adding lock on interval[2022-07-01T05:00:00.000Z/2022-07-02T05:00:00.000Z] version[2022-07-22T16:34:14.999Z] for task: compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z
2022-07-22T16:34:15,009 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.TaskQueue - Asking taskRunner to run: compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z
2022-07-22T16:34:15,009 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Added pending task compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z
2022-07-22T16:34:15,009 INFO [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Coordinator asking Worker[gce-druid-data-2:8091] to add task[compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z]
2022-07-22T16:34:15,015 INFO [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z switched from pending to running (on [gce-druid-data-2:8091])
2022-07-22T16:34:15,028 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[gce-druid-data-2:8091] wrote RUNNING status for task [compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z] on [TaskLocation{host='null', port=-1, tlsPort=-1}]
2022-07-22T16:34:15,034 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[gce-druid-data-2:8091] wrote RUNNING status for task [compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z] on [TaskLocation{host='gce-druid-data-2', port=8100, tlsPort=-1}]
2022-07-22T16:34:19,826 INFO [qtp1296450832-115] org.apache.druid.indexing.overlord.TaskLockbox - Task[compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z] already present in TaskLock[compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z]
2022-07-22T16:34:21,240 INFO [qtp1296450832-103] org.apache.druid.indexing.overlord.TaskLockbox - Cannot create a new taskLockPosse for request[TimeChunkLockRequest{lockType=EXCLUSIVE, groupId='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', dataSource='esp-rejects-prod', interval=2022-07-01T00:00:00.000Z/2022-07-03T00:00:00.000Z, preferredVersion='null', priority=25, revoked=false}] because existing locks[[TaskLockPosse{taskLock=TimeChunkLock{type=EXCLUSIVE, groupId='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', dataSource='esp-rejects-prod', interval=2022-07-01T05:00:00.000Z/2022-07-02T05:00:00.000Z, version='2022-07-22T16:34:14.999Z', priority=25, revoked=false}, taskIds=[compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z]}]] have same or higher priorities
2022-07-22T16:34:21,883 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[gce-druid-data-2:8091] wrote FAILED status for task [compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z] on [TaskLocation{host='gce-druid-data-2', port=8100, tlsPort=-1}]
2022-07-22T16:34:21,883 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[gce-druid-data-2:8091] completed task[compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z] with status[FAILED]
2022-07-22T16:34:21,883 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.TaskQueue - Received FAILED status for task: compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z
2022-07-22T16:34:21,883 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z] because: [notified status change from task]
2022-07-22T16:34:21,883 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Cleaning up task[compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z] on worker[gce-druid-data-2:8091]
2022-07-22T16:34:21,886 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z] from activeTasks
2022-07-22T16:34:21,886 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z] from TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', dataSource='esp-rejects-prod', interval=2022-07-01T05:00:00.000Z/2022-07-02T05:00:00.000Z, version='2022-07-22T16:34:14.999Z', priority=25, revoked=false}]
2022-07-22T16:34:21,886 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.TaskLockbox - TaskLock is now empty: TimeChunkLock{type=EXCLUSIVE, groupId='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', dataSource='esp-rejects-prod', interval=2022-07-01T05:00:00.000Z/2022-07-02T05:00:00.000Z, version='2022-07-22T16:34:14.999Z', priority=25, revoked=false}
2022-07-22T16:34:21,888 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.MetadataTaskStorage - Deleting TaskLock with id[156091]: TimeChunkLock{type=EXCLUSIVE, groupId='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', dataSource='esp-rejects-prod', interval=2022-07-01T05:00:00.000Z/2022-07-02T05:00:00.000Z, version='2022-07-22T16:34:14.999Z', priority=25, revoked=false}
2022-07-22T16:34:21,900 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.MetadataTaskStorage - Updating task compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z to status: TaskStatus{id=compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z, status=FAILED, duration=6854, errorMsg=Ran [1] specs, [0] succeeded, [1] failed}
2022-07-22T16:34:21,911 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.TaskQueue - Task done: AbstractTask{id='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', groupId='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', taskResource=TaskResource{availabilityGroup='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', requiredCapacity=1}, dataSource='esp-rejects-prod', context={useLineageBasedSegmentAllocation=true, storeCompactionState=true, priority=25, forceTimeChunkLock=true}}
2022-07-22T16:34:21,911 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.TaskQueue - Task FAILED: AbstractTask{id='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', groupId='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', taskResource=TaskResource{availabilityGroup='compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z', requiredCapacity=1}, dataSource='esp-rejects-prod', context={useLineageBasedSegmentAllocation=true, storeCompactionState=true, priority=25, forceTimeChunkLock=true}} (6854 run duration)
2022-07-22T16:34:21,911 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task[compact_esp-rejects-prod_bikaplfa_2022-07-22T16:34:14.982Z] went bye bye.
2022-07-22T16:34:36,093 INFO [LookupCoordinatorManager--5] org.apache.druid.server.lookup.cache.LookupCoordinatorManager - Not updating lookups because no data exists
2022-07-22T16:34:38,298 INFO [TaskQueue-StorageSync] org.apache.druid.indexing.overlord.TaskQueue - Synced 2 tasks from storage (0 tasks added, 0 tasks removed).
2022-07-22T16:34:41,058 INFO [qtp1296450832-111] org.apache.druid.indexing.overlord.MetadataTaskStorage - Inserting task index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z with status: TaskStatus{id=index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z, status=RUNNING, duration=-1, errorMsg=null}
2022-07-22T16:34:41,070 INFO [qtp1296450832-111] org.apache.druid.indexing.overlord.TaskLockbox - Adding task[index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] to activeTasks
2022-07-22T16:34:41,070 INFO [TaskQueue-Manager] org.apache.druid.indexing.common.task.AbstractBatchIndexTask - [forceTimeChunkLock] is set to true in task context. Use timeChunk lock
2022-07-22T16:34:41,070 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.TaskLockbox - Added task[index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] to TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z', dataSource='esp-revenue-prod', interval=2022-07-20T00:00:00.000-05:00/2022-07-21T00:00:00.000-05:00, version='2022-07-22T16:34:41.070Z', priority=50, revoked=false}]
2022-07-22T16:34:41,070 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.MetadataTaskStorage - Adding lock on interval[2022-07-20T00:00:00.000-05:00/2022-07-21T00:00:00.000-05:00] version[2022-07-22T16:34:41.070Z] for task: index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z
2022-07-22T16:34:41,079 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.TaskQueue - Asking taskRunner to run: index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z
2022-07-22T16:34:41,079 INFO [TaskQueue-Manager] org.apache.druid.indexing.overlord.RemoteTaskRunner - Added pending task index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z
2022-07-22T16:34:41,079 INFO [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Coordinator asking Worker[gce-druid-data-2:8091] to add task[index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z]
2022-07-22T16:34:41,084 INFO [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z switched from pending to running (on [gce-druid-data-2:8091])
2022-07-22T16:34:41,103 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[gce-druid-data-2:8091] wrote RUNNING status for task [index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] on [TaskLocation{host='null', port=-1, tlsPort=-1}]
2022-07-22T16:34:41,110 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[gce-druid-data-2:8091] wrote RUNNING status for task [index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] on [TaskLocation{host='gce-druid-data-2', port=8100, tlsPort=-1}]
2022-07-22T16:34:46,088 INFO [qtp1296450832-103] org.apache.druid.indexing.overlord.TaskLockbox - Task[index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] already present in TaskLock[index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z]
2022-07-22T16:34:46,136 INFO [qtp1296450832-90] org.apache.druid.indexing.overlord.TaskLockbox - Task[index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] already present in TaskLock[index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z]
2022-07-22T16:34:56,321 INFO [DatabaseRuleManager-Exec--0] org.apache.druid.metadata.SQLMetadataRuleManager - Polled and found 1 rule(s) for 1 datasource(s)
2022-07-22T16:34:59,269 INFO [org.apache.druid.metadata.SqlSegmentsMetadataManager-Exec--0] org.apache.druid.metadata.SqlSegmentsMetadataManager - Polled and found 32,816 segments in the database
2022-07-22T16:35:04,361 INFO [qtp1296450832-103] org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator - Found these segments already exist in DB: []
2022-07-22T16:35:04,363 INFO [qtp1296450832-103] org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator - Published segments to DB: [esp-revenue-prod_2022-07-20T05:00:00.000Z_2022-07-21T05:00:00.000Z_2022-07-22T16:34:41.070Z]
2022-07-22T16:35:05,234 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[gce-druid-data-2:8091] wrote SUCCESS status for task [index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] on [TaskLocation{host='gce-druid-data-2', port=8100, tlsPort=-1}]
2022-07-22T16:35:05,234 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[gce-druid-data-2:8091] completed task[index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] with status[SUCCESS]
2022-07-22T16:35:05,234 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.TaskQueue - Received SUCCESS status for task: index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z
2022-07-22T16:35:05,234 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Shutdown [index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] because: [notified status change from task]
2022-07-22T16:35:05,234 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Cleaning up task[index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] on worker[gce-druid-data-2:8091]
2022-07-22T16:35:05,237 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] from activeTasks
2022-07-22T16:35:05,237 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.TaskLockbox - Removing task[index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] from TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z', dataSource='esp-revenue-prod', interval=2022-07-20T00:00:00.000-05:00/2022-07-21T00:00:00.000-05:00, version='2022-07-22T16:34:41.070Z', priority=50, revoked=false}]
2022-07-22T16:35:05,237 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.TaskLockbox - TaskLock is now empty: TimeChunkLock{type=EXCLUSIVE, groupId='index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z', dataSource='esp-revenue-prod', interval=2022-07-20T00:00:00.000-05:00/2022-07-21T00:00:00.000-05:00, version='2022-07-22T16:34:41.070Z', priority=50, revoked=false}
2022-07-22T16:35:05,240 WARN [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.MetadataTaskStorage - Cannot find lock[TimeChunkLock{type=EXCLUSIVE, groupId='index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z', dataSource='esp-revenue-prod', interval=2022-07-20T00:00:00.000-05:00/2022-07-21T00:00:00.000-05:00, version='2022-07-22T16:34:41.070Z', priority=50, revoked=false}]
2022-07-22T16:35:05,243 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.MetadataTaskStorage - Updating task index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z to status: TaskStatus{id=index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z, status=SUCCESS, duration=24133, errorMsg=null}
2022-07-22T16:35:05,255 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.TaskQueue - Task done: AbstractTask{id='index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z', groupId='index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z', taskResource=TaskResource{availabilityGroup='index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z', requiredCapacity=1}, dataSource='esp-revenue-prod', context={forceTimeChunkLock=true, useLineageBasedSegmentAllocation=true}}
2022-07-22T16:35:05,255 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.TaskQueue - Task SUCCESS: AbstractTask{id='index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z', groupId='index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z', taskResource=TaskResource{availabilityGroup='index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z', requiredCapacity=1}, dataSource='esp-revenue-prod', context={forceTimeChunkLock=true, useLineageBasedSegmentAllocation=true}} (24133 run duration)
2022-07-22T16:35:05,255 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task[index_parallel_esp-revenue-prod_akeenlhk_2022-07-22T16:34:41.051Z] went bye bye.
2022-07-22T16:35:38,298 INFO [TaskQueue-StorageSync] org.apache.druid.indexing.overlord.TaskQueue - Synced 2 tasks from storage (0 tasks added, 0 tasks removed).
2022-07-22T16:35:56,324 INFO [DatabaseRuleManager-Exec--0] org.apache.druid.metadata.SQLMetadataRuleManager - Polled and found 1 rule(s) for 1 datasource(s)
2022-07-22T16:36:00,015 INFO [org.apache.druid.metadata.SqlSegmentsMetadataManager-Exec--0] org.apache.druid.metadata.SqlSegmentsMetadataManager - Polled and found 32,817 segments in the database
2022-07-22T16:36:00,539 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'primary' for segment [esp-revenue-prod_2022-07-20T05:00:00.000Z_2022-07-21T05:00:00.000Z_2022-07-22T16:34:41.070Z] to server [gce-druid-data-2:8083] in tier [_default_tier]
2022-07-22T16:36:00,543 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Assigning 'replica' for segment [esp-revenue-prod_2022-07-20T05:00:00.000Z_2022-07-21T05:00:00.000Z_2022-07-22T16:34:41.070Z] to server [gce-druid-data-3:8083] in tier [_default_tier]
2022-07-22T16:36:00,543 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Skipping replica assignment for tier [_default_tier]
2022-07-22T16:36:00,543 INFO [Coordinator-Exec--0] org.apache.druid.server.coordinator.rules.LoadRule - Loading in progress, skipping drop until loading is complete
2022-07-22T16:36:36,094 INFO [LookupCoordinatorManager--5] org.apache.druid.server.lookup.cache.LookupCoordinatorManager - Not updating lookups because no data exists
2022-07-22T16:36:38,298 INFO [TaskQueue-StorageSync] org.apache.druid.indexing.overlord.TaskQueue - Synced 2 tasks from storage (0 tasks added, 0 tasks removed).
2022-07-22T16:36:56,327 INFO [DatabaseRuleManager-Exec--0] org.apache.druid.metadata.SQLMetadataRuleManager - Polled and found 1 rule(s) for 1 datasource(s)
2022-07-22T16:37:00,758 INFO [org.apache.druid.metadata.SqlSegmentsMetadataManager-Exec--0] org.apache.druid.metadata.SqlSegmentsMetadataManager - Polled and found 32,816 segments in the database

OK yeah I don’t understand that…!!

Sorry, I’ve not had a chance to try to replicate this yet.
I notice that your interval has a timezone specified – was that intentional? I wonder if there’s an issue when the compaction job specifies a timezone while Druid itself is working in UTC.
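
If you get a chance, it might be worth re-running the manual compaction with the interval expressed on UTC day boundaries and the timeZone key left out, just to test that theory. Here’s a sketch (untested) that covers the same UTC interval the overlord was trying to lock (2022-07-01T00:00:00.000Z/2022-07-03T00:00:00.000Z), with the rest of your tuning left as before:

{
	"type"        : "compact",
	"dataSource"  : "esp-rejects-prod",
	"ioConfig"    : {
		"type"     : "compact",
		"inputSpec": {
			"type"    : "interval",
			"interval": "2022-07-01T00:00:00Z/2022-07-03T00:00:00Z"
		}
	},
	"tuningConfig": {
		"type"                    : "index_parallel",
		"maxRowsPerSegment"       : 50000000,
		"maxNumConcurrentSubTasks": 1
	}
}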

Sorry for the late reply; summer vacation happened.

Yeah, that’s intentional: we import with P1D segment granularity to keep segment sizes up and to align with our prior system. That solved a big problem we had early on.

This is still ongoing; every compaction task fails.