Index parallel tasks deadlock

Hey everybody!

I’m here because I’m facing the following problem and want to see if there is a solution for it, or at least a better workaround.

The situation is the following: at Jampp we have an aggregator system that periodically (a couple of times per hour) generates ORC files, which are ingested into Druid using batch inserts with `index_parallel` tasks and 2 concurrent subtasks.

We’ve run into situations where we post too many `index_parallel` jobs at once and Druid deadlocks: every task slot ends up occupied by a coordinating (parent) task, leaving no slots for the subtasks that do the actual processing, so none of the parents can ever finish. For example, with 10 task slots, 10 parent tasks fill the cluster and all of them wait forever on subtask slots that will never free up.

With this in mind, the following questions arose:

  1. Is there a way to set up a particular middle manager to exclusively handle these coordinating tasks (much like an exclusive CORE partition on a YARN cluster)? See the sketch after this list.
  2. Is there a way to prioritise the concurrent subtasks over the coordinating tasks, to avoid this deadlock?
  3. Is there a third option, that we are not seeing, that would allow us to prevent this deadlock?
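Regarding question 1, worker categories look like they might be one way to do this, though we haven’t validated it. The idea (a minimal sketch, not a tested setup) would be to start the dedicated middle manager with `druid.worker.category=parent-tier` and the rest with `druid.worker.category=data-tier`, then point the overlord’s worker select strategy at a category map via its dynamic config API. The tier names and the overlord URL below are made up:

```python
# Sketch only: route index_parallel "parent" tasks to a dedicated middle
# manager tier so they can never starve the subtask slots. Assumes the
# dedicated middle manager runs with druid.worker.category=parent-tier
# and the others with druid.worker.category=data-tier (hypothetical names).
import requests

OVERLORD = "http://overlord:8090"  # hypothetical

worker_config = {
    "selectStrategy": {
        "type": "equalDistributionWithCategorySpec",
        "workerCategorySpec": {
            # strong=True: tasks only run on workers of their mapped
            # category instead of spilling onto whatever is free.
            "strong": True,
            "categoryMap": {
                # Parent (coordinating) tasks go to the dedicated tier...
                "index_parallel": {"defaultCategory": "parent-tier"},
                # ...while the subtasks stay on the data tier.
                "single_phase_sub_task": {"defaultCategory": "data-tier"},
            },
        },
    }
}

# Push the new worker select strategy to the overlord's dynamic config.
resp = requests.post(f"{OVERLORD}/druid/indexer/v1/worker", json=worker_config)
resp.raise_for_status()
```

If that works the way we read the docs, a flood of parent tasks would queue up on the parent tier without ever eating the subtask slots.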

What we are currently doing is limiting the concurrency of posted jobs on the client side, but as the number of clients grows, coordinating that limit across all of them becomes more difficult. A rough sketch of this workaround is below.
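The client-side guard is roughly along these lines (simplified; `MAX_PARENTS` and the overlord URL are placeholders). Each client polls the overlord independently before posting, which is exactly why it stops scaling: the clients don’t coordinate with each other.

```python
# Sketch of the client-side limiter: count running parent tasks on the
# overlord and only post a new job when there is headroom. MAX_PARENTS
# and the overlord URL are placeholders.
import time
import requests

OVERLORD = "http://overlord:8090"  # hypothetical
MAX_PARENTS = 4                    # keep well below the total slot count

def running_parents() -> int:
    tasks = requests.get(f"{OVERLORD}/druid/indexer/v1/runningTasks").json()
    return sum(1 for t in tasks if t.get("type") == "index_parallel")

def submit_when_safe(ingestion_spec: dict) -> str:
    # Wait until the parent count leaves room for subtasks, then post.
    while running_parents() >= MAX_PARENTS:
        time.sleep(30)
    resp = requests.post(f"{OVERLORD}/druid/indexer/v1/task", json=ingestion_spec)
    resp.raise_for_status()
    return resp.json()["task"]  # task id assigned by the overlord
```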

Ingestion spec:

```json
{
  "type": "single_phase_sub_task",
  "id": "single_phase_sub_task_aperol_master_olkghdcn_2021-07-06T14:40:26.555Z",
  "groupId": "druid_master_7257368_0",
  "resource": {
    "availabilityGroup": "single_phase_sub_task_aperol_master_olkghdcn_2021-07-06T14:40:26.555Z",
    "requiredCapacity": 1
  },
  "supervisorTaskId": "druid_master_7257368_0",
  "numAttempts": 0,
  "spec": {
    "dataSchema": {
      "dataSource": "aperol_master",
      "timestampSpec": {
        "column": "created",
        "format": "millis",
        "missingValue": null
      },
      "dimensionsSpec": {
        "dimensions": [ ... ],
        "dimensionExclusions": [ ... ]
      },
      "metricsSpec": [ ... ]
    },
    "transformSpec": {
      "filter": null,
      "transforms": []
    }
  },
  "ioConfig": {
    "type": "index_parallel",
    "inputSource": {
      "type": "s3",
      "uris": null,
      "prefixes": null,
      "objects": [ ... ],
      "properties": null
    },
    "inputFormat": {
      "type": "orc",
      "flattenSpec": {
        "useFieldDiscovery": true,
        "fields": []
      }
    },
    "appendToExisting": true
  },
  "tuningConfig": {
    "type": "index_parallel",
    "maxRowsPerSegment": 5000000,
    "appendableIndexSpec": { "type": "onheap" },
    "maxRowsInMemory": 1000000,
    "maxBytesInMemory": 0,
    "maxTotalRows": null,
    "numShards": null,
    "splitHintSpec": null,
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000,
      "maxTotalRows": null
    },
    "indexSpec": {
      "bitmap": {
        "type": "roaring",
        "compressRunOnSerialization": true
      },
      "dimensionCompression": "lz4",
      "metricCompression": "lz4",
      "longEncoding": "longs",
      "segmentLoader": null
    },
    "indexSpecForIntermediatePersists": {
      "bitmap": {
        "type": "roaring",
        "compressRunOnSerialization": true
      },
      "dimensionCompression": "lz4",
      "metricCompression": "lz4",
      "longEncoding": "longs",
      "segmentLoader": null
    },
    "maxPendingPersists": 0,
    "forceGuaranteedRollup": false,
    "reportParseExceptions": true,
    "pushTimeout": 0,
    "segmentWriteOutMediumFactory": null,
    "maxNumConcurrentSubTasks": 2,
    "maxRetry": 5,
    "taskStatusCheckPeriodMs": 1000,
    "chatHandlerTimeout": "PT10S",
    "chatHandlerNumRetries": 5,
    "maxNumSegmentsToMerge": 100,
    "totalNumMergeTasks": 10,
    "logParseExceptions": true,
    "maxParseExceptions": 0,
    "maxSavedParseExceptions": 1,
    "maxColumnsToMerge": -1,
    "buildV9Directly": true,
    "partitionDimensions": []
  },
  "context": {
    "forceTimeChunkLock": false,
    "taskLockTimeout": 14400000
  },
  "dataSource": "aperol_master"
}
```

Relates to Apache Druid 0.21.0

Hmmm… maybe you could use a strong affinity for your tasks?
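If it helps, a minimal sketch of what that might look like, with hypothetical middle manager hostnames. One caveat: `affinityConfig` keys on datasource rather than task type, so it pins all of a datasource’s tasks (parents and subtasks alike) to the listed workers.

```python
# Sketch: strong affinity pins every task of a datasource to specific
# workers. Hostnames are hypothetical; this is posted to the same
# /druid/indexer/v1/worker dynamic config endpoint as the category
# sketch above.
affinity_config = {
    "selectStrategy": {
        "type": "equalDistribution",
        "affinityConfig": {
            "affinity": {
                # datasource -> list of middle manager host:port
                "aperol_master": ["mm-1:8091", "mm-2:8091"]
            },
            # strong=True: tasks for this datasource wait for these
            # workers instead of running elsewhere.
            "strong": True,
        },
    }
}
```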

Also, just to clarify – by “coordinating task”, do you mean the “parent” task that kicks off the index_parallel subtasks?