Are you using index_parallel? How many files do you have? What is your worker capacity set to?
In general, there is no reason to set MaxNumSubTasks any higher than the total number of worker slots. You can work out the total worker slots across your middle managers as druid.worker.capacity x number of middle managers. As an example, let’s say druid.worker.capacity is set to 3 on each data node. Assuming 20 data nodes (the count these figures imply), MaxNumSubTasks could only usefully be set to 60 (3 x 20).
In Imply Cloud, druid.worker.capacity is set by default to use about 30% of resources, which leaves room for queries. Because we are focused on speeding up this load, let’s change that setting to 8 on each data node. That will allow you to raise MaxNumSubTasks to 160 (8 x 20).
If you can reduce the number of files to be ingested to about 160 (or a multiple thereof), that should give the most efficient load. Each file should be about 50-500 MB and ideally sorted by your _time column.
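To make the arithmetic above easy to rerun with your own numbers, here is a minimal Python sketch. The function names and the node count of 20 are assumptions for illustration (20 is simply what the 60 and 160 figures imply); nothing here calls Druid itself.

```python
# Hypothetical helpers for sizing MaxNumSubTasks; not part of any Druid API.

def total_worker_slots(worker_capacity: int, num_middle_managers: int) -> int:
    """Total slots = druid.worker.capacity x number of middle managers."""
    return worker_capacity * num_middle_managers


def target_file_counts(slots: int, multiples: int = 3) -> list[int]:
    """File counts that divide evenly across the slots (1x, 2x, 3x, ...)."""
    return [slots * k for k in range(1, multiples + 1)]


# Assumption: 20 data nodes, inferred from 3 x 20 = 60 and 8 x 20 = 160.
NUM_NODES = 20

current_slots = total_worker_slots(3, NUM_NODES)
raised_slots = total_worker_slots(8, NUM_NODES)

print(current_slots)                     # 60
print(raised_slots)                      # 160
print(target_file_counts(raised_slots))  # [160, 320, 480]
```

Swap in your own capacity and node count to get the ceiling for MaxNumSubTasks and a file count that keeps every slot busy.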