Re: [druid-user] Native batch indexing is slow?

Are you using index_parallel? How many files do you have? What is your worker capacity set to?

In general, there is no reason to set maxNumSubTasks any higher than the total number of worker slots. Total worker slots = druid.worker.capacity x the number of MiddleManagers. For example, if yours is set to 3 on each of 20 data nodes, that gives 60 slots, so maxNumSubTasks above 60 buys you nothing. In Imply Cloud, druid.worker.capacity defaults to roughly 30% of each node's resources, to leave room for queries. Since the goal here is to speed up this load, raise it to 8 on each data node; with 20 nodes that lets you raise maxNumSubTasks to 160. If you can also arrange the input into about 160 files (or a multiple of 160), each around 50-500 MB and ideally sorted by the __time column, you should get the most efficient load.
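
To make that concrete, here is a minimal sketch of the relevant tuningConfig fragment for an index_parallel task (the 160 is just the worked example above; in recent Druid versions this parameter was renamed maxNumConcurrentSubTasks, so check which one your version uses):

```json
{
  "type": "index_parallel",
  "tuningConfig": {
    "type": "index_parallel",
    "maxNumSubTasks": 160
  }
}
```

The capacity side goes in each MiddleManager's runtime.properties, e.g. druid.worker.capacity=8, followed by a MiddleManager restart for it to take effect.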