Druid batch ingestion: performance tuning

Can I get some guidance on tuning the performance of Druid batch ingestion for higher volumes of data?

We use Druid 0.15.1 on a cluster with 1 master node, 2 data nodes, and 1 query node.

For batch ingestion we observed the following performance (index_parallel task):

23 million record file loaded in 17 minutes

135 million record file loaded in 8 hours

Looking at the statistics above, performance seems to degrade at higher data volumes: the file is roughly 6 times larger, but the load took roughly 28 times longer. Any suggestions?
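To put a number on the degradation, the two runs can be compared as records per second. This is just arithmetic on the figures quoted above; the function name is made up for illustration.

```python
def records_per_second(records: int, seconds: float) -> float:
    """Average ingestion throughput for one load."""
    return records / seconds

# Figures from the post: 23M records in 17 minutes, 135M records in 8 hours.
small = records_per_second(23_000_000, 17 * 60)
large = records_per_second(135_000_000, 8 * 60 * 60)

print(f"23M file:  {small:,.0f} records/s")
print(f"135M file: {large:,.0f} records/s")
print(f"slowdown:  {small / large:.1f}x")
```

So the larger load ran at roughly a fifth of the throughput of the smaller one (about 4,700 versus about 22,500 records per second).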



You may want to increase maxNumSubTasks in the tuningConfig if you have enough task slots available, and use the index_parallel task type in Druid 0.15 if you are not using it already.
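As a rough sketch of what that change could look like, the snippet below builds the tuningConfig fragment of an index_parallel spec and prints it as JSON. The field name maxNumSubTasks is the one I recall from the 0.15 native batch docs, so please check it against the docs for your exact version; the helper name and the value 4 are only illustrative assumptions, and the value should be sized to your free task slots.

```python
import json

def parallel_tuning_config(max_num_sub_tasks: int) -> dict:
    """Build a tuningConfig fragment for an index_parallel task (sketch only)."""
    if max_num_sub_tasks < 1:
        raise ValueError("max_num_sub_tasks must be at least 1")
    return {
        "type": "index_parallel",
        # Assumed field name for Druid 0.15; verify for your version.
        "maxNumSubTasks": max_num_sub_tasks,
    }

# 4 is an illustrative value, not a recommendation.
tuning_config = parallel_tuning_config(4)
print(json.dumps({"tuningConfig": tuning_config}, indent=2))
```

The printed fragment would go alongside dataSchema and ioConfig inside the task's spec.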

Thanks and Regards,

Hi Vaibhav,

Could you please advise whether the input is split when it is a single huge file?

All the examples with maxNumSubTasks > 1 deal with multiple files (filename*).

I tried submitting the task with a higher maxNumSubTasks, but only 1 index_sub task gets created (2 tasks run in total: the first is the index_parallel task and the second is the index_sub task).

Note: we have 2 MiddleManager (MM) nodes, and no other tasks are currently running.
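One workaround that might be worth trying, assuming the firehose creates one split per input file (which would be consistent with a single file producing a single sub task): pre-split the large file into several smaller files so that a filter such as filename* matches more than one. Below is a minimal Python sketch. The function name, chunk naming scheme, and chunk size are hypothetical, and it assumes newline-delimited records with no header row.

```python
from pathlib import Path
from typing import List

def split_file(src: Path, out_dir: Path, lines_per_chunk: int) -> List[Path]:
    """Split a newline-delimited file into numbered chunk files.

    Assumes one record per line and no header row.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: List[Path] = []
    out = None
    try:
        with src.open("r", encoding="utf-8") as f:
            for i, line in enumerate(f):
                # Start a new chunk file every lines_per_chunk lines.
                if i % lines_per_chunk == 0:
                    if out is not None:
                        out.close()
                    path = out_dir / f"{src.stem}-{len(chunks):05d}{src.suffix}"
                    chunks.append(path)
                    out = path.open("w", encoding="utf-8")
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunks
```

For example, split_file(Path("data.json"), Path("chunks"), 5_000_000) would write chunks/data-00000.json, chunks/data-00001.json, and so on, which a filter like data-* could then pick up.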