Index_parallel job tuning

I’m experimenting with the index_parallel task for Druid ingestion, and I’m using the maxNumConcurrentSubTasks parameter to increase parallelism. Where can I find the logic for how the Overlord splits the task into sub-tasks? I assumed the job is split based on segment granularity, but when I check the ingestion spec for the partial_dimension_cardinality sub-tasks, the interval and paths are the same for all of them, identical to the parent task. How can I tell which sub-task is working on which interval?
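For context, this is roughly where the parameter sits in my tuningConfig (the values here are illustrative, not my real ones):

"tuningConfig": {
  "type": "index_parallel",
  "maxNumConcurrentSubTasks": 4,
  "partitionsSpec": { "type": "hashed" }
}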

Some of my sub-tasks are also failing. When I go to the indexer logs at the druid.indexer.logs.directory path and search for logs by task ID, I see only the following:

hdfs dfs -cat /dod_uat/indexing-logs/partial_dimension_cardinality_audience_krishna_jgpahpop_2021-11-03T19_05_32.704Z

Finished peon task

Where can I find the error logs to debug the failed tasks?

Druid version: 0.21.1

This is the only file related to that sub-task:

hdfs dfs -ls /dod_uat/indexing-logs | grep partial_dimension_cardinality_audience_krishna_jgpahpop_2021

-rw-r--r-- 3 dod_uat dod_uat 19 2021-11-03 19:05 /dod_uat/indexing-logs/partial_dimension_cardinality_audience_krishna_jgpahpop_2021-11-03T19_05_32.704Z

You can read more about how task logs are generated and where they go in the updated Task Logs docs.
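If the pushed log file is essentially empty, as in your listing, you can also try pulling the log and the task status directly from the Overlord API. A sketch, where OVERLORD_HOST:8090 is a placeholder for your Overlord and <taskId> is the failed sub-task’s ID:

curl "http://OVERLORD_HOST:8090/druid/indexer/v1/task/<taskId>/log"

curl "http://OVERLORD_HOST:8090/druid/indexer/v1/task/<taskId>/status"

The status response for a failed task usually includes a short errorMsg, and the parent task’s own log often records why each sub-task failed.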

As for which sub-tasks get which source data, you can see this in the “Payload” tab for a given task. Each input source splits things differently, e.g. by file.
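If you’d rather script it than use the console, the same payload should also be retrievable from the Overlord API, along these lines (placeholder host and port again):

curl "http://OVERLORD_HOST:8090/druid/indexer/v1/task/<subTaskId>"

For file-based input sources, the split assigned to a sub-task typically shows up in that payload as the concrete list of files it was handed, which is why the interval can look identical across sub-tasks even though the inputs differ.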

It therefore helps, I believe, to split your source files so that they line up closely with the partitions you’re aiming for.
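If re-chunking the source files isn’t practical, the parallel task’s splitHintSpec in the tuningConfig is another lever for how much input each sub-task reads. A sketch, assuming the maxSize hint available in 0.21.1 (the byte value is illustrative):

"tuningConfig": {
  "type": "index_parallel",
  "splitHintSpec": {
    "type": "maxSize",
    "maxSplitSize": 500000000
  }
}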