Hadoop indexer task not running on all nodes in EMR cluster

Hi,

I’m running the Hadoop Index task from the CLI with segmentGranularity=DAY for 1 day of data.

The job is utilising only a small fraction of the available cluster resources: 15 of 185 vcores and 51.91 GB of 493.50 GB of memory. (See attached.)

YARN cluster metrics (from the attached screenshot):

Apps Submitted: 2853
Apps Pending: 0
Apps Running: 1
Apps Completed: 2852
Containers Running: 15
Memory Used: 51.91 GB
Memory Total: 493.50 GB
Memory Reserved: 0 B
VCores Used: 15
VCores Total: 185
VCores Reserved: 0
Active Nodes: 23
Decommissioning Nodes: 0
Decommissioned Nodes: 620
Lost Nodes: 263
Unhealthy Nodes: 0
Rebooted Nodes: 0

In a nutshell, the number of final reducers is bounded by the ratio between the actual number of rows per segment after rollup and the target partition size.

So if, say, you have fewer rolled-up rows per day than the target partition size (75K by default, I think), you will most likely see only one reducer.
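As a rough illustration of that ratio, here is a minimal sketch. It assumes hashed partitioning where the shard count for one DAY time chunk is about ceil(rows_after_rollup / target_partition_size), with at least one shard. The function name and the row counts are hypothetical, and Druid's actual partition-determination job estimates row counts rather than counting them exactly.

```python
import math


def estimate_reducers(rows_after_rollup: int, target_partition_size: int) -> int:
    """Rough shard (final reducer) count for a single time chunk.

    Hypothetical helper: assumes shards ~= ceil(rows / target), minimum 1.
    """
    if target_partition_size <= 0:
        raise ValueError("target_partition_size must be positive")
    return max(1, math.ceil(rows_after_rollup / target_partition_size))


# Hypothetical rolled-up row counts for one day, compared at two target sizes.
for rows in (50_000, 1_000_000, 20_000_000):
    print(rows, estimate_reducers(rows, 75_000), estimate_reducers(rows, 5_000_000))
```

With a large target size, even millions of rolled-up rows collapse into a handful of reducers. That matches a cluster that sits mostly idle while one job runs.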

Hi Slim,

Thanks for your help!

My target partition size was too high. I reduced it and it works now!
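For anyone hitting the same issue, here is a minimal sketch of picking a smaller target size. It assumes you know (or can estimate) the rolled-up row count for the day and want roughly a given number of reducers. The output mirrors a Hadoop `tuningConfig.partitionsSpec` fragment using the `hashed` type and `targetPartitionSize` field from older Druid releases. Later releases may name this field differently, so check the docs for your version. The helper name and numbers are hypothetical.

```python
import json
import math


def partitions_spec_for(rows_after_rollup: int, desired_reducers: int) -> dict:
    """Hypothetical helper: build a hashed partitionsSpec fragment.

    Picks the smallest target size for which ceil(rows / target)
    does not exceed desired_reducers.
    """
    if desired_reducers <= 0:
        raise ValueError("desired_reducers must be positive")
    target = max(1, math.ceil(rows_after_rollup / desired_reducers))
    return {"type": "hashed", "targetPartitionSize": target}


# Hypothetical: about 20M rolled-up rows per day, aiming for about 15 reducers.
spec = partitions_spec_for(20_000_000, 15)
print(json.dumps({"partitionsSpec": spec}, indent=2))
```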