Mapper and Reducer complete to 100% then 5 minutes later job is killed. No error in logs.

I don’t even know where to begin providing details on this one.

Mapper runs for an hour and completes 100%. No errors.

Reducer runs for an hour and completes 100%. No errors.

Then the ApplicationMaster kills the reducer container and runs it all over again.

The only error details I get are:

AttemptID:attempt_1451522080526_0003_r_000000_1 Timed out after 300 secs Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143

It’s worth noting I did have to set the following args to get this far:


Hi Dash, I’m sorry to hear you’ve been having these problems. Everyone is on vacation right now, so none of the Hadoop experts are around to respond. I will ask someone to take a look when they get back.

As a quick note, Hadoop is kind of silly with regard to % complete.

The % complete actually means % STARTED, not % FINISHED.

The Druid reduce stage takes a long time and looks like it's stuck at 100% when really it is still reducing.
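To see why a job can die about five minutes after showing 100%, here is a toy model. It is a simplification, not Hadoop's actual code: it assumes the ApplicationMaster kills a task once it goes `timeout_s` seconds without a progress report, and that the reducer stops reporting after it has consumed all of its input (the point where it shows 100%) while it keeps working. The 60-minute read and 20-minute silent merge are made-up numbers.

```python
# Toy model (hypothetical, not Hadoop's implementation) of a progress timeout:
# a task is killed if it goes longer than timeout_s seconds without reporting.

def first_timeout(report_times, end_time, timeout_s):
    """Return the time (s) at which the task would be killed, or None.

    report_times: sorted times (s) at which the task reported progress.
    end_time: time (s) at which the task would finish on its own.
    """
    last = 0
    for t in [*report_times, end_time]:
        if t - last > timeout_s:
            return last + timeout_s  # killed timeout_s after the last report
        last = t
    return None  # finished without ever exceeding the timeout


# Reducer reports every 10 s while reading input and hits "100%" at t=3600,
# then spends 20 minutes merging without reporting, finishing at t=4800.
reports = list(range(10, 3601, 10))
print(first_timeout(reports, end_time=4800, timeout_s=300))   # 3900
print(first_timeout(reports, end_time=4800, timeout_s=1800))  # None
```

With a 300-second timeout the task is killed at t=3900, five minutes after it reached 100%, even though it would have finished at t=4800. Either shortening the silent phase or lengthening the timeout avoids the kill.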



You can work around this by decreasing your target partition row count, which will help the reduce phase complete faster, or by increasing Hadoop timeout settings like:
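A minimal sketch of where both knobs could go, assuming the relevant Hadoop setting is `mapreduce.task.timeout` (in milliseconds) and that your Druid version accepts `partitionsSpec.targetPartitionSize` and `jobProperties` in the Hadoop `tuningConfig` (field names have changed across Druid versions, so check the docs for yours). The numbers are placeholders, not recommendations.

```python
import json

# Placeholder values; tune for your data and cluster.
TIMEOUT_MS = 30 * 60 * 1000  # 30 minutes; mapreduce.task.timeout is in ms

tuning_config = {
    "type": "hadoop",
    "partitionsSpec": {
        "type": "hashed",
        # Fewer rows per partition -> more partitions/reducers, each doing less work.
        "targetPartitionSize": 2_000_000,
    },
    "jobProperties": {
        # How long a task may go without reporting progress before it is killed.
        "mapreduce.task.timeout": str(TIMEOUT_MS),
    },
}

# Print the fragment to paste into the indexing spec.
print(json.dumps({"tuningConfig": tuning_config}, indent=2))
```

The two changes are different in kind: a smaller target partition size shortens each reducer's work, while a longer timeout only tolerates a reducer that stays silent for a long time.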

You were right.

I tuned our row flush boundaries and target partition count and got more mappers and reducers to handle the crunching, which reduced the processing time considerably.

The Hadoop tasks now complete (including the indexer task).

Now I am just troubleshooting why our queries come back empty.

Almost there!

Figured out the empty queries: we ran out of disk space on our historical nodes. :slight_smile:

Added EBS and it’s all good! Woohoo!
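A simple way to watch for this is to check free space on the historicals' segment cache disk. A minimal Python sketch; the directory and the 10 GB threshold below are hypothetical placeholders, so substitute the path from your historical's `druid.segmentCache.locations` setting.

```python
import os
import shutil

# Hypothetical path: replace with a directory from druid.segmentCache.locations.
SEGMENT_CACHE_DIR = "/mnt/druid/segment-cache"


def has_room(path, needed_bytes):
    """True if the filesystem holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    if os.path.isdir(SEGMENT_CACHE_DIR):
        usage = shutil.disk_usage(SEGMENT_CACHE_DIR)
        print(f"free: {usage.free / 1e9:.1f} GB of {usage.total / 1e9:.1f} GB")
        # Hypothetical threshold: warn when less than 10 GB is free.
        if not has_room(SEGMENT_CACHE_DIR, 10 * 10**9):
            print("warning: segment cache disk is nearly full")
    else:
        print(f"{SEGMENT_CACHE_DIR} not found; set SEGMENT_CACHE_DIR for this host")
```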

Working Hadoop indexing and queries! :slight_smile: