[druid-user] Hadoop Indexing tasks fail when more than one task submitted simultaneously

Is that exception from the middle manager (the agent process always
running on the box) or from the peon (the actual task process forked
by the middle manager)? The peon's log is the one you find through
the overlord console; the middle manager's log requires actually
looking at the file the middle manager writes to.

That exception seems to be saying that it was unable to deserialize an
Exception or error body of some sort. My guess is that the exception
you are seeing there is actually a symptom and not the actual problem.
I'm going to suspect that an exception is getting thrown, Druid is
trying to serialize that error message with Jackson somewhere and the
place that is deserializing it doesn't know what to do with the
payload and is thus generating that exception. If this is the case,
you *should* be able to inspect the task's directory on the middle
manager box and hopefully see a file with the results of the task
written into it (i.e. with the JSON serialized form of the original
exception).
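
One way to do that inspection is sketched below. It is only a rough helper: the function name `find_task_json` is made up for illustration, and it assumes nothing about what the files are called. It walks whatever task directory you point it at and tries to parse every `*.json` file it finds, so a serialized status or exception will show up if one was written.

```python
import json
from pathlib import Path


def find_task_json(task_dir):
    """Return {path: parsed JSON, or None if unparseable} for each *.json under task_dir.

    Hypothetical helper; the directory layout and file names are not assumed.
    """
    results = {}
    for path in sorted(Path(task_dir).rglob("*.json")):
        try:
            results[str(path)] = json.loads(path.read_text())
        except (OSError, ValueError):
            # Unreadable or not valid JSON; still report that the file exists.
            results[str(path)] = None
    return results
```

Calling `find_task_json("/path/to/task/dir")` (the path here is a placeholder for the failed task's directory on the middle manager box) gives a dictionary you can print or search for the original error message.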

I'm not aware of anything in the community version that uses etcd, so
I'm going to guess that you have an extension running that is
leveraging etcd for something and the problem is in that extension.
Given that you only see the issue when multiple jobs are running,
there's gotta be a race between the peon processes somewhere. My best
guess for why that could be happening is that perhaps your etcd client
is actually using JNI bindings and there's something busted when
multiple separate processes attempt to use the same shared object.

Even if my guess is wrong, hopefully that provides enough information
to track it down a bit more.

--Eric

Thanks, Eric, for the quick response and help. I checked my deployment again. When I run one instance of the middle manager, multiple jobs are queued and eventually succeed, but with multiple middle manager instances, all jobs fail. So this has something to do with my internal environment. I am debugging that further.

You should be able to run multiple tasks from a single middle manager as well. There’s a config setting for the number of task slots. Try running just one middle manager but with the number of slots set higher than one and see if that fails or not. If it fails, it’s something with the environment the peons are running in. If it doesn’t fail, it’s something with the environment of the middle managers.
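
For reference, the slot count is normally controlled by `druid.worker.capacity` in the middle manager's runtime properties. A minimal fragment for this experiment might look like the following; the value 2 is just an example, and the file location depends on how your deployment is laid out.

```properties
# Middle manager runtime.properties (sketch; example value only)
# Maximum number of tasks this middle manager will run at the same time.
druid.worker.capacity=2
```

Restart the middle manager after changing it, then submit two Hadoop indexing tasks at once so that both peons run side by side on the same box.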

--Eric