Task runner running out of memory

I am running the overlord in local mode, feeding it the small sample wikipedia data and running Hadoop indexing over it. However, the task repeatedly runs out of memory.
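For context, the task is an ordinary index_hadoop task submitted to the overlord's HTTP task endpoint, roughly like the following (the spec file name here is just an illustrative placeholder, not my exact file):

curl -X POST -H 'Content-Type: application/json' -d @wikipedia_index_hadoop_task.json http://localhost:8090/druid/indexer/v1/task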


2016-02-29T07:07:50,065 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Running job: job_1455808662548_0024
2016-02-29T07:07:53,343 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Job job_1455808662548_0024 running in uber mode : false
2016-02-29T07:07:53,345 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - map 0% reduce 0%
2016-02-29T07:08:08,171 INFO [task-runner-0] org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-02-29T07:08:09,050 INFO [task-runner-0] org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-02-29T07:08:09,863 INFO [task-runner-0] org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server

Exception in thread "task-runner-0"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "task-runner-0"
2016-02-29T07:16:16,434 INFO [main-SendThread(172.19.39.25:2181)] org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 20105ms for sessionid 0x151e924c2369460, closing socket connection and attempting reconnect
2016-02-29T07:16:19,866 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
2016-02-29T07:16:20,414 INFO [main-SendThread(172.19.39.25:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server 172.19.39.25/172.19.39.25:2181. Will not attempt to authenticate using SASL (unknown error)

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main-EventThread"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main-SendThread(172.19.39.25:2181)"

The Hadoop job itself shows that it completed successfully.

Here are the overlord configs:

# Default host: localhost. Default port: 8090. If you run each node type on its own node in production, you should override these values to be IP:8080
druid.host=localhost
druid.port=8090
druid.service=overlord

# Run the overlord in local mode with a single peon to execute tasks
# This is not recommended for production.
druid.indexer.queue.startDelay=PT0M

# This setting is too small for real production workloads
druid.indexer.runner.javaOpts=-server -Xmx1g

# These settings are also too small for real production workloads
# Please see our recommended production settings in the docs (http://druid.io/docs/latest/Production-Cluster-Configuration.html)
druid.indexer.fork.property.druid.processing.numThreads=1
druid.indexer.fork.property.druid.computation.buffer.size=100000000
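As I understand it, the forked peon's heap comes from druid.indexer.runner.javaOpts above, so presumably it could be raised with something like the following (the 2g value is just an illustration, not a documented recommendation):

druid.indexer.runner.javaOpts=-server -Xmx2g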


I run the overlord with the following command:

sudo java -server -Xmx3g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath /home/rishi.mi/DruidConfigs/config/_common:/home/rishi.mi/DruidConfigs/config/overlord:lib/*:/home/rishi.mi/HadoopConfigs io.druid.cli.Main server overlord
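Since the local-mode overlord forks the task into its own peon JVM, something like the following should show what heap the task process actually received while it is running (assuming the peon shows up as a separate io.druid.cli.Main process):

ps -ef | grep io.druid.cli.Main | grep peon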

I tried Druid 0.8.2 and there was no problem with the Hadoop indexing. The problem only comes up in Druid 0.8.3.

I found that the problem only occurred when I built from the druid-0.8.3 tag. I tried the druid-0.8.3rc-1 and druid-0.8.3rc-4 tags and the problem did not occur with either of them, so I may end up using the overlord built from the rc4 tag.

I believe rc4 is identical to the stable release, so this problem is probably the result of an improper configuration on my side.
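If I wanted to verify that, a tag-to-tag diff of the two builds should show whether anything actually changed between them (tag names here are as I wrote them above and may need adjusting to match the repository's exact tags):

git diff --stat druid-0.8.3rc-4 druid-0.8.3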