Out of memory error while parsing 59MB file using index overlord

Hi,

I am getting the error below. Indexing works correctly for other dates but fails for one day's data.

Please let me know if I need to change any config settings.

2016-01-23T16:31:52,192 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,219 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,220 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,220 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,220 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,221 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,221 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,222 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,223 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,224 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,225 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,226 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,227 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,227 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,227 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,228 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,228 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,229 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,229 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,229 INFO [main] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090
2016-01-23T16:31:52,273 INFO [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Running task: index_spini_production_v3_2016-01-23T16:31:45.895Z
2016-01-23T16:31:52,274 INFO [task-runner-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_spini_production_v3_2016-01-23T16:31:45.895Z]: LockListAction{}
2016-01-23T16:31:52,279 INFO [task-runner-0] io.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_spini_production_v3_2016-01-23T16:31:45.895Z] to overlord[http://ip-172-31-1-79.ap-southeast-1.compute.internal:8090/druid/indexer/v1/action]: LockListAction{}
2016-01-23T16:31:52,284 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Searching for all [vvxzzprodutionDataid95bc6758.json] in and beneath [/home/ec2-user/applications/druid-0.8.1/real/indexing]
2016-01-23T16:31:52,294 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Found files: [/home/ec2-user/applications/druid-0.8.1/real/indexing/vvxzzprodutionDataid95bc6758.json]
2016-01-23T16:31:54,001 INFO [task-runner-0] io.druid.indexing.common.task.IndexTask - Determining partitions for interval[2015-11-22T00:00:00.000Z/2015-11-23T00:00:00.000Z] with targetPartitionSize[5000000]
2016-01-23T16:31:54,019 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Searching for all [vvxzzprodutionDataid95bc6758.json] in and beneath [/home/ec2-user/applications/druid-0.8.1/real/indexing]
2016-01-23T16:31:54,020 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Found files: [/home/ec2-user/applications/druid-0.8.1/real/indexing/vvxzzprodutionDataid95bc6758.json]
2016-01-23T16:31:56,381 INFO [task-runner-0] io.druid.indexing.common.task.IndexTask - Estimated approximately [25,177.106839] rows of data.
2016-01-23T16:31:56,381 INFO [task-runner-0] io.druid.indexing.common.task.IndexTask - Will require [1] shard(s).
2016-01-23T16:31:56,383 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Searching for all [vvxzzprodutionDataid95bc6758.json] in and beneath [/home/ec2-user/applications/druid-0.8.1/real/indexing]
2016-01-23T16:31:56,383 INFO [task-runner-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Found files: [/home/ec2-user/applications/druid-0.8.1/real/indexing/vvxzzprodutionDataid95bc6758.json]
2016-01-23T16:34:03,570 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 20953ms for sessionid 0x151cd44273903e2, closing socket connection and attempting reconnect
2016-01-23T16:34:23,224 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
2016-01-23T16:34:27,577 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-01-23T16:34:44,746 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to localhost/127.0.0.1:2181, initiating session
2016-01-23T16:35:37,846 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 48340ms for sessionid 0x151cd44273903e2, closing socket connection and attempting reconnect
2016-01-23T16:36:13,131 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-01-23T16:36:23,771 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to localhost/127.0.0.1:2181, initiating session
2016-01-23T16:36:45,324 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x151cd44273903e2 has expired, closing socket connection
2016-01-23T16:36:52,134 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: LOST
2016-01-23T16:36:55,041 WARN [main-EventThread] org.apache.curator.ConnectionState - Session expired event received
2016-01-23T16:37:16,404 INFO [Announcer-0] io.druid.curator.announcement.Announcer - Node[/druid/announcements/ip-172-31-1-79.ap-southeast-1.compute.internal:8100] is added to reinstate.
Exception in thread "HttpClient-Netty-Boss-0" java.lang.OutOfMemoryError: Java heap space
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:340)
	at org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
	at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
	at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
2016-01-23T16:37:52,144 ERROR [ServerInventoryView-0] org.apache.curator.ConnectionState - Connection timed out for connection string (localhost) and timeout (15000) / elapsed (19840)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) [curator-client-2.8.0.jar:?]
	at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87) [curator-client-2.8.0.jar:?]
	at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [curator-client-2.8.0.jar:?]
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:806) [curator-framework-2.8.0.jar:?]
	at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:490) [curator-framework-2.8.0.jar:?]
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:187) [curator-framework-2.8.0.jar:?]
	at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:38) [curator-framework-2.8.0.jar:?]
	at org.apache.curator.framework.recipes.cache.PathChildrenCache.refresh(PathChildrenCache.java:502) [curator-recipes-2.8.0.jar:?]
	at org.apache.curator.framework.recipes.cache.RefreshOperation.invoke(RefreshOperation.java:35) [curator-recipes-2.8.0.jar:?]
	at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:759) [curator-recipes-2.8.0.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_91]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_91]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [?:1.7.0_91]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_91]
	at java.lang.Thread.run(Thread.java:745) [?:1.7.0_91]
2016-01-23T16:37:52,142 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Uncaught Throwable while running task[IndexTask{id=index_spini_production_v3_2016-01-23T16:31:45.895Z, type=index, dataSource=spini_production_v3}]
java.lang.OutOfMemoryError: Java heap space
	at java.util.LinkedHashMap.createEntry(LinkedHashMap.java:442) ~[?:1.7.0_91]
	at java.util.HashMap.addEntry(HashMap.java:884) ~[?:1.7.0_91]
	at java.util.LinkedHashMap.addEntry(LinkedHashMap.java:427) ~[?:1.7.0_91]
	at java.util.HashMap.put(HashMap.java:505) ~[?:1.7.0_91]
	at com.fasterxml.jackson.databind.node.ObjectNode.replace(ObjectNode.java:368) ~[jackson-databind-2.4.4.jar:2.4.4]
	at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:241) ~[jackson-databind-2.4.4.jar:2.4.4]
	at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:62) ~[jackson-databind-2.4.4.jar:2.4.4]
	at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:14) ~[jackson-databind-2.4.4.jar:2.4.4]
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3066) ~[jackson-databind-2.4.4.jar:2.4.4]
	at com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:1833) ~[jackson-databind-2.4.4.jar:2.4.4]
	at com.metamx.common.parsers.JSONParser.parse(JSONParser.java:115) ~[java-util-0.27.0.jar:?]
	at io.druid.data.input.impl.StringInputRowParser.parseString(StringInputRowParser.java:86) ~[druid-api-0.3.9.jar:0.3.9]
	at io.druid.data.input.impl.StringInputRowParser.parse(StringInputRowParser.java:91) ~[druid-api-0.3.9.jar:0.3.9]
	at io.druid.data.input.impl.FileIteratingFirehose.nextRow(FileIteratingFirehose.java:54) ~[druid-api-0.3.9.jar:0.3.9]
	at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:364) ~[druid-indexing-service-0.8.1.jar:0.8.1]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:205) ~[druid-indexing-service-0.8.1.jar:0.8.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235) [druid-indexing-service-0.8.1.jar:0.8.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214) [druid-indexing-service-0.8.1.jar:0.8.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_91]
	at java.lang.Thread.run(Thread.java:745) [?:1.7.0_91]

Hi Aman,

Index tasks are run by peons spawned by the middle manager. How did you configure your peons? They need sufficient heap to perform the indexing. Recommended reading: http://druid.io/docs/0.8.2/configuration/production-cluster.html

# Resources for peons
druid.indexer.runner.javaOpts=-server -Xmx3g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

I'm running Druid 0.8.3 RC3 + Tranquility and find that this setting is not necessary; I'm not sure whether there are default values. 59 MB should not be a problem at all.
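
If it is unclear whether a heap setting such as -Xmx3g is actually taking effect, one rough way to check is to start a small JVM with the same options and print the maximum heap it reports. The sketch below is only an illustration: HeapCheck is a hypothetical helper class, not part of Druid, and the value returned by Runtime.maxMemory() is typically close to, but not exactly, the -Xmx value.

```java
// Hypothetical helper (not part of Druid): reports the maximum heap
// the current JVM is willing to use, as seen by Runtime.maxMemory().
public class HeapCheck {

    // Maximum heap in MiB for the JVM this code runs in.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with the same flags you pass to the peons,
        // e.g. java -server -Xmx3g HeapCheck
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it once without any -Xmx flag and once with the flags from druid.indexer.runner.javaOpts shows the difference between the JVM default on that machine and the configured peon heap.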

Thanks Shuai,

I forgot to update: it got resolved as per your suggestion.