Limits for number of input events for batch ingestion

Are you aware of any limit on the number of input events per day for Druid batch ingestion? I am using task-based ingestion. I checked this doc and couldn't find anything: http://druid.io/docs/latest/ingestion/tasks.html

Its parent doc mentions a related setting, but it doesn't say anything about a limit on the number of input events: http://druid.io/docs/latest/ingestion/batch-ingestion.html

maxRowsInMemory (Integer): The number of rows to aggregate before persisting. Note that this is the number of post-aggregation rows, which may not equal the number of input events due to roll-up. This is used to manage the required JVM heap size. Required: no (default == 75000)

I am trying to ingest 1,302,523 events in a single day. I also have theta sketches enabled for three dimensions, with the default value for size. I am getting the following exception, which (after some googling) means the GC is spending too much time while reclaiming very little memory:

2017-01-15T21:16:38,982 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Uncaught Throwable while running task[IndexTask{id=index_event_10_2017-01-15T20:26:28.015Z, type=index, dataSource=event_10}]
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOfRange(Arrays.java:3664) ~[?:1.8.0_74]
	at java.lang.String.<init>(String.java:207) ~[?:1.8.0_74]
	at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:330) ~[jackson-core-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:235) ~[jackson-core-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:441) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringMap(MapDeserializer.java:475) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:335) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:26) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3066) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2168) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.metamx.common.parsers.JSONPathParser.parse(JSONPathParser.java:99) ~[java-util-0.27.9.jar:?]
	at io.druid.data.input.impl.StringInputRowParser.parseString(StringInputRowParser.java:126) ~[druid-api-0.9.1.1.jar:0.9.1.1]
	at io.druid.data.input.impl.StringInputRowParser.parse(StringInputRowParser.java:131) ~[druid-api-0.9.1.1.jar:0.9.1.1]
	at io.druid.data.input.impl.FileIteratingFirehose.nextRow(FileIteratingFirehose.java:72) ~[druid-api-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:390) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:221) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_74]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]
2017-01-15T21:16:38,992 ERROR [main] io.druid.cli.CliPeon - Error when starting up.  Failing.
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.worker.executor.ExecutorLifecycle.join(ExecutorLifecycle.java:211) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.cli.CliPeon.run(CliPeon.java:287) [druid-services-0.9.1.1.jar:0.9.1.1]
	at io.druid.cli.Main.main(Main.java:105) [druid-services-0.9.1.1.jar:0.9.1.1]
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-16.0.1.jar:?]
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-16.0.1.jar:?]
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.worker.executor.ExecutorLifecycle.join(ExecutorLifecycle.java:208) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	... 2 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOfRange(Arrays.java:3664) ~[?:1.8.0_74]
	at java.lang.String.<init>(String.java:207) ~[?:1.8.0_74]
	at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:330) ~[jackson-core-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:235) ~[jackson-core-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:441) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringMap(MapDeserializer.java:475) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:335) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:26) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3066) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2168) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.metamx.common.parsers.JSONPathParser.parse(JSONPathParser.java:99) ~[java-util-0.27.9.jar:?]
	at io.druid.data.input.impl.StringInputRowParser.parseString(StringInputRowParser.java:126) ~[druid-api-0.9.1.1.jar:0.9.1.1]
	at io.druid.data.input.impl.StringInputRowParser.parse(StringInputRowParser.java:131) ~[druid-api-0.9.1.1.jar:0.9.1.1]
	at io.druid.data.input.impl.FileIteratingFirehose.nextRow(FileIteratingFirehose.java:72) ~[druid-api-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:390) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:221) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_74]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_74]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_74]
	at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_74]

Theta sketches have quite a large footprint relative to other columns. If you have three of them, you probably need to either increase your heap size or lower maxRowsInMemory relative to the default. It's fine for maxRowsInMemory to be less than the number of rows you're indexing; Druid will spill the excess to disk.
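If you go the maxRowsInMemory route, it is set in the tuningConfig of the index task spec. A rough sketch of the relevant fragment is below — the value 25000 is illustrative, not a recommendation, and note that some 0.9.x versions of the plain index task named this field rowFlushBoundary rather than maxRowsInMemory, so check the docs for the exact version you're running:

```json
{
  "type": "index",
  "spec": {
    "dataSchema": { "comment": "... your existing dataSchema with the theta-sketch metrics ..." },
    "ioConfig": { "comment": "... your existing firehose config ..." },
    "tuningConfig": {
      "type": "index",
      "maxRowsInMemory": 25000
    }
  }
}
```

Lowering this trades heap for more frequent spills (and thus more intermediate persists/merges), so if you'd rather keep ingestion fast, raising the peon heap (e.g. the -Xmx passed via druid.indexer.runner.javaOpts in the middle manager's runtime.properties) is the other lever.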