Unable to do batch ingestion on a small machine for testing purposes

Hello,

I’m trying to do batch ingestion because real-time ingestion doesn’t let me insert a year-long dataset due to the windowPeriod.

Our test dataset is quite small (100k rows), but we’re getting “not enough memory” errors (I’m working inside a VM with 4GB of RAM). I’ve tried to tune some parameters to use less memory, but I’m not touching the right keys.

This is my “druid start script” (I know it’s not very clean or production-ready):

#!/usr/bin/env bash

cd /home/vagrant/DruidWorkspace

# Zookeeper
/opt/zookeeper/bin/zkServer.sh start /opt/zookeeper/conf/zoo.cfg > /home/vagrant/logs/zookeeper.log 2>&1 &

# Coordinator Node
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath /opt/druid/config/_common:/opt/druid/config/coordinator:/opt/druid/lib/*:/opt/druid/extensions-repo/*:/home/vagrant/DruidWorkspace/extensions-repo/* io.druid.cli.Main server coordinator > /home/vagrant/logs/druid_coordinator.log 2>&1 &

# Historical Node
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath /opt/druid/config/_common:/opt/druid/config/historical:/opt/druid/lib/*:/opt/druid/extensions-repo/*:/home/vagrant/DruidWorkspace/extensions-repo/* io.druid.cli.Main server historical > /home/vagrant/logs/druid_historical.log 2>&1 &

# Broker Node
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath /opt/druid/config/_common:/opt/druid/config/broker:/opt/druid/lib/*:/opt/druid/extensions-repo/*:/home/vagrant/DruidWorkspace/extensions-repo/* io.druid.cli.Main server broker > /home/vagrant/logs/druid_broker.log 2>&1 &

# Overlord Node
java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.realtime.specFile=/vagrant/druidSpecs/prototype.spec -classpath /opt/druid/config/_common:/opt/druid/config/overlord:/opt/druid/lib/*:/opt/druid/extensions-repo/*:/home/vagrant/DruidWorkspace/extensions-repo/* io.druid.cli.Main server overlord > /home/vagrant/logs/druid_overlord.log 2>&1  &

# RealTime Node
java -Xmx512m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.realtime.specFile=/vagrant/druidSpecs/prototype.spec -classpath /opt/druid/config/_common:/opt/druid/config/realtime:/opt/druid/lib/*:/opt/druid/extensions-repo/*:/home/vagrant/DruidWorkspace/extensions-repo/* io.druid.cli.Main server realtime > /home/vagrant/logs/druid_realtime.log 2>&1 &

# MiddleManager Node
java -Xmx64m -Xms64m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath /opt/druid/config/_common:/opt/druid/config/middleManager:/opt/druid/lib/*:/opt/druid/extensions-repo/*:/home/vagrant/DruidWorkspace/extensions-repo/* io.druid.cli.Main server middleManager > /home/vagrant/logs/druid_middleManager.log 2>&1 &

# Tranquility
/opt/tranquility/bin/tranquility server -configFile /opt/tranquility/conf/server.json > /home/vagrant/logs/tranquility.log 2>&1 &

This is the error log, found through the Overlord HTTP panel:

···
[io.druid.server.log.NoopRequestLoggerProvider@68406bdf]
2016-03-04T10:01:06,253 ERROR [main] io.druid.cli.CliPeon - Error when starting up.  Failing.
com.google.inject.ProvisionException: Guice provision errors:

1) **Not enough direct memory**.  **Please adjust -XX:MaxDirectMemorySize**, druid.processing.buffer.sizeBytes, or druid.processing.numThreads: **maxDirectMemory[924,844,032], memoryNeeded[4,294,967,296]** __= druid.processing.buffer.sizeBytes[1,073,741,824] * ( druid.processing.numThreads[3] + 1 )__
  at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:106)
  at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:106)
  while locating io.druid.collections.StupidPool<java.nio.ByteBuffer> annotated with @io.druid.guice.annotations.Global()
    for parameter 1 at io.druid.query.groupby.GroupByQueryEngine.<init>(GroupByQueryEngine.java:75)
  at io.druid.guice.QueryRunnerFactoryModule.configure(QueryRunnerFactoryModule.java:83)
  while locating io.druid.query.groupby.GroupByQueryEngine
    for parameter 0 at io.druid.query.groupby.GroupByQueryRunnerFactory.<init>(GroupByQueryRunnerFactory.java:79)
  at io.druid.guice.QueryRunnerFactoryModule.configure(QueryRunnerFactoryModule.java:80)
  while locating io.druid.query.groupby.GroupByQueryRunnerFactory
  while locating io.druid.query.QueryRunnerFactory annotated with @com.google.inject.multibindings.Element(setName=,uniqueId=26, type=MAPBINDER)
  at io.druid.guice.DruidBinders.queryRunnerFactoryBinder(DruidBinders.java:36)
  while locating java.util.Map<java.lang.Class<? extends io.druid.query.Query>, io.druid.query.QueryRunnerFactory>
    for parameter 0 at io.druid.query.DefaultQueryRunnerFactoryConglomerate.<init>(DefaultQueryRunnerFactoryConglomerate.java:34)
  while locating io.druid.query.DefaultQueryRunnerFactoryConglomerate
  at io.druid.guice.StorageNodeModule.configure(StorageNodeModule.java:53)
  while locating io.druid.query.QueryRunnerFactoryConglomerate
    for parameter 9 at io.druid.indexing.common.TaskToolboxFactory.<init>(TaskToolboxFactory.java:83)
  at io.druid.cli.CliPeon$1.configure(CliPeon.java:138)
  while locating io.druid.indexing.common.TaskToolboxFactory
    for parameter 0 at io.druid.indexing.overlord.ThreadPoolTaskRunner.<init>(ThreadPoolTaskRunner.java:76)
  at io.druid.cli.CliPeon$1.configure(CliPeon.java:164)
  while locating io.druid.indexing.overlord.ThreadPoolTaskRunner
  while locating io.druid.query.QuerySegmentWalker
    for parameter 3 at io.druid.server.QueryResource.<init>(QueryResource.java:90)
  while locating io.druid.server.QueryResource

2) **Not enough direct memory.  Please adjust -XX:MaxDirectMemorySize**, druid.processing.buffer.sizeBytes, or druid.processing.numThreads: __maxDirectMemory[924,844,032], memoryNeeded[4,294,967,296] = druid.processing.buffer.sizeBytes[1,073,741,824] * ( druid.processing.numThreads[3] + 1 )__
  at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:106)
  at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:106)
  while locating io.druid.collections.StupidPool<java.nio.ByteBuffer> annotated with @io.druid.guice.annotations.Global()
    for parameter 1 at io.druid.query.groupby.GroupByQueryEngine.<init>(GroupByQueryEngine.java:75)
  at io.druid.guice.QueryRunnerFactoryModule.configure(QueryRunnerFactoryModule.java:83)
  while locating io.druid.query.groupby.GroupByQueryEngine
    for parameter 2 at io.druid.query.groupby.GroupByQueryQueryToolChest.<init>(GroupByQueryQueryToolChest.java:113)
  at io.druid.guice.QueryToolChestModule.configure(QueryToolChestModule.java:72)
  while locating io.druid.query.groupby.GroupByQueryQueryToolChest
    for parameter 3 at io.druid.query.groupby.GroupByQueryRunnerFactory.<init>(GroupByQueryRunnerFactory.java:79)
  at io.druid.guice.QueryRunnerFactoryModule.configure(QueryRunnerFactoryModule.java:80)
  while locating io.druid.query.groupby.GroupByQueryRunnerFactory
  while locating io.druid.query.QueryRunnerFactory annotated with @com.google.inject.multibindings.Element(setName=,uniqueId=26, type=MAPBINDER)
  at io.druid.guice.DruidBinders.queryRunnerFactoryBinder(DruidBinders.java:36)
  while locating java.util.Map<java.lang.Class<? extends io.druid.query.Query>, io.druid.query.QueryRunnerFactory>
    for parameter 0 at io.druid.query.DefaultQueryRunnerFactoryConglomerate.<init>(DefaultQueryRunnerFactoryConglomerate.java:34)
  while locating io.druid.query.DefaultQueryRunnerFactoryConglomerate
  at io.druid.guice.StorageNodeModule.configure(StorageNodeModule.java:53)
  while locating io.druid.query.QueryRunnerFactoryConglomerate
    for parameter 9 at io.druid.indexing.common.TaskToolboxFactory.<init>(TaskToolboxFactory.java:83)
  at io.druid.cli.CliPeon$1.configure(CliPeon.java:138)
  while locating io.druid.indexing.common.TaskToolboxFactory
    for parameter 0 at io.druid.indexing.overlord.ThreadPoolTaskRunner.<init>(ThreadPoolTaskRunner.java:76)
  at io.druid.cli.CliPeon$1.configure(CliPeon.java:164)
  while locating io.druid.indexing.overlord.ThreadPoolTaskRunner
  while locating io.druid.query.QuerySegmentWalker
    for parameter 3 at io.druid.server.QueryResource.<init>(QueryResource.java:90)
  while locating io.druid.server.QueryResource

3) Not enough direct memory.  Please adjust -XX:MaxDirectMemorySize, druid.processing.buffer.sizeBytes, or druid.processing.numThreads: maxDirectMemory[924,844,032], memoryNeeded[4,294,967,296] = druid.processing.buffer.sizeBytes[1,073,741,824] * ( druid.processing.numThreads[3] + 1 )
  at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:106)
  at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:106)
  while locating io.druid.collections.StupidPool<java.nio.ByteBuffer> annotated with @io.druid.guice.annotations.Global()
    for parameter 3 at io.druid.query.groupby.GroupByQueryQueryToolChest.<init>(GroupByQueryQueryToolChest.java:113)
  at io.druid.guice.QueryToolChestModule.configure(QueryToolChestModule.java:72)
  while locating io.druid.query.groupby.GroupByQueryQueryToolChest
    for parameter 3 at io.druid.query.groupby.GroupByQueryRunnerFactory.<init>(GroupByQueryRunnerFactory.java:79)
  at io.druid.guice.QueryRunnerFactoryModule.configure(QueryRunnerFactoryModule.java:80)
  while locating io.druid.query.groupby.GroupByQueryRunnerFactory
  while locating io.druid.query.QueryRunnerFactory annotated with @com.google.inject.multibindings.Element(setName=,uniqueId=26, type=MAPBINDER)
  at io.druid.guice.DruidBinders.queryRunnerFactoryBinder(DruidBinders.java:36)
  while locating java.util.Map<java.lang.Class<? extends io.druid.query.Query>, io.druid.query.QueryRunnerFactory>
    for parameter 0 at io.druid.query.DefaultQueryRunnerFactoryConglomerate.<init>(DefaultQueryRunnerFactoryConglomerate.java:34)
  while locating io.druid.query.DefaultQueryRunnerFactoryConglomerate
  at io.druid.guice.StorageNodeModule.configure(StorageNodeModule.java:53)
  while locating io.druid.query.QueryRunnerFactoryConglomerate
    for parameter 9 at io.druid.indexing.common.TaskToolboxFactory.<init>(TaskToolboxFactory.java:83)
  at io.druid.cli.CliPeon$1.configure(CliPeon.java:138)
  while locating io.druid.indexing.common.TaskToolboxFactory
    for parameter 0 at io.druid.indexing.overlord.ThreadPoolTaskRunner.<init>(ThreadPoolTaskRunner.java:76)
  at io.druid.cli.CliPeon$1.configure(CliPeon.java:164)
  while locating io.druid.indexing.overlord.ThreadPoolTaskRunner
  while locating io.druid.query.QuerySegmentWalker
    for parameter 3 at io.druid.server.QueryResource.<init>(QueryResource.java:90)
  while locating io.druid.server.QueryResource

4) Not enough direct memory.  Please adjust -XX:MaxDirectMemorySize, druid.processing.buffer.sizeBytes, or druid.processing.numThreads: maxDirectMemory[924,844,032], memoryNeeded[4,294,967,296] = druid.processing.buffer.sizeBytes[1,073,741,824] * ( druid.processing.numThreads[3] + 1 )
  at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:106)
  at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:106)
  while locating io.druid.collections.StupidPool<java.nio.ByteBuffer> annotated with @io.druid.guice.annotations.Global()
    for parameter 4 at io.druid.query.groupby.GroupByQueryRunnerFactory.<init>(GroupByQueryRunnerFactory.java:79)
  at io.druid.guice.QueryRunnerFactoryModule.configure(QueryRunnerFactoryModule.java:80)
  while locating io.druid.query.groupby.GroupByQueryRunnerFactory
  while locating io.druid.query.QueryRunnerFactory annotated with @com.google.inject.multibindings.Element(setName=,uniqueId=26, type=MAPBINDER)
  at io.druid.guice.DruidBinders.queryRunnerFactoryBinder(DruidBinders.java:36)
  while locating java.util.Map<java.lang.Class<? extends io.druid.query.Query>, io.druid.query.QueryRunnerFactory>
    for parameter 0 at io.druid.query.DefaultQueryRunnerFactoryConglomerate.<init>(DefaultQueryRunnerFactoryConglomerate.java:34)
  while locating io.druid.query.DefaultQueryRunnerFactoryConglomerate
  at io.druid.guice.StorageNodeModule.configure(StorageNodeModule.java:53)
  while locating io.druid.query.QueryRunnerFactoryConglomerate
    for parameter 9 at io.druid.indexing.common.TaskToolboxFactory.<init>(TaskToolboxFactory.java:83)
  at io.druid.cli.CliPeon$1.configure(CliPeon.java:138)
  while locating io.druid.indexing.common.TaskToolboxFactory
    for parameter 0 at io.druid.indexing.overlord.ThreadPoolTaskRunner.<init>(ThreadPoolTaskRunner.java:76)
  at io.druid.cli.CliPeon$1.configure(CliPeon.java:164)
  while locating io.druid.indexing.overlord.ThreadPoolTaskRunner
  while locating io.druid.query.QuerySegmentWalker
    for parameter 3 at io.druid.server.QueryResource.<init>(QueryResource.java:90)
  while locating io.druid.server.QueryResource

4 errors
	at com.google.inject.internal.InjectorImpl$3.get(InjectorImpl.java:1014) ~[guice-4.0-beta.jar:?]
	at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1036) ~[guice-4.0-beta.jar:?]
	at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:153) ~[druid-api-0.3.13.jar:0.8.3]
	at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:71) [druid-services-0.8.3.jar:0.8.3]
	at io.druid.cli.CliPeon.run(CliPeon.java:232) [druid-services-0.8.3.jar:0.8.3]
	at io.druid.cli.Main.main(Main.java:99) [druid-services-0.8.3.jar:0.8.3]

I’ve highlighted the parts of the log that caught my attention. Following the not-very-precise instructions in the error message, I’ve tried to:

  • Pass the -XX:MaxDirectMemorySize parameter to every one of the nodes started in my script, and do the same with the -Xmx option.
  • Modify some settings to reduce the druid.processing.buffer.sizeBytes parameter.

But I think the settings I’m touching are not the right ones. The stack trace refers to a “Peon”, which I imagine is a dynamically started node that isn’t picking up any of the realtime, broker, historical, overlord or middleManager settings… I guess, well, I’m not very sure ^^U.
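To be concrete, this is the kind of change I tried on the historical node line (the exact sizes here are just examples of values I experimented with):

java -Xmx256m -XX:MaxDirectMemorySize=2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath /opt/druid/config/_common:/opt/druid/config/historical:/opt/druid/lib/*:/opt/druid/extensions-repo/*:/home/vagrant/DruidWorkspace/extensions-repo/* io.druid.cli.Main server historical > /home/vagrant/logs/druid_historical.log 2>&1 &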

Any idea? Thank you for your time!

Ouch, I forgot to post the task spec:

{
    "type" : "index",
    "spec" : {
        "dataSchema" : {
            "dataSource": "sessions",
            "parser": {
                "type": "string",
                "parseSpec": {
                    "format" : "json",
                    "timestampSpec" : {
                        "column" : "start_ts",
                        "format" : "millis"
                    },
                    "dimensionsSpec" : {
                        "dimensions": ["result", "profile", "actions", "tags", "extra_data"],
                        "dimensionExclusions" : ["id"],
                        "spatialDimensions" : []
                    }
                }
            },
            "metricsSpec":[
                {
                    "type" : "count",
                    "name" : "sum_sessions"
                },
                {
                    "type": "longSum",
                    "name": "sum_queries",
                    "fieldName": "sum_queries"
                },
                {
                    "type": "longSum",
                    "name": "sum_clicks",
                    "fieldName": "sum_clicks"
                },
                {
                    "type": "longMax",
                    "name": "max_queries",
                    "fieldName": "max_queries"
                },
                {
                    "type": "longMax",
                    "name": "max_clicks",
                    "fieldName": "max_clicks"
                },
                {
                    "type": "longMax",
                    "name": "max_duration",
                    "fieldName": "max_duration"
                },
                {
                    "type": "longSum",
                    "name": "sum_duration",
                    "fieldName": "sum_duration"
                }
            ],
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "DAY",
                "queryGranularity": "fifteen_minute",
                "intervals" : [ "2015-01-01T00:00:00.000/2015-12-31T23:59:59.999" ]
            }
        },
        "ioConfig" : {
            "type" : "index",
            "firehose" : {
                "type" : "local",
                "baseDir" : "/home/vagrant/batch/",
                "filter" : "fake_dataset.json"
            }
        },
        "tuningConfig" : {
            "type" : "index",
            "targetPartitionSize" : 0,
            "rowFlushBoundary" : 0
        }
    }
}

I also forgot to describe how I’m queuing the task:

curl -X 'POST' -H 'Content-Type:application/json' -d @/vagrant/druidSpecs/batchTask.spec "http://localhost:8090/druid/indexer/v1/task"

Hey acorrea,

This message:

“Not enough direct memory. Please adjust -XX:MaxDirectMemorySize, druid.processing.buffer.sizeBytes, or druid.processing.numThreads: maxDirectMemory[924,844,032], memoryNeeded[4,294,967,296] = druid.processing.buffer.sizeBytes[1,073,741,824] * ( druid.processing.numThreads[3] + 1 )”

means that you don’t have enough direct memory available to allocate the off-heap buffers the service needs. If you are running with 4GB of RAM, try some more conservative settings for druid.processing.buffer.sizeBytes and druid.processing.numThreads. Perhaps:

druid.processing.buffer.sizeBytes=256000000
druid.processing.numThreads=2
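Plugging those into the formula from your log, the pool would only need 256,000,000 * (2 + 1) = 768,000,000 bytes (about 768MB), which fits within the maxDirectMemory[924,844,032] your JVM reports.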

You could also try the Imply quickstart; its bundled configs should work out of the box in a VM with 4GB of RAM. That’s here: http://imply.io/docs/latest/quickstart

Hi again Gian,

Thank you for your response. Where should I change these settings (in which files and/or command-line parameters)?

In conf/some_node_type/runtime.properties? Everywhere? ^_^U I’m a little bit lost; I’m not used to working in these JVM-related environments.

Again, thank you for your time and your guidance.

Since it’s the peons that are throwing the error, try setting those properties in middleManager/runtime.properties. But since you are somewhat memory constrained (4GB will work but is kind of low), it wouldn’t hurt to add those properties to the historical and broker runtime.properties as well.
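For reference, here’s a minimal sketch of what that could look like in the middleManager’s runtime.properties. The druid.indexer.fork.property. prefix forwards a property to the peons the middleManager spawns, and druid.indexer.runner.javaOpts sets the peons’ JVM flags, which is where -XX:MaxDirectMemorySize needs to go. The values below are starting points for a 4GB VM, not tuned recommendations:

# JVM flags for the forked peon processes (peons are separate JVMs,
# so flags set on the middleManager process itself don't reach them)
druid.indexer.runner.javaOpts=-server -Xmx256m -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC -Dfile.encoding=UTF-8

# Properties with this prefix are passed through to each peon
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=256000000
druid.indexer.fork.property.druid.processing.numThreads=2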