Heap Space OutOfMemoryError during native batch ingestion from PostgreSQL

Hi,
I have done a native batch ingestion of 5 million (50 lakh) rows from PostgreSQL into Druid via JSON.
Configurations:
In the jvm.config for each service I set min=2048m and max=4096m, and I also increased the MiddleManager's MaxDirectMemorySize to 12g. I specified maxRowsInMemory=2000000 in the tuningConfig and successfully ingested the 5 million (50 lakh) rows into Druid.

*Druid and PostgreSQL are on different servers.

But when I try to ingest one billion rows by native batch ingestion from PostgreSQL to Druid with the same configuration, it fails with a "heap space OutOfMemory" error.

Please help me resolve this issue. I need to ingest 1 billion rows from PostgreSQL into Druid.

What configuration changes do I need to make to resolve this?

Here’s a recent discussion of this:

druid.indexer.runner.javaOptsArray might be of particular interest.
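For reference, this is a sketch of what setting that property might look like in the MiddleManager's runtime.properties. The heap and direct memory sizes here are purely illustrative; size them to your hardware and per the tuning guide:

```
# Peon (ingestion task) JVM options; -Xms/-Xmx control each task's heap.
druid.indexer.runner.javaOptsArray=["-server","-Xms2g","-Xmx2g","-XX:MaxDirectMemorySize=4g","-Duser.timezone=UTC","-Dfile.encoding=UTF-8"]
```

Note that javaOptsArray takes a JSON array of strings, one JVM flag per element.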

Are you able to share your logs?

Hi Vajiha,

Please paste your ingestion spec (you can sanitize field names/credentials etc.) here, and the logs if possible as mentioned by @Mark_Herrera above and we can try to help you out.

Also, please note that it is a best practice to give -Xmx and -Xms the same value.
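For example, a service's jvm.config might look like this (the 4g/12g values are only placeholders, not recommendations):

```
-server
-Xms4g
-Xmx4g
-XX:MaxDirectMemorySize=12g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
```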

Hi @Vijeth_Sagar ,
Thanks for your reply.
Given Ingestion Spec:
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "sample",
      "timestampSpec": {
        "format": "auto",
        "column": "time"
      },
      "dimensionsSpec": {
        "dimensions": [
          "quan",
          {"name": "t_id", "type": "integer"},
          {"name": "id", "type": "integer"}
        ]
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "sql",
        "database": {
          "type": "postgresql",
          "connectorConfig": {
            "connectURI": "jdbc:postgresql://***.***.**.***:**/data",
            "user": "***",
            "password": "***"
          }
        },
        "sqls": ["SELECT * FROM samp_dat"]
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxRowsInMemory": 2000000
    }
  }
}

Please guide me in resolving this issue.

Hi @Vijeth_Sagar ,

I have tried ingesting 12 million rows from PostgreSQL to Druid. As per your suggestion, I set -Xmx and -Xms to 1024M for the Coordinator, Broker, and Historical services.
For the MiddleManager and Router, -Xmx and -Xms are set to 2048M.
I got the same "Heap Space OutOfMemory" error.
Please look into this issue. I still need to ingest 1 billion rows after this.

Logs:
2022-11-07T07:47:04,602 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2022-11-07T07:47:04,602 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=var/tmp
2022-11-07T07:47:04,602 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=
2022-11-07T07:47:04,602 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
2022-11-07T07:47:04,602 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
2022-11-07T07:47:04,602 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:os.version=5.14.0-1054-oem
2022-11-07T07:47:04,602 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:user.name=root
2022-11-07T07:47:04,602 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:user.home=/root
2022-11-07T07:47:04,602 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:user.dir=/home/mw-user/apache-druid-24.0.0
2022-11-07T07:47:04,602 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:os.memory.free=1473MB
2022-11-07T07:47:04,602 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:os.memory.max=1979MB
2022-11-07T07:47:04,602 INFO [main] org.apache.zookeeper.ZooKeeper - Client environment:os.memory.total=1979MB
2022-11-07T07:47:04,603 INFO [main] org.apache.curator.utils.Compatibility - Using emulated InjectSessionExpiration
2022-11-07T07:47:04,759 INFO [main] org.apache.druid.segment.loading.SegmentLocalCacheManager - Using storage location strategy: [LeastBytesUsedStorageLocationSelectorStrategy]
2022-11-07T07:47:04,813 INFO [main] org.apache.druid.guice.StorageNodeModule - Segment cache not configured on ServerType [indexer-executor]. It will not be assignable for segment placement
2022-11-07T07:47:04,908 INFO [main] org.eclipse.jetty.util.log - Logging initialized @3135ms to org.eclipse.jetty.util.log.Slf4jLog
2022-11-07T07:47:04,919 INFO [main] org.apache.druid.server.initialization.jetty.JettyServerModule - Creating http connector with port [8100]
2022-11-07T07:47:05,001 WARN [main] org.eclipse.jetty.server.handler.gzip.GzipHandler - minGzipSize of 0 is inefficient for short content, break even is size 23
2022-11-07T07:47:05,020 INFO [main] org.apache.druid.offheap.OffheapBufferGenerator - Allocating new intermediate processing buffer[0] of size[104,857,600]
2022-11-07T07:47:05,044 INFO [main] org.apache.druid.offheap.OffheapBufferGenerator - Allocating new intermediate processing buffer[1] of size[104,857,600]
2022-11-07T07:47:05,066 INFO [main] org.apache.druid.offheap.OffheapBufferGenerator - Allocating new intermediate processing buffer[2] of size[104,857,600]
2022-11-07T07:47:05,088 INFO [main] org.apache.druid.offheap.OffheapBufferGenerator - Allocating new intermediate processing buffer[3] of size[104,857,600]
2022-11-07T07:47:05,109 INFO [main] org.apache.druid.offheap.OffheapBufferGenerator - Allocating new intermediate processing buffer[4] of size[104,857,600]
2022-11-07T07:47:05,131 INFO [main] org.apache.druid.offheap.OffheapBufferGenerator - Allocating new intermediate processing buffer[5] of size[104,857,600]
2022-11-07T07:47:05,180 INFO [main] org.apache.druid.offheap.OffheapBufferGenerator - Allocating new result merging buffer[0] of size[104,857,600]
2022-11-07T07:47:05,202 INFO [main] org.apache.druid.offheap.OffheapBufferGenerator - Allocating new result merging buffer[1] of size[104,857,600]
2022-11-07T07:47:05,230 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [INIT]
2022-11-07T07:47:05,230 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [NORMAL]
2022-11-07T07:47:05,233 INFO [main] org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
2022-11-07T07:47:05,235 INFO [main] org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=localhost sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@210635fd
2022-11-07T07:47:05,240 INFO [main] org.apache.zookeeper.common.X509Util - Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation
2022-11-07T07:47:05,243 INFO [main] org.apache.zookeeper.ClientCnxnSocket - jute.maxbuffer value is 4194304 Bytes
2022-11-07T07:47:05,247 INFO [main] org.apache.zookeeper.ClientCnxn - zookeeper.request.timeout value is 0. feature enabled=
2022-11-07T07:47:05,251 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2022-11-07T07:47:05,254 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /127.0.0.1:48262, server: localhost/127.0.0.1:2181
2022-11-07T07:47:05,255 INFO [main] org.apache.curator.framework.imps.CuratorFrameworkImpl - Default schema
2022-11-07T07:47:05,284 WARN [main] org.apache.druid.segment.indexing.DataSchema - Rollup is enabled for dataSource [order] but no metricsSpec has been provided. Are you sure this is what you want?
2022-11-07T07:47:05,318 INFO [main] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Running with task: {
"type" : "index_parallel",
"id" : "index_parallel_order_apginebg_2022-11-07T07:47:01.542Z",
"groupId" : "index_parallel_order_apginebg_2022-11-07T07:47:01.542Z",
"resource" : {
  "availabilityGroup" : "index_parallel_order_apginebg_2022-11-07T07:47:01.542Z",
  "requiredCapacity" : 1
},
"spec" : {
  "dataSchema" : {
    "dataSource" : "order",
    "timestampSpec" : {
      "column" : "time",
      "format" : "auto",
      "missingValue" : null
    },
    "dimensionsSpec" : {
      "dimensions" : [ {
        "type" : "string",
        "name" : "quan",
        "multiValueHandling" : "SORTED_ARRAY",
        "createBitmapIndex" : true
      }, {
        "type" : "string",
        "name" : "t_id",
        "multiValueHandling" : "SORTED_ARRAY",
        "createBitmapIndex" : true
      }, {
        "type" : "string",
        "name" : "p_id",
        "multiValueHandling" : "SORTED_ARRAY",
        "createBitmapIndex" : true
      } ],
      "dimensionExclusions" : [ "__time", "time" ],
      "includeAllDimensions" : false
    },
    "metricsSpec" : [ ],
    "granularitySpec" : {
      "type" : "uniform",
      "segmentGranularity" : "DAY",
      "queryGranularity" : {
        "type" : "none"
      },
      "rollup" : true,
      "intervals" : [ ]
    },
    "transformSpec" : {
      "filter" : null,
      "transforms" : [ ]
    }
  },
  "ioConfig" : {
    "type" : "index_parallel",
    "inputSource" : {
      "type" : "sql",
      "sqls" : [ "SELECT * FROM new LIMIT 12000000" ],
      "foldCase" : false,
      "database" : {
        "type" : "postgresql",
        "connectorConfig" : {
          "createTables" : true,
          "host" : "localhost",
          "port" : 1527,
          "connectURI" : "jdbc:postgresql://***.***.**.***:**/data",
          "user" : "***",
          "password" : "***",
          "dbcp" : null
        }
      }
    },
    "inputFormat" : null,
    "appendToExisting" : false,
    "dropExisting" : false
  },
  "tuningConfig" : {
    "type" : "index_parallel",
    "maxRowsPerSegment" : null,
    "appendableIndexSpec" : {
      "type" : "onheap",
      "preserveExistingMetrics" : false
    },
    "maxRowsInMemory" : 2000000,
    "maxBytesInMemory" : 0,
    "skipBytesInMemoryOverheadCheck" : false,
    "maxTotalRows" : null,
    "numShards" : null,
    "splitHintSpec" : null,
    "partitionsSpec" : null,
    "indexSpec" : {
      "bitmap" : {
        "type" : "roaring",
        "compressRunOnSerialization" : true
      },
      "dimensionCompression" : "lz4",
      "metricCompression" : "lz4",
      "longEncoding" : "longs",
      "segmentLoader" : null
    },
    "indexSpecForIntermediatePersists" : {
      "bitmap" : {
        "type" : "roaring",
        "compressRunOnSerialization" : true
      },
      "dimensionCompression" : "lz4",
      "metricCompression" : "lz4",
      "longEncoding" : "longs",
      "segmentLoader" : null
    },
    "maxPendingPersists" : 0,
    "forceGuaranteedRollup" : false,
    "reportParseExceptions" : false,
    "pushTimeout" : 0,
    "segmentWriteOutMediumFactory" : null,
    "maxNumConcurrentSubTasks" : 1,
    "maxRetry" : 3,
    "taskStatusCheckPeriodMs" : 1000,
    "chatHandlerTimeout" : "PT10S",
    "chatHandlerNumRetries" : 5,
    "maxNumSegmentsToMerge" : 100,
    "totalNumMergeTasks" : 10,
    "logParseExceptions" : false,
    "maxParseExceptions" : 2147483647,
    "maxSavedParseExceptions" : 0,
    "maxColumnsToMerge" : -1,
    "awaitSegmentAvailabilityTimeoutMillis" : 0,
    "maxAllowedLockCount" : -1,
    "partitionDimensions" : [ ]
  }
},
"context" : {
  "forceTimeChunkLock" : true,
  "useLineageBasedSegmentAllocation" : true
},
"dataSource" : "order"
}
2022-11-07T07:47:05,318 INFO [main] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Attempting to lock file[var/druid/task/index_parallel_order_apginebg_2022-11-07T07:47:01.542Z/lock].
2022-11-07T07:47:05,319 INFO [main] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Acquired lock file[var/druid/task/index_parallel_order_apginebg_2022-11-07T07:47:01.542Z/lock] in 1ms.
2022-11-07T07:47:05,321 INFO [main] org.apache.druid.indexing.common.task.AbstractBatchIndexTask - forceTimeChunkLock[true] is set to true or mode[REPLACE_LEGACY] is replace. Use timeChunk lock
2022-11-07T07:47:05,322 INFO [main] org.apache.druid.segment.loading.SegmentLocalCacheManager - Using storage location strategy: [LeastBytesUsedStorageLocationSelectorStrategy]
2022-11-07T07:47:05,325 INFO [task-runner-0-priority-0] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Running task: index_parallel_order_apginebg_2022-11-07T07:47:01.542Z
2022-11-07T07:47:05,327 WARN [task-runner-0-priority-0] org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask - Intervals are missing in granularitySpec while this task is potentially overwriting existing segments. Forced to use timeChunk lock.
2022-11-07T07:47:05,327 WARN [task-runner-0-priority-0] org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask - maxNumConcurrentSubTasks[1] is less than or equal to 1. Running sequentially. Please set maxNumConcurrentSubTasks to something higher than 1 if you want to run in parallel ingestion mode.
2022-11-07T07:47:05,329 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [SERVER]
2022-11-07T07:47:05,330 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.AbstractBatchIndexTask - forceTimeChunkLock[true] is set to true or mode[REPLACE_LEGACY] is replace. Use timeChunk lock
2022-11-07T07:47:05,331 WARN [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Chat handler is already registered. Skipping chat handler registration.
2022-11-07T07:47:05,331 INFO [main] org.eclipse.jetty.server.Server - jetty-9.4.48.v20220622; built: 2022-06-21T20:42:25.880Z; git: 6b67c5719d1f4371b33655ff2d047d24e171e49a; jvm 1.8.0_342-8u342-b07-0ubuntu1~20.04-b07
2022-11-07T07:47:05,337 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Determining intervals and shardSpecs
2022-11-07T07:47:05,349 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1000005e1e90008, negotiated timeout = 30000
2022-11-07T07:47:05,351 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED
2022-11-07T07:47:05,359 INFO [main-EventThread] org.apache.curator.framework.imps.EnsembleTracker - New config event received: {}
2022-11-07T07:47:05,360 INFO [main-EventThread] org.apache.curator.framework.imps.EnsembleTracker - New config event received: {}
2022-11-07T07:47:05,367 WARN [NodeRoleWatcher[OVERLORD]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeRoleWatcher - Ignored event type [CONNECTION_RECONNECTED] for node watcher of role [overlord].
2022-11-07T07:47:05,367 WARN [NodeRoleWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeRoleWatcher - Ignored event type [CONNECTION_RECONNECTED] for node watcher of role [coordinator].
2022-11-07T07:47:05,373 INFO [main] org.eclipse.jetty.server.session - DefaultSessionIdManager workerName=node0
2022-11-07T07:47:05,373 INFO [main] org.eclipse.jetty.server.session - No SessionScavenger set, using defaults
2022-11-07T07:47:05,373 INFO [main] org.eclipse.jetty.server.session - node0 Scavenging every 600000ms
2022-11-07T07:47:05,378 INFO [NodeRoleWatcher[COORDINATOR]] org.apache.druid.discovery.BaseNodeRoleWatcher - Node [http://localhost:8081] of role [coordinator] detected.
2022-11-07T07:47:05,378 INFO [NodeRoleWatcher[OVERLORD]] org.apache.druid.discovery.BaseNodeRoleWatcher - Node [http://localhost:8081] of role [overlord] detected.
2022-11-07T07:47:05,379 INFO [NodeRoleWatcher[COORDINATOR]] org.apache.druid.discovery.BaseNodeRoleWatcher - Node watcher of role [coordinator] is now initialized with 1 nodes.
2022-11-07T07:47:05,379 INFO [NodeRoleWatcher[OVERLORD]] org.apache.druid.discovery.BaseNodeRoleWatcher - Node watcher of role [overlord] is now initialized with 1 nodes.
2022-11-07T07:47:05,438 INFO [main] com.sun.jersey.server.impl.application.WebApplicationImpl - Initiating Jersey application, version ‘Jersey: 1.19.4 05/24/2017 03:20 PM’
2022-11-07T07:47:05,806 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@47c5cbf2{/,null,AVAILABLE}
2022-11-07T07:47:05,812 INFO [main] org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@67b3960b{HTTP/1.1, (http/1.1)}{0.0.0.0:8100}
2022-11-07T07:47:05,812 INFO [main] org.eclipse.jetty.server.Server - Started @4041ms
2022-11-07T07:47:05,812 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [ANNOUNCEMENTS]
2022-11-07T07:47:05,812 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Successfully started lifecycle [module]
Terminating due to java.lang.OutOfMemoryError: Java heap space

Hi Vajiha,

Not really sure why I did not receive a notification for your reply so apologies about that.

  • Can we change "maxNumConcurrentSubTasks" : 1 to a higher number and make it a parallel task? This will depend on the number of task slots available on your data nodes.

  • I would start by increasing the -Xmx/-Xms for the peons via druid.indexer.runner.javaOptsArray.

  • You can definitely experiment with higher -Xmx/-Xms values for the other processes, but stay within these guidelines: Basic cluster tuning · Apache Druid

  • After the above changes, you can reduce maxBytesInMemory/maxRowsInMemory in the ingestion spec and test.
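Putting the tuningConfig-related points together, here is a rough sketch; the specific numbers are illustrative starting points for experimentation, not recommendations:

```
"tuningConfig": {
  "type": "index_parallel",
  "maxNumConcurrentSubTasks": 4,
  "maxRowsInMemory": 500000,
  "maxBytesInMemory": 100000000
}
```

Lower maxRowsInMemory/maxBytesInMemory values cause earlier intermediate persists to disk, trading ingestion speed for a smaller heap footprint per task.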


A few suggestions:

In order to parallelize execution of ingestion from SQL you’ll want to:

  • increase maxNumConcurrentSubTasks to the number of parallel workers you want to dedicate to the job (these are worker slots on the MiddleManagers).
  • split the SQL into many smaller SQL requests; each request will be used by a different task to acquire rows. One way to split them is by timeframe, aligned to the segment granularity.
  • you have rollup turned on but no metrics. Are you trying to aggregate during ingestion? If not, turn rollup off.
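As an illustration of the splitting idea, assuming the time column from the timestampSpec is indexed in PostgreSQL, the single query could become one query per day, matching the DAY segmentGranularity (table and column names taken from the spec above; the dates are made up):

```
"inputSource": {
  "type": "sql",
  "database": {
    "type": "postgresql",
    "connectorConfig": {
      "connectURI": "jdbc:postgresql://***:***/data",
      "user": "***",
      "password": "***"
    }
  },
  "sqls": [
    "SELECT * FROM samp_dat WHERE time >= '2022-01-01' AND time < '2022-01-02'",
    "SELECT * FROM samp_dat WHERE time >= '2022-01-02' AND time < '2022-01-03'"
  ]
}
```

Each entry in "sqls" becomes a separate split, so with maxNumConcurrentSubTasks > 1 these queries run in parallel subtasks.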

Also, depending on your expected query patterns, you may benefit from secondary partitioning. It helps at query time with segment pruning if you have very common filter criteria on, for example, t_id or p_id.
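For instance, single-dimension range partitioning might look like the sketch below. Note that single_dim partitioning requires forceGuaranteedRollup to be true, and t_id and the target row count are chosen here purely as examples:

```
"tuningConfig": {
  "type": "index_parallel",
  "forceGuaranteedRollup": true,
  "partitionsSpec": {
    "type": "single_dim",
    "partitionDimension": "t_id",
    "targetRowsPerSegment": 5000000
  }
}
```

With this, queries filtering on t_id can skip segments whose t_id range does not match the filter.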

Thanks @Vijeth_Sagar and @Sergio_Ferragut for your quick responses and suggestions.

I will try increasing maxNumConcurrentSubTasks and the other parameters.