Historical node CPU and RAM go high

Hi guys,

I have two historical nodes, each with 32 GB RAM and 8 CPUs, holding about 1000 segments each, and each segment has 5 million rows.

Config :

druid.processing.buffer.sizeBytes=2000000000

druid.processing.formatString=processing_%s

druid.processing.numThreads=13

jvm config :

-server

-Xms12g

-Xmx30g

When I run a groupBy query with a filter on one of the highest-cardinality dimensions, CPU and RAM usage go high and then the historical process gets killed.

I can't use a topN query because I need all of the groupBy values from the query.

Please help me configure the historical nodes for big queries.

Thanks,

Jitesh Mogre

Can you share the logs? Not sure what is killing the process; is it OOM?
Also, can you share the other configs, and make sure to adjust -XX:MaxDirectMemorySize.
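In the historical jvm.config it is just one more line next to the heap flags, something like this (the sizes here are placeholders, not a recommendation for your node):

-server

-Xms12g

-Xmx12g

-XX:MaxDirectMemorySize=16g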

Hi Slim,

No logs found except query logs. No errors on the broker either.

It may be OOM. When I run the query, CPU goes to 800% (8 CPUs) and memory goes to 100%.

Updated config:

jvm config :

-server

-Xms12g

-Xmx28g

Historical Config :

druid.host=xxx.xxx.xxx.xxx

druid.service=druid/historical

druid.port=80

# Storage (Historical)

druid.server.maxSize=100000000000

druid.server.tier=tier_0_test

# Segment Cache (Historical)

druid.segmentCache.locations=[{"path": "/analytics_data/segCache/", "maxSize": 100000000000}]

#druid.segmentCache.deleteOnRemove

druid.segmentCache.infoDir=/analytics_data/segInfo/

# Jetty Server

druid.server.http.numThreads=100

druid.server.http.maxIdleTime=PT10m

# Processor

druid.processing.buffer.sizeBytes=1500000000

druid.processing.formatString=processing_%s

druid.processing.numThreads=13

#Query Configuration

druid.query.groupBy.maxIntermediateRows=100000

druid.query.groupBy.maxResults=1073741824

druid.query.search.maxSearchLimit=10000

#caching

#druid.historical.cache.useCache=true

#druid.historical.cache.populateCache=true

#druid.zk.service.host=xxx.xxx.xxx.xxx:2181

druid.zk.service.host=xxx.xxx.xxx.xxx:2181

druid.discovery.curator.path=/druid/discNew

druid.zk.paths.base=/druid

druid.extensions.loadList=["druid-kafka-eight", "druid-s3-extensions", "mssql-metadata-storage"]

druid.extensions.directory=/opt/druid-0.9.1.1/extensions/

druid.extensions.hadoopDependenciesDir=/opt/druid-0.9.1.1/hadoop-dependencies/

# DB

druid.metadata.storage.type=mssql

druid.metadata.storage.connector.connectURI=xxx.xxx.xxx.xxx

druid.metadata.storage.connector.user=xxx

druid.metadata.storage.connector.password=xxx

druid.metadata.storage.connector.useValidationQuery=true

druid.metadata.storage.tables.base=new_druid

druid.metadata.storage.tables.segments=new_segment_table

druid.metadata.storage.tables.rules=new_rule_table

druid.metadata.storage.connector.primaryKey=pk_new_tbl

druid.metadata.storage.tables.config=new_config_table

#druid.metadata.storage.tables.tasks=new_scale_task_tbl

#druid.metadata.storage.tables.taskLog=new_tasklog_scale_tbl

#druid.metadata.storage.tables.taskLock=new_tasklock_scale_tbl

druid.metadata.storage.tables.audit=new_audit_scale_tbl

# Seg. Loader

druid.storage.type=s3

druid.s3.accessKey=xxx.xxx.xxx.xxx

druid.s3.secretKey=xxx.xxx.xxx.xxx

druid.storage.bucket=cluster-scale-bucket

druid.storage.baseKey=cluster-segement

druid.storage.disableAcl=true

#druid.storage.archiveBucket=cluster_scale_bucket

druid.storage.archiveBucket=user-cluster-data-files

druid.storage.archiveBaseKey=cluster-segement

druid.startup.logging.logProperties=true

druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]

druid.emitter=LoggingEmitter

druid.emitter.logging.logLevel=info

#druid.coordinator.startDelay=PT30S

#druid.coordinator.period=PT30S

Your historical nodes should have logs where you will see what is causing the process to crash.

2016-09-16T18:34:19,413 INFO [topN_scale_cluster_[2016-09-11T00:28:00.000Z/2016-09-11T00:29:00.000Z]] io.druid.segment.CompressedPools - Allocating new littleEndByteBuf[2]

2016-09-16T18:34:19,413 INFO [topN_scale_cluster_[2016-09-11T00:26:00.000Z/2016-09-11T00:27:00.000Z]] io.druid.segment.CompressedPools - Allocating new littleEndByteBuf[9]

2016-09-16T18:34:19,413 INFO [topN_scale_cluster_[2016-09-11T00:12:00.000Z/2016-09-11T00:13:00.000Z]] io.druid.segment.CompressedPools - Allocating new littleEndByteBuf[5]

2016-09-16T18:34:19,413 INFO [topN_scale_cluster_[2016-09-11T00:08:00.000Z/2016-09-11T00:09:00.000Z]] io.druid.segment.CompressedPools - Allocating new littleEndByteBuf[10]

2016-09-16T18:34:19,413 INFO [topN_scale_cluster_[2016-09-11T00:02:00.000Z/2016-09-11T00:03:00.000Z]] io.druid.segment.CompressedPools - Allocating new littleEndByteBuf[7]

2016-09-16T18:34:19,413 INFO [topN_scale_cluster_[2016-09-11T00:25:00.000Z/2016-09-11T00:26:00.000Z]] io.druid.segment.CompressedPools - Allocating new littleEndByteBuf[6]

2016-09-16T18:34:19,413 INFO [topN_scale_cluster_[2016-09-11T00:06:00.000Z/2016-09-11T00:07:00.000Z]] io.druid.segment.CompressedPools - Allocating new littleEndByteBuf[3]

2016-09-16T18:34:19,413 INFO [topN_scale_cluster_[2016-09-11T00:30:00.000Z/2016-09-11T00:31:00.000Z]] io.druid.segment.CompressedPools - Allocating new littleEndByteBuf[4]

2016-09-16T18:34:19,413 INFO [topN_scale_cluster_[2016-09-11T00:14:00.000Z/2016-09-11T00:15:00.000Z]] io.druid.segment.CompressedPools - Allocating new littleEndByteBuf[8]

2016-09-16T18:34:20,732 INFO [topN_scale_cluster_[2016-09-11T00:28:00.000Z/2016-09-11T00:29:00.000Z]] io.druid.offheap.OffheapBufferPool - Allocating new intermediate processing buffer[0] of size[1,500,000,000]

2016-09-16T18:34:20,802 INFO [topN_scale_cluster_[2016-09-11T00:25:00.000Z/2016-09-11T00:26:00.000Z]] io.druid.offheap.OffheapBufferPool - Allocating new intermediate processing buffer[1] of size[1,500,000,000]

2016-09-16T18:34:21,013 INFO [topN_scale_cluster_[2016-09-11T00:08:00.000Z/2016-09-11T00:09:00.000Z]] io.druid.offheap.OffheapBufferPool - Allocating new intermediate processing buffer[2] of size[1,500,000,000]

2016-09-16T18:34:21,017 INFO [topN_scale_cluster_[2016-09-11T00:30:00.000Z/2016-09-11T00:31:00.000Z]] io.druid.offheap.OffheapBufferPool - Allocating new intermediate processing buffer[3] of size[1,500,000,000]

2016-09-16T18:34:21,025 INFO [topN_scale_cluster_[2016-09-11T00:06:00.000Z/2016-09-11T00:07:00.000Z]] io.druid.offheap.OffheapBufferPool - Allocating new intermediate processing buffer[4] of size[1,500,000,000]

2016-09-16T18:34:21,039 INFO [topN_scale_cluster_[2016-09-11T00:15:00.000Z/2016-09-11T00:16:00.000Z]] io.druid.offheap.OffheapBufferPool - Allocating new intermediate processing buffer[5] of size[1,500,000,000]

2016-09-16T18:34:21,083 INFO [topN_scale_cluster_[2016-09-11T00:18:00.000Z/2016-09-11T00:19:00.000Z]] io.druid.offheap.OffheapBufferPool - Allocating new intermediate processing buffer[6] of size[1,500,000,000]

2016-09-16T18:34:21,093 INFO [topN_scale_cluster_[2016-09-11T00:24:00.000Z/2016-09-11T00:25:00.000Z]] io.druid.offheap.OffheapBufferPool - Allocating new intermediate processing buffer[7] of size[1,500,000,000]

2016-09-16T18:34:21,099 INFO [topN_scale_cluster_[2016-09-11T00:02:00.000Z/2016-09-11T00:03:00.000Z]] io.druid.offheap.OffheapBufferPool - Allocating new intermediate processing buffer[8] of size[1,500,000,000]

2016-09-16T18:34:22,973 INFO [topN_scale_cluster_[2016-09-11T00:14:00.000Z/2016-09-11T00:15:00.000Z]] io.druid.offheap.OffheapBufferPool - Allocating new intermediate processing buffer[9] of size[1,500,000,000]

Thanks Slim for your reply.

After this log, CPU and memory go high and the process is killed.

Are you setting -XX:MaxDirectMemorySize?
Set it to something like -XX:MaxDirectMemorySize=1024g.

Also, I see the thread count is 13 (druid.processing.numThreads=13) while you only have 8 CPUs. I would go with something less than 8.
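If I remember the rule correctly, a historical needs roughly (druid.processing.numThreads + 1) * druid.processing.buffer.sizeBytes of direct memory for the processing buffers, so with the settings you posted:

# rough direct-memory math for the posted settings
# (13 + 1) * 1,500,000,000 bytes ≈ 21 GB of off-heap memory,
# which cannot fit on a 32 GB machine next to a 28 GB heap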

I have set -Xmx28g for max heap memory; my node has 32 GB. I hope this is what you were asking,
or do I need to add -XX:MaxDirectMemorySize=1024g to jvm.config?

New changes:

# Processor

druid.processing.buffer.sizeBytes=2147483647

druid.processing.formatString=processing_%s

druid.processing.numThreads=7

You can also try 0.9.2 for larger groupBys, as the entire groupBy engine has been rewritten.
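If you do try 0.9.2, as far as I recall the new groupBy engine is opt-in and needs merge buffers; roughly something like the following, but please check the 0.9.2 docs for the exact property names and defaults:

# enable groupBy v2 on 0.9.2 (names from memory, verify against the docs)
druid.query.groupBy.defaultStrategy=v2
druid.processing.numMergeBuffers=2
# note: merge buffers also count against -XX:MaxDirectMemorySize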

druid.processing.numThreads=13


You have 8 cores and you set 13 threads? That is why it goes to 800%.
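Even with the newer settings above (7 threads and a ~2 GB buffer), that is roughly (7 + 1) * 2 GB ≈ 16 GB of direct memory on top of a 28 GB heap, which still does not fit in 32 GB. As a rough starting point for an 8-core / 32 GB historical (a sketch only, not a tuned recommendation), I would try something like:

-Xms8g

-Xmx8g

-XX:MaxDirectMemorySize=10g

# 7 threads on 8 cores; (7 + 1) * 1 GB of buffers stays under the 10 GB direct limit
druid.processing.numThreads=7
druid.processing.buffer.sizeBytes=1073741824

That also leaves a good chunk of the 32 GB free for the OS page cache, which the historical relies on for its memory-mapped segments.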