My Druid is very slow

Hi guys!

We have deployed Druid in a lab environment (v0.8.1).

Cluster features:

(x1) Broker node: Intel® Xeon® CPU E5-2640 0 @ 2.50GHz (x6), 8GB total memory (Xmx/Xms: 3GB, NewSize/MaxNewSize 1GB, MaxDirectMemorySize: 4GB), 18GB total Disk.

druid.cache.sizeInBytes=50000000
druid.processing.buffer.sizeBytes=629145600
druid.processing.numThreads=5
druid.query.groupBy.maxResults=10000000

(x1) Historical node: Intel® Xeon® CPU E5-2640 0 @ 2.50GHz (x8), 12GB total memory (Xmx/Xms: 4GB, NewSize/MaxNewSize: 2GB, MaxDirectMemorySize: 5GB), 20GB total Disk.

druid.segmentCache.locations=[{"path": "/opt/druid/store/indexCache", "maxSize": 6442450944}]
druid.processing.buffer.sizeBytes=629145600
druid.server.http.numThreads=10
druid.server.http.maxIdleTime=PT5m
druid.processing.numThreads=7
druid.processing.columnCache.sizeBytes=0
druid.query.groupBy.singleThreaded=false
druid.query.groupBy.maxIntermediateRows=50000
druid.query.groupBy.maxResults=10000000
druid.cache.type=local
druid.cache.sizeInBytes=536870912
druid.cache.initialSize=268435456
druid.server.http.numThreads=50
druid.server.maxSize=6442450944

Segments => 7 shards in 7 intervals.
1 segment = 114MB (granularity = day) => 1,296,502 rows/segment

1 row = 30 dimensions and 3 metrics

**What is the problem?** When we run a groupBy query against the broker node:

curl -X POST 'http://BROKER-ADDR:8080/druid/v2/?pretty' -H 'content-type: application/json' -d @query_groupBy_01.json > resul.txt

Result: total time is 27s.
Query: query_groupBy_01.json

{
  "queryType": "groupBy",
  "dataSource": "dsLab",
  "granularity": "minute",
  "dimensions": [ "field1", "field2", "field3" ],
  "limitSpec": { "type": "default", "limit": 50, "columns": [ {"dimension": "inbytes", "direction": "descending"} ] },
  "aggregations": [
    { "type": "doubleSum", "name": "inbytes", "fieldName": "inbytes" }
  ],
  "intervals": [
    "2015-08-01T00:00:00.000/2015-08-08T00:00:00.000"
  ],
  "having": {
    "type": "greaterThan",
    "aggregation": "inbytes",
    "value": 0.0
  },
  "context": {
    "timeout": 120000,
    "queryId": "q0002"
  }
}

We do not know why Druid groupBy queries are so slow… any ideas?

Thanks in advance

+1, my groupBy queries are also really slow, to the point that curl times out. I have increased curl's max timeout, but I think the Jetty server on the broker sends a timeout anyway. I increased my broker and historical memory to about 30GB, which did improve performance, but I still can't query a 10-day interval with a few JavaScript filters. This is my query:

{
  "queryType": "groupBy",
  "dataSource": "datasource1",
  "granularity": "all",
  "dimensions": ["dim1", "dim2", "dim3"],
  "limitSpec": {
    "type": "default", "limit": 10, "columns": [
      {
        "dimension": "avg_time",
        "direction": "DESCENDING"
      }
    ]
  },
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "javascript",
        "dimension": "time",
        "function": "function(x){return(x < 60000)}"
      },
      {
        "type": "javascript",
        "dimension": "dim3",
        "function": "function(x){return(x == 'value1' || x == 'value2' || x == 'value3' || x == 'value4' || x == 'value5' || x == 'value6' || x == 'value7' || x == 'value8' || x == 'value9')}"
      }
    ]
  },
  "aggregations": [
    {
      "type": "count",
      "name": "count"
    },
    {
      "type": "longSum",
      "name": "sumTime",
      "fieldName": "sumTime"
    }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "avg_time",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "fieldName": "sumTime" },
        { "type": "fieldAccess", "fieldName": "count" }
      ]
    }
  ],
  "intervals": [
    "2016-01-01T00:00:00.000/2016-01-11T00:00:00.000"
  ],
  "having": {
    "type": "greaterThan",
    "aggregation": "count",
    "value": 1000
  },
  "context": {
    "priority": 1,
    "chunkPeriod": "PT24H"
  }
}

Alberto,

I found this doc pretty helpful: http://druid.io/docs/latest/operations/performance-faq.html

It details how brokers and historical nodes use the JVM heap and off-heap memory to store and merge results. It helped me configure the right JVM heap and off-heap memory sizes to improve performance somewhat.
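
For example, if I'm reading that FAQ right, the direct memory on a historical should be at least roughly druid.processing.buffer.sizeBytes * (druid.processing.numThreads + 1). A quick sanity check against the historical config posted above (my arithmetic, worth re-checking against the docs for your version):

# Historical values from the first post:
#   druid.processing.buffer.sizeBytes = 629145600  (600MB)
#   druid.processing.numThreads       = 7
# Required direct memory ≈ 629145600 * (7 + 1) = 5,033,164,800 bytes ≈ 4.7GB
# The configured MaxDirectMemorySize of 5GB just covers this.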

But if anyone can suggest other configurations that would improve performance, that would help a ton.

Hey Nicholas,

One simple thing you could try first is replacing your second JS filter with an OR of “selector” filters. Generally, any non-JS filter or extractionFn will be faster than the equivalent JS-based option, so it’s best to reach for the JS stuff only when you’re trying to do something that isn’t supported by native filters or extractionFns. One tip is to use PlyQL in verbose mode to help write your Druid queries (http://github.com/implydata/plyql); it converts SQL to Druid queries in as close to the ideal way as it can.
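
For example, the dim3 JavaScript filter in your query could be swapped for native filters along these lines (just a sketch, untested, reusing the dimension and values from your query):

"filter": {
  "type": "and",
  "fields": [
    { "type": "javascript", "dimension": "time", "function": "function(x){return(x < 60000)}" },
    {
      "type": "or",
      "fields": [
        { "type": "selector", "dimension": "dim3", "value": "value1" },
        { "type": "selector", "dimension": "dim3", "value": "value2" },
        { "type": "selector", "dimension": "dim3", "value": "value3" },
        { "type": "selector", "dimension": "dim3", "value": "value4" },
        { "type": "selector", "dimension": "dim3", "value": "value5" },
        { "type": "selector", "dimension": "dim3", "value": "value6" },
        { "type": "selector", "dimension": "dim3", "value": "value7" },
        { "type": "selector", "dimension": "dim3", "value": "value8" },
        { "type": "selector", "dimension": "dim3", "value": "value9" }
      ]
    }
  ]
}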

Another thing you could do, if you have a lot of historical nodes, is reduce merging overhead by disabling caching on the broker and enabling it on the historical nodes instead. This moves merging to the historicals, which generally works better for larger clusters.
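
The runtime properties involved would be roughly the following (a sketch; double-check the exact property names against the caching docs for your Druid version):

# Broker runtime.properties: stop using/populating the broker cache
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false

# Historical runtime.properties: cache per-segment results on the historicals
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true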

Thanks Nicholas McKoy, I read the FAQ but it did not solve our problem. I tried several JVM configurations and none of them solved it.

Does anybody use groupBy queries in their Druid deployment?

Alberto, can you use the existing Druid metrics to narrow down what is slow? Is it merging? Segment scans? Network time?

http://druid.io/docs/latest/operations/metrics.html
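
If you aren't emitting metrics yet, one simple way to start (a sketch; verify the property names and metric list against the docs for your version) is the logging emitter, and then compare query/time against query/segment/time and query/wait/time to see where the time is going:

# common.runtime.properties (or per-node runtime.properties)
druid.emitter=logging
druid.emitter.logging.logLevel=info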

Hi guys!

We “solved” our problem with slow groupBy queries. The problem is high cardinality in the segments: the high-cardinality dimension contains IPv4 addresses.

Fangjin,

How would I go about sending those metrics to a Kafka topic? Is there a property config to do that?

No, you emit metrics from Druid to an HTTP Kafka producer.
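
For reference, the Druid side of that is the HTTP emitter, roughly like this (the recipient URL is a hypothetical placeholder for whatever HTTP service you run that forwards the events to Kafka):

# Point Druid's HTTP emitter at your HTTP-to-Kafka bridge (URL is hypothetical)
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://your-kafka-bridge.example.com:8080/druid-metrics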