Understanding Druid query performance and tuning Druid for better query time

Hey Team,

We use Druid in our production setup and we are trying to tune it for performance. This is the version I use:

Druid v 0.12.1

Ref: https://dzone.com/articles/scaling-complex-queries-using-druid

  • I understand that segment query latencies depend on the number of cores across historicals relative to the number of segments, and I have tuned this to reduce query time.

  • Now, as mentioned in the link above, I am trying to reduce memory mapping and have set "druid.server.maxSize" to match my available physical memory, so that the historical behaves as an in-memory store, i.e. segments are served from physical memory.

  • Despite doing this, I do not see a significant improvement in query performance (query/segment/time).
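
The sizing described in the list above can be sketched as simple arithmetic. Historicals memory-map their segment files, so segments are served from the OS page cache rather than the JVM heap, and the memory available for that is roughly total RAM minus the JVM heap and direct memory. The function names and every number below are hypothetical and only illustrate the calculation; none of them come from this thread.

```python
GIB = 1024 ** 3


def page_cache_budget(total_ram, jvm_heap, jvm_direct, os_reserve=0):
    """Bytes left for the OS page cache after the JVM and OS take their share."""
    return max(total_ram - jvm_heap - jvm_direct - os_reserve, 0)


def cache_coverage(max_size, budget):
    """Fraction of druid.server.maxSize that could stay resident in memory."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    return min(budget / max_size, 1.0)


# Hypothetical historical node: 64 GiB RAM, 8 GiB heap, 12 GiB direct memory,
# 2 GiB held back for the OS, and druid.server.maxSize set to 100 GiB.
budget = page_cache_budget(64 * GIB, 8 * GIB, 12 * GIB, os_reserve=2 * GIB)
coverage = cache_coverage(max_size=100 * GIB, budget=budget)
print(f"page cache budget: {budget / GIB:.0f} GiB, coverage: {coverage:.2f}")
```

A coverage below 1.0 would suggest that some segments have to be paged in from disk at query time, whatever value "druid.server.maxSize" is given.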

Is the approach I use correct? I do not see a spike in memory usage either, so I suspect that segments are still not being served from physical memory.

If every segment is loaded into memory on demand anyway, I would fare better with compute-optimized machines than with memory-optimized ones. Am I correct in thinking so?

Thanks

–Kiran.

Hi Kiran,

What kind of queries did you test? Would you share one of them?

Also, what is your expectation for those queries?

Jihoon

@JihoonSon, I am executing a basic timeseries query over 3 months of data.
Query:

{
  "queryType": "timeseries",
  "dataSource": "test-events",
  "intervals": [
    "2018-02-01T00:00:00.000Z/2018-07-18T00:00:00.000Z"
  ],
  "granularity": "all",
  "aggregations": [
    {
      "type": "hyperUnique",
      "name": "count",
      "fieldName": "userIdHll"
    }
  ]
}
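
When comparing timings for a query like this before and after a tuning change, cached results can hide the difference. A minimal sketch, assuming Python and the standard library only, that builds the same query with the "useCache" and "populateCache" query context flags turned off and posts it to a broker. The helper names and the broker address in the final comment are hypothetical.

```python
import json
import urllib.request


def build_query(data_source, interval, use_cache=False):
    """Build the timeseries query from this thread, plus cache-control flags."""
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "intervals": [interval],
        "granularity": "all",
        "aggregations": [
            {"type": "hyperUnique", "name": "count", "fieldName": "userIdHll"}
        ],
        # Disable result caching so repeated runs measure real segment scans.
        "context": {"useCache": use_cache, "populateCache": use_cache},
    }


def run_query(query, broker_url):
    """POST the query as JSON to a broker and return the decoded response."""
    request = urllib.request.Request(
        broker_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


query = build_query(
    "test-events", "2018-02-01T00:00:00.000Z/2018-07-18T00:00:00.000Z"
)
print(json.dumps(query, indent=2))
# run_query(query, "http://localhost:8082/druid/v2")  # hypothetical broker address
```

Running the same query a few times with the cache disabled should give more comparable numbers across configuration changes.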