SSD vs EBS Druid performance

Hi Everyone,

In all the documentation for production clusters, AWS r3.8xlarge is mentioned. Is there any known limitation to using the r4 series instead (same CPU and RAM, but r4 uses EBS whereas r3 has local SSD)? I would assume some network latency gets added to reads and writes with r4. Is there anything else that would significantly affect cluster performance?

Thanks in advance. Let me know if you need any further information to better answer this.

EBS performance can vary a lot depending on usage and whether or not you have provisioned IOPS. I think overall your experience would depend on how much reading you are actually doing from EBS, vs the in-memory cache, and also what kind of EBS you are using.
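To get a rough feel for the difference, one way is to time small random reads against a file on each volume type. This is only a sketch (a proper benchmark tool like fio is better), and note the OS page cache will skew results unless the test file is much larger than RAM:

```python
import os
import random
import tempfile
import time

def measure_random_read_latency(path, block_size=4096, reads=200):
    """Time random block reads from `path`; returns mean latency in ms."""
    size = os.path.getsize(path)
    latencies = []
    # buffering=0 avoids Python-level buffering (OS cache still applies)
    with open(path, "rb", buffering=0) as f:
        for _ in range(reads):
            offset = random.randrange(0, max(1, size - block_size))
            start = time.perf_counter()
            f.seek(offset)
            f.read(block_size)
            latencies.append((time.perf_counter() - start) * 1000)
    return sum(latencies) / len(latencies)

if __name__ == "__main__":
    # For a real comparison, point `path` at a file on the volume under
    # test (e.g. the historical node's segment cache directory) instead
    # of this scratch file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(8 * 1024 * 1024))  # 8 MB test file
        path = tmp.name
    try:
        latency = measure_random_read_latency(path)
        print(f"mean 4 KB random-read latency: {latency:.3f} ms")
    finally:
        os.remove(path)
```

Run it once against the instance store and once against the EBS mount; the relative numbers matter more than the absolute ones.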

Thanks for the response, Gian. We are seeing query latencies of ~2 seconds and are trying to find the root cause. To explain our use case better:

We have 15 hosts (r4.8xlarge), with a configured max size of 300 GB per node, and over 3.5 TB of data as of now. We are on Druid 0.9.2.

Here is the query we are trying to use:

{
  "queryType": "groupBy",
  "dataSource": "sales-rank-daily",
  "dimensions": [
    {
      "type": "listFiltered",
      "delegate": "category",
      "values": [
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################",
        "#####################"
      ]
    }
  ],
  "limitSpec": { "dimension": "category-rank", "direction": "descending", "dimensionOrder": "numeric" },
  "granularity": "day",
  "filter": {
    "type": "and",
    "fields": [
      { "type": "selector", "dimension": "asin", "value": "####" },
      { "type": "selector", "dimension": "######", "value": 1 }
    ]
  },
  "aggregations": [
    { "type": "doubleMin", "name": "sales-rank", "fieldName": "category-rank" }
  ],
  "intervals": ["2015-06-23/2017-06-23"]
}
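One unrelated pitfall worth mentioning: queries pasted from email or docs often pick up curly (smart) quotes, which are not valid JSON and will be rejected by the broker. A quick sanity check before POSTing (a hypothetical `validate_query` helper, just as a sketch):

```python
import json

def validate_query(body: str) -> dict:
    """Parse a Druid query body, flagging smart quotes that break JSON."""
    for bad in ("\u201c", "\u201d"):  # left/right curly double quotes
        if bad in body:
            raise ValueError("query contains curly quotes; replace with straight quotes")
    return json.loads(body)

# Minimal example body (not the full query above)
query = validate_query(
    '{"queryType": "groupBy", "dataSource": "sales-rank-daily", '
    '"granularity": "day", "intervals": ["2015-06-23/2017-06-23"]}'
)
print(query["queryType"])  # → groupBy
```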

I have set the default groupBy version to v2. I have 9 hosts (r4.8xlarge) running the overlord, broker, and coordinator, and 9 hosts (r4.8xlarge) running middle managers. Attaching the config for each node type. We were going to try switching to SSDs to see if that improves performance. If you find anything that would improve performance, or something we are doing wrong, please point it out. The cluster is going to grow at least 5x.
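For reference, this is roughly how groupBy v2 is selected in runtime.properties per the Druid 0.9.2 docs (property names should be double-checked against your exact version):

```properties
# Use the v2 groupBy engine by default (broker, historical, middleManager)
druid.query.groupBy.defaultStrategy=v2

# v2 requires off-heap merge buffers on brokers and historicals;
# each buffer is druid.processing.buffer.sizeBytes in size
druid.processing.numMergeBuffers=2
```

Undersized or too few merge buffers are a common source of slow groupBy v2 queries, so these are worth checking alongside the storage question.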

JVM args for Nodes (3.42 KB)

historical.properties (549 Bytes)

middle-manager.properties (1.04 KB)

overlord.properties (402 Bytes)

coordinator.properties (640 Bytes)

broker.properties (571 Bytes)