Regarding Druid Query Performance


I am doing a POC on Druid, for this i have ingested ~400gb of TPC-H data into it. Queries are working but taking more than 10 seconds for group by and topN. As Druid documentation says that it will give sub second response, so just wanted to know whether my configurations are correct or not,

I am running Druid on AWS cluster (9 nodes), each node has 32 cores, 64 gb of RAM and 100 gb of space. I have configured 7 nodes for historical, 1 for broker and 1 running overlord, coordinator, zookeeper. Middle manager is running on one of the historical node.

Following are my historical and broker config,


If you look at system metrics (like iostat) are you getting a lot of reading from disk during your queries? If so you’re probably low on memory. In that case the first thing I’d try is lowering druid.processing.buffer.sizeBytes a little on your historicals to free up some memory for disk cache.

If you aren’t reading from disk much, maybe try checking:

  • are your segments a reasonable size?

  • are your jvms GCing too much?