Hi, powers that be.
We have a fairly large Druid cluster with terabytes of data and 20K segments.
The first time we run a query (TopN), for example after a cluster restart, it takes 60-400 seconds while the cluster goes to 100% CPU.
We are using memcached (read and write enabled on the historicals, disabled on the broker), and we see that after the first query we get blazing fast results because the query is served from the cache.
Since that is the performance we would like to show our customers, we wanted to prewarm the cache. We used JMeter to run queries on all of our dimensions, thinking it would make the first real queries faster, and it did: they were much faster.
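A minimal sketch of this kind of prewarm, written in Python rather than JMeter. The broker URL, datasource, dimension names, metric names, aggregator type, and interval are all placeholders invented for illustration, not our real ones. It only assumes the standard native TopN query shape and the broker's `/druid/v2/` endpoint.

```python
import json
import urllib.request

# Hypothetical placeholders: none of these values come from the real cluster.
BROKER_URL = "http://localhost:8082/druid/v2/"
DATASOURCE = "events"
DIMENSIONS = ["country", "device", "campaign"]
METRICS = ["impressions", "clicks"]
INTERVAL = "2024-01-01/2024-02-01"


def build_topn(dimension, metrics, threshold=2000):
    """Build one native TopN query body for a single dimension."""
    return {
        "queryType": "topN",
        "dataSource": DATASOURCE,
        "dimension": dimension,
        "metric": metrics[0],  # rank results by the first metric
        "threshold": threshold,
        "granularity": "all",
        "intervals": [INTERVAL],
        # longSum is an assumed aggregator type; use whatever the metrics need.
        "aggregations": [
            {"type": "longSum", "name": m, "fieldName": m} for m in metrics
        ],
    }


def prewarm_queries(dimensions=DIMENSIONS, metrics=METRICS, threshold=2000):
    """Return one TopN query per dimension, all with the same metrics."""
    return [build_topn(d, metrics, threshold) for d in dimensions]


def post_query(query, url=BROKER_URL, timeout=600):
    """POST a native query to the broker and return the parsed JSON result."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

To actually warm the cache, each query would be sent once, for example `for q in prewarm_queries(): post_query(q)`. Note that every generated query has the same metrics and the same threshold, which is exactly the shape that stops helping once a real query differs in either.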
Here are the issues we are facing. I would really appreciate it if someone could point me in the right direction or explain what I am doing wrong.
As I said, the prewarm works fine, and we used all of our metrics in the query (all 7), but:
Changing the query to use one metric fewer brought us back to queries that take 60-400 sec. Does the cache key include all the metrics in the query? Is there any way around it? (Prewarming 7! options is not a good way to go.)
I have two brokers. Sending the same query to the second broker brings us back to 60-400 sec per query. Why? (The data should be cached at the historical level, no? Does the key contain the broker ID?)
Even on the same broker with the same query, when I change the limit (threshold) in the query from 2000 to 60, the performance degrades. It does not go all the way back to 60 sec, but it runs longer. Why?
Lookup - we are using lookups with JDBC to our MySQL. For some reason that we don't understand, when the data is in the cache and we run the query once more, it runs fast, but if we wait X minutes and rerun the query, it runs slow again. Once we removed the lookups we were not able to reproduce the issue. Do we need to increase the firstCacheTimeout variable? Why is this happening?
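To put a number on the prewarm problem from the metrics question: assuming the cached entry depends on which metrics appear in the query (this is what we seem to observe, not something we have confirmed), a full prewarm would need one run per metric combination. The short Python sketch below counts the combinations for 7 metrics; the metric names are hypothetical.

```python
from itertools import combinations
from math import factorial

# 7 hypothetical metric names standing in for the real ones.
METRICS = [f"metric_{i}" for i in range(1, 8)]


def metric_subsets(metrics):
    """Yield every non-empty subset of metrics, ignoring order."""
    for size in range(1, len(metrics) + 1):
        yield from combinations(metrics, size)


subset_count = sum(1 for _ in metric_subsets(METRICS))
print(subset_count)              # 127, i.e. 2**7 - 1
print(factorial(len(METRICS)))   # 5040, i.e. 7!
```

Even the smaller figure has to be multiplied by every dimension, and again by every threshold if that also affects caching, so prewarming every combination does not look practical either way.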