We have two datasources in which we aggregate the data by minute and by day, respectively.
Two of the dimensions have cardinalities of 4 and 20, while the multi-value list has a cardinality of 2,159,455 on each partition, with about 830,000 rows per partition.
Our cluster contains two historical and four realtime servers (each datasource is split across two servers with two partitions, on 8 cores and 28 GB RAM; GC throughput is at least 99.7 %, and numThreads is set to 7 in the middleManager configuration).
When we use a timeseries query, we get no more than 30 requests per second on the daily datasource, while CPU usage goes up to 100% on both realtime servers and the response time starts at about 200 ms and rises to 600 ms per query.
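To make the kind of query concrete, here is a minimal sketch of how a native timeseries query with day granularity could be built as JSON. The datasource name `daily_events`, the metric `count`, the output name `total`, and the interval are placeholders chosen for illustration, not our real names, and the `longSum` aggregation is only an assumed example.

```python
import json


def build_timeseries_query(datasource, interval, metric):
    """Build a Druid native timeseries query as a plain dict.

    All argument values used below are hypothetical placeholders.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "day",
        "intervals": [interval],
        "aggregations": [
            # Sum a numeric metric column over each day bucket.
            {"type": "longSum", "name": "total", "fieldName": metric}
        ],
    }


# Placeholder values; restrict the interval to data held by the realtime servers.
query = build_timeseries_query("daily_events", "2015-01-01/2015-01-02", "count")
payload = json.dumps(query)
print(payload)
```

The resulting JSON string would be sent as the request body to the broker's query endpoint.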
We tried several settings: two partitions and eight partitions, moving the multi-value list to a single-value dimension (just for testing), and using topN instead of timeseries, but none of them gave better performance (response time and/or requests per second). Finally, we dropped some data, roughly halving the ingested volume (multi-value list cardinality of ~1.1 million, 340,000 rows per partition). This did not improve the response time, but we now get 60 concurrent requests per second.
This seems a little strange to me: both setups result in the same query speed, but we get only half the concurrent requests when serving twice as many rows.
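A rough back-of-the-envelope check of these numbers, as a sketch only: it assumes every query touches all rows of a partition, which may not be true for our queries. The function name `rows_per_second` is just a helper for this illustration.

```python
def rows_per_second(requests_per_second, rows_per_partition):
    """Rows touched per second per partition, assuming each query scans every row."""
    return requests_per_second * rows_per_partition


# Figures taken from the two setups described above.
full = rows_per_second(30, 830_000)      # original data volume
reduced = rows_per_second(60, 340_000)   # after dropping data

print(full)                      # 24900000
print(reduced)                   # 20400000
print(round(full / reduced, 2))  # 1.22
```

Under that assumption, both setups process a similar number of rows per second (about 25 million versus about 20 million), which would be consistent with the servers being CPU-bound on row processing rather than limited by a fixed per-query overhead.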
We already checked the historical nodes, which perform very well (response time about 90 ms), and ruled them out as a possible cause by querying only the intervals stored on the realtime servers.
Maybe someone has some ideas about what we can do?