I’m thinking about using Druid for a time-series project.
I’d like to ask how Druid handles large “IN” statements.
Basically, I’m collecting data from a lot of IoT devices (they send a data entry every minute), and my users need to be able to see aggregated/mean data for any selected set of IoT devices. Often they will select between 100 and 1000 devices, so the queries will contain some large “IN” clauses. Does Druid handle this well, or should I look for an alternative?
Another query I need is an aggregation/mean over the newest data entry from a selected batch of the IoT devices.
Druid keeps the values of the IN filter in memory during processing, and 100-1000 values should be fine as long as they are not extremely long strings.
And I believe you can target the newest data entry by narrowing the ‘intervals’ in your query.
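To make the advice above concrete, here is a rough sketch of a Druid native query with an “in” filter over a device-id dimension, with a narrow ‘intervals’ window for the “newest entry” case. The datasource name (“iot_metrics”) and column names (“device_id”, “value”) are illustrative assumptions, not anything from your setup:

```python
import json

# Hypothetical sketch: datasource/column names are assumptions.
device_ids = [str(i) for i in range(100, 200)]  # the user's selected devices

query = {
    "queryType": "timeseries",
    "dataSource": "iot_metrics",
    "granularity": "hour",
    # "in" filter: matches rows whose device_id is in the selected set
    "filter": {
        "type": "in",
        "dimension": "device_id",
        "values": device_ids,
    },
    # restrict the scan window; for "newest entry" queries, narrow this
    # interval down to just the latest minute(s)
    "intervals": ["2017-04-01T00:00:00Z/2017-04-05T00:00:00Z"],
    "aggregations": [
        {"type": "doubleSum", "name": "value_sum", "fieldName": "value"},
        {"type": "count", "name": "row_count"},
    ],
    # mean = sum / count, computed as a post-aggregation
    "postAggregations": [
        {
            "type": "arithmetic",
            "name": "value_mean",
            "fn": "/",
            "fields": [
                {"type": "fieldAccess", "fieldName": "value_sum"},
                {"type": "fieldAccess", "fieldName": "row_count"},
            ],
        }
    ],
}

# this JSON body would be POSTed to the broker's query endpoint
body = json.dumps(query)
```

The key point is that the IN values are just a JSON array in the filter, so a 100-1000 element list of short numeric IDs stays small on the wire and in memory.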
On Wed, Apr 5, 2017 at 4:27 AM, Daniel Strøm firstname.lastname@example.org wrote:
Ok great. No, they shouldn’t be too long, since they’re numeric IDs, so probably 1-99999.
We’ve seen lousy performance on large “in” predicates when using the default concise bitmaps. The same queries were a few times faster against roaring-bitmap-indexed data.
It depends on your query, but you can definitely get into situations where mixing a few other filters with large IN clauses gets slow.