Large IN query

Hello

I’m thinking about using Druid for a time-series project.

I’d like to ask how Druid handles large “IN” statements.

Basically, I’m collecting data from a lot of IoT devices (each sends a data entry every minute), and my users need to be able to see aggregated/mean data for any selection of devices. They will often select between 100 and 1000 devices, so there will be some large “IN” queries. Does Druid handle this well, or should I look for an alternative?

Another query I need to run is an aggregation/mean of the newest data entry from a selected batch of IoT devices.

Thank you,

Daniel

Hi Daniel,

Druid keeps the values of the IN filter in memory during processing, and 100-1000 values sounds ok if they are not extremely long strings.
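For illustration, here is a rough sketch of what such a query could look like as a Druid native timeseries query with an "in" filter, built in Python. The data source name (iot_readings), the dimension (device_id) and the metric (value) are made-up names for this example, not anything from Daniel's setup. The mean is computed as sum divided by row count, which assumes rollup is disabled so that each Druid row is one device reading.

```python
import json


def build_in_filter_query(device_ids, interval, data_source="iot_readings"):
    """Build a Druid native timeseries query limited to the given devices.

    All names (data source, dimension, metric) are hypothetical.
    """
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "all",
        "intervals": [interval],
        # The "in" filter takes its values as strings, even for numeric IDs.
        "filter": {
            "type": "in",
            "dimension": "device_id",
            "values": [str(d) for d in device_ids],
        },
        "aggregations": [
            {"type": "doubleSum", "name": "value_sum", "fieldName": "value"},
            # Counts Druid rows; equals the number of readings only without rollup.
            {"type": "count", "name": "row_count"},
        ],
        # Mean = sum / count, computed after aggregation.
        "postAggregations": [
            {
                "type": "arithmetic",
                "name": "value_mean",
                "fn": "/",
                "fields": [
                    {"type": "fieldAccess", "fieldName": "value_sum"},
                    {"type": "fieldAccess", "fieldName": "row_count"},
                ],
            }
        ],
    }


# 500 selected devices over a four-day window.
query = build_in_filter_query(
    range(1, 501), "2017-04-01T00:00:00Z/2017-04-05T00:00:00Z"
)
payload = json.dumps(query)
```

The resulting JSON would be POSTed to the broker's query endpoint. With numeric IDs of at most five digits, a 1000-value list stays small.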

And, I believe you can query on the newest data entry by setting the ‘intervals’ in your query.
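As a small sketch of the ‘intervals’ idea: since the devices report once a minute, a short trailing window should contain the newest entries. The helper name and the two-minute window below are assumptions for illustration. Note that this only approximates "newest entry per device": a device can contribute more than one row inside the window, and a device that has gone quiet contributes none.

```python
from datetime import datetime, timedelta, timezone


def latest_interval(now, window_minutes=2):
    """Return an ISO 8601 interval string covering the last few minutes.

    `now` must be a timezone-aware UTC datetime. The helper name and the
    default window are hypothetical choices for this example.
    """
    start = now - timedelta(minutes=window_minutes)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{start.strftime(fmt)}/{now.strftime(fmt)}"


# Fixed timestamp so the example is reproducible; use
# datetime.now(timezone.utc) in practice.
now = datetime(2017, 4, 5, 4, 27, tzinfo=timezone.utc)
interval = latest_interval(now)
print(interval)  # 2017-04-05T04:25:00Z/2017-04-05T04:27:00Z
```

The string goes into the "intervals" list of the query, alongside the "in" filter on the selected devices.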

Thanks,

Jihoon

On Wed, Apr 5, 2017 at 4:27 AM, Daniel Strøm danielss89@gmail.com wrote:

OK, great. No, they shouldn’t be too long, as they’re numeric IDs, probably in the range 1-99999.

We’ve seen lousy performance on large “IN” predicates when using the default Concise bitmaps. The same queries were a few times faster against data indexed with Roaring bitmaps.
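For reference, the bitmap type is chosen at indexing time through the indexSpec in the ingestion task's tuningConfig. A minimal sketch of that fragment, written as a Python dict so it can be merged into a larger spec (the variable names are just for this example; the rest of the tuningConfig depends on your ingestion type and is left out):

```python
import json

# Select Roaring instead of Concise for the bitmap indexes.
index_spec = {"bitmap": {"type": "roaring"}}

# Fragment to merge into the tuningConfig of an ingestion spec.
tuning_config_fragment = {"indexSpec": index_spec}

print(json.dumps(tuning_config_fragment, indent=2))
```

As far as I know this only affects newly written segments, so existing data would need to be reindexed to pick up the change.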

It depends on your query, but you can definitely get into situations where mixing a few other filters with large “IN” clauses gets slow.