How to make sure filters are working efficiently

My understanding is that the main “index” data structure in Druid is the bitmap index that is automatically created for all string columns.
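To make the cost question concrete, here is a toy Python model of how a bitmap index answers these filters. It assumes one uncompressed bitset per distinct value, stored as a Python int. Real Druid segments use compressed bitmaps (Concise or Roaring) and differ in many details. In this model a selector is a single bitmap lookup, an "in" unions one bitmap per listed value, and an "and" intersects its children's bitmaps. So the un-collapsed AND(selector, in) does more bitmap work than the lone selector, even though both select the same rows. All function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce
import operator


def build_bitmap_index(column):
    # Map each distinct value to a bitset (an int) with bit i set when row i
    # holds that value. Toy stand-in for Druid's compressed bitmaps.
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def selector(index, value):
    # One lookup: rows where the column equals `value`.
    return index.get(value, 0)


def in_filter(index, values):
    # Union (bitwise OR) of one bitmap per listed value.
    return reduce(operator.or_, (index.get(v, 0) for v in values), 0)


def and_filter(*bitmaps):
    # Intersection (bitwise AND) of the child filters' bitmaps.
    return reduce(operator.and_, bitmaps)


def rows(bitmap, n_rows):
    # Decode a bitset back into the list of matching row numbers.
    return [i for i in range(n_rows) if (bitmap >> i) & 1]


if __name__ == "__main__":
    column = ["access", "user", "other", "access", "to"]
    idx = build_bitmap_index(column)
    allowed = ["projects", "user", "has", "access", "to"]
    combined = and_filter(selector(idx, "access"), in_filter(idx, allowed))
    print(rows(combined, len(column)))
    print(rows(selector(idx, "access"), len(column)))
```

Both printed lists are `[0, 3]`. The difference is only in how many bitmaps were touched to get there.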

Is there a good way to understand (like a SQL EXPLAIN) how well the bitmap index is being used by different filters?

(I know that Druid SQL has EXPLAIN, but that just says what JSON query it turns into, and I’m just using the JSON queries myself.)

For some permissions stuff I want to just smack a

AndFilter([Q, InDimFilter('project', ['projects', 'user', 'has', 'access', 'to'])])

around every query.
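In the native JSON query language, this wrapper can be applied mechanically to any query that takes a top-level "filter" (timeseries, topN, groupBy, and so on). The Python sketch below assumes queries are held as plain dicts before being sent. The function name `with_project_permissions` and the `events` data source are hypothetical. The "and", "in", and "selector" filter types and their fields are the native ones.

```python
import json


def with_project_permissions(query, allowed_projects):
    """Return a copy of a native Druid JSON query whose filter is ANDed with
    an "in" filter on the `project` dimension. The input dict is not mutated.
    """
    permission = {
        "type": "in",
        "dimension": "project",
        "values": list(allowed_projects),
    }
    wrapped = dict(query)  # shallow copy; the original filter dict is reused, not changed
    existing = query.get("filter")
    if existing is None:
        wrapped["filter"] = permission
    else:
        wrapped["filter"] = {"type": "and", "fields": [existing, permission]}
    return wrapped


if __name__ == "__main__":
    query = {
        "queryType": "timeseries",
        "dataSource": "events",  # hypothetical data source
        "granularity": "all",
        "intervals": ["2018-01-01/2018-02-01"],
        "filter": {"type": "selector", "dimension": "project", "value": "access"},
        "aggregations": [{"type": "count", "name": "rows"}],
    }
    allowed = ["projects", "user", "has", "access", "to"]
    print(json.dumps(with_project_permissions(query, allowed), indent=2))
```

With the selector filter above, this produces exactly the AND(selector, in) shape asked about next.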

I am paranoid that maybe

AndFilter([
    SelectorFilter('project', 'access'),
    InDimFilter('project', ['projects', 'user', 'has', 'access', 'to'])])

would perform worse than the (equivalent) SelectorFilter('project', 'access'), if perhaps the “query planner” isn’t as smart as I’d hope.

(a) Is this going to be fine?

(b) How could I learn the answer to this question myself?

–dave

Hey Dave,

There isn’t an EXPLAIN-style feature for this right now (it would be nice though!). As of Druid 0.12.3, the native query engine will not collapse the “selector” and the “in” filters together, so you would get better performance by collapsing them yourself before sending the query. However, if you are using Druid SQL, the equivalent “project = 'access' AND project IN ('projects', 'user', 'has', 'access', 'to')” will be collapsed into “project = 'access'”. This is one of the advantages of using Druid SQL over Druid’s native query language: its planner is more sophisticated in the optimizations it will do.
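For native JSON queries, the collapse described above can be done client-side before the query is sent. The Python sketch below is a hypothetical helper (not a Druid API) that works on native filters held as dicts. Inside one "and", it looks for an "in" filter whose dimension is also pinned by a plain "selector". If the selector's value is in the list, the "in" is implied and is dropped. If it is not, no row can match, and the helper returns None so the caller can skip the query. It deliberately leaves filters with an `extractionFn`, and selectors on null or empty values, untouched, since those change what "equal" means.

```python
def drop_redundant_in_filters(filt):
    """Simplify a native "and" filter in which a selector pins a dimension.

    Returns the simplified filter, or None when the filter can match no rows.
    Any other filter shape is returned unchanged.
    """
    if filt.get("type") != "and":
        return filt
    fields = filt["fields"]
    # Dimensions pinned to a single value by a plain selector child.
    pinned = {
        f["dimension"]: f["value"]
        for f in fields
        if f.get("type") == "selector"
        and "extractionFn" not in f
        and f.get("value") not in (None, "")
    }
    kept = []
    for f in fields:
        if (f.get("type") == "in" and "extractionFn" not in f
                and f["dimension"] in pinned):
            if pinned[f["dimension"]] in f["values"]:
                continue  # implied by the selector, so redundant
            return None  # selector value is not allowed: nothing can match
        kept.append(f)
    if len(kept) == 1:
        return kept[0]  # an "and" with one child is just that child
    return {**filt, "fields": kept}


if __name__ == "__main__":
    f = {"type": "and", "fields": [
        {"type": "selector", "dimension": "project", "value": "access"},
        {"type": "in", "dimension": "project",
         "values": ["projects", "user", "has", "access", "to"]},
    ]}
    print(drop_redundant_in_filters(f))
```

On the filter from the question this returns the bare selector on `project`, which is the same rewrite the answer attributes to the Druid SQL planner, restricted to this one pattern.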