Is it possible to select a dimension that was added to "dimensionExclusions"?

Hi there,

My understanding is that adding a field to "dimensionExclusions" excludes it from indexing, but will it still be stored in Druid? Should I be able to select the field when querying?

Thanks, Slim,

Yes, it is not indexing the field, but I am still able to select the field from Druid (which is what I want). This behavior opens up an interesting use case.

Let's say we have event_id, event_name, and event_attribute (a JSON field). I would like to index event_id and event_name but put event_attribute in dimensionExclusions. In that case, should I be able to select an event_id and list all of its event_attribute values, since event_attribute is excluded from roll-up? Is that a correct assumption?

The term "indexed" here means persisting the field into the Druid data model. To do what you want, you must index the attribute as a dimension (assuming it is not a number you want to aggregate). You can then run a dimensional query to pull the distinct attribute values for any given time range and event ID. Be aware that if the attribute has high cardinality, it will blow up your roll-up ratio. If you are not interested in the actual attribute values but only in their cardinality, you could instead index the attribute as a metric with the thetaSketch aggregator type.
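As a rough illustration of the thetaSketch option, the metricsSpec fragment below stores event_attribute only as a sketch, while event_id and event_name stay in the dimensions list (and event_attribute is left out of it). This is a sketch that assumes the druid-datasketches extension is loaded; the metric names rows and event_attribute_sketch are hypothetical.

```json
"metricsSpec": [
  { "type": "count", "name": "rows" },
  { "type": "thetaSketch", "name": "event_attribute_sketch", "fieldName": "event_attribute" }
]
```

With this layout a query can estimate how many distinct attributes an event has (by aggregating event_attribute_sketch with a thetaSketch aggregator at query time), but it cannot list the attribute values themselves, and the sketch does not add a high-cardinality dimension that would hurt roll-up.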

Got it. I was hoping I could exclude a column with "dimensionExclusions": ["user_id"] and still be able to run:

{
  "queryType": "select",
  "dataSource": "users",
  "intervals": ["2017-10-31T20:00:00.000Z/2017-10-31T20:30:00.000Z"],
  "dimensions": ["user_id"],
  "pagingSpec": { "pagingIdentifiers": {}, "threshold": 10 }
}

But it turned out I was wrong: the user_id field comes back null. Is there any way to handle this scenario? I don't want to index the user_id field; I just want to show the last 10 values.

Hey Ananth,

In Druid, all string fields are always indexed, so we often use "indexing" and "storing" as synonyms. When we say "the field will not be indexed", that really means "the field will not be stored at all". That's why you get nulls when you query it.
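To make this concrete, a dimensionsSpec like the fragment below (an empty dimensions list, so Druid discovers dimensions from the input data) turns every string field into a dimension except user_id. Because user_id is never written to the segments, the select query above has nothing to read for it and can only return null.

```json
"dimensionsSpec": {
  "dimensions": [],
  "dimensionExclusions": ["user_id"]
}
```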

FWIW, this may change in the future so that a string field can be stored without being indexed, but as of this writing (Druid 0.10) that is not yet possible.
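For readers on later releases: as far as I know, newer Druid versions let a string dimension opt out of its bitmap index through a createBitmapIndex flag on the dimension schema, which is close to the "stored but not indexed" behavior described here. The fragment below is a sketch under that assumption (check the ingestion docs for your version); event_name is just an example of a normal, indexed dimension.

```json
"dimensionsSpec": {
  "dimensions": [
    "event_name",
    { "type": "string", "name": "user_id", "createBitmapIndex": false }
  ]
}
```

Note that user_id is still a dimension here, so its values are stored and selectable, but it still participates in roll-up and a high-cardinality user_id will still lower the roll-up ratio.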

Thanks, Gian, I think that would be an excellent feature to add. One of our use cases is showing real-time stats on audit logs, and Druid is a perfect fit for that. If something looks off the charts, users may want to see the individual transactions for the selected hour (which might contain unique values). The current restriction forces us either to use two different systems or to drop Druid and adopt something like Elasticsearch to avoid the complexity. I'm looking forward to this feature.

Hi,

Is this still the case with Druid? I have a similar use case, and I don't want to use two different systems if I can avoid it.

Thanks in advance,

Bhargav Bolla