Ambiguity in Indexing DimensionsSpec

Hey,

When indexing and reindexing, one has to set:

  • dimensions

  • dimensionExclusions

The behavior is unexpected because these two fields do not act as a whitelist and a blacklist. For example, when I reindex segments and provide only “dimensions”, the resulting segments contain all dimensions from the source segments, not only those listed in “dimensions”.

So I don’t understand what “dimensions” means if it is not a whitelist. From my point of view, I always have to specify “dimensionExclusions” if I want to omit certain dimensions.
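To make the expectation concrete, here is a minimal sketch of the whitelist/blacklist semantics I assumed. The helper `expected_dimensions` and the column names are hypothetical and purely illustrative; this is not Druid code and it does not describe what Druid actually does during reindexing.

```python
def expected_dimensions(source_dims, dimensions=None, dimension_exclusions=None):
    """Semantics I expected (hypothetical helper, not Druid's implementation)."""
    exclusions = set(dimension_exclusions or [])
    if dimensions:
        # A non-empty "dimensions" list acts as a whitelist, in the listed order.
        return [d for d in dimensions if d not in exclusions]
    # An empty "dimensions" list keeps everything except the blacklist.
    return [d for d in source_dims if d not in exclusions]


# Made-up column names for illustration only.
source = ["country", "device", "user_agent"]
print(expected_dimensions(source, dimensions=["country"]))
print(expected_dimensions(source, dimension_exclusions=["user_agent"]))
```

With these assumptions, the first call would keep only `country`, and the second would keep `country` and `device`. What I observe instead is that all three columns survive in both cases.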

Is there anything like a “whitelist” so that I don’t always have to analyze the metadata and put a blacklist together?
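For reference, the workaround I would like to avoid looks roughly like the sketch below. It assumes the full list of dimension names has already been fetched somehow (for example from a segmentMetadata query) and derives the blacklist from the columns I want to keep. The helper name `build_exclusions` and the column names are hypothetical.

```python
import json


def build_exclusions(all_dimensions, keep):
    """Return every dimension that is not in the keep list (hypothetical helper)."""
    keep_set = set(keep)
    unknown = keep_set - set(all_dimensions)
    if unknown:
        # Fail early if a column to keep does not exist in the source segments.
        raise ValueError(f"not present in source segments: {sorted(unknown)}")
    return [d for d in all_dimensions if d not in keep_set]


# Made-up column names; in practice these would come from the segment metadata.
all_dims = ["country", "device", "user_agent", "session_id"]
dimensions_spec = {
    "dimensions": [],
    "dimensionExclusions": build_exclusions(all_dims, keep=["country", "device"]),
}
print(json.dumps(dimensions_spec, indent=2))
```

This only builds the “dimensionsSpec” fragment. It says nothing about whether the reindexing task honors it.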

Does this clash with the “Schema-less dimensions” behavior described near the bottom here? http://druid.io/docs/latest/ingestion/schema-design.html

Kyle

Kyle I even tried the opposite approach, leaving “dimensions” empty and specifying only “dimensionExclusions”:

https://pastebin.com/raw/DwXw8dmE

to reindex all segments we have in a datasource. But the segments didn’t change at all. All the excluded dimensions are still present.

Would you please check the indexing task in the pastebin? What could I be doing wrong?

I believe that when you are reindexing (using a dataSource-type inputSpec), the dimensions that you want to keep need to be specified in that ingestionSpec’s dimensions and metrics fields. This is probably the whitelist that you’re looking for. If the dimensions field is null, you always get them all. Perhaps parseSpec is ignored when re-ingesting segments?
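A rough sketch of what I mean, written as a small Python script that prints the ioConfig JSON. The field names follow my description above and the batch reindexing docs as I remember them. The datasource name, interval, and column names are made up, the helper `reindex_io_config` is hypothetical, and I have not run this against a cluster, so treat it as a starting point rather than a confirmed fix.

```python
import json


def reindex_io_config(data_source, interval, dimensions, metrics):
    """Build an ioConfig that reads existing segments (illustrative sketch only)."""
    return {
        "type": "hadoop",
        "inputSpec": {
            "type": "dataSource",
            "ingestionSpec": {
                "dataSource": data_source,
                "intervals": [interval],
                # Assumption: only the columns listed here are read back
                # from the source segments.
                "dimensions": dimensions,
                "metrics": metrics,
            },
        },
    }


io_config = reindex_io_config(
    data_source="my_datasource",             # hypothetical name
    interval="2017-01-01/2017-02-01",        # hypothetical interval
    dimensions=["country", "device"],        # dimensions to keep
    metrics=["count"],                       # metrics to keep
)
print(json.dumps(io_config, indent=2))
```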

I left IoConfig.InputSpec.IngestionSpec.dimensions empty because, according to the documentation, it should be inferred from the ParseSpec.

But as you say, maybe the ParseSpec is ignored while reindexing/updating.

I tried these already (see pastebin):

  1. specifying only “dimensions” with dimensions I want to keep

  2. specifying only “dimensionExclusions” with dimensions I want to remove

  3. now I’m going to try to specify both, but since 2) didn’t work, I think this won’t work either …

  4. next, I’m planning to set IoConfig.InputSpec.IngestionSpec.dimensions in case the ParseSpec is ignored

OK, I tried both 3) and 4). The segments get reindexed, but they still contain all dimensions :-/ https://pastebin.com/raw/cxFBYn7q

I raised a bug because I cannot think of anything else to try: https://github.com/druid-io/druid/issues/5095