Let me tell you a little about my use case so you understand what I am looking for. I publish events to Kafka and then consume them in Druid so that I can query them. I don't know the schema of these events in advance, I want a single Druid table to hold events with different schemas, and the schema can change at any time. I also want to query these events with math functions like MAX and AVG, but those functions don't work on strings out of the box. Today I have to wrap columns in PARSE_LONG and add a regex filter to my queries to skip rows that don't have the column (or hold non-numeric values). So basically I want schemaless ingestion, but without every column ending up as a string type. Is that possible?
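To make the workaround concrete, here is a sketch of the kind of query I mean. The datasource name "events" and the column "latency" are placeholders, not my real schema:

```sql
-- Every auto-detected dimension is a string, so numeric aggregation
-- needs an explicit parse plus a filter to skip rows where the column
-- is missing or non-numeric.
SELECT MAX(PARSE_LONG("latency")) AS max_latency
FROM "events"
WHERE REGEXP_LIKE("latency", '^[0-9]+$')
```

Having to repeat this boilerplate in every query is exactly what I'd like to avoid.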
If I delete all the dimensions from my spec, Druid will detect all columns as dimensions, but every type will be string, and I don't want that.
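For reference, the only alternative I know of is declaring typed dimensions explicitly in the `dimensionsSpec`, something like the sketch below (column names are made up). But that defeats the point, because it requires knowing the columns up front:

```json
{
  "dimensionsSpec": {
    "dimensions": [
      "userId",
      { "type": "long", "name": "latency" },
      { "type": "double", "name": "score" }
    ]
  }
}
```

As I understand it, any column not listed here would either be dropped or, in schemaless mode, be discovered as a string again.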
So I have a couple of questions:
- If all dimensions are strings, will that affect query or indexing performance?
- What will happen when I add a new column? Will Druid try to reindex old segments, or will they be unaffected?
- Is there some data format that supports column type definitions? Can you point me to one? And would it support adding new columns dynamically without changing the schema?
- And what will happen if I change a column's type? Say I have a column "X" of type String and then change it to a numeric type. Is that possible?