What should be the data type for boolean data?

According to this, the only available data types in Druid are String, Long, Float, and Double. So if I have a boolean field in the JSON data, for example "isAnonymous": false, what should its data type be in the schema? Should it be string, since Druid uses that as the default data type, or should it be long, as given here, where BOOLEAN maps to LONG at runtime in Druid?

Thank you @Hareesh! I am going to speak to some people and see if this can be checked…

We have been handling it with a transform to long:

(that's the escaped value)

parse_long(if(""=="true", 1, 0))"
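For reference, here is a minimal Python sketch of what such a transform could look like inside an ingestion spec's transformSpec. It assumes the boolean field is the isAnonymous field from the question and that it arrives as the string "true" or "false". The field name, the helper names, and the exact expression are assumptions: the expression uses Druid's syntax of double quotes for column names and single quotes for string literals, and it drops the parse_long wrapper because if(...) with the integer literals 1 and 0 should already yield a long (worth verifying on your Druid version). Running json.dumps on it shows the escaped form that ends up in the spec.

```python
import json

# Hypothetical field name taken from the question; substitute your own.
FIELD = "isAnonymous"


def boolean_to_long_transform(field: str) -> dict:
    """Build a Druid expression transform that maps "true" -> 1, anything else -> 0.

    Assumption: the raw value is the string "true" or "false". In Druid
    expressions, double quotes mark a column name and single quotes a string literal.
    """
    return {
        "type": "expression",
        "name": field,  # same name as the input field, so the transform shadows it
        "expression": f"if(\"{field}\" == 'true', 1, 0)",
    }


def emulate(raw) -> int:
    """Python model of the expression's mapping, for checking sample inputs."""
    return 1 if raw == "true" else 0


transform_spec = {"transforms": [boolean_to_long_transform(FIELD)]}

# json.dumps escapes the inner double quotes, giving the "escaped value" form.
print(json.dumps(transform_spec, indent=2))
```

Reusing the input field's name means the transformed value shadows the raw one; give the transform a different name if you want to keep both columns.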

I tried ingesting Avro boolean fields both as string and as long without any parsing. In the string case I got null values, and in the long case I got 0s. I assume these are the default values for those data types, which means you have to parse the input value into either a string true/false or a long 1/0.
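If you happen to control a pre-processing step before the data reaches Druid (a hypothetical setup, not something described in this thread), the two target encodings can be produced explicitly. The Python sketch below, with hypothetical helper names, normalizes common boolean encodings into either the string "true"/"false" or the long 1/0, and keeps missing values as None. One gotcha it avoids: Python's str(True) is "True" with a capital T, which would not match a case-sensitive comparison against "true".

```python
from typing import Optional, Union

BoolLike = Union[bool, int, str, None]


def as_bool(value: BoolLike) -> Optional[bool]:
    """Interpret common boolean encodings; None stays None (a missing value)."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return value
    if isinstance(value, int):
        return value != 0
    text = value.strip().lower()
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def to_string_dim(value: BoolLike) -> Optional[str]:
    """Encode as lowercase "true"/"false" for a string column."""
    b = as_bool(value)
    return None if b is None else ("true" if b else "false")


def to_long(value: BoolLike) -> Optional[int]:
    """Encode as 1/0 for a long column."""
    b = as_bool(value)
    return None if b is None else int(b)


print(to_string_dim(False), to_long(True), to_string_dim("TRUE"))
```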


I guess there are different takes. In the Druid Slack, it was recently suggested to use string; I'm not sure there's a great advantage to either. All I can think of is that if it's stored as a string dimension, you'll get a bitmap index that can be used for filtering.

I think boolean values will always ultimately get stored as 0/1: either as a long value, or as the dictionary IDs for 'false' and 'true' (based on the Slack comment).
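To illustrate the last two points, here is a toy Python model (not Druid's implementation) of a dictionary-encoded string column. The distinct values are sorted into a dictionary, so 'false' gets ID 0 and 'true' gets ID 1, each row stores only its small integer ID, and each value gets a bitmap of the rows containing it. A filter such as isAnonymous = 'true' then just reads one bitmap. Real Druid segments differ in detail (compressed bitmaps, null handling), so treat this only as a sketch of the idea.

```python
from typing import Dict, List, Tuple


def dictionary_encode(rows: List[str]) -> Tuple[List[str], List[int], Dict[str, int]]:
    """Toy dictionary-encoded string column with one bitmap per distinct value."""
    dictionary = sorted(set(rows))  # sorted distinct values
    ids = {value: i for i, value in enumerate(dictionary)}
    encoded = [ids[value] for value in rows]  # each row stores a small integer ID
    bitmaps: Dict[str, int] = {value: 0 for value in dictionary}
    for row_index, value in enumerate(rows):
        bitmaps[value] |= 1 << row_index  # set this row's bit in the value's bitmap
    return dictionary, encoded, bitmaps


def rows_matching(bitmaps: Dict[str, int], value: str) -> List[int]:
    """Answer an equality filter by reading the set bits of one bitmap."""
    bitmap = bitmaps.get(value, 0)
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


rows = ["false", "true", "true", "false", "true"]
dictionary, encoded, bitmaps = dictionary_encode(rows)
print(dictionary)                      # ['false', 'true']
print(encoded)                         # [0, 1, 1, 0, 1]
print(rows_matching(bitmaps, "true"))  # [1, 2, 4]
```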

Update
Avro boolean fields get parsed correctly if they are set as string in Druid. I tried this using Schema Registry-encoded Kafka messages.
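As a sketch of what "set as string in Druid" can look like in the spec, the hypothetical Python helper below derives dimensionsSpec entries from a flat Avro record schema and maps boolean to string, following the update. The other type mappings, the helper name, and the example schema are assumptions for illustration; union, array, and nested record types are simply skipped.

```python
import json
from typing import Dict, List

# Hypothetical Avro -> Druid dimension type mapping.
# "boolean" -> "string" follows the update above; the other entries are assumptions.
AVRO_TO_DRUID: Dict[str, str] = {
    "string": "string",
    "boolean": "string",
    "int": "long",
    "long": "long",
    "float": "float",
    "double": "double",
}


def dimensions_from_avro(record_schema: dict) -> List[dict]:
    """Build dimensionsSpec entries from the primitive fields of an Avro record schema."""
    dims = []
    for field in record_schema["fields"]:
        avro_type = field["type"]
        # Unions are lists and complex types are dicts; this sketch skips them.
        if not isinstance(avro_type, str) or avro_type not in AVRO_TO_DRUID:
            continue
        dims.append({"type": AVRO_TO_DRUID[avro_type], "name": field["name"]})
    return dims


# Hypothetical example schema with the boolean field from the question.
schema = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "userId", "type": "string"},
        {"name": "isAnonymous", "type": "boolean"},
        {"name": "durationMs", "type": "long"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
    ],
}

print(json.dumps({"dimensions": dimensions_from_avro(schema)}, indent=2))
```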
