What should be the data type for boolean data?

According to this, the only available data types in Druid are String, Long, Float, and Double. So if I have a boolean field in the JSON data, e.g. "isAnonymous": false, what should its data type be in the schema? Should it be string, since Druid uses that as the default data type, or should it be long, as given here that BOOLEAN maps to LONG at runtime in Druid?

Thank you @Hareesh! I am going to speak to some people and see if this can be checked…

We have been handling it with a transform to long:

(that's an escaped value)

parse_long(if(""=="true", 1, 0))"

I tried ingesting Avro boolean fields both as string and as long, without any parsing. In the string case I got null values, and in the long case I got 0s. I assume these are the default values for those data types, which means you have to parse the input value explicitly, either as the string true/false or as the long 1/0.
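For anyone wanting to try the transform-to-long approach, here is a rough sketch of how it might look in an ingestion spec. It assumes the input format hands the boolean to the expression as the string true/false, which may not hold for every format or Druid version, so check it against your own data. `isAnonymous` is the field from the question; `isAnonymousFlag` is just a made-up output name. Both keys would sit inside the `dataSchema`, and everything else in the spec is left out.

```json
{
  "transformSpec": {
    "transforms": [
      {
        "type": "expression",
        "name": "isAnonymousFlag",
        "expression": "if(\"isAnonymous\" == 'true', 1, 0)"
      }
    ]
  },
  "dimensionsSpec": {
    "dimensions": [
      { "type": "long", "name": "isAnonymousFlag" }
    ]
  }
}
```

In Druid expressions, double quotes refer to a column and single quotes are string literals, hence the escaped double quotes around the field name inside the JSON string.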


I guess there are different takes. In the Druid Slack, it was recently suggested to use string; I'm not sure there's a great advantage to either. All I can think of is that if it's a dimension, you'll get a bitmap index that can be used.

I think boolean values will always ultimately get stored as 0/1, either as a long or as the dictionary value for 'false' and 'true' (based on the Slack comment).
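If you go the string route instead, the dimension entry could look roughly like the sketch below. `isAnonymous` is the field from the question, and `createBitmapIndex` is spelled out only to make the bitmap index point visible; as far as I know it already defaults to true for string dimensions. This assumes the values actually arrive as the strings true/false, which did not happen without parsing in the Avro test mentioned earlier.

```json
{
  "dimensionsSpec": {
    "dimensions": [
      { "type": "string", "name": "isAnonymous", "createBitmapIndex": true }
    ]
  }
}
```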