Regarding empty dimensions

Hi Team,

I use Kafka to ingest data into Druid. The datasource is schemaless. For example:

{"id":20005900,"propmapid":168959,"tenant":30000001,"user":10005960,"event":2,"os":"Linux","time":1551378602002,"browser":"Opera"}

{"id":20005901,"propmapid":168959,"tenant":30000001,"user":10005960,"event":2,"os":"Linux","time":1551378604562,"browser":"Opera","utm_referrer":"Medium"}

{"id":20005902,"propmapid":168959,"tenant":30000001,"user":10005960,"event":2,"os":"Linux","time":1551378605562,"browser":"Opera","utm_source":"Google"}

So, in storage, I expect the data to be laid out column by column as:

id propmapid … __time utm_referrer utm_source

1:20005900 1:168959 … 1:1551378602002 2:Medium 3:Google

2:20005901 2:168959 … 2:1551378604562

3:20005902 3:168959 … 3:1551378605562

where 1, 2, 3… are the row ids.
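To make the layout above concrete, here is a minimal Python sketch that pivots the three example events into columns. It is only a toy model, not Druid's actual segment format: the `to_columns` helper is hypothetical, the events are trimmed to a few fields, and filling a missing key with `None` is an assumption made for illustration.

```python
# Toy model only: pivot row-oriented events into columns.
# This is NOT Druid's real segment format; to_columns is a hypothetical helper.
events = [
    {"id": 20005900, "browser": "Opera"},
    {"id": 20005901, "browser": "Opera", "utm_referrer": "Medium"},
    {"id": 20005902, "browser": "Opera", "utm_source": "Google"},
]

def to_columns(rows):
    # Collect every key seen in any row, preserving first-seen order.
    names = []
    for row in rows:
        for key in row:
            if key not in names:
                names.append(key)
    # Each column keeps one slot per row; a missing key is assumed to become None.
    return {name: [row.get(name) for row in rows] for name in names}

columns = to_columns(events)
for name, values in columns.items():
    print(name, values)
```

In this model every column has exactly one slot per row, so an event that lacks utm_source still occupies a position in that column.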

When I ran the query

select * from actions_1 where utm_source='';

it returned the expected result.

Since the utm variables are inserted only when they are present, does the dimension skip the event's entry in the column, or does it add an empty string or null value to the column, which may affect space requirements?

Please clarify this case for me.

Can someone please clarify this for me?

Hi Vignesh,

It would add a "null" for the rows where the dimension is missing.
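On the space question, a rough way to think about it: Druid's documentation describes string dimensions as dictionary-encoded, so a repeated value (including null) is stored once in the dictionary and each row holds only a small integer id. The Python sketch below illustrates that idea only; `dictionary_encode` is a hypothetical helper, and the real on-disk format (compression, bitmap indexes) is more involved.

```python
# Toy illustration of dictionary encoding; not Druid's actual implementation.
def dictionary_encode(values):
    # Build a sorted dictionary of distinct values, with None placed first.
    dictionary = sorted(set(values), key=lambda v: (v is not None, v or ""))
    ids = {value: i for i, value in enumerate(dictionary)}
    # Each row stores only the integer id of its value.
    return dictionary, [ids[v] for v in values]

utm_source = [None, None, "Google"]  # one slot per row, None where the key was absent
dictionary, encoded = dictionary_encode(utm_source)
print(dictionary)
print(encoded)
```

Under this assumption, the null entries add one dictionary slot plus one small id per row rather than a full string per row.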

Thanks & Rgds

Venkat