[druid-user] Druid Datasource with 5000 colums


I am creating a datasource with 5000 columns.
In Postgres, this data was spread across 10+ tables; now I am combining it into a single datasource in Druid.

Can I go ahead with single datasource?
Or shall I create multiple datasources with joins?

I insert into the datasource one record at a time.

Do I need to do some tuning, such as excluding indexes on string columns, for the write/read operations to be seamless?

Please advise.


Hi Abinaya,

Can you share a bit about your use case? I’d consider what my queries would look like when thinking about these things. Stated a little differently: what should the data model look like in order to facilitate my queries and really leverage Druid’s architecture?



P.S. Imply (my employer) has a free course called APACHE DRUID® INGESTION AND DATA MODELING which might be of interest to you.

That’s an impressive number of columns.

The biggest problem is high-cardinality string columns. So, you can probably start with all the columns that are not high-cardinality string columns.
And make sure you don’t add columns to your dimensions that you don’t use for filtering.
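As a rough sketch of what that looks like in an ingestion spec: listing dimensions explicitly in `dimensionsSpec` means any column not listed is simply dropped, and for string dimensions you never filter on, Druid lets you skip building the bitmap index with `createBitmapIndex: false`. The column names below are made up for illustration; the spec fragment is a minimal example, not a complete ingestion spec.

```json
{
  "dimensionsSpec": {
    "dimensions": [
      "country",
      "status_code",
      { "type": "string", "name": "request_payload", "createBitmapIndex": false },
      { "type": "long", "name": "response_bytes" }
    ]
  }
}
```

Here `country` and `status_code` get the default string treatment (with bitmap indexes, so they stay cheap to filter on), `request_payload` is stored but not indexed, and everything else in the input rows is excluded from the datasource entirely.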