Possible Druid Migration

Hi folks,

I have just discovered the project and wanted to consult with you regarding our use case.

We have network devices that generate about 1 PB of data a day across 100+ tables, and the timestamp is very important. We currently have an ETL job that processes this data and writes Parquet files. We then add the partitions to the Hive metastore.

Our customers run ad-hoc queries, mostly filtering with a WHERE clause. Depending on the results, some of them then run queries with a JOIN clause.

I understand Druid is very good at filtering, but we also want good join performance. I read that joins in Druid are not fast, but I also saw that there is a Hive integration. Would Druid + Hive be faster than Hive + Parquet?

Best regards.

Hi BK,

You can load the Parquet files into Druid to create datasources (pre-aggregated if required) and then query them. This would be faster.
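
As a rough sketch of what that load could look like, the Python below builds a native batch (index_parallel) ingestion spec for Parquet files and prints it as JSON. Everything specific in it is an assumption for illustration: the datasource name, the HDFS path, the column names, and the hourly segment granularity are made up, and it presumes the druid-parquet-extensions and druid-hdfs-storage extensions are loaded on your cluster. Setting rollup=True is the "pre-aggregated" option mentioned above.

```python
import json


def build_parquet_ingestion_spec(datasource, parquet_paths, timestamp_column,
                                 dimensions, rollup=False):
    """Build a Druid native batch ingestion spec for Parquet input.

    All argument values are supplied by the caller; nothing here is read
    from a real cluster.
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Druid partitions segments by this primary timestamp.
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": dimensions},
                # With rollup, rows sharing the same truncated timestamp and
                # dimension values are merged and counted at ingestion time.
                "metricsSpec": [{"type": "count", "name": "count"}] if rollup else [],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "hour",  # assumed; tune to data volume
                    "queryGranularity": "minute" if rollup else "none",
                    "rollup": rollup,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "hdfs", "paths": parquet_paths},
                "inputFormat": {"type": "parquet"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


if __name__ == "__main__":
    # Hypothetical table, path, and columns.
    spec = build_parquet_ingestion_spec(
        datasource="network_events",
        parquet_paths=["hdfs://namenode:8020/warehouse/network_events/dt=2020-01-01/"],
        timestamp_column="event_time",
        dimensions=["device_id", "src_ip", "dst_ip"],
        rollup=True,
    )
    print(json.dumps(spec, indent=2))
```

The printed JSON is what you would submit to the Overlord as an ingestion task, one spec per table or partition batch. Whether rollup is acceptable depends on whether your customers need the raw rows.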

Thanks & Rgds