I have just discovered the project and wanted to consult with you regarding our use case.
We have network devices that generate 1 PB of data a day across 100+ tables (the timestamp is very important for us). We currently have an ETL pipeline that processes this data and writes Parquet files, and then we add the partitions to the Hive metastore.
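To give a concrete picture of the last step, this is roughly what our partition registration looks like (a simplified sketch; the table name, bucket, and `dt` partition column are made up for illustration):

```python
from datetime import date

def partition_ddl(table: str, day: date, base: str = "s3://warehouse") -> str:
    """Build the Hive DDL that registers one day's Parquet output as a partition."""
    part_dir = day.strftime("dt=%Y-%m-%d")
    location = f"{base}/{table}/{part_dir}"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt='{day.isoformat()}') LOCATION '{location}'"
    )

print(partition_ddl("flow_logs", date(2024, 1, 15)))
```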
Our customers run ad-hoc queries, mostly filtering (WHERE clauses). Depending on those results, some of them then run queries with JOIN clauses.
I understand Druid is very good at filtering, but we also want good join performance. I have read that Druid's join performance is weak, but I saw there is a Hive integration. Would Druid + Hive be faster than Hive + Parquet?