Creating a view from a query that is refreshed only once per day

Hi,

We have just started working on Druid version 0.19.0.
Our product needs to execute a nested query over n columns for different attributes, which will return around 20-30 million rows. The nested query runs over the following data sets:

a. Event data set: 20-30 million rows (summary tables), 40+ columns
b. Profile data set: 2-3 million rows, 10 columns
c. Subscription data set: 25K rows, 10-15 columns

This query may take a very long time, so it can't be executed again and again.

My questions are:

  1. Can we create (and automate) a view for such results that is updated only once per day?
  2. Can the product then run further queries on the result data?
  3. Can we export these results to Google Studio?

Thanks
Amit

Hi Amit,

You can create a reindexing task and schedule it to run every day. Enable rollup to get functionality similar to GROUP BY. Make sure your reindexing task writes to a different datasource. This is not really a view but a copy of your existing data with a limited number of dimensions and pre-aggregated metrics. It should be faster to query than a so-called view, since most of the data is already pre-aggregated and rolled up.
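As a rough illustration of such a reindexing task, here is a minimal sketch that builds a native batch (`index_parallel`) spec reading from an existing datasource through the `druid` input source and writing a rolled-up copy to a different datasource, then submits it to the Overlord task endpoint (`/druid/indexer/v1/task`). The datasource names, dimensions, metrics, interval and Overlord URL are all hypothetical placeholders, and the spec is deliberately minimal; check it against the native batch ingestion docs for your Druid version before relying on it.

```python
import json
import urllib.request


def build_reindex_spec(source_ds, target_ds, interval, dimensions, metrics):
    """Build an index_parallel spec that re-reads `source_ds` over `interval`
    and writes a rolled-up copy into `target_ds` (a separate datasource)."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": target_ds,
                # The druid input source exposes the row timestamp as __time (epoch millis).
                "timestampSpec": {"column": "__time", "format": "millis"},
                # Keep only the dimensions you need; dropping the rest improves rollup.
                "dimensionsSpec": {"dimensions": dimensions},
                "metricsSpec": metrics,
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",
                    "queryGranularity": "DAY",  # truncate timestamps to the day
                    "rollup": True,  # merge rows with equal dims + day, GROUP BY-style
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "druid", "dataSource": source_ds, "interval": interval},
                # False: a rerun for the same interval replaces that interval's data.
                "appendToExisting": False,
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


def submit_task(spec, overlord_url):
    """POST the spec to the Overlord and return the task id it assigns."""
    req = urllib.request.Request(
        overlord_url + "/druid/indexer/v1/task",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["task"]
```

For example, `submit_task(build_reindex_spec("events", "events_daily_rollup", "2020-08-01/2020-08-02", ["country", "plan"], [{"type": "longSum", "name": "count", "fieldName": "count"}]), "http://localhost:8081")` would reindex one day (all names hypothetical). With `appendToExisting` set to false, running the task again for the same interval overwrites that interval in the target datasource rather than creating a new datasource.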

You can export results to any JDBC-supported data source. If a system does not support any integrations, you can download results in a supported format (e.g. CSV, TSV, JSON) by changing the query result format. Please refer to this article for additional configuration needed for large result sets:

https://support.imply.io/hc/en-us/articles/360034310953-Tuning-Druid-for-Large-Result-Sets
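For the "download results" route, one option is Druid's SQL HTTP endpoint (`/druid/v2/sql`) with `resultFormat` set to `csv`. The sketch below is a minimal example under the assumption that a Broker or Router is reachable at the URL you pass in; the query and file path are hypothetical. It streams the response to disk rather than holding the whole result in memory, which matters for result sets in the millions of rows.

```python
import json
import shutil
import urllib.request


def build_sql_payload(query):
    """Request body for Druid SQL: CSV output with a header row."""
    return {"query": query, "resultFormat": "csv", "header": True}


def export_csv(broker_url, query, out_path):
    """Run `query` against Druid SQL and stream the CSV result to `out_path`."""
    req = urllib.request.Request(
        broker_url + "/druid/v2/sql",
        data=json.dumps(build_sql_payload(query)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        # Copy in chunks so a large result never has to fit in memory at once.
        shutil.copyfileobj(resp, out)
    return out_path
```

For example, `export_csv("http://localhost:8888", 'SELECT country, SUM("count") AS events FROM events_daily_rollup GROUP BY country', "/tmp/events.csv")` (datasource and columns hypothetical).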

Hope this helps,
Cheers,
Gaurav

To help further with the Google Studio integration: you can create a scheduled workflow in Airflow that 1) runs the reindexing task, 2) runs a SQL query on the rolled-up datasource, 3) uploads the results to Google Cloud Storage, and 4) kicks off an ingestion task in Google Studio. If your cluster resides within GCP, you can skip a few steps by saving the results directly in Google Cloud Storage and ingesting them from there.
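In Airflow each of those four steps would typically be its own task. The plain-Python sketch below only shows the ordering and the one step that is easy to get wrong: waiting for the reindexing task to finish before querying the rolled-up datasource. It polls the Overlord status endpoint (`/druid/indexer/v1/task/{taskId}/status`), assuming the response carries the state under `status.status`. The export, Cloud Storage upload and Google Studio steps are passed in as callables, because their concrete implementations (bucket names, connectors) are hypothetical here.

```python
import json
import time
import urllib.parse
import urllib.request

TERMINAL_STATES = {"SUCCESS", "FAILED"}


def druid_task_status(overlord_url, task_id):
    """Return the current state of a Druid task, e.g. RUNNING or SUCCESS."""
    url = f"{overlord_url}/druid/indexer/v1/task/{urllib.parse.quote(task_id, safe='')}/status"
    with urllib.request.urlopen(url) as resp:
        body = json.loads(resp.read())
    return body["status"]["status"]  # assumed response shape


def wait_for_task(get_status, task_id, poll_seconds=30, timeout_seconds=6 * 3600, sleep=time.sleep):
    """Poll until the task reaches a terminal state or the timeout elapses."""
    waited = 0
    while True:
        state = get_status(task_id)
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"task {task_id} still {state} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


def run_daily_pipeline(submit_reindex, get_status, export_results,
                       upload_results, start_ingestion, sleep=time.sleep):
    """1) reindex, 2) query the rollup datasource, 3) upload, 4) ingest."""
    task_id = submit_reindex()
    if wait_for_task(get_status, task_id, sleep=sleep) != "SUCCESS":
        raise RuntimeError(f"reindex task {task_id} did not succeed")
    local_path = export_results()       # e.g. a SQL-to-CSV export
    remote_uri = upload_results(local_path)  # e.g. a gs:// URI (hypothetical)
    start_ingestion(remote_uri)          # Google Studio side (hypothetical)
    return remote_uri
```

Keeping the steps as injected callables makes each one easy to map onto a separate Airflow task later, with Airflow handling the daily schedule and retries.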

Hi Gaurav,

Thanks for your quick response. Can you please share some examples, or a link, showing how to create such a scheduler inside Druid with rollup metrics?
Will it create a new datasource daily or just replace the content of the datasource on a daily basis?
I use the Druid web console; can I do the same from the console? Could you please share some screenshots, if possible?

Thanks & Regards
Amit Srivastava