Materialised Views

Hey, I am using materialized views in Druid.
My question: say I have added a new dimension to the base datasource and, since it is important, I want to add it to my derived datasource spec too. Do I have to create a new derived datasource, or can the dimension be added to the existing one without modifying previously stored data?

Hi Ritik:

If you want to add a new dimension to an existing Druid datasource, it is done by re-ingesting everything, old and new dimensions, all over again. Or you can create a new datasource, but I don’t see the point in doing so.

Hope this helps

Hey,
To add to Ming’s answer - AFAIK, it is possible to add a new dimension to an existing datasource without having to re-ingest all the data:

Say you have a datasource with dimensions A, B.

You start ingesting new data from Jan 15th, with dimensions A, B and C, where C is the new dimension.

When you query the data, the old rows (i.e. from before Jan 15th) will be returned with “null” as the value of C, while the new rows (from Jan 15th onwards) will contain an actual value for C (whatever value you ingested).
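To make the expected result concrete, here is a minimal sketch in plain Python. It does not call Druid; it only mimics what a query over both time ranges would return if the behavior described above holds. The row values and the `project` helper are made up for illustration.

```python
# Illustrative only: plain dictionaries standing in for Druid rows.
# Rows ingested before Jan 15th have dimensions A and B only.
old_rows = [{"__time": "2020-01-10", "A": "a1", "B": "b1"}]
# Rows ingested from Jan 15th onwards also carry the new dimension C.
new_rows = [{"__time": "2020-01-16", "A": "a2", "B": "b2", "C": "c2"}]


def project(rows, dimensions):
    """Hypothetical helper: select the requested dimensions from each row,
    using None (null) where a row has no value for a dimension."""
    return [{d: row.get(d) for d in dimensions} for row in rows]


result = project(old_rows + new_rows, ["A", "B", "C"])
for row in result:
    print(row)
```

The old row comes back with C set to None, the new row with the value that was ingested.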

That said, I’m not sure the same behavior applies to materialized views.

If you end up giving it a try, it would be great if you could share your findings.

Thanks,

Itai

Actually, re-reading Ming’s answer, I think Ming meant that this (i.e. re-ingesting everything all over again) is in fact the behavior for materialized views (as opposed to base datasources).
Ming - is that correct?

If so - how does that work? Do you update the materialized view somehow, and behind the scenes it kicks off a re-ingestion process? Or do you delete the materialized view and re-create it?

Thanks!

When I try to add a new dimension to the existing materialized view spec (derived datasource), the Overlord logs show the following error:

org.apache.druid.indexing.materializedview.MaterializedViewSupervisor - Failed to start MaterializedViewSupervisor-Click_Dump_MView. Metadata in database(DerivedDataSourceMetadata{baseDataSource=Click_Dump, dimensions=[sld_dot_tld, ad_position, customer_id, keyword_term, ad_display_url], metrics=[m_net_bid, m_count]}) is different from new dataSource metadata(MaterializedViewSupervisorSpec{baseDataSource=Click_Dump, dimensions=[sld_dot_tld, click_status, ad_position, customer_id, keyword_term, ad_display_url], metrics=[m_net_bid, m_count]})
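For readers skimming the log line, the only difference between the two metadata blocks is the dimension list. The sketch below simply compares the two lists copied from the error message; it is not Druid's own check, just a quick way to see what changed.

```python
# Dimension lists copied verbatim from the Overlord error message above.
stored_dimensions = [
    "sld_dot_tld", "ad_position", "customer_id",
    "keyword_term", "ad_display_url",
]
new_dimensions = [
    "sld_dot_tld", "click_status", "ad_position", "customer_id",
    "keyword_term", "ad_display_url",
]

# Dimensions present in the submitted spec but not in the stored metadata.
added = [d for d in new_dimensions if d not in stored_dimensions]
# Dimensions present in the stored metadata but missing from the submitted spec.
removed = [d for d in stored_dimensions if d not in new_dimensions]

print("added:", added)
print("removed:", removed)
```

The base datasource and the metrics are identical in both blocks, so the added `click_status` dimension is what makes the supervisor refuse to start.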



I think that to add any new dimension or metric, we need to remove the existing derived datasource and add a new one with the new set of dimensions and metrics.
This is not a good way of doing things, however. I think there must be a cleaner way, or one should be implemented, so that an existing derived datasource spec can be updated.
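As a rough sketch of the "remove and re-create" route, the snippet below only builds a new supervisor spec as JSON; it does not submit anything. The field names follow my reading of the materialized view extension docs and should be checked against your Druid version. The new datasource name `Click_Dump_MView_v2`, the aggregator types, and the `fieldName` values are assumptions, since the thread does not show the original spec.

```python
import json

# Dimensions from the failed spec in the error log, including click_status.
dimensions = [
    "sld_dot_tld", "click_status", "ad_position",
    "customer_id", "keyword_term", "ad_display_url",
]

new_spec = {
    "type": "derivativeDataSource",
    "baseDataSource": "Click_Dump",
    # Hypothetical new name, so it does not clash with the stored metadata
    # of the existing derived datasource.
    "dataSource": "Click_Dump_MView_v2",
    "dimensionsSpec": {"dimensions": dimensions},
    # Assumption: aggregator types and fieldNames are guesses; copy them
    # from your existing derived datasource spec instead.
    "metricsSpec": [
        {"type": "doubleSum", "name": "m_net_bid", "fieldName": "m_net_bid"},
        {"type": "longSum", "name": "m_count", "fieldName": "m_count"},
    ],
    "tuningConfig": {"type": "hadoop"},
}

spec_json = json.dumps(new_spec, indent=2)
print(spec_json)
```

The resulting JSON would then be submitted as a new supervisor spec after the old supervisor has been terminated. Whether the old derived segments can be dropped safely depends on your setup.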

