My source dataset (on HDFS) is stored in Avro format and partitioned by date as YYYY/MM/DD. The data is updated daily within a rolling 10-day window: for example, data for days 1 to 10 is generated on day 11, and on day 12 the data for roughly days 2 to 12 is updated. So my source data keeps changing, and over time new dimensions and metrics will be added.
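To make the update window concrete, here is a small sketch of which date partitions get rewritten on a given day. It assumes the window is the 10 days preceding the run date (the exact endpoints in my example above are approximate), and the dates are illustrative:

```python
from datetime import date, timedelta

def partitions_to_rewrite(run_date: date, window_days: int = 10) -> list[str]:
    """Return the YYYY/MM/DD partition paths inside the rolling
    update window ending the day before run_date."""
    # On day N the source regenerates the previous `window_days` days,
    # so those partitions must be re-ingested into Druid.
    days = [run_date - timedelta(days=d) for d in range(1, window_days + 1)]
    return [d.strftime("%Y/%m/%d") for d in sorted(days)]

# On day 11 (here 2024-01-11), days 1 through 10 are rewritten.
print(partitions_to_rewrite(date(2024, 1, 11)))
```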
- Is there a data loader that can ingest an Avro dataset directly into Druid? (Right now I convert the data to JSON before ingesting.)
- How do I update a dataset in Druid when my source data keeps changing? (The obvious solution is to drop the last 10 days of data and insert them again, but that means customer downtime. How do I avoid that?)
- How will data be read while a new dataset is being computed for the same time range? Will there be downtime?
- What happens when I change the schema and ingest data with the new one?
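For the Avro question, I believe Druid's `druid-avro-extensions` provides an `avro_hadoop` parser for Hadoop batch ingestion. Below is the kind of spec fragment I think it expects; it assumes the extension is listed in `druid.extensions.loadList`, and the datasource name, HDFS path, and column names (`timestamp`, `dim1`, `dim2`) are placeholders for my own schema:

```json
{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "inputFormat": "org.apache.druid.data.input.avro.AvroValueInputFormat",
        "paths": "hdfs://namenode/data/2024/01/10"
      }
    },
    "dataSchema": {
      "dataSource": "my_datasource",
      "parser": {
        "type": "avro_hadoop",
        "parseSpec": {
          "format": "avro",
          "timestampSpec": { "column": "timestamp", "format": "auto" },
          "dimensionsSpec": { "dimensions": ["dim1", "dim2"] }
        }
      }
    }
  }
}
```

If this is roughly right, it would let me skip the JSON conversion step entirely.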