Druid with an evolving data source schema

If a data source's schema changes, how does Druid handle it, and what is the best strategy for planning the change? Here is an example. Suppose our original schema, with a timestamp and col1 through col3, is changed to

[{"timestamp":"2015-04-24T17:13:44.547Z","col1":"foo","col2":"1.2","col3":"3.8","col4":"test123"},{…}]

That is, col4 is added at some point during operation. I know we can create a new data source when the schema changes and then reindex the old data via IngestSegmentFirehose to bring it up to date. Is it possible to change the schema without reindexing, so that queries over the old data still work for col1 through col3?

Also, can a Druid realtime node send alerts based on data patterns, for example alerting someone if the count over the past 3 hours is above 10?

Schema changes actually work automatically. You can add or remove columns at any time, and any columns that existed in your old data will continue to be queryable. If you query your old data for a column it doesn’t have (like col4), you’ll get nulls, empty results, or zeroes, as appropriate for the kind of query you’re making.
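To illustrate this from the query side, here is a minimal Python sketch that groups rows by col4 over an interval spanning data ingested both before and after col4 was added. It assumes a broker reachable at a hypothetical `http://localhost:8082/druid/v2/` (adjust host and port for your cluster), a hypothetical data source name such as `events`, and the native groupBy result shape (`version`, `timestamp`, `event`). Rows from older segments should land in a group whose col4 is null; depending on your Druid version and null-handling settings, that value may show up as an empty string instead.

```python
import json
import urllib.request

# Assumed broker endpoint for native JSON queries; adjust for your cluster.
BROKER_URL = "http://localhost:8082/druid/v2/"


def col4_group_by_query(data_source, interval):
    """Build a native groupBy query that splits row counts by col4.

    The interval may cover segments ingested before col4 existed.
    """
    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "granularity": "all",
        "dimensions": ["col4"],
        # "count" counts Druid rows (after any rollup), not raw input events.
        "aggregations": [{"type": "count", "name": "rows"}],
        "intervals": [interval],
    }


def post_query(query, url=BROKER_URL):
    """POST a native query to the broker and return the decoded JSON result."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def rows_by_col4(group_by_result):
    """Map each col4 value to its row count.

    Old rows have no col4, so their group carries a null value (None in
    Python); .get() also covers a response that omits the key entirely.
    """
    return {row["event"].get("col4"): row["event"]["rows"] for row in group_by_result}
```

Against a live cluster you would call `rows_by_col4(post_query(col4_group_by_query("events", "2015-04-01/2015-05-01")))`. Queries that only touch col1 through col3 need no change at all.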

Druid realtime nodes don’t have an alerting feature, although people have built alerting outside of Druid by simply running queries periodically and inspecting the results.
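A minimal Python sketch of that external approach, using the example from the question (alert if the count over the past 3 hours is above 10). It assumes a broker at a hypothetical `http://localhost:8082/druid/v2/`, a hypothetical data source name, and a `notify` callback you supply (email, chat, pager). The timeseries `count` aggregator counts Druid rows; if your ingestion rolls events up and stores a count metric, you would sum that metric with a `longSum` aggregator instead.

```python
import json
import time
import urllib.request
from datetime import datetime, timedelta, timezone

# Assumed broker endpoint; adjust for your cluster.
BROKER_URL = "http://localhost:8082/druid/v2/"


def count_query(data_source, now, hours=3):
    """Native timeseries query counting rows in the `hours` before `now` (UTC)."""
    start = now - timedelta(hours=hours)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "all",
        "aggregations": [{"type": "count", "name": "rows"}],
        "intervals": [f"{start.strftime(fmt)}/{now.strftime(fmt)}"],
    }


def post_query(query, url=BROKER_URL):
    """POST a native query to the broker and return the decoded JSON result."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def total_count(timeseries_result):
    """Sum the "rows" aggregate; an empty result (no data) counts as 0."""
    return sum(entry["result"]["rows"] for entry in timeseries_result)


def should_alert(count, threshold=10):
    """Alert only when the count is strictly above the threshold."""
    return count > threshold


def check_once(data_source, notify, run_query=post_query, threshold=10, hours=3):
    """Run one check and call notify(message) if the threshold is exceeded."""
    now = datetime.now(timezone.utc)
    count = total_count(run_query(count_query(data_source, now, hours)))
    if should_alert(count, threshold):
        notify(f"{data_source}: {count} rows in the past {hours}h (> {threshold})")
    return count


def watch(data_source, notify, poll_seconds=300):
    """Poll forever; a cron job or other scheduler works just as well."""
    while True:
        check_once(data_source, notify)
        time.sleep(poll_seconds)
```

Calling `watch("events", print)` would print a message whenever more than 10 rows fall in the trailing 3 hours. Passing the query function and the callback as parameters keeps the alert channel independent of Druid and lets you test the logic without a cluster.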