Ingestion policies

I was looking at Druid’s ingestion methods and noticed that, for file ingestion, it seems you have to declare up front which columns will be loaded into the database.

Is there any way to have it auto-create new columns based on what comes in, much like InfluxDB does? We have thousands of different columns, and new ones are added frequently in our application.

Thanks

Hey Steve,

In Druid you don’t have to specify your dimensions up front. If you specify “dimensions”: null in your schema, then any field that appears in your data will automatically become a dimension.

You do need to specify your metrics up front, though.
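As a rough sketch, a dataSchema along those lines could look like the following (the data source name “engine_data” and the field names are placeholders, and the exact spec layout can vary a bit between Druid versions):

{
  "dataSchema": {
    "dataSource": "engine_data",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "column": "timestamp", "format": "auto" },
        "dimensionsSpec": {
          "dimensions": null
        }
      }
    },
    "metricsSpec": [
      { "type": "count", "name": "count" }
    ]
  }
}

With “dimensions”: null (or an empty list), fields are discovered as dimensions as they show up in the data; if you want to keep specific fields out of that discovery, dimensionsSpec also accepts a “dimensionExclusions” list.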

Gian,

So metrics are a specific kind of dimension? We have engine data where new channels (metrics, in car lingo) are added all the time. Is there any way to do this programmatically?

Hey Steve,

Check out this document if you haven’t yet: http://druid.io/docs/latest/ingestion/schema-design.html

In Druid lingo, dimensions are columns that you can group and filter on, and metrics are columns that you want to compute aggregates on (like Sum/Min/Max). Some other systems have a “tag” concept that is like Druid’s “dimensions”.

If you have timeseries data, you could model it as a stream where each message has a single numeric “value” field, plus a bunch of dimension columns that you can use for grouping and filtering. Then you might tell Druid to compute value_sum, value_min, and value_max as aggregates in your metricsSpec, but discover dimensions dynamically.
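As a sketch, assuming the numeric field is literally named “value”, that metricsSpec might look like this (doubleSum, doubleMin, and doubleMax are standard Druid aggregator types):

"metricsSpec": [
  { "type": "count", "name": "count" },
  { "type": "doubleSum", "name": "value_sum", "fieldName": "value" },
  { "type": "doubleMin", "name": "value_min", "fieldName": "value" },
  { "type": "doubleMax", "name": "value_max", "fieldName": "value" }
]

Everything else in each message would then be picked up as a dimension automatically.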

Gian,

Thanks for the information. It sounds like with my time series data the metric would just be a “value” field, and the channel name would be another dimension alongside the other metadata.

Yep, that should work.
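To make it concrete, a message in that model might look something like this (the field names “channel” and “vehicle_id” are just hypothetical examples):

{
  "timestamp": "2016-05-10T12:34:56Z",
  "channel": "engine_rpm",
  "vehicle_id": "car-42",
  "value": 3150.0
}

Here “channel” and “vehicle_id” would be discovered as dimensions, and “value” would feed the value_sum, value_min, and value_max aggregators.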