I have been using Druid for two days and have worked my way through the Wikipedia and Twitter examples. Now I am interested in batch ingesting my own data. However, I am a bit confused about this process, specifically the definition and usage of metrics.
Dimensions, as I understand it, are used to slice and group your data. If I’m not mistaken, they are usually strings (do they have to be strings?). Metrics, on the other hand, are used to provide quantitative information about these groupings of dimensions - they seem to be derived from aggregation functions (min, max, avg, sum) applied to doubles or integers.
Let’s say I have data in the following form (just an example to work with):
String high_school, String city, String state, int num_students, int num_teachers, int num_rooms, double tuition
Clearly high_school, city, and state are dimensions I would like to be able to slice and group by. The other numerical data will be used to compute aggregates.
First question: When defining the spec file for batch ingestion, should I list the numerical fields as dimensions, even though they are not really dimensions? You seem to have done this in the Twitter example:
"dimensions": [ … , "retweet_count", "follower_count", "friendscount", … , "statuses_count", … ],
I do not understand why you include these as dimensions.
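To make the first question concrete, here is a rough sketch of what I would have expected the relevant part of the spec to look like for my data. I wrote it as a small Python script that prints the JSON. The key name "metricsSpec", the metric names (total_students and so on), and the exact layout are my assumptions, not something I have confirmed against the docs:

```python
import json

# String fields I want to slice and group by.
dimensions = ["high_school", "city", "state"]

# Numerical fields, each declared as a metric instead of a dimension.
# The output names (e.g. "total_students") are hypothetical.
metrics = [
    {"type": "longSum", "name": "total_students", "fieldName": "num_students"},
    {"type": "longSum", "name": "total_teachers", "fieldName": "num_teachers"},
    {"type": "longSum", "name": "total_rooms", "fieldName": "num_rooms"},
    {"type": "doubleSum", "name": "total_tuition", "fieldName": "tuition"},
]

# Assumed shape of the spec fragment; the real spec has more sections.
spec_fragment = {"dimensions": dimensions, "metricsSpec": metrics}

print(json.dumps(spec_fragment, indent=2))
```

In this sketch none of the numerical fields appear under "dimensions", which is the opposite of what the Twitter example does.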
Second question: Why are metrics defined in the spec file? This makes no sense to me. Does it mean Druid is pre-aggregating values at ingestion time? Does it mean I need to know what types of queries I will run before ingesting the data? For instance, I might initially want to see the total number of high school students in a certain city, so I would define a metric that applies longSum to the num_students field. But what if, after ingesting the data, I decide I would rather know the max number of students at a high school in each city? I originally thought you could define whatever aggregate you want in a query. Are you instead constrained to the metrics defined in the ingestion spec file? If not, then what is the point of defining metrics in the spec file at all?
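Here is a small Python sketch of the situation I am worried about. It assumes that ingestion collapses rows sharing the same dimension values into one row using the metric's aggregator, and that only city is kept as a dimension. The school names and numbers are made up, and this is only my mental model, not how I know Druid to behave:

```python
from collections import defaultdict

# Made-up raw rows: (high_school, city, num_students)
rows = [
    ("Lincoln High", "Springfield", 1200),
    ("Roosevelt High", "Springfield", 800),
    ("Central High", "Shelbyville", 950),
]

def roll_up_sum(rows):
    """Pre-aggregate num_students with a sum, keeping only city as a dimension."""
    totals = defaultdict(int)
    for _school, city, num_students in rows:
        totals[city] += num_students
    return dict(totals)

def max_per_city(rows):
    """Largest num_students of any single school in each city, from raw rows."""
    best = {}
    for _school, city, num_students in rows:
        best[city] = max(best.get(city, num_students), num_students)
    return best

rolled = roll_up_sum(rows)
print(rolled)              # {'Springfield': 2000, 'Shelbyville': 950}
print(max_per_city(rows))  # {'Springfield': 1200, 'Shelbyville': 950}
```

If only the rolled-up sums were stored, I do not see how the 1200 for Springfield could be recovered from the 2000, which is why I am asking whether the metrics in the spec limit what I can query later.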