Granularity in input spec

Hi,

Can we define multiple segmentGranularity values, such as hour, day, and week, in the granularitySpec when creating an input spec in Druid?

Thanks

Hi,

Druid supports only one granularitySpec per ingestion spec. May I ask why you want to specify more than one?
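
For illustration, here is a minimal sketch of what a single granularitySpec looks like, built as a Python dict and printed as JSON. The helper name granularity_spec and the values DAY and HOUR are only example choices, not something from your setup. segmentGranularity controls how segments are partitioned by time, while queryGranularity is the finest granularity at which timestamps are stored.

```python
import json


def granularity_spec(segment_granularity, query_granularity, rollup=True):
    """Build one 'uniform' granularitySpec as a plain dict (hypothetical helper)."""
    return {
        "type": "uniform",
        # How segments are partitioned by time; exactly one value per spec.
        "segmentGranularity": segment_granularity,
        # Finest granularity at which timestamps are stored in the segments.
        "queryGranularity": query_granularity,
        # Whether rows with identical timestamp and dimensions are pre-aggregated.
        "rollup": rollup,
    }


# Example values only: daily segments, timestamps truncated to the hour.
spec = granularity_spec("DAY", "HOUR")
print(json.dumps(spec, indent=2))
```

Queries can still ask for coarser buckets than the stored queryGranularity, so a single spec does not limit you to a single reporting granularity.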

Jihoon

I have a requirement to compute and return some aggregations (count, average difference, etc.) on an hourly, four-hourly, eight-hourly, half-daily, daily, weekly, bi-weekly, monthly, quarterly, half-yearly, and yearly basis over a timestamp field, say ‘completed time’. If Druid stores the ‘completed time’ field as is, without any aggregation precomputed at a given granularity, then I believe queries run at different granularities will incur the runtime overhead of rolling up (by hour, day, week, etc.) during query execution. If my assumption is right, is there any way to avoid this runtime overhead?

I also have multiple timestamp fields (acquired time, completed time, and created time), and I want to query all three at different granularities. Is this possible within a single data source? If so, how can I achieve it?

Hi,

Please find my answers below.

  1. Yes, you’re correct. If data is stored without rollup, queries may incur more overhead because they aggregate more rows at runtime. I don’t think there is a way to avoid this other than creating a separate dataSource for each granularity you want. However, this kind of workaround should be considered only if you see serious performance degradation. Have you already run a performance test for your application? If so, please let me know the results. If not, I recommend doing that first. For testing, you could use the granularity your application queries most often as the timestamp granularity.

  2. The timestamp in Druid is a special field that is used for primary data partitioning. So, when you query with a filter on the timestamp, you can expect Druid to efficiently filter out unnecessary data and execute the query quickly.

If you have multiple fields containing timestamp values, you still need to choose one of them as Druid’s timestamp field. You can store the others as long-typed dimensions and use time functions (like timestamp_floor()) to apply different granularities to them. Please check http://druid.io/docs/latest/misc/math-expr.html#time-functions for details.
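
As a rough sketch of the second point, the Python snippet below builds a native groupBy query that buckets a secondary timestamp with timestamp_floor() through an expression virtual column. The dataSource name "tasks", the column name "completed_time" (assumed to be stored as a long holding epoch milliseconds), the interval, and the helper name are all hypothetical. Periods are ISO 8601 strings such as PT4H, P1D, or P1M.

```python
import json


def bucketed_count_query(datasource, time_column, period, interval):
    """Build a groupBy query that counts rows per bucket of a long-typed
    timestamp dimension (hypothetical helper, names are assumptions)."""
    floored = time_column + "_floor"
    # Druid expression: floor the long timestamp to the given ISO 8601 period.
    expression = "timestamp_floor(\"%s\", '%s')" % (time_column, period)
    return {
        "queryType": "groupBy",
        "dataSource": datasource,
        "intervals": [interval],
        # "all" disables bucketing on the primary timestamp; the grouping
        # comes from the virtual column instead.
        "granularity": "all",
        "virtualColumns": [
            {
                "type": "expression",
                "name": floored,
                "expression": expression,
                "outputType": "LONG",
            }
        ],
        "dimensions": [
            {
                "type": "default",
                "dimension": floored,
                "outputName": floored,
                "outputType": "LONG",
            }
        ],
        "aggregations": [{"type": "count", "name": "rows"}],
    }


# Example: four-hour buckets on the assumed "completed_time" column.
query = bucketed_count_query(
    "tasks", "completed_time", "PT4H", "2024-01-01/2024-02-01"
)
print(json.dumps(query, indent=2))
```

The same helper can be called with a different column (for example an assumed "acquired_time") or a different period, so one dataSource can serve several granularities at query time. Note that the count aggregator counts stored Druid rows, which differs from the number of ingested events if rollup is enabled.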

Jihoon