Realtime Plumber Settings

Hi, I am using Druid 0.6.171 and trying to figure out the best plumber settings for my realtime node. I want to be able to query the data in 10- or 15-minute aggregations using groupBy, so I was thinking my segment granularity should be something like 5 minutes. However, I can't see how to set a granularity between minute and hour, so I went with minute. Can this be set to 5 minutes? Also, the windowPeriod concerns me because it is 10 minutes, which is greater than the segment granularity. I did this because I don't want to lose data when realtime ingestion falls behind on the incoming event queue. What are the implications of setting it greater than the segment granularity? Will segments only roll every 10 or 11 minutes? Thanks for your help!

My settings:

"plumber" : {
  "type" : "realtime",
  "windowPeriod" : "PT10m",
  "intermediatePersistPeriod" : "PT5m",
  "segmentGranularity": "minute",
  "basePersistDirectory" : "/mnt/vol00/druid/plumber",
  "rejectionPolicyFactory": {
    "type": "serverTime"
  }
}

-drew

Hi Drew,

The segmentGranularity is actually the storage (and therefore handoff) granularity, and it isn't necessarily the same as the indexGranularity, which is your aggregation granularity. You could try a 5-minute indexGranularity, a 10-minute windowPeriod, and "HOUR" segmentGranularity. In that case, segments will hand off every hour (starting 10 minutes past the hour), and you will be able to do aggregations in multiples of 5 minutes.
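To make the handoff timing concrete, here is a rough Python sketch under a simplified model: a segment becomes eligible for handoff once its interval has ended and the windowPeriod has also elapsed. The function names are hypothetical, buckets are aligned to the Unix epoch in UTC, and the plumber's actual persist and merge work is ignored, so treat the times as approximate.

```python
from datetime import datetime, timedelta

# Naive datetimes are treated as UTC in this sketch.
EPOCH = datetime(1970, 1, 1)


def segment_start(ts: datetime, segment: timedelta) -> datetime:
    """Floor a timestamp to the start of its segment interval (epoch-aligned)."""
    return EPOCH + ((ts - EPOCH) // segment) * segment


def handoff_start(ts: datetime, segment: timedelta, window: timedelta) -> datetime:
    """Earliest time the segment holding `ts` could begin handoff under the
    simplified model: end of the segment interval plus the windowPeriod."""
    return segment_start(ts, segment) + segment + window


event = datetime(2014, 3, 1, 13, 27)

# HOUR segments with a 10-minute windowPeriod: the 13:00 segment hands off at 14:10.
print(handoff_start(event, timedelta(hours=1), timedelta(minutes=10)))

# MINUTE segments with the same window: the 13:27 segment hands off around 13:38,
# so under this model there is a handoff roughly every minute, each ~10 minutes late.
print(handoff_start(event, timedelta(minutes=1), timedelta(minutes=10)))
```

Under this model, a windowPeriod larger than the segmentGranularity does not make segments roll less often; it delays each segment's handoff by the window.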

Thanks for the response, Gian. I guess I was unclear on the relationship between segment granularity and aggregation granularity. The docs refer to indexGranularity in a few places and to queryGranularity in others. Is it sufficient to set queryGranularity to 5 minutes, or do I need to set both? Do I set indexGranularity in the dataSchema, like this?

"dataSchema": {
  "dataSource": "agg_volume",
  "indexGranularity": "minute",
  "parser" : {

-drew

Hi Drew, segmentGranularity is the storage granularity: it affects things like how often handoff happens and how big each segment will be. indexGranularity is the granularity of aggregation within a segment, as stored. queryGranularity is the granularity of aggregation for a particular query. These must satisfy segmentGranularity >= indexGranularity and queryGranularity >= indexGranularity; queryGranularity and segmentGranularity don't need to have any particular relationship.
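The rules above can be expressed as a small check. This is a hypothetical Python helper, not part of Druid: it models every granularity as a fixed duration (a simplification of calendar granularities such as HOUR or DAY). The extra "multiple of indexGranularity" check is an assumption drawn from the earlier note that aggregations work in multiples of the index granularity.

```python
from datetime import timedelta


def check_granularities(segment: timedelta, index: timedelta, query: timedelta) -> list:
    """Return rule violations for a (segmentGranularity, indexGranularity,
    queryGranularity) combination, with each granularity modeled as a duration."""
    problems = []
    if segment < index:
        problems.append("segmentGranularity must be >= indexGranularity")
    if query < index:
        problems.append("queryGranularity must be >= indexGranularity")
    elif query % index:
        # Assumption: stored rows can't be split, so clean query buckets
        # are whole multiples of the index granularity.
        problems.append("queryGranularity should be a multiple of indexGranularity")
    # Deliberately no check between query and segment: they are independent.
    return problems


MIN = timedelta(minutes=1)
print(check_granularities(segment=60 * MIN, index=5 * MIN, query=15 * MIN))  # []
print(check_granularities(segment=MIN, index=5 * MIN, query=15 * MIN))
```

The second call flags only the segment rule: a 15-minute query over 5-minute index buckets is fine, but a 1-minute segment cannot hold 5-minute buckets.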

If you want to set indexGranularity to 5 minutes, you can do that with "indexGranularity": {"type": "duration", "duration": 300000} (the duration is in milliseconds). Setting it to "minute" would work too, and you can still use a 5-minute queryGranularity; it might just yield somewhat larger segments.
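As a quick sanity check on the numbers, the Python sketch below (hypothetical helpers, not a Druid API) converts 5 minutes into the millisecond duration object and counts how many distinct truncated timestamps an hour-long segment can hold. Since each combination of dimension values can appear at most once per time bucket, a 1-minute indexGranularity can store up to 5x as many rows as a 5-minute one for the same data, which is where the larger segments come from.

```python
import json
from datetime import timedelta


def duration_granularity(period: timedelta) -> dict:
    """Build a duration-type granularity object with the duration in milliseconds."""
    return {"type": "duration", "duration": period // timedelta(milliseconds=1)}


def time_buckets_per_segment(segment: timedelta, index: timedelta) -> int:
    """Number of distinct truncated timestamps a single segment can contain."""
    return segment // index


print(json.dumps({"indexGranularity": duration_granularity(timedelta(minutes=5))}))
# {"indexGranularity": {"type": "duration", "duration": 300000}}

hour = timedelta(hours=1)
print(time_buckets_per_segment(hour, timedelta(minutes=1)))  # 60
print(time_buckets_per_segment(hour, timedelta(minutes=5)))  # 12
```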

Hope this helps.