Tranquility w/ 24 hour tasks (and the availabilityGroup)

Thanks Gian for the help RE: task heartbeating.

One more…

We are using Tranquility for real-time tasks.

When using hour segment granularity, we see the firehoseId re-used after 24 hours, which results in a new task sitting in pending even though we have slots for it. I believe this is because the availabilityGroup does not include the date:

val availabilityGroup = DruidBeamMaker.generateBaseFirehoseId(
  location.dataSource,
  beamTuning.segmentGranularity,
  interval.start,
  partition
)

The availability group is then substituted into the firehosePattern with:

val firehoseId = "%s-%04d" format(availabilityGroup, replicant)

Should we include some portion of the date into the availabilityGroup to allow for tasks running longer than 24 hours?

-brian

Hey Brian,

The base firehose id cycles on purpose to avoid creating too many dead znodes in zookeeper (Curator service discovery doesn’t clean up empty services). The cycling is based on your segmentGranularity. This is clearly a pretty gross hack, but it works well in the expected use case for real time (windowPeriod < segmentGranularity).
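
To make the cycling concrete, here is a minimal sketch of how an id derived from a wrapping time bucket repeats. It is a simplified model, not Tranquility's actual code: the object FirehoseIdSketch, the Granularity trait, cycleBucket, and the "%s-%03d-%04d" layout are all assumptions made for illustration. Only the "%s-%04d" replicant pattern comes from the snippet quoted earlier in the thread.

```scala
import java.time.{ZoneOffset, ZonedDateTime}

// Hypothetical, simplified model of a cycling firehose id (not Tranquility's code).
object FirehoseIdSketch {
  sealed trait Granularity
  case object Hour extends Granularity
  case object Day extends Granularity

  // Assumed bucketing: the bucket wraps around instead of growing forever.
  // Hour-of-day repeats every 24 hours; day-of-month repeats about monthly.
  def cycleBucket(granularity: Granularity, start: ZonedDateTime): Int = {
    val utc = start.withZoneSameInstant(ZoneOffset.UTC)
    granularity match {
      case Hour => utc.getHour
      case Day  => utc.getDayOfMonth
    }
  }

  // Assumed id layout: dataSource, cycle bucket, partition. No full date is included.
  def baseFirehoseId(dataSource: String, granularity: Granularity,
                     start: ZonedDateTime, partition: Int): String =
    "%s-%03d-%04d".format(dataSource, cycleBucket(granularity, start), partition)

  // Same pattern as the firehoseId line quoted in the question.
  def firehoseId(availabilityGroup: String, replicant: Int): String =
    "%s-%04d".format(availabilityGroup, replicant)
}
```

Under these assumptions, two hourly intervals that start 24 hours apart land in the same bucket and get the same base id, while with day granularity the same two intervals get different ids.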

Do you have a really long windowPeriod and is that why your tasks run for such a long time? If so, perhaps consider setting your segmentGranularity to “DAY”.

If that doesn’t work for you, there are actually a couple of PRs out that could potentially improve this. One of them adjusts the way the cycling works to also take into account windowPeriod. The other gets rid of Curator service discovery completely, so the hacky workaround is no longer needed and we can discover tasks based on their unique id instead of a reusable identifier. One or both of these PRs should be coming to a future release.

Thanks, Gian.

Yes, we were testing various size window periods.

We started with a generous window period, giving straggler events 8 hours to come in. (w/ 1H segment granularity)

Then, when we went from an 8 hour to a 24 hour window period, we saw this behavior.

Changing to DAY on segment granularity should do the trick.
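
A rough way to check whether a given setup can hit the id re-use is to compare how long a task stays alive against how long it takes for its id to come around again. The sketch below assumes a task lives for roughly one segment plus the window period (handoff time is ignored), and that the id cycle is 24 hours for hour granularity and at least 28 days for day granularity. These figures and the CycleCheckSketch names are assumptions for illustration, not values taken from Tranquility.

```scala
import java.time.Duration

// Hypothetical helper: does a task outlive the cycle of its (re-usable) id?
object CycleCheckSketch {
  // Assumed: a task stays alive for about one segment plus the window period.
  def approxTaskLifetime(segment: Duration, windowPeriod: Duration): Duration =
    segment.plus(windowPeriod)

  // True when the old task may still be running when its id is handed out again.
  // Equality counts as a collision, since handoff adds time not modeled here.
  def collides(segment: Duration, windowPeriod: Duration, cycleLength: Duration): Boolean =
    approxTaskLifetime(segment, windowPeriod).compareTo(cycleLength) >= 0
}
```

With these assumptions, 1 hour segments and an 8 hour window give a lifetime of about 9 hours, well inside a 24 hour cycle. A 24 hour window pushes the lifetime to about 25 hours, which overlaps the next use of the same id. With day segments and a 24 hour window, the lifetime of about 2 days is far shorter than the assumed cycle.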

(and agreed, in the typical use case this should not be a problem)

thanks!

-brian