Incorrect query results when a historical node is in full GC or broken

Hi,

We have a Druid cluster running version 0.9.2.

I found that when one or a few historical nodes are suffering a full GC or are broken, the coordinator rebalances or shuffles all segments.

In the meantime, queries to the broker return incorrect results: the broker simply returns results based on whatever data is still available, which is not the actual, complete result.

Is there any tuning for this case, or should we just avoid full GCs?

The Segment Availability section of http://druid.io/docs/0.9.2/design/coordinator.html says that the coordinator waits for a configurable period of time before reassigning segments from a missing node to other historical nodes, and that the lifetime represents a period of time in which the coordinator will not reassign a dropped segment.

But on the coordinator configuration page, I am not sure which settings control these two time periods. Is it millisToWaitBeforeDeleting?

Thanks in advance.

The design of the brokers is to return data based on what is loaded into the cluster right now, rather than what should be loaded into the cluster. So if the cluster has too many nodes broken, the broker will return partial data.

The best way to deal with it is to avoid too many historical nodes being offline (including offline due to full GC) at the same time. You can also increase your zookeeper session timeout to make it so a full GC doesn’t knock the historicals offline. But it’s better to avoid them in the first place.
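As a minimal sketch (assuming the standard ZooKeeper client property; the exact value below is illustrative, so check the defaults for your Druid version), the session timeout can be raised in the common runtime properties:

    # common.runtime.properties (value is illustrative, not a recommendation)
    # A longer session timeout gives a long GC pause more room before
    # ZooKeeper expires the session and the node is treated as offline.
    druid.zk.service.sessionTimeoutMs=60000

Keep in mind that a longer timeout also delays detection of genuinely dead nodes, so it trades off against failover speed.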

You could try bumping up the heap sizes a bit or switching to a different collector.
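For illustration only (a sketch; the right sizes depend on your segment sizes, query load, and hardware), the historical JVM settings might look like:

    # jvm.config for the historical process (illustrative values)
    -Xms8g
    -Xmx8g
    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=200

G1 generally keeps pauses shorter than the throughput collector at the cost of some CPU, which is usually the right trade when long stop-the-world pauses are knocking nodes out of ZooKeeper.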

Thanks, I will try the session timeout and JVM tuning.

On Friday, September 8, 2017 at 3:34:57 PM UTC+8, Gian Merlino wrote: