I’m using Druid 0.9.2.
I have 2 historical tiers:
_default_tier, with i3.8xlarge machines: holds the last few days of data
historical tier, with r3.8xlarge machines: holds the rest of the data (a few years' worth)
Occasionally, one of the nodes in the historical tier (r3.8xlarge) freezes and stops responding until we restart it.
This does not happen in the tier with the i3.8xlarge machines.
Looking at the logs, I see nothing that seems to explain this behavior.
I suspect the frozen historical node may stop writing to its log once it freezes. Is this possible?
Has anyone else experienced this behavior?
What might cause it? Is it because of the large difference between RAM and storage size on the r3.8xlarge?
Could it be caused by a very heavy query?