Historical dies

Hi,
I’m experiencing crashes on my Historical nodes (I have two Historicals; it happens on both, but never at the same time).

The log has thousands of lines like these:
[GC (GCLocker Initiated GC) [PSYoungGen: 36034K->4506K(1266688K)] 36138K->4618K(1272832K), 0.0067570 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]

[GC (GCLocker Initiated GC) [PSYoungGen: 49930K->4621K(1140736K)] 50042K->4741K(1146880K), 0.0073980 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

[GC (GCLocker Initiated GC) [PSYoungGen: 27333K->4800K(1251328K)] 27453K->4928K(1257472K), 0.0084160 secs] [Times: user=0.04 sys=0.01, real=0.01 secs]

[GC (GCLocker Initiated GC) [PSYoungGen: 54177K->1344K(1250304K)] 54305K->5699K(1256448K), 0.0093800 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]

[Full GC (Ergonomics) [PSYoungGen: 1344K->0K(1250304K)] [ParOldGen: 4355K->5466K(14848K)] 5699K->5466K(1265152K), [Metaspace: 22992K->22992K(1069056K)], 0.0556870 secs] [Times: user=0.26 sys=0.00, real=0.05 secs]


and at the end:

Hi Maurizio,

In our experience, there can be several factors causing the JVM to segfault:

  • Choice of garbage collector: we have seen segfaults when using G1GC under high load. If you are using G1, try switching to CMS and see if that helps.

  • JVM version: we haven’t run Java 8 under production load yet. Since you are not on the latest Java 8 release, try the latest and see if that addresses the issue. Also try the latest Java 7 in case the problem is specific to 8.

  • If none of the above suggestions help, it might indicate a bug in Druid. For some reason the node could be trying to read data from a segment that has already been unmapped from memory. If that were the case, though, I’d be surprised we hadn’t encountered it ourselves.

Hi Xavier,
I’ll follow your suggestions and keep you updated on any improvement.

I’d like to stay on Java 8 before trying a downgrade to 7.

Thanks

Maurizio

Hi,
just an update.

I’ve moved to Java 1.8.0_45 on all Historical nodes.

Then I ran one Historical with the -XX:+UseParNewGC parameter and the other without it.

The one with CMS crashed again and the other didn’t.

I’ve now removed the CMS parameter and restarted it.

I’ll keep monitoring the Historicals to see how they behave.

Thanks

Maurizio

Is it the same error message this time, or a different one?
Can you share your Historical configs and full JVM parameters?

Are there any specific types of queries causing this problem?

Frequent GCs might also indicate that your heap settings are not optimal.
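One way to check whether the GCs are driven by heap pressure is to summarize the GC lines in the log by cause. The sketch below is a minimal Python example, assuming the Parallel GC (PSYoungGen/ParOldGen) line format quoted in the first post, which looks like JDK 8 -XX:+PrintGCDetails output; `parse_gc_line` and `summarize` are hypothetical helper names, not part of Druid or the JDK.

```python
import re
import sys

# Young-generation part of a Parallel GC log line (assumed format), e.g.
# "[PSYoungGen: 36034K->4506K(1266688K)]" = used before -> used after (capacity).
YOUNG_RE = re.compile(r"\[PSYoungGen: (\d+)K->(\d+)K\((\d+)K\)\]")
# Overall pause time, e.g. ", 0.0067570 secs]". The later "[Times: ... real=0.01 secs]"
# part does not match because "real=" is not a number.
PAUSE_RE = re.compile(r", ([\d.]+) secs\]")
# GC cause in parentheses right after "[GC" or "[Full GC" at the start of the line.
CAUSE_RE = re.compile(r"\[(?:Full GC|GC) \(([^)]*)\)")


def parse_gc_line(line):
    """Return (cause, used_before_kb, used_after_kb, capacity_kb, pause_secs), or None."""
    line = line.strip()
    young = YOUNG_RE.search(line)
    pause = PAUSE_RE.search(line)
    cause = CAUSE_RE.match(line)
    if not (young and pause and cause):
        return None
    before, after, capacity = (int(g) for g in young.groups())
    return cause.group(1), before, after, capacity, float(pause.group(1))


def summarize(lines):
    """Group collections by cause: (count, total pause secs, peak young-gen fill ratio)."""
    stats = {}
    for line in lines:
        parsed = parse_gc_line(line)
        if parsed is None:
            continue  # skip blank lines and anything that is not a GC event
        cause, before, _after, capacity, pause = parsed
        count, total_pause, peak = stats.get(cause, (0, 0.0, 0.0))
        # Fraction of the young generation in use when this collection started.
        stats[cause] = (count + 1, total_pause + pause, max(peak, before / capacity))
    return stats


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gc_summary.py historical.log
    with open(sys.argv[1]) as log:
        for cause, (count, total, peak) in sorted(summarize(log).items()):
            print(f"{cause}: {count} collections, {total:.3f}s total pause, "
                  f"young gen at most {peak:.1%} full when a collection started")
```

On the five lines posted above, the four young collections are all "GCLocker Initiated GC" and each starts with under 5% of the young generation in use, which suggests they are not triggered by the young generation filling up. Running the script over the whole log would show whether that pattern holds for the thousands of other lines.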

Hi Xavier,
the error seems to be the same:

A fatal error has been detected by the Java Runtime Environment: