Druid 0.11.0: Full GC on historical restart after segment cache loading

After loading all cached segments, but before announcing itself, the JVM runs Full GC N times. Log snippet:

[Times: user=5.82 sys=0.01, real=0.57 secs]

6539.564: [Full GC (Metadata GC Threshold)
6539.564: [GC concurrent-root-region-scan-start]

6539.628: [GC concurrent-root-region-scan-end, 0.0630629 secs]

6539.628: [GC concurrent-mark-start]

5297M->5297M(10G), 13.1743433 secs]

[Eden: 0.0B(496.0M)->0.0B(512.0M) Survivors: 16.0M->0.0B Heap: 5297.9M(10.0G)->5297.8M(10.0G)], [Metaspace: 2492435K->2492435K(3184640K)]

[Times: user=24.05 sys=0.01, real=13.18 secs]

6552.739: [Full GC (Last ditch collection) 5297M->5297M(10G), 12.9641259 secs]

[Eden: 0.0B(512.0M)->0.0B(512.0M) Survivors: 0.0B->0.0B Heap: 5297.8M(10.0G)->5297.8M(10.0G)], [Metaspace: 2492435K->2492435K(3184640K)]

[Times: user=23.86 sys=0.00, real=12.96 secs]

6565.703: [GC concurrent-mark-abort]

6565.731: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: requested by GC cause, GC cause: Metadata GC Threshold]

6565.731: [G1Ergonomics (Concurrent Cycles) initiate concurrent cycle, reason: concurrent cycle initiation requested]

6565.731: [GC pause (Metadata GC Threshold) (young) (initial-mark) 6565.731: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 21, predicted base time: 255.03 ms, remaining time: 0.00 ms, target pause time: 200.00 ms]

6565.731: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 1 regions, survivors: 0 regions, predicted young region time: 0.03 ms]

6565.731: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 1 regions, survivors: 0 regions, old: 0 regions, predicted pause time: 255.06 ms, target pause time: 200.00 ms]

6566.317: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 99.91 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)]

, 0.5859502 secs]

[Parallel Time: 584.2 ms, GC Workers: 13]

[GC Worker Start (ms): Min: 6565794.1, Avg: 6565794.3, Max: 6565794.5, Diff: 0.4]

[Ext Root Scanning (ms): Min: 161.1, Avg: 189.0, Max: 520.7, Diff: 359.6, Sum: 2456.4]

[Update RS (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.1, Sum: 0.1]

[Processed Buffers: Min: 0, Avg: 0.5, Max: 2, Diff: 2, Sum: 6]

[Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]

[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]

[Object Copy (ms): Min: 0.0, Avg: 0.1, Max: 0.1, Diff: 0.1, Sum: 0.9]

[Termination (ms): Min: 0.0, Avg: 331.7, Max: 359.4, Diff: 359.4, Sum: 4312.2]

[Termination Attempts: Min: 1, Avg: 1.5, Max: 3, Diff: 2, Sum: 20]

[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2]

[GC Worker Total (ms): Min: 520.6, Avg: 520.8, Max: 520.9, Diff: 0.4, Sum: 6769.8]

[GC Worker End (ms): Min: 6566315.1, Avg: 6566315.1, Max: 6566315.1, Diff: 0.0]

[Code Root Fixup: 0.0 ms]

[Code Root Purge: 0.1 ms]

[Clear CT: 0.4 ms]

[Other: 1.3 ms]

[Choose CSet: 0.0 ms]

[Ref Proc: 0.4 ms]

[Ref Enq: 0.0 ms]

[Redirty Cards: 0.4 ms]

[Humongous Register: 0.0 ms]

[Humongous Reclaim: 0.0 ms]

[Free CSet: 0.0 ms]

[Eden: 16.0M(512.0M)->0.0B(496.0M) Survivors: 0.0B->16.0M Heap: 5298.8M(10.0G)->5297.8M(10.0G)]

[Times: user=5.99 sys=0.02, real=0.59 secs]

6566.317: [GC concurrent-root-region-scan-start]

6566.317: [Full GC (Metadata GC Threshold) 6566.382: [GC concurrent-root-region-scan-end, 0.0653631 secs]

6566.382: [GC concurrent-mark-start]

5297M->5297M(10G), 12.9968900 secs]

[Eden: 0.0B(496.0M)->0.0B(512.0M) Survivors: 16.0M->0.0B Heap: 5297.8M(10.0G)->5297.8M(10.0G)], [Metaspace: 2492435K->2492435K(3184640K)]

[Times: user=24.02 sys=0.01, real=12.99 secs]

6579.314: [Full GC (Last ditch collection) 5297M->5297M(10G), 12.9172889 secs]

[Eden: 0.0B(512.0M)->0.0B(512.0M) Survivors: 0.0B->0.0B Heap: 5297.8M(10.0G)->5297.8M(10.0G)], [Metaspace: 2492435K->2492435K(3184640K)]

[Times: user=23.81 sys=0.00, real=12.92 secs]
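
Worth noting: both GC causes in the log ("Metadata GC Threshold" and "Last ditch collection") are Metaspace-triggered, and the log shows Metaspace at ~2.4G of a ~3.1G capacity while the heap itself sits at 5.3G of 10G. If Metaspace really is the bottleneck, raising its limits may avoid these Full GCs. A hypothetical jvm.config sketch for the historical process; the flag values are illustrative assumptions, not taken from this thread:

```
# illustrative JVM options for the historical node; tune to your deployment
-Xms10g
-Xmx10g
# raise the cap whose exhaustion produces "Last ditch collection" full GCs
-XX:MaxMetaspaceSize=4g
# start the first "Metadata GC Threshold" cycle later
-XX:MetaspaceSize=2g
```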

Hey,
How many segments are being served by this one historical node?

Did you check a class histogram or a heap dump for more info on what is actually filling up the heap?
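
For reference, that information comes from the standard JDK jmap tool; the pid and output path below are placeholders:

```
# class histogram of live objects on the historical JVM
jmap -histo:live <historical_pid> | head -n 30

# full heap dump for offline analysis (e.g. in Eclipse MAT)
jmap -dump:live,format=b,file=/tmp/historical.hprof <historical_pid>
```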

We use the Kafka indexing service to ingest data. There are about 60k segments. When the historical restarts, about 6G of the 10G heap is used. I tried increasing the heap memory; it still didn't help.
When the zkCoordinator finishes loading the segment cache, it triggers the Full GCs shown above.

I resolved this by deleting the entire segment cache and restarting. The historical could then announce itself and download and unzip the segments from HDFS.
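
The recovery flow described above can be sketched as follows, demonstrated here on a scratch directory so it is safe to run anywhere. In production, the cache directory would be whatever `druid.segmentCache.locations` points at in runtime.properties; the path and segment file names below are made up.

```shell
# scratch directory standing in for the historical's local segment cache
CACHE_DIR="$(mktemp -d)"
touch "$CACHE_DIR/segment_a.zip" "$CACHE_DIR/segment_b.zip"  # stand-ins for cached segments

# 1. stop the historical process (deployment-specific)
# 2. wipe the local segment cache
rm -rf "${CACHE_DIR:?}"/*
# 3. restart the historical; it announces itself and re-downloads
#    segments from deep storage (HDFS) instead of reading the stale cache

ls -A "$CACHE_DIR" | wc -l   # cache is now empty
```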

It seems the cache-loading process itself consumes a significant amount of heap memory.

Nishant Bangarwa nishant.monu51@gmail.com wrote on Mon, Jun 18, 2018 at 4:44 AM:

The heap dump is too big; I haven't had time to analyze it.

朱健 zhujian19850829@gmail.com wrote on Tue, Jun 19, 2018 at 1:52 AM: