GC overhead limit exceeded

Hi,
I got an out-of-memory error even though my data volume is not that big relative to the server.

Error Message: Terminating due to java.lang.OutOfMemoryError: GC overhead limit exceeded

I installed Druid on a single i3.4xlarge machine and launched it with bin/start-medium.

[Screenshot attached: Screen Shot 2020-08-31 at 2.55.37 PM.png]

Also, while ingesting data from S3, CPU usage is at 99%.
Is that normal?

[Screenshot attached: Screen Shot 2020-08-31 at 5.52.41 PM.png]

On Monday, August 31, 2020 at 6:09:51 PM UTC-7, tiny657 wrote:

This doc may help: https://druid.apache.org/docs/latest/ingestion/native-batch.html#parallel-task

I’d start with a config that uses fewer resources, make sure that runs, and then step up to use more of the machine. Also, what version of Druid are you using?
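
For example (values here are purely illustrative, not something from this thread), you can cap parallelism in the ingestion spec's tuningConfig and only raise it once the job runs cleanly:

  "tuningConfig": {
    "type": "index_parallel",
    "maxNumConcurrentSubTasks": 2
  }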


I am using the latest version.


@tiny657

Did you change the runtime.properties too?
While ingestion is in progress, ssh to the MiddleManager node and check exactly how much Xms/Xmx the task JVMs are using (they may be on the default Xms1g):

ps -ef | grep Xmx
ps -ef | grep druid

After changing the config files, apply the changes to the system before ingesting again:

systemctl daemon-reload   (CentOS)
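
If you launched with one of the single-server profiles, the peon (task) heap is usually set in the MiddleManager runtime.properties via druid.indexer.runner.javaOptsArray. Roughly like the line below, though the exact path and defaults depend on your Druid version, so treat it as a sketch:

  # e.g. conf/druid/single-server/medium/middleManager/runtime.properties
  druid.indexer.runner.javaOptsArray=["-server","-Xms1g","-Xmx1g","-XX:MaxDirectMemorySize=1g"]

Raise -Xms/-Xmx there and restart the MiddleManager so new tasks pick it up.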

Regards,
Jay R

Hi Jay,
Thanks for your reply.

I did not touch runtime.properties.
So the MiddleManager task JVMs are using -Xms1g -Xmx1g while ingesting.

Do you have a recommended heap size?

Regards,

  1. How many tasks were spawned? If it's n, then each task can have Xmx = (memory allocated for the MiddleManager) / n. Try increasing the Xmx in the runtime properties; see the rough numbers after this list.
  2. Since the machine you used has only 16 CPUs, decrease maxNumConcurrentSubTasks to 8 (after excluding what the other processes need).
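
As a rough back-of-the-envelope sketch (the heap budget here is an assumption, not a measurement from this thread): an i3.4xlarge has 16 vCPUs and 122 GiB of RAM, so with druid.worker.capacity=4, a budget of, say, 40 GiB of heap for ingestion tasks works out to about

  per-task Xmx ≈ (heap budget for ingestion tasks) / (concurrent tasks)
               ≈ 40 GiB / 4 ≈ 10 GiB

as an upper bound; a peon heap of a few GiB (e.g. -Xms5g -Xmx5g) therefore still leaves headroom for direct memory and the other services on the box.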

Thanks.
After changing the runtime config to -Xms5g -Xmx5g, there is no more OOM.

I am using druid.worker.capacity=4

One more quick question:

  • After changing druid.worker.capacity from 4 to 8, only 4 tasks are still running in parallel for ingestion.

Should I change another config to increase the number of parallel workers?

Since it is a single node, only a few cores will be available for the MiddleManager after excluding the cores needed by the other processes.
maxNumConcurrentSubTasks then drives the parallelism within those available CPU cores.
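
In other words, both knobs have to allow the parallelism you want; the number of sub-tasks actually running is bounded by the smaller of the two (values below are illustrative):

  # MiddleManager runtime.properties: total task slots on this node (restart the MiddleManager for this to take effect)
  druid.worker.capacity=8

  # ingestion spec tuningConfig: sub-tasks this job may run at once
  "maxNumConcurrentSubTasks": 8

If the spec's maxNumConcurrentSubTasks is still 4, raising druid.worker.capacity alone will not add parallelism.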