I need some quick thoughts on troubleshooting our broker node in production.
We run Druid 0.8.3 in production. Everything looked fine until two days ago, when the broker nodes started showing slow queries during our peak hour. Query volume has increased significantly recently. I looked at CPU and memory on the broker nodes but saw no sign of a problem there. Our segments currently cover 5 minutes each, with a 5-minute window period. Later in the day, our Hadoop batch pipeline rewrites them as 15-minute segments.
Some facts and steps for troubleshooting:
There were some errors on the middle managers, but they looked benign to me. Attached.
Attaching a log showing that the actual query finished very quickly, in about 50 ms.
Attaching the Druid broker config.
At peak time we send more than 50 queries per second.
Attaching JVM profile logs.
/usr/bin/java -server -Xmx6g -Xms6g -XX:NewSize=1500m -XX:MaxNewSize=1500m -XX:MaxDirectMemorySize=16g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Djava.io.tmpdir=/tmp -classpath config/_common:config/broker:lib/* -Ddruid.properties.file=config/broker/runtime.properties io.druid.cli.Main server broker
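As a rough sanity check on the numbers above, the peak rate (50 queries per second) and the logged query time (about 50 ms) can be combined with Little's law to estimate how many queries should be in flight on the broker at once. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes the 50 ms figure is representative of all queries, which may not be true.

```python
def estimated_concurrency(queries_per_second: float, latency_ms: float) -> float:
    """Little's law: average in-flight queries = arrival rate * time in system."""
    return queries_per_second * latency_ms / 1000.0


# Figures from this post: 50 queries/s at peak, about 50 ms per query in the log.
in_flight = estimated_concurrency(50, 50)
print(f"~{in_flight:.1f} queries in flight on average")
```

With these figures the estimate is only a few concurrent queries, so if clients see much higher latency than the log reports, the time is probably being spent waiting somewhere around the query rather than in the query itself.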
broker.log (5.21 KB)
brokerConfig.txt (1 KB)
jstat-profile.txt (397 KB)