Broker hangs after some time

Hello.
We are trying Druid on our sample cluster:

Historical: 6 (12) cores, 128 GB RAM, 400 GB SSD

Broker: 4 (8) cores, 64 GB RAM, 400 GB SSD

Coordinator: 4 (8) cores, 64 GB RAM, 400 GB SSD

Broker runtime:

druid.service=druid/broker

druid.port=6082

druid.broker.http.numConnections=20

druid.server.http.numThreads=50

druid.processing.buffer.sizeBytes=2147483647

druid.processing.numThreads=7

Broker JVM:

-server

-Xms25g

-Xmx25g

-XX:NewSize=6g

-XX:MaxNewSize=6g

-XX:MaxDirectMemorySize=20g

-XX:+UseConcMarkSweepGC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=var/tmp

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
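As a quick sanity check on the broker settings above: a commonly cited Druid rule of thumb is that `-XX:MaxDirectMemorySize` must be at least `druid.processing.buffer.sizeBytes * (druid.processing.numThreads + 1)`, with newer versions also adding `druid.processing.numMergeBuffers` to the sum. The sketch below assumes that rule (merge buffers default to 0 here) and plugs in the broker values. Note that 2147483647 is 2^31 - 1, the largest value a Java `int` can hold.

```python
# Sketch: do a node's processing buffers fit in its direct memory?
# Assumed rule: sizeBytes * (numThreads + numMergeBuffers + 1) <= MaxDirectMemorySize.
# numMergeBuffers defaults to 0 here, matching older Druid versions (assumption).

GIB = 1024 ** 3


def required_direct_bytes(buffer_size_bytes: int, num_threads: int,
                          num_merge_buffers: int = 0) -> int:
    """Direct memory needed for processing buffers under the assumed rule."""
    return buffer_size_bytes * (num_threads + num_merge_buffers + 1)


def parse_jvm_size(value: str) -> int:
    """Convert a JVM size string such as '20g' or '512m' to bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}
    suffix = value[-1].lower()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    return int(value)  # plain byte count, no suffix


# Broker values copied from the configuration above.
required = required_direct_bytes(2147483647, 7)
available = parse_jvm_size("20g")
print(f"required={required / GIB:.1f} GiB, "
      f"available={available / GIB:.1f} GiB, fits={required <= available}")
```

With these numbers the buffers need about 16 GiB against 20 GiB available, so under this assumption the broker's direct memory is not undersized.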

Historical runtime:

druid.service=druid/historical

druid.port=8083

druid.server.http.numThreads=50

druid.processing.buffer.sizeBytes=1073741824

druid.processing.numThreads=11

druid.segmentCache.locations=[{"path":"var/druid/segment-cache","maxSize":30000000000}]

druid.server.maxSize=30000000000

druid.broker.cache.useCache=true

druid.broker.cache.populateCache=true

druid.cache.type=local

druid.cache.sizeInBytes=1000000000
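Using the data volume reported further down in this post (about 10 GB, in segments of roughly 300 MB), the arithmetic below checks whether everything fits in the historical's segment cache. It is a sketch: decimal units are assumed, and treating the smaller of the location `maxSize` and `druid.server.maxSize` as the effective limit is a simplification.

```python
# Sketch: does the reported data volume fit in the historical's segment cache?
# Decimal GB/MB and "effective limit = smaller of the two maxSize values" are assumptions.
import math

SEGMENT_CACHE_MAX_SIZE = 30_000_000_000  # maxSize in druid.segmentCache.locations
SERVER_MAX_SIZE = 30_000_000_000         # druid.server.maxSize
DATA_BYTES = 10 * 10 ** 9                # "only 10GB of data"
SEGMENT_BYTES = 300 * 10 ** 6            # "segments 300MB each"


def cache_report(data_bytes: int, segment_bytes: int,
                 cache_bytes: int, server_max: int) -> dict:
    """Estimate segment count and remaining segment-cache headroom."""
    usable = min(cache_bytes, server_max)
    return {
        "segments": math.ceil(data_bytes / segment_bytes),
        "fits": data_bytes <= usable,
        "headroom_bytes": usable - data_bytes,
    }


print(cache_report(DATA_BYTES, SEGMENT_BYTES, SEGMENT_CACHE_MAX_SIZE, SERVER_MAX_SIZE))
```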

Historical JVM:

-server

-Xms10g

-Xmx10g

-XX:NewSize=6g

-XX:MaxNewSize=6g

-XX:MaxDirectMemorySize=30g

-XX:+UseConcMarkSweepGC

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

-Djava.io.tmpdir=var/tmp
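A rough memory budget for the historical host can also be sketched from the values above. The assumptions: heap and `-XX:MaxDirectMemorySize` are fully used, processing buffers need `sizeBytes * (numThreads + 1)` of direct memory, and the remaining RAM is left to the OS page cache that serves memory-mapped segments. RAM is treated as GiB for simplicity.

```python
# Sketch: rough memory budget for the historical host (not Druid's own sizing logic).
# Assumptions: heap and direct memory are fully committed, leftover RAM goes to the
# OS page cache, and processing buffers need sizeBytes * (numThreads + 1).
GIB = 1024 ** 3


def historical_budget(ram_gib, heap_gib, direct_gib,
                      buffer_bytes, num_threads, segment_cache_bytes):
    processing_gib = buffer_bytes * (num_threads + 1) / GIB
    page_cache_gib = ram_gib - heap_gib - direct_gib
    return {
        "processing_buffers_gib": processing_gib,
        "buffers_fit_in_direct": processing_gib <= direct_gib,
        "page_cache_gib": page_cache_gib,
        "segment_cache_fits_in_page_cache": page_cache_gib * GIB >= segment_cache_bytes,
    }


# Values from the historical configuration above.
print(historical_budget(ram_gib=128, heap_gib=10, direct_gib=30,
                        buffer_bytes=1073741824, num_threads=11,
                        segment_cache_bytes=30_000_000_000))
```

Under these assumptions the buffers take 12 GiB of the 30 GiB direct memory and about 88 GiB is left for the page cache, more than the whole 30 GB segment cache.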

There is only 10 GB of data, in segments of about 300 MB each, and we constantly get this error on queries:

“Failure getting results from[http://imply.data:8100/druid/v2/] because of [org.jboss.netty.channel.ChannelException: Channel disconnected]”

The port varies (8101, 8102).

It looks like a realtime node error, but even when we stop everything except the historical and the broker, the error appears again.
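Since the failing port changes between 8100, 8101 and 8102, it can help to tally which host:port pairs appear in the broker log. The sketch below assumes every relevant line has the `Failure getting results from[http://host:port/...]` shape quoted above; lines in other formats are ignored.

```python
# Sketch: count the (host, port) targets in "Failure getting results from[...]"
# broker log lines. The regex assumes the message format quoted in this post.
import re
from collections import Counter

FAILURE_RE = re.compile(r"Failure getting results from\[(https?)://([^:/\]]+):(\d+)/")


def failing_targets(lines):
    """Count (host, port) pairs that the broker failed to get results from."""
    counts = Counter()
    for line in lines:
        for _scheme, host, port in FAILURE_RE.findall(line):
            counts[(host, int(port))] += 1
    return counts


sample = [
    "Failure getting results from[http://imply.data:8100/druid/v2/] because of "
    "[org.jboss.netty.channel.ChannelException: Channel disconnected]",
]
print(failing_targets(sample))

# On a real log file:
# with open("broker.log") as f:
#     print(failing_targets(f))
```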

We have tried many JVM settings; none of them helped.

The broker log is attached (broker.log, 1.18 MB).

The error in Pivot looks like this:

Got error in query 1: connect ECONNREFUSED 127.0.0.1:6082 (in 42ms)

^^^^^^^^^^^^^^^^^^^^^^^^^^

Failed to introspect data source: ‘dsp_traff’ because connect ECONNREFUSED 127.0.0.1:6082

Rows like this sometimes appear in the historical log:

4235.218: [GC (Allocation Failure) 4235.218: [ParNew: 5034431K->1497K(5662336K), 0.0062634 secs] 5958003K->925069K(9856640K), 0.0063176 secs] [Tim$
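The GC line above can be read mechanically. The sketch below parses a ParNew entry as printed with `-XX:+PrintGCDetails` and `-XX:+PrintGCTimeStamps` under CMS; the regex is an assumption that only covers this exact line shape, not a general GC-log parser. For the sample line it reports a young-generation collection of roughly 6 ms, which on its own looks too short to explain a hung broker.

```python
# Sketch: extract sizes and pause time from a ParNew line like the one above.
# Only this line shape (CMS + ParNew, PrintGCDetails/PrintGCTimeStamps) is handled.
import re

PARNEW_RE = re.compile(
    r"\[ParNew: (\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\] "
    r"(\d+)K->(\d+)K\((\d+)K\), ([\d.]+) secs\]"
)


def parse_parnew(line):
    """Return young/heap sizes in KB and total pause in seconds, or None."""
    m = PARNEW_RE.search(line)
    if m is None:
        return None
    yb, ya, ycap, _ysecs, hb, ha, hcap, secs = m.groups()
    return {
        "young_before_kb": int(yb),
        "young_after_kb": int(ya),
        "young_capacity_kb": int(ycap),
        "heap_before_kb": int(hb),
        "heap_after_kb": int(ha),
        "heap_capacity_kb": int(hcap),
        "pause_secs": float(secs),
    }


line = ("4235.218: [GC (Allocation Failure) 4235.218: [ParNew: 5034431K->1497K(5662336K), "
        "0.0062634 secs] 5958003K->925069K(9856640K), 0.0063176 secs]")
gc = parse_parnew(line)
print(f"pause={gc['pause_secs'] * 1000:.1f} ms, heap after={gc['heap_after_kb'] // 1024} MB")
```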

Are you by any chance behind a firewall? If so, are the correct ports open?

Yes, that seems to be it. I disabled the firewall and the issue is gone.
Our queries did succeed sometimes: after the broker server was rebooted, queries worked for about 5 minutes. That's why we didn't suspect the firewall.
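A firewall problem like this can be confirmed before disabling anything by testing TCP reachability from the broker host. The sketch below uses only Python's standard `socket` module. The ports come from this thread (6082 broker, 8083 historical, 8100-8102 from the error messages); the host names in the usage comment are hypothetical placeholders except `imply.data`, which appears in the error.

```python
# Sketch: check from the broker host which Druid ports accept TCP connections.
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def check_ports(targets, timeout: float = 2.0) -> dict:
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {(host, port): port_open(host, port, timeout) for host, port in targets}


# Usage (run on the broker machine; "historical-host" is a hypothetical name):
# print(check_ports([("localhost", 6082), ("historical-host", 8083),
#                    ("imply.data", 8100), ("imply.data", 8101), ("imply.data", 8102)]))
```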

Thank you very much.