Queries failing on cluster

Hi, I am trying to understand why my queries are failing. Could using magnetic disks instead of SSDs for the Historical nodes be a reason for query timeouts?

Thanks

I get these exceptions:

Caused by: org.apache.druid.java.util.common.RE: Failure getting results for query[8ec8a4b2-f4ad-4f29-a435-a36ede843931] url[http://…:8083/druid/v2/] because of [org.jboss.netty.channel.ChannelException: Channel disconnected]
    at org.apache.druid.client.JsonParserIterator.init(JsonParserIterator.java:155)
    at org.apache.druid.client.JsonParserIterator.hasNext(JsonParserIterator.java:79)
    at org.apache.druid.java.util.common.guava.BaseSequence.makeYielder(BaseSequence.java:89)
    at org.apache.druid.java.util.common.guava.BaseSequence.toYielder(BaseSequence.java:69)
    at org.apache.druid.java.util.common.guava.MappedSequence.toYielder(MappedSequence.java:49)
    at org.apache.druid.java.util.common.guava.MergeSequence.lambda$toYielder$1(MergeSequence.java:64)
    at org.apache.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:40)
    at org.apache.druid.java.util.common.guava.BaseSequence.accumulate(BaseSequence.java:44)
    at org.apache.druid.java.util.common.guava.MappedSequence.accumulate(MappedSequence.java:43)
    at org.apache.druid.java.util.common.guava.MergeSequence.toYielder(MergeSequence.java:61)
    at org.apache.druid.java.util.common.guava.LazySequence.toYielder(LazySequence.java:46)
    at org.apache.druid.query.RetryQueryRunner$1.toYielder(RetryQueryRunner.java:97)
    at org.apache.druid.common.guava.CombiningSequence.toYielder(CombiningSequence.java:79)
    at org.apache.druid.java.util.common.guava.LimitedSequence.toYielder(LimitedSequence.java:53)
    at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:88)
    at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:84)
    at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55)
    at org.apache.druid.java.util.common.guava.WrappingSequence.toYielder(WrappingSequence.java:83)
    at org.apache.druid.java.util.common.guava.MappedSequence.toYielder(MappedSequence.java:49)
    at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:88)
    at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:84)
    at org.apache.druid.query.CPUTimeMetricQueryRunner$1.wrap(CPUTimeMetricQueryRunner.java:74)
    at org.apache.druid.java.util.common.guava.WrappingSequence.toYielder(WrappingSequence.java:83)
    at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:88)
    at org.apache.druid.java.util.common.guava.WrappingSequence$2.get(WrappingSequence.java:84)
    at org.apache.druid.java.util.common.guava.SequenceWrapper.wrap(SequenceWrapper.java:55)
    at org.apache.druid.java.util.common.guava.WrappingSequence.toYielder(WrappingSequence.java:83)
    at org.apache.druid.java.util.common.guava.MappedSequence.toYielder(MappedSequence.java:49)
    at org.apache.druid.java.util.common.guava.Yielders.each(Yielders.java:32)
    at org.apache.druid.sql.avatica.DruidStatement.execute(DruidStatement.java:198)
    … 25 more
  Suppressed: java.lang.IllegalStateException: DefaultQueryMetrics must not be modified from multiple threads. If it is needed to gather dimension or metric information from multiple threads or from an async thread, this information should explicitly be passed between threads (e. g. using Futures), or this DefaultQueryMetrics's ownerThread should be reassigned explicitly
    at org.apache.druid.query.DefaultQueryMetrics.checkModifiedFromOwnerThread(DefaultQueryMetrics.java:51)
    at org.apache.druid.query.DefaultQueryMetrics.reportMetric(DefaultQueryMetrics.java:264)
    at org.apache.druid.query.DefaultQueryMetrics.reportCpuTime(DefaultQueryMetrics.java:235)
    at org.apache.druid.query.CPUTimeMetricQueryRunner$1.after(CPUTimeMetricQueryRunner.java:88)
    … 32 more
Caused by: java.util.concurrent.ExecutionException: org.jboss.netty.channel.ChannelException: Channel disconnected
    at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
    at org.apache.druid.client.JsonParserIterator.init(JsonParserIterator.java:117)

Thanks

Hi Michal,

Is it possible to check on the Historical node behind "http://…:8083/druid/v2/" whether it logged a similar disconnection error, or any GC issue, around the same timestamp?

Are all the queries failing, or does it happen only rarely?

Thanks,

Hemanth
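
To check that, you can filter the Historical's logs for disconnections around the failure time. A minimal sketch, assuming a GNU/Linux shell; the log path and timestamps below are made up for illustration, so point LOG at your actual Historical log file:

```shell
# Hypothetical log location -- replace with your Historical's real log file.
LOG=/tmp/historical-sample.log

# Create a small sample log purely to illustrate the filtering step.
cat > "$LOG" <<'EOF'
2023-01-01T12:00:01,000 INFO  [qtp-1] org.apache.druid.server.QueryResource - query start
2023-01-01T12:00:05,000 ERROR [qtp-1] org.apache.druid.server.QueryResource - org.jboss.netty.channel.ChannelException: Channel disconnected
2023-01-01T12:00:06,000 INFO  [qtp-2] org.apache.druid.server.QueryResource - query ok
EOF

# Look for channel disconnections near the timestamp the Broker reported;
# do the same against the GC log to spot long pauses at that moment.
grep -n 'ChannelException\|Channel disconnected' "$LOG"
```

On a real cluster you would drop the sample-log step and run the same grep (plus a grep on the JVM GC log) over the window in which the Broker logged the failed query ID.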

Hi, I don't see any heap dump or out-of-memory error, though memory usage does reach its maximum.
I have 2 data nodes (16 GB memory, 8 cores, magnetic disks) and one node running the master, query, ZooKeeper, and metadata-store services.

I configured the nodes according to these recommendations: https://druid.apache.org/docs/latest/operations/basic-cluster-tuning.html

I currently have about 460 million records (running a POC).

After I optimized the configuration according to those recommendations, the timeouts sometimes stop, but sometimes they start happening again.

I don't see any errors in the logs.

Is this load supposed to be supported by nodes of this size?
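
For reference, this is roughly the shape of the processing settings I ended up with on each Historical, following the basic-cluster-tuning guide for an 8-core / 16 GB box (values are illustrative, not a tested recommendation):

```properties
# Illustrative sketch for an 8-core / 16 GB Historical, per the
# basic-cluster-tuning guide; verify against your own workload.
druid.processing.numThreads=7
druid.processing.numMergeBuffers=2
druid.processing.buffer.sizeBytes=500000000
# Raise the default per-query timeout (milliseconds) while testing heavy queries.
druid.server.http.defaultQueryTimeout=600000
```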

Thanks

One more note: I am testing heavy queries, and they run for a long time.
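
For long-running queries it may also be worth raising the per-query timeout via the query context, to rule out a plain timeout expiry. A minimal sketch of a native query with an explicit context timeout; the datasource, interval, and aggregation are hypothetical, while "timeout" (in milliseconds) is a standard Druid context parameter:

```json
{
  "queryType": "timeseries",
  "dataSource": "my_datasource",
  "intervals": ["2023-01-01/2023-02-01"],
  "granularity": "all",
  "aggregations": [{ "type": "count", "name": "rows" }],
  "context": { "timeout": 600000 }
}
```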