Java ISE Error: Cannot coerce[java.util.ArrayList] to INTEGER

We are occasionally seeing this ISE error from the Druid Brokers.

"exception": "org.apache.druid.java.util.common.ISE: Cannot coerce[java.util.ArrayList] to INTEGER"
"query": "SELECT CASE "enabled_state" WHEN 'true' THEN  1 ELSE 0 END as "PPN", TIME_FLOOR("__time", 'PT1M') as "time" FROM "datasource" WHERE "header.host_name"= 'some_host' AND "__time" >= TIME_PARSE('2022-03-14T22:00:44.390Z') AND "__time" <= TIME_PARSE('2022-03-15T01:00:44.390Z')"

All queries producing this error are SQL queries that use a CASE expression in the SELECT clause. The errors are transient: the same queries succeed most of the time, and we can paste them into the Druid Console and run them without getting this error.

The field being selected is always a string-only field, and not an array as the error would suggest. This is happening for multiple users across multiple Datasources, and so it is not isolated to the above sample query.

We have been noticing this error since we upgraded to Druid 0.22.1. We have three Druid clusters deployed at different scales, and we are seeing the error on all three of them. Two are lower volume clusters with around 1,000 to 1,200 queries per minute, and one is a large production cluster with about 3,400 to 4,000 queries per minute.

Here is our current Broker runtime.properties config. This config is the same for all three clusters; the only difference between them is the number of Brokers deployed. We have 3 Brokers deployed to each of the smaller clusters, and 8 deployed to the production cluster.

druid.service=druid/broker
druid.plaintextPort=8082
druid.tlsPort=8282

# HTTP server settings
druid.server.http.numThreads=40
druid.server.http.gracefulShutdownTimeout=PT1M
druid.server.http.defaultQueryTimeout=60000

# HTTP client settings
druid.broker.http.numConnections=30
druid.broker.http.maxQueuedBytes=10000000

# Processing threads and buffers
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=16
druid.processing.numThreads=15

# Query protection settings
druid.query.scheduler.prioritization.strategy=threshold
druid.query.scheduler.prioritization.durationThreshold=PT24H
druid.query.scheduler.prioritization.adjustment=-1
druid.sql.planner.requireTimeCondition=false
druid.sql.planner.metadataSegmentCacheEnable=true
druid.sql.planner.metadataSegmentPollPeriod=60000

# Query cache disabled -- push down caching and merging to historicals and middlemanagers
druid.broker.cache.useCache=false
druid.broker.cache.populateCache=false

# Monitors that emit self-monitoring metrics
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.SysMonitor","org.apache.druid.java.util.metrics.JvmMonitor","org.apache.druid.java.util.metrics.JvmCpuMonitor","org.apache.druid.java.util.metrics.JvmThreadsMonitor","org.apache.druid.client.cache.CacheMonitor","org.apache.druid.server.metrics.QueryCountStatsMonitor"]

Here are the current JVM options we are using as seen from a running process.

java -Xms12G -Xmx12G -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Dorg.jboss.logging.provider=slf4j -Dnet.spy.log.LoggerImpl=net.spy.memcached.compat.log.SLF4JLogger -Dlog4j.shutdownCallbackRegistry=org.apache.druid.common.config.Log4jShutdown -Dlog4j.shutdownHookEnabled=true -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/var/log/druid.gc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=50 -XX:GCLogFileSize=10m -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:ConcGCThreads=2 -XX:+UseG1GC -XX:+UseBiasedLocking -Xms80G -Xmx80G -XX:MaxDirectMemorySize=40G -cp /opt/druid/conf/druid/_common:/opt/druid/conf/druid/broker:/opt/druid/lib/* org.apache.druid.cli.Main server broker

The Brokers are currently running on 16 CPU pods with 128GB of Memory.
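
As a rough sanity check on the settings above, the direct memory needed for the processing buffers can be worked out with the formula from the Druid tuning documentation: buffer.sizeBytes * (numMergeBuffers + numThreads + 1). The sketch below plugs in the values from this post. It assumes the later -Xmx80G flag is the one in effect and that the pod's 128GB means GiB; neither is confirmed here.

```python
# Values copied from the runtime.properties and JVM flags shown above.
BUFFER_SIZE_BYTES = 500_000_000    # druid.processing.buffer.sizeBytes
NUM_MERGE_BUFFERS = 16             # druid.processing.numMergeBuffers
NUM_THREADS = 15                   # druid.processing.numThreads
MAX_DIRECT_BYTES = 40 * 1024**3    # -XX:MaxDirectMemorySize=40G
HEAP_BYTES = 80 * 1024**3          # assumption: -Xmx80G overrides the earlier -Xmx12G
POD_BYTES = 128 * 1024**3          # assumption: "128GB" is 128 GiB


def required_direct_memory(buffer_size, merge_buffers, threads):
    """Direct memory needed for processing and merge buffers, in bytes."""
    return buffer_size * (merge_buffers + threads + 1)


needed = required_direct_memory(BUFFER_SIZE_BYTES, NUM_MERGE_BUFFERS, NUM_THREADS)
direct_headroom = MAX_DIRECT_BYTES - needed
pod_remaining = POD_BYTES - HEAP_BYTES - MAX_DIRECT_BYTES

print(f"direct memory needed for buffers: {needed / 1e9:.1f} GB")
print(f"headroom under MaxDirectMemorySize: {direct_headroom / 1e9:.1f} GB")
print(f"pod memory left after heap + direct limit: {pod_remaining / 1024**3:.0f} GiB")
```

Under those assumptions the buffers need about 16 GB, well within the 40G direct memory limit, while heap plus the direct memory limit leave roughly 8 GiB of the pod for everything else.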


We have been actively modifying the Broker settings since the upgrade from 0.19 to 0.22, because the Brokers have had performance issues ever since. Before the upgrade we were running on 8 CPU pods with 32GB of Memory, and mostly using the default settings from the Basic Cluster Tuning guide (with 24G Heap, 7 processing threads, and 3 merge buffers).

After the upgrade the Brokers began running out of memory and crashing, and so we’ve had to size them up significantly. Even with these current settings we are seeing high CPU usage corresponding with high GC, eventually leading to QueryCapacityExceededException errors on the Brokers. When that happens we restart all the pods, and they return to normal for a few days. If we don’t restart them proactively then the API eventually becomes unresponsive, and Kubernetes liveness probe failures cause a pod restart.

Our users are mostly querying using SQL from Grafana, and are heavily using GroupBy queries.


We are uncertain if this error represents a bug with processing CASE statements, or is a sign that our Brokers are not configured correctly. We are willing to share more information to help troubleshoot this error, and to possibly resolve our ongoing Broker performance problems.

Hi sh-vs,

A few questions:

  • Are you using realtime ingestion?
  • Do you have heavy use of lookups or sketches?
  • Could you convert the SQL query to a native query and paste it?
  • Could you provide the exact error you see with QueryCapacityExceededException?

Thanks!

Thank you for your reply Vijeth.

We are using realtime ingestion. All ingest is coming from Kafka-Avro. Our standard task duration is 2 hours, so any query whose time range reaches back further than that hits both MiddleManagers and Historicals.

We do not use lookups. We are only aware of a few users doing sketches, and it would not be heavy usage.

I found one of these errors logged today, along with the matching native query as logged by one of the Historicals. The native query was logged on three Historicals; all three entries looked the same as this one, and all reported the query as successful.

Druid Native Query Log from Historical
{
  "queryType": "scan",
  "dataSource": {
    "type": "table",
    "name": "datasource_name"
  },
  "intervals": {
    "type": "segments",
    "segments": [
      {
        "itvl": "2022-03-16T00:00:00.000Z/2022-03-16T17:15:48.001Z",
        "ver": "2022-03-16T00:00:04.899Z",
        "part": 0
      },
      {
        "itvl": "2022-03-16T00:00:00.000Z/2022-03-16T17:15:48.001Z",
        "ver": "2022-03-16T00:00:04.899Z",
        "part": 1
      },
      {
        "itvl": "2022-03-16T00:00:00.000Z/2022-03-16T17:15:48.001Z",
        "ver": "2022-03-16T00:00:04.899Z",
        "part": 2
      },
      {
        "itvl": "2022-03-16T00:00:00.000Z/2022-03-16T17:15:48.001Z",
        "ver": "2022-03-16T00:00:04.899Z",
        "part": 5
      }
    ]
  },
  "virtualColumns": [
    {
      "type": "expression",
      "name": "v0",
      "expression": "case_searched((\"keep_alive.keep_alive_status\" == 'successful'),1,0)",
      "outputType": "LONG"
    },
    {
      "type": "expression",
      "name": "v1",
      "expression": "timestamp_floor(\"__time\",'PT1M',null,'UTC')",
      "outputType": "LONG"
    }
  ],
  "resultFormat": "compactedList",
  "batchSize": 20480,
  "order": "none",
  "filter": {
    "type": "selector",
    "dimension": "header.host_name",
    "value": "hostname",
    "extractionFn": null
  },
  "columns": [
    "v0",
    "v1"
  ],
  "legacy": false,
  "context": {
    "defaultTimeout": 60000,
    "finalize": false,
    "maxQueuedBytes": 2500000,
    "maxScatterGatherBytes": 9223372036854775807,
    "priority": 1,
    "queryFailTime": 1647451009155,
    "queryId": "379290e0-797c-46e8-8e3d-487fb9ff0af1",
    "sqlQueryId": "da9d77fc-0b68-4e44-91e7-5110cb855551",
    "timeout": 60000
  },
  "descending": false,
  "granularity": {
    "type": "all"
  }
},
{
  "query/time": 19,
  "query/bytes": 5607,
  "success": true,
  "identity": "druid_system"
}

Then one of the Brokers logged that the query failed with the ISE array error.

Druid SQL Query Log from Broker with ISE error
{
  "sqlQuery/time": 88,
  "sqlQuery/bytes": 40970,
  "success": false,
  "context": {
    "sqlQueryId": "da9d77fc-0b68-4e44-91e7-5110cb855551",
    "nativeQueryIds": "[379290e0-797c-46e8-8e3d-487fb9ff0af1]"
  },
  "identity": "user_account_name",
  "exception": "org.apache.druid.java.util.common.ISE: Cannot coerce[java.util.ArrayList] to INTEGER"
},
{
  "query": "SELECT CASE \"keep_alive.keep_alive_status\" WHEN 'successful' THEN  1 ELSE 0 END, TIME_FLOOR(\"__time\", 'PT1M') as \"time\" \\nFROM \"datasource_name\"\\nWHERE \"header.host_name\"= 'hostname' AND \"__time\" >= TIME_PARSE('2022-03-15T17:15:48.778Z') AND \"__time\" <= TIME_PARSE('2022-03-16T17:15:48.778Z')",
  "context": {
    "sqlQueryId": "da9d77fc-0b68-4e44-91e7-5110cb855551",
    "nativeQueryIds": "[379290e0-797c-46e8-8e3d-487fb9ff0af1]"
  }
}
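
The Historical and Broker entries above can be matched on their IDs to confirm that they describe the same request. The sketch below does this with trimmed copies of the two entries. The helper names are hypothetical, and the issue-time calculation assumes queryFailTime is the time the native query was issued plus the timeout, which this thread does not confirm.

```python
from datetime import datetime, timedelta, timezone

# Trimmed copies of the log entries above; only the fields used here are kept.
historical_entry = {
    "queryId": "379290e0-797c-46e8-8e3d-487fb9ff0af1",
    "sqlQueryId": "da9d77fc-0b68-4e44-91e7-5110cb855551",
    "queryFailTime": 1647451009155,
    "timeout": 60000,
    "success": True,
}
broker_entry = {
    "sqlQueryId": "da9d77fc-0b68-4e44-91e7-5110cb855551",
    "nativeQueryIds": "[379290e0-797c-46e8-8e3d-487fb9ff0af1]",
    "success": False,
}

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def native_ids(raw):
    """Split the bracketed, comma-separated ID string logged by the Broker."""
    return [part.strip() for part in raw.strip("[]").split(",") if part.strip()]


def same_query(historical, broker):
    """True when both entries share the SQL query ID and the native query ID."""
    return (
        historical["sqlQueryId"] == broker["sqlQueryId"]
        and historical["queryId"] in native_ids(broker["nativeQueryIds"])
    )


def issued_at(historical):
    """Assumed issue time: queryFailTime minus timeout, both in milliseconds."""
    millis = historical["queryFailTime"] - historical["timeout"]
    return EPOCH + timedelta(milliseconds=millis)


matched = same_query(historical_entry, broker_entry)
issued = issued_at(historical_entry)
print(matched, issued.isoformat())
```

For these two entries the IDs match, so the same request succeeded on the Historical and failed on the Broker.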

The Broker also logged this stack trace, which may be helpful.

Druid Stack Trace from Broker
{
  "instant": {
    "epochSecond": 1647450949,
    "nanoOfSecond": 238000000
  },
  "thread": "qtp1905486482-191",
  "level": "ERROR",
  "loggerName": "org.apache.druid.sql.http.SqlResource",
  "message": "Unable to send SQL response [da9d77fc-0b68-4e44-91e7-5110cb855551]",
  "thrown": {
    "commonElementCount": 0,
    "localizedMessage": "Cannot coerce[java.util.ArrayList] to INTEGER",
    "message": "Cannot coerce[java.util.ArrayList] to INTEGER",
    "name": "org.apache.druid.java.util.common.ISE",
    "extendedStackTrace": [
      {
        "class": "org.apache.druid.sql.calcite.rel.QueryMaker",
        "method": "coerce",
        "file": "QueryMaker.java",
        "line": 282,
        "exact": false,
        "location": "druid-sql-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.apache.druid.sql.calcite.rel.QueryMaker",
        "method": "lambda$remapFields$2",
        "file": "QueryMaker.java",
        "line": 200,
        "exact": false,
        "location": "druid-sql-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.apache.druid.java.util.common.guava.MappingYieldingAccumulator",
        "method": "accumulate",
        "file": "MappingYieldingAccumulator.java",
        "line": 61,
        "exact": false,
        "location": "druid-core-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.apache.druid.java.util.common.guava.BaseSequence",
        "method": "makeYielder",
        "file": "BaseSequence.java",
        "line": 90,
        "exact": false,
        "location": "druid-core-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.apache.druid.java.util.common.guava.BaseSequence",
        "method": "toYielder",
        "file": "BaseSequence.java",
        "line": 69,
        "exact": false,
        "location": "druid-core-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.apache.druid.java.util.common.guava.ConcatSequence",
        "method": "makeYielder",
        "file": "ConcatSequence.java",
        "line": 84,
        "exact": false,
        "location": "druid-core-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.apache.druid.java.util.common.guava.ConcatSequence",
        "method": "wrapYielder",
        "file": "ConcatSequence.java",
        "line": 118,
        "exact": false,
        "location": "druid-core-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.apache.druid.java.util.common.guava.ConcatSequence",
        "method": "access$000",
        "file": "ConcatSequence.java",
        "line": 27,
        "exact": false,
        "location": "druid-core-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.apache.druid.java.util.common.guava.ConcatSequence$2",
        "method": "next",
        "file": "ConcatSequence.java",
        "line": 132,
        "exact": false,
        "location": "druid-core-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.apache.druid.sql.http.SqlResource",
        "method": "lambda$doPost$0",
        "file": "SqlResource.java",
        "line": 159,
        "exact": false,
        "location": "druid-sql-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider",
        "method": "writeTo",
        "file": "StreamingOutputProvider.java",
        "line": 71,
        "exact": false,
        "location": "jersey-core-1.19.3.jar",
        "version": "1.19.3"
      },
      {
        "class": "com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider",
        "method": "writeTo",
        "file": "StreamingOutputProvider.java",
        "line": 57,
        "exact": false,
        "location": "jersey-core-1.19.3.jar",
        "version": "1.19.3"
      },
      {
        "class": "com.sun.jersey.spi.container.ContainerResponse",
        "method": "write",
        "file": "ContainerResponse.java",
        "line": 302,
        "exact": false,
        "location": "jersey-server-1.19.3.jar",
        "version": "1.19.3"
      },
      {
        "class": "com.sun.jersey.server.impl.application.WebApplicationImpl",
        "method": "_handleRequest",
        "file": "WebApplicationImpl.java",
        "line": 1510,
        "exact": false,
        "location": "jersey-server-1.19.3.jar",
        "version": "1.19.3"
      },
      {
        "class": "com.sun.jersey.server.impl.application.WebApplicationImpl",
        "method": "handleRequest",
        "file": "WebApplicationImpl.java",
        "line": 1419,
        "exact": false,
        "location": "jersey-server-1.19.3.jar",
        "version": "1.19.3"
      },
      {
        "class": "com.sun.jersey.server.impl.application.WebApplicationImpl",
        "method": "handleRequest",
        "file": "WebApplicationImpl.java",
        "line": 1409,
        "exact": false,
        "location": "jersey-server-1.19.3.jar",
        "version": "1.19.3"
      },
      {
        "class": "com.sun.jersey.spi.container.servlet.WebComponent",
        "method": "service",
        "file": "WebComponent.java",
        "line": 409,
        "exact": false,
        "location": "jersey-servlet-1.19.3.jar",
        "version": "1.19.3"
      },
      {
        "class": "com.sun.jersey.spi.container.servlet.ServletContainer",
        "method": "service",
        "file": "ServletContainer.java",
        "line": 558,
        "exact": false,
        "location": "jersey-servlet-1.19.3.jar",
        "version": "1.19.3"
      },
      {
        "class": "com.sun.jersey.spi.container.servlet.ServletContainer",
        "method": "service",
        "file": "ServletContainer.java",
        "line": 733,
        "exact": false,
        "location": "jersey-servlet-1.19.3.jar",
        "version": "1.19.3"
      },
      {
        "class": "javax.servlet.http.HttpServlet",
        "method": "service",
        "file": "HttpServlet.java",
        "line": 790,
        "exact": false,
        "location": "javax.servlet-api-3.1.0.jar",
        "version": "3.1.0"
      },
      {
        "class": "com.google.inject.servlet.ServletDefinition",
        "method": "doServiceImpl",
        "file": "ServletDefinition.java",
        "line": 286,
        "exact": false,
        "location": "guice-servlet-4.1.0.jar",
        "version": "?"
      },
      {
        "class": "com.google.inject.servlet.ServletDefinition",
        "method": "doService",
        "file": "ServletDefinition.java",
        "line": 276,
        "exact": false,
        "location": "guice-servlet-4.1.0.jar",
        "version": "?"
      },
      {
        "class": "com.google.inject.servlet.ServletDefinition",
        "method": "service",
        "file": "ServletDefinition.java",
        "line": 181,
        "exact": false,
        "location": "guice-servlet-4.1.0.jar",
        "version": "?"
      },
      {
        "class": "com.google.inject.servlet.ManagedServletPipeline",
        "method": "service",
        "file": "ManagedServletPipeline.java",
        "line": 91,
        "exact": false,
        "location": "guice-servlet-4.1.0.jar",
        "version": "?"
      },
      {
        "class": "com.google.inject.servlet.FilterChainInvocation",
        "method": "doFilter",
        "file": "FilterChainInvocation.java",
        "line": 85,
        "exact": false,
        "location": "guice-servlet-4.1.0.jar",
        "version": "?"
      },
      {
        "class": "com.google.inject.servlet.ManagedFilterPipeline",
        "method": "dispatch",
        "file": "ManagedFilterPipeline.java",
        "line": 120,
        "exact": false,
        "location": "guice-servlet-4.1.0.jar",
        "version": "?"
      },
      {
        "class": "com.google.inject.servlet.GuiceFilter",
        "method": "doFilter",
        "file": "GuiceFilter.java",
        "line": 135,
        "exact": false,
        "location": "guice-servlet-4.1.0.jar",
        "version": "?"
      },
      {
        "class": "org.eclipse.jetty.servlet.FilterHolder",
        "method": "doFilter",
        "file": "FilterHolder.java",
        "line": 193,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.servlet.ServletHandler$Chain",
        "method": "doFilter",
        "file": "ServletHandler.java",
        "line": 1601,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.apache.druid.server.security.PreResponseAuthorizationCheckFilter",
        "method": "doFilter",
        "file": "PreResponseAuthorizationCheckFilter.java",
        "line": 82,
        "exact": false,
        "location": "druid-server-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.eclipse.jetty.servlet.FilterHolder",
        "method": "doFilter",
        "file": "FilterHolder.java",
        "line": 193,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.servlet.ServletHandler$Chain",
        "method": "doFilter",
        "file": "ServletHandler.java",
        "line": 1601,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.apache.druid.server.security.AllowHttpMethodsResourceFilter",
        "method": "doFilter",
        "file": "AllowHttpMethodsResourceFilter.java",
        "line": 78,
        "exact": false,
        "location": "druid-server-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.eclipse.jetty.servlet.FilterHolder",
        "method": "doFilter",
        "file": "FilterHolder.java",
        "line": 193,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.servlet.ServletHandler$Chain",
        "method": "doFilter",
        "file": "ServletHandler.java",
        "line": 1601,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.apache.druid.server.security.AllowOptionsResourceFilter",
        "method": "doFilter",
        "file": "AllowOptionsResourceFilter.java",
        "line": 75,
        "exact": false,
        "location": "druid-server-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.eclipse.jetty.servlet.FilterHolder",
        "method": "doFilter",
        "file": "FilterHolder.java",
        "line": 193,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.servlet.ServletHandler$Chain",
        "method": "doFilter",
        "file": "ServletHandler.java",
        "line": 1601,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.apache.druid.security.basic.authentication.BasicHTTPAuthenticator$BasicHTTPAuthenticationFilter",
        "method": "doFilter",
        "file": "BasicHTTPAuthenticator.java",
        "line": 208,
        "exact": false,
        "location": "druid-basic-security-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.apache.druid.server.security.AuthenticationWrappingFilter",
        "method": "doFilter",
        "file": "AuthenticationWrappingFilter.java",
        "line": 59,
        "exact": false,
        "location": "druid-server-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.eclipse.jetty.servlet.FilterHolder",
        "method": "doFilter",
        "file": "FilterHolder.java",
        "line": 193,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.servlet.ServletHandler$Chain",
        "method": "doFilter",
        "file": "ServletHandler.java",
        "line": 1601,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.apache.druid.security.basic.authentication.BasicHTTPAuthenticator$BasicHTTPAuthenticationFilter",
        "method": "doFilter",
        "file": "BasicHTTPAuthenticator.java",
        "line": 212,
        "exact": false,
        "location": "druid-basic-security-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.apache.druid.server.security.AuthenticationWrappingFilter",
        "method": "doFilter",
        "file": "AuthenticationWrappingFilter.java",
        "line": 59,
        "exact": false,
        "location": "druid-server-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.eclipse.jetty.servlet.FilterHolder",
        "method": "doFilter",
        "file": "FilterHolder.java",
        "line": 193,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.servlet.ServletHandler$Chain",
        "method": "doFilter",
        "file": "ServletHandler.java",
        "line": 1601,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.apache.druid.server.security.SecuritySanityCheckFilter",
        "method": "doFilter",
        "file": "SecuritySanityCheckFilter.java",
        "line": 77,
        "exact": false,
        "location": "druid-server-0.22.1.jar",
        "version": "0.22.1"
      },
      {
        "class": "org.eclipse.jetty.servlet.FilterHolder",
        "method": "doFilter",
        "file": "FilterHolder.java",
        "line": 193,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.servlet.ServletHandler$Chain",
        "method": "doFilter",
        "file": "ServletHandler.java",
        "line": 1601,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.servlet.ServletHandler",
        "method": "doHandle",
        "file": "ServletHandler.java",
        "line": 548,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.handler.ScopedHandler",
        "method": "nextHandle",
        "file": "ScopedHandler.java",
        "line": 233,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.session.SessionHandler",
        "method": "doHandle",
        "file": "SessionHandler.java",
        "line": 1624,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.handler.ScopedHandler",
        "method": "nextHandle",
        "file": "ScopedHandler.java",
        "line": 233,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.handler.ContextHandler",
        "method": "doHandle",
        "file": "ContextHandler.java",
        "line": 1435,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.handler.ScopedHandler",
        "method": "nextScope",
        "file": "ScopedHandler.java",
        "line": 188,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.servlet.ServletHandler",
        "method": "doScope",
        "file": "ServletHandler.java",
        "line": 501,
        "exact": false,
        "location": "jetty-servlet-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.session.SessionHandler",
        "method": "doScope",
        "file": "SessionHandler.java",
        "line": 1594,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.handler.ScopedHandler",
        "method": "nextScope",
        "file": "ScopedHandler.java",
        "line": 186,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.handler.ContextHandler",
        "method": "doScope",
        "file": "ContextHandler.java",
        "line": 1350,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.handler.ScopedHandler",
        "method": "handle",
        "file": "ScopedHandler.java",
        "line": 141,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.handler.gzip.GzipHandler",
        "method": "handle",
        "file": "GzipHandler.java",
        "line": 763,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.handler.HandlerList",
        "method": "handle",
        "file": "HandlerList.java",
        "line": 59,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.handler.StatisticsHandler",
        "method": "handle",
        "file": "StatisticsHandler.java",
        "line": 179,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.handler.HandlerWrapper",
        "method": "handle",
        "file": "HandlerWrapper.java",
        "line": 127,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.Server",
        "method": "handle",
        "file": "Server.java",
        "line": 516,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.HttpChannel",
        "method": "lambda$handle$1",
        "file": "HttpChannel.java",
        "line": 388,
        "exact": false,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.HttpChannel",
        "method": "dispatch",
        "file": "HttpChannel.java",
        "line": 633,
        "exact": true,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.HttpChannel",
        "method": "handle",
        "file": "HttpChannel.java",
        "line": 380,
        "exact": true,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.server.HttpConnection",
        "method": "onFillable",
        "file": "HttpConnection.java",
        "line": 277,
        "exact": true,
        "location": "jetty-server-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.io.AbstractConnection$ReadCallback",
        "method": "succeeded",
        "file": "AbstractConnection.java",
        "line": 311,
        "exact": true,
        "location": "jetty-io-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.io.FillInterest",
        "method": "fillable",
        "file": "FillInterest.java",
        "line": 105,
        "exact": true,
        "location": "jetty-io-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint",
        "method": "onFillable",
        "file": "SslConnection.java",
        "line": 540,
        "exact": true,
        "location": "jetty-io-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.io.ssl.SslConnection",
        "method": "onFillable",
        "file": "SslConnection.java",
        "line": 395,
        "exact": true,
        "location": "jetty-io-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.io.ssl.SslConnection$2",
        "method": "succeeded",
        "file": "SslConnection.java",
        "line": 161,
        "exact": true,
        "location": "jetty-io-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.io.FillInterest",
        "method": "fillable",
        "file": "FillInterest.java",
        "line": 105,
        "exact": true,
        "location": "jetty-io-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.io.ChannelEndPoint$1",
        "method": "run",
        "file": "ChannelEndPoint.java",
        "line": 104,
        "exact": true,
        "location": "jetty-io-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.util.thread.strategy.EatWhatYouKill",
        "method": "runTask",
        "file": "EatWhatYouKill.java",
        "line": 336,
        "exact": true,
        "location": "jetty-util-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.util.thread.strategy.EatWhatYouKill",
        "method": "doProduce",
        "file": "EatWhatYouKill.java",
        "line": 313,
        "exact": true,
        "location": "jetty-util-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.util.thread.strategy.EatWhatYouKill",
        "method": "tryProduce",
        "file": "EatWhatYouKill.java",
        "line": 171,
        "exact": true,
        "location": "jetty-util-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.util.thread.strategy.EatWhatYouKill",
        "method": "run",
        "file": "EatWhatYouKill.java",
        "line": 129,
        "exact": true,
        "location": "jetty-util-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread",
        "method": "run",
        "file": "ReservedThreadExecutor.java",
        "line": 383,
        "exact": true,
        "location": "jetty-util-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.util.thread.QueuedThreadPool",
        "method": "runJob",
        "file": "QueuedThreadPool.java",
        "line": 882,
        "exact": true,
        "location": "jetty-util-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "org.eclipse.jetty.util.thread.QueuedThreadPool$Runner",
        "method": "run",
        "file": "QueuedThreadPool.java",
        "line": 1036,
        "exact": true,
        "location": "jetty-util-9.4.40.v20210413.jar",
        "version": "9.4.40.v20210413"
      },
      {
        "class": "java.lang.Thread",
        "method": "run",
        "file": "Thread.java",
        "line": 748,
        "exact": true,
        "location": "?",
        "version": "1.8.0_232"
      }
    ]
  },
  "endOfBatch": false,
  "loggerFqcn": "org.apache.logging.slf4j.Log4jLogger",
  "threadId": 191,
  "threadPriority": 5
}

The QueryCapacityExceededException errors are always merge buffer complaints, and always the result of groupBy queries. They do not start happening until the Brokers have been online for a while (12 hours to 4 days). We tend to see GC go past its normal threshold, then these errors start, users begin seeing query timeouts, and we need to restart the Brokers to get queries working normally again.

"exception":"org.apache.druid.query.QueryCapacityExceededException: Cannot acquire 1 merge buffers. Try again after current running queries are finished."

We will look into the SQL issue in a bit, but I am wondering whether it makes a difference if it is the Historical or the Peon that is answering the sub-query. Am I right in assuming the query you sent would eventually run without error if run again after some time (per your original post)?

For the broker issue, you would only see buffer issues with groupBy as those are the only query types that use merge buffers. Each V2 groupBy query will use 2 buffers, so your theoretical limit for groupBys in each broker is 8. Can we try bumping up the following and seeing how things work?

druid.processing.numMergeBuffers to 32

If you have high-cardinality columns in your groupBy queries, then we could also try increasing druid.processing.buffer.sizeBytes (max 2GiB).
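As a rough sanity check on the numbers above, here is a minimal sketch of the arithmetic. It assumes the 2-buffers-per-groupBy figure quoted in this reply, infers a current numMergeBuffers of 16 from the stated limit of 8, and uses the direct-memory lower bound from the Druid documentation, sizeBytes * (numMergeBuffers + numThreads + 1). The function names and the numThreads value are hypothetical and only for illustration.

```python
def max_concurrent_groupbys(num_merge_buffers: int, buffers_per_query: int = 2) -> int:
    """Upper bound on groupBy queries that can hold merge buffers at once.

    buffers_per_query=2 is the figure quoted in this thread, not a measured value.
    """
    return num_merge_buffers // buffers_per_query


def min_direct_memory_bytes(buffer_size_bytes: int,
                            num_merge_buffers: int,
                            num_threads: int) -> int:
    """Lower bound on direct memory the process needs for processing buffers."""
    return buffer_size_bytes * (num_merge_buffers + num_threads + 1)


if __name__ == "__main__":
    size_bytes = 500_000_000   # druid.processing.buffer.sizeBytes from the posted config
    num_threads = 1            # hypothetical; substitute your druid.processing.numThreads

    for buffers in (16, 32):   # 16 is inferred from the "limit is 8" remark
        print(buffers,
              max_concurrent_groupbys(buffers),
              min_direct_memory_bytes(size_bytes, buffers, num_threads) / 1e9)
```

The point of the second function is that raising numMergeBuffers (or sizeBytes) also raises the direct memory the Broker needs, so -XX:MaxDirectMemorySize may have to grow with it.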

We see a mix of queries. Some cover old time ranges that only hit the Historicals, others are short-term and only hit the MiddleManagers, and others span both. So I wouldn’t suspect the Historical/MiddleManager split to be part of the problem. Given that we can run the queries we see in the logs just fine in the Druid console, I suspect that if users just re-ran the query it would work fine. These ISE-ArrayList errors seem very transient and difficult to troubleshoot.

We have been increasing the number of merge buffers every few days to see how the Broker performance changes, and we see some mild improvements with larger values. We will continue increasing the value, and may just go straight to the 32 you suggest.

However, yesterday we finally got JMX hooked up to the Druid Broker JVM, and have discovered what appears to be a bug with qtp threads hanging. Since it is a different issue from the one this thread was originally for (the ISE-ArrayList issue), I am going to start a new thread for it.