SQL queries on 0.10: "Table not found" for one of two brokers

Hi,

I’ve been evaluating the use of Druid and recently just upgraded our cluster from 0.9 to 0.10. Aside from needing to configure the allowed memory for everything, this process went relatively smoothly. The one remaining issue is puzzling.

The ability to perform SQL queries was a major selling point internally for this new release. Unfortunately, in our cluster that contains two brokers, only one seems to be able to execute SQL queries. The other is seemingly unable to find our table.

root@druid-broker:/druid/druid-dist> curl -X POST -H’Content-Type: application/json’ http://localhost:8082/druid/v2/sql/ -d ‘{“query”:“SELECT COUNT(*) FROM eventer”}’

{“error”:“Unknown exception”,“errorMessage”:“org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 21: Table ‘eventer’ not found”,“errorClass”:“org.apache.calcite.tools.ValidationException”,“host”:null}

``

Queries of this sort on the other broker work just fine.

From the logs:

2017-05-03T20:19:17,582 ERROR [qtp2073333566-78] org.apache.calcite.runtime.CalciteException - org.apache.calcite.sql.validate.SqlValidatorException: Table ‘eventer’ not found

2017-05-03T20:19:17,583 ERROR [qtp2073333566-78] org.apache.calcite.runtime.CalciteException - org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line

1, column 21: Table ‘eventer’ not found

2017-05-03T20:19:17,583 WARN [qtp2073333566-78] io.druid.sql.http.SqlResource - Failed to handle query: SqlQuery{query=‘SELECT * FROM eventer LIMIT 2’, context={}}

org.apache.calcite.tools.ValidationException: org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 21: Table ‘eventer’ not found

    at org.apache.calcite.prepare.PlannerImpl.validate(PlannerImpl.java:187) ~[calcite-core-1.11.0.jar:1.11.0]

    at io.druid.sql.calcite.planner.DruidPlanner.plan(DruidPlanner.java:76) ~[druid-sql-0.10.0.jar:0.10.0]

    at io.druid.sql.http.SqlResource.doPost(SqlResource.java:86) [druid-sql-0.10.0.jar:0.10.0]

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]

    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]

    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) [

jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) [jersey-servlet-1.19.jar:1.19]

    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) [jersey-servlet-1.19.jar:1.19]

    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) [jersey-servlet-1.19.jar:1.19]

    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]

    at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:286) [guice-servlet-4.1.0.jar:?]

    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:276) [guice-servlet-4.1.0.jar:?]

    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:181) [guice-servlet-4.1.0.jar:?]

    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) [guice-servlet-4.1.0.jar:?]

    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) [guice-servlet-4.1.0.jar:?]

    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120) [guice-servlet-4.1.0.jar:?]

    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:135) [guice-servlet-4.1.0.jar:?]

    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) [jetty-servlet-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) [jetty-servlet-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224) [jetty-server-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) [jetty-server-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) [jetty-servlet-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) [jetty-server-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) [jetty-server-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) [jetty-server-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) [jetty-server-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.server.Server.handle(Server.java:534) [jetty-server-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) [jetty-server-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) [jetty-server-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) [jetty-io-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:110) [jetty-io-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) [jetty-io-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) [jetty-util-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) [jetty-util-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) [jetty-util-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) [jetty-util-9.3.16.v20170120.jar:9.3.16.v20170120]

    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) [jetty-util-9.3.16.v20170120.jar:9.3.16.v20170120]

    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]

``

Any assistance in debugging this would be appreciated. Particularly: what would cause one Broker to end up with a different context from the other?

Thanks,

Alex

Hey Alex,

The brokers populate their list of SQL tables by doing metadata queries to your data nodes. It’s possible one of the brokers wasn’t able to do that. I bet restarting that broker would get it able to run SQL queries. I’ve seen one race where a broker could make a metadata query “too fast” for a freshly loaded datasource, and thereby miss it. So you might be hitting something like that.

Do you see any exceptions in the problematic broker’s log? That’d help figure out if you’re seeing a known or unknown issue. If it’s not currently known we’ll make sure to fix it.

Hi Gian,

Sorry, I forgot to mention that I’ve restarted the broker in question multiple times. Additionally there are no exceptions in the log (except for those specifically related to the query, as provided in my previous message).

My most recent restart however seems to have fixed things (for now!) I’ll post back if I run into this problem again.

Thanks!

If you ever run into it again, maybe check if you see any “Refreshing metadata for dataSource[xx]” messages or any interesting looking followups to that message.

Hi Gian,

Restarting my brokers continues to pose problems with respect to SQL queries. There appear to be some timing issues around fetching the SQL metadata:

  1. If I perform a SQL query before I see a “Refreshing metadata for dataSource[xx]” message show up in the log, that broker will always respond with a “table not found” message for all subsequent SQL queries (although non-SQL queries work fine).

  2. If I perform a SQL query after I see a “Refreshing metadata for dataSource[xx]” message show up in the log, the first set of SQL queries performed may hang for many seconds. It may take more than 2 minutes for a SQL query to return without timing out. After the first SQL query has returned successfully, all subsequent queries will return promptly.

Aside from those observations, there does not seem to be anything particularly notable going on in the logs.

Cheers,

Alex

Hmm, that’s interesting and definitely seems wrong. Do you ever see messages like “Refreshed metadata for dataSource in y ms.”? If so how big is “y”? And what API are you using to query – JDBC or the JSON-over-HTTP on /druid/v2/sql/?

I do see messages like that. Sometimes two, one after another, but usually just once soon after starting the broker. It usually reports something between 2000-3000ms.

I’ve been using the JSON-over-HTTP API.

Hi Gian, We also came across the same issue of Table not found.

Hmm, let me see if I can reproduce this and figure out what’s going on. Any repro steps would be helpful if you have a way to make this behavior appear in a smaller installation like the quickstart.

I raised https://github.com/druid-io/druid/issues/4258 to track.

Alex, Arpan, were either of you using multiple brokers behind load balancers?

Gian,

Yes, I was. This behaviour was still present when querying from/to localhost, however.

Alex