Exception submitting action for task

I am running hourly indexing from Kafka, and today the tasks started to fail. The cluster is small: 3 coordinators, overlords, and MiddleManagers + 2 brokers + 6 historicals, all running 0.13.
When it starts, it creates 2 Kafka indexing tasks for each of the 2 datasources, and they run fine. The problem comes when the MiddleManager posts the hourly index completion to the overlord: the task is never stopped, and 2 more are created…
From what I can see, the data file is in S3, the DB entry in the segments table is fine, and even with multiple indexing tasks running I can check the values in Turnilo. However, if the task is stopped from the Druid task console, I stop seeing the results in Turnilo, even though the files and the DB are OK… It's as if the segments are never announced.
There are also merging index tasks, and those run fine.
It seems that the overlord is not informed that the task has completed, and it keeps waiting for confirmation from the MiddleManager.
I am attaching part of the task log and the overlord log.
In fact, this is a race condition in IndexerSQLMetadataStorageCoordinator.java (`log.info("Not updating metadata, existing state is not the expected start state.")`).
Basically, one replica alters the row with `UPDATE druid_dataSource SET commit_metadata_payload = _binary'{"typ…`
and the other then fails the check when it compares its original start state with the one in the DB (`SELECT commit_metadata_payload FROM druid_dataSource WHERE dataSource…`).
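A minimal sketch of the check-then-update pattern the post describes, to show why the second of two replicas publishing the same start state would hit that log message. This is a hypothetical, simplified model, not Druid's IndexerSQLMetadataStorageCoordinator code: the class and method names are made up, an in-memory field stands in for the `commit_metadata_payload` column, `synchronized` stands in for the DB transaction, and the payload strings are placeholders rather than real Kafka offset metadata.

```java
import java.util.Objects;

/**
 * Hypothetical, simplified model of the check-then-update on
 * druid_dataSource.commit_metadata_payload. Not Druid's actual code.
 */
public class CommitMetadataRace {

    // Stand-in for the commit_metadata_payload column of one datasource row.
    private String commitMetadataPayload;

    public CommitMetadataRace(String initialPayload) {
        this.commitMetadataPayload = initialPayload;
    }

    /**
     * Stores newEnd only if the stored payload still equals expectedStart.
     * synchronized plays the role of the DB transaction, so one caller's
     * read and write cannot interleave with another caller's.
     */
    public synchronized boolean publish(String replicaName, String expectedStart, String newEnd) {
        // Like SELECT commit_metadata_payload ..., then comparing with the expected start state.
        if (!Objects.equals(commitMetadataPayload, expectedStart)) {
            System.out.println(replicaName
                + ": Not updating metadata, existing state is not the expected start state.");
            return false;
        }
        // Like UPDATE druid_dataSource SET commit_metadata_payload = ...
        commitMetadataPayload = newEnd;
        System.out.println(replicaName + ": metadata updated to " + newEnd);
        return true;
    }

    public synchronized String current() {
        return commitMetadataPayload;
    }

    public static void main(String[] args) {
        CommitMetadataRace row = new CommitMetadataRace("offsets=100");
        // Both replicas started from the same state and publish the same end state.
        boolean first = row.publish("replica-0", "offsets=100", "offsets=200");
        boolean second = row.publish("replica-1", "offsets=100", "offsets=200");
        System.out.println("first=" + first + ", second=" + second + ", stored=" + row.current());
    }
}
```

Here the calls run one after the other; in a cluster the two replicas race, and whichever publishes second fails the comparison. With a single replica there is only one publish per sequence, which is consistent with the workaround of reducing the replica count to 1.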

Unfortunately, the only way we found to avoid this is to reduce the replica count to 1.

Task log:

```
org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_kafka_ola-redirects_88414b9d10bcaf8_eohbhnoh] to overlord: [SegmentAllocateAction{dataSource='ola-redirects', timestamp=2021-02-26T04:00:00.000Z, queryGranularity={type=period, period=PT1H, timeZone=UTC, origin=null}, preferredSegmentGranularity={type=period, period=PT1H, timeZone=UTC, origin=null}, sequenceName='index_kafka_ola-redirects_88414b9d10bcaf8_0', previousSegmentId='ola-redirects_2021-02-26T03:00:00.000Z_2021-02-26T04:00:00.000Z_2021-02-26T03:00:00.286Z_1', skipSegmentLineageCheck='true'}].
2021-02-26T04:00:00,563 WARN [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Exception submitting action for task[index_kafka_ola-redirects_88414b9d10bcaf8_eohbhnoh]
org.apache.druid.java.util.common.IOE: Scary HTTP status returned: 500 Server Error. Check your overlord logs for exceptions.
at org.apache.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:94) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.appenderator.ActionBasedSegmentAllocator.allocate(ActionBasedSegmentAllocator.java:55) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.getSegment(BaseAppenderatorDriver.java:331) [druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.append(BaseAppenderatorDriver.java:399) [druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.add(StreamAppenderatorDriver.java:180) [druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner.runInternal(IncrementalPublishingKafkaIndexTaskRunner.java:513) [druid-kafka-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.kafka.IncrementalPublishingKafkaIndexTaskRunner.run(IncrementalPublishingKafkaIndexTaskRunner.java:232) [druid-kafka-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.kafka.KafkaIndexTask.run(KafkaIndexTask.java:210) [druid-kafka-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:421) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:393) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
2021-02-26T04:00:00,564 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Will try again in [PT5.921S].
2021-02-26T04:00:06,485 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_kafka_ola-redirects_88414b9d10bcaf8_eohbhnoh] to overlord: [SegmentAllocateAction{dataSource='ola-redirects', timestamp=2021-02-26T04:00:00.000Z, queryGranularity={type=period, period=PT1H, timeZone=UTC, origin=null}, preferredSegmentGranularity={type=period, period=PT1H, timeZone=UTC, origin=null}, sequenceName='index_kafka_ola-redirects_88414b9d10bcaf8_0', previousSegmentId='ola-redirects_2021-02-26T03:00:00.000Z_2021-02-26T04:00:00.000Z_2021-02-26T03:00:00.286Z_1', skipSegmentLineageCheck='true'}].
```

Overlord log:
2021-02-26T04:00:00,520 INFO [qtp581383895-73] org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator - Allocated pending segment [ola-redirects_2021-02-26T04:00:00.000Z_2021-02-26T05:00:00.000Z_2021-02-26T04:00:00.238Z] for sequence[index_kafka_ola-redirects_88414b9d10bcaf8_0] in DB
2021-02-26T04:00:00,547 ERROR [qtp581383895-95] com.sun.jersey.spi.container.ContainerResponse - The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container
java.lang.NoClassDefFoundError: com/mysql/jdbc/exceptions/MySQLTransientException
at org.apache.druid.metadata.storage.mysql.MySQLConnector.connectorIsTransientException(MySQLConnector.java:202) ~[?:?]
at org.apache.druid.metadata.SQLMetadataConnector.isTransientException(SQLMetadataConnector.java:159) ~[druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.metadata.SQLMetadataConnector$1.apply(SQLMetadataConnector.java:74) ~[druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.metadata.SQLMetadataConnector$1.apply(SQLMetadataConnector.java:70) ~[druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:92) ~[java-util-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.java.util.common.RetryUtils.retry(RetryUtils.java:114) ~[java-util-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.metadata.SQLMetadataConnector.retryTransaction(SQLMetadataConnector.java:145) ~[druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator.allocatePendingSegment(IndexerSQLMetadataStorageCoordinator.java:390) ~[druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.common.actions.SegmentAllocateAction.tryAllocate(SegmentAllocateAction.java:270) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.common.actions.SegmentAllocateAction.tryAllocateFirstSegment(SegmentAllocateAction.java:223) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.common.actions.SegmentAllocateAction.perform(SegmentAllocateAction.java:168) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.common.actions.SegmentAllocateAction.perform(SegmentAllocateAction.java:55) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.common.actions.LocalTaskActionClient.submit(LocalTaskActionClient.java:74) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.overlord.http.OverlordResource$4.apply(OverlordResource.java:480) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.overlord.http.OverlordResource$4.apply(OverlordResource.java:469) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.overlord.http.OverlordResource.asLeaderWith(OverlordResource.java:1010) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.indexing.overlord.http.OverlordResource.doAction(OverlordResource.java:466) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
at sun.reflect.GeneratedMethodAccessor121.invoke(Unknown Source) ~[?:?]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_282]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_282]
at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) ~[jersey-server-1.19.3.jar:1.19.3]
at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) ~[jersey-server-1.19.3.jar:1.19.3]
at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) ~[jersey-server-1.19.3.jar:1.19.3]
at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) ~[jersey-server-1.19.3.jar:1.19.3]
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) ~[jersey-server-1.19.3.jar:1.19.3]
at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) ~[jersey-server-1.19.3.jar:1.19.3]
at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) ~[jersey-server-1.19.3.jar:1.19.3]
at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) ~[jersey-server-1.19.3.jar:1.19.3]
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) ~[jersey-server-1.19.3.jar:1.19.3]
at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) [jersey-server-1.19.3.jar:1.19.3]
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) [jersey-server-1.19.3.jar:1.19.3]
at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) [jersey-server-1.19.3.jar:1.19.3]
at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) [jersey-servlet-1.19.3.jar:1.19.3]
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) [jersey-servlet-1.19.3.jar:1.19.3]
at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) [jersey-servlet-1.19.3.jar:1.19.3]
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:286) [guice-servlet-4.1.0.jar:?]
at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:276) [guice-servlet-4.1.0.jar:?]
at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:181) [guice-servlet-4.1.0.jar:?]
at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) [guice-servlet-4.1.0.jar:?]
at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) [guice-servlet-4.1.0.jar:?]
at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120) [guice-servlet-4.1.0.jar:?]
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:135) [guice-servlet-4.1.0.jar:?]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) [jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.http.RedirectFilter.doFilter(RedirectFilter.java:71) [druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) [jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.security.PreResponseAuthorizationCheckFilter.doFilter(PreResponseAuthorizationCheckFilter.java:84) [druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) [jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.security.AllowOptionsResourceFilter.doFilter(AllowOptionsResourceFilter.java:76) [druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) [jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.security.AllowAllAuthenticator$1.doFilter(AllowAllAuthenticator.java:85) [druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.apache.druid.server.security.AuthenticationWrappingFilter.doFilter(AuthenticationWrappingFilter.java:60) [druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) [jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.apache.druid.server.security.SecuritySanityCheckFilter.doFilter(SecuritySanityCheckFilter.java:88) [druid-server-0.13.0-incubating.jar:0.13.0-incubating]
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1642) [jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533) [jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473) [jetty-servlet-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:724) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:61) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.Server.handle(Server.java:531) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260) [jetty-server-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281) [jetty-io-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102) [jetty-io-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118) [jetty-io-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678) [jetty-util-9.4.10.v20180503.jar:9.4.10.v20180503]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]

Welcome to the Druid Forum! I am looking into your issue and will reply sometime today.

I've looked at similar issues and have not found a root cause. Terminating and resubmitting the ingestion task might be worth a try, as might restarting your overlord process. Beyond that, you may want to file a Jira… Sorry not to have a better solution here.

Thanks for the help… but even with one replica the system is failing. We have spent too much time trying to fix it, and even the one-replica workaround fails in other places… We are going for the last resort: rebuilding the cluster.

After 2 days… we were using the wrong Java 8 version. Use this one: `openjdk-8-jre-headless=8u77-b03-3ubuntu3`
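If you want to confirm which JVM a node is actually running, and whether the class named in the overlord's `NoClassDefFoundError` is visible to it, a small standalone check like the one below may help. It is a hypothetical diagnostic, not part of Druid; the class name `JvmClasspathCheck` is made up, and it only reports on the classpath you start it with, which is an assumption you have to match to the overlord's own.

```java
/**
 * Hypothetical diagnostic, not part of Druid: reports which JVM is running and
 * whether the class named in the overlord's NoClassDefFoundError can be loaded
 * from the classpath this program was started with.
 */
public class JvmClasspathCheck {

    /** Returns true if the class can be loaded and initialized; false otherwise. */
    static boolean isClassLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError and ExceptionInInitializerError.
            return false;
        }
    }

    public static void main(String[] args) {
        // Compare with the "1.8.0_282" runtime shown in the stack traces.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vendor  = " + System.getProperty("java.vendor"));
        System.out.println("java.home    = " + System.getProperty("java.home"));

        String mysqlClass = "com.mysql.jdbc.exceptions.MySQLTransientException";
        System.out.println(mysqlClass + " loadable: " + isClassLoadable(mysqlClass));
    }
}
```

Run it on the overlord host with the same `java` binary the overlord uses and a classpath that mirrors the overlord's (assumed here to be Druid's `lib/` directory plus `extensions/mysql-metadata-storage/`; adjust to your layout). A `false` result suggests the MySQL connector classes are not visible to that JVM.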