Unstable Overlord after Druid upgrade to version 0.10.1

Hello,
I upgraded Druid to version 0.10.1.
I followed the rolling-updates procedure and used the rolling restart (graceful-termination-based) strategy for the MiddleManager upgrade.
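For reference, the graceful-termination step per MiddleManager looked roughly like this; hosts and ports are placeholders, and the commands are a sketch of the documented worker disable/enable flow rather than our exact playbook:

```
# Stop the MiddleManager from accepting new tasks
curl -s -X POST http://{{mm-fqdn}}:{{mm-port}}/druid/worker/v1/disable

# Poll until the worker reports no running tasks, then upgrade and restart it
curl -s http://{{mm-fqdn}}:{{mm-port}}/druid/worker/v1/tasks

# Re-enable the worker after the restart
curl -s -X POST http://{{mm-fqdn}}:{{mm-port}}/druid/worker/v1/enable
```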

Unfortunately, after the upgrade the Overlord logs the following WARNINGs and ERRORs:

  1. `Exception while sending request`

```
2017-09-06 11:07:30,597 WARN i.d.i.k.KafkaIndexTaskClient [KafkaIndexTaskClient-prod_sample_1_events-0] Exception while sending request
io.druid.java.util.common.IAE: Received 400 Bad Request with body: Task must be paused before changing the end offsets
at io.druid.indexing.kafka.KafkaIndexTaskClient.submitRequest(KafkaIndexTaskClient.java:515) [druid-kafka-indexing-service-0.10.1.jar:0.10.1]
at io.druid.indexing.kafka.KafkaIndexTaskClient.setEndOffsets(KafkaIndexTaskClient.java:293) [druid-kafka-indexing-service-0.10.1.jar:0.10.1]
at io.druid.indexing.kafka.KafkaIndexTaskClient$11.call(KafkaIndexTaskClient.java:429) [druid-kafka-indexing-service-0.10.1.jar:0.10.1]
at io.druid.indexing.kafka.KafkaIndexTaskClient$11.call(KafkaIndexTaskClient.java:425) [druid-kafka-indexing-service-0.10.1.jar:0.10.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]

```

  2. `Exception while getting current/latest offsets`

```
2017-09-06 11:28:43,877 WARN i.d.i.k.s.KafkaSupervisor [KafkaSupervisor-prod_sample_1_events-Reporting-0] Exception while getting current/latest offsets
java.util.concurrent.TimeoutException: Timeout waiting for task.
    at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:269) ~[guava-16.0.1.jar:?]
    at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96) ~[guava-16.0.1.jar:?]
    at io.druid.indexing.kafka.supervisor.KafkaSupervisor.updateCurrentOffsets(KafkaSupervisor.java:1802) ~[?:?]
    at io.druid.indexing.kafka.supervisor.KafkaSupervisor.lambda$updateCurrentAndLatestOffsets$12(KafkaSupervisor.java:1810) ~[?:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_141]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_141]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_141]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_141]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
```

  3. `Lag metric: Kafka partitions [...] do not match task partitions [...]` occurs frequently

```
2017-09-06 11:25:31,410 WARN i.d.i.k.s.KafkaSupervisor [KafkaSupervisor-prod_sample_1_events-Reporting-0] Lag metric: Kafka partitions [0, 1, 2, 3, 4, 5, 6, 7] do not match task partitions [1, 2, 3, 4, 6]
```

The Overlord hangs, and while trying to restart it there are further problems:

`Cannot get a connection, general error`:

```
2017-09-06 11:44:10,741 WARN i.d.j.u.c.RetryUtils [qtp1373220972-115] Failed on try 1, retrying in 720ms.
org.skife.jdbi.v2.exceptions.UnableToObtainConnectionException: java.sql.SQLException: Cannot get a connection, general error
    at org.skife.jdbi.v2.DBI.open(DBI.java:230) ~[jdbi-2.63.1.jar:2.63.1]
    at org.skife.jdbi.v2.DBI.withHandle(DBI.java:279) ~[jdbi-2.63.1.jar:2.63.1]
    at io.druid.metadata.SQLMetadataConnector$2.call(SQLMetadataConnector.java:129) ~[druid-server-0.10.1.jar:0.10.1]
    at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:63) [java-util-0.10.1.jar:0.10.1]
    at io.druid.java.util.common.RetryUtils.retry(RetryUtils.java:81) [java-util-0.10.1.jar:0.10.1]
    at io.druid.metadata.SQLMetadataConnector.retryWithHandle(SQLMetadataConnector.java:133) [druid-server-0.10.1.jar:0.10.1]
    at io.druid.metadata.SQLMetadataConnector.retryWithHandle(SQLMetadataConnector.java:142) [druid-server-0.10.1.jar:0.10.1]
    at io.druid.metadata.IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForIntervals(IndexerSQLMetadataStorageCoordinator.java:127) [druid-server-0.10.1.jar:0.10.1]
    at io.druid.metadata.IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval(IndexerSQLMetadataStorageCoordinator.java:119) [druid-server-0.10.1.jar:0.10.1]
    at io.druid.indexing.common.actions.SegmentAllocateAction.perform(SegmentAllocateAction.java:156) [druid-indexing-service-0.10.1.jar:0.10.1]
    at io.druid.indexing.common.actions.SegmentAllocateAction.perform(SegmentAllocateAction.java:54) [druid-indexing-service-0.10.1.jar:0.10.1]
    at io.druid.indexing.common.actions.LocalTaskActionClient.submit(LocalTaskActionClient.java:64) [druid-indexing-service-0.10.1.jar:0.10.1]
    at io.druid.indexing.overlord.http.OverlordResource$3.apply(OverlordResource.java:341) [druid-indexing-service-0.10.1.jar:0.10.1]
    at io.druid.indexing.overlord.http.OverlordResource$3.apply(OverlordResource.java:330) [druid-indexing-service-0.10.1.jar:0.10.1]
    at io.druid.indexing.overlord.http.OverlordResource.asLeaderWith(OverlordResource.java:673) [druid-indexing-service-0.10.1.jar:0.10.1]
    at io.druid.indexing.overlord.http.OverlordResource.doAction(OverlordResource.java:327) [druid-indexing-service-0.10.1.jar:0.10.1]
    at sun.reflect.GeneratedMethodAccessor55.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_141]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_141]
    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) [jersey-server-1.19.3.jar:1.19.3]
    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) [jersey-server-1.19.3.jar:1.19.3]
    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) [jersey-server-1.19.3.jar:1.19.3]
    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) [jersey-server-1.19.3.jar:1.19.3]
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) [jersey-server-1.19.3.jar:1.19.3]
    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) [jersey-server-1.19.3.jar:1.19.3]
    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) [jersey-server-1.19.3.jar:1.19.3]
    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) [jersey-server-1.19.3.jar:1.19.3]
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) [jersey-server-1.19.3.jar:1.19.3]
    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) [jersey-server-1.19.3.jar:1.19.3]
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) [jersey-server-1.19.3.jar:1.19.3]
    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) [jersey-server-1.19.3.jar:1.19.3]
    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) [jersey-servlet-1.19.3.jar:1.19.3]
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) [jersey-servlet-1.19.3.jar:1.19.3]
    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) [jersey-servlet-1.19.3.jar:1.19.3]
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]
    at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:286) [guice-servlet-4.1.0.jar:?]
    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:276) [guice-servlet-4.1.0.jar:?]
    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:181) [guice-servlet-4.1.0.jar:?]
    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) [guice-servlet-4.1.0.jar:?]
    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) [guice-servlet-4.1.0.jar:?]
    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120) [guice-servlet-4.1.0.jar:?]
    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:135) [guice-servlet-4.1.0.jar:?]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) [jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at io.druid.server.http.RedirectFilter.doFilter(RedirectFilter.java:72) [druid-server-0.10.1.jar:0.10.1]
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759) [jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) [jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:224) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) [jetty-servlet-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.Server.handle(Server.java:534) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) [jetty-server-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) [jetty-io-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) [jetty-io-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) [jetty-io-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) [jetty-util-9.3.19.v20170502.jar:9.3.19.v20170502]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
Caused by: java.sql.SQLException: Cannot get a connection, general error
    at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:131) ~[commons-dbcp2-2.0.1.jar:2.0.1]
    at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:1413) ~[commons-dbcp2-2.0.1.jar:2.0.1]
    at org.skife.jdbi.v2.DataSourceConnectionFactory.openConnection(DataSourceConnectionFactory.java:36) ~[jdbi-2.63.1.jar:2.63.1]
    at org.skife.jdbi.v2.DBI.open(DBI.java:212) ~[jdbi-2.63.1.jar:2.63.1]
    ... 63 more
Caused by: java.lang.InterruptedException
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014) ~[?:1.8.0_141]
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2048) ~[?:1.8.0_141]
    at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:524) ~[commons-pool2-2.2.jar:2.2]
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:438) ~[commons-pool2-2.2.jar:2.2]
    at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:361) ~[commons-pool2-2.2.jar:2.2]
    at org.apache.commons.dbcp2.PoolingDataSource.getConnection(PoolingDataSource.java:119) ~[commons-dbcp2-2.0.1.jar:2.0.1]
    at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:1413) ~[commons-dbcp2-2.0.1.jar:2.0.1]
    at org.skife.jdbi.v2.DataSourceConnectionFactory.openConnection(DataSourceConnectionFactory.java:36) ~[jdbi-2.63.1.jar:2.63.1]
    at org.skife.jdbi.v2.DBI.open(DBI.java:212) ~[jdbi-2.63.1.jar:2.63.1]
    ... 63 more

```

`Couldn't stop Thread` and `Failed to get task runner`:

```
2017-09-06 11:44:25,739 WARN o.e.j.u.t.QueuedThreadPool [Thread-41] qtp1373220972{STOPPING,40<=40<=40,i=0,q=36} Couldn't stop Thread[qtp1373220972-102,5,main]
2017-09-06 11:44:25,739 WARN o.e.j.u.t.QueuedThreadPool [Thread-41] qtp1373220972{STOPPING,40<=40<=40,i=0,q=36} Couldn't stop Thread[qtp1373220972-84,5,main]
2017-09-06 11:44:25,739 WARN o.e.j.u.t.QueuedThreadPool [Thread-41] qtp1373220972{STOPPING,40<=40<=40,i=0,q=36} Couldn't stop Thread[qtp1373220972-91,5,main]

2017-09-06 11:51:03,706 ERROR i.d.i.k.s.KafkaSupervisor [KafkaIndexTaskClient-prod_sample_1_events-0] Failed to get task runner because I'm not the leader!

```

overlord.runtime.properties:

```
druid.host={{overlord-fqdn}}
druid.port={{overlord-port}}
druid.service=druid/overlord

# Upload all task logs to deep storage
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket={{druid-s3-bucket-name}}
druid.indexer.logs.s3Prefix=prod/logs/v1

# Run in remote mode
druid.indexer.runner.type=remote
druid.indexer.runner.minWorkerVersion=#{WORKER_VERSION}

# Store all task state in the metadata storage
druid.indexer.storage.type=metadata

druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","io.druid.client.cache.CacheMonitor","io.druid.server.metrics.EventReceiverFirehoseMonitor","com.metamx.metrics.JvmMonitor"]
```

middleManager.runtime.properties:

```
druid.host={{mm-fqdn}}
druid.port={{mm-port}}
druid.service=druid/middlemanager
druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","io.druid.client.cache.CacheMonitor","io.druid.server.metrics.EventReceiverFirehoseMonitor","com.metamx.metrics.JvmMonitor"]

# Resources for peons
druid.indexer.runner.javaOpts=-server -Xmx6g -XX:MaxDirectMemorySize=4g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -Dcom.amazonaws.services.s3.enableV4
druid.indexer.task.baseTaskDir=/persistent/task/

# Store task logs in deep storage --> please refer to runtime.properties in the _common dir
druid.indexer.logs.type=s3
druid.indexer.logs.s3Bucket=s3-deep-storage
druid.indexer.logs.s3Prefix=prod/logs/v1

# Peon properties
druid.indexer.fork.property.druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=1073741824
druid.indexer.fork.property.druid.processing.numThreads=2
druid.indexer.fork.property.druid.processing.numMergeBuffers=1
#druid.indexer.fork.property.druid.segmentCache.locations=[{"path": "/persistent/zk_druid", "maxSize": 0}]
druid.indexer.fork.property.druid.server.http.numThreads=100
druid.indexer.fork.property.druid.storage.archiveBaseKey=prod
druid.indexer.fork.property.druid.storage.archiveBucket={{druid-s3-bucket-name}}-archive
druid.indexer.fork.property.druid.storage.baseKey=prod/v1
druid.indexer.fork.property.druid.storage.bucket={{druid-s3-bucket-name}}
druid.indexer.fork.property.druid.storage.type=s3

druid.worker.capacity=11
druid.worker.ip=localhost
druid.worker.version=0
```

The only way to stop it is `kill -9 <overlord_pid>`.

Our configs (overlord.runtime.properties and middleManager.runtime.properties) are shown above.

Any idea?

Regards,
Jan

It does seem you have an issue with SQL metadata storage access. Make sure that your Overlord points at the right SQL metadata storage for 0.10.1 and has the correct credentials.

You can also manually reset the offsets to force reading from the latest ones, but that can lead to loss of data: https://github.com/druid-io/druid/pull/3678/files
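A sketch of what that reset looks like through the supervisor API; the supervisor name is taken from the logs above and the host/port are placeholders. Note that this discards the offsets stored in the metadata store, so tasks may skip or re-read data:

```
# Hard-reset the supervisor's stored Kafka offsets; it resumes from
# earliest/latest according to useEarliestOffset
curl -s -X POST http://{{overlord-fqdn}}:{{overlord-port}}/druid/indexer/v1/supervisor/prod_sample_1_events/reset
```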

Hi Slim,

there is no problem with SQL access, otherwise ALL of the KIS tasks would fail (we have an Overlord task success ratio of 63.3% at the moment).

Sample SQL-access-related logs from the Overlord:

```
INFO i.d.m.IndexerSQLMetadataStorageCoordinator [qtp1373220972-92] Updated metadata
INFO i.d.m.IndexerSQLMetadataStorageCoordinator [qtp1373220972-92] Published segment
INFO i.d.m.IndexerSQLMetadataStorageCoordinator [qtp1373220972-90] Allocating pending segment
INFO i.d.m.IndexerSQLMetadataStorageCoordinator [qtp1373220972-90] Found existing pending segment
```

RDS: max_allowed_packet = 1024*1024*16
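For completeness, a quick way to double-check the value actually in effect on the RDS instance (endpoint and credentials are placeholders):

```
mysql -h {{rds-endpoint}} -u druid -p -e "SHOW VARIABLES LIKE 'max_allowed_packet';"
```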

Regards,
Jan

Can you check the offset information in the metadata store?
Please check with the following query:

`select * from druid_dataSource where dataSource='datasource_id';`
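A possible way to run that check and pretty-print the stored Kafka offsets; the endpoint, credentials and dataSource name are placeholders, and the commit_metadata_payload column name assumes a default 0.10.x metadata schema:

```
# Dump the commit metadata (which contains the last committed Kafka offsets)
mysql -h {{rds-endpoint}} -u druid -p druid --batch --raw --skip-column-names \
  -e "SELECT commit_metadata_payload FROM druid_dataSource WHERE dataSource='prod_sample_1_events';" \
  | python -m json.tool
```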

Sincerely,

Yousun Jeong

Hey Jan,

It seems like something has really slowed down. I think the “SQLException: Cannot get a connection, general error” messages you’re getting mean that there are no free connections in the connection pool for too long, in turn probably because the ones that are currently in use are taking a long time to finish.

I would suggest first looking to see if your overlord is stuck because it's GCing a lot. If not, then I'd check whether your metadata store is healthy (can you query it directly? does it have any slow queries in its query log?).
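Rough ways to check both of those; the Overlord PID and the metadata-store endpoint are placeholders:

```
# 1) Watch GC activity on the Overlord JVM; rapidly growing FGC/FGCT suggests it is
#    spending most of its time in garbage collection
jstat -gcutil <overlord_pid> 5000

# 2) See whether the metadata store is recording slow queries
mysql -h {{rds-endpoint}} -u druid -p \
  -e "SHOW VARIABLES LIKE 'slow_query_log%'; SHOW GLOBAL STATUS LIKE 'Slow_queries';"
```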

Hey Gian,

The `SQLException: Cannot get a connection, general error` appears only while I am trying to restart the unresponsive Overlord, whose process eventually has to be killed with `kill -9`.
I did some research regarding the system behavior and I see the following patterns:

  1. I decreased druid.indexer.fork.property.druid.server.http.numThreads from 100 -> 50.
  2. I added more MiddleManager servers (6-core Xeon) -> increased the peon count from 44 to 55 = (12 threads - 1) per server * 5 servers.
  3. I found that the Overlord assigns more peons than it is configured to:
    for example, for a dataSource configured with "taskCount": 8 the Overlord assigned 24 (!) tasks (as I understand it, some of them get stuck and never finish),
    and there are more 'paused successfully' tasks than 'completed successfully' ones:
    grep 'completed successfully' /var/log/druid/overlord.log | wc -l
    11
    grep 'paused successfully' /var/log/druid/overlord.log | wc -l
    54

    When the Overlord is about to hang you can see `Connection refused` for many dataSources (log line and a couple of quick checks below):

```
2017-09-08 20:20:58,300 INFO i.d.i.k.KafkaIndexTaskClient [KafkaIndexTaskClient-prod_sample_1_events-5] submitRequest failed for [http://druid-mm02:8104/druid/worker/v1/chat/index_kafka_prod_sample_1_events_068972c80c54c05_himhgfpp/offsets/current], with message [Connection refused (Connection refused)]

```
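Quick checks to tell whether the peon is really gone or just its HTTP chat handler is unreachable; the host, port and task ID are copied from the log line above, the Overlord host/port are placeholders:

```
# Is anything still listening on the peon port?
ss -ltnp | grep 8104

# The same call KafkaIndexTaskClient makes
curl -s http://druid-mm02:8104/druid/worker/v1/chat/index_kafka_prod_sample_1_events_068972c80c54c05_himhgfpp/offsets/current

# Task status as seen by the Overlord
curl -s http://{{overlord-fqdn}}:{{overlord-port}}/druid/indexer/v1/task/index_kafka_prod_sample_1_events_068972c80c54c05_himhgfpp/status
```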

  4. I checked the number of open files (command, output and a quick limit check below):

```
ansible druid-mm -a "lsof | grep java | wc -l" -s

druid-mm05 | SUCCESS | rc=0 >>
1370934

druid-mm03 | SUCCESS | rc=0 >>
1731782

druid-mm02 | SUCCESS | rc=0 >>
1727398

druid-mm01 | SUCCESS | rc=0 >>
1715144

druid-mm04 | SUCCESS | rc=0 >>
1739617

```
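Given those counts, it may also be worth confirming the file-descriptor limits the MiddleManager and peon processes actually run with (the PID is a placeholder):

```
# Soft and hard open-file limits of the current shell
ulimit -Sn; ulimit -Hn

# Limits applied to a running MiddleManager/peon process
grep 'open files' /proc/<middlemanager_pid>/limits
```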

  5. I checked MySQL:
    I discovered (on the druid-user group) that druid_pendingSegments has to be cleaned up manually, and that a count of 163280 ids should not take almost 2 minutes (a sketch of the suggested cleanup is below):

```
MySQL [druid]> select count(id) from druid_pendingSegments;
```
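A heavily hedged sketch of that manual cleanup; the cutoff date is only an example, and the created_date column name assumes a default 0.10.x metadata schema. Back up the table and stop ingestion for the affected dataSources before deleting anything:

```
mysql -h {{rds-endpoint}} -u druid -p druid -e "
  SELECT COUNT(*) FROM druid_pendingSegments WHERE created_date < '2017-09-01';
  DELETE FROM druid_pendingSegments WHERE created_date < '2017-09-01';"
```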

Hi Yousun/Jerry,

Yes, I checked the query you suggested, and everything is OK.

I also used the Overlord REST API to obtain status info about the KIS dataSource tasks, and found out that there is no problem there at all; the Overlord knows exactly what the latestOffsets are :)

```
"latestOffsets": {
  "0": 306648700,
  "1": 304854330,
  "2": 308518602,
  "3": 298993034,
  "4": 305214569,
  "5": 311090025,
  "6": 315678079,
  "7": 307856233
},

```
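For completeness, this is roughly the call that returns the latestOffsets shown above; the supervisor name comes from the logs and the host/port are placeholders:

```
# Supervisor status report, which includes latestOffsets per Kafka partition
curl -s http://{{overlord-fqdn}}:{{overlord-port}}/druid/indexer/v1/supervisor/prod_sample_1_events/status
```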

While observing the system I found that everything is fine until... the query of DEATH TO THE OVERLORD appears on the Brokers :)
The Overlord dies only when certain queries for the "suspected" dataSource appear. Does that make any sense?
Is it possible that an incorrect KIS configuration for a specific dataSource can lead to the Overlord's death?

Regards,
Jan