New tasks failing due to a few older tasks that never completed

Hi,

I am using Tranquility 0.4.2 and Druid 0.7.3. I observed a scenario where a few of my new tasks were marked as failed due to timeouts.

After checking the logs, my theory is the following:

  1. A few older tasks, e.g. index_realtime_supply_2016-02-28T19:00:00.000Z_0_0, lost their connection to ZooKeeper.
  2. The Overlord finds each such task to have disappeared and marks it as failed, but the task process is actually still running.
  3. In the middle manager, the WorkerTaskMonitor's exec thread pool size is the same as the worker capacity. At this point the Overlord has marked the task as failed, but the middle manager's WorkerTaskMonitor hasn't cancelled it yet.
  4. Now, when a new task, e.g. index_realtime_demand_2016-03-11T03:00:00.000Z_0_0, gets assigned to this worker, it is submitted but doesn't start running immediately, because the previous tasks were never cancelled and the executor has no free thread to run it (selectStrategy was set to equalDistribution).
  5. The Overlord kills the new task because it didn't start before the configured timeout (PT5M).
  6. My worker capacity is 6. I became fairly convinced of this scenario when I saw from the logs that the new tasks were being run only by WorkerTaskMonitor-0, WorkerTaskMonitor-1, WorkerTaskMonitor-3 and WorkerTaskMonitor-5. The last log lines for WorkerTaskMonitor-2 and WorkerTaskMonitor-4 were for the very old tasks that the Overlord had marked as failed but whose processes are still running.
     Please let me know if my understanding is correct. If so, how do I avoid such failures? (The settings I believe are relevant are sketched below.)
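
For reference, a minimal sketch of the settings I believe are involved (property names as I understand them from the docs; please correct me if they differ in 0.7.3):

# Overlord runtime.properties: how long the Overlord waits for a worker to actually
# start an assigned task before marking it FAILED. The default PT5M matches the
# "Timeout: (300000 >= PT5M)" line in the logs below.
druid.indexer.runner.taskAssignmentTimeout=PT15M

# Middle manager runtime.properties: number of task slots. Per point 3 above, the
# WorkerTaskMonitor exec pool is sized to this, so a slot held by a zombie task
# blocks newly assigned tasks from starting.
druid.worker.capacity=6

# ZooKeeper/Curator session timeout (default 30000 ms); a ZK outage longer than
# this is presumably what made the Overlord think the still-running task disappeared.
druid.zk.service.sessionTimeoutMs=30000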

Logs for the corresponding tasks are below.

Overlord logs for the new task index_realtime_demand_2016-03-11T03:00:00.000Z_0_0, which was killed because it didn't start before the timeout:

2016-03-11T03:00:02,076 INFO [qtp1252138909-161] io.druid.indexing.overlord.HeapMemoryTaskStorage - Inserting task index_realtime_demand_2016-03-11T03:00:00.000Z_0_0 with status: TaskStatus{id=index_realtime_demand_2016-03-11T03:00:00.000Z_0_0, status=RUNNING, duration=-1}

2016-03-11T03:00:02,076 INFO [TaskQueue-Manager] io.druid.indexing.overlord.TaskQueue - Asking taskRunner to run: index_realtime_demand_2016-03-11T03:00:00.000Z_0_0

2016-03-11T03:00:02,076 INFO [TaskQueue-Manager] io.druid.indexing.overlord.RemoteTaskRunner - Added pending task index_realtime_demand_2016-03-11T03:00:00.000Z_0_0

2016-03-11T03:00:02,077 INFO [pool-18-thread-1] io.druid.indexing.overlord.RemoteTaskRunner - Coordinator asking Worker[10.72.22.102:8086] to add task[index_realtime_demand_2016-03-11T03:00:00.000Z_0_0]

2016-03-11T03:00:02,079 INFO [pool-18-thread-1] io.druid.indexing.overlord.RemoteTaskRunner - Task index_realtime_demand_2016-03-11T03:00:00.000Z_0_0 switched from pending to running (on [10.72.22.102:8086])

2016-03-11T03:05:02,079 ERROR [pool-18-thread-1] io.druid.indexing.overlord.RemoteTaskRunner - Something went wrong! [10.72.22.102:8086] never ran task [index_realtime_demand_2016-03-11T03:00:00.000Z_0_0]! Timeout: (300000 >= PT5M)!

2016-03-11T03:05:02,079 INFO [pool-18-thread-1] io.druid.indexing.overlord.RemoteTaskRunner - Worker[10.72.22.102:8086] completed task[index_realtime_demand_2016-03-11T03:00:00.000Z_0_0] with status[FAILED]

2016-03-11T03:05:02,081 INFO [pool-18-thread-1] io.druid.indexing.overlord.TaskQueue - Received FAILED status for task: index_realtime_demand_2016-03-11T03:00:00.000Z_0_0

2016-03-11T03:05:02,081 ERROR [pool-18-thread-1] io.druid.indexing.overlord.RemoteTaskRunner - WTF?! Asked to cleanup nonexistent task: {class=io.druid.indexing.overlord.RemoteTaskRunner, taskId=index_realtime_demand_2016-03-11T03:00:00.000Z_0_0}

2016-03-11T03:05:02,081 INFO [pool-18-thread-1] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"alerts","timestamp":"2016-03-11T03:05:02.081Z","service":"overlord","host":"10.72.14.10:8090","severity":"component-failure","description":"WTF?! Asked to cleanup nonexistent task","data":{"class":"io.druid.indexing.overlord.RemoteTaskRunner","taskId":"index_realtime_demand_2016-03-11T03:00:00.000Z_0_0"}}]

2016-03-11T03:05:02,081 INFO [pool-18-thread-1] io.druid.indexing.overlord.HeapMemoryTaskStorage - Updating task index_realtime_demand_2016-03-11T03:00:00.000Z_0_0 to status: TaskStatus{id=index_realtime_demand_2016-03-11T03:00:00.000Z_0_0, status=FAILED, duration=-1}

2016-03-11T03:05:02,081 INFO [pool-18-thread-1] io.druid.indexing.overlord.TaskQueue - Task done: RealtimeIndexTask{id=index_realtime_demand_2016-03-11T03:00:00.000Z_0_0, type=index_realtime, dataSource=demand}

2016-03-11T03:05:02,081 INFO [pool-18-thread-1] io.druid.indexing.overlord.TaskQueue - Task FAILED: RealtimeIndexTask{id=index_realtime_demand_2016-03-11T03:00:00.000Z_0_0, type=index_realtime, dataSource=demand} (-1 run duration)

2016-03-11T04:06:20,373 INFO [PathChildrenCache-2] io.druid.indexing.overlord.RemoteTaskRunner - Worker[10.72.22.102:8086] wrote RUNNING status for task: index_realtime_demand_2016-03-11T03:00:00.000Z_0_0

2016-03-11T04:06:20,373 WARN [PathChildrenCache-2] io.druid.indexing.overlord.RemoteTaskRunner - Worker[10.72.22.102:8086] announced a status for a task I didn’t know about, adding to runningTasks: index_realtime_demand_2016-03-11T03:00:00.000Z_0_0

2016-03-11T04:06:30,641 INFO [qtp1252138909-156] io.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[index_realtime_demand_2016-03-11T03:00:00.000Z_0_0]: LockListAction{}

2016-03-11T04:07:19,236 INFO [TaskQueue-Manager] io.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker: 10.72.22.102:8086, status 200 OK, response: {"task":"index_realtime_demand_2016-03-11T03:00:00.000Z_0_0"}

2016-03-11T04:07:19,238 ERROR [TaskQueue-Manager] io.druid.indexing.overlord.RemoteTaskRunner - Shutdown failed for index_realtime_demand_2016-03-11T03:00:00.000Z_0_0! Are you sure the task was running?

2016-03-11T04:07:19,244 INFO [PathChildrenCache-2] io.druid.indexing.overlord.RemoteTaskRunner - Worker[10.72.22.102:8086] wrote FAILED status for task: index_realtime_demand_2016-03-11T03:00:00.000Z_0_0

2016-03-11T04:07:19,244 INFO [PathChildrenCache-2] io.druid.indexing.overlord.RemoteTaskRunner - Worker[10.72.22.102:8086] completed task[index_realtime_demand_2016-03-11T03:00:00.000Z_0_0] with status[FAILED]

2016-03-11T04:08:19,226 INFO [TaskQueue-Manager] io.druid.indexing.overlord.RemoteTaskRunner - Cleaning up task[index_realtime_demand_2016-03-11T03:00:00.000Z_0_0] on worker[10.72.22.102:8086]

2016-03-11T04:08:19,227 INFO [PathChildrenCache-2] io.druid.indexing.overlord.RemoteTaskRunner - Task[index_realtime_demand_2016-03-11T03:00:00.000Z_0_0] went bye bye.

Corresponding middle manager logs for the task index_realtime_demand_2016-03-11T03:00:00.000Z_0_0, which started running about an hour after submission:

2016-03-11T03:00:02,066 INFO [TaskMonitorCache-0] io.druid.indexing.worker.WorkerTaskMonitor - Submitting runnable for task[index_realtime_demand_2016-03-11T03:00:00.000Z_0_0]

2016-03-11T04:06:20,349 INFO [WorkerTaskMonitor-5] io.druid.indexing.worker.WorkerTaskMonitor - Affirmative. Running task [index_realtime_demand_2016-03-11T03:00:00.000Z_0_0]

2016-03-11T04:06:20,354 INFO [pool-5-thread-6] io.druid.indexing.overlord.ForkingTaskRunner - Running command: java -cp /opt/druid/current/lib/derbynet-10.11.1.1.jar:/opt/druid/curr … -03-11T03:00:00.000Z_0_0/132a7c32-2b38-480b-b303-39e64fbe40b5/task.json /tmp/persistent/task/index_realtime_demand_2016-03-11T03:00:00.000Z_0_0/132a7c32-2b38-480b-b303-39e64fbe40b5/status.json --nodeType realtime

2016-03-11T04:06:20,358 INFO [pool-5-thread-6] io.druid.indexing.overlord.ForkingTaskRunner - Logging task index_realtime_demand_2016-03-11T03:00:00.000Z_0_0 output to: /tmp/persistent/task/index_realtime_demand_2016-03-11T03:00:00.000Z_0_0/132a7c32-2b38-480b-b303-39e64fbe40b5/log

2016-03-11T04:07:19,213 INFO [qtp177657196-67] io.druid.indexing.overlord.ForkingTaskRunner - Killing process for task: index_realtime_demand_2016-03-11T03:00:00.000Z_0_0

2016-03-11T04:07:19,216 INFO [pool-5-thread-6] io.druid.indexing.common.tasklogs.FileTaskLogs - Wrote task log to: /var/log/druid/index_realtime_demand_2016-03-11T03:00:00.000Z_0_0.log

2016-03-11T04:07:19,218 INFO [pool-5-thread-6] io.druid.indexing.overlord.ForkingTaskRunner - Removing temporary directory: /tmp/persistent/task/index_realtime_demand_2016-03-11T03:00:00.000Z_0_0/132a7c32-2b38-480b-b303-39e64fbe40b5

2016-03-11T04:07:19,220 ERROR [WorkerTaskMonitor-5] io.druid.indexing.worker.WorkerTaskMonitor - I can’t build there. Failed to run task: {class=io.druid.indexing.worker.WorkerTaskMonitor, exceptionType=class java.util.concurrent.ExecutionException, exceptionMessage=java.lang.RuntimeException: java.io.IOException: Stream closed, task=index_realtime_demand_2016-03-11T03:00:00.000Z_0_0}

2016-03-11T04:07:19,222 INFO [WorkerTaskMonitor-5] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"alerts","timestamp":"2016-03-11T04:07:19.221Z","service":"middleManager","host":"10.72.22.102:8086","severity":"component-failure","description":"I can't build there. Failed to run task","data":{"class":"io.druid.indexing.worker.WorkerTaskMonitor","exceptionType":"java.util.concurrent.ExecutionException","exceptionMessage":"java.lang.RuntimeException: java.io.IOException: Stream closed","exceptionStackTrace":"java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.io.IOException: Stream closed\n\tat java.util.concurrent.FutureTask.report(FutureTask.java:122)\n\tat java.util.concurrent.FutureTask.get(FutureTask.java:192)\n\tat io.druid.indexing.worker.WorkerTaskMonitor$1$1.run(WorkerTaskMonitor.java:131)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)\n\tat java.lang.Thread.run(Thread.java:745)\nCaused by: java.lang.RuntimeException: java.io.IOException: Stream closed\n\tat com.google.common.base.Throwables.propagate(Throwables.java:160)\n\tat io.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:262)\n\tat io.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:118)\n\t… 4 more\nCaused by: java.io.IOException: Stream closed\n\tat java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)\n\tat java.io.BufferedInputStream.read1(BufferedInputStream.java:291)\n\tat java.io.BufferedInputStream.read(BufferedInputStream.java:345)\n\tat java.io.FilterInputStream.read(FilterInputStream.java:107)\n\tat com.google.common.io.ByteStreams.copy(ByteStreams.java:175)\n\tat io.druid.indexing.overlord.ForkingTaskRunner$1.call(ForkingTaskRunner.java:235)\n\t… 5 more\n","task":"index_realtime_demand_2016-03-11T03:00:00.000Z_0_0"}}]

2016-03-11T04:07:19,224 INFO [WorkerTaskMonitor-5] io.druid.indexing.worker.WorkerTaskMonitor - Job’s finished. Completed [index_realtime_demand_2016-03-11T03:00:00.000Z_0_0] with status [FAILED]

Logs for an old task, index_realtime_supply_2016-02-28T19:00:00.000Z_0_0, whose process is still running:

2016-02-28T20:03:14,806 INFO [main-SendThread(erdr4003.grid.lhr1.inmobi.com:2181)] org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x527e467d4606f6, likely server has closed socket, closing socket connection and attempting reconnect

2016-02-28T20:03:29,004 ERROR [CuratorFramework-0] org.apache.curator.ConnectionState - Connection timed out for connection string (erdr4001.grid.lhr1.inmobi.com:2181,erdr4002.grid.lhr1.inmobi.com:2181,erdr4003.grid.lhr1.inmobi.com:2181) and timeout (15000) / elapsed (15002)

org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss

at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:197) [curator-client-2.7.0.jar:?]

at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:87) [curator-client-2.7.0.jar:?]

at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [curator-client-2.7.0.jar:?]

at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:816) [curator-framework-2.7.0.jar:?]

at org.apache.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:802) [curator-framework-2.7.0.jar:?]

at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$400(CuratorFrameworkImpl.java:61) [curator-framework-2.7.0.jar:?]

at org.apache.curator.framework.imps.CuratorFrameworkImpl$4.call(CuratorFrameworkImpl.java:272) [curator-framework-2.7.0.jar:?]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_60]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_60]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_60]

at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]

2016-02-28T20:03:29,016 ERROR [CuratorFramework-0] org.apache.curator.ConnectionState - Connection timed out for connection string (erdr4001.grid.lhr1.inmobi.com:2181,erdr4002.grid.lhr1.inmobi.com:2181,erdr4003.grid.lhr1.inmobi.com:2181) and timeout (15000) / elapsed (15023)

org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss

2016-03-11T08:56:14,457 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-03-11T08:56:14.457Z","service":"middleManager","host":"10.72.22.102:8101","metric":"rows/output","value":0,"user2":"supply"}]

2016-03-11T08:56:14,457 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-03-11T08:56:14.457Z","service":"middleManager","host":"10.72.22.102:8101","metric":"persists/num","value":0,"user2":"supply"}]

2016-03-11T08:56:14,457 INFO [MonitorScheduler-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-03-11T08:56:14.457Z","service":"middleManager","host":"10.72.22.102:8101","metric":"persists/time","value":0,"user2":"supply"}]

Overlord logs for the corresponding task index_realtime_supply_2016-02-28T19:00:00.000Z_0_0:

2016-02-28T19:00:02,643 INFO [qtp1252138909-147] io.druid.indexing.overlord.HeapMemoryTaskStorage - Inserting task index_realtime_supply_2016-02-28T19:00:00.000Z_0_0 with status: TaskStatus{id=index_realtime_supply_2016-02-28T19:00:00.000Z_0_0, status=RUNNING, duration=-1}

2016-02-28T19:00:02,643 INFO [TaskQueue-Manager] io.druid.indexing.overlord.TaskQueue - Asking taskRunner to run: index_realtime_supply_2016-02-28T19:00:00.000Z_0_0

2016-02-28T19:00:02,643 INFO [TaskQueue-Manager] io.druid.indexing.overlord.RemoteTaskRunner - Added pending task index_realtime_supply_2016-02-28T19:00:00.000Z_0_0

2016-02-28T19:00:02,644 INFO [pool-15-thread-1] io.druid.indexing.overlord.RemoteTaskRunner - Coordinator asking Worker[10.72.22.102:8086] to add task[index_realtime_supply_2016-02-28T19:00:00.000Z_0_0]

2016-02-28T19:00:02,646 INFO [pool-15-thread-1] io.druid.indexing.overlord.RemoteTaskRunner - Task index_realtime_supply_2016-02-28T19:00:00.000Z_0_0 switched from pending to running (on [10.72.22.102:8086])

2016-02-28T19:00:02,654 INFO [PathChildrenCache-1] io.druid.indexing.overlord.RemoteTaskRunner - Worker[10.72.22.102:8086] wrote RUNNING status for task: index_realtime_supply_2016-02-28T19:00:00.000Z_0_0

2016-02-28T19:00:12,400 INFO [qtp1252138909-143] io.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[index_realtime_supply_2016-02-28T19:00:00.000Z_0_0]: LockListAction{}

2016-02-28T19:00:13,597 INFO [qtp1252138909-152] io.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[index_realtime_supply_2016-02-28T19:00:00.000Z_0_0]: LockAcquireAction{interval=2016-02-28T19:00:00.000Z/2016-02-28T20:00:00.000Z}

2016-02-28T19:00:13,597 INFO [qtp1252138909-152] io.druid.indexing.overlord.TaskLockbox - Added task[index_realtime_supply_2016-02-28T19:00:00.000Z_0_0] to TaskLock[index_realtime_supply]

2016-02-28T19:00:13,621 INFO [qtp1252138909-167] io.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[index_realtime_supply_2016-02-28T19:00:00.000Z_0_0]: LockAcquireAction{interval=2016-02-28T19:00:00.000Z/2016-02-28T20:00:00.000Z}

2016-02-28T19:00:13,621 INFO [qtp1252138909-167] io.druid.indexing.overlord.TaskLockbox - Task[index_realtime_supply_2016-02-28T19:00:00.000Z_0_0] already present in TaskLock[index_realtime_supply]

2016-02-28T20:03:17,309 INFO [LeaderSelector-0] io.druid.indexing.overlord.TaskLockbox - Added task[index_realtime_supply_2016-02-28T19:00:00.000Z_0_0] to TaskLock[index_realtime_supply]

2016-02-28T20:03:17,309 INFO [LeaderSelector-0] io.druid.indexing.overlord.TaskLockbox - Reacquired lock on interval[2016-02-28T19:00:00.000Z/2016-02-28T20:00:00.000Z] version[2016-02-28T19:00:13.597Z] for task: index_realtime_supply_2016-02-28T19:00:00.000Z_0_0

2016-02-28T20:03:17,322 INFO [PathChildrenCache-2] io.druid.indexing.overlord.RemoteTaskRunner - Worker[10.72.22.102:8086] wrote RUNNING status for task: index_realtime_supply_2016-02-28T19:00:00.000Z_0_0

2016-02-28T20:03:17,322 WARN [PathChildrenCache-2] io.druid.indexing.overlord.RemoteTaskRunner - Worker[10.72.22.102:8086] announced a status for a task I didn’t know about, adding to runningTasks: index_realtime_supply_2016-02-28T19:00:00.000Z_0_0

2016-02-28T20:03:37,999 INFO [PathChildrenCache-2] io.druid.indexing.overlord.RemoteTaskRunner - Task[index_realtime_supply_2016-02-28T19:00:00.000Z_0_0] just disappeared!

2016-02-28T20:03:38,000 INFO [PathChildrenCache-2] io.druid.indexing.overlord.TaskQueue - Received FAILED status for task: index_realtime_supply_2016-02-28T19:00:00.000Z_0_0

2016-02-28T20:03:38,000 INFO [PathChildrenCache-2] io.druid.indexing.overlord.RemoteTaskRunner - Can’t shutdown! No worker running task index_realtime_supply_2016-02-28T19:00:00.000Z_0_0

2016-02-28T20:03:38,000 INFO [PathChildrenCache-2] io.druid.indexing.overlord.HeapMemoryTaskStorage - Updating task index_realtime_supply_2016-02-28T19:00:00.000Z_0_0 to status: TaskStatus{id=index_realtime_supply_2016-02-28T19:00:00.000Z_0_0, status=FAILED, duration=-1}

2016-02-28T20:03:38,001 INFO [PathChildrenCache-2] io.druid.indexing.overlord.TaskLockbox - Removing task[index_realtime_supply_2016-02-28T19:00:00.000Z_0_0] from TaskLock[index_realtime_supply]

2016-02-28T20:03:38,001 INFO [PathChildrenCache-2] io.druid.indexing.overlord.TaskQueue - Task done: RealtimeIndexTask{id=index_realtime_supply_2016-02-28T19:00:00.000Z_0_0, type=index_realtime, dataSource=supply}

2016-02-28T20:03:38,001 INFO [PathChildrenCache-2] io.druid.indexing.overlord.TaskQueue - Task FAILED: RealtimeIndexTask{id=index_realtime_supply_2016-02-28T19:00:00.000Z_0_0, type=index_realtime, dataSource=supply} (-1 run duration)

2016-02-28T22:58:52,797 INFO [qtp1252138909-128] io.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[index_realtime_supply_2016-02-28T19:00:00.000Z_0_0]: LockReleaseAction{interval=2016-02-28T19:00:00.000Z/2016-02-28T20:00:00.000Z}

2016-02-28T22:58:52,797 ERROR [qtp1252138909-128] io.druid.indexing.overlord.TaskLockbox - Lock release without acquire: {class=io.druid.indexing.overlord.TaskLockbox, task=index_realtime_supply_2016-02-28T19:00:00.000Z_0_0, interval=2016-02-28T19:00:00.000Z/2016-02-28T20:00:00.000Z}

2016-02-28T22:58:52,797 INFO [qtp1252138909-128] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"alerts","timestamp":"2016-02-28T22:58:52.797Z","service":"overlord","host":"10.72.14.10:8090","severity":"component-failure","description":"Lock release without acquire","data":{"class":"io.druid.indexing.overlord.TaskLockbox","task":"index_realtime_supply_2016-02-28T19:00:00.000Z_0_0","interval":"2016-02-28T19:00:00.000Z/2016-02-28T20:00:00.000Z"}}]

Thanks,

Varsha

Is your ZK in the same datacenter as your Druid cluster? I'd make sure that connection link is good first.
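
For example, a quick way to sanity-check ZK reachability from the middle manager host is ZooKeeper's four-letter-word commands (the hostname below is just one of the quorum members from your logs):

echo ruok | nc erdr4003.grid.lhr1.inmobi.com 2181   # a healthy node answers "imok"
echo stat | nc erdr4003.grid.lhr1.inmobi.com 2181   # shows latency and client connection counts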

It is difficult to dig into issues because of the sheer volume of requests for help, but for dedicated help you can try http://imply.io/

Hi Varsha,

I am currently facing the same error. Would you be able to share whether you figured out the resolution or the root cause? Thank you.

-Manasa

Manasa, what version of Druid are you using?

Hi, I am hitting the same problem (java.io.IOException: Stream closed). I tried both 0.9.2 and 0.9.1.1 and ran into it with both. Could you tell me where I might be going wrong? Thanks a lot. BTW, my Hadoop version is 5.3.2-cdh, and I have not hit this problem before.

On Saturday, November 5, 2016 at 6:49:22 AM UTC+8, Fangjin Yang wrote: