Overlord failed to update segment metadata - Druid 0.13.0

Hi guys,

I have a Druid 0.13.0 cluster running and I encountered in a TASK FAILURE due to the overlord unable to create/allocate the segment. Before getting into the details of the error, I just want to share all the operations that I did to before having this issue.

  1. I created DAY segments for the time intervals of 2019-03-08/2019-03-09 and 2019-03-09/2019-03-10
  2. I compacted the segments into one with interval 2019-03-08/2019-03-10
  3. I realized that there were in the data due to pre-processing before ingestion therefore I dropped the compacted segment using a DELETE post to coordinator and then issue a kill task to the overlord for that time interval
  4. I fix my data pre-processing script and then issue the same task as in point 1 but I got this error below:
    2019-03-12T03:04:22,847 WARN [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Cannot allocate segment for timestamp[2019-03-09T18:59:13.000Z], sequenceName[MyDataSource-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z].
    2019-03-12T03:04:22,849 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Shutting down…
    2019-03-12T03:04:22,850 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Encountered exception in BUILD_SEGMENTS.
    org.apache.druid.java.util.common.ISE: Failed to add a row with timestamp[2019-03-09T18:59:13.000Z]
    at org.apache.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:1046) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
    at org.apache.druid.indexing.common.task.IndexTask.run(IndexTask.java:466) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:421) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:393) [druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_201]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_201]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_201]
    2019-03-12T03:04:22,863 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Unregistering chat handler[MyDataSource-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z]
    2019-03-12T03:04:22,863 INFO [task-runner-0-priority-0] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [MyDataSource-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z] status changed to [FAILED].
    2019-03-12T03:04:22,865 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
    “id” : “MyDataSource-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z”,
    “status” : “FAILED”,
    “duration” : 1179,
    “errorMsg” : “org.apache.druid.java.util.common.ISE: Failed to add a row with timestamp[2019-03-09T18:59:13.000Z]\n…”
    }

``

I found that in the overlord there is an error:

2019-03-12T03:04:22,796 WARN [qtp531373404-61] org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator - Cannot allocate new segment for dataSource[MyDataSource], interval[2019-03-09T00:00:00.000Z/2019-03-10T00:00:00.000Z], maxVersion[2019-03-12T03:04:15.998Z]: conflicting segment[MyDataSource_2019-03-08T00:00:00.000Z_2019-03-10T00:00:00.000Z_2019-03-10T01:17:25.058Z_2].
2019-03-12T03:04:22,797 INFO [qtp531373404-61] org.apache.druid.indexing.overlord.TaskLockbox - Task[MyDataSource-mqttdt-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z] already present in TaskLock[index_append_MyDataSource]
2019-03-12T03:04:22,804 WARN [qtp531373404-61] org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator - Cannot allocate new segment for dataSource[MyDataSource], interval[2019-03-09T18:00:00.000Z/2019-03-10T00:00:00.000Z], maxVersion[2019-03-12T03:04:15.998Z]: conflicting segment[MyDataSource_2019-03-08T00:00:00.000Z_2019-03-10T00:00:00.000Z_2019-03-10T01:17:25.058Z_2].
2019-03-12T03:04:22,805 INFO [qtp531373404-61] org.apache.druid.indexing.overlord.TaskLockbox - Task[MyDataSource-mqttdt-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z] already present in TaskLock[index_append_MyDataSource]
2019-03-12T03:04:22,812 WARN [qtp531373404-61] org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator - Cannot allocate new segment for dataSource[MyDataSource], interval[2019-03-09T18:00:00.000Z/2019-03-09T19:00:00.000Z], maxVersion[2019-03-12T03:04:15.998Z]: conflicting segment[MyDataSource_2019-03-08T00:00:00.000Z_2019-03-10T00:00:00.000Z_2019-03-10T01:17:25.058Z_2].
2019-03-12T03:04:22,813 INFO [qtp531373404-61] org.apache.druid.indexing.overlord.TaskLockbox - Task[MyDataSource-mqttdt-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z] already present in TaskLock[index_append_MyDataSource]
2019-03-12T03:04:22,819 WARN [qtp531373404-61] org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator - Cannot allocate new segment for dataSource[MyDataSource], interval[2019-03-09T18:30:00.000Z/2019-03-09T19:00:00.000Z], maxVersion[2019-03-12T03:04:15.998Z]: conflicting segment[MyDataSource_2019-03-08T00:00:00.000Z_2019-03-10T00:00:00.000Z_2019-03-10T01:17:25.058Z_2].
2019-03-12T03:04:22,821 INFO [qtp531373404-61] org.apache.druid.indexing.overlord.TaskLockbox - Task[MyDataSource-mqttdt-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z] already present in TaskLock[index_append_MyDataSource]
2019-03-12T03:04:22,827 WARN [qtp531373404-61] org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator - Cannot allocate new segment for dataSource[MyDataSource], interval[2019-03-09T18:45:00.000Z/2019-03-09T19:00:00.000Z], maxVersion[2019-03-12T03:04:15.998Z]: conflicting segment[MyDataSource_2019-03-08T00:00:00.000Z_2019-03-10T00:00:00.000Z_2019-03-10T01:17:25.058Z_2].
2019-03-12T03:04:22,828 INFO [qtp531373404-61] org.apache.druid.indexing.overlord.TaskLockbox - Task[MyDataSource-mqttdt-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z] already present in TaskLock[index_append_MyDataSource]
2019-03-12T03:04:22,834 WARN [qtp531373404-61] org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator - Cannot allocate new segment for dataSource[MyDataSource], interval[2019-03-09T18:50:00.000Z/2019-03-09T19:00:00.000Z], maxVersion[2019-03-12T03:04:15.998Z]: conflicting segment[MyDataSource_2019-03-08T00:00:00.000Z_2019-03-10T00:00:00.000Z_2019-03-10T01:17:25.058Z_2].
2019-03-12T03:04:22,835 INFO [qtp531373404-61] org.apache.druid.indexing.overlord.TaskLockbox - Task[MyDataSource-mqttdt-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z] already present in TaskLock[index_append_MyDataSource]
2019-03-12T03:04:22,841 WARN [qtp531373404-61] org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator - Cannot allocate new segment for dataSource[MyDataSource], interval[2019-03-09T18:55:00.000Z/2019-03-09T19:00:00.000Z], maxVersion[2019-03-12T03:04:15.998Z]: conflicting segment[MyDataSource_2019-03-08T00:00:00.000Z_2019-03-10T00:00:00.000Z_2019-03-10T01:17:25.058Z_2].
2019-03-12T03:04:22,843 INFO [qtp531373404-61] org.apache.druid.indexing.overlord.TaskLockbox - Task[MyDataSource-mqttdt-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z] already present in TaskLock[index_append_MyDataSource]
2019-03-12T03:04:22,854 WARN [qtp531373404-61] org.apache.druid.metadata.IndexerSQLMetadataStorageCoordinator - Cannot allocate new segment for dataSource[MyDataSource], interval[2019-03-09T18:59:00.000Z/2019-03-09T19:00:00.000Z], maxVersion[2019-03-12T03:04:15.998Z]: conflicting segment[MyDataSource_2019-03-08T00:00:00.000Z_2019-03-10T00:00:00.000Z_2019-03-10T01:17:25.058Z_2].
2019-03-12T03:04:23,417 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[10.2.1.213:8091] wrote FAILED status for task [MyDataSource-mqttdt-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z] on [TaskLocation{host=‘10.2.1.213’, port=8100, tlsPort=-1}]
2019-03-12T03:04:23,418 INFO [Curator-PathChildrenCache-1] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[10.2.1.213:8091] completed task[MyDataSource-2019-03-10-datadt-2019-03-09_2019-03-12T01_24_58_232149Z] with status[FAILED]

``

The thing is that there is no such segment in deepstorage, neither in the segment metadata table in the metadata storage. So why the overlord fails to lookup the segment?

Any help is appreciated

kind regards,

Sergio

UPDATE

I use the coordinator API (below) to find out if that segment exist and it doesn’t!!!

GET http://myhost:8081/druid/coordinator/v1/datasources/MyDataSource/segments/MyDataSource_2019-03-08T00:00:00.000Z_2019-03-10T00:00:00.000Z_2019-03-10T01:17:25.058Z_2

``

POST http://myhost:8081/druid/coordinator/v1/metadata/datasources/MyDataSource/segments?full content=[“2019-02-27T00:00:00.000/2019-03-11T00:00:00.000”]

``

So how the segment metadata are updated to the overlord node??

I tried to delete the …/tmp dir and re-start and re-install the overlord but still getting the same problem. Should I restart Zookeeper?

Sergio

Hi

You can delete conflicting segement directly in the metadata.

table name is druid_pendingSegments druid_segments

在 2019年3月12日星期二 UTC+8上午11:46:41,srg写道: