Question regarding expected behavior if coordinator detects overlapping segments

Hi all,

Druid Version: 0.11.0

Ingestion technique: Tranquility Streaming Ingest

In our cluster, we use Tranquility for streaming ingestion into two datasources (X and Y). Unfortunately, two Tranquility jobs were started for datasource X with different configurations. The result was multiple overlapping segments for datasource X: Druid was trying to create segments for overlapping intervals such as 12:00:00/12:15:00 and 12:00:00/13:00:00. The coordinator detected this and started complaining about it in the logs. At that point, the coordinator essentially stopped coordinating; it just logged that it had found an overlapping segment every time coordination ran.
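For anyone trying to spot the same condition, here is a minimal Python sketch of what "overlapping" means for the intervals above. It is not Druid's own check, just the standard interval-overlap test. The function names and the date 2018-01-01 are made up for illustration, since the original intervals only include times.

```python
from datetime import datetime
from itertools import combinations


def parse_interval(interval):
    """Split an ISO-8601 'start/end' string into two datetimes."""
    start, end = interval.split("/")
    return datetime.fromisoformat(start), datetime.fromisoformat(end)


def find_partial_overlaps(intervals):
    """Return pairs of intervals that overlap without being identical."""
    parsed = [(text, parse_interval(text)) for text in intervals]
    overlaps = []
    for (a, (a_start, a_end)), (b, (b_start, b_end)) in combinations(parsed, 2):
        identical = (a_start, a_end) == (b_start, b_end)
        # Two half-open intervals overlap when each starts before the other ends.
        if not identical and a_start < b_end and b_start < a_end:
            overlaps.append((a, b))
    return overlaps


# Hypothetical date; the times mirror the 15-minute and 1-hour segments above.
intervals = [
    "2018-01-01T12:00:00/2018-01-01T12:15:00",
    "2018-01-01T12:00:00/2018-01-01T13:00:00",
    "2018-01-01T13:00:00/2018-01-01T14:00:00",
]
overlaps = find_partial_overlaps(intervals)
for a, b in overlaps:
    print(f"overlap: {a} and {b}")
```

Only the 15-minute and 1-hour intervals starting at 12:00:00 are reported; the adjacent 13:00:00 interval shares a boundary but does not overlap.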

My question is whether this is expected behavior. When coordination stopped, ingestion for datasource Y essentially stopped as well. The indexing tasks were still spinning up and running, but segment handoff would not complete, so the tasks just sat in the running state. Eventually, we caught the problem and cleaned up the overlapping segments. At that point, all of the indexing tasks for datasource Y completed, leaving us with no data loss. However, we would have preferred that coordination continue for all datasources other than the corrupted one (X). Since each datasource is isolated as far as metadata is concerned, there shouldn’t be a reason the coordinator can’t keep warning about the corrupted datasource X while carrying on business as usual for the other datasources.

Thanks!

Lucas

Giving this a quick bump in case it has fallen too deep into the thread list to be seen by many people.

Thanks!

Hi Lucas,

What was the specific error message you got? It sounds like something that shouldn’t be happening.

Hi Gian,

I have included some logs and the output of a MySQL select statement below.

The coordinator log wording makes it seem like the coordinator is going to ignore the overlapping segment and do the rest of its work, but it seems that nothing is actually being done for the rest of the coordination period. The tasks that are running run to completion and then hang waiting for segment handoff.

Hi Gian,

I have posted some details in the thread about the logs we saw. Hoping you could take a look and see what you think about it.

Thanks!

Lucas