Segment loading ignores versions, which leads to all versions being loaded

Hey,

I used insert-segment-to-db to load segments into metadata storage. There should be 625 active/used segments, but there are actually 1100 of them.

For the segments that got reindexed multiple times in the past, I can see all their versions with “used=1” in:

SELECT * from druid_segments;
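For example, a query like this (a sketch assuming the standard druid_segments columns; `end` is quoted because it is a reserved word in MySQL) shows the intervals that still have more than one version marked as used:

-- Intervals with more than one used version: each extra row is an
-- overshadowed version that should not be marked used=1.
SELECT dataSource, `start`, `end`, COUNT(*) AS used_versions
FROM druid_segments
WHERE used = 1
GROUP BY dataSource, `start`, `end`
HAVING COUNT(*) > 1;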

These are two versions of a single segment in S3 deep storage:

http://pastebin.com/raw/AZRzD6GN

http://pastebin.com/raw/tDUYKWTa

but both get loaded with “used=1”.

As a strange consequence, the coordinator keeps loading and dropping segments indefinitely, even though there is only one default rule: load forever.

Sorry, I accidentally posted the message early.

I used 0.9.1.1 and a 0.9.2-SNAPSHOT built from this branch https://github.com/druid-io/druid/pull/3399 for insert-segment-to-db.

I think it might be a bug in the insert-segment-to-db tool.

Hi Jakub,

IMO, it is not a bug but desired behavior: the tool adds metadata entries for all discovered segments, including the overshadowed ones.

The coordinator should be able to handle multiple segment entries generated by the insert-segment-to-db tool: it should load only the latest one and mark the overshadowed ones as deleted.

Isn’t that happening in your case?

Also, did you let the insert-segment-to-db tool complete before running the coordinator on your new cluster? If you didn’t wait for it to complete, the coordinator might have started loading segments based on an incomplete view; then, once all the segment entries were made, it would drop the overshadowed ones.

FWIW, if insert-segment-to-db skipped the overshadowed segments, their files would remain in deep storage forever with no easy way to delete them, leading to leaked files in deep storage. If you don’t want the tool to add overshadowed segments, consider running a KillTask, which deletes all the overshadowed segments, before running the insert-segment-to-db tool.
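For reference, a kill task is just a small task spec submitted to the overlord (POST to /druid/indexer/v1/task); the dataSource and interval below are placeholders:

{
  "type": "kill",
  "dataSource": "your_datasource",
  "interval": "2015-01-01/2017-01-01"
}

Note that a kill task only removes segments that are already marked unused in the metadata store, and it deletes their files from deep storage as well.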

Hi Nishant,

The coordinator should be able to handle multiple segment entries generated by the insert-segment-to-db tool: it should load only the latest one and mark the overshadowed ones as deleted.
Isn’t that happening in your case?

No, in my case the tool inserts all segments as equal, with “used=1”, which is really alarming :) It might be a bug in the S3 version though; I’m using this unmerged PR https://github.com/druid-io/druid/pull/3399

Also, did you let the insert-segment-to-db tool complete before running the coordinator on your new cluster?

Yes, I ran the insert tool on a fresh MySQL DB and nothing was connected to it. I started the coordinator afterwards, and it got into an infinite segment load/drop loop because of that.

If you don’t want the tool to add overshadowed segments

I want it to insert them in the proper state, “used=0”, but it inserts them with “used=1”; it treats all of them as latest versions, which is a bug.
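As a stopgap I can flip the older versions to unused by hand with something like this (a sketch against the default MySQL druid_segments table; it assumes the reindexed versions cover exactly the same intervals and does not handle partially overlapping ones):

-- Mark every segment that is not the highest version for its exact
-- (dataSource, start, end) interval as unused. Druid version strings
-- are timestamps, so lexicographic comparison picks the latest one.
UPDATE druid_segments s
JOIN (
  SELECT dataSource, `start`, `end`, MAX(version) AS max_version
  FROM druid_segments
  WHERE used = 1
  GROUP BY dataSource, `start`, `end`
) latest
  ON s.dataSource = latest.dataSource
 AND s.`start` = latest.`start`
 AND s.`end` = latest.`end`
SET s.used = 0
WHERE s.version < latest.max_version;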

consider running a KillTask

Cool, I didn’t know about it, thanks!

I raised this PR to make overshadowed segments start off unused when inserted, which should prevent the load/drop churn in the cluster.

https://github.com/druid-io/druid/pull/3499