Ensuring indexing task uniqueness

Hello,

Is there a way to ensure an indexing task is unique through a client-generated ID, or another method? I want to guarantee that I am sending an indexing task to the overlord only once.

Hey Carlos,

You can provide an “id” in the indexing task JSON that will be used instead of the default auto-generated ID. These IDs are checked for uniqueness, so they can be used to guarantee that you only submit a task one time.
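
For example, here is a minimal sketch of submitting such a task to the overlord's task endpoint. The host/port, task type, and the rest of the spec are placeholders for your own setup; the top-level "id" field is the only part that matters here:

import requests

# Native batch task with a client-chosen "id". Everything except the
# top-level "id" field is a placeholder for your own ingestion spec.
task = {
    "type": "index",
    "id": "ingest-events-2a9f3c",  # client-generated ID
    "spec": {
        # ... dataSchema / ioConfig / tuningConfig ...
    },
}

resp = requests.post(
    "http://overlord-host:8090/druid/indexer/v1/task",
    json=task,
)
resp.raise_for_status()
print(resp.json())  # the overlord echoes back the task ID on success

Submitting a second task with the same "id" while the first is still known to the overlord should be rejected rather than silently creating a duplicate.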

Thank you, Gian. This is exactly what I was looking for.

Now, how would I free up a previously used task ID?

In my use case, I generate the ID from the hashed name of the file to be ingested. After I realized I wasn’t capturing enough dimensions, I decided to delete the datasource and process the files again. Is the task ID taken forever? I guess I could delete the task record from the metadata DB; however, is there an API call already exposed by the coordinator or another node?
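
For context, the ID generation is roughly like the following sketch (the hashing scheme and prefix are just illustrative):

import hashlib

def task_id_for_file(path: str) -> str:
    # Deterministic task ID derived from the file name, so re-submitting
    # the same file always produces the same ID.
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()[:12]
    return "ingest-" + digest

print(task_id_for_file("/data/events-2016-01-01.json"))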

Thank you.

There is no official API for freeing up a previously used task ID; they are kept “forever”. But if you delete the row in the metadata store after the task completes (either success or failure), then you will be able to re-use it.
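
If it helps, here is a minimal sketch of that cleanup, assuming a MySQL metadata store and the default "druid_tasks" table name; adjust the connection details and table prefix to match your deployment:

import pymysql

# Remove a completed task's row so its ID can be re-used.
# Assumes the task has already finished (success or failure) and that
# the metadata store uses the default "druid_" table prefix.
TASK_ID = "ingest-events-2a9f3c"

conn = pymysql.connect(host="metadata-db-host", user="druid",
                       password="druid-password", database="druid")
try:
    with conn.cursor() as cur:
        cur.execute("DELETE FROM druid_tasks WHERE id = %s", (TASK_ID,))
    conn.commit()
finally:
    conn.close()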

Thanks again, Gian.