How Druid handles halted ingestions

Hi,

I’d like to ask a question about the tuningConfig.maxParseExceptions batch ingestion parameter.

In my case I’m using Druid to ingest CSV files, each with about two million rows and several dozen columns.

Let’s suppose that I set the tuningConfig.maxParseExceptions configuration parameter to 0. As explained in the official batch ingestion documentation, this means that any parse exception would halt the ingestion and make the task fail. However, what would happen to the rows processed before that parse error? Would they be ingested or not?

To sum up, I’m wondering whether Druid guarantees some kind of atomicity during batch ingestion with tuningConfig.maxParseExceptions=0 (that is, whether no rows at all would be ingested if a single parse exception were found, which is the behavior I need for my use case).
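
For context, the tuningConfig I have in mind looks roughly like the sketch below. The index_parallel task type and the two logging-related fields are just illustrative assumptions about my spec; the part I’m asking about is maxParseExceptions:

```json
{
  "tuningConfig": {
    "type": "index_parallel",
    "maxParseExceptions": 0,
    "logParseExceptions": true,
    "maxSavedParseExceptions": 1
  }
}
```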

Thanks!

Yes, a batch ingestion should be atomic: all or nothing. I think segments may be published while the task is in progress, but they should not be made available until it completes successfully. At that point any remaining segments are published and all of them are set to available (and any segments they overshadow are marked as overshadowed).

Of course, as a worrier, I’d definitely test, too!
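
If you do test it, one rough way to check (just a sketch; the datasource name is a placeholder) is to query the sys.segments system table after a deliberately failed run and confirm that nothing from that task shows up as published or available:

```sql
SELECT segment_id, is_published, is_available, is_overshadowed
FROM sys.segments
WHERE datasource = 'your_csv_datasource'
```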


Glad to hear this, thanks a lot!