Question on hybrid ingestion (realtime stream / batch file)

Hi guys,

I have Druid working with realtime ingestion via Tranquility; so far so good. Some of my events arrive outside the ingestion window (10 minutes in my case). Reading the documentation, I found that a hybrid solution is possible, so I’m trying to ingest the remaining events with batch ingestion (I tried both the hadoop and index tasks). The tasks complete successfully, but when I query the records nothing has changed: the new events haven’t been aggregated. I also tried ingesting all events for that hour as a batch (both those inside and those outside the ingestion window), and still the tasks complete successfully but the counters don’t change when I query.

My questions are:

  • Can I merge the remaining events into those already ingested by realtime, or do I have to re-ingest all the data for that hour?

  • Do I have to delete the segments for that hour (interval) before submitting the batch ingestion task, for example by issuing a kill task?

Is there anything I’m missing?

Thanks in advance.

Regards,

Rodrigo

Hi Rodrigo,

When reindexing with batch ingestion, you need to reindex the entire data set for that hour. This creates segments with a newer version, which replace any segments with an older version for that interval (i.e. the ones generated by realtime ingestion in your case).

No, you don’t have to manually delete the segments before submitting the batch index task.
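
To illustrate the two points above, here is a minimal Python sketch. It is a toy model, not Druid code: `hour_interval` and `visible_segments` are hypothetical helper names, the timestamps are made up, and it assumes that segment versions are timestamp-like strings where the larger string is the newer version. It shows how to work out the full hour that a late event belongs to (the interval the batch task should cover), and why the newer batch segment takes over from the realtime one without any manual delete.

```python
from datetime import datetime, timedelta, timezone


def hour_interval(event_time):
    """Return the ISO-8601 interval (start/end) of the hour containing event_time."""
    start = event_time.astimezone(timezone.utc).replace(
        minute=0, second=0, microsecond=0
    )
    end = start + timedelta(hours=1)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{start.strftime(fmt)}/{end.strftime(fmt)}"


def visible_segments(segments):
    """Toy model of version-based replacement.

    segments is a list of (interval, version, source) tuples. For each
    interval, only the entry with the highest version string is kept.
    Assumption: versions compare correctly as plain strings.
    """
    newest = {}
    for interval, version, source in segments:
        if interval not in newest or version > newest[interval][0]:
            newest[interval] = (version, source)
    return {interval: source for interval, (version, source) in newest.items()}


# A late event that missed the realtime window (made-up timestamp).
late_event = datetime(2016, 3, 1, 10, 47, 12, tzinfo=timezone.utc)

# The batch task should cover this whole hour, not just the late events.
interval = hour_interval(late_event)

segments = [
    (interval, "2016-03-01T10:00:05.000Z", "realtime"),  # older version
    (interval, "2016-03-01T12:30:00.000Z", "batch"),     # newer version
]

# The batch segment is the one that remains visible for that hour.
print(interval, visible_segments(segments))
```

The batch segment replaces the realtime one as a whole rather than being merged with it, which is why the batch input has to contain every event for the hour and not only the late ones.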

Hi Nishant, all working now!
Thank you for the quick response.