Hi - I’m still working through whether/how a hybrid batch/streaming ingestion setup will help us.
As I understand it, as we ingest realtime events a realtime node will periodically build segments and hand them off to historical nodes. Batch ingestion will completely replace those segments when it occurs.
Let’s say we have one-hour granularity. The goal of the hybrid setup is to not run a batch ingestion for a particular hour until we know that all of the events for that hour that were ingested by realtime are also in the batch, via whatever process is used to build the batch - is that correct? So typically you’d just wait until the hour is “in the books” before you consider running a batch for that hour?
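To make sure I have the timing logic right, here’s a rough sketch (my own illustration, not anything from Druid - the function name and the 10-minute lateness buffer are just assumptions) of the check I’m imagining before kicking off a batch for a given hour:

```python
from datetime import datetime, timedelta, timezone

def hour_is_closed(hour_start, now, lateness_buffer=timedelta(minutes=10)):
    """Illustrative only: treat an hour as safe to batch-ingest once the
    hour has ended AND an extra lateness buffer has elapsed, so late
    realtime events have presumably stopped arriving."""
    hour_end = hour_start + timedelta(hours=1)
    return now >= hour_end + lateness_buffer

# The 12:00-13:00 hour is not yet "in the books" at 13:05 with a
# 10-minute buffer, but it is at 13:15.
noon = datetime(2014, 1, 1, 12, 0, tzinfo=timezone.utc)
print(hour_is_closed(noon, datetime(2014, 1, 1, 13, 5, tzinfo=timezone.utc)))   # False
print(hour_is_closed(noon, datetime(2014, 1, 1, 13, 15, tzinfo=timezone.utc)))  # True
```

Is that buffer-after-the-hour pattern basically what people do in practice?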
Say you do that and you’ve completed a batch ingestion for the previous hour. What if a realtime event is somehow delayed, so that its timestamp falls in the previous hour but it only hits a realtime node now? Will the realtime node build a new segment for that hour, hand it off, and thus replace what was ingested as a batch? Is that just something you fix by configuring the realtime nodes to ignore “stale” events?
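For the “ignore stale events” part, my understanding is that the realtime spec’s windowPeriod is the relevant knob: events with timestamps outside the window around current time get rejected rather than indexed. An abbreviated spec like the following is what I have in mind (the PT10M value is just an example, and I’ve omitted most of the spec; please correct me if windowPeriod doesn’t behave this way):

```json
{
  "type": "realtime",
  "spec": {
    "dataSchema": {
      "granularitySpec": {
        "segmentGranularity": "HOUR",
        "queryGranularity": "NONE"
      }
    },
    "tuningConfig": {
      "type": "realtime",
      "windowPeriod": "PT10M"
    }
  }
}
```

If that’s right, then as long as windowPeriod is shorter than the delay before we run the batch for an hour, the realtime node could never build a late segment that clobbers the batch-ingested one?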