In general, realtime ingestion is considered best-effort and even with a single realtime node there are conditions that could lead to data inconsistency. What many teams do is to run a realtime pipeline to allow for immediate querying of data (while tolerating some possible data discrepancies) and then run a batch ingestion pipeline to periodically rebuild an exact copy of the segment.
For the realtime pipeline, I’d also echo Gian in recommending taking a look at Tranquility which will help you manage ingestion, especially as the volume of data gets larger.
FYI, there’s a ton of work being done right now on reworking the real-time ingestion to provide more correctness guarantees. If you’re curious, you can read about it here: https://github.com/druid-io/druid/issues/1642