Druid ingestion performance comparison: batch vs. real-time


I wonder if anyone has done a performance comparison between batch and real-time ingestion. I was reading this paper (http://static.druid.io/docs/druid.pdf) by Fangjin Yang et al., which indicates that real-time processing involves more nodes/components than batch processing.

So, if one has the option of choosing between batch and real-time ingestion, which one would be faster and preferable?


Probably the best way to choose is whatever matches the way you generate your data in the first place. If it comes from a stream, or if you want data to be as fresh as possible, then check out realtime ingestion. If it comes from batch files then check out batch ingestion. If you could get it either way, and you don’t care about freshness, I’d probably go with batch since it has fewer moving parts and is therefore somewhat simpler to operate.
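The rule of thumb above can be written down as a small decision function. This is only a minimal sketch: `choose_ingestion` and `Ingestion` are hypothetical names, not part of Druid, and the function encodes the heuristic (data source, freshness, operational simplicity), not any measured performance difference.

```python
from enum import Enum


class Ingestion(Enum):
    # Hypothetical labels for the two ingestion styles discussed above.
    REALTIME = "realtime"
    BATCH = "batch"


def choose_ingestion(has_stream: bool, has_batch_files: bool, need_freshness: bool) -> Ingestion:
    """Hypothetical helper encoding the rule of thumb, not a Druid API."""
    if not (has_stream or has_batch_files):
        raise ValueError("data must arrive as a stream, as batch files, or both")
    # Stream-only data, or a stream plus a need for the freshest data: realtime.
    if has_stream and (need_freshness or not has_batch_files):
        return Ingestion.REALTIME
    # Batch files are available and freshness is not required (or no stream
    # exists): prefer batch, which has fewer moving parts to operate.
    return Ingestion.BATCH


print(choose_ingestion(has_stream=True, has_batch_files=True, need_freshness=False))
```

Note that when data only exists as batch files, the sketch returns batch even if freshness is wanted, on the assumption that real-time ingestion needs a stream to read from.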