Druid ingestion performance comparison between batch processing and real time

Hi,

Has anyone done a performance comparison between batch ingestion and real-time ingestion? I was reading this paper (http://static.druid.io/docs/druid.pdf) by Fangjin Yang et al. It indicates that real-time processing involves more nodes/components than batch processing.

So, if one has the option of choosing between batch and real time, which one would be faster and preferable?

Thanks

Probably the best way to choose is to go with whatever matches the way you generate your data in the first place. If it comes from a stream, or if you want data to be as fresh as possible, then check out realtime ingestion. If it comes from batch files, then check out batch ingestion. If you could get it either way, and you don’t care about freshness, I’d probably go with batch, since it has fewer moving parts and is therefore somewhat simpler to operate.
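
If it helps, the rule of thumb above can be written down as a small decision function. This is only a sketch of the reasoning, not anything from Druid itself: the names `Ingestion`, `choose_ingestion`, `has_stream`, `has_batch_files`, and `needs_fresh_data` are made up for illustration.

```python
from enum import Enum


class Ingestion(Enum):
    """Hypothetical labels for the two ingestion styles discussed above."""
    BATCH = "batch"
    REALTIME = "realtime"


def choose_ingestion(has_stream: bool, has_batch_files: bool,
                     needs_fresh_data: bool) -> Ingestion:
    """Encode the rule of thumb; all parameter names are illustrative assumptions."""
    if not (has_stream or has_batch_files):
        raise ValueError("no data source available to ingest from")
    # Wanting data as fresh as possible points to realtime ingestion.
    if needs_fresh_data:
        return Ingestion.REALTIME
    # Batch files are available (alone or alongside a stream) and freshness
    # does not matter: prefer batch, which has fewer moving parts.
    if has_batch_files:
        return Ingestion.BATCH
    # Only a stream is available.
    return Ingestion.REALTIME


if __name__ == "__main__":
    # Data is available either way and freshness is not a concern.
    print(choose_ingestion(has_stream=True, has_batch_files=True,
                           needs_fresh_data=False).value)
```

Note that this says nothing about which one is faster; it only captures the "match your data source, otherwise prefer the simpler setup" advice.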