Data Ingestion Scalability

Hi all,

I have a use case with a large amount of real-time data streaming in. The load (events per second or data volume per second) may increase over time, so we have very demanding requirements for streaming ingestion throughput and horizontal scalability. I have a few questions regarding Druid stream ingestion.

  1. Can we scale the ingestion throughput by adding more real-time nodes and partitioning the data streams?

  2. Can we dynamically change the data partitioning strategy while ingestion is in progress?

  3. How much does the throughput improve when one more real-time node is added?

  4. Has anyone run experiments on ingestion scalability?

I would appreciate any comments or related experience, since this will help us move in the right direction without spending too much time experimenting.

-JD