Realtime nodes per dataSchema

I currently have a Druid configuration with Kafka<->Realtime ingestion.

I have two Realtime nodes, both with "shardSpec": { "type": "linear", "partitionNum": 0 }, and a different Kafka setup on each node.

So essentially both nodes do the same work for every configured dataSchema (a fault-tolerant configuration).

Traffic is increasing, so I was thinking of adding more Realtime nodes and having a pair of Realtime nodes for each dataSchema.

Now my questions are:

  • Do you think having a pair of Realtime nodes for each dataSchema, with linear and distinct shardSpecs, would be a good solution?

  • How will the Brokers know which Realtime node manages each dataSchema when a query covers the last hour?



Does anyone have ideas about this?


Hi Maurizio,

If you want to scale up realtime nodes, you can add more of them with increasing partitionNum values.
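
As a rough sketch of what that could look like, reusing the linear shardSpec from your message: the outer keys ("existing-nodes", "additional-node-1", "additional-node-2") are only illustrative labels and not part of any Druid spec, and each inner shardSpec would go into the spec file of the corresponding node.

```json
{
  "existing-nodes":    { "shardSpec": { "type": "linear", "partitionNum": 0 } },
  "additional-node-1": { "shardSpec": { "type": "linear", "partitionNum": 1 } },
  "additional-node-2": { "shardSpec": { "type": "linear", "partitionNum": 2 } }
}
```

Nodes that keep the same partitionNum, like your current two, would still be doing the same work as each other, so this assumes you want the new nodes to take on additional shards rather than act as further copies.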

Just an FYI that we’re starting to phase out usage of realtime nodes in favor of the indexing service.