Realtime nodes per dataSchema

Hi,
I currently have a Druid configuration with Kafka<->Realtime ingestion.

I have two Realtime nodes, both with "shardSpec": {"type": "linear", "partitionNum": 0}, and a different Kafka group.id set on each node.

So essentially both nodes do the same work for every configured dataSchema (a fault-tolerant configuration).

Traffic is increasing, so I was thinking of adding more Realtime nodes and having a pair of Realtime nodes for each dataSchema.

My questions are:

  • Do you think having a pair of Realtime nodes for each dataSchema, with a linear shardSpec and a different group.id on each, would be a good solution?

  • How will the Brokers know which Realtime node manages each dataSchema when a query covers the last hour?

Thanks

Maurizio

Does anyone have any ideas about this?

Thanks,
Maurizio

Hi Maurizio,

If you want to scale up realtime nodes, you can add more of them, giving each new node the next partitionNum in its shardSpec.
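As a rough sketch, assuming you keep the linear shardSpec from your current setup, the existing node would keep the first fragment below and a new (hypothetical) second node for the same dataSchema would use the second one. Only the shardSpec is shown; the rest of each node's spec is omitted.

```json
"shardSpec": { "type": "linear", "partitionNum": 0 }

"shardSpec": { "type": "linear", "partitionNum": 1 }
```

As far as I understand it, nodes that are meant to split the load (different partitionNum) should share the same Kafka group.id so that the topic's partitions are divided between them, while nodes that are meant to be replicas of each other (same partitionNum) should keep using different group.ids, as you do today. Please check this against the docs for your Druid version before relying on it.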

Just an FYI that we’re starting to phase out usage of realtime nodes in favor of the indexing service.