Clustering with multiple Druid instances

Hi Guys,

I am trying to set up a Druid cluster with a common ZooKeeper. I have set up 2 instances of Druid, each with all nodes (broker, historical, realtime, coordinator) running. Both instances talk to the same ZooKeeper and share a common metadata store and S3 for deep storage.

When I query the broker node, I don't always get correct data. For some requests the data count is lower than expected, while for others the correct data is returned. This happens randomly.

Please advise.

Thanks,

Suresh

Hi Suresh, two questions:

  1. How are you verifying data counts?

  2. Are there any interesting exceptions in the logs of the broker or historical?

– FJ

Hi Fangjin, Thanks for the reply.

I was querying through the broker nodes and verifying the data count with aggregations. I found a solution: because the 2 realtime nodes had different data paths, their data was not always in sync. I was able to sort it out by using a shared data path for the realtime nodes.
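To see why the counts looked random, here is a toy model in Python (not Druid code; `make_replicas` and `broker_count` are hypothetical names). It assumes each realtime node holds its own, possibly lagging, copy of the events and that the broker answers each query from one of them, modelled here as a random pick. When the copies diverge, repeated identical queries can return different counts. When they hold the same data, as with the shared data path above, every query agrees.

```python
import random

def make_replicas(events, lag_a, lag_b):
    """Hypothetical: two realtime nodes, each missing its last lag_* events."""
    replica_a = events[: len(events) - lag_a]
    replica_b = events[: len(events) - lag_b]
    return [replica_a, replica_b]

def broker_count(replicas, rng):
    """Toy broker: answer one count query from a randomly chosen replica."""
    return len(rng.choice(replicas))

events = list(range(1000))
rng = random.Random(42)

# Diverged data paths: node B is 37 events behind node A.
diverged = make_replicas(events, lag_a=0, lag_b=37)
diverged_counts = {broker_count(diverged, rng) for _ in range(20)}

# Shared data path: both nodes see the same events.
shared = make_replicas(events, lag_a=0, lag_b=0)
shared_counts = {broker_count(shared, rng) for _ in range(20)}

print(sorted(diverged_counts), sorted(shared_counts))
```

The random pick only stands in for "some queries hit one node, some hit the other"; the real broker's server selection is not modelled here.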

Please let me know if there is a different approach.

-Suresh

Make sure to use a longSum aggregator and not a count aggregator at query time, as the count aggregator counts the number of Druid rows, not the number of ingested events. If you are running 2 RT nodes ingesting from Kafka, there may be small differences in the data, as the two consumers may pull data from Kafka at different rates. This is one of the reasons for Tranquility-based ingestion. The finalized segments produced by the 2 RT nodes should yield consistent results, though.
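A small worked example of the count vs. longSum difference, sketched in Python. It assumes ingestion rolls up events that share the same truncated timestamp and dimension values into one row, and stores a metric named `count` (a common convention, assumed here) holding how many events went into each row. The events, `page` dimension and `roll_up` helper are hypothetical. A query-time `count` aggregator then returns the number of rows, while a `longSum` over `count` returns the number of ingested events.

```python
from collections import Counter

# Hypothetical raw events: (timestamp truncated to the minute, page).
raw_events = [
    ("2015-01-01T00:00", "home"),
    ("2015-01-01T00:00", "home"),
    ("2015-01-01T00:00", "about"),
    ("2015-01-01T00:01", "home"),
    ("2015-01-01T00:01", "home"),
    ("2015-01-01T00:01", "home"),
]

def roll_up(events):
    """Collapse identical (time, page) keys into one row with a 'count' metric."""
    return [{"time": t, "page": p, "count": n}
            for (t, p), n in Counter(events).items()]

rows = roll_up(raw_events)

# What a query-time "count" aggregator measures: Druid rows.
count_rows = len(rows)
# What a query-time "longSum" over the 'count' metric measures: ingested events.
sum_of_count = sum(r["count"] for r in rows)

# The same two aggregators as entries in a native query's "aggregations" list.
aggregations = [
    {"type": "count", "name": "rows"},
    {"type": "longSum", "name": "events", "fieldName": "count"},
]

print(count_rows, sum_of_count)  # prints "3 6"
```

Six events collapse into three rows, so the count aggregator reports 3 even though 6 events were ingested.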