I have set up a 3-node cluster.
Node 1: Coordinator, Overlord, MiddleManager, Router
Node 2: Broker, Historical
Node 3: Broker, Historical
Each node has 16 cores, 110 GB RAM, and 350 GB of disk.
I created the Druid segments using Hive. I can see them in deep storage, and the segments get downloaded to both Historical nodes. I have 9 million records; the total data size is 572 MB.
Now I am able to do a select * for up to 250K records:

select * from table limit 250000
But when I ask for more than 250K, say a plain select * from table for the entire 9 million records, I get an error: GC overhead limit exceeded. If I increase druid.historical.jvm.heap.memory, it works for higher counts, but still not for the entire 9 million.
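For reference, in a stock Druid install the Historical heap is normally set via JVM flags in that service's jvm.config file rather than a runtime property. The sketch below shows the usual shape of that file; the path and the sizes are assumptions for a node of this class, not a recommendation:

```
# conf/druid/.../historical/jvm.config  (exact path varies by install)
# -Xms/-Xmx control the heap that select-style queries accumulate rows in;
# sizes here are placeholder assumptions, not tuned values.
-server
-Xms8g
-Xmx8g
-XX:MaxDirectMemorySize=20g
```

Note that raising the heap only postpones the failure for full-table select * queries, since the Broker/Historical must materialize the result set.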
The behaviour is the same via Hive and via the REST API.
From the Broker logs I see only one Broker being hit, i.e. Node 2.
I have set hive.druid.broker.address.default to point to that node.
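For context, that Hive property is typically set per session like this; the host and port below are placeholders, not values from my cluster:

```sql
-- Points Hive's Druid storage handler at a single query endpoint.
-- <host>:<port> is a placeholder: a Broker (default port 8082),
-- or the Router's port if queries should go through the Router.
SET hive.druid.broker.address.default=node2.example.com:8082;
```

Since this property names one endpoint, Hive will always send queries there, which would explain why only Node 2's Broker is hit.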
My understanding is that the Router distributes the workload across all available Broker and Historical nodes and retrieves the records. Is my understanding correct? If so, what configuration should I set to enable a distributed run?
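For reference, the Router's Broker-selection behaviour is driven by its runtime.properties. A minimal sketch, using property names from the standard Router configuration (the values are assumptions for this cluster):

```
# router/runtime.properties (sketch)
# The service name of the Broker pool the Router forwards queries to.
druid.router.defaultBrokerServiceName=druid/broker
# Connection pool from Router to Brokers; value is an assumption.
druid.router.http.numConnections=50
# Optionally proxy Coordinator/Overlord APIs through the Router.
druid.router.managementProxy.enabled=true
```

One thing to verify: the Broker already fans each query out to all Historicals holding relevant segments, so the Router mainly balances load across Brokers; clients (including Hive) would need to target the Router's address for that balancing to apply.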
Does query performance depend on the size of the segments or on the number of segments/records created?
For 9 million plus records, what would be the ideal cluster setup? How many nodes would be required, and with how many cores and how much RAM?
Is Tableau recommended for visualization with Druid?