Data size and replication

I have 1 datasource, with 2 replicants.

I have 2 historical nodes, each currently loaded with 40GB of data.

  1. Does that mean that the size of my data source is 40GB?

  2. If I were to reduce replicants to 1, would each node then have 20GB of data?
    (Half of the data on each node; if 1 node goes down, we lose half of the data until it's loaded again onto the remaining node.)

  3. What happens when a 3rd node is added? How will the data be distributed then?

  4. If I have 3 nodes with a replication factor of 2, how many nodes can I lose and still have all the data in the cluster?

  1. It depends on what you mean by “size of my data source”, but it sounds like you have 40GB of unique data (80GB across the cluster once both replicants are counted).

  2. Yes, after rebalancing, each node would have 20GB.

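  3. After rebalancing, each node would hold roughly a third of the total loaded data: about 27GB each with 2 replicants (80GB / 3), or about 13GB each with 1 replicant (40GB / 3).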

  4. With a replication factor of 2, you can always lose 1 node and still have all the data available. If you lose 2 nodes, you will likely be missing some data temporarily.
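
If it helps, the arithmetic behind these answers can be sketched in a few lines of Python. This is only a rough back-of-the-envelope model, not anything Druid itself provides: the function names are made up for illustration, and it assumes segments end up evenly balanced across nodes and that a node never holds more than one copy of the same segment.

```python
def per_node_gb(unique_gb: float, replicants: int, nodes: int) -> float:
    """Approximate data per node, assuming perfectly even balancing.

    Assumption: a node holds at most one copy of a segment, so the
    effective number of copies is capped at the number of nodes.
    """
    effective_copies = min(replicants, nodes)
    return unique_gb * effective_copies / nodes


def max_safe_node_losses(replicants: int) -> int:
    """Nodes that can always be lost without any segment becoming unavailable."""
    return max(replicants - 1, 0)


if __name__ == "__main__":
    unique_gb = 40.0  # 40GB of unique data, as in the question

    # 2 nodes, 2 replicants: every node holds a full copy.
    print(f"{per_node_gb(unique_gb, 2, 2):.1f}")  # 40.0
    # 2 nodes, 1 replicant: the data is split in half.
    print(f"{per_node_gb(unique_gb, 1, 2):.1f}")  # 20.0
    # 3 nodes, 2 replicants: 80GB spread over three nodes.
    print(f"{per_node_gb(unique_gb, 2, 3):.1f}")  # 26.7
    # With 2 replicants, losing any single node is always safe.
    print(max_safe_node_losses(2))  # 1
```

Real clusters will not balance this exactly, since segments differ in size, so treat the numbers as estimates.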