In my org, we are in the process of retiring our Storm cluster. For us, that means moving from Tranquility on Storm to the standalone Tranquility Kafka application.
We process more than 30 TB of compressed (Avro, Snappy) data every day, so performance matters to us.
- Since our data is heavily skewed, we use the HashCodePartitioner in Storm. However, I can’t find an option to specify the partitioner in the Tranquility JSON. Does this setting only apply to Storm (because the data is assumed to be evenly pre-partitioned in Kafka)?
- Also, I’d welcome any suggestions on parameters that improve performance (so far I know about commit.periodMillis and tranquility.maxBatchSize).
- Does anyone have feedback on the Kafka indexing service, performance-wise? I was tempted to use it, but our cluster is still on 0.11.0, so I’m gathering information for when we upgrade it.
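
For context, here is a minimal sketch of where I understand the two settings mentioned above to go, in the top-level properties block of the Tranquility Kafka JSON. The values are placeholders for illustration only, not our production settings or tuned recommendations, and the rest of the file (dataSources, ZooKeeper and Kafka connection properties) is omitted:

```json
{
  "properties": {
    "commit.periodMillis": "15000",
    "tranquility.maxBatchSize": "2000"
  }
}
```

I don't see an equivalent key for the partitioner there, hence the first question.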
Thank you for your responses!