How and where to deploy Tranquility Kafka

Hi,

I’ve got a question about how and where to deploy Tranquility Kafka. I’m asking because we have a Kafka cluster whose application topics contain personal data, and we want to ingest BI data without personal data into Druid.

  • Kafka cluster - our own bare metal servers - personal data allowed

  • Druid cluster - Google Cloud Platform (GCP) - personal data NOT allowed

Tranquility Kafka seems like a good fit for this, but I’m curious how to do it without any personal data being sent to the Druid cluster. Is it possible to run Tranquility Kafka alongside the Kafka cluster and ingest the data from there into the separate Druid cluster? It appeared to me that this is not possible, but I’m not sure, hence the question. If it is not possible, do you have any other suggestions?

  • Using Kafka Streams and creating BI topics without personal data could be one way to go, although I still don’t want to give access to the full Kafka cluster from GCP. Neither do I have any desire to set up another Kafka cluster in GCP.

  • Using Tranquility Server is an option, but it opens the door to a lot of unwanted error scenarios.

Any advice is appreciated.

Thanks

// Jimmy

KIS (the Kafka indexing service) has been the recommended approach for real-time indexing from Kafka since Druid 0.11. https://groups.google.com/d/msg/druid-user/b1EXsUUraWQ/o12_h5n5BgAJ

KIS requires your real-time front end to publish events destined for Druid to a Kafka topic, one per data source. KIS handles old data as well, so there is no need for a separate batch delta ingest for “old data”.
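
As a rough sketch of what "one topic per data source" looks like, a supervisor spec for one such topic could be built as below. The data source name, topic, broker address and dimension names are made up for illustration, and the layout is my recollection of the Kafka supervisor spec format, so please check it against the Kafka ingestion docs for your Druid version.

```python
import json


def build_supervisor_spec(data_source, topic, bootstrap_servers, dimensions):
    """Build a minimal Kafka supervisor spec for a single data source.

    All argument values are supplied by the caller; nothing here is
    specific to a real cluster.
    """
    return {
        "type": "kafka",
        "dataSchema": {
            "dataSource": data_source,
            "parser": {
                "type": "string",
                "parseSpec": {
                    "format": "json",
                    # Assumes each event carries a "timestamp" field.
                    "timestampSpec": {"column": "timestamp", "format": "auto"},
                    "dimensionsSpec": {"dimensions": list(dimensions)},
                },
            },
            "metricsSpec": [{"type": "count", "name": "count"}],
            "granularitySpec": {
                "type": "uniform",
                "segmentGranularity": "HOUR",
                "queryGranularity": "NONE",
            },
        },
        "ioConfig": {
            # One topic feeds exactly one data source.
            "topic": topic,
            "consumerProperties": {"bootstrap.servers": bootstrap_servers},
        },
    }


# Hypothetical values, for illustration only.
spec = build_supervisor_spec(
    data_source="bi_orders",
    topic="bi-orders",
    bootstrap_servers="kafka-1.example.internal:9092",
    dimensions=["country", "product_category"],
)
spec_json = json.dumps(spec, indent=2)
```

The resulting JSON is what you would submit to the overlord's supervisor endpoint (/druid/indexer/v1/supervisor, if I remember the path correctly). Note that Druid connects out to the brokers listed in bootstrap.servers, so whatever topic you name there has to be reachable from the Druid cluster.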

I think most people would suggest that your realtime front end anonymizes the data before it hits any of the Druid subsystems.
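
To make the anonymization step concrete, here is a minimal sketch of stripping personal fields from a JSON event before it is published to a BI topic. The field names are hypothetical, it only looks at top-level keys, and the Kafka consume/produce plumbing is left out on purpose.

```python
import json

# Hypothetical field names; replace with the personal fields in your own events.
PERSONAL_FIELDS = frozenset({"email", "name", "ip_address"})


def strip_personal_data(event):
    """Return a copy of the event without its top-level personal fields."""
    return {key: value for key, value in event.items() if key not in PERSONAL_FIELDS}


def to_bi_message(raw):
    """Decode a JSON message, drop personal fields, and re-encode it."""
    event = json.loads(raw)
    cleaned = strip_personal_data(event)
    # sort_keys keeps the output deterministic, which makes it easy to test.
    return json.dumps(cleaned, sort_keys=True).encode("utf-8")


raw_event = b'{"email": "someone@example.com", "country": "SE", "amount": 3}'
bi_message = to_bi_message(raw_event)
```

A blocklist like this is the simplest thing that works, but an allowlist of known-safe fields is probably the safer design, since a newly added personal field would then be dropped by default rather than leaked.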

Hope this helps
Kyle

Hi Jimmy,

A bit late to the thread, but +1 for what Kyle said. KIS allows Druid to read from Kafka in an exactly-once manner and to load late data, both of which are a great improvement over Tranquility Kafka. In Druid 0.12.0 (due out soon) it will also gain the ability to generate a more reasonable number of segments, instead of the current behavior of one per Kafka partition (see “On the subject of segments” in http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html for the current, soon to be old, behavior).