Druid: Multiple data types in the same Kafka topic

Hi Team,

We have a use case where we need to ingest streaming data into Druid, and multiple types of data arrive on the same Kafka topic.

How can we split the data into two different datasources?

Any help would be greatly appreciated.

Thanks,

Ashish

Hi Ashish,
The best way would be to separate the data into two Kafka topics in your ETL stack.

Hello Ashish,

Simply define two Kafka ingestion specs, each with a transformSpec filter (see the doc link below) that matches the dimension(s) and value(s) for that datasource. Each ingestion spec then points to the same Kafka topic. It's an extremely powerful construct that eliminates having to create separate Kafka topics and populate them from upstream processes.

https://druid.apache.org/docs/latest/ingestion/index.html#transformspec
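For example, a Kafka supervisor spec along these lines would route one slice of the topic into its own datasource. This is a minimal sketch: the topic name "events", the field "eventType", the value "order", and the datasource name "orders" are all hypothetical placeholders for your own schema.

{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "orders",
      "timestampSpec": { "column": "timestamp", "format": "iso" },
      "dimensionsSpec": { "dimensions": ["eventType", "orderId"] },
      "transformSpec": {
        "filter": {
          "type": "selector",
          "dimension": "eventType",
          "value": "order"
        }
      },
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "NONE"
      }
    },
    "ioConfig": {
      "topic": "events",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "your-broker:9092" }
    },
    "tuningConfig": { "type": "kafka" }
  }
}

Rows that don't match the filter are simply dropped by that supervisor, so each datasource ends up with only its own slice of the topic.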

Your easiest path to success is to set up two different datasources, each consuming the same Kafka topic. Add a filter to each one so it captures only the desired messages. For example, if you are collecting JSON strings and have a field that identifies the message stream or type, that would be a good field to filter on.

Here is a link to the relevant Druid docs: https://druid.apache.org/docs/latest/ingestion/index.html#transformspec
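Concretely, the second datasource would use the same spec with only the dataSource name and the filter value changed, e.g. (again assuming the hypothetical "eventType" field):

"transformSpec": {
  "filter": {
    "type": "selector",
    "dimension": "eventType",
    "value": "click"
  }
}

If one of the two datasources should instead act as a catch-all for everything else, wrapping the first selector in a "not" filter works as well.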

That is funny… three responses at the same time with the same answer…

That’s a sign of a growing community!!

Thank you so much.

Let me try this option for my use case.

J B:

Good point!