[druid-user] One Kafka topic - many datasources

Yes. The transformSpec should be able to filter out the records that are not needed for a specific tenant. See the documentation here http://druid.io/docs/latest/tutorials/tutorial-transform-spec.html under "Load data with transform specs".
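As a minimal sketch, a per-tenant filter inside the transformSpec could look like the fragment below. The dimension name `tenant_id` and the value `tenant-42` are assumptions for illustration; substitute whatever column identifies the tenant in your topic.

```json
{
  "transformSpec": {
    "filter": {
      "type": "selector",
      "dimension": "tenant_id",
      "value": "tenant-42"
    }
  }
}
```

With a selector filter like this, each ingestion job keeps only the rows for its own tenant and drops the rest.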

Rommel Garcia
Director, Field Engineering
rommel.garcia@imply.io

True, but we have over 5000 tenants and growing.

If I used the transformSpec to filter records, I would still need more than 5,000 indexing jobs, each with a filter spec mapped to its respective datasource.
This would also mean that I would need to monitor each of these jobs.
That doesn’t seem like a viable solution.

Would it be possible to write a plugin for the kafka indexing job to do this?

You would still have to create 5,000 ingest specs/jobs, even if you are only pulling from one Kafka topic, since there is a one-to-one mapping of ingest spec to datasource. So regardless of the approach, I don't see how you can create one ingest spec for all 5,000 datasources.
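Since each spec differs only in the tenant value and the datasource name, the 5,000 specs could be generated from a single template rather than maintained by hand. This is only a sketch: the field names `tenant_id`, the `tenant_<id>` datasource naming scheme, and the topic name `shared-topic` are assumptions, and a real supervisor spec would carry many more fields.

```python
# Sketch: stamp out one Kafka supervisor spec per tenant from a template.
# The dimension "tenant_id", the datasource naming scheme, and the topic
# name are illustrative assumptions, not taken from the thread.

def build_spec(tenant: str) -> dict:
    """Build a (truncated) supervisor spec that filters the shared topic
    down to a single tenant via a transformSpec selector filter."""
    return {
        "type": "kafka",
        "dataSchema": {
            "dataSource": f"tenant_{tenant}",
            "transformSpec": {
                "filter": {
                    "type": "selector",
                    "dimension": "tenant_id",
                    "value": tenant,
                }
            },
        },
        "ioConfig": {"topic": "shared-topic"},
    }

# One spec per tenant; each would be POSTed to the Overlord's
# supervisor endpoint as a separate job.
specs = [build_spec(t) for t in ("t1", "t2", "t3")]
print(len(specs))  # prints 3
```

Generating the specs is straightforward; the operational cost the thread raises (monitoring 5,000 running supervisors) remains either way.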

Rommel Garcia
Director, Field Engineering

And yes, you can extend the Kafka Indexing Service.

Rommel Garcia
Director, Field Engineering

Here is a link to an example Druid extension:
https://github.com/implydata/druid-example-extension

Regards,

Robert