We are ingesting data into a datasource using Kafka indexing tasks, and we want to partition the data by dimension. We have tried re-indexing tasks, but we would like to do it directly in the Kafka indexing tasks.
From what I've read in the Druid documentation, Druid should adopt the Kafka topic's partitioning. We have tried producing KeyedMessages, but without result.
I'd appreciate any help with this; the Druid documentation does not cover this topic in much detail.
Hi @keepler-ivanmarques - a single partition in Kafka ends up being consumed by one (and only one) task, though a task can read from multiple partitions on its own, like any Kafka consumer. That means the data is “naturally” partitioned in Druid according to your Kafka partitions. To partition differently, you can adjust the number of topic partitions in Kafka and/or the rules that determine which events end up in which partition.
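To make the "rules that determine which events end up in which partition" part concrete, here is a minimal sketch of key-based routing. It is only an illustration: it uses CRC32 as a stand-in hash (Kafka's default partitioner uses a different hash function), and the `country` dimension, the sample events, and the function names are all made up for the example. The point is just that keying each message by a dimension value sends all events with that value to the same partition.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a message key to a partition index.

    CRC32 is a stand-in here; it is NOT the hash Kafka's default
    partitioner uses, so the actual partition numbers will differ.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, key_dimension, num_partitions):
    """Group events by the partition their key dimension hashes to."""
    partitions = {p: [] for p in range(num_partitions)}
    for event in events:
        p = partition_for(str(event[key_dimension]), num_partitions)
        partitions[p].append(event)
    return partitions


# Hypothetical events; "country" is the dimension we want to partition by.
events = [
    {"country": "ES", "clicks": 3},
    {"country": "FR", "clicks": 1},
    {"country": "ES", "clicks": 7},
    {"country": "DE", "clicks": 2},
    {"country": "FR", "clicks": 5},
]

routed = route(events, "country", num_partitions=3)
for partition, rows in sorted(routed.items()):
    print(partition, [row["country"] for row in rows])
```

With a real producer, the equivalent step is to set the message key to the dimension value (or plug in a custom partitioner), so that each task reading a partition only sees a fixed subset of dimension values.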
After ingestion, you can reindex using a
I think you can also give a partitionsSpec inside the configuration for compaction tasks…???
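For what it's worth, a compaction task with a partitionsSpec might look roughly like the sketch below. Treat it as an assumption rather than a confirmed recipe: the field names are as I recall them from the Druid compaction and native batch documentation and should be checked against your Druid version, and the datasource name, interval, dimension, and row target are placeholders. The script only builds the JSON payload; it does not submit anything.

```python
import json

# Hypothetical compaction task spec. Field names follow my recollection of
# the Druid docs; verify them against the version you are running.
compaction_task = {
    "type": "compact",
    "dataSource": "my_datasource",  # placeholder datasource name
    "ioConfig": {
        "type": "compact",
        "inputSpec": {
            "type": "interval",
            "interval": "2020-01-01/2020-02-01",  # placeholder interval
        },
    },
    "tuningConfig": {
        "type": "index_parallel",
        # Single-dimension range partitioning on an assumed "country" dimension.
        "partitionsSpec": {
            "type": "single_dim",
            "partitionDimension": "country",
            "targetRowsPerSegment": 5000000,
        },
        # As far as I know, dimension-based partitioning needs perfect rollup.
        "forceGuaranteedRollup": True,
    },
}

payload = json.dumps(compaction_task, indent=2)
print(payload)
```

The printed JSON is what you would submit as a task to the Overlord, once the names and values have been adjusted to your setup.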