Hi,
We are using the kafka-indexing-service for real-time ingestion. Since we have multiple tenants in our datasource (a shared data source) and we query it by timestamp and tenant_id, I would like to know how we can automatically partition our data source by tenant_id using the kafka-indexing-service.
I saw it suggested in the Druid documentation: "with realtime indexing, one option is to partition on tenant_id upfront. You'd do this by tweaking the stream you send to Druid. If you're using Kafka then you can have your Kafka producer partition your topic by a hash of tenant_id."
We already have our Kafka topic partitioned by tenant_id, but I do not see the Druid data segments being partitioned on tenant_id.
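For reference, our producer keys each record by tenant_id, which as far as I can tell is what the docs' suggestion amounts to. Here is a minimal sketch of that; the topic name, broker address, serializers, and sample event below are placeholders, not our real config:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TenantKeyedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");           // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String tenantId = "tenant-42";                           // hypothetical tenant
            String event = "{\"timestamp\":\"2017-07-27T00:00:00Z\",\"tenant_id\":\"tenant-42\",\"clicks\":1}";
            // Keying by tenant_id makes Kafka's default partitioner hash the key,
            // so all events for a given tenant land on the same Kafka partition.
            producer.send(new ProducerRecord<>("druid-events", tenantId, event));
        }
    }
}

As far as I understand, though, this only controls which Kafka partition (and therefore which indexing task) an event goes to; it does not by itself give the resulting Druid segments a partitionDimension, which is what we were hoping to see.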
When we create a batch load task, we can specify single-dimension partitioning like this:
"partitionsSpec": {
"type": "dimension",
"targetPartitionSize": 5000000
"partitionDimension": tenant_id
}
Is there such an option for the kafka-indexing-service on stream ingestion?
Does anybody have a sample supervisor spec JSON that partitions a Druid data source by tenant_id, or by a dimension other than timestamp?
Thanks
Hong
On Thursday, July 27, 2017 at 9:42:20 AM UTC+9, Hong Wang wrote:
I am looking for the same thing.
Did you ever get this to work?
I am looking for the answer too; please do share if you figure it out.
Thanks,