Hi,
I have a Kafka topic with 20 partitions, and data has been published to it for the past 10 days. I created 3 Druid supervisors reading from this same Kafka topic (each one creates a new data source), identical except for the `taskCount` in `ioConfig`:
- Supervisor 1: useEarliestOffset = true, ioConfig.replicas = 2, taskCount = 1
- Supervisor 2: useEarliestOffset = true, ioConfig.replicas = 2, taskCount = 3
- Supervisor 3: useEarliestOffset = true, ioConfig.replicas = 2, taskCount = 5
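For context, my understanding is that the supervisor spreads Kafka partitions across its task groups roughly by `partition % taskCount` (this is an assumption on my part, not something I've verified in the Druid source). A quick sketch of what that would mean for my 20-partition topic:

```python
def partitions_per_task(num_partitions, task_count):
    """Map each task-group id to the Kafka partitions it would consume,
    assuming a simple partition % taskCount assignment."""
    groups = {}
    for p in range(num_partitions):
        groups.setdefault(p % task_count, []).append(p)
    return groups

# My three supervisors against the 20-partition topic:
for tc in (1, 3, 5):
    sizes = {g: len(parts) for g, parts in partitions_per_task(20, tc).items()}
    print(f"taskCount={tc}: partitions per task group = {sizes}")
```

So with taskCount = 1 a single task reads all 20 partitions, while with taskCount = 5 each task reads 4 partitions; if each task writes its own segments per interval, I would expect that to change segment counts.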
Below is my supervisor template:
```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "datasource_name",
    "parser": {
      "type": "avro_stream",
      "avroBytesDecoder": {
        "type": "schema_registry",
        "url": "SCHEMA-REGISTRY-ADDRESS"
      },
      "parseSpec": {
        "format": "avro",
        "timestampSpec": {
          "column": "date_time_utc",
          "format": "yyyy-MM-dd HH:mm:ss"
        },
        "flattenSpec": {
          "useFieldDiscovery": true,
          "fields": [
            {
              "name": "suite_id",
              "type": "path",
              "expr": "$.suite.id"
            },
            {
              "name": "suite_name",
              "type": "path",
              "expr": "$.suite.name"
            }
          ]
        },
        "dimensionsSpec": {
          "dimensions": [
            "suite_name",
            "suite_source"
          ],
          "dimensionExclusions": [],
          "spatialDimensions": []
        }
      }
    },
    "metricsSpec": [
      {
        "type": "hyperUnique",
        "name": "unique_suite_ids",
        "fieldName": "suite_id"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "HOUR",
      "rollup": true
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxSavedParseExceptions": 100,
    "resetOffsetAutomatically": true
  },
  "ioConfig": {
    "topic": "kafka_topic",
    "useEarliestOffset": true,
    "replicas": 2,
    "taskCount": "{varying-values}",
    "taskDuration": "PT15M",
    "consumerProperties": {
      "bootstrap.servers": "KAFKA-BOOTSTRAP-SERVER-ADDRESS"
    }
  }
}
```
**I am seeing different segment counts and sizes for each data source, even though all three supervisors consume the same Kafka topic. Is this expected?**
**How does the `ioConfig.taskCount` value affect segment size and count?**
Please help me out here.
Regards,
Vinay Patil