Please help me understand this situation.
I have a sample of my data (~50,000 rows).
When I ingest it through the native batch engine ("type": "index"),
I see that the segment size is about 1.1 MB (http://my-druid-host:8081/#/datasources/indicators).
When I ingest the same data through the Kafka indexing service ("type": "kafka"),
I see that the segment size is about 575 KB (http://my-druid-host:8081/#/datasources/indicators).
Why do the segments differ in size for the same input data?
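In case it helps to reproduce, the sizes can also be checked outside the web console with a segmentMetadata query posted to the broker. A minimal sketch (the interval placeholders are the same ones used in the batch spec below):

{
  "queryType": "segmentMetadata",
  "dataSource": "indicators",
  "intervals": ["{interval_start}/{interval_end}"],
  "analysisTypes": ["size", "interval"]
}

Each entry in the response includes the segment id, its size in bytes, and its row count.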
– BATCH INDEX
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "indicators",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "time",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": ["indicator", "unit", "unit_path", { "name": "value", "type": "double" }],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec": [],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "rollup": false,
        "intervals": ["{interval_start}/{interval_end}"]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "baseDir": "{baseDir}",
        "filter": "{filter}"
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index",
      "targetPartitionSize": 5000000,
      "maxRowsInMemory": 25000,
      "forceExtendableShardSpecs": false
    }
  }
}
– KAFKA INDEX
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "indicators",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "time",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": ["indicator", "unit", "unit_path", { "name": "value", "type": "double" }],
          "dimensionExclusions": [],
          "spatialDimensions": []
        }
      }
    },
    "metricsSpec": [],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "DAY",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 5000000
  },
  "ioConfig": {
    "topic": "druid_stream_ingestion",
    "consumerProperties": {
      "bootstrap.servers": "{bootstrap_servers}"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1m"
  }
}
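One thing I have not tried yet: compacting the Kafka-created segments before comparing. If the streaming tasks publish more than one segment per day, a compaction task along the lines of the sketch below should merge them first (this assumes a Druid version where the compaction task accepts a top-level interval; newer releases express it through an ioConfig with an inputSpec instead, and the interval is again a placeholder):

{
  "type": "compact",
  "dataSource": "indicators",
  "interval": "{interval_start}/{interval_end}",
  "tuningConfig": {
    "type": "index",
    "maxRowsInMemory": 25000
  }
}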