Size of segment files (Kafka indexing service vs. native batch index)

Please help me understand the following situation.

I have a sample of my data (~50,000 rows).

When I ingest it through the native batch engine ("type": "index"), the resulting segment is about 1.1 MB (as shown at http://my-druid-host:8081/#/datasources/indicators).

When I ingest the same data through the Kafka indexing service ("type": "kafka"), the resulting segment is about 575 KB (as shown at http://my-druid-host:8081/#/datasources/indicators).

Why are the segment sizes different for the same input data?
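One check that might narrow this down is whether both ingestion paths ended up with the same number of rows per segment. The sketch below builds a Druid `segmentMetadata` query for the `indicators` datasource and shows how it could be posted to a Broker. The Broker address (`http://my-druid-host:8082`), the example interval, and the helper names are assumptions for illustration only, not part of the setup described here.

```python
import json
import urllib.request


def build_segment_metadata_query(datasource, interval):
    """Build a segmentMetadata query for one datasource and one interval."""
    return {
        "queryType": "segmentMetadata",
        "dataSource": datasource,
        "intervals": [interval],
        "merge": False,  # keep one result entry per segment
    }


def run_query(broker_url, query):
    """POST a native query to the Broker and return the decoded JSON response."""
    request = urllib.request.Request(
        broker_url.rstrip("/") + "/druid/v2/",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def rows_per_segment(results):
    """Map each segment id to the numRows value reported for it."""
    return {segment["id"]: segment.get("numRows") for segment in results}


# Hypothetical usage (assumed Broker address and interval):
# query = build_segment_metadata_query("indicators", "2018-01-01/2018-01-02")
# print(rows_per_segment(run_query("http://my-druid-host:8082", query)))
```

If the row counts differ between the two runs, the size gap is probably about how many rows were stored rather than how they were compressed.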

– BATCH INDEX

{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "indicators",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "time",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": ["indicator", "unit", "unit_path", { "name": "value", "type": "double" }],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec": [],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "rollup": false,
        "intervals": ["{interval_start}/{interval_end}"]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "baseDir": "{baseDir}",
        "filter": "{filter}"
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index",
      "targetPartitionSize": 5000000,
      "maxRowsInMemory": 25000,
      "forceExtendableShardSpecs": false
    }
  }
}

– KAFKA INDEX

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "indicators",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "time",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": ["indicator", "unit", "unit_path", { "name": "value", "type": "double" }],
          "dimensionExclusions": [],
          "spatialDimensions": []
        }
      }
    },
    "metricsSpec": [],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "DAY",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 5000000
  },
  "ioConfig": {
    "topic": "druid_stream_ingestion",
    "consumerProperties": {
      "bootstrap.servers": "{bootstrap_servers}"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1m"
  }
}
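Since the two specs are long, a quick way to see where they diverge is to diff the matching sections key by key. The sketch below does this for the two `granularitySpec` blocks, copied by hand from the specs above. `diff_specs` and the `<not set>` marker are hypothetical helpers written for this post, not anything provided by Druid.

```python
MISSING = "<not set>"


def diff_specs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key that is absent on one side is reported with the MISSING marker.
    """
    differences = {}
    for key in sorted(set(a) | set(b)):
        left = a.get(key, MISSING)
        right = b.get(key, MISSING)
        if left != right:
            differences[key] = (left, right)
    return differences


# granularitySpec from the batch ("index") spec
batch_granularity = {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "NONE",
    "rollup": False,
    "intervals": ["{interval_start}/{interval_end}"],
}

# granularitySpec from the Kafka spec
kafka_granularity = {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "NONE",
}

differences = diff_specs(batch_granularity, kafka_granularity)
for key, (batch_value, kafka_value) in differences.items():
    print(f"{key}: batch={batch_value!r} kafka={kafka_value!r}")
```

For these two blocks the only differing keys are `intervals` and `rollup`: the batch spec sets `"rollup": false`, while the Kafka spec does not set it at all. If an omitted `rollup` falls back to being enabled (my reading of the Druid documentation, not something verified here), the Kafka task may be combining rows that the batch task keeps separate, which seems worth ruling out. The `tuningConfig` sections differ as well and could be compared the same way.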