Rollup of real-time ingestion data

Hi,

Is it possible to roll up data ingested in real time via Kafka? This is my supervisor spec:

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "dataSource_1",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "CreationTime",
          "format": "ddMMyyyyHHmmss"
        },
        "dimensionsSpec": {
          "dimensions": [],
          "dimensionExclusions": [
            "CreationTime"
          ]
        }
      }
    },
    "metricsSpec": [
      {
        "name": "Metric_1",
        "type": "doubleSum",
        "fieldName": "Metric_1"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "HOUR",
      "rollup": true
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 10000000,
    "maxRowsInMemory": 100000,
    "resetOffsetAutomatically": true
  },
  "ioConfig": {
    "topic": "topic_1",
    "consumerProperties": {
      "bootstrap.servers": "IP:PORT"
    },
    "taskCount": 1,
    "replicas": 1
  }
}

My requirement is to store data after aggregating it, so I want to know if the ROLLUP feature is the right choice.

Cheers.

Will it work if the data for the 12 to 1 PM window arrives in the Kafka topic around 4 PM?

My requirement is to store data after aggregating it, so I want to know if the ROLLUP feature is the right choice.

Yes, rollup is supported for Kafka indexing as well; it works identically to rollup in batch tasks.
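To illustrate (with hypothetical field names and values): with HOUR queryGranularity, rows whose timestamps fall into the same hour and that share identical dimension values are collapsed into a single stored row, and the doubleSum aggregator adds up the metric. For example, these two input events:

```json
[
  {"CreationTime": "01012019120500", "cellId": "A", "Metric_1": 2.0},
  {"CreationTime": "01012019123000", "cellId": "A", "Metric_1": 3.0}
]
```

would be stored as one row at timestamp 2019-01-01T12:00:00 with Metric_1 = 5.0, because both fall in the 12:00 hour and have the same cellId. A row with a different cellId in the same hour would be stored separately.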

Thanks,

Jon

Thanks Jon.

I have tried using rollup with the Kafka indexing service, but I have come across a few issues:

1 —>> Though segmentGranularity is HOUR, it creates more than one segment and shards the data. How can I configure (force) it to create just one segment per hour?

  "granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "HOUR",
    "queryGranularity": "HOUR",
    "rollup": true
  }
},
"tuningConfig": {
  "type": "kafka",
  "maxRowsPerSegment": 10000000,
  "maxRowsInMemory": 100000,
  "resetOffsetAutomatically": true,
  "forceExtendableShardSpecs": true
},
"ioConfig": {
  "topic": "t_gNodeB",
  "consumerProperties": {
    "bootstrap.servers": "xyz,abc"
  },
  "taskCount": 1,
  "replicas": 1,
  "useEarliestOffset": true,
  "appendToExisting": true
}

2 —> The data is not being compacted: the rolled-up data is similar in size to what I have in raw form (without rollup).

Any help is much appreciated…

Happy Learning !!

1 —>> Though segmentGranularity is HOUR, it creates more than one segment and shards the data. How can I configure (force) it to create just one segment per hour?

I’m not sure what shards are getting created in your environment, but I would say the recommended approach is to let the Kafka indexing tasks create multiple segments (for memory pressure reasons) and use a compaction task to merge the segments later on.
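As a sketch of that approach (the interval and tuning values below are placeholders, and the dataSource is taken from the spec earlier in this thread), a compaction task that merges one hour's shards into fewer segments could look like:

```json
{
  "type": "compact",
  "dataSource": "dataSource_1",
  "interval": "2019-01-01T12:00:00/2019-01-01T13:00:00",
  "tuningConfig": {
    "type": "index",
    "maxRowsPerSegment": 10000000
  }
}
```

You would submit this to the Overlord after the hour's segments have been handed off, so the realtime tasks can keep writing small segments while compaction consolidates them in the background.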

2 —> The data is not being compacted: the rolled-up data is similar in size to what I have in raw form (without rollup).

The amount of reduction depends on the input data; perhaps your input has few repeated rows per unique set of dimension values.
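For example (hypothetical rows), if a high-cardinality dimension such as a session ID is ingested, every row within the hour can end up unique, so nothing combines:

```json
[
  {"CreationTime": "01012019120501", "sessionId": "s1", "Metric_1": 2.0},
  {"CreationTime": "01012019120502", "sessionId": "s2", "Metric_1": 3.0}
]
```

Because the sessionId values differ, these remain two stored rows even with rollup enabled. Excluding such dimensions (e.g. via dimensionExclusions, as you already do for CreationTime) is what allows rows to collapse and the data size to shrink.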

Thanks,

Jon