Druid rollup

Hi,

I have created below an inputSpec.

{

“type”: “kafka”,

“dataSchema”: {

“dataSource”: “client_1_SLAAggregation”,

“parser”: {

“type”: “sla_aggregation”,

“parseSpec”: {

“format”: “json”,

“flattenSpec”: {

“useFieldDiscovery”: true,

“fields”: [

{

“type”: “path”,

“name”: “instrumentCreatedTime”,

“expr”: “$.payload.outboundMsg.instrumentCreatedTime”

},

{

“type”: “path”,

“name”: “task_id”,

“expr”: “$.taskId”

},

{

“type”: “path”,

“name”: “task_name”,

“expr”: “$.taskName”

},

{

“type”: “path”,

“name”: “queue_id”,

“expr”: “$.queueId”

},

{

“type”: “path”,

“name”: “queue_name”,

“expr”: “$.queueName”

},

{

“type”: “path”,

“name”: “slaStatusValue”,

“expr”: “$.slaStatusValue”

},

{

“type”: “path”,

“name”: “taskStatusValue”,

“expr”: “$.taskStatusValue”

}

]

},

“timestampSpec”: {

“column”: “instrumentCreatedTime”,

“format”: “auto”

},

“dimensionsSpec”: {

“dimensions”: [

“task_name”,

“task_id”,

“queue_name”,

“queue_id”,

“process_name”,

“sla_rule_name”

]

}

}

},

“metricsSpec”: [

{

“type”: “longMax”,

“name”: “sla_status_value”,

“fieldName”: “slaStatusValue”

},

{

“type”: “longMax”,

“name”: “task_status_value”,

“fieldName”: “taskStatusValue”

},

{

“type”: “longMin”,

“name”: “remaining_time_value”,

“fieldName”: “remainingTimeValue”

}

],

“granularitySpec”: {

“type”: “uniform”,

“segmentGranularity”: “week”,

“queryGranularity”: “none”,

“rollup”: true

}

},

“tuningConfig”: {

“type”: “kafka”,

“maxRowsInMemory” : 12000000,

“maxRowsPerSegment”: 12000000

},

“ioConfig”: {

“topic”: “client_1_SLAMonitoring”,

“taskCount”: 1,

“consumerProperties”: {

“bootstrap.servers”: “localhost:9092”

}

}

}

In some cases rolling is fast but in some cases it is taking to rollup. Due to this previous record is also there with the updated record but it is taking time to replace the previous record.

Could anyone please help on druid rollup if anything I have missed and I am using druid-0.12.3.

Thanks,

Monica

Hi Monica,

Could you explain more on what do you mean by , In some cases rolling is fast but in some cases it is taking to rollup ?

To understand more on Rollup in druid you can watch this video: https://www.youtube.com/watch?v=u551R7voe7w

Hope this helps.

Thanks,

Vaibhav

Yes it is taking to rollup.

I did not understand your issue well, It will be great if you can explain the issue in the detail.

Thanks,

Vaibhav

Rollup behavior is not consistent. Whenever a query is fired, every time we expect query result to include only rolled up/merged records. But this is not happening always. Some times rollup happens quickly and queried result returns only merged/rolled up records. But other times queried result involves non merged records as well. So, is it possible to control query results so that query returns results after rolling up/merging always and also rollup has minimum latency.

Hi Monica,

I see you have queryGranularity set to NONE in your ingestion specs.

“granularitySpec”: {

“type”: “uniform”,

“segmentGranularity”: “week”,

“queryGranularity”: “none”,

“rollup”: true

}

That means you will get the timestamp as it is coming from the source. Now could you please provide below details to understand you better:

1)what is your query,

2)What is expected output and

3)What is the actual output?

Thanks,

Vaibhav

I am working in the same team. So replying behalf of Monica…

We have SLA monitoring system. Here different system conditions passing through sequential states starting from normal to abnormal(critical). Application sends many SLA events per minute and with states like normal, warning, server, critical … etc. Anytime user queries, the current state of all the SLAs of all the system conditions should be returned and in addition to count of how many are there each SLA state. We want to roll-up the state to latest, so used slaStatusValue to hold only max value.

But query was not returning rolled-up data most of the times and last two or three previous states were retained along with latest state for sometime. So groupBy was returning duplicated for previous states ideally it should have counted only the latest state. To address this we created CompactTask to run roll-up explicitly. But this also did not solve our purpose. This is what we tried

  1. When the priority of compaction task was more than ingestion task, ingestion task was failing

  2. When the priority of ingestion was more than compaction, compaction tasks were going to waiting state

Our queries look like

{

“queryType” : “groupBy”,

“dataSource” : “SLAAggregation”,

“intervals” : [“2019-10-01/2019-10-10”],

“dimensions” : [“sla_status_value”, “assigned_to_user_id”, “team_name”, “team_id”],

“granularity” : “day”,

“threshold” : 150,

“filter”: { “type”: “selector”, “dimension”: “task_id”, “value”: “OCT_1_TEST:1:386932ee-e447-11e9-89b1-7ea1e161385a”

.

.

.

},

“aggregations” : [

{

“type” : “count”,

“name” : “sla_status_value”,

“fieldName” : "SLA "

}

]

}

It would be great, if someone can provide solution for our issue

Thanks,

Amba

Can someone please provide solution for it