Sorting raw data grouped at hourly level

Hi,

We have data at 15 minute granularity. we want to get the max value at hourly granularity from them and show the top 3 values among them.

I am using below query which gives me max value at hourly granularity but the sorting is happening only at hour granularity rather than all the data.

{

“queryType” : “groupBy”,

“dataSource” : “NodeB”,

“granularity” : “hour”,

“intervals” : [ “2019-07-24T06:00:48.959Z/2019-07-24T12:36:48.959Z” ],

“descending” : “true”,

“aggregations” : [ {

"type" : "doubleMaxExtended",

"name" : "check_NodeB_C_1",

"fieldName" : "NodeB_C_1"

} ],

“postAggregations” : ,

“virtualColumns” : ,

“context” : {

"timeout" : null,

"queryId" : "iDiLJbKkqU",

"queryPriority" : null

},

“dimensions” : [ ],

“limitSpec” : {

"type" : "default",

"limit" : 3,

"columns" : [ {

  "dimension" : "check_NodeB_C_1",

  "direction" : "descending",

  "dimensionOrder" : "numeric"

} ]

},

“having” : null

}

Please suggest solution.

Thanks

Amit

Hi All,
Please suggest solution.

Thanks

Amit

Hi Amit:

The output will always sorted by hour because your query granularity is hour. Do you want to sort all the outputs from check_NodeB_C_1 column in desc order, and ignore the hours they are in? Maybe you can write a quick script to process that outside Druid?

Thanks

Hi Amit,

Druid will always sort within a granularity. You can think of your query as making N queries, on for each granularity ‘hour’ in your case.

To do what you want you should set the granularity to ‘all’ and put the hour bucketing into the dimensions list.

In fact I think you need to do a nested group by.

I HIGHLY advice you to use SQL to compute this, it will be much easier to express what you want with DruidSQL (not that Druid SQL will always do granualrity ‘all’)

Best regards

Vadim