Top N percentage of the distribution

Hi,

I’m new to Druid and working on a use case, where I need to find the top x percentage of rows from a given dataset based on some metric.

I was able to calculate the percentages then using “topN” query, Druid is returning the N number of results ordered by percentage of the metric. But, I’m looking for top 30% and bottom 70% of the distribution (based on the percentage). Just wondering whether its possible with Druid?

One option is, I can iterate thru the results until it reaches 30% but this approach is not scalable as I don’t know the threshold value.

What I’m expecting is when I ask for top 30% of the dataset, it should return the rows whose sum of the percentages should become to 30. ( From the following response, it should return only top 2 rows (16% + 14%) ).

Here is the query I’m using:

{

“queryType” : “topN”,

“dataSource” : “agg_channel_account_metrics_daily_v1”,

“intervals” : [“2018-09-19/2018-09-21”],

“granularity” : “all”,

“dimension” : “channel_id”,

“metric”: {

“metric”: “some_metric_percent”

},

“threshold” : 20,

“filter”: {

“type”: “and”,

“fields”: [

{

“type”: “selector”,

“dimension”: “category”,

“value”: “All”

},

{

“type”: “selector”,

“dimension”: “type”,

“value”: “ABC”

}

]

},

“aggregations” : [

{

“type” : “longSum”,

“name” : “total_dimension1_by_channel”,

“fieldName” : “total_dimension1”

}

],

“postAggregations” : [

{ “type” : “constant”,

“name” : “total_dimension1_by_type”,

“value” : 100000

},

{

“type”: “javascript”,

“name”: “some_metric_percent”,

“fieldNames”: [“total_dimension1_by_type”, “total_dimension1_by_channel”],

“function”: “function(total_dimension1_by_type, total_dimension1_by_channel) { dimension1_percent = (total_dimension1_by_channel * 100) / total_dimension1_by_type; return Math.round(dimension1_percent*100)/100; }”

}

]

}

Following are the results (only for illustration, these are not the actual percentages) .

[

{

“timestamp”: “2018-09-19T00:00:00.000Z”,

“result”: [

{

“total_dimension1_by_channel”: 16000,

“total_dimension1_by_type”: 100000,

“channel_id”: “12”,

“some_metric_percent”: 16.00

},

{

“total_dimension1_by_channel”: 14000,

“total_dimension1_by_type”: 100000,

“channel_id”: “13”,

“some_metric_percent”: 14.00

},

{

“total_dimension1_by_channel”: 10000,

“total_dimension1_by_type”: 100000,

“channel_id”: “14”,

“some_metric_percent”: 10.00

},

{

“total_dimension1_by_channel”: 10000,

“total_dimension1_by_type”: 100000,

“channel_id”: “15”,

“some_metric_percent”: 10.00

},

{

“total_dimension1_by_channel”: 10000,

“total_dimension1_by_type”: 100000,

“channel_id”: “16”,

“some_metric_percent”: 10.00

},

{

“total_dimension1_by_channel”: 10000,

“total_dimension1_by_type”: 100000,

“channel_id”: “17”,

“some_metric_percent”: 10.00

},

{

“total_dimension1_by_channel”: 10000,

“total_dimension1_by_type”: 100000,

“channel_id”: “18”,

“some_metric_percent”: 10.00

},

{

“total_dimension1_by_channel”: 10000,

“total_dimension1_by_type”: 100000,

“channel_id”: “19”,

“some_metric_percent”: 10.00

},

{

“total_dimension1_by_channel”: 10000,

“total_dimension1_by_type”: 100000,

“channel_id”: “20”,

“some_metric_percent”: 10.00

}

}

]

Any help would be appreciated.

Thanks,

Ram