Druid - Timeseries with GROUP BY capabilities

Hello there,

It seems Druid does not support timeseries queries with GROUP BY capabilities.

Is there an obvious reason that I am missing?

Regards,

Yannick

Druid has a separate kind of query that handles GroupBy: http://druid.io/docs/latest/querying/groupbyquery.html

Hi David,

Thank you for the reply, but I meant GROUP BY capabilities for the TIMESERIES query type specifically.

At my company we need to visualize a large number of graphs, for instance the same metric for many different servers, for different kinds of processes, etc.

Using the Druid groupBy query type means a lot of overhead, like the following (a very schematic view):

{
  "server": "ServerNameA",
  "someprocess": "processA",
  "somemetric": "value",
  "timestamp": "TS at t"
}

{
  "server": "ServerNameA",
  "someprocess": "processA",
  "somemetric": "value",
  "timestamp": "TS at t+1"
}

{
  "server": "ServerNameA",
  "someprocess": "processA",
  "somemetric": "value",
  "timestamp": "TS at t+2"
}

etc…

As you can see, there is a lot of overhead (the dimensions are repeated n times for n points).

What we wonder is whether we can have something similar to this:

{
  "timeseries": [
    {
      "server": "serverA",
      "someprocess": "processA",
      "points": [
        { "timestamp": "ts at t", "metric": "value" },
        { "timestamp": "ts at t+1", "metric": "another value" },
        { "timestamp": "ts at t+2", "metric": "value" }
      ]
    },
    {
      "server": "serverB",
      "someprocess": "processA",
      "points": [
        { "timestamp": "ts at t", "metric": "value" },
        { "timestamp": "ts at t+1", "metric": "another value" },
        { "timestamp": "ts at t+2", "metric": "value" }
      ]
    },
    etc…
  ]
}

I may have made some JSON mistakes, but the main idea is there.

Is there any chance of getting such behavior with Druid, or does this make no sense for some obvious reason I don’t know?
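To make the target shape concrete, here is a minimal Python sketch that regroups flat rows into one entry per dimension combination, on the client side. It assumes rows shaped like the schematic view above (real groupBy results are wrapped differently), and the function name nest_rows and its parameters are hypothetical, chosen only for illustration.

```python
def nest_rows(rows, dimensions=("server", "someprocess"), metric="somemetric"):
    """Regroup flat rows into one series per unique dimension combination.

    Assumes each row is a dict holding the dimension values, the metric
    value, and a "timestamp" key, as in the schematic view above.
    """
    series = {}  # dicts keep insertion order, so series appear in first-seen order
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        entry = series.get(key)
        if entry is None:
            # First time this dimension combination is seen: store it once.
            entry = dict(zip(dimensions, key))
            entry["points"] = []
            series[key] = entry
        # Only the timestamp and the metric are repeated per point.
        entry["points"].append(
            {"timestamp": row["timestamp"], "metric": row[metric]}
        )
    return {"timeseries": list(series.values())}
```

This does not reduce what Druid sends over the wire; it only shows the nested layout in which the dimensions appear once per series instead of once per point.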

Regards,

Yannick

On Sun, Nov 25, 2018 at 22:44, David Glasser glasser@apollographql.com wrote:

Hi Yannick,

I think it makes sense, but it is not supported yet.

I think the best way to support this is to add some new aggregator types (like the one requested in https://github.com/apache/incubator-druid/issues/2313) that merge inputs into an array.

With such an aggregator, you could use a nested groupBy to reshape the output.

But this will take a long time. Meanwhile, you can use SQL to reduce the output size. It doesn’t reduce it as much as the nested output format you described, but it is still better than the native groupBy query.
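As a rough sketch of the SQL route, the Python snippet below builds a request body for Druid's SQL HTTP endpoint (POST to /druid/v2/sql/) with "resultFormat" set to "array", so each row comes back as a plain JSON array rather than an object that repeats the column names. The table name, the column names, the one-minute bucket, and the SUM aggregation are assumptions for illustration, and the helper build_sql_request is hypothetical.

```python
import json


def build_sql_request(table, dimensions, metric, start, end):
    """Build a Druid SQL request body that asks for array-shaped rows.

    Assumptions (not from the thread): one-minute time buckets and a SUM
    over the metric column. Identifiers are double-quoted for Druid SQL.
    """
    dims = ", ".join(f'"{d}"' for d in dimensions)
    # Group by the time bucket (column 1) and every dimension after it.
    ordinals = ", ".join(str(i) for i in range(1, len(dimensions) + 2))
    sql = (
        f'SELECT FLOOR(__time TO MINUTE) AS ts, {dims}, '
        f'SUM("{metric}") AS "{metric}" '
        f'FROM "{table}" '
        f"WHERE __time >= TIMESTAMP '{start}' AND __time < TIMESTAMP '{end}' "
        f"GROUP BY {ordinals}"
    )
    return {"query": sql, "resultFormat": "array"}


# Hypothetical datasource and columns, matching the schematic in this thread.
payload = build_sql_request(
    "metrics", ["server", "someprocess"], "somemetric",
    "2018-11-25 00:00:00", "2018-11-26 00:00:00",
)
body = json.dumps(payload)  # send as the body of a POST to /druid/v2/sql/
```

The dimension values are still repeated on every row, so this is smaller than the native groupBy output but not as compact as the nested layout.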

Jihoon

Hi Jihoon,

OK, thanks for the info.

An aggregator would be a good option indeed.

I feel like I want to help.

Yannick

Hi Yannick,

Druid is open to contributions from anyone!

Please let me know if you need anything!

Jihoon