Custom Aggregation Query

Hi,
I’ve been wondering if it’s possible to query Druid data with GROUP BY but without an aggregation function.
I noticed that this is possible when using InfluxDB.

Something like: SELECT * FROM TABLE WHERE TS > '15/12/2018 12:12:12' GROUP BY ID

So, is there any way to do that? Maybe a custom aggregation function or …

Thanks

What would the result of that query be (in InfluxDB)? Could you please give an example of the output?

In general you do not need aggregates, but you do need to list out the columns in the SELECT clause:

SELECT page, channel FROM "wikipedia" WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY GROUP BY 1, 2
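
To illustrate what a GROUP BY without aggregates returns, here is a minimal sketch. It uses SQLite purely as a stand-in SQL engine (not Druid), and the table contents are made up for the example. Under those assumptions, the query returns one row per distinct (page, channel) pair, the same rows as SELECT DISTINCT:

```python
import sqlite3

# SQLite is only a stand-in engine here; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wikipedia (page TEXT, channel TEXT)")
conn.executemany(
    "INSERT INTO wikipedia VALUES (?, ?)",
    [("Druid", "#en"), ("Druid", "#en"), ("Druid", "#fr"), ("Spark", "#en")],
)

# GROUP BY with no aggregate functions: one row per distinct (page, channel).
grouped = conn.execute(
    "SELECT page, channel FROM wikipedia "
    "GROUP BY page, channel ORDER BY page, channel"
).fetchall()

# SELECT DISTINCT produces the same set of rows.
distinct = conn.execute(
    "SELECT DISTINCT page, channel FROM wikipedia ORDER BY page, channel"
).fetchall()

print(grouped)  # [('Druid', '#en'), ('Druid', '#fr'), ('Spark', '#en')]
```

So the duplicate ("Druid", "#en") row collapses into one; the individual rows inside each group are not returned.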

Best regards,

Vadim

Hi again,

InfluxDB returns a detailed result in which the raw rows are listed under the group they belong to, rather than being collapsed into a single row per group.

Something like this:

SELECT num1, num2 FROM "MYTABLE" WHERE "__time" >= CURRENT_TIMESTAMP - INTERVAL '1' DAY GROUP BY id1, id2

name: MYTABLE
tags: id1=ABC#123, id2=ABC
time                num1 num2
1549308606000000000 883  566
1549308617000000000 543  765
1549308627000000000 764  123

name: MYTABLE
tags: id1=ABC#456, id2=ABC
time                num1 num2
1549309011000000000 765  123
1549309020000000000 756  564
1549309030000000000 345  353

Thanks,

Tamer
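
One way to get the per-group layout shown in the InfluxDB output is to fetch flat rows with a plain SELECT (no GROUP BY) and group them on the client. The sketch below assumes the rows have already been fetched as dictionaries; the `rows` list and the `group_rows` helper are hypothetical, and the values are copied from the example output above.

```python
from itertools import groupby

# Hypothetical flat rows, shaped like the result of a non-aggregated query such as
#   SELECT __time, id1, id2, num1, num2 FROM "MYTABLE"
# Values are taken from the InfluxDB example output in this thread.
rows = [
    {"time": 1549309011000000000, "id1": "ABC#456", "id2": "ABC", "num1": 765, "num2": 123},
    {"time": 1549308606000000000, "id1": "ABC#123", "id2": "ABC", "num1": 883, "num2": 566},
    {"time": 1549308617000000000, "id1": "ABC#123", "id2": "ABC", "num1": 543, "num2": 765},
    {"time": 1549309020000000000, "id1": "ABC#456", "id2": "ABC", "num1": 756, "num2": 564},
    {"time": 1549308627000000000, "id1": "ABC#123", "id2": "ABC", "num1": 764, "num2": 123},
    {"time": 1549309030000000000, "id1": "ABC#456", "id2": "ABC", "num1": 345, "num2": 353},
]


def group_rows(rows, keys=("id1", "id2")):
    """Return {key_tuple: [rows ordered by time]} for the given grouping keys."""
    def key_fn(row):
        return tuple(row[k] for k in keys)

    # groupby only merges adjacent items, so sort by group key (then time) first.
    ordered = sorted(rows, key=lambda r: (key_fn(r), r["time"]))
    return {key: list(group) for key, group in groupby(ordered, key=key_fn)}


# Print each group in a layout similar to the InfluxDB output.
for (id1, id2), series in group_rows(rows).items():
    print(f"tags: id1={id1}, id2={id2}")
    print("time num1 num2")
    for r in series:
        print(r["time"], r["num1"], r["num2"])
    print()
```

The grouping happens entirely on the client here, so every matching row is transferred; whether that is acceptable depends on the data volume.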

Hello again,

My use case is to store time-series data in Apache Druid and query it from Apache Spark, so the question is:

Is there a way to store the data in Druid partitioned by some columns, and then retrieve it with a query from Spark in the same layout it is stored in?

Any advice here?

Regards

Tamer

Hey Tamer,

In general it makes more sense to use Druid queries to query Druid data rather than to query it from some other system, since Druid is designed to be a query system too. But you can load Druid segments from Spark if you want. I don’t know of any officially supported options on either end, but you might find this useful: https://github.com/implydata/druid-hadoop-inputformat (with modifications to support the latest versions)

Gian