Want to query Druid with SQL?

If you ever find yourself wishing to use SQL to query Druid check out: https://github.com/facetjs/facet-cli

It is a CLI for facetjs ( https://github.com/facetjs/facetjs ). It does not do much beyond providing a command line utility for facetjs.

It works by taking your query, which can be a facet query or an fSQL query that is parsed into a facet query internally, and executing it using facet’s Druid query planner.

Sub-query nesting is supported allowing for GROUP BYs within GROUP BYs (check out the last example in the readme).

If you are interested in the queries that are happening under the hood run the cli with the --verbose flag set.

This may be useful if you are learning how Druid queries work.

The goal is to eventually support all the goodness of ANSI SQL SELECT queries including sub-query filters, JOINs, Window functions, e.t.c.

If you check it out please provide feedback via the issues on GH.

My favorite SQL query that I’ve tried so far is this one, because it “works” even though I’m not sure if it actually makes sense or not:

$ facet -h broker -q “SELECT SUM(__time) FROM twitterstream WHERE __time BETWEEN ‘2015-04-01’ AND ‘2015-05-01’”

[

{

“SUM___time”: 131535726041800640000

}

]

You can never have enough time.

Oh wow, that is a very funny find.

Running that query in verbose mode (-v) you can see

$ facet -h broker -q “SELECT SUM(__time) FROM twitterstream WHERE __time BETWEEN ‘2015-04-01’ AND ‘2015-05-01’” -v

You can see that it issues the following Druid query:

{

“queryType”: “timeseries”,

“dataSource”: “twitterstream”,

“intervals”: [

"2015-04-01/2015-05-01T00:00:01"

],

“granularity”: “all”,

“aggregations”: [

{

  "name": "SUM___time",

  "type": "doubleSum",

  "fieldName": "__time"

}

]

}

I did not know that Druid could aggregate over __time. Learn something new every day.

It’s a secret to everybody

Vidam,

This looks very exciting, thanks for sharing! I think I’ll learn a lot from verbose mode.

Conrad

Vadim, this is very cool! Is there a good reference for fSQL that you could point me to? I find myself tripped up by simple stuff like making a multi-dimension GROUP BY clause, e.g.,

select count(), dim_A, dim_B from datasource group by dim_A, dim_B

returns

Could not parse query as SQL: SQL parse error Expected “!=”, “<”, “<=”, “<>”, “=”, “>”, “>=”, “AND”, “OR”, “in”, Mandatory Whitespace, [*/], [+-] or end of input but “,” found. on select count(), dim_A, dim_B from datasource group by dim_A, dim_B

Great point.
Multidimensional GROUP BYs are not supported by facet at this time and the error it gave you must be made a lot clearer.

I have opened a GitHub issue about that https://github.com/facetjs/facet-cli/issues/2

I have also started a document today to which I hope to add over the coming week: http://facetjs.org/docs/fSQL

Finally regarding actually making GROUP BYs on several dimensions: that is currently not supported but what is possible is to make nested subqueries (see last example in https://github.com/facetjs/facet-cli/blob/master/README.md). I believe it is a much more useful tool than multidimensional GROUP BYs. I probably need to write some docs about why I believe that to be the case.

Great, thanks!

I just stumbled on my own thread while googling “Druid SQL”. lol.
It should be noted for anyone looking at this thread that Druid now supports SQL natively http://druid.io/docs/latest/querying/sql

You can query it from the command line (http://druid.io/docs/latest/tutorials/tutorial-query.html#dsql-client), via http (http://druid.io/docs/latest/querying/sql.html#client-apis), or from the built in web console (http://druid.io/docs/latest/operations/druid-console.html#sql)