Help with druid query object creation

Hi All,

I am trying to convert a HQL query(which is similar to SQL) into a Druid Query Object(started with GroupBy query). Assume a sample query as follows :

Considering a sample query "select a,b from table where date>=start_time and date<=end_time and a!=‘test’ "

I notice that there are selector and logical filters , as well as intervals in Druid which can be be used for where clauses. But couldn’t find an exact equivalent for predicates like >=, <=, between , in , not in …etc .

Can someone please let me know how to proceed from here? That will be really helpful.

Regards

Rajitha

Hi Rajitha,

Since you want to grab data from two sources you need to use a “GroupBy query”. I will (try) to build a druid query based on you SQL example.

{
“queryType”: “groupBy”,
“dataSource”: {
“type”: “table”,
“name”: “table_name”
},
“granularity”: “day”,
“dimensions”: [
“a”,
“b”
],
“aggregations”: [
{
“type”: “count”,
“name”: “row_count”
},
{
“type”: “longSum”,
“name”: “b_sum”,
“fieldName”: “b”
}
],
“filter”: {
“type”: “not”,
“field”: {
“filter”: {
“type”: “selector”,
“dimension”: “a”,
“value”: “test”
}
}
},
“intervals”: [
“/”
]
}

I hope that helps. Also take a look here, and check which properties are mandatory for groupBy query. What you also should consider is using a TimeSeries query, because you take data only from one table.

With best regards
Martin Schneider

Hi Martin,

Thanks for your reply. :slight_smile:

But in my case, I do not know upfront from the sql query as to what should be set in the Intervals section of the druid query. My query input will be something like “select sum(x) from table where y=‘somevalue’ and z=‘somevalue’” like a sql query which hits a regular rdbms table…Here y and z are completely unknown to me . Hence I don’t know as to what column is the date related column to be set in the intervals section of druid query.

Hope I am clear with my issue. Basically I am trying to integrate Druid into our generic centralised reporting system where the input can be a sql query and will hit one of the datastores based on availability of data in them. Here Druid is one of the datastore.

Regards

Rajitha

Hi Rajitha,

I get your problem. A “dirty” workaround would be to us a startDate which far in the past like “1900-01-01T00:00:00.000” and an endDate which is far in the future like “3000-12-31T23:59:59.999”. And put this as static in the query. BUT that would contradict the main purpose of druid in my eyes. Druid is great in creating dashboard, like Pivot (Google-group discussion) or Panoramix (Google-group discussion). Both examples display how druid can be used. Take a closer look at Pivot, since they provide an sql-like semantic for querying data from druid. That could provide you some solutions for your problem.

To sum it up consider the use-case where you want to use druid. Druid is create for dashboards, which often have a time boundary. On the contrary it’s not an option for creating reports on an event level.

Best

Martin

Hi Rajitha,

You might want to check out https://github.com/implydata/plyql is it a SQL-like interface to Druid (wrapped up as a command line tool). I think you may find the --verbose option particularly useful since it will show you how PlyQL converts certain SQL queries to Druid. There are some cool “tricks” encoded in PlyQL for doing things like <= > e.t.c.