Hi,
I need the Druid equivalent of this query:
SELECT DISTINCT countryName FROM wikipedia
How do I write that in Druid?
I looked into the Select query, but I am not sure what to put in pagingSpec and threshold; I need all the distinct records.
I also looked into the Scan query, but I am not sure how to get distinct records with it. Also, on a large dataset (thousands of records), the query gets stuck.
Thanks,
Tushar
Also, is the Scan query meant for streaming results? If so, how do I use it for batch?
You can use SELECT DISTINCT in Druid SQL:
select distinct "country_name"
from wikipedia2
Or a Druid native query:
[{
  "queryType": "topN",
  "dataSource": {
    "type": "table",
    "name": "wikipedia2"
  },
  "virtualColumns": [],
  "dimension": {
    "type": "default",
    "dimension": "country_name",
    "outputName": "d0",
    "outputType": "STRING"
  },
  "metric": {
    "type": "dimension",
    "previousStop": null,
    "ordering": {
      "type": "lexicographic"
    }
  },
  "threshold": 5000,
  "intervals": {
    "type": "intervals",
    "intervals": [
      "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
    ]
  },
  "filter": null,
  "granularity": {
    "type": "all"
  },
  "aggregations": [],
  "postAggregations": [],
  "descending": false
}]
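For anyone who wants to send the native query from a script, here is a minimal Python sketch. It assumes the native query endpoint is /druid/v2/ on the same broker used for the SQL example in this thread (localhost:8082), and it assumes the topN response is a list of buckets that each carry a "result" list keyed by the outputName ("d0"). The function names are made up for illustration. Note that "threshold" caps how many values come back, so it would need to be at least the number of distinct values you expect.

```python
import json
import urllib.request

# Widest possible interval, copied from the native query in this thread.
ETERNITY = "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"


def build_distinct_topn(datasource, dimension, threshold=5000):
    """Build a lexicographic topN query that lists values of one dimension."""
    return {
        "queryType": "topN",
        "dataSource": {"type": "table", "name": datasource},
        "dimension": {
            "type": "default",
            "dimension": dimension,
            "outputName": "d0",
            "outputType": "STRING",
        },
        "metric": {
            "type": "dimension",
            "previousStop": None,
            "ordering": {"type": "lexicographic"},
        },
        # Upper bound on the number of values returned.
        "threshold": threshold,
        "intervals": {"type": "intervals", "intervals": [ETERNITY]},
        "granularity": {"type": "all"},
        "aggregations": [],
        "postAggregations": [],
    }


def extract_values(response, output_name="d0"):
    """Flatten an (assumed) topN response into a plain list of values."""
    values = []
    for bucket in response:
        for row in bucket.get("result", []):
            values.append(row.get(output_name))
    return values


def run_native_query(query, broker_url="http://localhost:8082/druid/v2/"):
    """POST the query as JSON; the URL is an assumption, adjust for your cluster."""
    request = urllib.request.Request(
        broker_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

Usage would be along the lines of extract_values(run_native_query(build_distinct_topn("wikipedia2", "country_name"))).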
Eric Graham
Solutions Engineer - Imply
cell: 303-589-4581
email: eric.graham@imply.io
www.imply.io
To run the SQL over HTTP, put the query in a file, query.sql:
{
  "query": "SELECT DISTINCT country_name FROM wikipedia2"
}
and POST it to the SQL endpoint:
$ curl -XPOST -H'Content-Type: application/json' http://localhost:8082/druid/v2/sql/ -d @query.sql
[{"country_name":"AUSTRALIA"}]
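The same request can be made from Python instead of curl. This is only a sketch: it reuses the URL from the curl command above and assumes the response is a JSON list of row objects, as in the sample output. The function names are hypothetical.

```python
import json
import urllib.request


def build_sql_payload(sql):
    """Wrap a SQL string in the JSON body that the SQL endpoint expects."""
    return {"query": sql}


def column_values(rows, column):
    """Pull one column out of a list of row objects."""
    return [row[column] for row in rows]


def run_sql(sql, sql_url="http://localhost:8082/druid/v2/sql/"):
    """POST the SQL to the broker; the URL is taken from the curl example."""
    request = urllib.request.Request(
        sql_url,
        data=json.dumps(build_sql_payload(sql)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

For example, column_values(run_sql("SELECT DISTINCT country_name FROM wikipedia2"), "country_name") would give a plain list of country names.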
Eric Graham