Select/Scan in Druid

Hi,

I need to run "SELECT DISTINCT countryName FROM wikipedia". How do I write that in Druid?

I looked into the Select query, but I am not sure what to put in pagingSpec and threshold, since I need all the distinct records.

I also looked into the Scan query, but I am not sure how to get distinct records with it. Also, on a large dataset (thousands of records), the query gets stuck.

Thanks,

Tushar

Also, is Scan meant for streaming? If so, how do I use it for batch?

You can use Druid SQL SELECT DISTINCT:

SELECT DISTINCT "country_name"
FROM wikipedia2

Or Druid native:

[{
  "queryType": "topN",
  "dataSource": {
    "type": "table",
    "name": "wikipedia2"
  },
  "virtualColumns": [],
  "dimension": {
    "type": "default",
    "dimension": "country_name",
    "outputName": "d0",
    "outputType": "STRING"
  },
  "metric": {
    "type": "dimension",
    "previousStop": null,
    "ordering": {
      "type": "lexicographic"
    }
  },
  "threshold": 5000,
  "intervals": {
    "type": "intervals",
    "intervals": [
      "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
    ]
  },
  "filter": null,
  "granularity": {
    "type": "all"
  },
  "aggregations": [],
  "postAggregations": [],
  "descending": false
}]
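If there are more distinct values than the threshold, the previousStop field of the dimension metric spec can be used to page through them, with each request resuming from the last value of the previous page. A rough sketch in Python, assuming the broker listens on localhost:8082 and accepts native queries at /druid/v2/. The helper names (build_topn, post_query, distinct_values) are hypothetical, the query is posted as a single JSON object, and the wide interval is a stand-in for "all data":

```python
import json
import urllib.request

# Assumed broker endpoint for native queries; adjust for your cluster.
DRUID_NATIVE_URL = "http://localhost:8082/druid/v2/"


def build_topn(datasource, dimension, threshold, previous_stop=None):
    """Build a lexicographically ordered topN over a single dimension."""
    return {
        "queryType": "topN",
        "dataSource": datasource,
        "dimension": {"type": "default", "dimension": dimension, "outputName": "d0"},
        "metric": {
            "type": "dimension",
            "previousStop": previous_stop,
            "ordering": {"type": "lexicographic"},
        },
        "threshold": threshold,
        # Assumption: a wide interval meant to cover all data; narrow it if you can.
        "intervals": ["1000-01-01/3000-01-01"],
        "granularity": "all",
        "aggregations": [],
    }


def post_query(query, url=DRUID_NATIVE_URL):
    """POST one native query and return the parsed JSON response."""
    body = json.dumps(query).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def distinct_values(datasource, dimension, page_size=5000, run_query=post_query):
    """Collect all distinct values of a dimension, one page per request."""
    values = []
    seen = set()
    previous_stop = None
    while True:
        response = run_query(build_topn(datasource, dimension, page_size, previous_stop))
        # With granularity "all" the response holds at most one time bucket.
        rows = response[0]["result"] if response else []
        page = [row["d0"] for row in rows]
        # Skip values already collected, in case a page repeats the stop value.
        fresh = [value for value in page if value not in seen]
        values.extend(fresh)
        seen.update(fresh)
        non_null = [value for value in fresh if value is not None]
        # A short page means the end; no new values guards against looping forever.
        if len(page) < page_size or not non_null:
            break
        previous_stop = non_null[-1]
    return values
```

Passing a different run_query makes it possible to try the paging logic without a running cluster.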

Eric Graham

Solutions Engineer - Imply

cell: 303-589-4581

email: eric.graham@imply.io

www.imply.io

The SQL syntax would be:

query.sql:

{
  "query": "SELECT DISTINCT country_name FROM wikipedia2"
}

$ curl -XPOST -H'Content-Type: application/json' http://localhost:8082/druid/v2/sql/ -d @query.sql

[{"country_name":"AUSTRALIA"}]
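The same request can also be sent from a script. A minimal sketch in Python using only the standard library, assuming the broker address from the curl command (localhost:8082). The function names are illustrative and not part of any Druid client library:

```python
import json
import urllib.request

# Assumed broker SQL endpoint, copied from the curl example; adjust for your cluster.
DRUID_SQL_URL = "http://localhost:8082/druid/v2/sql/"


def build_distinct_payload(table, column):
    """Build the JSON body for a SELECT DISTINCT query (hypothetical helper)."""
    # Double quotes delimit SQL identifiers; an embedded quote is doubled.
    quoted_column = '"' + column.replace('"', '""') + '"'
    quoted_table = '"' + table.replace('"', '""') + '"'
    return {"query": f"SELECT DISTINCT {quoted_column} FROM {quoted_table}"}


def distinct_values(table, column, url=DRUID_SQL_URL):
    """POST the query and return the distinct values as a list."""
    body = json.dumps(build_distinct_payload(table, column)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        rows = json.load(response)
    # Each row is expected to be an object keyed by column name.
    return [row[column] for row in rows]
```

With a broker running, distinct_values("wikipedia2", "country_name") would return the values as a plain list rather than a list of row objects.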
