Druid Scan, get next batch

Hi All,

I am trying to get records in batches using scan, but I am confused about how to get the next batch.

For the request below, how do I get the next set of 5 records?

{
  "queryType": "scan",
  "dataSource": "my_datasource",
  "intervals": ["2018-05-17/2020-03-30"],
  "batchSize": 1000,
  "limit": 5
}

Regards,

Kiran

Hi Kiran,

The Scan query returns raw Apache Druid rows in streaming mode. Because you have set limit in the query, you are limiting the rows returned to just 5.

batchSize: The maximum number of rows buffered before being returned to the client. Default is 20480
limit: How many rows to return. If not specified, all rows will be returned.

For more details on scan queries and the available options, please read the Druid Scan Queries documentation.

Thanks and Regards,

Vaibhav

Hi Vaibhav,

Thank you for the response. If the caller (client) can process only n records at a time (the limit), what should the next REST call be?

Is there an id with which the client can get the next set of records? I am thinking along the lines of the Elasticsearch Scroll query.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html#request-body-search-scroll

Regards,

Kiran

Hi Kiran,

Have you found any workaround for this problem of getting the next batch using a scan query? We are facing exactly the same situation.

Thanks.

Hi Kiran,

Could you please help me understand your problem?

From version 0.17 onwards, the select query is replaced by scan. I understand that select had the ability to fetch data batch by batch, but this in fact clogged the broker heap. So it was replaced by the scan query, where the data is streamed back to the broker. This is highly efficient and effective. Select often helped with pagination by splitting the data into multiple batches, but with scan this needs to be done on the client side.

I would like to understand how you fire your scan query. Is it through JDBC or the REST endpoint?
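As a side note on the pagination question: newer Druid releases document an offset parameter for the scan query that can be combined with limit to fetch pages. Whether your cluster's version supports it is an assumption to check against the docs for that version. The sketch below only builds the query bodies; page_query is a hypothetical helper name, not part of any Druid client.

```python
import copy

def page_query(base_query, page, page_size):
    """Return a copy of base_query that requests one page (pages are 0-based).

    Assumes the target Druid version accepts "offset" on scan queries.
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    query = copy.deepcopy(base_query)  # leave the caller's query untouched
    query["limit"] = page_size
    query["offset"] = page * page_size
    return query

base = {
    "queryType": "scan",
    "dataSource": "my_datasource",
    "intervals": ["2018-05-17/2020-03-30"],
    "batchSize": 1000,
}

# Build the request bodies for the first three pages of 5 rows each.
for page in range(3):
    q = page_query(base, page, 5)
    print(q["offset"], q["limit"])
```

Keep in mind that skipped rows still have to be generated and discarded on the Druid side, so large offsets get more expensive, and pages may not line up if the datasource changes between calls.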

Hi Tijo,

Can you share anything you have in mind on how to get the next batch using scan? As you said, that has to be done by the client. Do you have any suggestions here?

Hi Kiran,
I assume that you are using the REST API to query, at least for the scan query.

As I understand it, when you fire your query with a batch size, the broker executes a sync call to the historicals, and each historical returns its results in batches of the size given in the query. The results are streamed to the broker. As the client consumes the records, the next set of records is streamed from the historicals. You can also fire the queries directly at the historicals if you want to increase parallelism.

From the client side, you can read this data in a streaming fashion. When I write a client program, I assume the response stream has unlimited data and read from it at whatever rate the client can handle.

I have a crude implementation of reading data from a scan query: https://github.com/tijoparacka/DruidExperiments/blob/master/executeScanquery/src/main/java/io/druid/example/scanquery/ScanRequest.java

This implementation just reads the data in a streaming way and does nothing with it. Not sure if this is helpful.

I am not sure how to leverage the client program if it is used with JDBC.
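For anyone who prefers Python, here is a rough sketch of the same idea: read the REST response incrementally and hand rows to the caller in groups of n. It is not a port of the Java code above. It assumes the default "list" result format (a JSON array of objects that each carry an "events" list) and a broker at a URL you supply, for example http://localhost:8082/druid/v2/. All function names are made up for illustration.

```python
import codecs
import json
import urllib.request
from itertools import islice

def stream_scan(broker_url, query, chunk_size=65536):
    """POST a scan query and yield the response body as decoded text chunks."""
    req = urllib.request.Request(
        broker_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    utf8 = codecs.getincrementaldecoder("utf-8")()
    with urllib.request.urlopen(req) as resp:
        while True:
            data = resp.read(chunk_size)
            if not data:
                break
            yield utf8.decode(data)

def iter_json_array(chunks):
    """Yield the top-level elements of a JSON array as they become complete.

    Only safe when the elements are objects or arrays, as in a scan response.
    """
    decoder = json.JSONDecoder()
    buf, started = "", False
    for chunk in chunks:
        buf += chunk
        while True:
            buf = buf.lstrip()
            if not buf:
                break
            if not started:
                if buf[0] != "[":
                    raise ValueError("expected a JSON array")
                buf, started = buf[1:], True
            elif buf[0] == ",":
                buf = buf[1:]
            elif buf[0] == "]":
                return
            else:
                try:
                    obj, end = decoder.raw_decode(buf)
                except json.JSONDecodeError:
                    break  # element is incomplete, wait for more data
                yield obj
                buf = buf[end:]

def iter_events(segment_results):
    """Flatten per-segment result objects into individual rows."""
    for result in segment_results:
        yield from result.get("events", [])

def batched(rows, size):
    """Group rows into lists of at most `size`, pulling lazily from the source."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

A caller would chain these as batched(iter_events(iter_json_array(stream_scan(url, query))), n) and process one batch per loop iteration. Since everything is a generator, the next chunk is only read from the socket when the caller asks for more rows.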

Hope this is helpful