We are trying to query Druid from Presto through the broker and coordinator URLs. We are running into problems when querying large Druid data sources.
Small data sources are easily queryable.
73,243,851 rows - Real-time streaming data
18,100,826 rows - Batch load
817,205 rows - Real-time streaming data
Let me know if we can optimize broker/historical nodes for fast data retrieval.
A few questions:
- How are you querying Druid through Presto? AFAIK, there’s no Presto-Druid connector (there is an open issue on GitHub, though).
- What is your query pattern - are you trying to retrieve millions of rows, or are you trying to aggregate many rows (using Druid’s capabilities) and get a small set of rows as a result?
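To make the two query patterns concrete, here is a minimal sketch of each as a Druid native query, built in Python and posted to the broker. The broker address, data source name, and interval are hypothetical placeholders, and it assumes a Druid version that supports the scan query type; this is not how the custom connector works, only an illustration of the difference.

```python
import json
import urllib.request

# Hypothetical values: replace with your own broker address and data source.
BROKER_URL = "http://localhost:8082/druid/v2"
DATASOURCE = "events"
INTERVAL = "2019-01-01/2019-01-02"


def scan_query(limit):
    """Raw-row retrieval: the result grows with the number of rows matched."""
    return {
        "queryType": "scan",
        "dataSource": DATASOURCE,
        "intervals": [INTERVAL],
        "columns": ["__time"],
        "limit": limit,
    }


def count_query():
    """Aggregation: Druid reads many rows but returns one result per hour."""
    return {
        "queryType": "timeseries",
        "dataSource": DATASOURCE,
        "intervals": [INTERVAL],
        "granularity": "hour",
        "aggregations": [{"type": "count", "name": "rows"}],
    }


def run(query, url=BROKER_URL):
    """POST a native query to the broker and return the decoded JSON."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `run(count_query())` returns a small, fixed number of rows regardless of table size, whereas `run(scan_query(1000000))` has to ship every matched row through the broker, which is usually where large tables start to hurt.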
We have written a custom Druid-to-Presto connector. We are able to query the smaller tables, but the big table is the issue.
Also, we have modified the broker's query settings.
As for the query pattern, I’m not sure I fully understand. Can you provide some more details:
- What kind of query are you running?
- What is the expected size of the resultset?
- What is the problem with the big table?