We are running Druid 0.9.2 (Imply version) on a cluster, with about 475 million records ingested.
When I run a simple PlyQL query on a metric (impressions), such as:
plyql -h [broker]:8082 -q "SELECT impressions FROM meru-clearchannel-druid"
the data nodes go down: all 8 cores sit at 100% on both nodes, the 16 GB of RAM fills up, and the datasource is no longer visible.
Can you please help with the investigation?
historical_12GB.gc (7.04 KB)
PlyQL queries without aggregation use Druid's "select" query, which is (surprisingly) a serious resource hog. We plan to replace it with an efficient implementation in an upcoming release, based on the community-contributed scan-query extension (http://druid.io/docs/latest/development/extensions-contrib/scan-query.html). If you need to pull a lot of data out of Druid without aggregation, I would suggest using that extension instead of PlyQL for now.
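For reference, here is a minimal sketch of sending a native scan query straight to the broker instead of going through PlyQL. It assumes the scan-query contrib extension is loaded on the cluster and that the broker accepts native JSON queries via HTTP POST at /druid/v2/. The broker host and the interval are hypothetical placeholders. The field names (queryType "scan", columns, limit, resultFormat "compactedList") follow the extension's documentation, but check them against the version you actually run.

```python
import json
import urllib.request

# Hypothetical broker address: substitute your own broker host and port.
BROKER_URL = "http://broker.example.com:8082/druid/v2/"


def build_scan_query(datasource, intervals, columns, limit=None):
    """Build a native scan query body (assumes the scan-query extension is loaded)."""
    query = {
        "queryType": "scan",
        "dataSource": datasource,
        "intervals": intervals,
        "columns": columns,
        # "compactedList" returns each row as a list of values instead of a map,
        # which keeps the response smaller than the default "list" format.
        "resultFormat": "compactedList",
    }
    if limit is not None:
        query["limit"] = limit
    return query


def run_scan_query(broker_url, query):
    """POST the query to the broker's native endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        broker_url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    query = build_scan_query(
        "meru-clearchannel-druid",
        # Hypothetical interval: replace with the time range you actually need.
        ["2016-01-01/2017-01-01"],
        ["impressions"],
        limit=1000,
    )
    print(json.dumps(query, indent=2))
    # Uncomment to send it to a real broker:
    # results = run_scan_query(BROKER_URL, query)
```

Unlike the select query, which pages through results with a pagingSpec, the scan query streams rows back in batches. Setting a limit while testing keeps the first runs cheap.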
Thanks for your response, Gian. I'll go with the suggested approach.
Keep up the good work!