SQL GroupBy: not enough disk space

Hi,

I am running a SQL query over about 5 million records in around 30 segments (30 days of data). The query doesn't work even for 1 day.

Below is the SQL. If I remove “APPROX_COUNT_DISTINCT_DS_THETA”, the query works.

select Category, CountryName, Store_Name,
APPROX_COUNT_DISTINCT(distinct Product_id), APPROX_COUNT_DISTINCT_DS_THETA(Product_id)
from retail_data_rollup_index_hyper
where TIME_EXTRACT("__time", 'day') = 1
group by Category, CountryName, Store_Name

I have already set “maxOnDiskStorage=2”, but I don't know what this parameter actually does. Any thoughts?

Error:

Resource limit exceeded / Not enough disk space to execute this query. Try raising druid.query.groupBy.maxOnDiskStorage. / org.apache.druid.query.ResourceLimitExceededException / on host localhost:8083

Regards, Chari.

Hi

https://druid.apache.org/docs/latest/querying/groupbyquery.html

Set druid.query.groupBy.maxOnDiskStorage to a value such as 1 GB, expressed in bytes.
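Because the property takes a raw byte count, a value like “2” means two bytes, not two gigabytes. A minimal Python sketch of the conversion, assuming binary units (1 GB = 1024³ bytes); the helpers `gb_to_bytes` and `max_on_disk_storage_line` are hypothetical and not part of Druid:

```python
def gb_to_bytes(gb: float) -> int:
    """Convert gigabytes (assumed binary: 1 GB = 1024**3 bytes) to a byte count."""
    return int(gb * 1024 ** 3)


def max_on_disk_storage_line(gb: float) -> str:
    """Render a runtime.properties line for a disk budget given in GB."""
    return f"druid.query.groupBy.maxOnDiskStorage={gb_to_bytes(gb)}"


print(max_on_disk_storage_line(1))  # druid.query.groupBy.maxOnDiskStorage=1073741824
print(max_on_disk_storage_line(2))  # druid.query.groupBy.maxOnDiskStorage=2147483648
```

Whether you count 1 GB as 1000³ or 1024³ bytes matters little here; what matters is that the value is a byte count.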

Thanks & Regards

Venkat

Thanks. It works.

It took me some time to realize that I needed to specify the storage size in bytes. I was giving it in GB, my bad.

Regards, Chari.

Hi Chari

Could you please explain a bit more about how you got it working?

druid.query.groupBy.maxOnDiskStorage needs to be set at the global (server) level; a maxOnDiskStorage value in the query context can lower it for a query but cannot exceed it.
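The rule can be pictured as the query-context value acting only as a ceiling below the server setting. A minimal Python sketch under that assumption: `effective_disk_limit` and `sql_request_body` are hypothetical helpers, the 1 GB server value is an assumed setting, and the request body only mirrors the shape of Druid's SQL HTTP API (a JSON object with `query` and `context` fields); nothing is sent over the network.

```python
import json
from typing import Optional

# Assumed server-level setting: 1 GB expressed in bytes.
SERVER_MAX_ON_DISK_STORAGE = 1024 ** 3


def effective_disk_limit(server_bytes: int, context_bytes: Optional[int]) -> int:
    """Per-query limit: the context may lower the server value, never raise it."""
    if context_bytes is None:
        return server_bytes
    return min(server_bytes, context_bytes)


def sql_request_body(sql: str, context_bytes: Optional[int] = None) -> str:
    """Build a JSON body for a Druid SQL request with an optional disk limit."""
    body: dict = {"query": sql}
    if context_bytes is not None:
        body["context"] = {"maxOnDiskStorage": context_bytes}
    return json.dumps(body)


# Asking for 4 GB in the context still leaves the query capped at 1 GB.
print(effective_disk_limit(SERVER_MAX_ON_DISK_STORAGE, 4 * 1024 ** 3))  # 1073741824
print(sql_request_body("SELECT 1", 512 * 1024 ** 2))
```

This is why raising the limit in the query context alone has no effect: the server-level property has to be raised first.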

It lets the query spill intermediate results to disk.
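To make the spilling idea concrete, here is a toy Python sketch of a bounded-memory GROUP BY COUNT: when the in-memory table is full, it writes a sorted run to a temporary file, charges the bytes against a disk budget, and merges all runs at the end. It is a conceptual illustration only, not Druid's implementation; `grouped_count`, `max_in_memory`, `max_on_disk` and `ResourceLimitExceeded` are hypothetical stand-ins for the merge buffer, druid.query.groupBy.maxOnDiskStorage and the error from the original post.

```python
import heapq
import itertools
import os
import tempfile
from typing import Dict, Iterable, Iterator, List, Tuple


class ResourceLimitExceeded(Exception):
    """Hypothetical stand-in for the error raised when the disk budget runs out."""


def _spill(groups: Dict[str, int], spill_dir: str) -> str:
    """Write the in-memory groups as a sorted run file and return its path."""
    fd, path = tempfile.mkstemp(dir=spill_dir, suffix=".run")
    with os.fdopen(fd, "w") as f:
        for key in sorted(groups):
            f.write(f"{key}\t{groups[key]}\n")
    return path


def _read_run(path: str) -> Iterator[Tuple[str, int]]:
    """Stream (key, count) pairs back from a run file."""
    with open(path) as f:
        for line in f:
            key, count = line.rstrip("\n").split("\t")
            yield key, int(count)


def grouped_count(keys: Iterable[str], max_in_memory: int,
                  max_on_disk: int) -> List[Tuple[str, int]]:
    """COUNT(*) GROUP BY key, spilling sorted runs to disk when memory fills."""
    with tempfile.TemporaryDirectory() as spill_dir:
        runs: List[str] = []
        disk_used = 0
        groups: Dict[str, int] = {}
        for key in keys:
            if key not in groups and len(groups) >= max_in_memory:
                path = _spill(groups, spill_dir)
                disk_used += os.path.getsize(path)
                runs.append(path)
                if disk_used > max_on_disk:
                    raise ResourceLimitExceeded("Not enough disk space")
                groups = {}
            groups[key] = groups.get(key, 0) + 1
        # Merge the on-disk runs with the last in-memory run, summing equal keys.
        streams = [_read_run(p) for p in runs] + [iter(sorted(groups.items()))]
        merged = heapq.merge(*streams)
        return [(k, sum(c for _, c in grp))
                for k, grp in itertools.groupby(merged, key=lambda kv: kv[0])]


print(grouped_count(["a", "b", "a", "c", "b", "a"], max_in_memory=2, max_on_disk=1024))
# [('a', 3), ('b', 2), ('c', 1)]
```

With a budget of only a few bytes, the first spill already exceeds the limit and the query fails instead of completing, which is the same outcome as a byte-sized maxOnDiskStorage value.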

Thanks & Rgds

Venkat

Hi Venkat,

I added the configuration to the Broker's runtime.properties.

Yes, it did spill to disk, but that is fine for our scenario: even with the spill, the latency is acceptable.

Regards, Chari.