[druid-user] Error in Cumulative sum

Hi, I tried to find cumulative sum , this works in small data set only, but it shows error in large data set. Could you please help?

select c1.*,Cum_Sum from “50mbcsvdata_auraa_net_staging” c1
join
(SELECT a.bnx_id as ID,a.tripduration as Student_Trip,SUM(b.tripduration) AS Cum_Sum
FROM “50mbcsvdata_auraa_net_staging” a, “50mbcsvdata_auraa_net_staging” b
WHERE a.bnx_id >= b.bnx_id
GROUP BY a.bnx_id,a.tripduration ORDER BY a.bnx_id) c2
on c2.ID = c1.bnx_id

Error is :
Error: Resource limit exceeded

Subquery generated results beyond maximum[100000]

org.apache.druid.query.ResourceLimitExceededException

Hi,
Yes, that is limited to only very small datasets because it requires a cartesian product join to do it that way.
Unfortunately Apache Druid does not yet have window functions that will give you a cumulative sum for any data volume.
The good news is that if you look at the github repo and search for Window Functions, you’ll see there are folks working on it.

  • Sergio

I don’t know if any of these would be helpful or more efficient than your query, but you might be
interested in Hellmar’s blog post on cumulative sums in druid.