ChunkPeriod with TopN Question

Hi,

I have a TopN query that gets the last dataset of the last seven days with a granularity of “day”.

This query works fine, but has a latency of 800 ms to 1.2 seconds, which I am trying to lower. The database holds about 2.4 billion datasets and 90 GB of data, stored in one datasource.

Now I have read that by setting chunkPeriod I can parallelize the query and maybe speed it up.

When I execute the query with a chunkPeriod of “P1D”, I retrieve just a single day instead of seven. When I use “P2D”, I retrieve one day with two results, which are identical.
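
As a rough illustration (not the actual query from this thread), chunkPeriod is set in the query context. The sketch below builds such a topN request body in Python; the datasource, dimension, metric, threshold, and interval are hypothetical placeholders.

```python
import json


def build_topn_query(chunk_period=None):
    """Build a Druid topN query body as a dict.

    All names and values below are hypothetical placeholders,
    not taken from the query discussed in this thread.
    """
    query = {
        "queryType": "topN",
        "dataSource": "example_datasource",  # placeholder
        "intervals": ["2016-01-01T00:00:00Z/2016-01-08T00:00:00Z"],  # seven days
        "granularity": "day",
        "dimension": "example_dimension",  # placeholder
        "metric": "count",
        "threshold": 10,
        "aggregations": [
            {"type": "longSum", "name": "count", "fieldName": "count"}
        ],
    }
    # chunkPeriod goes into the query context as an ISO 8601 period string.
    if chunk_period is not None:
        query["context"] = {"chunkPeriod": chunk_period}
    return query


if __name__ == "__main__":
    print(json.dumps(build_topn_query("P1D"), indent=2))
```

The resulting JSON would be sent to the broker as the query body; that step is left out here because it depends on the cluster setup.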

I read in another thread that I have to set “merge on” at the server(s), but the only merge setting I found was in the coordinator configuration, and I don’t think that is the right one.

Did I misunderstand the chunkPeriod feature, or can someone point me in the right direction? :)

Thanks

Matthias

Hi,

Can you post the query you are sending and the expected response (the same as you would get if chunkPeriod were not used)?

– Himanshu

Hi Himanshu,

Here are the requests and the corresponding results without chunkPeriod and with chunkPeriod set from P1D to P6D.

As far as I can see, the returned JSON is invalid when chunkPeriod is set to P2D through P6D.

But maybe there is a problem in my query.

Thanks

Matthias

ChunkPeriod_P3D (2.12 KB)

ChunkPeriod_P4D (2.12 KB)

ChunkPeriod_P5D (1.77 KB)

ChunkPeriod_P6D (2.41 KB)

ChunkPeriod_P1D (1.84 KB)

ChunkPeriod_P2D (2.11 KB)

No_ChunkPeriod (2.92 KB)

Hi,

I tried to reproduce it, and something is wrong: it appears that topN merging can’t happen after getting the results for the individual chunks. Please do not use chunkPeriod with TopN for now. I have created an issue to investigate it (https://github.com/druid-io/druid/issues/2262).
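
Until the issue is resolved, one way to follow this advice on the client side is to drop chunkPeriod from the context of topN queries before sending them. A minimal sketch, assuming queries are built as plain Python dicts; the helper name is hypothetical and not part of Druid:

```python
import copy


def strip_chunk_period_for_topn(query):
    """Return a copy of the query with context.chunkPeriod removed for topN.

    Other query types are returned unchanged (as a copy), and the input
    dict is never modified.
    """
    cleaned = copy.deepcopy(query)
    if cleaned.get("queryType") == "topN":
        context = cleaned.get("context", {})
        context.pop("chunkPeriod", None)
        # Drop the context entirely if nothing else is left in it.
        if not context:
            cleaned.pop("context", None)
    return cleaned
```

Other context settings are left in place, so only the chunking behaviour changes.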

– Himanshu

Hi,

Thanks for your answer and for opening the issue.