Comparing topN query performance between segmentGranularity by day and by hour

I am working with a large dataset. I ran a topN query in two cases (segmentGranularity by day vs. segmentGranularity by hour), both with the same “queryGranularity” of “hour”.
Case 01: by day
"granularitySpec" : {
  "type" : "uniform",
  "segmentGranularity" : "day",
  "queryGranularity" : "hour",
  "intervals" : ["2016-08-22/2016-08-23"]
}

Case 02: by hour
"granularitySpec" : {
  "type" : "uniform",
  "segmentGranularity" : "hour",
  "queryGranularity" : "hour",
  "intervals" : ["2016-08-22/2016-08-23"]
}

However, the query with “segmentGranularity” : “day” is slower than the one with “segmentGranularity” : “hour”. Can anyone explain why segmenting by day is slower than segmenting by hour? When storing data, how should I choose between day and hour segments, and how does that choice affect my queries?
Thanks so much!

One possibility is that if you don’t have a large amount of data, you would get fewer “day” segments than “hour” segments (since there are fewer days than hours) and this will lead to worse parallelism on the query side.
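
To make the "fewer days than hours" point concrete for the interval in the specs above, here is a minimal Python sketch that counts how many time chunks each segmentGranularity produces. The helper `count_time_chunks` is a hypothetical name used only for illustration, not a Druid API, and the sketch assumes every chunk contains data and yields at least one segment (a chunk can hold more than one segment if it is partitioned).

```python
from datetime import datetime, timedelta


def count_time_chunks(start, end, chunk):
    """Count the chunk-sized time buckets needed to cover [start, end).

    Hypothetical helper for illustration only; not part of Druid.
    """
    count = 0
    cursor = start
    while cursor < end:
        count += 1
        cursor += chunk
    return count


# Interval taken from the granularitySpec in the question.
start = datetime(2016, 8, 22)
end = datetime(2016, 8, 23)

day_chunks = count_time_chunks(start, end, timedelta(days=1))
hour_chunks = count_time_chunks(start, end, timedelta(hours=1))

print(day_chunks, hour_chunks)  # prints: 1 24
```

Under these assumptions, the "day" spec yields a single time chunk for the interval, while the "hour" spec yields 24.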

@Gian Merlino
Can you point me to a link that supports your explanation above, specifically about parallelism on the query side? Thanks a lot.
On Friday, August 26, 2016 at 14:01:09 UTC+7, Gian Merlino wrote:

Parallelism in Druid queries happens at the segment level. So if you have 100 cores but only 10 segments, then Druid can only use 10 cores at a time for a given query (although it could potentially use all 100 cores if you have many concurrent queries). Generally you want the number of segments to be greater than the number of cores in your cluster to get the best parallelism.
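
As a rough illustration of the rule above, the following Python sketch models the upper bound on cores a single query can use as the smaller of the segment count and the core count. This is a simplified model for reasoning only: the function name `max_cores_for_one_query` is hypothetical, and the sketch ignores concurrent queries, processing-thread settings, and how segments are distributed across nodes.

```python
def max_cores_for_one_query(num_segments, num_cores):
    """Upper bound on cores one query can keep busy at a time.

    Simplified model (an assumption, not a Druid API): each segment is
    scanned by at most one core at a time, so the bound is the smaller
    of the two numbers.
    """
    return min(num_segments, num_cores)


# The example from the post: 100 cores but only 10 segments.
example = max_cores_for_one_query(num_segments=10, num_cores=100)

# One day of data: 1 "day" segment vs. 24 "hour" segments, assuming
# one segment per time chunk and a hypothetical 8-core machine.
by_day = max_cores_for_one_query(num_segments=1, num_cores=8)
by_hour = max_cores_for_one_query(num_segments=24, num_cores=8)

print(example, by_day, by_hour)  # prints: 10 1 8
```

In this model the single "day" segment can keep only one core busy, while the 24 "hour" segments can keep all 8 assumed cores busy, which is consistent with the slower "day" query in the question.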

Gian
Thanks for the support, that clears up the issue.
Best regards.

On Friday, August 26, 2016 at 10:26:35 UTC+7, Thao Nguyen wrote: