Hello,
I want to store data in HOUR-granularity segments, but with 5-minute data inside for queries.
I use this spec:
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "HOUR",
  "queryGranularity": {
    "type": "duration",
    "duration": 300000,
    "origin": "1970-01-01T00:00:00.000Z"
  },
  "rollup": true,
  "intervals": null
},
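For reference, the same 5-minute query granularity can also be written as a period granularity, which is a bit more readable (a sketch; this should be equivalent to the duration form above, since 300000 ms = 5 minutes):

```json
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "HOUR",
  "queryGranularity": {
    "type": "period",
    "period": "PT5M",
    "origin": "1970-01-01T00:00:00.000Z"
  },
  "rollup": true,
  "intervals": null
},
```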
It works on one of my benches but not on another… maybe due to an extra configuration setting and/or some weird behavior…
Is there any other setting that could explain this behavior?
Thanks
What do you mean by “doesn’t work”? That it throws an error, or that it doesn’t roll up the data correctly? Are you using the same query, and is the data the same on both systems?
Hello Rachel,
No errors are thrown; the supervisor and tasks run correctly, but indeed the data doesn’t roll up correctly.
No, the data is not the same on both systems; they are different sources.
I test with this query:
SELECT distinct __time FROM datasource_5min
One server returns data every 5 minutes:
14 results in 0.42s
2019-09-19T19:00:00.000Z
2019-09-19T19:05:00.000Z
2019-09-19T19:10:00.000Z
The other server returns data every hour:
2021-08-19T10:00:00.000Z
2021-08-19T11:00:00.000Z
2021-08-19T12:00:00.000Z
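A quick way to check the effective rollup on each server is to count stored rows per 5-minute bucket (a sketch, using the datasource name from the query above; on a correctly rolled-up datasource each bucket should contain one row per dimension combination):

```sql
SELECT TIME_FLOOR(__time, 'PT5M') AS bucket, COUNT(*) AS stored_rows
FROM datasource_5min
GROUP BY 1
ORDER BY 1
```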
To add some details: there are only a few rows to manage in the 1st system, but a huge amount in the second (several thousand per second).
Can this huge amount of data affect the way Druid stores it? I see a publishing phase at the end of each task… maybe rollup is managed differently between the 2 systems…
Can you verify in the console that your query granularity is set up as expected? Have those segments been compacted on the 2nd system? You can also check the metadata with queries:
https://docs.imply.io/2021.07/druid/querying/segmentmetadataquery/#analysistypes
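A segmentMetadata query along these lines (a sketch; the datasource name and interval are taken from the posts above) would show the query granularity and rollup flag actually stored in the segments:

```json
{
  "queryType": "segmentMetadata",
  "dataSource": "datasource_5min",
  "intervals": ["2021-08-19T00:00:00.000Z/2021-08-20T00:00:00.000Z"],
  "analysisTypes": ["queryGranularity", "rollup"]
}
```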
Hello, and thank you Rachel for your quick answers.
We finally found the root cause: it was due to our source data in Kafka… so nothing related to Druid, sorry.