I am running native queries against a Druid database from a .NET application.
There is no issue while data is being accessed continuously. But if the database sits idle for a while (no data accessed from the UI), the first query afterwards takes far too long; subsequent requests are fine.
For example: if no data is accessed from the database for 10 minutes and I then execute the query through Postman, it takes around 30 seconds to return the data, but if the same query is executed again immediately, it returns in under 2 seconds.
Is there a configuration change we need to make to solve this issue, or is there any other workaround?
A couple of things come to mind. One is that your query results are cached but are being evicted for some reason (see Using query caching · Apache Druid). Another is that the data may have been shipped off the Middle Managers and is now on the Historicals, though that would happen gradually over time…
One thought: since Druid memory-maps its segment files, it is expected that the OS will evict unused files from the page cache. So when a query runs after a few idle minutes, time is spent reloading the segments into the page cache. Two ways to reduce this:
Schedule a query to run every few seconds.
Tune the kernel parameter vm.vfs_cache_pressure to reduce the OS's tendency to evict unused files from the page cache.
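A rough sketch of both mitigations on a standard Linux host. The router address (localhost:8888), the datasource name, and the chosen cache-pressure value are all assumptions; adjust for your cluster.

```shell
# 1. Lower the kernel's tendency to reclaim cached filesystem metadata
#    (default is 100; lower values favour keeping caches around):
sudo sysctl -w vm.vfs_cache_pressure=50
# Persist the setting across reboots:
echo 'vm.vfs_cache_pressure=50' | sudo tee /etc/sysctl.d/99-druid.conf

# 2. Example cron entry firing a lightweight keep-alive query every minute
#    ("my_datasource" is a placeholder -- substitute your own table):
# * * * * * curl -s -XPOST -H 'Content-Type: application/json' \
#     http://localhost:8888/druid/v2/sql \
#     -d '{"query":"SELECT 1 FROM \"my_datasource\" LIMIT 1"}' > /dev/null
```

Note that vm.vfs_cache_pressure influences reclaim of dentry/inode caches rather than directly pinning page-cache data, so treat it as a nudge, not a guarantee.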
I have multiple data sources on Druid. Do I need to run the query on all of them, or will running it on a single one work?
You need to run it on each data source, as the segments are separate files.
This approach did not solve the issue.
I have around 150 data sources on Druid, so I first fetched the list of all data sources and then ran the same query on each one. But when I open the UI again, the first request still takes a long time.
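For reference, the fetch-then-warm approach described above can be sketched as below. The router URL is an assumption, and note the caveat that a LIMIT 1 scan may only fault in a few of a datasource's segments, which could explain why the UI's first broader query still pays a cost.

```shell
#!/bin/sh
# Warm-up sketch: list every datasource via Druid's SQL metadata tables,
# then touch each one with a cheap query. DRUID_URL is an assumption.
DRUID_URL="${DRUID_URL:-http://localhost:8888}"

# Fetch datasource names, one per line, as CSV.
datasources=$(curl -s -XPOST -H 'Content-Type: application/json' \
  "$DRUID_URL/druid/v2/sql" \
  -d '{"query":"SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = '\''druid'\''","resultFormat":"csv"}')

for ds in $datasources; do
  # A LIMIT 1 scan pulls some segment files back into the page cache,
  # but not necessarily all of them.
  curl -s -XPOST -H 'Content-Type: application/json' \
    "$DRUID_URL/druid/v2/sql" \
    -d "{\"query\":\"SELECT 1 FROM \\\"$ds\\\" LIMIT 1\"}" > /dev/null
done
```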
Attached is the current structure and status of the druid DB.
A couple of things come to mind… what are your segment sizes, etc.?
Also, when you simply curl the SQL endpoint, do you get the same performance difference? I was wondering if there's something on the .NET end.
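A minimal way to run that comparison, assuming the router is at localhost:8888 and "my_datasource" stands in for a real table:

```shell
# Time the same query directly against the Druid SQL endpoint,
# bypassing the .NET client entirely. Run it twice: once after the
# idle period, once immediately after, and compare the "real" times.
time curl -s -XPOST -H 'Content-Type: application/json' \
  http://localhost:8888/druid/v2/sql \
  -d '{"query":"SELECT COUNT(*) FROM \"my_datasource\""}'
```

If curl shows the same cold/warm gap, the .NET library can be ruled out.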
The maximum size of any segment so far is 10MB. I am using the Druid4Net 2.2.0 NuGet package to query the Druid data source from the .NET application.
I think @Vijay_Narayanan1 is right here. Also bear in mind that there is an overhead associated with loading each segment; that's one of the reasons for the recommended segment size of 500-700MB.
Are you running the single-node quickstart, too? That's really great for learning the concepts, but as you do more serious testing and development I would definitely start looking at setting up a cluster, even if it's just a tiny one! I have one running on a bunch of Raspberry Pis…!
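Given the 500-700MB guideline, one way to check where you stand is to query the sys.segments metadata table. This is a sketch; the router address is an assumption.

```shell
# Report segment count and average segment size per datasource.
# With ~10MB segments you would expect avg_bytes around 10e6,
# well below the 500-700MB guideline.
curl -s -XPOST -H 'Content-Type: application/json' \
  http://localhost:8888/druid/v2/sql \
  -d '{"query":"SELECT datasource, COUNT(*) AS segments, AVG(\"size\") AS avg_bytes FROM sys.segments GROUP BY datasource"}'
```

If you see many tiny segments, compaction (or coarser segmentGranularity at ingestion) would reduce the per-segment load overhead discussed above.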