We have multiple deployments of Druid with different amounts of data to ingest.
For each installation we had to fine-tune `druid.processing.buffer.sizeBytes` and `druid.indexer.fork.property.druid.processing.buffer.sizeBytes` in order to support the ingestion while keeping memory usage as low as possible…
Finding the right values for these parameters is complex and so far very “empirical” (we gradually increased them until there were no more errors)…
My question: is there any formula or advice for finding the right value? Maybe based on input throughput/segment size/…
Any advice would be appreciated.
Just checking that you’re referring to
Have to say that I’ve not heard of anyone having a particular formula or anything… and I suspect that’s because it really depends on query patterns:
The TopN and GroupBy queries use these buffers to store intermediate computed results. As the buffer size increases, more data can be processed in a single pass.
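One thing that *is* documented is how these buffers add up to a process's direct-memory footprint: Druid reserves one buffer per processing thread, plus the merge buffers, plus one for decompression. A sketch of the relevant settings on a Historical (the values here are placeholders for illustration, not recommendations):

```properties
# Historical runtime.properties (example values only)

# Off-heap buffer allocated per processing thread for intermediate
# TopN/GroupBy results; bigger buffers = more data per pass
druid.processing.buffer.sizeBytes=500000000

# Extra off-heap buffers reserved for merging GroupBy results
druid.processing.numMergeBuffers=4

# Number of query processing threads
druid.processing.numThreads=7
```

The direct memory the process needs is then roughly `(numThreads + numMergeBuffers + 1) * buffer.sizeBytes`, which is the closest thing to a formula I know of — it bounds memory for a given buffer size rather than telling you the ideal buffer size for your data.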
Would be interesting to do though… (makes note to self!)… because knowing things like that could help decide whether it’s worth spinning up a separate tier with different buffer configurations… hmmm… sorry I couldn’t be more helpful…
Yes I’m talking about
Thanks for your answer… don’t hesitate to reach out if you have any new advice.
Hello, just to clarify one point: are those two parameters only involved during queries (I guess to merge results…) and not at all during the ingestion phase?
Adding to what Peter mentioned, the distinction is that `druid.indexer.fork.property.druid.processing.buffer.sizeBytes` affects queries on realtime data (served by the ingestion tasks), while `druid.processing.buffer.sizeBytes` applies to queries served by the Historical processes.
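To illustrate where each setting lives: the `druid.indexer.fork.property.` prefix on the MiddleManager passes the property through to each peon (ingestion task) JVM it forks, which is why it governs queries on realtime data. Something like the following, with placeholder values:

```properties
# MiddleManager runtime.properties (example values only)
# Forwarded to each forked peon process; sizes the processing buffer
# used when serving queries on realtime (not-yet-handed-off) data
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000

# Historical runtime.properties
# Processing buffer used when serving queries on published segments
druid.processing.buffer.sizeBytes=500000000
```

So neither buffer is consumed by the act of ingesting rows itself; they matter whenever a query (TopN/GroupBy in particular) hits that process, whether it holds realtime or historical data.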