Recommended segment file size with a "but"

If you’re reading this, you’ve probably already seen the following language in the docs:

For Druid to operate well under heavy query load, it is important for the segment file size to be within the recommended range of 300MB-700MB.

But, like so many Druid things, this can be taken as a guideline rather than a hard and fast rule. Much bigger and much smaller segments can both work too. The main downside of much bigger segments is poor parallelization. The main downside of much smaller segments is high overhead per segment.
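To make that tradeoff concrete, here's a quick back-of-the-envelope sketch (the 500 GB datasource and the segment sizes are hypothetical, purely for illustration): each segment is one unit of parallel work for the Historicals, but also one more entry the Brokers and Coordinator have to keep track of.

```python
# Rough sketch of the tradeoff: the same (hypothetical) 500 GB datasource
# stored at different average segment sizes. Figures are illustrative only.
dataset_mb = 500 * 1024
for segment_mb in (50, 300, 700, 5000):
    count = dataset_mb / segment_mb
    print(f"{segment_mb:>5} MB segments  ->  ~{count:>6,.0f} segments to scan, "
          f"load, and track metadata for")
```

At 5 GB per segment a full scan only has ~100 units of work to spread across the cluster; at 50 MB there are ~10,000 segments' worth of per-segment metadata and per-segment processing cost.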

This tidbit came out of a discussion within the Apache Druid workspace. Feel free to join us there. Here’s the link to the complete discussion.

Thanks for highlighting this, Mark – that “recommended range” is very often missed, especially when there aren’t many incoming events in Kafka (say, fewer than 100 a minute), or when someone starts with a really small local install, then upgrades to a cluster and goes into production without turning on auto-compaction – and doesn’t realise that all those tiny segments drive up Broker memory usage…

NOT THAT I’VE EVER DONE THAT OF COURSE!!!

As you said – it’s a guide. I seem to remember once hearing that the row count matters more than the segment size on disk? Maybe @Kyle_Hoondert can correct me…?
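On that note, for anyone who lands here with the same problem: below is a minimal sketch of turning on auto-compaction with a row-count target rather than a byte-size target, assuming a Coordinator at localhost:8081 and a hypothetical datasource called my_datasource. The exact fields are worth double-checking against the auto-compaction config reference for your Druid version.

```python
import requests  # assumes the requests library is installed

# Minimal sketch: submit an auto-compaction config for one datasource to the
# Coordinator, targeting rows per segment instead of bytes. Host, datasource
# name, and the 5M row figure are assumptions, not a prescription.
COORDINATOR = "http://localhost:8081"

compaction_config = {
    "dataSource": "my_datasource",          # hypothetical datasource name
    "skipOffsetFromLatest": "P1D",          # leave the most recent day alone
    "tuningConfig": {
        "partitionsSpec": {
            "type": "dynamic",
            "maxRowsPerSegment": 5_000_000  # roughly the row target the docs suggest
        }
    }
}

resp = requests.post(
    f"{COORDINATOR}/druid/coordinator/v1/config/compaction",
    json=compaction_config,
)
resp.raise_for_status()
```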