Discussion - If Druid has a recommended segment size, then why the concept of segment granularity?

Why does the concept of segment granularity exist when Druid has a recommended segment size? Handoffs? Segment swapping?

Please refer to this white paper; it has the answer to this question.


I read the Druid PDF up to and including Section 5 (Query API) but was not able to understand the reason. Can you please point me to the section I should go through again?

It is explained throughout the paper. Segment granularity is used to logically and physically partition the data. Hence you can replace a segment with another one for the same interval, or a newer segment with a larger interval will overshadow multiple smaller-interval segments covering the same period of time.
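To make the replace/overshadow idea concrete, here is a toy model (a sketch, not Druid's actual timeline code). The `Segment` class, its fields, and `visible_segments` are hypothetical names; versions are plain strings compared lexicographically, and only full containment is handled, whereas a real timeline also deals with partial overlaps and multiple partitions per interval.

```python
from dataclasses import dataclass

# Hypothetical, simplified segment descriptor: a half-open interval
# [start, end) in hours plus a version. Not Druid's real data model.
@dataclass(frozen=True)
class Segment:
    start: int    # interval start, inclusive
    end: int      # interval end, exclusive
    version: str  # newer segments compare greater (plain string comparison here)

def covers(outer: Segment, inner: Segment) -> bool:
    """True if outer's interval fully contains inner's interval."""
    return outer.start <= inner.start and inner.end <= outer.end

def visible_segments(segments: list[Segment]) -> list[Segment]:
    """Hide every segment fully covered by a segment with a newer version."""
    return [
        s for s in segments
        if not any(o.version > s.version and covers(o, s) for o in segments)
    ]

# 24 hourly segments, later overshadowed by one newer daily segment.
hourly = [Segment(h, h + 1, "v1") for h in range(24)]
daily = Segment(0, 24, "v2")
print(visible_segments(hourly + [daily]))  # [Segment(start=0, end=24, version='v2')]

# Replacing a single hour: the newer segment for [5, 6) hides the old one.
patched = Segment(5, 6, "v2")
print(len(visible_segments(hourly + [patched])))  # 24
```

Both operations only work because every segment is tied to a well-defined time interval that other segments can be compared against.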

The bottom line: segment granularity is used to partition the data and define the logical view. The recommended size is for optimizing that partitioning, and one does not exclude the other.

Let me rephrase the question.

Having segments is for sharding/partitioning based on time series. But if Druid works best with a segment size of 300 MB to 700 MB, why can't the realtime node just keep using the same segment until that size is reached? The realtime node partitions based on time; why not on size? The data would still be partitioned, and the timestamps would still be there for overshadowing and replacement.

You cannot build a logical timeline with the strategy you are describing; in addition, replication would be meaningless.

For instance, suppose you have two realtime nodes, one acting as a replica of the other.

Now, due to the distributed nature of things, the two nodes will not necessarily receive the messages in the same order.

Hence, if you partition based only on size, you will produce two different segments of approximately the same size, while the goal was to have two replicas!!!
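The replica argument can be simulated in a few lines. This is a hedged sketch with a made-up event stream (`EVENTS`) and hypothetical helpers `partition_by_time` and `partition_by_size`; it models segments as sets of rows and ignores everything else a realtime node does. Replica B receives the same events as replica A, except that one neighboring pair arrives swapped.

```python
from collections import defaultdict

# Hypothetical toy stream: one event per minute for two hours,
# as (timestamp_in_seconds, payload). Not Druid's ingestion format.
EVENTS = [(t, f"row-{t}") for t in range(0, 7200, 60)]

def partition_by_time(events, granularity_s=3600):
    """Assign each event to the fixed time bucket its timestamp falls into."""
    buckets = defaultdict(set)
    for ts, payload in events:
        start = ts - ts % granularity_s
        buckets[(start, start + granularity_s)].add(payload)
    return dict(buckets)

def partition_by_size(events, max_rows=50):
    """Start a new segment every max_rows events, in arrival order."""
    return [
        {payload for _, payload in events[i:i + max_rows]}
        for i in range(0, len(events), max_rows)
    ]

# Two replicas see the same events; replica B gets one pair swapped.
replica_a = list(EVENTS)
replica_b = list(EVENTS)
replica_b[49], replica_b[50] = replica_b[50], replica_b[49]

print(partition_by_time(replica_a) == partition_by_time(replica_b))  # True
print(partition_by_size(replica_a) == partition_by_size(replica_b))  # False
```

With time buckets, arrival order does not matter, so both replicas build identical segments. With size-based rollover, a single swap across a segment boundary already makes the two replicas' segments differ.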

The same thing applies to the timeline: imagine you partition by size. Now when I issue a query with interval t1 to t2, how will you know which segments to query?
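With time-aligned segments, choosing which segments a query touches reduces to an interval-overlap test. The sketch below assumes HOUR granularity and a hypothetical `SegmentMeta` record; it is an illustration, not how Druid's broker is implemented. Under size-based rollover the boundaries would instead depend on arrival rate and order, so they would not line up with the query interval or match between replicas.

```python
from dataclasses import dataclass

# Hypothetical segment metadata: half-open interval [start, end) in hours.
@dataclass(frozen=True)
class SegmentMeta:
    start: int  # inclusive
    end: int    # exclusive
    name: str

def segments_for_query(segments, t1, t2):
    """Return the segments whose interval overlaps the query interval [t1, t2)."""
    return [s for s in segments if s.start < t2 and t1 < s.end]

# With HOUR granularity, segment boundaries line up with the clock.
catalog = [SegmentMeta(h, h + 1, f"hour-{h}") for h in range(24)]
print([s.name for s in segments_for_query(catalog, 9, 12)])
# ['hour-9', 'hour-10', 'hour-11']
```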

Not sure what the end goal of this discussion is?

I was just trying to understand the reasoning, as I could not see why such a decision was taken. Thanks for all the explanations. They help.

Great, glad I was able to explain it :D. Please let me know if you have more questions.