Segment Sized\

Hi Druid Community,

I wanted to understand this partition of segment size in a better manner
by giving a simple example [ I am using kafka indexing service for insertion
having 3 tasks ]. I have kept the segment granularity as HOUR. By default it is
creating different segment for each partition of KAFKA and according to limit
on maxRowsPerSegment (50 million) it is creating 2 partitions for each segment
(KAFKA partition)

Attaching image (KAFKA_Indexing.JPEG)

What I wanted to understand was – the size of
complete S0 needs to be within 500
MB or segment_0_1/segment_0_2 can seperately be remain under 500 MB.


Arpan Khagram

Mobile: +91 8308993200


Hey Arpan,

Each of segment-0_1 and segment-0_2 should be approximately 500 MB.

To be clear, the recommendation that segments be sized somewhere around ~500MB is a rough guideline and not a constraint. The idea is that if your segments are too small, the overhead of scanning so many segments will become significant and your query performance will suffer. On the other hand, if your segments are too large, you will not be able to take full advantage of parallelism in you queries and again performance will suffer. In most deployments, segments sized in the mid-hundreds of MB seem to perform best.


In this case at what level is the index maintained - is there an index for each of these segment-0_1, segment-0_2 etc or is it at the segment level say for time 08AM to 09AM … which would encompass all the partitions S0/S1/S2?

For e.g with the design Arpan has (of segment granularity of 1 hour) when a query is fired to fetch records for say 15 mins … would the broker load only one of either ‘segment-0_1 or segment-0_2’ or would it load the entire 1 hr segment … which means both these shards are loaded (and also S1 and S2 are loaded - as they are all part of the same DataSource and same cover a 1 hr worth of stream data)?

I am trying to understand if each of these shards are indexed and loaded at query time independently or are all of these loaded when a query falls in the same hour range … being served by these segments. Assuming the query is first reviewed for time range and then the broker identifies the segments to be loaded and queried… wont all these shards/partitions S0, S1 and S2 be loaded … which together could go upto 3GB if each of the ‘segment-‘x’_1/2’ are sized for approx. 500MB.

In this case shouldn’t the total segment … aggregated input of all the tasks (3 in this instance) contributing to the 1 hr segment be approximately 500MB and not the partitions ?



Shouldn’t the segment S0 be 500 MB (and not ‘segment-0_1 and segment-0_2’ individually 500MB each)

Hey Aju,

Each segment (i.e. segment-0_1 or segment-0_2) is self-contained, meaning that it includes its own indexes for the data contained in that segment.

There’s some details on how brokers distribute query requests among historicals that might be interesting here:

In your example, the broker would forward the query (not load the segment as they are all already loaded on the historical nodes) to the historicals that are serving all of segment-0_1, segment-0_2, segment-1_1, segment-1_2, segment-2_1, and segment-2_2. When the query runs on the historical, the ones that run on the second set of segments (segment-0_2, segment-1_2, and segment-2_2) would return pretty quickly as they realize there’s nothing for them to do since their events start at the 2nd half of the hour.

The reason the ~500MB is associated with a partition and not the time interval is that each partition can be (and usually is) handled by different historical nodes in a cluster setup. Hence in an ideal setup, the scan time wouldn’t be for reading 6 * 500 MB = 3GB of data sequentially, but 6 parallel scans of 500MB each.