Managing the size of segments in Tranquility ingestion

Hi,

We are evaluating a Druid cluster for a real-time log analysis use case. The data is ingested from Kafka through Tranquility.

The volume of logs, and hence the data rate, varies considerably throughout the day. To keep the segments small, I am using a windowPeriod of 10 minutes and a segmentGranularity of 15 minutes:

"granularitySpec" : {
  "type" : "uniform",
  "segmentGranularity" : "FIFTEEN_MINUTE",
  "queryGranularity" : "none"
},
"ioConfig" : {
  "type" : "realtime"
},
"tuningConfig" : {
  "type" : "realtime",
  "maxRowsInMemory" : "50000",
  "intermediatePersistPeriod" : "PT10M",
  "windowPeriod" : "PT10M"
}
},
"properties" : {
  "task.partitions" : "1",
  "task.replicants" : "1"
}

This results in the following sample of segment sizes (each segment covers 15 minutes):

Hi Chaitanya,

It’s not possible with Tranquility to do this level of customization. It makes one segment per partition per segmentGranularity period.

However, you could try the Kafka indexing service to read directly from Kafka into Druid (http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html). It lets you set a max segment size. It currently (as of Druid 0.11.x) can have an issue where it creates too many small segments, but this is being addressed in 0.12.0 (upcoming; an RC1 is out if you want to try it).
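For reference, here is a minimal sketch of what a Kafka indexing service supervisor spec with a segment size cap could look like. The dataSource name, topic, broker address, and timestamp column below are placeholder assumptions, not values from this thread; see the linked docs for the full set of options.

{
  "type" : "kafka",
  "dataSchema" : {
    "dataSource" : "logs",
    "parser" : {
      "type" : "string",
      "parseSpec" : {
        "format" : "json",
        "timestampSpec" : { "column" : "timestamp", "format" : "auto" },
        "dimensionsSpec" : { "dimensions" : [] }
      }
    },
    "metricsSpec" : [ { "type" : "count", "name" : "count" } ],
    "granularitySpec" : {
      "type" : "uniform",
      "segmentGranularity" : "HOUR",
      "queryGranularity" : "none"
    }
  },
  "tuningConfig" : {
    "type" : "kafka",
    "maxRowsPerSegment" : 5000000
  },
  "ioConfig" : {
    "topic" : "logs",
    "consumerProperties" : { "bootstrap.servers" : "kafka01:9092" },
    "taskCount" : 1,
    "replicas" : 1,
    "taskDuration" : "PT1H"
  }
}

You POST a spec like this to the overlord at /druid/indexer/v1/supervisor; maxRowsPerSegment is the segment size cap mentioned above (5,000,000 is its default).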

Thanks for clarifying, Gian.

With the Kafka indexing service, I can see that we can set the maxRowsPerSegment parameter. What happens when the number of rows for a segment exceeds this value?

How does this parameter relate to queryGranularity? For example, if I set queryGranularity to “HOUR” and for some hours the number of rows crosses the maxRowsPerSegment limit, will Druid reject those rows?

- Chaitanya

Hi Chaitanya,

When the number of rows exceeds maxRowsPerSegment, Druid will publish that segment and start a new one for the same time interval. It won’t reject any rows.
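To make that concrete (illustrative numbers, not from this thread): with segmentGranularity of HOUR and the tuningConfig below, an hour that receives 12 million rows would be published as three segments covering that same hour (roughly 5M + 5M + 2M rows), rather than one oversized segment.

"tuningConfig" : {
  "type" : "kafka",
  "maxRowsPerSegment" : 5000000
}

queryGranularity only controls how finely timestamps are truncated within a segment; it does not impose any row limit, so it never causes rows to be rejected either.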