Suggestion on Realtime optimization

I'm running version 0.9.1.1 with Realtime nodes and Kafka.

I have a datasource whose traffic has grown a lot, from 10-15 million rows per hour to 50-55 million.

This is currently causing a lot of lag on Kafka during peak periods.

The datasource has 58 dimensions and 27 metrics.

The tuningConfig is set up as:

```
"tuningConfig": {
  "type": "realtime",
  "maxRowsInMemory": 400000,
  "intermediatePersistPeriod": "PT10m",
  "windowPeriod": "PT1h",
  "basePersistDirectory": "/usr/local/dataStorage",
  "rejectionPolicy": {
    "type": "serverTime"
  },
  "shardSpec": {
    "type": "linear",
    "partitionNum": 0
  }
}
```

and the firehose as:

```
"firehose": {
  "type": "kafka-0.8",
  "consumerProps": {
    "zookeeper.connect": "10.80.X.Y:2181,10.80.X.Y:2181,10.80.X.Y:2181",
    "zookeeper.connection.timeout.ms": "15000",
    "zookeeper.session.timeout.ms": "15000",
    "zookeeper.sync.time.ms": "5000",
    "group.id": "group5",
    "fetch.message.max.bytes": "1048586",
    "auto.offset.reset": "largest",
    "auto.commit.enable": "false"
  },
  ...
}
```

Some questions:

  • Could the lag be caused by the number of dimensions and metrics, i.e. when the realtime node receives a lot of rows it needs time to index them and slows down?

  • How is performance affected by increasing or decreasing maxRowsInMemory?

  • The row size is around 1700 bytes; would increasing fetch.message.max.bytes from 1048586 to 1548576 help or not?

  • If I clone the realtime spec and run one instance per Kafka partition, with three partitions on the Kafka side, should the shardSpecs look like this?

```
// realtime1
"shardSpec": { "type": "linear", "partitionNum": 0 }
// realtime2
"shardSpec": { "type": "linear", "partitionNum": 1 }
// realtime3
"shardSpec": { "type": "linear", "partitionNum": 2 }
```
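On the fetch.message.max.bytes question, a quick back-of-the-envelope calculation (assuming the ~1700-byte row size stated above; this sketch is mine, not from the thread) shows roughly how many messages fit in one fetch at each setting:

```python
ROW_SIZE = 1700  # approximate bytes per row, from the figures above

for fetch_max in (1048586, 1548576):
    msgs = fetch_max // ROW_SIZE  # whole messages per fetch request
    print(f"fetch.message.max.bytes={fetch_max}: ~{msgs} messages per fetch")
# ~616 messages at the current setting vs ~910 at the proposed one
```

So the proposed change would raise the per-fetch batch size by roughly 50%, under the stated row-size assumption.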

Thanks
Alessandro Conti

Hi Alessandro

A simple and quick first check is the load average on your Linux boxes.

The load average should be less than or equal to the number of cores.

For example, on a 4-core machine the load average should not exceed 4; if it does, processing is the bottleneck and you might need more cores to process your data in time.
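This rule of thumb can be checked with a few lines of Python (a minimal sketch, using the standard-library calls `os.getloadavg()` and `os.cpu_count()`):

```python
import os

def load_is_healthy():
    """Return True if the 1-minute load average is at or below the core count."""
    one_min_load = os.getloadavg()[0]  # (1min, 5min, 15min) averages
    cores = os.cpu_count()             # e.g. 4 on a 4-core machine
    return one_min_load <= cores

print(load_is_healthy())
```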

Thank you,

Kishore

The load average doesn't seem to be an issue; it's always below that limit.
Also, looking at JConsole, CPU usage is around 20-30%.

Can you please help me with the questions I asked above?

Regards

Alessandro