[Tranquility + Spark Streaming] How to increase throughput or is there some "best practice"?

Hi,

I’ve been using Tranquility Server for a while, and recently I tried switching to Spark Streaming + Tranquility. After days of trials, I’ve found that the throughput of transforming events and sending them to the peons is low.

In my case, the Kafka topic has 8 partitions, and typically 10k~20k events per 5-second window are sent to 8 peons. I am wondering why that takes 6~7 seconds. When processing takes longer than the streaming window, the setup becomes unusable.

Any idea how to increase throughput or is there some “best practice”?

Sorry, I made a mistake: it’s 100k~200k events per 5 seconds.

On Friday, March 18, 2016 at 11:24:57 AM UTC+8, Ninglin Du wrote:

Hi Ninglin, which version of Tranquility are you using? I know there were several optimizations and fixes that Gian made around throughput in the latest release.

0.7.4

My Kafka topic has 8 partitions and carries about 25,000 events/sec. At first I used 8 executors to send to 9 partitions, with the Spark Streaming window set to 30 seconds. I found that 30 seconds of events took 1 minute to send to the peons. Rebalancing (repartitioning the data) adds about 1 second of overhead but decreases the cost to about 40 seconds, which confuses me a lot. The throughput is rather low.

I tried changing tranquility.maxBatchSize to 400,000, druidBeam.firehoseChunkSize to 200,000, and druidBeam.firehoseBufferSize to 128m, but throughput did not seem to improve.

Finally, I set --executor-cores to 2. That way 16 cores are used, and the send time dropped to 12~15 seconds. Although this works around the problem, I still don’t understand how to improve throughput properly. Is my configuration poor? I don’t know how to tune it.
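For reference, the submit command looked roughly like this (the application jar, main class, and memory setting below are placeholders for illustration; only the 8 executors × 2 cores combination matches what I described):

```shell
# Illustrative spark-submit invocation: 8 executors x 2 cores each = 16 cores total.
# Jar name, class name, and memory size are placeholders, not from the actual job.
spark-submit \
  --class com.example.StreamingApp \
  --num-executors 8 \
  --executor-cores 2 \
  --executor-memory 4g \
  my-streaming-app.jar
```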

On Saturday, March 19, 2016 at 5:27:22 AM UTC+8, Fangjin Yang wrote:

Hey Ninglin,

Are you using a singleton Beam for sending? Not using a singleton is probably the most common cause of Tranquility + Spark performance problems.
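The idea can be sketched in plain Scala; a `lazy val` inside an `object` gives thread-safe, once-per-JVM initialization. In the sketch below the names are illustrative and a placeholder string stands in for a real Beam:

```scala
// Minimal sketch of the one-Beam-per-JVM idea. The expensive Beam
// construction is replaced here by a placeholder string.
object BeamHolder {
  var buildCount = 0 // illustration only: counts how many times the "beam" was built

  // Scala initializes a lazy val at most once, in a thread-safe way,
  // so every task running in this JVM shares the same instance.
  lazy val beam: String = {
    buildCount += 1
    "beam-connection" // real code would call DruidBeams...buildBeam()
  }
}
```

Compared with hand-rolled double-checked locking, the `lazy val` version is shorter and harder to get wrong.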

If you’ve already checked that, another thing to look at is how your DruidBeams stack is created; that can have performance implications too. If you’re using untyped Maps, you’ll get the best performance from a DruidBeams.fromConfig stack (rather than a DruidBeams.builder stack). The fromConfig stack triggers some new optimizations that were added in 0.7.4.

If you’re using custom types, you’ll get best performance if you cache your timestamp in your custom object, and use a Timestamper or timeFn that just accesses that cached timestamp. Otherwise the timestamp will need to be computed a few different times, and this can be expensive.
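A minimal sketch of the cached-timestamp idea, assuming a custom event class (the class and field names below are made up, and a real timeFn for Tranquility would typically return a Joda DateTime rather than a Long):

```scala
// Parse the timestamp once when the event is created, so the timeFn is
// a cheap field access instead of reparsing a string on every call.
final case class MyEvent(fields: Map[String, String], timestampMillis: Long)

object MyEvent {
  // Construct an event, computing the timestamp exactly once.
  def fromRaw(raw: Map[String, String]): MyEvent = {
    val millis = java.time.Instant.parse(raw("timestamp")).toEpochMilli
    MyEvent(raw, millis)
  }

  // The function handed to DruidBeams just reads the cached value.
  val timeFn: MyEvent => Long = _.timestampMillis
}
```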

Hi Ninglin, can you let us know if Gian’s suggestions are helpful?

I imitated your example to implement a singleton Beam. I know pasting code (Scala) here isn’t ideal, but I just want to make sure:

```scala
class MapEventBeamFactory(file: String)
  extends BeamFactory[java.util.Map[String, AnyRef]] {

  override def makeBeam: Beam[java.util.Map[String, AnyRef]] =
    MapEventBeamFactory.getInstance(file)
}

object MapEventBeamFactory {

  @volatile private var instance: Beam[java.util.Map[String, AnyRef]] = null

  def getInstance(file: String): Beam[java.util.Map[String, AnyRef]] = {
    if (instance == null) {
      synchronized {
        if (instance == null) {
          val dataSource = getDataSource(file)
          val props = dataSource.propertiesBasedConfig
          val configs = DruidBeams.fromConfig(dataSource)
            .discoveryPath(props.discoPath)
          // apply the firehose location, if configured, before building
          val located =
            if (props.properties.getProperty("druid.firehose.pattern", "") == "") configs
            else {
              configs.location(
                DruidLocation(
                  props.druidIndexingServiceName,
                  props.properties.getProperty("druid.firehose.pattern"),
                  dataSource.dataSource
                )
              )
            }
          instance = located.buildBeam()
        }
      }
    }
    instance
  }
}
```

DruidBeams.fromConfig and **java.util.Map[String, AnyRef]** are used. If there is no problem with the singleton implementation, I think I’ve done just as you said.

On Tuesday, March 22, 2016 at 3:10:49 AM UTC+8, Gian Merlino wrote:

Not yet; there is still something I need to confirm.

On Wednesday, March 23, 2016 at 8:36:08 AM UTC+8, Fangjin Yang wrote:

Hi Ninglin,

I am working on Spark Streaming and Druid these days, and I found that my configuration for the BeamFactory is always wrong. Could you please send some demo code to my email?

neters@foxmail.com

Thank you very much.

On Friday, March 18, 2016 at 11:24:57 AM UTC+8, Ninglin Du wrote:

Hmm, I haven’t used Tranquility + Spark for a long time; I switched to Tranquility + Kafka. You can send me a copy of your setup and the related code in detail.

On Thursday, November 2, 2017 at 2:53:10 AM UTC+8, 孙伟 wrote: