Tranquility upgrade issue

Hi,

I’m attempting to update my application from Tranquility 0.2.16 to the latest and greatest version. I’ve tried versions 0.3.2 and 0.5, and I get the same result with both: the app hangs right after the Curator/ZooKeeper connection is established.

Log output with 0.3.2 and 0.5:

2015-10-30 20:26:17.672 INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server edgar-2/192.168.90.12:2181. Will not attempt to authenticate using SASL (unknown error)

2015-10-30 20:26:17.675 INFO org.apache.zookeeper.ClientCnxn - Socket connection established to edgar-2/192.168.90.12:2181, initiating session

2015-10-30 20:26:17.676 DEBUG org.apache.zookeeper.ClientCnxn - Session establishment request sent on edgar-2/192.168.90.12:2181

2015-10-30 20:26:17.679 TRACE org.apache.zookeeper.ClientCnxnSocket - readConnectResult 37 0x[0,0,0,0,0,0,ffffff9c,40,2,50,ffffff8b,ffffffb1,6c,ffffffcd,0,ffffffa0,0,0,0,10,ffffffff,78,c,2,6,1d,61,fffffff5,ffffffb9,1,64,ffffffc7,ffffffbf,ffffffd6,53,fffffff8,0,]

2015-10-30 20:26:17.681 INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server edgar-2/192.168.90.12:2181, sessionid = 0x2508bb16ccd00a0, negotiated timeout = 40000

2015-10-30 20:26:17.685 INFO org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED

2015-10-30 20:26:17.686 TRACE org.apache.curator.utils.DefaultTracerDriver - Trace: connection-state-parent-process - 3 ms

2015-10-30 20:26:31.027 DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x2508bb16ccd00a0 after 1ms

2015-10-30 20:26:44.374 DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x2508bb16ccd00a0 after 0ms

2015-10-30 20:26:57.709 DEBUG org.apache.zookeeper.ClientCnxn - Got ping response for sessionid: 0x2508bb16ccd00a0 after 0ms

Basically, it just sits there receiving ping responses. The code hangs in DruidBeams.buildJavaService(). With 0.2.16, those messages are followed by these (and everything functions normally after that):

2015-10-30 20:00:01.198 WARN org.apache.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.

2015-10-30 20:00:01.199 TRACE org.apache.curator.utils.DefaultTracerDriver - Trace: 1 ms

2015-10-30 20:00:01,546 INFO [main] org.hibernate.validator.internal.util.Version - HV000001: Hibernate Validator 5.0.1.Final

2015-10-30 20:00:01.693 INFO io.druid.guice.JsonConfigurator - Loaded class[class io.druid.guice.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, coordinates=, localRepository='/home/bon/.m2/repository', remoteRepositories=[http://repo1.maven.org/maven2/, https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local]}]

2015-10-30 20:00:01.943 INFO com.metamx.emitter.core.LoggingEmitter - Start: started [true]

2015-10-30 20:00:02.037 INFO com.twitter.finagle - Finagle version 6.18.0 (rev=a12a6ab5ce5d213c3753ad5884daa712df1b973b) built at 20140625-103953

2015-10-30 20:00:02.090 DEBUG org.apache.zookeeper.ClientCnxn - Reading reply sessionid:0x3508bb16cac0030, packet:: clientPath:null serverPath:null finished:false header:: 1,3 replyHeader:: 1,34359771081,0 request:: '/druid,F response:: s{12884947409,12884947409,1403217745330,1403217745330,0,8,0,0,0,8,25775706342}

For all three versions I have pinned the Curator framework to 2.6.0 and the Jackson libraries to 1.9.13. Tranquility 0.2.16 continues to work; the others do not. Any ideas on what might be the cause?

We are running ZooKeeper 3.4.6 and Druid 0.6.173. I’m trying to get ready to upgrade to Druid 0.7.x.

Thanks

Mark

Hey Mark,

Could you post the code that you’re using to set up the beam stack?

And could you post a thread dump of the process that shows a stack trace of where it’s getting stuck? (jstack -l [pid])
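If attaching jstack to the process is awkward, roughly the same information can be pulled from inside the JVM with Thread.getAllStackTraces(). Here is a minimal sketch. The class name ThreadDump is made up for illustration, and unlike jstack -l it prints no lock or monitor details, only thread names, daemon flags, states and stacks. It also checks whether a live thread named "main" still exists, which is useful when a process seems stuck.

```java
import java.util.Map;

// Hypothetical helper: a rough in-process stand-in for `jstack -l <pid>`
// (no lock/monitor details, just thread names, daemon flags, states and stacks).
public class ThreadDump {

    /** Builds a text dump of every live thread and its current stack. */
    public static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append('"')
              .append(" daemon=").append(t.isDaemon())
              .append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Returns true if a live thread with the given name (e.g. "main") exists. */
    public static boolean isThreadAlive(String name) {
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().equals(name) && t.isAlive()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.print(dump());
        System.out.println("main thread alive: " + isThreadAlive("main"));
    }
}
```

Calling dump() from a background timer thread, rather than from main, is what would reveal a process where "main" has already gone away.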

Btw, if you’re upgrading anyway, Druid 0.8.1 has been out for a while and has a lot of improvements over 0.7.x.

Hmmm… I thought you had to upgrade to 0.7.X before 0.8?

In any event, the stack trace is attached. Strangely, I would have expected to see something running in the io.druid namespace.

The beam code follows. You see the “Connecting to Druid” log message but never the “Starting data flush timer” message that comes after it.

    log.info("Starting Curator")
    curator = CuratorFrameworkFactory.builder()
        .connectString(config.getDruidZookeeper())
        .retryPolicy(new BoundedExponentialBackoffRetry(1000, 120000, 10))
        .build()
    curator.start()

    log.info("Connecting to Druid")
    log.info("Timestamp value: {}", config.getDruidDataTimestamp())

    // Build the Druid beam service
    druidService = DruidBeams
        .builder(new Timestamper<Map<String, Object>>() {
            @Override
            public DateTime timestamp(Map<String, Object> theMap) {
                // Read the event time from the (lower-cased) timestamp column
                return new DateTime(theMap.get(config.getDruidDataTimestamp().toLowerCase()))
            }
        })
        .curator(curator)
        .timestampSpec(new TimestampSpec(config.getDruidDataTimestamp().toLowerCase(), "millis"))
        .discoveryPath(config.getDruidDiscoveryPath())
        .location(new DruidLocation(
            new DruidEnvironment(config.getDruidIndexService(), config.getDruidFirehosePattern()),
            config.getDruidDataSource()))
        .rollup(DruidRollup.create(
            DruidDimensions.specific(dim.druidDimensions),
            config.getDruidDataAggregators().druidAggregators,
            config.getDruidDataGranularity().granularity))
        .tuning(ClusteredBeamTuning.create(
            config.getDruidTuningSegmentGranularity(), config.getDruidTuningWarmingPeriod(),
            config.getDruidTuningWindowPeriod(), config.getDruidTuningPartitions(),
            config.getDruidTuningReplicants()))
        .buildJavaService()   // with 0.3.2 / 0.5, execution never gets past this call

    log.info("Starting data flush timer")

stack.txt (18.9 KB)

I’ve done a little more research. Version 0.2.19 is the last one that appears to work; version 0.2.2 shows the hanging behaviour. I’m going to see if something sticks out.

Annnnndddd… it looks like I had a dimension and aggregator with the same name.

On earlier versions of Tranquility, it “worked”. On newer versions (like 0.5.0), it just dies after the Curator connection without any error message. It looks like there may have been some dependency issues too, but not all the versions I tested were release versions.

Mark

Hey Mark,

Glad you figured out what was going on. And, yeah, that dimension-metric name check was added at some point because Druid actually gets confused when it’s asked to create a metric with the same name as a dimension (you can get around that by changing the metric name).
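One way to catch this before it reaches the beam builder is a small pre-flight check of your own. The sketch below is a hypothetical helper, not part of Tranquility’s API. It assumes you can get the dimension names (dim.druidDimensions in Mark’s code) and the metric names (the output names of the configured aggregators) as plain string lists; how you extract them depends on your own config classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical pre-flight check, not part of Tranquility's API: fail fast, with a
// clear message, if a metric name is also used as a dimension name.
public class NameCollisionCheck {

    /** Returns names that appear in both lists, in dimension order, without duplicates. */
    public static List<String> overlapping(List<String> dimensions, List<String> metrics) {
        Set<String> metricNames = new HashSet<>(metrics);
        List<String> clashes = new ArrayList<>();
        for (String d : dimensions) {
            if (metricNames.contains(d) && !clashes.contains(d)) {
                clashes.add(d);
            }
        }
        return clashes;
    }

    /** Throws IllegalArgumentException if any dimension and metric share a name. */
    public static void requireDisjoint(List<String> dimensions, List<String> metrics) {
        List<String> clashes = overlapping(dimensions, metrics);
        if (!clashes.isEmpty()) {
            throw new IllegalArgumentException(
                "Dimension and metric names must be distinct; rename these metrics: " + clashes);
        }
    }

    public static void main(String[] args) {
        List<String> dims = Arrays.asList("page", "user", "bytes");
        List<String> metrics = Arrays.asList("count", "bytes");
        System.out.println(overlapping(dims, metrics)); // prints [bytes]
        try {
            requireDisjoint(dims, metrics);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Calling requireDisjoint(...) just before building the beam would turn a silent hang into an immediate, readable error naming the metric to rename.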

And, from the stack trace, it looks like things were stuck because your “main” thread had exited. Strange that there was no exception message printed on the console. Usually the JVM will print a stack trace when “main” dies, unless something had caught and suppressed the error.
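To make sure a startup failure like this is never silent, the setup step can be wrapped so that anything escaping it is printed and the process exits instead of lingering. This is a minimal sketch under assumptions: Step stands in for the real beam setup (it is not a Tranquility type), the failure is simulated, and which non-daemon threads actually kept this particular JVM alive isn’t known from the thread.

```java
// Sketch only: make startup failures loud instead of leaving a half-started JVM.
public class LoudStartup {

    /** One startup step, e.g. building the beam (hypothetical stand-in). */
    public interface Step {
        void run() throws Exception;
    }

    /** Runs a step; prints and returns any failure, or returns null on success. */
    public static Throwable runLoudly(String label, Step step) {
        try {
            step.run();
            return null;
        } catch (Throwable t) {
            System.err.println("Startup step '" + label + "' failed: " + t);
            t.printStackTrace();
            return t;
        }
    }

    public static void main(String[] args) {
        // Last line of defence: log anything that escapes any thread uncaught.
        Thread.setDefaultUncaughtExceptionHandler((thread, err) -> {
            System.err.println("Uncaught exception in thread '" + thread.getName() + "'");
            err.printStackTrace();
        });

        // Pass any argument to simulate the failure case (demo only).
        boolean simulateFailure = args.length > 0;
        Throwable failure = runLoudly("build Druid beam", () -> {
            if (simulateFailure) {
                // Simulated validation error standing in for a real beam-setup failure.
                throw new IllegalArgumentException("dimension and metric share a name");
            }
        });

        if (failure != null) {
            // Other non-daemon threads may keep the JVM running after main returns,
            // so exit explicitly instead of appearing to hang.
            System.exit(1);
        }
        System.out.println("Startup complete");
    }
}
```

The explicit System.exit(1) is a design choice: once a required step has failed, a process that keeps running with only background threads left is harder to diagnose than one that stops with an error.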