Druid + Docker + Tranquility - Zookeeper znode issue

I am trying to ingest data into Druid from Spark. Druid and Spark each run in their own Docker container, and I am using Tranquility to ingest the data. The Druid Docker image is the official one from druid-io: https://github.com/druid-io/docker-druid

I am seeing this error:

2016-01-31 14:50:57,637 DEBG ‘zookeeper’ stdout output:

2016-01-31 14:50:57,636 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /192.168.99.1:49197

2016-01-31 14:50:57,636 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /192.168.99.1:49197; will be dropped if server is in r-o mode

2016-01-31 14:50:57,636 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@841] - Refusing session request for client /192.168.99.1:49197 as it has seen zxid 0x50 our last zxid is 0x23 client must try another server

2016-01-31 14:50:57,636 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /192.168.99.1:49197 (no session established for client)

2016-01-31 14:50:58,795 DEBG ‘zookeeper’ stdout output:

2016-01-31 14:50:58,789 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x152981fdc7a0004 type:create cxid:0x2 zxid:0x24 txntype:-1 reqpath:n/a Error Path:/tranquility/beams/overlord/test Error:KeeperErrorCode = NoNode for /tranquility/beams/overlord/test

2016-01-31 14:50:58,856 DEBG ‘zookeeper’ stdout output:

2016-01-31 14:50:58,856 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x152981fdc7a0004 type:create cxid:0xc zxid:0x2a txntype:-1 reqpath:n/a Error Path:/tranquility/beams/overlord/test/mutex/locks Error:KeeperErrorCode = NoNode for /tranquility/beams/overlord/test/mutex/locks

2016-01-31 14:50:58,902 DEBG ‘zookeeper’ stdout output:

2016-01-31 14:50:58,902 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x152981fdc7a0004 type:create cxid:0x17 zxid:0x2e txntype:-1 reqpath:n/a Error Path:/tranquility/beams/overlord/test/mutex/leases Error:KeeperErrorCode = NoNode for /tranquility/beams/overlord/test/mutex/leases

I am assuming that the required path in ZooKeeper is not found: /tranquility/beams/overlord/*. I tried grepping for the properties file on the docker-machine (on Mac OS X), with no luck.

I was able to get the list of paths from ZooKeeper, and I could see that /discovery/druid:overlord is present. This is what my BeamFactory code looks like:

class EventRDDBeamFactory extends BeamFactory[Map[String, String]] {
  lazy val makeBeam: Beam[Map[String, String]] = {
    val curator = CuratorFrameworkFactory.newClient(
      "192.168.99.100:3181",
      new BoundedExponentialBackoffRetry(100, 3000, 5))
    curator.start()

    val indexService = "druid/overlord"
    val discoveryPath = "/discovery"
    val dataSource = "test"
    val dimensions = IndexedSeq("ip")
    val aggregators = Seq(new CountAggregatorFactory("website"))
    val timestampFn = (message: Map[String, String]) => new DateTime(message.get("time").get)

    DruidBeams
      .builder(timestampFn)
      .curator(curator)
      .discoveryPath(discoveryPath)
      .location(DruidLocation.create(indexService, dataSource))
      .rollup(DruidRollup(SpecificDruidDimensions(dimensions), aggregators, QueryGranularity.MINUTE))
      .tuning(
        ClusteredBeamTuning(
          segmentGranularity = Granularity.HOUR,
          windowPeriod = new Period("PT10M"),
          partitions = 1,
          replicants = 1
        )
      )
      .buildBeam()
  }
}

Can anyone please help me understand what could be the issue here?

I also have a few questions:

  1. Looking at the supervisord config for the Docker image, I didn't see any command-line overrides for the overlord's service name or the discovery path, and I could not find the relevant *.properties file in the container. Is there any way (maybe an API) to find out these values, and perhaps override them? The overlord console only lists the tasks, not the config.

  2. I also experimented with creating the required path manually in ZooKeeper and granting full access to it (/tranquility/beams/overlord/test/*). Even then, I got the same error. This makes me think it is not necessarily an issue with ZooKeeper paths?

I’m not 100% sure if anyone has actually been able to get the Druid docker image working. I think you’ll have much better luck with the Docker distribution here http://imply.io/download if you are just starting out with Druid.

Thanks, Fangjin, for the response. I am now starting with the Imply packaged distribution and have started a single-machine Druid instance using the quickstart.

I have made the following changes to the quickstart.conf:

:verify bin/verify-java
:verify bin/verify-default-ports

!p10 zk bin/run-zk conf-quickstart
coordinator bin/run-druid coordinator conf-quickstart
broker bin/run-druid broker conf-quickstart
historical bin/run-druid historical conf-quickstart
!p80 overlord bin/run-druid overlord conf-quickstart
!p90 middleManager bin/run-druid middleManager conf-quickstart

#tranquility-server bin/tranquility server -configFile conf-quickstart/tranquility/server.json

# Uncomment to use Tranquility Kafka
#tranquility-kafka bin/tranquility kafka -configFile conf-quickstart/tranquility/kafka.json

Apart from this, I haven’t made any changes to any other config files. My requests do reach Druid through the Tranquility library, but I see the following errors in the ZooKeeper log:

2016-02-10 12:52:31,558 INFO [SyncThread:0] org.apache.zookeeper.server.ZooKeeperServer - Established session 0x152cb3c23a20005 with negotiated timeout 40000 for client /10.196.192.38:49836

2016-02-10 12:52:33,943 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x152cb3c23a20005 type:create cxid:0x2 zxid:0x44 txntype:-1 reqpath:n/a Error Path:/tranquility/beams/druid:overlord/druid_ingest Error:KeeperErrorCode = NoNode for /tranquility/beams/druid:overlord/druid_ingest

2016-02-10 12:52:33,963 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x152cb3c23a20005 type:create cxid:0xc zxid:0x4a txntype:-1 reqpath:n/a Error Path:/tranquility/beams/druid:overlord/druid_ingest/mutex/locks Error:KeeperErrorCode = NoNode for /tranquility/beams/druid:overlord/druid_ingest/mutex/locks

2016-02-10 12:52:33,980 INFO [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x152cb3c23a20005 type:create cxid:0x17 zxid:0x4e txntype:-1 reqpath:n/a Error Path:/tranquility/beams/druid:overlord/druid_ingest/mutex/leases Error:KeeperErrorCode = NoNode for /tranquility/beams/druid:overlord/druid_ingest/mutex/leases

My datasource name is druid_ingest. I don’t see any real-time indexing tasks on the coordinator either, and the broker API (:8082/druid/v2/datasources) returns no datasources.
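For reference, the znode paths in the ZooKeeper log above appear to be derived from the index service name and the datasource name. The following is a minimal sketch of that mapping, not Tranquility's actual code. It assumes the default base path /tranquility/beams and assumes that slashes in the service name are replaced with colons, which is consistent with "druid/overlord" showing up as /discovery/druid:overlord and /tranquility/beams/druid:overlord/druid_ingest. The names BeamPaths, canonicalService, beamPath, and discoveryPath are hypothetical helpers introduced only for illustration.

```scala
object BeamPaths {
  // Assumption: base path under which Tranquility keeps beam metadata,
  // as seen in the ZooKeeper log lines of this thread.
  val BasePath = "/tranquility/beams"

  // Assumption: a service name such as "druid/overlord" is stored in
  // ZooKeeper with '/' replaced by ':' (that is, "druid:overlord").
  def canonicalService(indexService: String): String =
    indexService.replace('/', ':')

  // Path under which the beam metadata for one datasource would live.
  def beamPath(indexService: String, dataSource: String): String =
    s"$BasePath/${canonicalService(indexService)}/$dataSource"

  // Path under which the service is expected to announce itself.
  def discoveryPath(discoveryRoot: String, indexService: String): String =
    s"$discoveryRoot/${canonicalService(indexService)}"
}
```

Comparing the output of BeamPaths.beamPath and BeamPaths.discoveryPath with what is actually listed in ZooKeeper is a quick way to tell whether the indexService and discoveryPath values passed to DruidBeams match the running cluster.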

I see this in the overlord logs, when the overlord starts up:

2016-02-10T13:05:49,020 WARN [main] com.metamx.common.RetryUtils - Failed on try 1, retrying in 2,054ms.

org.skife.jdbi.v2.exceptions.UnableToObtainConnectionException: java.sql.SQLException: Cannot create PoolableConnectionFactory (java.net.ConnectException : Error connecting to server localhost on port 1,527 with message Connection refused.)

    at org.skife.jdbi.v2.DBI.open(DBI.java:230) ~[jdbi-2.63.1.jar:2.63.1]

    at org.skife.jdbi.v2.DBI.withHandle(DBI.java:279) ~[jdbi-2.63.1.jar:2.63.1]

    at io.druid.metadata.SQLMetadataConnector$2.call(SQLMetadataConnector.java:108) ~[druid-server-0.8.3-iap1.jar:0.8.3-iap1]

    at com.metamx.common.RetryUtils.retry(RetryUtils.java:38) [java-util-0.27.4.jar:?]

    at io.druid.metadata.SQLMetadataConnector.retryWithHandle(SQLMetadataConnector.java:113) [druid-server-0.8.3-iap1.jar:0.8.3-iap1]

    at io.druid.metadata.SQLMetadataConnector.createTable(SQLMetadataConnector.java:157) [druid-server-0.8.3-iap1.jar:0.8.3-iap1]

    at io.druid.metadata.SQLMetadataConnector.createConfigTable(SQLMetadataConnector.java:231) [druid-server-0.8.3-iap1.jar:0.8.3-iap1]

    at io.druid.metadata.SQLMetadataConnector.createConfigTable(SQLMetadataConnector.java:374) [druid-server-0.8.3-iap1.jar:0.8.3-iap1]

    at io.druid.guice.JacksonConfigManagerModule$1.start(JacksonConfigManagerModule.java:56) [druid-common-0.8.3-iap1.jar:0.8.3-iap1]

    at com.metamx.common.lifecycle.Lifecycle.start(Lifecycle.java:244) [java-util-0.27.4.jar:?]

    at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:155) [druid-api-0.3.13.jar:0.8.3-iap1]

    at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:71) [druid-services-0.8.3-iap1.jar:0.8.3-iap1]

    at io.druid.cli.ServerRunnable.run(ServerRunnable.java:38) [druid-services-0.8.3-iap1.jar:0.8.3-iap1]

    at io.druid.cli.Main.main(Main.java:99) [druid-services-0.8.3-iap1.jar:0.8.3-iap1]

Caused by: java.sql.SQLException: Cannot create PoolableConnectionFactory (java.net.ConnectException : Error connecting to server localhost on port 1,527 with message Connection refused.)

    at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:2152) ~[commons-dbcp2-2.0.1.jar:2.0.1]

    at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:1903) ~[commons-dbcp2-2.0.1.jar:2.0.1]

    at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:1413) ~[commons-dbcp2-2.0.1.jar:2.0.1]

    at org.skife.jdbi.v2.DataSourceConnectionFactory.openConnection(DataSourceConnectionFactory.java:36) ~[jdbi-2.63.1.jar:2.63.1]

    at org.skife.jdbi.v2.DBI.open(DBI.java:212) ~[jdbi-2.63.1.jar:2.63.1]

    ... 13 more

Caused by: java.sql.SQLNonTransientConnectionException: java.net.ConnectException : Error connecting to server localhost on port 1,527 with message Connection refused.

    at org.apache.derby.client.am.SQLExceptionFactory.getSQLException(Unknown Source) ~[derbyclient-10.11.1.1.jar:?]

    at org.apache.derby.client.am.SqlException.getSQLException(Unknown Source) ~[derbyclient-10.11.1.1.jar:?]

    at org.apache.derby.jdbc.ClientDriver.connect(Unknown Source) ~[derbyclient-10.11.1.1.jar:?]

    at org.apache.commons.dbcp2.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:39) ~[commons-dbcp2-2.0.1.jar:2.0.1]

    at org.apache.commons.dbcp2.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:205) ~[commons-dbcp2-2.0.1.jar:2.0.1]

    at org.apache.commons.dbcp2.BasicDataSource.validateConnectionFactory(BasicDataSource.java:2162) ~[commons-dbcp2-2.0.1.jar:2.0.1]

    at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:2148) ~[commons-dbcp2-2.0.1.jar:2.0.1]

    at org.apache.commons.dbcp2.BasicDataSource.createDataSource(BasicDataSource.java:1903) ~[commons-dbcp2-2.0.1.jar:2.0.1]

    at org.apache.commons.dbcp2.BasicDataSource.getConnection(BasicDataSource.java:1413) ~[commons-dbcp2-2.0.1.jar:2.0.1]

    at org.skife.jdbi.v2.DataSourceConnectionFactory.openConnection(DataSourceConnectionFactory.java:36) ~[jdbi-2.63.1.jar:2.63.1]

    at org.skife.jdbi.v2.DBI.open(DBI.java:212) ~[jdbi-2.63.1.jar:2.63.1]

    ... 13 more

Is my datasource not being created because of the above? Is there any way I can debug this?

Adding my code for reference:

val DRUID_INDEX_SERVICE = "druid/overlord"
val DRUID_DISCOVERY_PATH = "/discovery"
val DATA_SOURCE = "druid_ingest"
val DRUID_TRANQUILITY_RETRY_POLICY = new BoundedExponentialBackoffRetry(100, 3000, 5)
val DRUID_TRANQUILITY_TUNING = ClusteredBeamTuning(
  segmentGranularity = Granularity.HOUR,
  windowPeriod = new Period("PT10M"),
  partitions = 1,
  replicants = 1)

lazy val BeamInstance: Beam[Map[String, String]] = {
  val curator = CuratorFrameworkFactory.newClient(
    ConfigConstants.DRUID_ZOOKEEPER,
    ConfigConstants.DRUID_TRANQUILITY_RETRY_POLICY)
  curator.start()

  val dimensions =
  val aggregators =
  val timestampFn = (message: Map[String, String]) => CommonUtil.convertMillisToJodaDate(message.get("timestamp").get)

  DruidBeams
    .builder(timestampFn)
    .curator(curator)
    .discoveryPath(ConfigConstants.DRUID_DISCOVERY_PATH)
    .location(DruidLocation.create(ConfigConstants.DRUID_INDEX_SERVICE, ConfigConstants.DATA_SOURCE))
    .rollup(DruidRollup(SpecificDruidDimensions(dimensions), aggregators, QueryGranularity.MINUTE))
    .tuning(ConfigConstants.DRUID_TRANQUILITY_TUNING)
    .buildBeam()
}

Any help here would be greatly appreciated. I also posted this on the IRC channel, but I didn’t get any response.

Hey Ram,

That error makes it sound like your coordinator isn’t starting up properly (the default single-machine setup involves the coordinator running a derby server on port 1527). Is there anything interesting in the coordinator logs?
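One way to check this from the machine running the overlord is to test whether anything is accepting TCP connections on the Derby port. The following is a minimal sketch using only the JDK. PortCheck and isPortOpen are hypothetical names, not part of Druid, and a successful connection only shows that something is listening on the port, not that it is a healthy Derby server.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object PortCheck {
  // Returns true if a TCP connection to host:port succeeds within timeoutMs.
  def isPortOpen(host: String, port: Int, timeoutMs: Int = 1000): Boolean = {
    val socket = new Socket()
    try {
      socket.connect(new InetSocketAddress(host, port), timeoutMs)
      true
    } catch {
      // Connection refused, timeout, unknown host, and so on.
      case _: java.io.IOException => false
    } finally {
      // Closing may itself fail; that is not interesting for this check.
      Try(socket.close())
    }
  }
}
```

For the setup described here, PortCheck.isPortOpen("localhost", 1527) would be the call to try, both right after startup and again once the coordinator has finished starting.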

Thanks Gian for quickly jumping to help.

I don’t see anything weird in the coordinator logs. In fact, netstat shows that the Derby server is running on port 1527 and that the coordinator process, which listens on 8081, is the one managing it. But I do see a connect exception to the Derby server in the overlord logs, as I mentioned above.

Anyway, I am attaching the complete var/sv folder, zipped up. The logs were taken after a few rounds of my Spark job trying to push data into Druid through Tranquility. I have replaced my Druid host with ‘hostname.com’ and my client with ‘CLIENT_IP’.

Please let me know if you find anything missing. Thanks again for the help.

- Ram

sv.zip (69 KB)

Hey Ram,

Actually looking through those logs I think the exceptions you see are “normal” startup things (they’re mostly transient errors caused by the fact that the cluster hasn’t fully started up yet). It seems from the logs that things are OK once the cluster has got going. What exactly is not working? What happens when you run your Tranquility program?

Guys, what are the !p10, !p80, and !p90 prefixes in the supervise configuration? Is it an ordering? I cannot figure that out from bin/supervise.

It looks like a start or kill ordering, but why such odd numbers as 10, 80, and 90?