Using SOCKS for Druid dependencies

Hello, I am running into the following issue when downloading extensions during coordinator node startup:

$ java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath config/_common:config/coordinator:lib/* io.druid.cli.Main server coordinator

2015-08-15T01:18:33,340 INFO [main] io.druid.guice.PropertiesModule - Loading properties from common.runtime.properties

2015-08-15T01:18:33,344 INFO [main] io.druid.guice.PropertiesModule - Loading properties from runtime.properties

Aug 15, 2015 1:18:33 AM org.hibernate.validator.internal.util.Version

INFO: HV000001: Hibernate Validator 5.1.3.Final

2015-08-15T01:18:37,870 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.guice.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, coordinates=[io.druid.extensions:mysql-metadata-storage], defaultVersion='0.8.0', localRepository='/x/home/tsoliman/.m2/repository', remoteRepositories=[https://repo1.maven.org/maven2/, https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local]}]

2015-08-15T01:18:38,172 INFO [main] io.druid.initialization.Initialization - Loading extension[io.druid.extensions:mysql-metadata-storage] for class[io.druid.cli.CliCommandCreator]

2015-08-15T01:18:58,462 ERROR [main] io.druid.initialization.Initialization - Unable to resolve artifacts for [io.druid.extensions:mysql-metadata-storage:jar:0.8.0 (runtime) -> < [ (https://repo1.maven.org/maven2/, releases+snapshots), (https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local, releases+snapshots)]].


The machine is behind a firewall, but it has internet access via a SOCKS5 proxy on localhost:12345. I have tried the following, to no avail:

  • adding -DsocksProxyHost=localhost -DsocksProxyPort=12345 to the startup
  • adding the SOCKS values to the ~/.m2/settings.xml
  • export MAVEN_OPTS="-DsocksProxyHost=localhost -DsocksProxyPort=12345"
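For reference, the settings.xml entry I added looked roughly like the following (I am not sure Druid's internal dependency resolver reads settings.xml at all, and Maven's support for SOCKS in the proxies section may itself be the problem):

```xml
<settings>
  <proxies>
    <proxy>
      <id>socks</id>
      <active>true</active>
      <protocol>socks5</protocol>
      <host>localhost</host>
      <port>12345</port>
    </proxy>
  </proxies>
</settings>
```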

Is there a way I can have Druid utilize the SOCKS proxy to download the necessary extensions?

Any help is greatly appreciated.

Thanks,

Tim

Hi Tim, this document should help with working around a firewall for tutorials:
https://github.com/druid-io/druid/blob/master/docs/content/tutorials/firewall.md

You can also take a look at: http://druid.io/docs/latest/operations/including-extensions.html

The section on not having dependencies download locally should help with your problem.
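The general idea from that doc is to fetch the dependencies on a machine that does have internet access and copy the resulting local repository over. Something along these lines (untested, and the exact pull-deps flags vary by Druid version):

```shell
# On a machine with direct internet access, pre-fetch the configured
# extensions into the local repository:
java -classpath "config/_common:lib/*" io.druid.cli.Main tools pull-deps

# Then copy the populated repository to the firewalled machine
# (host name is a placeholder):
scp -r ~/.m2/repository firewalled-host:~/.m2/
```

With the repository in place, set druid.extensions.localRepository to point at it so nothing needs to be downloaded at startup.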

Hi Yang,

I am trying to connect to my EMR cluster, which is located on an office server, from Druid running on my local office machine. Due to a firewall, I can see in the overlord log that Druid is unable to connect to the server (log from the overlord node shown below). I am trying to find a way to create a secure socket tunnel. It would be appreciated if you could help me with this.

2016-04-20T19:38:52,432 INFO [task-runner-0] io.druid.indexing.common.task.HadoopIndexTask - Starting a hadoop determine configuration job...
2016-04-20T19:38:53,215 WARN [task-runner-0] org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-04-20T19:38:53,256 INFO [task-runner-0] io.druid.indexer.path.StaticPathSpec - Adding paths[s3://datalytics-datastore/engagement/2016-03-15-12/part-m-00000.gz]
2016-04-20T19:38:53,273 INFO [task-runner-0] io.druid.indexer.HadoopDruidDetermineConfigurationJob - DateTime[2016-03-15T00:00:00.000Z], spec[HadoopyShardSpec{actualSpec=NoneShardSpec, shardNum=0}]
2016-04-20T19:38:53,275 INFO [task-runner-0] io.druid.indexer.JobHelper - Deleting path[/tmp/druid-indexing/AD3_EXTRA/2016-04-20T193828.806Z]
2016-04-20T19:39:14,040 INFO [task-runner-0] org.apache.hadoop.ipc.Client - Retrying connect to server: 172.31.16.55/172.31.16.55:9000. Already tried 0 time(s); maxRetries=45
2016-04-20T19:39:34,045 INFO [task-runner-0] org.apache.hadoop.ipc.Client - Retrying connect to server: 172.31.16.55/172.31.16.55:9000. Already tried 1 time(s); maxRetries=45

Hi,

one more thing, I tried running the overlord using the command below, hoping that it might create the socket tunnel, but it didn't work.

java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -DsocksProxyHost=localhost -DsocksProxyPort=8157 -classpath config/_common:config/overlord:lib/*: io.druid.cli.Main server overlord
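In case it matters, the SOCKS proxy on localhost:8157 comes from an SSH dynamic forward to the cluster's network, roughly like this (the host name is a placeholder for my actual bastion):

```shell
# Open a SOCKS5 proxy on localhost:8157 that forwards traffic
# through the remote host; -N means "no remote command, just forward".
ssh -N -D 8157 hadoop@emr-master.example.com
```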

Hey Anindita,

I have not tested this personally but try adding these to your command line:

-Dhadoop.hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.SocksSocketFactory

-Dhadoop.hadoop.socks.server=localhost:8157

(the double hadoop.hadoop. is intentional; Druid strips the first "hadoop." and passes the rest along to the Hadoop client)
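Putting those together with your earlier startup command, the full invocation would look something like this (untested on my end):

```shell
java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -DsocksProxyHost=localhost -DsocksProxyPort=8157 \
  -Dhadoop.hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.SocksSocketFactory \
  -Dhadoop.hadoop.socks.server=localhost:8157 \
  -classpath "config/_common:config/overlord:lib/*" \
  io.druid.cli.Main server overlord
```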

Hi Gian,

I just used the commands mentioned above and it worked! Cool, thanks.