Submit indexing tasks from Scala

Hi,

I am trying to submit indexing tasks using Scala’s system process API (scala.sys.process). We want to submit our index tasks directly from our code.

I was able to run a GET to get task status successfully as shown below.

val cmd = Seq("curl", "cnn-druid-7477.ccg21.dev.corp.com:8090/druid/indexer/v1/task/index_hadoop_conversion_2016-10-12T19:37:11.169Z/status")

cmd: Seq[String] = List(curl, cnn-druid-7477.ccg21.dev.corp.com:8090/druid/indexer/v1/task/index_hadoop_conversion_2016-10-12T19:37:11.169Z/status)

cmd.!!

% Total % Received % Xferd Average Speed Time Time Time Current

Dload Upload Total Spent Left Speed

195 195 0 195 0 0 16619 0 --:--:-- --:--:-- --:--:-- 21666

res44: String =

"{"status":{"id":"index_hadoop_conversion_2016-10-12T19:37:11.169Z","status":"SUCCESS","duration":322958},"task":"index_hadoop_conversion_2016-10-12T19:37:11.169Z"}"

But the POST to submit an index task fails to trigger the indexing.

val cmd = Seq("curl", "--data", "@darwin_insights_latest.json", "cnn-druid-7477.ccg21.dev.corp.com:8090/druid/indexer/v1/task")

cmd: Seq[String] = List(curl, --data, @darwin_insights_latest.json, cnn-druid-7477.ccg21.dev.corp.com:8090/druid/indexer/v1/task)

cmd.!!

% Total % Received % Xferd Average Speed Time Time Time Current

Dload Upload Total Spent Left Speed

100 9391 0 0 100 9391 0 1186k --:--:-- --:--:-- --:--:-- 1834k

res51: String = ""

Has anyone attempted something like this? The other options are to use Tranquility or other Druid APIs to submit index tasks. Before I move on to those options, I wanted to know if it’s possible to index this way.

Thanks,

Kasi.

Hey Kasi,

This should work; maybe you can add “-v” to the curl command to get more info. You can also use an HTTP library to submit the tasks directly from Scala without going through curl. Tranquility uses Finagle (https://twitter.github.io/finagle/) for this, and there are a lot of other options.
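
One thing to keep in mind when adding -v: cmd.!! returns only standard output and throws on a non-zero exit code, while curl writes its progress meter and verbose trace to standard error. A minimal sketch of capturing both streams with a ProcessLogger follows; the RunCommand object and run helper are hypothetical names, not part of the original post.

```scala
import scala.collection.mutable.ListBuffer
import scala.sys.process._

// Hypothetical helper: runs a command and returns (exit code, stdout, stderr).
object RunCommand {
  def run(cmd: Seq[String]): (Int, String, String) = {
    val out = ListBuffer.empty[String]
    val err = ListBuffer.empty[String]
    // ProcessLogger receives stdout and stderr line by line.
    val logger = ProcessLogger(
      line => { out += line; () },
      line => { err += line; () }
    )
    // `!` blocks until the process exits and returns its exit code
    // instead of throwing, unlike `!!`.
    val exitCode = cmd.!(logger)
    (exitCode, out.mkString("\n"), err.mkString("\n"))
  }
}
```

Calling RunCommand.run on the curl command with "-v" added would put the response body in the second element and the verbose trace in the third.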

Thanks Gian.

With verbose output, I see some details. It looks like the connection was successful. Is there anything obviously wrong here? I will try with an HTTP library too.

POST /druid/indexer/v1/task HTTP/1.1

User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.14.3.0 zlib/1.2.3 libidn/1.18 libssh2/1.4.2

Host: cnn-druid-7477.ccg21.dev.corp.com:8090

Accept: */*

Content-Length: 9391

Content-Type: application/x-www-form-urlencoded

Expect: 100-continue

% Total % Received % Xferd Average Speed Time Time Time Current

Dload Upload Total Spent Left Speed

0 9391 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0

< HTTP/1.1 100 Continue

} [data not shown]

< HTTP/1.1 415 Unsupported Media Type

< Date: Wed, 12 Oct 2016 23:33:18 GMT

< Content-Length: 0

< Server: Jetty(9.2.5.v20141112)

<

100 9391 0 0 100 9391 0 993k --:--:-- --:--:-- --:--:-- 1310k

* Connection #0 to host cnn-druid-7477.ccg21.dev.corp.com left intact

* Closing connection #0

res0: String = ""

Hey Kasi,

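The 415 Unsupported Media Type response is because curl’s --data option sends Content-Type: application/x-www-form-urlencoded by default, as your request headers show. You’ll need to set the content type to application/json. With curl you can pass the parameter -H 'Content-Type: application/json'.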
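
As a rough sketch of how that header could be passed from Scala, the command below adds -H to the Seq form used earlier in the thread. The CurlSubmit object, the submitCommand helper, and the example host are hypothetical. Because a Seq[String] is handed to the process directly rather than through a shell, the header must be a single element with no surrounding quote characters; wrapping it in shell-style quotes would send the quotes literally.

```scala
// Hypothetical helper: builds the curl argument list for submitting a task spec.
object CurlSubmit {
  def submitCommand(specFile: String, overlordUrl: String): Seq[String] =
    Seq(
      "curl",
      "-H", "Content-Type: application/json", // one argument, no extra quoting
      "--data", "@" + specFile,                // read the request body from a file
      overlordUrl
    )
}

// Example with an assumed host; run it via scala.sys.process with `.!!`.
val cmd = CurlSubmit.submitCommand(
  "darwin_insights_latest.json",
  "overlord.example.com:8090/druid/indexer/v1/task"
)
```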

Using an HTTP library would make handling the responses a lot cleaner and would be a good way to go.
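
For illustration, here is a minimal sketch of the HTTP-library approach using the JDK’s built-in java.net.http.HttpClient (Java 11+). This is an assumption chosen to keep the example dependency-free; it is not necessarily the library used elsewhere in this thread. The SubmitIndexTask object, its method names, and the example URL are hypothetical.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

// Hypothetical helper for posting a task spec to the indexer endpoint.
object SubmitIndexTask {
  // Builds a POST request with the JSON content type set explicitly.
  def buildRequest(taskUrl: String, specJson: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(taskUrl))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(specJson))
      .build()

  // Sends the request and returns (HTTP status code, response body).
  def submit(taskUrl: String, specJson: String): (Int, String) = {
    val client = HttpClient.newHttpClient()
    val response = client.send(
      buildRequest(taskUrl, specJson),
      HttpResponse.BodyHandlers.ofString()
    )
    (response.statusCode(), response.body())
  }
}
```

A call might look like SubmitIndexTask.submit("http://overlord.example.com:8090/druid/indexer/v1/task", specJson), where specJson holds the contents of the task file. Returning the status code alongside the body makes a 415 or other failure visible instead of an empty string.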

Thanks David. I had used -H with curl, but it didn’t work.

I tried with the HttpClient library and it worked perfectly. I am posting the Scala code snippet here for everyone’s reference.