Error while ingesting through the command line

I am trying to create an ingestion task on a remote Druid server from my local machine using the post-index-task command. The specification file is stored on the local machine and the JSON data to be ingested is located on the Druid server. When I run the command "post-index-task --file --url " from the local machine, it submits the task to Druid and reports success, but after that it gives the error: urllib2.URLError: <urlopen error [Errno 111] Connection refused>

Full trace is:

Beginning indexing data for cmh
Task started: index_parallel_cmh_2019-09-30T06:56:08.593Z
Task log: /druid/indexer/v1/task/index_parallel_cmh_2019-09-30T06:56:08.593Z/log
Task status: /druid/indexer/v1/task/index_parallel_cmh_2019-09-30T06:56:08.593Z/status
Task index_parallel_cmh_2019-09-30T06:56:08.593Z still running…
Task index_parallel_cmh_2019-09-30T06:56:08.593Z still running…
Task finished with status: SUCCESS
Completed indexing data for cmh. Now loading indexed data onto the cluster…
Traceback (most recent call last):
  File "/home/user/Downloads/apache-druid-0.15.0-incubating/bin/post-index-task-main", line 174, in <module>
    main()
  File "/home/user/Downloads/apache-druid-0.15.0-incubating/bin/post-index-task-main", line 171, in main
    await_load_completion(args, datasource, load_timeout_at)
  File "/home/user/Downloads/apache-druid-0.15.0-incubating/bin/post-index-task-main", line 119, in await_load_completion
    response = urllib2.urlopen(req, None, response_timeout)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 429, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 447, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1228, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>

What is this error related to?

Thanks,

Hemant

Hi Hemanth,

Could you check in the Coordinator console whether the data source is loaded?

Also, could you check whether you can get a response from the Druid services (Coordinator/Historical) by doing a telnet?
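
If telnet is not available, the same connectivity check can be scripted. A minimal Python sketch (the host below is a placeholder and 8081/8083 are the default Coordinator/Historical ports, so adjust both to your setup):

import socket

# Assumption: hypothetical host name; replace with your remote Druid server.
druid_host = "druid-server.example.com"
for name, port in [("coordinator", 8081), ("historical", 8083)]:
    try:
        with socket.create_connection((druid_host, port), timeout=5):
            print("%s port %d: reachable" % (name, port))
    except OSError as e:
        print("%s port %d: NOT reachable (%s)" % (name, port, e))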

Thanks and Regards,

Vaibhav

Hi Vaibhav,

I checked the Coordinator console and the data source is loaded.
I also used telnet to the Druid server IP on the Coordinator port 8081 and it shows "connected".

Hi Hemanth,

After indexing, the script tries to fetch the load status from the Druid Coordinator, and it seems that request is failing:

112 def await_load_completion(args, datasource, timeout_at):
113     while True:
114         url = args.coordinator_url.rstrip("/") + "/druid/coordinator/v1/loadstatus"
115         req = urllib2.Request(url)
116         add_basic_auth_header(args, req)
117         timeleft = timeout_at - time.time()
118         response_timeout = min(max(timeleft, 5), 10)
119         response = urllib2.urlopen(req, None, response_timeout)
120         response_obj = json.loads(response.read())

You can check whether there is any network latency between your local machine and the Druid cluster. Could you also check the Coordinator log to see if the issue is consistent?
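
To rule out the script itself, you can also call the same endpoint manually from your local machine. A minimal Python 3 sketch (the Coordinator address is an assumption, replace it with your own host and port):

import json
from urllib.request import urlopen

# Assumption: hypothetical Coordinator address; use your remote Druid host.
coordinator_url = "http://druid-server.example.com:8081"
with urlopen(coordinator_url.rstrip("/") + "/druid/coordinator/v1/loadstatus", timeout=10) as resp:
    print(json.dumps(json.loads(resp.read().decode("utf-8")), indent=2))

If this fails with the same "Connection refused" from your local machine, the problem is most likely reachability of the Coordinator port from outside, not the ingestion itself.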

Thanks and Regards,

Vaibhav

There is no network latency. Could you please tell me where coordinator logs are stored?

It will be <DRUID_ROOT>/var/sv/ unless you have modified the logging configurations.

e.g.:

apache-druid-0.16.0-incubating/var/sv/

You can use lsof to identify the correct directory in case you don't see it there and want to find out where the running processes are logging, e.g.:

lsof -p 63921 | grep -i historical.log

Thanks,

Vaibhav

I checked the Historical log and I don't see those errors in it.

Not the Historical log, it's the Coordinator log. I just gave you an example of how to find the logging directory (grepping for historical).

Are you facing this issue consistently with each of your ingestions?

What happens when you submit the job directly on the server?

How did you confirm there is no latency? Could you ping the Coordinator from your local machine and attach the details here?
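
If ICMP ping is blocked, a simple TCP connect-time probe against the Coordinator works too; a small sketch (the host is a placeholder, 8081 is the default Coordinator port):

import socket
import time

# Assumption: hypothetical host; replace with your remote Druid server.
druid_host, coordinator_port = "druid-server.example.com", 8081
for _ in range(5):
    start = time.time()
    with socket.create_connection((druid_host, coordinator_port), timeout=5):
        pass
    print("connect time: %.1f ms" % ((time.time() - start) * 1000))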

Hey Hemant,

Can you try running the following command:

sh post-index-task --file cmh_jason_spec.json --url http://:8888

Thank you,

Niraj Dedhia