[druid-user] Data Ingestion Through Curl

Hi All,
I want to use the command below to ingest some data once a day at 00:00 GMT.
On which server do I have to place the data file before running the command?

  1. Coordinator server
  2. Router server
  3. Historical server, or something else

Also, let me know whether I have to create the file in an HDFS location or as a normal local file.
I tried running this command; the task starts, but it fails with a file-not-found exception.
curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/tutorial/wikipedia-index.json http://localhost:8081/druid/indexer/v1/task

Hi!

That depends on the file location given in that JSON. I believe it’s a relative path, quickstart/tutorial?

"inputSource": {
  "type": "local",
  "baseDir": "quickstart/tutorial/",
  "filter": "wikiticker-2015-09-12-sampled.json.gz"
},

In that case the usual rules about relative paths apply: the path is resolved against the working directory of the Druid process that runs the task, on whichever node that happens to be.

It would probably be best to change that to an absolute path to remove all confusion. Remember that the same path must contain the same file on all your nodes.
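
For illustration, the inputSource from the tutorial spec with an absolute baseDir might look like the fragment below. The directory /opt/druid/quickstart/tutorial/ is only an assumed install location, not something from your setup; substitute wherever the file actually lives, and make sure that path exists on every node that can run the task.

```json
"inputSource": {
  "type": "local",
  "baseDir": "/opt/druid/quickstart/tutorial/",
  "filter": "wikiticker-2015-09-12-sampled.json.gz"
}
```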

In production you would probably not be ingesting from local files (maybe NFS at a push). More likely you would read from a storage system such as S3, MinIO, HDFS, GCS, or ABS, or from a stream such as Kafka or Kinesis.

Hi Marshall,

I am getting the error message below. I have already put the file in HDFS under /tmp.

Thanks,
Pathik

Are the permissions correct, and what are your path and file specification? The "/tmp\n\tat" in the error looks odd.

Hi,

Please find the attachment.

Thanks,
Pathik

You mentioned HDFS, but the type in the spec is “local”. With a local type you’d need to put the file under /tmp on all nodes. You’re also using a firehose, which is deprecated. Here’s a more recent example of how to use HDFS: https://druid.apache.org/docs/latest/ingestion/native-batch-input-sources.html#hdfs-input-source
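
As a rough sketch of what the linked page describes, an ioConfig that reads from HDFS instead of the local disk could look something like this. The namenode host and port and the file name are placeholders I made up for illustration, and this assumes the druid-hdfs-storage extension is loaded on your cluster; check the linked docs for the exact options in your version.

```json
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "hdfs",
    "paths": "hdfs://namenode-host:8020/tmp/wikiticker-2015-09-12-sampled.json.gz"
  },
  "inputFormat": {
    "type": "json"
  }
}
```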

Thanks, it worked.

Excellent, glad to hear it!

Hi Ben,

Yesterday we upgraded Druid to version 0.23.0. Since then, submitting the curl command no longer starts the task, but when I copy the contents of the JSON file and submit it from the UI, it works.

Thanks,
Pathik

Hi Pathik,

There was a similar discussion about a task working from the UI but not from curl, which you can read here in the Druid Forum.
In that discussion, the fix for the curl command was to make sure the host and port were correct. For ingestion you can use either the Router or the Overlord as the host, with the corresponding port number. Additionally, they had authentication turned on, so the curl command also needed to specify credentials.
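
To make that concrete, here is a minimal sketch of a submission script, assuming HTTP basic authentication is what is enabled on your cluster. The host, user, and password values are placeholders, not taken from your setup; 8888 and 8090 are the default Router and Overlord ports, so adjust if yours differ.

```shell
#!/bin/sh
# Sketch only: every value below is a placeholder (an assumption), override via environment.
DRUID_HOST="${DRUID_HOST:-localhost}"        # Router or Overlord host
DRUID_PORT="${DRUID_PORT:-8888}"             # default Router port; the Overlord default is 8090
DRUID_USER="${DRUID_USER:-admin}"            # hypothetical user name
DRUID_PASSWORD="${DRUID_PASSWORD:-changeme}" # hypothetical password

TASK_URL="http://${DRUID_HOST}:${DRUID_PORT}/druid/indexer/v1/task"

submit_task() {
  # $1 is the path to the ingestion spec JSON file.
  # -u sends HTTP basic auth credentials; -w prints the HTTP status after the response body.
  curl --silent --show-error \
    -X POST \
    -H 'Content-Type: application/json' \
    -u "${DRUID_USER}:${DRUID_PASSWORD}" \
    -d @"$1" \
    -w '\nHTTP status: %{http_code}\n' \
    "$TASK_URL"
}
```

After sourcing the script you would call it as `submit_task quickstart/tutorial/wikipedia-index.json`. Printing the status code should help narrow things down: an authentication error suggests missing or wrong credentials, while a connection failure or a not-found response points at the host or port.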

Hope this helps,

Sergio

Thanks