Batch ingestion from a Python script

Hi everyone,

I wrote a Python script that creates a JSON file ready to be ingested into Druid.

It resides in a container that queries data once per day, and the result is not big, which is why I was thinking about native batch ingestion.

Is there a way to do a batch ingestion like in the tutorial

bin/post-index-task --file quickstart/tutorial/wikipedia-index.json
(from http://druid.io/docs/latest/tutorials/tutorial-batch.html )

from the script itself? Or am I too far away from reality and another solution is preferable?

Note: the script and the Druid instance are in different containers on different hosts.
P.S.: I apologize if I wrote any heresies, since I am not an expert.

Any hint would be much appreciated, & happy 2019!

Luca

You can submit a task by HTTP POST to the Overlord; this section on that tutorial page (http://druid.io/docs/latest/tutorials/tutorial-batch.html#extra-loading-data-without-the-script) has an example curl command for doing so.
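
If you'd rather do it from the Python script itself instead of shelling out to curl, a minimal sketch using the requests library could look like the following (the Overlord address and the spec file name are assumptions; adjust them to your setup):

import json
import requests

OVERLORD = "http://localhost:8090"   # assumed Overlord host:port
SPEC_FILE = "my-index-task.json"     # assumed name of the spec your script writes

with open(SPEC_FILE) as f:
    spec = json.load(f)

# POST the task spec to the Overlord's task endpoint (the same endpoint
# the tutorial's curl command uses); the response contains the task ID.
resp = requests.post(OVERLORD + "/druid/indexer/v1/task", json=spec)
resp.raise_for_status()
task_id = resp.json()["task"]

# Optionally check the task's status afterwards.
status = requests.get(OVERLORD + "/druid/indexer/v1/task/" + task_id + "/status")
print(status.json())

Since your script and Druid run on different hosts, replace localhost with the hostname of the machine running the Overlord.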

Thanks,

Jon

Mr. Wei, thank you very much for the prompt feedback.

My last question: is it possible to give as "baseDir" the address of the directory in the other Docker container where the script produces the result to be ingested, or does batch ingestion only work locally on the same machine?
Just so I know whether I can follow this path or need to find another solution.
Thanks again, and sorry for the bother.
Luca

Hi, the baseDir needs to be on the same host on which the task is running.
You might also want to take a look at the HTTPFirehose: http://druid.io/docs/latest/ingestion/firehose.html

To use that, your script would generate the data in its own container and expose it via HTTP to the container running the Druid ingestion task.
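
For instance, the producing container could expose the generated file with Python's built-in HTTP server. A minimal sketch, assuming the file is result.json in the current directory and that port 8000 is published from the container (file name and port are placeholders):

import http.server
import socketserver

PORT = 8000  # assumed port; publish it from the container

# Serve the current directory so the Druid task can fetch the file,
# e.g. with a firehose like:
#   "firehose": {"type": "http", "uris": ["http://<script-host>:8000/result.json"]}
with socketserver.TCPServer(("", PORT), http.server.SimpleHTTPRequestHandler) as httpd:
    print("Serving on port", PORT)
    httpd.serve_forever()

Running python -m http.server 8000 from that directory would do the same thing.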

It worked! Love you guys