We have deployed a Druid cluster with multiple MiddleManagers and are running batch ingestion tasks by posting the ingestion spec to the Overlord. If I understand correctly, the Overlord can assign the task to any MiddleManager. So providing a local path is a problem, because the assigned MiddleManager will look for the file on its own filesystem, where the file might not be present.
So I want to use a remote location where I can place my data file, so that any MiddleManager can read it during ingestion.
I tried different options, but none of them worked.
If someone has tried this, please explain.
Also, I have tried placing the data file in HDFS and then executing the batch ingestion, and that worked fine. I was just wondering whether I can use some other remote location for placing my data file.
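For reference, this is a minimal sketch of the ioConfig portion of a Hadoop batch ingestion spec reading from HDFS (the namenode host, port, and input path here are hypothetical placeholders, and the exact spec layout varies by Druid version):

```json
{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "hdfs://namenode:8020/user/druid/input/data.json"
      }
    }
  }
}
```

The full spec (with dataSchema and tuningConfig) is then posted to the Overlord's task endpoint, e.g. `curl -X POST -H 'Content-Type: application/json' -d @spec.json http://overlord:8090/druid/indexer/v1/task`.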
Storing input data files in distributed storage like HDFS or S3 is pretty common. You will also need to configure deep storage for the Druid segments; the segments likewise need to be accessible from all the Historical nodes.
Thanks for the reply.
I have already configured HDFS to store Druid segments, and it is accessible from all Historical nodes.
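For anyone else reading, a minimal sketch of what that deep storage configuration looks like in common.runtime.properties (the namenode host, port, and directory are hypothetical; the `druid-hdfs-storage` extension must be loaded):

```properties
# Load the HDFS deep storage extension
druid.extensions.loadList=["druid-hdfs-storage"]

# Store segments in HDFS so all Historicals can fetch them
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://namenode:8020/druid/segments
```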
Now, just to understand: I feel this will make our ingestion process slower, since we first need to copy the input file to HDFS, and only then can the MiddleManager read the file from HDFS to run the ingestion task.