Batch ingestion from an HDFS location

Hello,

Could you please let me know, if

  1. there are any repos for batch ingestion, via custom MapReduce Java code, directly from HDFS to Druid.

  2. otherwise, how do I supply the HDFS paths in a flat file “paths” for batch data ingestion in Druid?

Thanks,

Sumit

See inline.

Hello,

Could you please let me know, if

  1. there are any repos for batch ingestion, via custom MapReduce Java code, directly from HDFS to Druid.

Have you looked at Druid's HadoopIndexTask? It uses MapReduce internally to ingest data, so you do not need to write a custom MR job.

  2. otherwise, how do I supply the HDFS paths in a flat file “paths” for batch data ingestion in Druid?

To ingest data from HDFS, set the pathSpec of the batch ingestion to the location of your input files in HDFS:

"pathSpec": {
  "type": "static",
  "paths": "hdfs://"
}
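
If the input locations live in a flat file with one HDFS path per line, one option is to generate the pathSpec from that file. Below is a minimal Python sketch, assuming the static pathSpec accepts several paths as a single comma-separated string. The helper name build_path_spec and the example paths are made up for illustration and are not part of Druid.

```python
import json


def build_path_spec(lines):
    """Build a static pathSpec dict from lines holding one HDFS path each.

    Blank lines and lines starting with '#' are skipped.
    Assumption: the static pathSpec takes a comma-separated "paths" string.
    """
    paths = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            paths.append(entry)
    if not paths:
        raise ValueError("no input paths found")
    return {"type": "static", "paths": ",".join(paths)}


if __name__ == "__main__":
    # Hypothetical flat-file contents; replace with open("paths") in practice.
    example_lines = [
        "# input files for the batch job\n",
        "hdfs://namenode/data/part-0.json\n",
        "\n",
        "hdfs://namenode/data/part-1.json\n",
    ]
    print(json.dumps({"pathSpec": build_path_spec(example_lines)}, indent=2))
```

The printed JSON fragment can then be pasted into the ingestion spec in place of the hand-written pathSpec.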