Intake from HDFS so utterly frustrating


I am still fighting with this issue - with all the dependency injection hiding everything, I just don't see how to fix it. At least not without making the local paths in historical incompatible.

It got so complicated to debug that I even set up a Druid staging system (ZooKeeper, Postgres, coordinator/overlord, indexer, historical), all on one box. Works like a charm; I should have done that a long time ago.

Now that I have some freedom to test (and am stuck on finding where in the code things go wrong), I thought: well, maybe we can just use S3 instead, since it seems to be the most used. As we were forced onto HDFS a long time ago (S3 transfer costs were starting to exceed hardware costs), I set up Minio instead. Quick and painless.

The indexers get this in their runtime properties:


And in the “index_hadoop” task, I add:

"jobProperties": {
  "fs.s3a.access.key": "foo",
  "fs.s3a.secret.key": "bar",
  "fs.s3a.connection.ssl.enabled": false,
  "fs.s3a.endpoint": "",
  "": true
}

And bam, I get a very nice segment in Minio, generated by Hadoop.

HOWEVER, in the database, the payload is "loadSpec": {"type": "local"} (the path is the correct one inside the bucket).
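For comparison, with S3 deep storage actually in effect (druid.storage.type=s3, if I read the docs right), I'd expect the loadSpec in the metadata store to look something like this - bucket and key here are made up:

```json
"loadSpec": {
  "type": "s3_zip",
  "bucket": "my-deep-storage-bucket",
  "key": "druid/segments/<datasource>/<interval>/0/index.zip"
}
```

Which makes me wonder whether the peon that publishes the segment is still picking up a local druid.storage.type from somewhere.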

This is beyond frustrating and I would really appreciate some ideas. All I want is a new segment with a valid metadata entry, based on files in HDFS. Or is there any other task type that can source from HDFS?

I am this close to just pushing it back into Kafka and using "index_kafka", that's how desperate I am.