HDFS output path with datasource name twice


I am indexing with Druid (version 0.8.0) using the "index_hadoop" method, and the resulting segments are not showing up anywhere that I can query. What appears to be happening is that the datasource name is duplicated in the HDFS output directory the segment is saved to.


Input index config:


{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "example1datasrc",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "timestamp",
            "format" : "auto"
          },
          "dimensionsSpec" : {
            "dimensions" : ["dim1", "dim2", "dim3"],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [
        {"name" : "count", "type" : "count"},
        {"fieldName" : "cid", "name" : "cid", "type" : "hyperUnique"}
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "DAY",
        "queryGranularity" : "HOUR",
        "intervals" : [ "2015-12-20/2015-12-21" ]
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/tmp/example1datasrc-2015-12-20_473152148"
      }
    }
  }
}

This gets submitted to an overlord node. However, when the log gets to this point:

2016-02-03T01:36:44,619 INFO [LocalJobRunner Map Task Executor #0] io.druid.indexer.HadoopDruidIndexerConfig - Running with config:

It has this field:

      "segmentOutputPath" : "hdfs://localhost:9000/user/druid/storage/example1datasrc"

And then the resulting place this ends up in HDFS is:


The segment is then not queryable (and does not show up in the coordinator web console). Data directly in /user/druid/storage/example1datasrc/ is queryable, as you would expect.

Any ideas on what could be causing this?

FWIW, I still haven't figured out what this duplication was all about, but it turns out my basic problem was that my historical node (I have only one in my dev setup) had a max size of 10G, so the hand-off of segments to the historical node after indexing was failing. Increasing druid.server.maxSize in the historical config and restarting everything solved the funky issues I was running into.

Best, Brad
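
For readers hitting the same hand-off failure, here is a minimal sketch of the historical settings involved. The values and the cache path are illustrative assumptions, not the poster's actual config. druid.server.maxSize is the property named above; druid.segmentCache.locations is the companion property for the local segment cache, and its maxSize is typically kept in line with druid.server.maxSize.

```properties
# Historical node runtime.properties (illustrative values only)

# Maximum total bytes of segments the coordinator may assign to this node.
# If this is too small, newly indexed segments cannot be handed off.
druid.server.maxSize=100000000000

# Local disk cache for segments pulled from deep storage (HDFS here).
# The path is hypothetical; the maxSize values are usually sized to match
# druid.server.maxSize.
druid.segmentCache.locations=[{"path": "/tmp/druid/indexCache", "maxSize": 100000000000}]
```

The historical node needs a restart for the change to take effect.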

Hmmm, I recall we fixed that issue a while ago. Can you update to 0.8.3 and see if you have the same problem?


The dataSource appearing twice on HDFS should be fixed in druid-0.8.3.

– Himanshu

Thanks for the replies, Himanshu and Fangjin.

I can confirm that: I upgraded to 0.8.3 and the issue went away.

Best, Brad