Batch Ingestion Fails due to segmentOutputPath not existing

We have a batch ingestion job that we run from S3 on an EMR cluster, with an ioConfig like:

"ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "granularity",
        "dataGranularity": "day",
        "inputPath": "s3a://bucket/path",
        "filePattern": ".*\\.json\\.gz",
        "pathFormat": "'event_date='yyyy-MM-dd"
      },
      "metadataUpdateSpec" : {
          "type":"mysql",
          "connectURI" : "",
          "password" : "",
          "segmentTable" : "druid_segments",
          "user" : ""
        },
      "segmentOutputPath" : ""
    }

This worked well for a while, but then started failing with a message like:

"Error: java.lang.IllegalArgumentException: Can not create a Path from an empty string"
This seems to be because segmentOutputPath is required by the Hadoop CLI indexer (Druid | Command Line Hadoop Indexer), and we are not setting it.
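
For reference, a minimal sketch of the same ioConfig with segmentOutputPath filled in is below, assuming S3 deep storage; the bucket and prefix here are placeholders rather than our real values:

"ioConfig" : {
  "type" : "hadoop",
  "inputSpec" : {
    "type" : "granularity",
    "dataGranularity" : "day",
    "inputPath" : "s3a://bucket/path",
    "filePattern" : ".*\\.json\\.gz",
    "pathFormat" : "'event_date='yyyy-MM-dd"
  },
  "metadataUpdateSpec" : {
    "type" : "mysql",
    "connectURI" : "",
    "password" : "",
    "segmentTable" : "druid_segments",
    "user" : ""
  },
  "segmentOutputPath" : "s3a://bucket/druid/segments"
}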

However, I'm not sure why it is suddenly demanding that this field be filled out, and when I do add a path to the field, I get other errors. We were doing some deployments at the time, so there could have been some unintentional changes. Would anyone know why this could have occurred, and how to fix it?

Did anything change with your inputSpec?

Hi Mark, the issue ended up being that the segmentOutputPath was just getting parsed incorrectly; it was in fact always required. Thanks!
