Unable to ingest multiple S3 files

Hello,

I am trying to ingest multiple S3 files at once by setting the value of "paths" to a single string of comma-delimited paths, as in the example here: http://druid.io/docs/latest/ingestion/batch-ingestion.html

I get the following exception:

Illegal character in scheme name at index 0

```
2016-04-22T02:31:48,136 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_ip_queries_2016-04-22T02:32:06.146Z, type=index_hadoop, dataSource=ip_queries}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:160) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:175) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:338) [druid-indexing-service-0.9.0.jar:0.9.0]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:318) [druid-indexing-service-0.9.0.jar:0.9.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_72-internal]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_72-internal]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_72-internal]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_72-internal]
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_72-internal]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_72-internal]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_72-internal]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_72-internal]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:157) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    ... 7 more
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 0: s3n://demandbase-druid-dev/ip_queries_production-1-2016-04-02-00-00-04-f77c551f-1a44-4c4c-ac56-c626ea59ed38-edited
    at org.apache.hadoop.fs.Path.initialize(Path.java:206) ~[?:?]
    at org.apache.hadoop.fs.Path.<init>(Path.java:172) ~[?:?]
    at io.druid.indexer.path.StaticPathSpec.addToMultipleInputs(StaticPathSpec.java:100) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.path.StaticPathSpec.addInputPaths(StaticPathSpec.java:58) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.HadoopDruidIndexerConfig.addInputPaths(HadoopDruidIndexerConfig.java:372) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.JobHelper.ensurePaths(JobHelper.java:311) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:55) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_72-internal]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_72-internal]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_72-internal]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_72-internal]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:157) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    ... 7 more
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 0: s3n://demandbase-druid-dev/ip_queries_production-1-2016-04-02-00-00-04-f77c551f-1a44-4c4c-ac56-c626ea59ed38-edited
    at java.net.URI$Parser.fail(URI.java:2848) ~[?:1.8.0_72-internal]
    at java.net.URI$Parser.checkChars(URI.java:3021) ~[?:1.8.0_72-internal]
    at java.net.URI$Parser.checkChar(URI.java:3031) ~[?:1.8.0_72-internal]
    at java.net.URI$Parser.parse(URI.java:3047) ~[?:1.8.0_72-internal]
    at java.net.URI.<init>(URI.java:746) ~[?:1.8.0_72-internal]
    at org.apache.hadoop.fs.Path.initialize(Path.java:203) ~[?:?]
    at org.apache.hadoop.fs.Path.<init>(Path.java:172) ~[?:?]
    at io.druid.indexer.path.StaticPathSpec.addToMultipleInputs(StaticPathSpec.java:100) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.path.StaticPathSpec.addInputPaths(StaticPathSpec.java:58) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.HadoopDruidIndexerConfig.addInputPaths(HadoopDruidIndexerConfig.java:372) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.JobHelper.ensurePaths(JobHelper.java:311) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:55) ~[druid-indexing-hadoop-0.9.0.jar:0.9.0]
    at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:291) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_72-internal]
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_72-internal]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_72-internal]
    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_72-internal]
    at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:157) ~[druid-indexing-service-0.9.0.jar:0.9.0]
    ... 7 more
2016-04-22T02:31:48,150 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: { "id" : "index_hadoop_ip_queries_2016-04-22T02:32:06.146Z", "status" : "FAILED", "duration" : 2377 }
```

When I attempt to ingest a single path, the ingestion task succeeds. I have also verified that the scheme is correct for all of the files.

What is the correct way to ingest multiple static files?

Hey Carlos,

Is it possible you are including spaces or other whitespace characters in your input paths? There should be no spaces surrounding the commas.
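
For example, a static inputSpec along these lines should parse cleanly (the bucket and file names here are just placeholders, not from your setup):

```json
"ioConfig" : {
  "type" : "hadoop",
  "inputSpec" : {
    "type" : "static",
    "paths" : "s3n://my-bucket/data/file1.gz,s3n://my-bucket/data/file2.gz"
  }
}
```

From your trace, it looks like each comma-separated token is handed straight to Hadoop's Path/URI parser, so a space after a comma becomes the first character of the next scheme, which is exactly the "Illegal character in scheme name at index 0" you're seeing.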

Thank you, Gian; you were right.

Should the docs be revised here?

http://druid.io/docs/latest/ingestion/batch-ingestion.html

The example has spaces after the commas.

"paths" : "s3n://billy-bucket/the/data/is/here/data.gz, s3n://billy-bucket/the/data/is/here/moredata.gz, s3n://billy-bucket/the/data/is/here/evenmoredata.gz"

Hi Carlos,
Yeah, the docs do seem to be wrong here.

It would be great if you could create a PR for the doc fix.

Thanks for reporting the error.