Got 416 InvalidRange when using `prefixes` for directory ingestion from S3

Hello,

I installed the latest Druid image from https://hub.docker.com/r/apache/druid (the jars in the stack trace below are version 0.17.0).

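When I submit the ingestion spec below, which reads a whole directory through `prefixes`, the task fails with a 416 InvalidRange error. Listing the files with `uris` instead works properly.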

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "testDataSource",
      "dimensionsSpec": {
        "dimensions": ["a", "b", "c", "d", "e"]
      },
      "timestampSpec": {
        "column": "dt",
        "format": "yyyyMMdd",
        "missingValue": "20200822"
      },
      "granularitySpec": {
        "segmentGranularity": "day",
        "queryGranularity": "none"
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "s3",
        "prefixes": ["s3://abc/def/dt=20200826"]
      },
      "inputFormat": {
        "type": "parquet"
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumConcurrentSubTasks": 4
    }
  }
}
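For comparison, here is a minimal Python sketch of the two `inputSource` variants: the `prefixes` form from the spec above and a `uris` form like the one reported to work. The object key in `uris_source` and the helper `io_config` are hypothetical, for illustration only; a real `uris` list would name the actual Parquet files in the bucket.

```python
import json

# The form used in the spec above, which failed with 416 InvalidRange.
prefixes_source = {"type": "s3", "prefixes": ["s3://abc/def/dt=20200826"]}

# The form reported to work: explicit object URIs.
# HYPOTHETICAL object key; the thread does not show which files were listed.
uris_source = {
    "type": "s3",
    "uris": ["s3://abc/def/dt=20200826/part-00000.parquet"],
}


def io_config(input_source):
    """Hypothetical helper: wrap an inputSource in the ioConfig used above."""
    return {
        "type": "index_parallel",
        "inputSource": input_source,
        "inputFormat": {"type": "parquet"},
    }


# Print both variants as JSON, ready to paste into spec.ioConfig.
print(json.dumps(io_config(prefixes_source), indent=2))
print(json.dumps(io_config(uris_source), indent=2))
```

Only `inputSource` differs between the two; the rest of the spec stays the same.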

Here is the error message.

2020-08-26T02:41:51,815 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider as a provider class
2020-08-26T02:41:51,816 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering com.fasterxml.jackson.jaxrs.smile.JacksonSmileProvider as a provider class
2020-08-26T02:41:51,816 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering org.apache.druid.server.initialization.jetty.CustomExceptionMapper as a provider class
2020-08-26T02:41:51,816 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering org.apache.druid.server.initialization.jetty.ForbiddenExceptionMapper as a provider class
2020-08-26T02:41:51,816 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering org.apache.druid.server.initialization.jetty.BadRequestExceptionMapper as a provider class
2020-08-26T02:41:51,816 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Registering org.apache.druid.server.StatusResource as a root resource class
2020-08-26T02:41:51,819 INFO [main] com.sun.jersey.server.impl.application.WebApplicationImpl - Initiating Jersey application, version 'Jersey: 1.19.3 10/24/2016 03:43 PM'
2020-08-26T02:41:51,881 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.initialization.jetty.CustomExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton"
2020-08-26T02:41:51,883 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.initialization.jetty.ForbiddenExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton"
2020-08-26T02:41:51,883 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.initialization.jetty.BadRequestExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton"
2020-08-26T02:41:51,884 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider to GuiceManagedComponentProvider with the scope "Singleton"
2020-08-26T02:41:51,889 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.smile.JacksonSmileProvider to GuiceManagedComponentProvider with the scope "Singleton"
2020-08-26T02:41:51,990 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.IndexTask - Encountered exception in DETERMINE_PARTITIONS.
java.lang.RuntimeException: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The requested range is not satisfiable (Service: Amazon S3; Status Code: 416; Error Code: InvalidRange; Request ID: 6H8NBT8Y8H6NFVDR; S3 Extended Request ID: v0xcvlQIGhSCkdVvj5N8c0NClnRLqEr6eqkpL3bgI7cXLkPwlrG1WSK/HWpSnZ1XM9wj35At3oY=), S3 Extended Request ID: v0xcvlQIGhSCkdVvj5N8c0NClnRLqEr6eqkpL3bgI7cXLkPwlrG1WSK/HWpSnZ1XM9wj35At3oY=
    at org.apache.druid.data.input.impl.InputEntityIteratingReader.lambda$read$0(InputEntityIteratingReader.java:81) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.java.util.common.parsers.CloseableIterator$2.findNextIeteratorIfNecessary(CloseableIterator.java:83) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.java.util.common.parsers.CloseableIterator$2.<init>(CloseableIterator.java:69) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.java.util.common.parsers.CloseableIterator.flatMap(CloseableIterator.java:67) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.impl.InputEntityIteratingReader.createIterator(InputEntityIteratingReader.java:103) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.impl.InputEntityIteratingReader.read(InputEntityIteratingReader.java:74) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.segment.transform.TransformingInputSourceReader.read(TransformingInputSourceReader.java:43) ~[druid-processing-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.common.task.IndexTask.collectIntervalsAndShardSpecs(IndexTask.java:737) ~[druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.common.task.IndexTask.createShardSpecsFromInput(IndexTask.java:656) ~[druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.common.task.IndexTask.determineShardSpecs(IndexTask.java:631) ~[druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.common.task.IndexTask.runTask(IndexTask.java:490) [druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:138) [druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runSequential(ParallelIndexSupervisorTask.java:844) [druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask.runTask(ParallelIndexSupervisorTask.java:469) [druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.common.task.AbstractBatchIndexTask.run(AbstractBatchIndexTask.java:138) [druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.17.0.jar:0.17.0]
    at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.17.0.jar:0.17.0]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_232]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_232]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: The requested range is not satisfiable (Service: Amazon S3; Status Code: 416; Error Code: InvalidRange; Request ID: 6H8NBT8Y8H6NFVDR; S3 Extended Request ID: v0xcvlQIGhSCkdVvj5N8c0NClnRLqEr6eqkpL3bgI7cXLkPwlrG1WSK/HWpSnZ1XM9wj35At3oY=), S3 Extended Request ID: v0xcvlQIGhSCkdVvj5N8c0NClnRLqEr6eqkpL3bgI7cXLkPwlrG1WSK/HWpSnZ1XM9wj35At3oY=
    at org.apache.druid.data.input.s3.S3Entity.readFrom(S3Entity.java:72) ~[?:?]
    at org.apache.druid.data.input.RetryingInputEntity.readFromStart(RetryingInputEntity.java:54) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.RetryingInputEntity$RetryingInputEntityOpenFunction.open(RetryingInputEntity.java:78) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.RetryingInputEntity$RetryingInputEntityOpenFunction.open(RetryingInputEntity.java:73) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.impl.RetryingInputStream.<init>(RetryingInputStream.java:63) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.RetryingInputEntity.open(RetryingInputEntity.java:42) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.InputEntity.fetch(InputEntity.java:88) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.parquet.ParquetReader.<init>(ParquetReader.java:68) ~[?:?]
    at org.apache.druid.data.input.parquet.ParquetInputFormat.createReader(ParquetInputFormat.java:74) ~[?:?]
    at org.apache.druid.data.input.impl.InputEntityIteratingReader.lambda$read$0(InputEntityIteratingReader.java:77) ~[druid-core-0.17.0.jar:0.17.0]
    ... 20 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The requested range is not satisfiable (Service: Amazon S3; Status Code: 416; Error Code: InvalidRange; Request ID: 6H8NBT8Y8H6NFVDR; S3 Extended Request ID: v0xcvlQIGhSCkdVvj5N8c0NClnRLqEr6eqkpL3bgI7cXLkPwlrG1WSK/HWpSnZ1XM9wj35At3oY=)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1638) ~[aws-java-sdk-core-1.11.199.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1303) ~[aws-java-sdk-core-1.11.199.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1055) ~[aws-java-sdk-core-1.11.199.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) ~[aws-java-sdk-core-1.11.199.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) ~[aws-java-sdk-core-1.11.199.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) ~[aws-java-sdk-core-1.11.199.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) ~[aws-java-sdk-core-1.11.199.jar:?]
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) ~[aws-java-sdk-core-1.11.199.jar:?]
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) ~[aws-java-sdk-core-1.11.199.jar:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4229) ~[aws-java-sdk-s3-1.11.199.jar:?]
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4176) ~[aws-java-sdk-s3-1.11.199.jar:?]
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1380) ~[aws-java-sdk-s3-1.11.199.jar:?]
    at org.apache.druid.storage.s3.ServerSideEncryptingAmazonS3.getObject(ServerSideEncryptingAmazonS3.java:89) ~[?:?]
    at org.apache.druid.data.input.s3.S3Entity.readFrom(S3Entity.java:60) ~[?:?]
    at org.apache.druid.data.input.RetryingInputEntity.readFromStart(RetryingInputEntity.java:54) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.RetryingInputEntity$RetryingInputEntityOpenFunction.open(RetryingInputEntity.java:78) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.RetryingInputEntity$RetryingInputEntityOpenFunction.open(RetryingInputEntity.java:73) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.impl.RetryingInputStream.<init>(RetryingInputStream.java:63) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.RetryingInputEntity.open(RetryingInputEntity.java:42) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.InputEntity.fetch(InputEntity.java:88) ~[druid-core-0.17.0.jar:0.17.0]
    at org.apache.druid.data.input.parquet.ParquetReader.<init>(ParquetReader.java:68) ~[?:?]
    at org.apache.druid.data.input.parquet.ParquetInputFormat.createReader(ParquetInputFormat.java:74) ~[?:?]
    at org.apache.druid.data.input.impl.InputEntityIteratingReader.lambda$read$0(InputEntityIteratingReader.java:77) ~[druid-core-0.17.0.jar:0.17.0]
    ... 20 more
2020-08-26T02:41:52,011 WARN [task-runner-0-priority-0] org.apache.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - handler[index_parallel_search12_mkmlogkb_2020-08-26T02:41:47.271Z] not currently registered, ignoring.
2020-08-26T02:41:52,015 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_parallel_search12_mkmlogkb_2020-08-26T02:41:47.271Z",
  "status" : "FAILED",
  "duration" : 337,
  "errorMsg" : "java.lang.RuntimeException: java.io.IOException: com.amazonaws.services.s3.model.AmazonS3Exception: …",
  "location" : {
    "host" : null,
    "port" : -1,
    "tlsPort" : -1
  }
}
2020-08-26T02:41:52,143 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.http.security.StateResourceFilter to GuiceInstantiatedComponentProvider
2020-08-26T02:41:52,156 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.http.SegmentListerResource to GuiceManagedComponentProvider with the scope "PerRequest"
2020-08-26T02:41:52,160 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.QueryResource to GuiceInstantiatedComponentProvider
2020-08-26T02:41:52,163 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.segment.realtime.firehose.ChatHandlerResource to GuiceInstantiatedComponentProvider
2020-08-26T02:41:52,165 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.http.security.ConfigResourceFilter to GuiceInstantiatedComponentProvider
2020-08-26T02:41:52,167 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.query.lookup.LookupListeningResource to GuiceInstantiatedComponentProvider
2020-08-26T02:41:52,168 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.query.lookup.LookupIntrospectionResource to GuiceInstantiatedComponentProvider
2020-08-26T02:41:52,169 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding org.apache.druid.server.StatusResource to GuiceManagedComponentProvider with the scope "Undefined"
2020-08-26T02:41:52,188 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@11309dd4{/,null,AVAILABLE}
2020-08-26T02:41:52,197 INFO [main] org.eclipse.jetty.server.AbstractConnector - Started ServerConnector@73234691{HTTP/1.1,[http/1.1]}{0.0.0.0:8100}
2020-08-26T02:41:52,197 INFO [main] org.eclipse.jetty.server.Server - Started @4907ms
2020-08-26T02:41:52,198 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Starting lifecycle [module] stage [ANNOUNCEMENTS]
2020-08-26T02:41:52,210 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Successfully started lifecycle [module]
2020-08-26T02:41:52,215 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
2020-08-26T02:41:52,216 INFO [main] org.apache.druid.curator.announcement.Announcer - Unannouncing [/druid/listeners/lookups/__default/http:172.18.0.6:8100]
2020-08-26T02:41:52,227 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
2020-08-26T02:41:52,231 INFO [main] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@73234691{HTTP/1.1,[http/1.1]}{0.0.0.0:8100}
2020-08-26T02:41:52,232 INFO [main] org.eclipse.jetty.server.session - node0 Stopped scavenging
2020-08-26T02:41:52,233 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@11309dd4{/,null,UNAVAILABLE}
2020-08-26T02:41:52,240 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]
2020-08-26T02:41:52,240 INFO [main] org.apache.druid.server.listener.announcer.ListenerResourceAnnouncer - Unannouncing start time on [/druid/listeners/lookups/__default/http:172.18.0.6:8100]
2020-08-26T02:41:52,240 INFO [LookupExtractorFactoryContainerProvider-MainThread] org.apache.druid.query.lookup.LookupReferencesManager - Lookup Management loop exited. Lookup notices are not handled anymore.
2020-08-26T02:41:52,240 INFO [main] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[index_parallel_search12_mkmlogkb_2020-08-26T02:41:47.271Z].
2020-08-26T02:41:52,254 INFO [Curator-Framework-0] org.apache.curator.framework.imps.CuratorFrameworkImpl - backgroundOperationsLoop exiting
2020-08-26T02:41:52,256 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x1000d680aea002a closed
2020-08-26T02:41:52,256 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1000d680aea002a
2020-08-26T02:41:52,257 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [INIT]
Finished peon task
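The frames above show the 416 coming from `AmazonS3Client.getObject`, called by `S3Entity.readFrom` via `RetryingInputEntity.readFromStart`, which suggests Druid requests each object as a byte range starting at an offset. Under HTTP range semantics (RFC 7233, section 2.1), an open-ended range `bytes=N-` is satisfiable only if N is smaller than the object's length, so even `bytes=0-` cannot be satisfied for a zero-byte object. One plausible but unconfirmed reading of this error is therefore that the prefix matched an empty object. A minimal sketch of the rule; the function name `range_status` is hypothetical and models the RFC, not S3's exact implementation:

```python
def range_status(first_byte_pos: int, object_length: int) -> int:
    """Status for an open-ended range request 'Range: bytes=<first_byte_pos>-'.

    Per RFC 7233 section 2.1, such a range is satisfiable only when
    first_byte_pos < object_length; otherwise the server answers 416.
    """
    if first_byte_pos < object_length:
        return 206  # Partial Content
    return 416  # Range Not Satisfiable


# Reading a non-empty Parquet file from its start succeeds...
print(range_status(0, 1_048_576))  # 206
# ...but the same request against a zero-byte object fails.
print(range_status(0, 0))  # 416
```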

A thought: could you try putting a trailing `/` at the end of the prefix, i.e. `s3://abc/def/dt=20200826/`?
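S3 prefixes are plain string prefixes on object keys, not directory paths: `def/dt=20200826` matches every key that starts with those characters, including keys that are not under `def/dt=20200826/`. The sketch below mimics that matching on a hypothetical key listing. The sibling keys (a zero-byte Hadoop-style `_$folder$` marker and a `_tmp` partition) are invented for illustration and are not known to exist in this bucket.

```python
# HYPOTHETICAL keys in bucket "abc" with sizes in bytes; only the first two
# live under the dt=20200826/ "directory".
KEYS = {
    "def/dt=20200826/part-00000.parquet": 52_431,
    "def/dt=20200826/part-00001.parquet": 49_877,
    "def/dt=20200826_$folder$": 0,  # zero-byte folder marker
    "def/dt=20200826_tmp/part-00000.parquet": 1_024,
}


def list_prefix(keys, prefix):
    """Mimic an S3 prefix listing: a plain string match on the key's start."""
    return sorted(k for k in keys if k.startswith(prefix))


without_slash = list_prefix(KEYS, "def/dt=20200826")
with_slash = list_prefix(KEYS, "def/dt=20200826/")

print(len(without_slash))  # 4: also picks up the marker and the _tmp partition
print(len(with_slash))     # 2: only objects under dt=20200826/
print([k for k in without_slash if KEYS[k] == 0])  # the zero-byte object
```

This is only one possible explanation for why the trailing slash helped: a zero-byte marker named exactly `def/dt=20200826/` would still match the slashed prefix.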

Thanks, Peter.

It works after adding a trailing `/`.
It would be good to update the documentation to mention this for Druid users.

On Friday, August 28, 2020 at 6:53:38 AM UTC-7, peter.m…@imply.io wrote:

Sure thang 🙂
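If you build ingestion specs in code, the trailing-slash workaround from this thread can be applied before the task is submitted. A minimal sketch, assuming the spec layout shown in the first post (`spec.ioConfig.inputSource.prefixes`); the helper name `add_trailing_slashes` is hypothetical and not part of any Druid client.

```python
import copy


def add_trailing_slashes(task):
    """Hypothetical helper: return a copy of an ingestion task whose S3
    prefixes all end with '/'. Other inputSource fields are left alone."""
    fixed = copy.deepcopy(task)  # never mutate the caller's spec
    source = fixed.get("spec", {}).get("ioConfig", {}).get("inputSource", {})
    if source.get("type") == "s3" and "prefixes" in source:
        source["prefixes"] = [
            p if p.endswith("/") else p + "/" for p in source["prefixes"]
        ]
    return fixed


task = {
    "type": "index_parallel",
    "spec": {
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {"type": "s3", "prefixes": ["s3://abc/def/dt=20200826"]},
            "inputFormat": {"type": "parquet"},
        }
    },
}

fixed = add_trailing_slashes(task)
print(fixed["spec"]["ioConfig"]["inputSource"]["prefixes"])
# ['s3://abc/def/dt=20200826/']
```

The helper is idempotent (prefixes that already end in `/` are kept as-is) and does not touch `uris`, which name individual objects rather than directories.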