Druid Namespace Lookups on S3 fail if the file is in the root of the bucket

We expected to be able to place the lookup files at the root of a bucket, but doing so resulted in NullPointerExceptions (NPEs). Placing the same file in a “folder”, by prefixing its key with a single / when uploading it to S3, works around the problem.
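A minimal sketch of why a root-level key is a special case, assuming the finder derives a bucket, a key and a listing prefix from the s3:// URI before listing objects. S3KeyPrefixSketch and its methods are hypothetical and are not Druid's code. A root-level key such as blah.gz contains no /, so any prefix computation that expects one (for example java.nio.file.Path.getParent(), which returns null when there is no parent) has nothing to return. We have not confirmed that this is what happens at S3TimestampVersionedDataFinder.java:71; the helper below simply shows a null-safe alternative that returns an empty prefix for root-level keys.

```java
import java.net.URI;

/**
 * Hypothetical helper (not Druid's code): splits an s3:// URI into bucket,
 * object key and the "directory" prefix to list under, treating an object at
 * the root of the bucket as having an empty prefix instead of null.
 */
public final class S3KeyPrefixSketch
{
  private S3KeyPrefixSketch() {}

  /** Bucket name is the URI authority, e.g. "bucket" for s3://bucket/blah.gz. */
  public static String bucket(URI uri)
  {
    return uri.getAuthority();
  }

  /** Object key: the URI path with exactly one leading '/' removed. */
  public static String key(URI uri)
  {
    String path = uri.getPath();
    return path.startsWith("/") ? path.substring(1) : path;
  }

  /**
   * Prefix for a listObjects(bucket, prefix, "/") style call: everything up to
   * and including the last '/' in the key, or "" when the key has no '/',
   * i.e. when the object sits at the root of the bucket.
   */
  public static String listingPrefix(URI uri)
  {
    String key = key(uri);
    int lastSlash = key.lastIndexOf('/');
    return lastSlash < 0 ? "" : key.substring(0, lastSlash + 1);
  }

  public static void main(String[] args)
  {
    URI root = URI.create("s3://bucket/blah.gz");            // as in testSimpleNoPrefix
    URI slashed = URI.create("s3://bucket//blah.gz");        // key "/blah.gz", the workaround
    URI nested = URI.create("s3://bucket/lookups/blah.gz");  // ordinary "folder" key

    for (URI uri : new URI[]{root, slashed, nested}) {
      System.out.printf("%s -> bucket=%s key=%s prefix='%s'%n",
          uri, bucket(uri), key(uri), listingPrefix(uri));
    }
  }
}
```

For the three URIs above this yields the prefixes "", "/" and "lookups/" respectively, so the root-level case still produces a usable (empty) listing prefix rather than null.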

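I added two unit tests to S3TimestampVersionedDataFinderTest in druid-s3-extensions to illustrate this. testSimpleNoPrefix (file at the root of the bucket) fails, while testSimplePrefix (the same key prefixed with /) passes: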

```java
// Imports used by these tests (S3TimestampVersionedDataFinderTest is in package io.druid.storage.s3)
import java.net.URI;
import java.util.Date;
import java.util.regex.Pattern;

import org.easymock.EasyMock;
import org.jets3t.service.S3ServiceException;
import org.jets3t.service.impl.rest.httpclient.RestS3Service;
import org.jets3t.service.model.S3Object;
import org.junit.Assert;
import org.junit.Test;

// Object at the root of the bucket (key "blah.gz"): fails with an NPE
@Test
public void testSimpleNoPrefix() throws S3ServiceException
{
  String bucket = "bucket";
  RestS3Service s3Client = EasyMock.createStrictMock(RestS3Service.class);

  S3Object object0 = new S3Object();

  object0.setBucketName(bucket);
  object0.setKey("blah.gz");
  object0.setLastModifiedDate(new Date());
  Pattern pattern = Pattern.compile("blah.gz");

  EasyMock.expect(s3Client.listObjects(EasyMock.eq(bucket), EasyMock.anyString(), EasyMock.eq("/"))).andReturn(
      new S3Object[]{object0}
  ).once();
  S3TimestampVersionedDataFinder finder = new S3TimestampVersionedDataFinder(s3Client);

  EasyMock.replay(s3Client);

  URI latest = finder.getLatestVersion(URI.create(String.format("s3://%s/%s", bucket, object0.getKey())), pattern);

  EasyMock.verify(s3Client);

  URI expected = URI.create(String.format("s3://%s/%s", bucket, object0.getKey()));

  Assert.assertEquals(expected, latest);
}

// Same object with its key prefixed by "/" (the workaround): passes
@Test
public void testSimplePrefix() throws S3ServiceException
{
  String bucket = "bucket";
  RestS3Service s3Client = EasyMock.createStrictMock(RestS3Service.class);

  S3Object object0 = new S3Object();

  object0.setBucketName(bucket);
  object0.setKey("/blah.gz");
  object0.setLastModifiedDate(new Date());
  Pattern pattern = Pattern.compile("blah.gz");

  EasyMock.expect(s3Client.listObjects(EasyMock.eq(bucket), EasyMock.anyString(), EasyMock.eq("/"))).andReturn(
      new S3Object[]{object0}
  ).once();
  S3TimestampVersionedDataFinder finder = new S3TimestampVersionedDataFinder(s3Client);

  EasyMock.replay(s3Client);

  URI latest = finder.getLatestVersion(URI.create(String.format("s3://%s/%s", bucket, object0.getKey())), pattern);

  EasyMock.verify(s3Client);

  URI expected = URI.create(String.format("s3://%s/%s", bucket, object0.getKey()));

  Assert.assertEquals(expected, latest);
}
```

```
Running io.druid.storage.s3.S3TimestampVersionedDataFinderTest
Tests run: 5, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.512 sec <<< FAILURE! - in io.druid.storage.s3.S3TimestampVersionedDataFinderTest
testSimpleNoPrefix(io.druid.storage.s3.S3TimestampVersionedDataFinderTest) Time elapsed: 0.007 sec <<< ERROR!
java.lang.NullPointerException: null
	at io.druid.storage.s3.S3TimestampVersionedDataFinder$1.call(S3TimestampVersionedDataFinder.java:71)
	at io.druid.storage.s3.S3TimestampVersionedDataFinder$1.call(S3TimestampVersionedDataFinder.java:62)
	at com.metamx.common.RetryUtils.retry(RetryUtils.java:38)
	at io.druid.storage.s3.S3TimestampVersionedDataFinder.getLatestVersion(S3TimestampVersionedDataFinder.java:60)
	at io.druid.storage.s3.S3TimestampVersionedDataFinderTest.testSimpleNoPrefix(S3TimestampVersionedDataFinderTest.java:92)

Results :

Tests in error:
  S3TimestampVersionedDataFinderTest.testSimpleNoPrefix:92 » NullPointer

Tests run: 5, Failures: 0, Errors: 1, Skipped: 0
```

I haven’t dug into the code to find the cause, since the workaround lets us move forward, but files at the root of a bucket certainly seem like they should be valid.

https://github.com/druid-io/druid/pull/2738 should (hopefully) fix several lookup-related confusions around S3 prefixes.

Thanks for the info. I’ll add a comment about removing our workaround once we upgrade to a version with that PR in it.