io.druid.segment.loading.SegmentLoadingException: Exception loading segment

I’m trying to spin up a historical node, but I keep getting a SegmentLoadingException. For example:

2015-11-25T20:40:57,339 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[messages_2015-11-17T15:00:00.000Z_2015-11-17T16:00:00.000Z_2015-11-17T15:00:00.000Z], segment=DataSegment{size=14994, shardSpec=NoneShardSpec, metrics=[count], dimensions=[dimension1, dimension2, dimension3], version='2015-11-17T15:00:00.000Z', loadSpec={type=local, path=/tmp/druid/localStorage/messages/2015-11-17T15:00:00.000Z_2015-11-17T16:00:00.000Z/2015-11-17T15:00:00.000Z/0/index.zip}, interval=2015-11-17T15:00:00.000Z/2015-11-17T16:00:00.000Z, dataSource='messages', binaryVersion='9'}}

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[messages_2015-11-17T15:00:00.000Z_2015-11-17T16:00:00.000Z_2015-11-17T15:00:00.000Z]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:146) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:171) [druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:42) [druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:115) [druid-server-0.8.1.jar:0.8.1]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:510) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.8.0.jar:?]

at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]

at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:84) [curator-framework-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:508) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:759) [curator-recipes-2.8.0.jar:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_31]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_31]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_31]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_31]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_31]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_31]

at java.lang.Thread.run(Thread.java:745) [?:1.8.0_31]

Caused by: java.lang.IllegalArgumentException: Instantiation of [simple type, class io.druid.segment.loading.LocalLoadSpec] value failed: [/tmp/druid/localStorage/messages/2015-11-17T15:00:00.000Z_2015-11-17T16:00:00.000Z/2015-11-17T15:00:00.000Z/0/index.zip] does not exist

at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:2774) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:2700) ~[jackson-databind-2.4.4.jar:2.4.4]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:140) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:93) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142) ~[druid-server-0.8.1.jar:0.8.1]

… 18 more

Caused by: com.fasterxml.jackson.databind.JsonMappingException: Instantiation of [simple type, class io.druid.segment.loading.LocalLoadSpec] value failed: [/tmp/druid/localStorage/messages/2015-11-17T15:00:00.000Z_2015-11-17T16:00:00.000Z/2015-11-17T15:00:00.000Z/0/index.zip] does not exist

at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapException(StdValueInstantiator.java:405) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:234) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:167) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:398) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1064) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:264) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:156) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:126) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:113) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:84) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:132) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:41) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:2769) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:2700) ~[jackson-databind-2.4.4.jar:2.4.4]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:140) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:93) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142) ~[druid-server-0.8.1.jar:0.8.1]

… 18 more

Caused by: java.lang.IllegalArgumentException: [/tmp/druid/localStorage/messages/2015-11-17T15:00:00.000Z_2015-11-17T16:00:00.000Z/2015-11-17T15:00:00.000Z/0/index.zip] does not exist

at com.google.api.client.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:119) ~[google-http-client-1.15.0-rc.jar:?]

at com.google.api.client.util.Preconditions.checkArgument(Preconditions.java:69) ~[google-http-client-1.15.0-rc.jar:?]

at io.druid.segment.loading.LocalLoadSpec.&lt;init&gt;(LocalLoadSpec.java:49) ~[druid-server-0.8.1.jar:0.8.1]

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_31]

at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_31]

at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_31]

at java.lang.reflect.Constructor.newInstance(Constructor.java:408) ~[?:1.8.0_31]

at com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:125) ~[jackson-databind-2.4.4.jar:0.8.1]

at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:230) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:167) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:398) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1064) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:264) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeOther(BeanDeserializer.java:156) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:126) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:113) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:84) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:132) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:41) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:2769) ~[jackson-databind-2.4.4.jar:2.4.4]

at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:2700) ~[jackson-databind-2.4.4.jar:2.4.4]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:140) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:93) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142) ~[druid-server-0.8.1.jar:0.8.1]

… 18 more

shortly followed by:

2015-11-25T20:40:58,087 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[messages_2015-11-17T09:00:00.000Z_2015-11-17T10:00:00.000Z_2015-11-17T09:00:00.000Z], segment=DataSegment{size=12508, shardSpec=NoneShardSpec, metrics=[count], dimensions=[dimension1, dimension2, dimension3], version='2015-11-17T09:00:00.000Z', loadSpec={type=local, path=/tmp/druid/localStorage/messages/2015-11-17T09:00:00.000Z_2015-11-17T10:00:00.000Z/2015-11-17T09:00:00.000Z/0/index.zip}, interval=2015-11-17T09:00:00.000Z/2015-11-17T10:00:00.000Z, dataSource='messages', binaryVersion='9'}}

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[messages_2015-11-17T09:00:00.000Z_2015-11-17T10:00:00.000Z_2015-11-17T09:00:00.000Z]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:146) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:171) [druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:42) [druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:115) [druid-server-0.8.1.jar:0.8.1]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:510) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.8.0.jar:?]

at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]

at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:84) [curator-framework-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:508) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.8.0.jar:?]

at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:759) [curator-recipes-2.8.0.jar:?]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_31]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_31]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_31]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_31]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_31]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_31]

at java.lang.Thread.run(Thread.java:745) [?:1.8.0_31]

Caused by: io.druid.segment.loading.SegmentLoadingException: /path/to/druid/data/zk_druid/messages/2015-11-17T09:00:00.000Z_2015-11-17T10:00:00.000Z/2015-11-17T09:00:00.000Z/0/index.drd (No such file or directory)

at io.druid.segment.loading.MMappedQueryableIndexFactory.factorize(MMappedQueryableIndexFactory.java:40) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:94) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142) ~[druid-server-0.8.1.jar:0.8.1]

… 18 more

Caused by: java.io.FileNotFoundException: /path/to/druid/data/zk_druid/messages/2015-11-17T09:00:00.000Z_2015-11-17T10:00:00.000Z/2015-11-17T09:00:00.000Z/0/index.drd (No such file or directory)

at java.io.FileInputStream.open(Native Method) ~[?:1.8.0_31]

at java.io.FileInputStream.&lt;init&gt;(FileInputStream.java:138) ~[?:1.8.0_31]

at io.druid.segment.SegmentUtils.getVersionFromDir(SegmentUtils.java:24) ~[druid-api-0.3.9.jar:0.8.1]

at io.druid.segment.IndexIO.loadIndex(IndexIO.java:165) ~[druid-processing-0.8.1.jar:0.8.1]

at io.druid.segment.loading.MMappedQueryableIndexFactory.factorize(MMappedQueryableIndexFactory.java:37) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:94) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:151) ~[druid-server-0.8.1.jar:0.8.1]

at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:142) ~[druid-server-0.8.1.jar:0.8.1]

… 18 more

These errors occur continuously, each one referencing a different time interval for its segment. Here is my historical/runtime.properties:

druid.host=

druid.port=

druid.service=

druid.historical.cache.useCache=true

druid.historical.cache.populateCache=true

# Our intermediate buffer is also very small, so longer topNs will be slow.

# In prod: set sizeBytes = 512mb

druid.processing.buffer.sizeBytes=1073741824

# We can only scan 1 segment in parallel with these configs.

# In prod: set numThreads = # cores - 1

druid.processing.numThreads=8

# maxSize should reflect the performance you want.

# Druid memory maps segments.

# memory_for_segments = total_memory - heap_size - (processing.buffer.sizeBytes * (processing.numThreads + 1)) - JVM overhead (~1G)

# The greater the memory-to-disk ratio, the better performance you should see.

druid.segmentCache.locations=[{"path": "/path/to/druid/data/zk_druid", "maxSize": 300000000000}]

druid.monitoring.monitors=["io.druid.server.metrics.HistoricalMetricsMonitor", "com.metamx.metrics.JvmMonitor"]

druid.server.http.numThreads=16

druid.server.maxSize=300000000000
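As a sanity check, the memory_for_segments formula in the comments above can be evaluated numerically. The buffer size and thread count come from the config; the total-memory and heap figures below are assumed example values, not taken from this machine:

```python
# Sketch of the memory_for_segments formula from the config comments.
# total_memory and heap_size are hypothetical example values.
GB = 1024 ** 3

total_memory = 64 * GB          # assumed physical RAM on the historical node
heap_size = 8 * GB              # assumed -Xmx for the historical JVM
buffer_size = 1073741824        # druid.processing.buffer.sizeBytes (exactly 1 GB)
num_threads = 8                 # druid.processing.numThreads
jvm_overhead = 1 * GB           # rule-of-thumb overhead from the comment

memory_for_segments = (
    total_memory
    - heap_size
    - buffer_size * (num_threads + 1)
    - jvm_overhead
)

print(memory_for_segments // GB)  # -> 46, i.e. 46 GB left for memory-mapping segments
```

With these example numbers, the processing buffers alone claim 9 GB off-heap, which is why shrinking sizeBytes (as the "In prod" comment suggests) frees up page cache for segments.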

With the first exception, there appears to be no /tmp/druid directory at all, even though I don’t believe I have any configuration pointing to that path. With the second exception, where it’s looking for index.drd, the directory only contains an empty file called downloadStartMarker.

Is there some configuration I’m missing that is preventing the zip and the index.drd file from appearing in their proper locations? Something may have gone wrong when importing this data, which was done by sending events to Kafka for Druid to ingest. However, I’m able to query Druid with no issues, and I see segments arriving in the database used for metadata storage. Furthermore, I’m using S3 for deep storage, but the logs don’t indicate anything going on with S3.

Any help is greatly appreciated. Let me know if other information is needed here. Thank you!

  • Geoff

You have the local filesystem set up as your deep storage and probably restarted the machine, removing the segment from /tmp.

Thanks Fangjin for responding. There’s probably something I am missing, but I don’t believe I have the local filesystem set as deep storage.

Here is my common.runtime.properties:

# Extensions (no deep storage model is listed - using local fs for deep storage - not recommended for production)

# Also, for production to use mysql, add "io.druid.extensions:mysql-metadata-storage"

druid.extensions.coordinates=["io.druid.extensions:druid-examples","io.druid.extensions:druid-kafka-eight","io.druid.extensions:druid-s3-extensions"]

druid.extensions.localRepository=extensions-repo

druid.request.logging.type=emitter

druid.request.logging.feed=druid_requests

# Zookeeper

druid.zk.service.host=

druid.zk.paths.base=

druid.discovery.curator.path=/druid/discovery

# Metadata Storage (use something like mysql in production by uncommenting properties below)

# by default druid will use derby

druid.extensions.coordinates=["io.druid.extensions:"]

druid.metadata.storage.type=

druid.metadata.storage.connector.connectURI=

druid.metadata.storage.connector.user=

druid.metadata.storage.connector.password=

# Deep storage (local filesystem for examples - don't use this in production)

#druid.storage.type=local

#druid.storage.storageDirectory=/tmp/druid/localStorage

druid.storage.type=s3

druid.storage.bucket=

druid.s3.accessKey=

druid.s3.secretKey=

# Query Cache (we use a simple 10mb heap-based local cache on the broker)

druid.cache.type=local

druid.cache.sizeInBytes=10000000

# Indexing service discovery

druid.selectors.indexing.serviceName=overlord

# Monitoring (disabled for examples, if you enable SysMonitor, make sure to include sigar jar in your cp)

druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"]

# Metrics logging (disabled for examples - change this to logging or http in production)

#druid.emitter=noop

druid.emitter=logging

#druid.emitter.http.recipientBaseUrl=

Here is my historical runtime.properties:

druid.host=

druid.port=

druid.service=druid/corp/historical

druid.historical.cache.useCache=true

druid.historical.cache.populateCache=true

# Our intermediate buffer is also very small, so longer topNs will be slow.

# In prod: set sizeBytes = 512mb

druid.processing.buffer.sizeBytes=1073741824

# We can only scan 1 segment in parallel with these configs.

# In prod: set numThreads = # cores - 1

druid.processing.numThreads=8

# maxSize should reflect the performance you want.

# Druid memory maps segments.

# memory_for_segments = total_memory - heap_size - (processing.buffer.sizeBytes * (processing.numThreads + 1)) - JVM overhead (~1G)

# The greater the memory-to-disk ratio, the better performance you should see.

druid.segmentCache.locations=[{"path": "/path/to/druid/data/zk_druid", "maxSize": 300000000000}]

druid.monitoring.monitors=["io.druid.server.metrics.HistoricalMetricsMonitor", "com.metamx.metrics.JvmMonitor"]

druid.server.http.numThreads=16

druid.server.maxSize=300000000000

These errors were happening immediately after data was coming in from Kafka, so I don’t know if that’s related to the files missing from /tmp.

In the meantime, I’ll keep poking around and see if I have anything misconfigured. Thanks for your help.

  • Geoff

Hey Geoff,

I see your problem - you have druid.extensions.coordinates defined twice in your property file:

druid.extensions.coordinates=["io.druid.extensions:druid-examples","io.druid.extensions:druid-kafka-eight","io.druid.extensions:druid-s3-extensions"]

druid.extensions.coordinates=["io.druid.extensions:"]

The second one is overwriting the first, preventing the s3 extension from being loaded, so Druid defaults to using local for deep storage.
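The overwrite behavior is easy to demonstrate: a .properties-style file is essentially a key-value map, so a later assignment to the same key silently replaces the earlier one. Here is a simplified Python sketch of that parsing (not Druid’s actual loader, which goes through Java’s Properties and keeps the last value for a duplicate key in the same way):

```python
# Two lines from the common.runtime.properties above, with the same key.
lines = [
    'druid.extensions.coordinates=["io.druid.extensions:druid-examples",'
    '"io.druid.extensions:druid-kafka-eight","io.druid.extensions:druid-s3-extensions"]',
    'druid.extensions.coordinates=["io.druid.extensions:"]',  # later duplicate wins
]

props = {}
for line in lines:
    key, _, value = line.partition("=")
    props[key] = value  # a plain dict keeps only the last assignment per key

print(props["druid.extensions.coordinates"])  # -> ["io.druid.extensions:"]
```

Since the surviving value no longer names druid-s3-extensions, the S3 deep-storage module is never loaded.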

Thanks David. I noticed that afterwards, reran everything (including importing data to a new Kafka topic), and I’m still seeing the same thing.

I’ll keep playing around but any suggestions/help is welcomed.

Okay great. You should see a log message like this on startup: INFO [main] io.druid.storage.s3.S3DataSegmentPusher - Configured S3 as deep storage

You should not be seeing this: INFO [main] io.druid.segment.loading.LocalDataSegmentPusher - Configured local filesystem as deep storage

2015-12-03T23:06:01,143 WARN [Coordinator-Exec--0] io.druid.server.coordinator.rules.LoadRule - Not enough [_default_tier] servers or node capacity to assign segment[messages_2015-12-03T22:05:00.000Z_2015-12-03T22:10:00.000Z_2015-12-03T22:05:00.000Z]! Expected Replicants[2]

2015-12-03T23:06:01,144 WARN [Coordinator-Exec--0] io.druid.server.coordinator.rules.LoadRule - Not enough [_default_tier] servers or node capacity to assign segment[messages_2015-12-03T22:00:00.000Z_2015-12-03T22:05:00.000Z_2015-12-03T22:00:00.000Z]! Expected Replicants[2]

Something tells me the issues I’m seeing have to do with the format of the data being ingested from Kafka as JSON. On a side note, should there be any issues with nested JSON objects or arrays within the JSON object? i.e.:

{
  "info": {"name": "meh", "moreInfo": ["one", "two"]},
  "someList": ["foo", "bar", "baz"]
}

Awesome! Glad you got the S3 deep storage working.

Regarding the coordinator logs, by default Druid is configured to load 2 replicas of each segment, making sure that they’re on different historical nodes, for high availability. Likely you’re seeing that message because you only have one historical node running. It’s not a big deal, but if you want to make the warning go away, you can modify the default rule set using the coordinator console.

I think your hunch is right and it is indeed a data format issue. Druid supports multi-value dimensions, but it does not support nested dimensions; those should be flattened before being read by Druid. In other words, from your example:

"someList": ["foo", "bar", "baz"] => multi-value dimension, which is valid

"info": {"name": "meh", "moreInfo": ["one", "two"]} => nested dimension, which should be flattened to something like:

{
  "info_name": "meh",
  "info_moreInfo": ["one", "two"]
}
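A minimal sketch of that flattening step, assuming it is done upstream before events are sent to Kafka. The helper and the underscore-joining convention here are illustrative, not part of Druid:

```python
import json

# Hypothetical pre-processing helper: recursively flattens nested objects,
# joining keys with "_", while leaving arrays (multi-value dimensions) intact.
def flatten(obj, prefix=""):
    flat = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "_"))
        else:
            flat[name] = value
    return flat

event = {
    "info": {"name": "meh", "moreInfo": ["one", "two"]},
    "someList": ["foo", "bar", "baz"],
}

print(json.dumps(flatten(event), sort_keys=True))
```

The flattened event keeps "someList" and "info_moreInfo" as arrays, which Druid can ingest as multi-value dimensions, and turns "info.name" into the scalar dimension "info_name".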

When your realtime node starts accepting data, you’ll see a log message that says something like "Announcing segment for [interval]". If you’re not seeing this, Druid is probably still throwing away your data, either because of a format issue or because the message timestamps fall outside the acceptable time range (window period).
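That window-period check can be pictured as a simple tolerance around the current time. The sketch below assumes a hypothetical ten-minute windowPeriod (PT10M); the real rejection logic lives inside the realtime node, this just illustrates why stale events are dropped:

```python
from datetime import datetime, timedelta, timezone

# Assumed example value: windowPeriod=PT10M.
window = timedelta(minutes=10)

def accepts(event_time, now):
    # Events are kept only if their timestamp is within +/- windowPeriod of now.
    return now - window <= event_time <= now + window

now = datetime(2015, 12, 3, 23, 0, tzinfo=timezone.utc)
print(accepts(now - timedelta(minutes=5), now))  # recent event -> True
print(accepts(now - timedelta(hours=2), now))    # stale event  -> False
```

An event replayed from an old Kafka offset, hours behind real time, would fail this check and never show up in a segment even though ingestion otherwise looks healthy.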