JsonMappingException with Firehose extension

Moved this out of the Druid Development group into here in case this is the more appropriate forum.

TL;DR

I wrote a firehose extension that initializes fine on Overlord startup, but when the task's JSON payload is parsed, the extension's type cannot be found. As far as I can tell, I have the META-INF file and the appropriate module binding.

I believe I've followed the advice posted here: https://groups.google.com/forum/#!msg/druid-development/3IvbV4BK0yw/afiwEj7klIgJ. However, a MappableContainerException is thrown when I submit an index task.

I am attempting to create a custom firehose, and while it seems to be registering properly (the Overlord starts up), when I issue the task it throws the error shown under Exception below.

I have attempted to check the library values for all required dependencies in my custom module.

Questions

  1. If I am developing a DruidModule extension, it seems that the only way to get it on the classpath is to copy my jar to the necessary servers and do a local Maven install since I am not running my own Maven repo. Is there a way for me to just drop it in the druid/lib directory without having to install into a local or remote Maven repo?
  2. Do the common.runtime.properties need to be the same on every node in the Druid cluster regardless of server type?
  3. Is there a way to log out all of the named jackson subtypes?
  4. Is there a good example of what I need to mock to test out a Firehose? I’m still learning the system models and how they interact with one another.

Exception

2015-05-15T18:46:06.048850+00:00 i-316590ce druid_overlord: Caused by: com.fasterxml.jackson.databind.JsonMappingException: Could not resolve type id 'cacheable-s3' into a subtype of [simple type, class io.druid.data.input.FirehoseFactory]
2015-05-15T18:46:06.048855+00:00 i-316590ce druid_overlord: at [Source: HttpInputOverHTTP@bac9b0d; line: 60, column: 17]
2015-05-15T18:46:06.048859+00:00 i-316590ce druid_overlord:     at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148) ~[jackson-databind-2.4.4.jar:2.4.4]
2015-05-15T18:46:06.048864+00:00 i-316590ce druid_overlord:     at com.fasterxml.jackson.databind.DeserializationContext.unknownTypeException(DeserializationContext.java:862) ~[jackson-databind-2.4.4.jar:2.4.4]
2015-05-15T18:46:06.048870+00:00 i-316590ce druid_overlord:     at com.fasterxml.jackson.databind.jsontype.impl.TypeDeserializerBase._findDeserializer(TypeDeserializerBase.java:167) ~[jackson-databind-2.4.4.jar:2.4.4]
2015-05-15T18:46:06.048875+00:00 i-316590ce druid_overlord:     at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedForId(AsPropertyTypeDeserializer.java:99) ~[jackson-databind-2.4.4.jar:2.4.4]
2015-05-15T18:46:06.048940+00:00 i-316590ce druid_overlord:     at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:84) ~[jackson-databind-2.4.4.jar:2.4.4]
2015-05-15T18:46:06.048956+00:00 i-316590ce druid_overlord:     at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:132) ~[jackson-databind-2.4.4.jar:2.4.4]
2015-05-15T18:46:06.048961+00:00 i-316590ce druid_overlord:     at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:536) ~[jackson-databind-2.4.4.jar:2.4.4]
2015-05-15T18:46:06.048966+00:00 i-316590ce druid_overlord:     at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:344) ~[jackson-databind-2.4.4.jar:2.4.4]
2015-05-15T18:46:06.048970+00:00 i-316590ce druid_overlord:     at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1064) ~[jackson-databind-2.4.4.jar:2.4.4]

Task Spec

{
    "type": "index",
    "spec": {
        "dataSchema": {
            "dataSource": "druid-unload",
            "parser": {
                "parseSpec": {
                    "dimensionsSpec": {
                        "dimensions": [
                            ...
                        ]
                    },
                    "timestampSpec": {
                        "column": "start_time",
                        "format": "yyyy-MM-dd HH:mm:ss"
                    },
                    "columns": [
                        ...
                    ],
                    "format": "csv",
                    "listDelimiter": "|"
                },
                "type": "string"
            },
            "metricsSpec": [
                {
                    "type": "count",
                    "name": "count"
                }
            ],
            "granularitySpec": {
                "type": "uniform",
                "intervals": [ "2015-04-19/2015-04-20" ],
                "segmentGranularity": "DAY"
            }
        },
        "ioConfig": {
            "type": "index",
            "firehose": {
                "type": "cacheable-s3",
                "uris": ["s3://bucket/druid-unload/y=2015/m=04/d=19/0000_part_00.gz"],
                "cacheLocation": "/mnt/aso"
            }
        },
        "tuningConfig": {
            "rowFlushBoundary": 0,
            "type": "index",
            "targetPartitionSize": 0
        }
    }
}

Overlord common.runtime.properties

druid.extensions.coordinates=["io.druid.extensions:mysql-metadata-storage","io.druid.extensions:druid-s3-extensions","com.monetate.io.druid:druid-s3-extensions:1.0-SNAPSHOT"]

curl overlord-host:8080/status | jq '.'

{
  "version": "0.7.1.1",
  "modules": [
    {
      "name": "com.monetate.io.druid.S3CacheableFirehoseDruidModule"
    },
    {
      "name": "io.druid.storage.s3.S3StorageDruidModule",
      "artifact": "druid-s3-extensions",
      "version": "0.7.1.1"
    },
    {
      "name": "io.druid.firehose.s3.S3FirehoseDruidModule",
      "artifact": "druid-s3-extensions",
      "version": "0.7.1.1"
    },
    {
      "name": "io.druid.metadata.storage.mysql.MySQLMetadataStorageModule",
      "artifact": "mysql-metadata-storage",
      "version": "0.7.1.1"
    }
  ],
  "memory": {
    "maxMemory": 4268163072,
    "totalMemory": 4268163072,
    "freeMemory": 4118678856,
    "usedMemory": 149484216
  }
}

Code for the extension is here: https://github.com/anthonyjso/druid-s3-extensions

I’m not exactly sure where to go next so any help would be appreciated. This is more of a POC than anything else but the goal would be to avoid repeated streaming from S3 to help speed up all the looping that occurs on the data.

Thanks,

Anthony

Hi Anthony,
I believe that in the linked code you are registering the module class instead of the firehose factory class.

Instead of -

NamedType cacheableS3JsonType = new NamedType(S3CacheableFirehoseDruidModule.class, "cacheable-s3");

change it to -

NamedType cacheableS3JsonType = new NamedType(S3CacheableFirehoseFactory.class, "cacheable-s3");
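This would also explain why the Overlord starts cleanly and the failure only shows up when a task is submitted. As far as I know, Jackson accepts any class when a subtype is registered, and only later, when it resolves a type id for a given base type, does it skip registered classes that are not assignable to that base type. The sketch below is a toy model of that lookup using only the JDK. It is not Jackson or Druid code, and every name in it (SubtypeRegistrySketch, Named, CacheableFirehoseFactory, CacheableFirehoseModule, and the empty FirehoseFactory and DruidModule interfaces) is a hypothetical stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Toy model of type-id resolution. NOT Jackson or Druid code; all names are hypothetical. */
final class SubtypeRegistrySketch {

    // Empty stand-ins for the types discussed in this thread.
    interface FirehoseFactory {}
    interface DruidModule {}
    static final class CacheableFirehoseFactory implements FirehoseFactory {}
    static final class CacheableFirehoseModule implements DruidModule {}

    /** A (class, type id) pair, playing the role that NamedType plays in Jackson. */
    static final class Named {
        final Class<?> type;
        final String id;

        Named(Class<?> type, String id) {
            this.type = type;
            this.id = id;
        }
    }

    private final List<Named> registered = new ArrayList<>();

    /** Registration accepts any class, so a wrong class is not rejected at startup. */
    void register(Class<?> type, String id) {
        registered.add(new Named(type, id));
    }

    /** Resolve an id, considering only entries assignable to the requested base type. */
    Optional<Class<?>> resolve(Class<?> base, String id) {
        for (Named n : registered) {
            if (n.id.equals(id) && base.isAssignableFrom(n.type)) {
                return Optional.<Class<?>>of(n.type);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Mistake from the thread: the module class is registered under the type id.
        SubtypeRegistrySketch wrong = new SubtypeRegistrySketch();
        wrong.register(CacheableFirehoseModule.class, "cacheable-s3");
        System.out.println(wrong.resolve(FirehoseFactory.class, "cacheable-s3").isPresent()); // false

        // Fix: register the factory class, which implements the base type.
        SubtypeRegistrySketch fixed = new SubtypeRegistrySketch();
        fixed.register(CacheableFirehoseFactory.class, "cacheable-s3");
        System.out.println(fixed.resolve(FirehoseFactory.class, "cacheable-s3").isPresent()); // true
    }
}
```

In this model the wrong registration is stored without complaint but can never match a lookup against FirehoseFactory, which mirrors a module that loads at startup while its type id stays unresolvable at parse time.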

For your other questions, see inline.

Thanks Nishant for the detailed response and also to Himanshu who replied to me directly about my goof when registering the sub type.