Class cast exception when trying Avro + S3 ingestion

Found a few other threads with the same exception, but no solutions… Any help would be greatly appreciated!

Exception:

java.lang.ClassCastException: io.druid.segment.transform.TransformingInputRowParser cannot be cast to io.druid.data.input.impl.StringInputRowParser
at io.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactory.connect(PrefetchableTextFilesFirehoseFactory.java:87) ~[druid-api-0.12.3.jar:0.12.3]
at io.druid.indexing.common.task.IndexTask.collectIntervalsAndShardSpecs(IndexTask.java:467) ~[druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.common.task.IndexTask.createShardSpecsFromInput(IndexTask.java:401) ~[druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.common.task.IndexTask.determineShardSpecs(IndexTask.java:339) ~[druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:237) ~[druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:444) [druid-indexing-service-0.12.3.jar:0.12.3]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:416) [druid-indexing-service-0.12.3.jar:0.12.3]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]

Job Config:

{
  "type" : "index",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "users",
      "parser" : {
        "type" : "avro_hadoop",
        "parseSpec" : {
          "format" : "avro",
          "timestampSpec" : {
            "column" : "timestamp",
            "format" : "millis"
          },
          "dimensionsSpec" : {
            "dimensions" : ["uuid"],
            "dimensionExclusions" : ["siteId", "catId"],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [
        {
          "type" : "thetaSketch",
          "name" : "numUsers",
          "fieldName" : "numUsers"
        }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "MINUTE",
        "queryGranularity" : "MINUTE"
      }
    },
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "static-s3",
        "prefixes" : ["s3://my-bucket/druid/test-data/1/"]
      }
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 500000,
      "maxRowsInMemory" : 750000
    }
  }
}


avro schema:
{
  "type" : "record",
  "name" : "topLevelRecord",
  "fields" : [ {
    "name" : "uuid",
    "type" : [ "string", "null" ]
  }, {
    "name" : "timestamp",
    "type" : [ "long", "null" ]
  }, {
    "name" : "siteId",
    "type" : [ "int", "null" ]
  }, {
    "name" : "catId",
    "type" : [ "int", "null" ]
  } ]
}


This type of avro parser is only supported by the Hadoop batch indexing task (http://druid.io/docs/latest/ingestion/hadoop.html); it won’t work with the native batch index task:


     "parser" : {
        "type" : "avro_hadoop",

If you don’t have a Hadoop cluster, you can still run the Hadoop batch indexing task; it will use a LocalJobRunner on the peon instead.
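
For reference, a rough sketch of what the equivalent Hadoop batch (index_hadoop) spec could look like. The inputFormat and the avro.schema.input.value.path job property follow the Avro Hadoop parser docs; the s3n path, schema location, and intervals below are illustrative placeholders rather than values from your post, and S3 access for the Hadoop job still needs to be configured separately (e.g. via jobProperties):

{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "users",
      "parser" : {
        "type" : "avro_hadoop",
        "parseSpec" : {
          "format" : "avro",
          "timestampSpec" : { "column" : "timestamp", "format" : "millis" },
          "dimensionsSpec" : {
            "dimensions" : ["uuid"],
            "dimensionExclusions" : ["siteId", "catId"]
          }
        }
      },
      "metricsSpec" : [
        { "type" : "thetaSketch", "name" : "numUsers", "fieldName" : "numUsers" }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "MINUTE",
        "queryGranularity" : "MINUTE",
        "intervals" : ["2018-01-01/2018-01-02"]
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "inputFormat" : "io.druid.data.input.avro.AvroValueInputFormat",
        "paths" : "s3n://my-bucket/druid/test-data/1/"
      }
    },
    "tuningConfig" : {
      "type" : "hadoop",
      "jobProperties" : {
        "avro.schema.input.value.path" : "/path/to/topLevelRecord.avsc"
      }
    }
  }
}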

Thanks,

Jon