Problem with batch ingestion

I’m having a problem with batch ingestion. After submitting the task to the overlord, the indexing task throws this exception:

```
Exception while running task[IndexTask{id=index_detal2_2016-02-10T12:53:13.627Z, type=index, dataSource=detal2}]
java.lang.ClassCastException: io.druid.data.input.impl.StringInputRowParser cannot be cast to io.druid.data.input.impl.MapInputRowParser
    at io.druid.segment.realtime.firehose.EventReceiverFirehoseFactory.connect(EventReceiverFirehoseFactory.java:64) ~[druid-server-0.8.3.jar:0.8.3]
    at io.druid.indexing.common.task.IndexTask.getDataIntervals(IndexTask.java:234) ~[druid-indexing-service-0.8.3.jar:0.8.3]
    at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:192) ~[druid-indexing-service-0.8.3.jar:0.8.3]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:285) [druid-indexing-service-0.8.3.jar:0.8.3]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:265) [druid-indexing-service-0.8.3.jar:0.8.3]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_80]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
```

This is my task configuration:

```json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "detal2",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "timestamp"
          },
          "dimensionsSpec": {
            "dimensions": [
              "source_id",
              "visualization_id",
              "placement_id",
              "placement_type"
            ]
          }
        }
      },
      "metricsSpec": [
        {
          "type": "longSum",
          "name": "shows_vis",
          "fieldName": "shows_vis"
        },
        {
          "type": "longSum",
          "name": "clicks_real",
          "fieldName": "clicks_real"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "DAY",
        "intervals": [
          "2015-12-11/2016-01-29"
        ]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "receiver",
        "serviceName": "detal2"
      }
    }
  }
}
```

I’m using Druid 0.8.3.
Changing the parseSpec to csv (and adding the ‘columns’ field) doesn’t change the exception.
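For reference, the csv variant was along these lines (the columns list below is illustrative, not my exact file layout), and it hits the same ClassCastException:

```json
"parseSpec": {
  "format": "csv",
  "columns": [
    "timestamp",
    "source_id",
    "visualization_id",
    "placement_id",
    "placement_type",
    "shows_vis",
    "clicks_real"
  ],
  "timestampSpec": {
    "column": "timestamp"
  },
  "dimensionsSpec": {
    "dimensions": [
      "source_id",
      "visualization_id",
      "placement_id",
      "placement_type"
    ]
  }
}
```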

If you are doing batch indexing via Hadoop, please set type=index_hadoop instead of type=index.

No, I’m not using Hadoop.

On Wednesday, February 10, 2016 at 17:43:21 UTC+1, Slim Bouguerra wrote:

OK, so it looks like you want to ingest data via HTTP.
If so, please use the MapInputRowParser, i.e. set type=map instead of type=string for the parser.
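
For example, a sketch of the parser block with only the type changed (the rest of your parseSpec stays as you have it):

```json
"parser": {
  "type": "map",
  "parseSpec": {
    "format": "json",
    "timestampSpec": {
      "column": "timestamp"
    },
    "dimensionsSpec": {
      "dimensions": [
        "source_id",
        "visualization_id",
        "placement_id",
        "placement_type"
      ]
    }
  }
}
```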

Please let me know if it works or not.

If you want to ingest data over HTTP, please check out: http://druid.io/docs/0.9.0-rc1/tutorials/tutorial-streams.html from the latest RC

This is based off of http://imply.io/docs/latest/tutorial-streams, which might actually be a lot easier to get started with.

Yes, changing ‘string’ to ‘map’ did work. However, I couldn’t find a way to ‘finish’ a batch ingestion task that uses the EventReceiverFirehose. Is there some undocumented way to do this?

On Thursday, February 11, 2016 at 19:27:40 UTC+1, Fangjin Yang wrote:

ERF (the EventReceiverFirehose) is not meant for batch ingestion.
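
For non-Hadoop batch ingestion of files, the usual route is a local firehose in the index task’s ioConfig; a minimal sketch (the baseDir and filter values are illustrative), with the parser back to type=string since rows are parsed from text files:

```json
"ioConfig": {
  "type": "index",
  "firehose": {
    "type": "local",
    "baseDir": "/path/to/detal2/data",
    "filter": "*.json"
  }
}
```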