JsonParseException when trying the ORC index task

Hi all,

I am trying to ingest an ORC-formatted file stored in S3 into Druid.

The steps I have followed so far:

1. Included "druid-orc-extensions" and "druid-s3-extensions" in the Druid load list in my common.runtime.properties.

2. Added my S3 properties as deep storage in common.runtime.properties:

druid.storage.type=s3
druid.storage.bucket=bucketname
druid.storage.baseKey=druid/segments
druid.s3.accessKey=
druid.s3.secretKey=
druid.storage.storageDirectory=s3://bucketname/druid/segments

3. Set up my index task as below:

{
  "type" : "index",
  "spec" : {
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "static-s3",
        "inputFormat" : "org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat",
        "uris" : ["s3://bucketname/druid/segments/000000"],
        "fetchTimeout" : 90000
      },
      "appendToExisting" : false
    },
    "dataSchema" : {
      "dataSource" : "visitdests3ORC",
      "parser" : {
        "type" : "orc",
        "parseSpec" : {
          "format" : "json",
          "dimensionsSpec" : {
            "dimensions" : [
              "site_name",
              "product_ln_name",
              "dest_id"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "local_dt"
          }
        }
      },
      "metricsSpec" : [],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2017-01-01/2017-01-01"],
        "rollup" : false
      }
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 5000000,
      "maxRowsInMemory" : 500,
      "forceExtendableShardSpecs" : true,
      "reportParseExceptions" : true
    }
  }
}

But when I run the index task, I get the error below:

Caused by: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'ORC': was expecting ('true', 'false' or 'null')
 at [Source: ORC%; line: 1, column: 4]
    at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1581) ~[jackson-core-2.6.7.jar:2.6.7]
    at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:533) ~[jackson-core-2.6.7.jar:2.6.7]
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._reportInvalidToken(ReaderBasedJsonParser.java:2462) ~[jackson-core-2.6.7.jar:2.6.7]
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1621) ~[jackson-core-2.6.7.jar:2.6.7]
    at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:689) ~[jackson-core-2.6.7.jar:2.6.7]
    at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3776) ~[jackson-databind-2.6.7.jar:2.6.7]
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3721) ~[jackson-databind-2.6.7.jar:2.6.7]
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2726) ~[jackson-databind-2.6.7.jar:2.6.7]
    at org.apache.druid.java.util.common.parsers.JSONPathParser.parseToMap(JSONPathParser.java:70) ~[java-util-0.13.0-incubating.jar:0.13.0-incubating]
    at org.apache.druid.data.input.impl.StringInputRowParser.parseString(StringInputRowParser.java:155) ~[druid-api-0.13.0-incubating.jar:0.13.0-incubating]
    at org.apache.druid.data.input.impl.StringInputRowParser.parse(StringInputRowParser.java:148) ~[druid-api-0.13.0-incubating.jar:0.13.0-incubating]
    at org.apache.druid.segment.transform.TransformingStringInputRowParser.parse(TransformingStringInputRowParser.java:57) ~[druid-processing-0.13.0-incubating.jar:0.13.0-incubating]
    at org.apache.druid.data.input.impl.FileIteratingFirehose.nextRow(FileIteratingFirehose.java:81) ~[druid-api-0.13.0-incubating.jar:0.13.0-incubating]
    at org.apache.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:999) ~[druid-indexing-service-0.13.0-incubating.jar:0.13.0-incubating]
    ... 7 more

2019-03-29T07:37:15,804 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Unregistering chat handler[index_visitdests3ORC_2019-03-29T07:37:08.424Z]
2019-03-29T07:37:15,804 INFO [task-runner-0-priority-0] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [index_visitdests3ORC_2019-03-29T07:37:08.424Z] status changed to [FAILED].
2019-03-29T07:37:15,805 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_visitdests3ORC_2019-03-29T07:37:08.424Z",
  "status" : "FAILED",
  "duration" : 433,
  "errorMsg" : "java.lang.RuntimeException: Max parse exceptions exceeded, terminating task...\n\tat org.apache.druid..."
}
2019-03-29T07:37:15,813 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop

Am I doing something wrong? Can somebody help me?

Thanks,

Anoosha

Hi folks,

Has anybody tried ingesting an ORC file stored in S3?

Thanks,

Anoosha

I think this was covered here: https://groups.google.com/forum/#!topic/druid-user/22d08uJpszw
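
Not an authoritative answer, but one reading of the stack trace: the frames go through StringInputRowParser and JSONPathParser, which suggests the task read the S3 object as text and applied the "json" parseSpec to it, rather than using an ORC reader. The "[Source: ORC%; line: 1, column: 4]" part is consistent with that, since ORC files start with the three magic bytes "ORC". The Python sketch below only illustrates that failure mode; the helper names and the sample bytes after the magic are made up for illustration and are not taken from Druid.

```python
import json

# Every ORC file begins with the 3-byte magic "ORC".
# The bytes after the magic are invented here, purely for illustration.
fake_orc_head = b"ORC\x08\x03\x10\x00"


def looks_like_orc(head: bytes) -> bool:
    """Return True if the bytes start with the ORC magic number."""
    return head.startswith(b"ORC")


def try_parse_as_json(head: bytes) -> str:
    """Mimic a text-oriented JSON parser being handed raw file content."""
    try:
        json.loads(head.decode("latin-1"))
        return "parsed"
    except json.JSONDecodeError as exc:
        return f"JSON parse failed: {exc.msg}"


if __name__ == "__main__":
    # A binary ORC header is not valid JSON, so parsing fails on the first token.
    print(looks_like_orc(fake_orc_head))
    print(try_parse_as_json(fake_orc_head))
```

If that reading is right, the problem is the combination of a text-oriented firehose with a "json" parseSpec, not the file itself. As far as I recall, the ORC extension docs for this release describe it for Hadoop-based batch ingestion (an "index_hadoop" task with the OrcNewInputFormat input format), so it is worth checking the linked thread and the extension docs for your version before changing the spec.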