Indexing question / error

Hi,

I am trying to load some data into a new system, and since it's only a demo I am trying to use a native index task instead of the Hadoop indexer that I use most of the time.

I am getting a null pointer error. Does anyone know where the job went wrong? Here is the info. The files are all Parquet. Payload:

{
  "type": "index",
  "spec": {
    "ioConfig": {
      "type": "index",
      "inputSpec": {
        "type": "static",
        "inputFormat": "io.druid.data.input.parquet.DruidParquetInputFormat",
        "paths": "s3n:///partition_dt=20180718/"
      }
    },
    "dataSchema": {
      "dataSource": "AlonTest",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "DAY",
        "intervals": ["2018-07-18T00:00:00Z/P1D"]
      },
      "parser": {
        "type": "parquet",
        "parseSpec": {
          "format": "timeAndDims",
          "dimensionsSpec": {
            "dimensions": [
              "rmodel_id",
              "amodel_id",
              "ff",
              "country_name",
              "software_version",
              "kk",
              "uu",
              "iii",
              "ccc",
              "p_version",
              "d1_version",
              "d2_version",
              "software_type"
            ]
          },
          "timestampSpec": {
            "format": "auto",
            "column": "startmeasuremen"
          }
        }
      },
      "metricsSpec": [
        {
          "name": "datacount",
          "type": "count"
        },
        {
          "name": "vSum",
          "type": "floatSum",
          "fieldName": "v"
        },
        {
          "name": "aSum",
          "type": "floatSum",
          "fieldName": "a"
        },
        {
          "name": "b_sum",
          "type": "floatSum",
          "fieldName": "b"
        }
      ]
    },
    "tuningConfig": {
      "type": "index",
      "overwriteFiles": "true",
      "useCombiner": "false",
      "buildV9Directly": "true",
      "numBackgroundPersistThreads": 1,
      "jobProperties": {
        "fs.s3.awsAccessKeyId": "XXX",
        "fs.s3.awsSecretAccessKey": "XXXX",
        "fs.s3.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "fs.s3n.awsAccessKeyId": "XXXXX",
        "fs.s3n.awsSecretAccessKey": "XXXXX",
        "fs.s3n.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "io.compression.codecs": "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
      }
    }
  }
}

Here is the error from the log:

2018-07-19T12:20:16,831 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.initialization.jetty.CustomExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton"
2018-07-19T12:20:16,832 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.initialization.jetty.ForbiddenExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton"
2018-07-19T12:20:16,833 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider to GuiceManagedComponentProvider with the scope "Singleton"
2018-07-19T12:20:16,839 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.smile.JacksonSmileProvider to GuiceManagedComponentProvider with the scope "Singleton"
2018-07-19T12:20:17,003 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Shutting down...
2018-07-19T12:20:17,008 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_AlonTest_2018-07-19T12:20:13.338Z, type=index, dataSource=AlonTest}]
java.lang.NullPointerException
  at io.druid.indexing.common.task.IndexTask.generateAndPublishSegments(IndexTask.java:655) ~[druid-indexing-service-0.12.1-iap8.jar:0.12.1-iap8]
  at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:264) ~[druid-indexing-service-0.12.1-iap8.jar:0.12.1-iap8]
  at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:457) [druid-indexing-service-0.12.1-iap8.jar:0.12.1-iap8]
  at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:429) [druid-indexing-service-0.12.1-iap8.jar:0.12.1-iap8]
  at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_161]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
2018-07-19T12:20:17,013 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_AlonTest_2018-07-19T12:20:13.338Z] status changed to [FAILED].
2018-07-19T12:20:17,018 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_AlonTest_2018-07-19T12:20:13.338Z",
  "status" : "FAILED",
  "duration" : 374,
  "errorMsg" : null
}
2018-07-19T12:20:17,140 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.http.security.StateResourceFilter to GuiceInstantiatedComponentProvider
2018-07-19T12:20:17,179 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.http.SegmentListerResource to GuiceManagedComponentProvider with the scope "PerRequest"
2018-07-19T12:20:17,184 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.QueryResource to GuiceInstantiatedComponentProvider
2018-07-19T12:20:17,186 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.segment.realtime.firehose.ChatHandlerResource to GuiceInstantiatedComponentProvider
2018-07-19T12:20:17,189 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.http.security.ConfigResourceFilter to GuiceInstantiatedComponentProvider
2018-07-19T12:20:17,192 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.query.lookup.LookupListeningResource to GuiceInstantiatedComponentProvider
2018-07-19T12:20:17,194 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.query.lookup.LookupIntrospectionResource to GuiceInstantiatedComponentProvider
2018-07-19T12:20:17,195 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.StatusResource to GuiceManagedComponentProvider with the scope "Undefined"
2018-07-19T12:20:17,208 WARN [main] com.sun.jersey.spi.inject.Errors - The following warnings have been detected with resource and/or provider classes:
WARNING: A HTTP GET method, public void io.druid.server.http.SegmentListerResource.getSegments(long,long,long,javax.servlet.http.HttpServletRequest) throws java.io.IOException, MUST return a non-void type.
2018-07-19T12:20:17,214 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@6b9c42bd{/,null,AVAILABLE}

Any idea?

Thank you,

Alon

For the local indexing task, you’ll need to define a “firehose” in the “ioConfig” instead of an “inputSpec”:

http://druid.io/docs/latest/ingestion/tasks.html

http://druid.io/docs/latest/development/extensions-core/s3.html
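For example, the ioConfig for a local index task reading from S3 would look roughly like this (just a sketch, not tested; the bucket name is a placeholder, and the exact firehose fields such as uris or prefixes are described in the s3 extension doc above):

"ioConfig": {
  "type": "index",
  "firehose": {
    "type": "static-s3",
    "prefixes": ["s3://your-bucket/partition_dt=20180718/"]
  }
}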

Thanks,

Jon

I tried that too and got some strange error messages, something like this:

io.druid.segment.transform.TransformingInputRowParser cannot be cast to io.druid.data.input.impl.StringInputRowParser

At first I thought it was a firehose problem, but it isn't.

Today I did some more research and found this post that @gian wrote:

https://groups.google.com/forum/#!searchin/druid-user/io.druid.segment.transform.TransformingInputRowParser$20cannot$20be$20cast$20to$20io.druid.data.input.impl.StringInputRowParser%7Csort:date/druid-user/FOi11sJNPlM/lQAxooC6AAAJ

If that is true, then Parquet is also not supported by the local index task (why not?). Does anyone know if that is the case? Gian?
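In the meantime I can fall back to the Hadoop-based ingestion I normally use, i.e. resubmitting roughly the same spec as an index_hadoop task. Something like this (sketch only; the bucket is a placeholder, and the dataSchema and jobProperties would be the same as in my first message):

{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": { ... same dataSchema as above ... },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "inputFormat": "io.druid.data.input.parquet.DruidParquetInputFormat",
        "paths": "s3n://<your-bucket>/partition_dt=20180718/"
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "jobProperties": { ... same S3 keys and codecs as above ... }
    }
  }
}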