Error on HadoopIndexer over indexing service

Hi all,

We are currently using the HadoopIndexer task on the indexing service to reindex data and add new dimensions. Our setup is:

Reindexing nodes:

  • 3 druid_middleManagers (12 CPUs and 64GB RAM)

  • 1 druid_overlord

  • 1 hadoop_namenode

  • 2 hadoop_datanode

Common nodes:

  • 2 druid_historicals

  • 2 druid_coordinator

  • 2 druid_brokers

  • 2 druid_realtime

Our segment granularity is 1 hour, and we are trying to reindex one month of data. We are using static Hadoop reindexing because our data on HDFS is not partitioned; we are working from static files of raw data.
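Reindexing a 30-day month at 1-hour segment granularity covers 30 × 24 = 720 hourly segments. If each task handles a fixed block of hours (4 hours, say), that is 720 / 4 = 180 tasks, so it is easier to generate the task intervals than to write them by hand. The sketch below does that, assuming September 2015 (UTC) as the month; the class and method names are hypothetical, and the output strings use the ISO-8601 `start/end` form that Druid uses for intervals.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class ReindexIntervals {
    // Splits [start, end) into consecutive chunks of length `chunk`,
    // formatted as ISO-8601 "start/end" interval strings.
    static List<String> split(Instant start, Instant end, Duration chunk) {
        List<String> intervals = new ArrayList<>();
        Instant cursor = start;
        while (cursor.isBefore(end)) {
            Instant next = cursor.plus(chunk);
            if (next.isAfter(end)) {
                // The last chunk is shorter if the range is not a multiple of `chunk`.
                next = end;
            }
            intervals.add(cursor + "/" + next);
            cursor = next;
        }
        return intervals;
    }

    public static void main(String[] args) {
        // Hypothetical month to reindex: September 2015 (30 days, UTC).
        Instant start = Instant.parse("2015-09-01T00:00:00Z");
        Instant end = Instant.parse("2015-10-01T00:00:00Z");
        List<String> intervals = split(start, end, Duration.ofHours(4));
        System.out.println(intervals.size() + " tasks");
        System.out.println(intervals.get(0));
    }
}
```

Because 4 hours is a whole multiple of the 1-hour segment granularity, every generated interval starts and ends on a segment boundary.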

We launch each HadoopIndexer task to reindex 4 hours of data (four 1-hour segments). Some tasks end with status “SUCCESS”, but others end with status “FAILED”. I have seen this exception in the task log:

2015-09-06T11:01:49,916 WARN [Thread-125] org.apache.hadoop.mapred.LocalJobRunner - job_local394600922_0003
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.3.0.jar:?]
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:433) ~[hadoop-common-2.3.0.jar:?]
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1399) ~[hadoop-common-2.3.0.jar:?]
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.writeSegmentDescriptor(IndexGeneratorJob.java:645) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.renameIndexFiles(IndexGeneratorJob.java:633) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.serializeOutIndex(IndexGeneratorJob.java:545) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:449) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
    at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:295) ~[druid-services-0.7.1.1-selfcontained.jar:0.7.1.1]
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) ~[hadoop-mapreduce-client-core-2.3.0.jar:?]
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.3.0.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:1.7.0_03]
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source) ~[?:1.7.0_03]
    at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.7.0_03]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.7.0_03]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.7.0_03]
    at java.lang.Thread.run(Unknown Source) ~[?:1.7.0_03]


I have searched for this exception on the Internet and in other forums. The only suggestion I found is that it may be a problem with the S3 credentials, but I don't think that is the cause: other tasks work fine, and all the tasks run with the same configuration.

I have attached the task log file to this post. I hope someone can help me with this issue.

Regards and thanks,

Andres

task.log (558 KB)

Hey Andres,

IIRC this can happen if you try to read an empty file, or an S3 “directory”. Can you try running again with any empty files removed from your pathSpec?
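Here is one way to act on this suggestion, sketched under assumptions. Take a listing of the input objects with their sizes, drop empty objects and S3 "directory" placeholders, and join what remains into the comma-separated `paths` string of a `static` inputSpec. The `Entry` class, the bucket, and the file names are hypothetical. Treating keys ending in `/` or `_$folder$` as directory placeholders is also an assumption about how the bucket was populated. The listing itself would come from whatever tool you already use to list the bucket.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class InputPathFilter {
    // Hypothetical listing entry: an object path plus its size in bytes.
    static final class Entry {
        final String path;
        final long sizeBytes;

        Entry(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    // Drops zero-byte objects and anything that looks like a directory
    // placeholder (assumed: keys ending in "/" or "_$folder$"), then joins
    // the remaining paths with commas for a static inputSpec's "paths".
    static String staticPaths(List<Entry> listing) {
        return listing.stream()
                .filter(e -> e.sizeBytes > 0)
                .filter(e -> !e.path.endsWith("/") && !e.path.endsWith("_$folder$"))
                .map(e -> e.path)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Sample listing with made-up bucket and file names.
        List<Entry> listing = Arrays.asList(
                new Entry("s3n://example-bucket/raw/2015-09-01/", 0),
                new Entry("s3n://example-bucket/raw/2015-09-01/part-0000.json", 1048576),
                new Entry("s3n://example-bucket/raw/2015-09-01/part-0001.json", 0),
                new Entry("s3n://example-bucket/raw/2015-09-01/part-0002.json", 524288));
        System.out.println(staticPaths(listing));
    }
}
```

With the sample listing, only `part-0000.json` and `part-0002.json` survive. The zero-size check already removes typical placeholders. The suffix check covers placeholders that are not empty.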

Hi Gian, when you say “pathSpec”, do you mean the same thing as “inputSpec”?

Regards,

Andres

Ah, yeah, I mean “inputSpec”. I get those mixed up sometimes because in the code the object is called a PathSpec :)

Hahaha, thanks Gian :) I will try this tomorrow!

Regards,

Andrés Gómez

Developer

redborder.net / agomez@redborder.net

Phone: +34 955 60 11 60

On 10 September 2015 at 20:57:39, Gian Merlino (gianmerlino@gmail.com) wrote: