Hadoop segment hand off failed with druid-hdfs-storage in 0.10.0

Hi,

We are in the process of upgrading our Druid cluster from version 0.8.1 to 0.10.0. We do real-time ingestion via the IndexingService and use Hadoop as deep storage. During this migration, we encountered the following issue in the MiddleManager while handing off a segment to Hadoop:

**2017-05-19T18:02:01,693 INFO [metrics8-2017-05-19T17:55:00.000Z-persist-n-merge] io.druid.storage.hdfs.HdfsDataSegmentPusher - Creating descriptor file at[hdfs://xxx:8020/.../0_descriptor.json]
Exception in thread "plumber_merge_0" java.lang.IllegalAccessError: tried to access method org.apache.hadoop.fs.FileSystem.rename(Lorg/apache/hadoop/fs/Path;Lorg/apache/hadoop/fs/Path;[Lorg/apache/hadoop/fs/Options$Rename;)V from class org.apache.hadoop.fs.HadoopFsWrapper
	at org.apache.hadoop.fs.HadoopFsWrapper.rename(HadoopFsWrapper.java:51)
	at io.druid.storage.hdfs.HdfsDataSegmentPusher.copyFilesWithChecks(HdfsDataSegmentPusher.java:161)
	at io.druid.storage.hdfs.HdfsDataSegmentPusher.push(HdfsDataSegmentPusher.java:142)
	at io.druid.segment.realtime.plumber.RealtimePlumber$2.doRun(RealtimePlumber.java:430)
	at io.druid.common.guava.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:42)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)**

Can you please let us know what could be the reason for this issue?

When we replace the druid-hdfs-storage extension that ships with Druid 0.10.0 with druid-hdfs-storage version 0.9.0, everything works fine.

Can you also share what changed in druid-hdfs-storage between 0.9.0 and the latest release?

//Sithik

We would be grateful if someone could point us to the cause of this issue.

Apart from the issue above, we hit another problem when starting the Tranquility server with the “druid-stats” extension. Here is how we started it and the exception we got:

[tranquility-distribution-0.8.0]$ bin/tranquility server -configFile …/druid-0.10.0/conf-quickstart/tranquility/server.json -Ddruid.extensions.loadList='["druid-stats"]' -Ddruid.extensions.directory=…/druid-0.10.0/extensions

2017-05-22 06:23:23,329 [main] INFO io.druid.guice.JsonConfigurator - Loaded class[class io.druid.guice.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, directory=’…/druid-0.10.0/extensions’, hadoopDependenciesDir=‘hadoop-dependencies’, loadList=[druid-stats]}]

2017-05-22 06:23:23,355 [main] INFO i.d.initialization.Initialization - Loading extension [druid-stats] for class [io.druid.initialization.DruidModule]

2017-05-22 06:23:23,369 [main] INFO i.d.initialization.Initialization - added URL[file:/xxx/tranquility-distribution-0.8.0/…/druid-0.10.0/extensions/druid-stats/druid-stats-0.10.0.jar]

2017-05-22 06:23:23,370 [main] INFO i.d.initialization.Initialization - added URL[file:/xxx/tranquility-distribution-0.8.0/…/druid-0.10.0/extensions/druid-stats/druid-stats-0.9.2.jar]

2017-05-22 06:23:23,378 [main] INFO i.d.initialization.Initialization - Adding local file system extension module [io.druid.query.aggregation.stats.DruidStatsModule] for class [io.druid.initialization.DruidModule]

java.lang.NoClassDefFoundError: io/druid/java/util/common/IAE

at java.lang.Class.getDeclaredFields0(Native Method)

at java.lang.Class.privateGetDeclaredFields(Class.java:2583)

at java.lang.Class.getDeclaredFields(Class.java:1916)

at com.fasterxml.jackson.databind.introspect.AnnotatedClass._findFields(AnnotatedClass.java:689)

Looks like a class file is missing. Can you suggest how to solve this as well?

//Sithik

Can someone please help us with these issues?

//Sithik

Hey Sithik,

For Tranquility, don’t install both jars; stick to druid-stats-0.9.2.jar. Tranquility is still built against Druid 0.9.x (it will run fine against a Druid 0.10.x server though).
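The log earlier in the thread shows both druid-stats-0.10.0.jar and druid-stats-0.9.2.jar being added from the same extension directory. Here is a minimal sketch for spotting that kind of duplicate. The helper names and the filename pattern are assumptions for illustration and are not part of Druid or Tranquility; it simply assumes jars are named `<artifact>-<version>.jar` with a version that starts with a digit.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: jar files are named <artifact>-<version>.jar and the version
# starts with a digit (e.g. druid-stats-0.9.2.jar, commons-math3-3.6.1.jar).
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def split_jar_name(filename):
    """Return (artifact, version) for a jar file name, or None if it does not fit the pattern."""
    match = JAR_NAME.match(filename)
    if match is None:
        return None
    return match.group("artifact"), match.group("version")


def find_duplicate_artifacts(directory):
    """Map artifact name -> sorted versions, for artifacts present in more than one version."""
    versions = defaultdict(set)
    for jar in Path(directory).glob("*.jar"):
        parsed = split_jar_name(jar.name)
        if parsed is not None:
            artifact, version = parsed
            versions[artifact].add(version)
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}
```

Running `find_duplicate_artifacts` on the druid-stats extension directory should list druid-stats if both jars are still there. An empty result only means no artifact appears twice under this naming assumption.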

For the Hadoop thing, did you replace any of the Hadoop jars in Druid’s classpath? If so, that might explain this error.
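One way to check where the Hadoop classes are coming from is to scan the jars for the class named in the stack trace. This is a rough sketch; `find_class_in_jars` is a hypothetical helper, and which directories to pass depends on your install.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(class_name, directories):
    """Return paths of all jars under `directories` that contain the given class.

    `class_name` is a dotted Java class name; each directory is searched recursively.
    """
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for directory in directories:
        for jar in sorted(Path(directory).rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as archive:
                    if entry in archive.namelist():
                        hits.append(str(jar))
            except zipfile.BadZipFile:
                # Skip files that end in .jar but are not readable zip archives.
                continue
    return hits
```

For example, call it with `"org.apache.hadoop.fs.FileSystem"` and the directories that make up Druid's main classpath plus the druid-hdfs-storage extension directory. If the class turns up in more than one place, that would be consistent with `HadoopFsWrapper` and `FileSystem` being loaded by different classloaders, which is one way the JVM can raise an `IllegalAccessError` on a non-public method. The same helper can be pointed at `"io.druid.java.util.common.IAE"` to see which jars, if any, provide the class from the Tranquility error.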

Thank you Gian for the response.

Regarding the Tranquility issue: since I was getting errors with druid-stats-0.10.0, I had already switched to druid-stats-0.9.2, which works fine. Your reply confirms that we won’t run into any issues by using druid-stats-0.9.2.jar.

Regarding the Hadoop issue: no, I haven’t changed any of the Hadoop jars. As I said, if I replace **ONLY** druid-hdfs-storage-0.10.0 with druid-hdfs-storage-0.9.0, everything works fine.

Can you please share what could be causing this issue?

Thanks,

Sithik

I’m not sure exactly – are there any differences in jars in the extension directories between the 0.9.0 and 0.10.0 versions, other than the main extension jar itself?
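A comparison like this can be scripted instead of checked by eye. The sketch below is only an illustration: the function names are made up here, and it assumes each extension directory holds its jars directly (no subdirectories). It compares file names and also flags jars that share a name but differ in content.

```python
import hashlib
from pathlib import Path


def jar_digests(directory):
    """Map jar file name -> SHA-256 hex digest for every *.jar directly inside `directory`."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in Path(directory).glob("*.jar")
    }


def compare_jar_dirs(old_dir, new_dir):
    """Summarise how the jars in two directories differ."""
    old = jar_digests(old_dir)
    new = jar_digests(new_dir)
    shared = old.keys() & new.keys()
    return {
        "only_in_old": sorted(old.keys() - new.keys()),
        "only_in_new": sorted(new.keys() - old.keys()),
        "same_name_different_bytes": sorted(n for n in shared if old[n] != new[n]),
    }
```

Pass the 0.9.0 and 0.10.0 druid-hdfs-storage extension directories as `old_dir` and `new_dir`. Since most jar names include the version, a version bump shows up as one entry in `only_in_old` and one in `only_in_new`.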

Both versions have a total of 31 jars under the druid-hdfs-storage extension directory.
The only mismatch I could see is the version number of the “commons-math” jar:

0.9.0: commons-math3-3.1.1.jar

0.10.0: commons-math3-3.6.1.jar

Please advise further.

Thanks,

Sithik

Sorry, I forgot to ask this question. Will there be any issue if I run only the “druid-hdfs-storage” extension at version 0.9.0 while keeping everything else at 0.10.0?

Thanks,

Sithik

Hi Gian,

We have a configuration set up to create hourly segments with 15 partitions WITHOUT replication.

When we run only the “druid-hdfs-storage” extension at version 0.9.0 while keeping everything else at 0.10.0, ONLY ONE partition gets written to Hadoop, while all other partitions fail with a “lease mismatch” error. I would expect that only when replication is enabled, but we haven’t enabled replication. Hence I feel that using druid-hdfs-storage-0.9.0 is not the right approach.

Can you please help us with this?

Thanks,

Sithik


I think this is a bug.

On Thursday, May 25, 2017 at 2:34:48 PM UTC+8, sit…@gmail.com wrote: