Can't load batch data through Hadoop index task

Hi all,

I'm trying to load batch data from a file on S3, but the task fails with the following error while running:

2015-05-20T18:34:00,968 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Job job_1431709191676_6314 failed with state FAILED due to: Task failed task_1431709191676_6314_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

2015-05-20T18:34:01,139 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Counters: 12
	Job Counters
		Failed map tasks=4
		Launched map tasks=4
		Other local map tasks=3
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=24590
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=24590
		Total vcore-seconds taken by all map tasks=24590
		Total megabyte-seconds taken by all map tasks=37770240
	Map-Reduce Framework
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
2015-05-20T18:34:01,143 ERROR [task-runner-0] io.druid.indexer.DetermineHashedPartitionsJob - Job failed: job_1431709191676_6314
2015-05-20T18:34:01,143 INFO [task-runner-0] io.druid.indexer.JobHelper - Deleting path[/tmp/druid-indexing/wikipedia_hadoop/2015-05-20T183226.576Z]
2015-05-20T18:34:01,201 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_wikipedia_hadoop_2015-05-20T18:32:26.618Z, type=index_hadoop, dataSource=wikipedia_hadoop}]
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_65]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_65]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_65]
	at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_65]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:228) ~[druid-indexing-service-0.7.1.1.jar:0.7.1.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235) [druid-indexing-service-0.7.1.1.jar:0.7.1.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214) [druid-indexing-service-0.7.1.1.jar:0.7.1.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_65]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_65]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_65]
	at java.lang.Thread.run(Thread.java:745) [?:1.7.0_65]
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.DetermineHashedPartitionsJob] failed!
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:155) ~[druid-indexing-hadoop-0.7.1.1.jar:0.7.1.1]
	at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:84) ~[druid-indexing-hadoop-0.7.1.1.jar:0.7.1.1]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:334) ~[druid-indexing-service-0.7.1.1.jar:0.7.1.1]
	... 11 more
2015-05-20T18:34:01,218 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_wikipedia_hadoop_2015-05-20T18:32:26.618Z",
  "status" : "FAILED",
  "duration" : 69640
}
2015-05-20T18:34:01,221 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@a31ac08].
2015-05-20T18:34:01,221 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.BatchDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig@58d3f4be]
2015-05-20T18:34:01,221 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/ip-10-3-12-100:8100]
2015-05-20T18:34:01,248 INFO [ServerInventoryView-0] io.druid.client.BatchServerInventoryView - Server Disappeared[DruidServerMetadata{name='ip-10-3-12-100:8100', host='ip-10-3-12-100:8100', maxSize=0, tier='_default_tier', type='indexer-executor', priority='0'}]
2015-05-20T18:34:01,250 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.worker.executor.ExecutorLifecycle.stop()] on object[io.druid.indexing.worker.executor.ExecutorLifecycle@50dfdc70].
2015-05-20T18:34:01,262 INFO [main] org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@aa11928{HTTP/1.1}{0.0.0.0:8100}
2015-05-20T18:34:01,264 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@66fda52a{/,null,UNAVAILABLE}
2015-05-20T18:34:01,266 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.overlord.ThreadPoolTaskRunner.stop()] on object[io.druid.indexing.overlord.ThreadPoolTaskRunner@5412d89b].
2015-05-20T18:34:01,267 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.client.ServerInventoryView.stop() throws java.io.IOException] on object[io.druid.client.BatchServerInventoryView@6a08de46].
2015-05-20T18:34:01,267 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.announcement.Announcer.stop()] on object[io.druid.curator.announcement.Announcer@432cf6ba].
2015-05-20T18:34:01,268 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[io.druid.curator.discovery.ServerDiscoverySelector@75c5cfb].
2015-05-20T18:34:01,269 INFO [main] io.druid.curator.CuratorModule - Stopping Curator
2015-05-20T18:34:01,283 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x14d702834f80031 closed
2015-05-20T18:34:01,283 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.http.client.NettyHttpClient.stop()] on object[com.metamx.http.client.NettyHttpClient@6cafae75].
2015-05-20T18:34:01,284 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down
2015-05-20T18:34:01,315 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.metrics.MonitorScheduler.stop()] on object[com.metamx.metrics.MonitorScheduler@1c4e421f].
2015-05-20T18:34:01,315 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.service.ServiceEmitter@18fb5419].


Could somebody help me look into this?

Thanks

On Thursday, May 21, 2015 at 2:41:42 AM UTC+8, luo…@conew.com wrote:

Here is the log from the failed map-reduce task:

2015-05-20 18:33:35,145 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.VerifyError: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at com.fasterxml.jackson.datatype.guava.GuavaModule.setupModule(GuavaModule.java:22)
	at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:537)
	at io.druid.jackson.DefaultObjectMapper.<init>(DefaultObjectMapper.java:45)
	at io.druid.jackson.DefaultObjectMapper.<init>(DefaultObjectMapper.java:33)
	at io.druid.jackson.JacksonModule.jsonMapper(JacksonModule.java:44)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:104)
	at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
	at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
	at com.google.inject.Scopes$1$1.get(Scopes.java:65)
	at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
	at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:54)
	at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
	at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
	at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:84)
	at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
	at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1031)
	at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
	at com.google.inject.Scopes$1$1.get(Scopes.java:65)
	at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:40)
	at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
	at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
	at com.google.inject.internal.SingleMethodInjector.inject(SingleMethodInjector.java:83)
	at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:110)
	at com.google.inject.internal.MembersInjectorImpl$1.call(MembersInjectorImpl.java:75)
	at com.google.inject.internal.MembersInjectorImpl$1.call(MembersInjectorImpl.java:73)
	at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
	at com.google.inject.internal.MembersInjectorImpl.injectAndNotify(MembersInjectorImpl.java:73)
	at com.google.inject.internal.Initializer$InjectableReference.get(Initializer.java:147)
	at com.google.inject.internal.Initializer.injectAll(Initializer.java:92)
	at com.google.inject.internal.InternalInjectorCreator.injectDynamically(InternalInjectorCreator.java:173)
	at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:109)
	at com.google.inject.Guice.createInjector(Guice.java:95)
	at com.google.inject.Guice.createInjector(Guice.java:72)
	at io.druid.guice.GuiceInjectors.makeStartupInjector(GuiceInjectors.java:57)
	at io.druid.indexer.HadoopDruidIndexerConfig.<clinit>(HadoopDruidIndexerConfig.java:95)
	at io.druid.indexer.HadoopDruidIndexerMapper.setup(HadoopDruidIndexerMapper.java:56)
	at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.setup(DetermineHashedPartitionsJob.java:220)
	at io.druid.indexer.DetermineHashedPartitionsJob$DetermineCardinalityMapper.run(DetermineHashedPartitionsJob.java:277)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:773)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)

Druid expects v2.4.4 of the Jackson libraries (in this case jackson-databind). It seems Hadoop is bringing in some older version; can you check? What version/distribution of Hadoop are you using?

– Himanshu
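
One way to check which Jackson jars Hadoop is putting on the classpath is to split the output of `hadoop classpath` and filter for Jackson. This is a sketch: the `hadoop classpath` command is standard, but the sample jar paths and version numbers below are made-up examples, so it is demonstrated here on a hard-coded classpath string.

```shell
# On a real cluster you would run:
#   hadoop classpath | tr ':' '\n' | grep -i jackson
# Demonstrated on a hypothetical sample classpath:
SAMPLE_CLASSPATH="/usr/lib/hadoop/lib/jackson-databind-2.2.3.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/jackson-core-2.2.3.jar"
echo "$SAMPLE_CLASSPATH" | tr ':' '\n' | grep -i jackson
```

If the versions printed are older than what Druid bundles, the map tasks load the old classes first, which can produce exactly the `VerifyError` above.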

I changed the Jackson version to 2.3.2, recompiled the Druid package, and it works now.
Thanks very much!
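
For anyone following the same route, the change amounts to pinning the Jackson version in the Druid source tree before rebuilding. The fragment below is a hypothetical sketch: the `jackson.version` property name is an assumption, so check the actual `pom.xml` in your Druid checkout for how the Jackson dependency versions are declared.

```xml
<!-- Hypothetical pom.xml fragment: pin Jackson to match the version
     Hadoop ships, then rebuild Druid. The property name is an
     assumption; verify it against your Druid source tree. -->
<properties>
  <jackson.version>2.3.2</jackson.version>
</properties>
```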

On Thursday, May 21, 2015 at 8:45:32 AM UTC+8, Himanshu Gupta wrote: