Rackspace cloudfiles extension for deep storage not working

Hello, I am trying to run an ingestion task with deep storage configured as Cloudfiles, and the task is failing with a segmentOutputPath null pointer exception.

The ingestion task I am running is the default wikiticker-index.json example; I have not made any changes to it.

Any ideas why it's failing? As per my understanding, the value of segmentOutputPath should be calculated internally depending on the deep storage configuration type.

I am using the Implydata 1.3.0 package and running Druid in local mode, with deep storage set to Cloudfiles.

Cloudfiles is a community extension and not supported by Imply or the Druid committers. You'll have the most luck finding the original author and getting support there. You can post the full stack trace of your error and we might be able to help.

Hi, attached is the failed task log.

failed_task_log.txt (160 KB)

Remove segmentOutputPath from your indexing spec.

Hi Fang,

I am not specifying segmentOutputPath in the indexing spec. Attached is the ingestion task spec for your reference.

Is there anything that needs to be specified in the jobProperties?

wikiticker-index.json (2.02 KB)

Hey Manish,

Seems like your deep storage is not configured.

You don't set any segmentOutputPath in the indexing spec, and in the logs, when the task is printed, segmentOutputPath is null where it should have been replaced by your deep storage path.

Try to reconfigure it, and just read the task logs; segmentOutputPath can't be null.
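
For reference, when the Cloudfiles extension is picked up as deep storage, the relevant part of common.runtime.properties usually looks something like the below (property names as in the extension docs, values are only placeholders; double-check them against your version):

druid.extensions.loadList=["druid-cloudfiles-extensions"]
druid.storage.type=cloudfiles
druid.storage.region=<your-region>
druid.storage.container=<your-container>
druid.storage.basePath=druid/segments
druid.cloudfiles.userName=<rackspace-username>
druid.cloudfiles.apiKey=<rackspace-api-key>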

Hope it helps, let me know.

Ben

Hi Benjamin,

There is nothing in the log; attached is the task log for your reference.

The log says cloudfiles deepStorage is configured; see line 191 in the log.

log.txt (159 KB)

See this in the log: “2016-07-28T12:37:53,631 INFO [main] io.druid.indexing.worker.executor.ExecutorLifecycle - Running with task:”
The task submitted is the following one.

In this task definition, as it was submitted, your segmentOutputPath is set to null:

"ioConfig" : {
    "type" : "hadoop",
    "inputSpec" : {
        "type" : "static",
        "paths" : "quickstart/wikiticker-2016-06-27-sampled.json"
    },
    "metadataUpdateSpec" : null,
    "segmentOutputPath" : null
},

When the deep storage is defined correctly, your segmentOutputPath is not null.

I understand that, Benjamin, but the logs don't give any clue about what might be going wrong in the deep storage definition.

You should investigate in that direction; maybe check the overlord logs when it starts?

No abnormalities in the overlord and coordinator logs. Attached for your reference.

overlord.log (69.4 KB)

coordinator.log (3.52 MB)

Hi Manish, this is a bug with the Cloudfiles extension and it actually doesn’t work with Hadoop indexing.

The problem is here if you want to fix it:

That needs to return an actual valid directory
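
To illustrate the kind of change involved, here is only a sketch, assuming the method in question is getPathForHadoop() on CloudFilesDataSegmentPusher and that segments are meant to be written through a Hadoop filesystem connector such as the hadoop-openstack swift:// scheme (both of these are my assumptions, not the actual patch):

// Sketch only. The Hadoop indexer copies this value into ioConfig.segmentOutputPath,
// so it has to be a real, Hadoop-resolvable directory rather than null.
@Override
public String getPathForHadoop()
{
  // Hypothetical URI layout; the exact scheme and host form depend on which
  // Hadoop filesystem connector you configure for the container.
  // "config" is assumed to be the pusher's CloudFilesDataSegmentPusherConfig field.
  return String.format("swift://%s/%s", config.getContainer(), config.getBasePath());
}

@Deprecated
@Override
public String getPathForHadoop(String dataSource)
{
  return getPathForHadoop();
}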

I should also add that Cloudfiles is not an officially supported module by the Druid committers.

Hi Fang, thanks for confirming the bug. I will see if I can open a pull request for it.

I am aware of that; it's not an officially supported module. Can it become an officially supported module?

Hi,

I just noticed the same thing in AzureDataSegmentPusher.

Am I missing anything there?

Hi,

I am trying out the Cloudfiles fix and am getting the exception below; any ideas on this?

Attached is the job spec. Also note that I can see the files getting created in the Cloudfiles container (index.zip.0).

2016-08-02T11:12:17,657 INFO [communication thread] org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
2016-08-02T11:14:16,498 ERROR [pool-23-thread-1] io.druid.indexer.JobHelper - Exception in retry loop
java.lang.NullPointerException
	at org.apache.hadoop.fs.swift.snative.SwiftNativeOutputStream.flush(SwiftNativeOutputStream.java:102) ~[hadoop-openstack-2.3.0.jar:?]
	at java.io.FilterOutputStream.flush(FilterOutputStream.java:140) ~[?:1.8.0_73]
	at java.io.DataOutputStream.flush(DataOutputStream.java:123) ~[?:1.8.0_73]
	at io.druid.indexer.JobHelper$4.push(JobHelper.java:375) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_73]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_73]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_73]
	at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_73]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186) [hadoop-common-2.3.0.jar:?]
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) [hadoop-common-2.3.0.jar:?]
	at com.sun.proxy.$Proxy229.push(Unknown Source) [?:?]
	at io.druid.indexer.JobHelper.serializeOutIndex(JobHelper.java:386) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:703) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469) [druid-indexing-hadoop-0.9.1.1.jar:0.9.1.1]
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) [hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) [hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) [hadoop-mapreduce-client-core-2.3.0.jar:?]
	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) [hadoop-mapreduce-client-common-2.3.0.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_73]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_73]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_73]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_73]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_73]
2016-08-02T11:14:29,917 INFO [pool-23-thread-1] io.druid.indexer.JobHelper - Creating new ZipEntry[00000.smoosh]
2016-08-02T11:14:30,422 INFO [pool-23-thread-1] io.druid.indexer.JobHelper - Creating new ZipEntry[meta.smoosh]
2016-08-02T11:14:30,431 INFO [pool-23-thread-1] io.druid.indexer.JobHelper - Creating new ZipEntry[version.bin]
2016-08-02T11:14:35,769 INFO [communication thread] org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce

job.txt (3.27 KB)

Hi Manish, look at the stack trace. What is the var that is null? I’m not sure which version of Druid you are on, but maybe you can include the Druid line it is complaining about in the stack trace.