Problem with Ingesting data from S3: java.lang.NullPointerException

Hello!

I am trying to ingest Parquet files (roughly 200 MB) from an S3 bucket, and the ingestion fails with a java.lang.NullPointerException.

How do you have your deep storage defined?

I originally had the following in my environment file:

druid_storage_type=S3
druid_storage_storageDirectory=/opt/data/segments
druid_indexer_logs_type=file
druid_indexer_logs_directory=/opt/data/indexing-logs

I was going off the understanding that, since I was pulling data from Amazon S3, I needed to change druid_storage_type to S3.

I’ve just changed it back to:

druid_storage_type=local
druid_storage_storageDirectory=/opt/data/segments
druid_indexer_logs_type=file
druid_indexer_logs_directory=/opt/data/indexing-logs

And now the data is being ingested into Druid properly.

Thanks for pointing me in the right direction with your comment!

It’s
druid_storage_type=s3

not

druid_storage_type=S3

Got me too.
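
If you actually want S3 as the deep storage (rather than reverting to local), the environment file would look roughly like this. Treat it as a sketch: the bucket name and key prefixes are placeholders, you also need druid-s3-extensions in the extensions load list, and credentials (druid_s3_accessKey / druid_s3_secretKey, or an instance profile) are not shown here:

druid_storage_type=s3
druid_storage_bucket=your-druid-bucket
druid_storage_baseKey=druid/segments
druid_indexer_logs_type=s3
druid_indexer_logs_s3Bucket=your-druid-bucket
druid_indexer_logs_s3Prefix=druid/indexing-logs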

Marc

Hi,

Do you know if Druid supports a highly available (HA) HDFS setup as cold storage? I’m getting a java.net.UnknownHostException: myCluster error when I use the HA nameservice name in the HDFS URL:

druid_storage_storageDirectory=hdfs://myCluster/druid-storage

It works fine when I use the URL of one of the active namenodes. I have the hdfs-site.xml and core-site.xml files on the Druid classpath.

The important lines from my hdfs-site.xml file are:

<property>
  <name>dfs.nameservices</name>
  <value>myCluster</value>
</property>

<property>
  <name>dfs.ha.namenodes.myCluster</name>
  <value>nn1,nn2</value>
</property>

<property>
  <name>dfs.namenode.rpc-address.myCluster.nn1</name>
  <value>name-node-1:8020</value>
</property>

<property>
  <name>dfs.namenode.rpc-address.myCluster.nn2</name>
  <value>name-node-2:8020</value>
</property>

BR
Murat

For deep storage, yes, it does.
Let me dig up a few things that might be helpful.

This is from a previous environment in which I used HDFS running on a Hadoop cluster for deep storage.

I captured the following files from the Hadoop cluster and copied them to the _common directory of every Druid node:

core-site.xml
mapred-site.xml
hdfs-site.xml
yarn-site.xml

I also had this in my common.runtime.properties:

druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
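
I didn’t run that environment against an HA nameservice, so treat this part as an unverified sketch rather than a confirmed fix. With the cluster’s core-site.xml and hdfs-site.xml on the Druid classpath (and druid-hdfs-storage in the extensions load list), the storageDirectory should be able to point at the logical nameservice:

druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://myCluster/druid-storage

For that to resolve, the client-side hdfs-site.xml normally also needs a failover proxy provider defined for the nameservice, along these lines:

<property>
  <name>dfs.client.failover.proxy.provider.myCluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

An UnknownHostException for myCluster usually means the HDFS client fell back to treating the nameservice as a hostname, which points at either that property missing or the XML files not actually being picked up on the classpath.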