Druid using local storage to store index files instead of HDFS

config/_common/common.runtime.properties:

```
druid.extensions.coordinates=["io.druid.extensions:druid-examples","io.druid.extensions:druid-kafka-eight"]
druid.extensions.localRepository=extensions-repo

# Zookeeper

druid.zk.service.host=localhost

# Metadata Storage (use something like mysql in production by uncommenting properties below)
# by default druid will use derby

druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://localhost:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd

# Deep storage (local filesystem for examples - don't use this in production)

druid.storage.type=hdfs
druid.storage.storageDirectory=/druidStorage

# Query Cache (we use a simple 10mb heap-based local cache on the broker)

druid.cache.type=local
druid.cache.sizeInBytes=10000000

# Indexing service discovery

druid.selectors.indexing.serviceName=overlord

# Coordinator Service Discovery

druid.selectors.coordinator.serviceName=coordinator

# Monitoring (disabled for examples, if you enable SysMonitor, make sure to include sigar jar in your cp)

druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"]

# Metrics logging (disabled for examples - change this to logging or http in production)

druid.emitter=noop

```

The .json spec for the indexing task:

```
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "raw_events",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "eventdate",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              "devicestartdate",
              "device",
              "ipaddress",
              "lattitude",
              "longitude"
            ],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec": [
        {
          "type": "count",
          "name": "count"
        },
        {
          "type": "cardinality",
          "name": "uniques",
          "fieldNames": ["device"]
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "MINUTE",
        "intervals": ["2016-03-31/2016-04-06"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "paths": "s3://bucket/part-00000.gz"
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "targetPartitionSize": 5000000
      },
      "jobProperties": {
        "fs.s3.awsAccessKeyId": "AAA",
        "fs.s3.awsSecretAccessKey": "BBB",
        "fs.s3.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "fs.s3n.awsAccessKeyId": "AAA",
        "fs.s3n.awsSecretAccessKey": "BBB",
        "fs.s3n.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "io.compression.codecs": "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
      }
    }
  }
}

```

Here is how I am running the indexer (overlord):

```
java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath config/_common:config/overlord:lib/*:/home/hadoop/conf/ -Dhadoop.mapreduce.user.classpath.first=true io.druid.cli.Main server overlord

```

Hey guys,

I have been trying to run Druid on a single-node AWS EMR cluster. The indexing task runs fine: it executes its map-reduce job on YARN and writes temp files under hdfs://tmp, but it then tries to store the index files under file://druidStorage instead of hdfs://druidStorage. These are the configs I am using (pasted above):

1. config/_common/common.runtime.properties
2. the .json spec for the indexing task
3. the command I use to start the overlord/indexer

Any clues what might be going on?
Thanks in advance.
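
In a setup like this, one quick way to see where the segments actually landed is a check along these lines; the paths below are guesses based on the configs above, not something taken from the original post:

```
# Rough check of where the indexing task actually wrote segments.
hadoop fs -ls -R /druidStorage        # expected: segments pushed to HDFS deep storage
ls -lR /druidStorage 2>/dev/null      # symptom: segments written to the local filesystem instead
```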


Sorry about posting this in the wrong channel. I'll move it to the Druid User group.

Hey Rushil,

Could you double-check that you have your Hadoop config XMLs on Druid's classpath, and that they have "fs.defaultFS" set to somewhere on HDFS? (This should be in core-site.xml.)
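
A minimal check along these lines should show an hdfs:// value; the /home/hadoop/conf directory below is an assumption taken from the classpath used elsewhere in this thread:

```
# Confirm the Hadoop conf dir that is on Druid's classpath actually defines
# fs.defaultFS (or the older fs.default.name).
grep -E -A1 'fs\.defaultFS|fs\.default\.name' /home/hadoop/conf/core-site.xml
```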


Hi Gian,
Thanks for the quick reply.
Here's the classpath that I am using to start up the overlord and historical:

```
-classpath config/_common:config/historical:lib/*:/home/hadoop/conf/
```

I had fs.default.name set in core-site.xml. I have now also added the following property, as per your suggestion:

```
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://172.31.8.38:9000</value>
</property>

```

I restarted all the services and resubmitted the job, but it is still trying to store data under file:/druidStorage.

Also, if I specify

```
druid.storage.storageDirectory=hdfs://172.31.8.38:9000/druidStorage
```

in common.runtime.properties instead of simply "/druidStorage",
the file gets written to:

file:/home/hadoop/druid-0.8.3/hdfs:/172.31.8.38:9000/druidStorage/raw_site_events/raw_site_events/2016-03-31T00:00:00.000Z_2016-04-01T00:00:00.000Z/2016-04-13T05:16:20.894Z/0/index.zip.0
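
That file:/home/hadoop/druid-0.8.3/hdfs:/... path looks like the hdfs:// URI being treated as a plain relative local path, which would fit the HDFS deep-storage handler not being loaded at all. One rough way to check (the extensions-repo directory is an assumption taken from druid.extensions.localRepository in the config above):

```
# If nothing matches here, the HDFS storage extension was never pulled into
# the local extension repository, so "hdfs" deep storage cannot take effect.
find extensions-repo -iname '*hdfs*'
```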

Hi Rushil,

You need to include the HDFS storage extension. You can try Druid 0.9.0, which bundles the HDFS extension by default.
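
For reference, a minimal sketch of what that could look like in common.runtime.properties. The coordinate below assumes the same io.druid.extensions group as the other extensions in this thread, and the 0.9.0 line assumes the newer loadList-style extension config:

```
# Druid 0.8.x: pull the HDFS deep-storage extension in alongside the others.
druid.extensions.coordinates=["io.druid.extensions:druid-examples","io.druid.extensions:druid-kafka-eight","io.druid.extensions:druid-hdfs-storage"]

# Deep storage settings stay the same.
druid.storage.type=hdfs
druid.storage.storageDirectory=/druidStorage

# Druid 0.9.0+: the extension is bundled, so load it by name instead.
# druid.extensions.loadList=["druid-hdfs-storage"]
```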