Local deep storage on multiple nodes

Hi all,

I’m about to test Druid in a two-node cluster. This is how the services could be distributed:

  1. Historical, MiddleManager, Tranquility, Kafka
  2. ZK, Overlord, Coordinator, Broker

The question is: if I choose druid.storage.type=local, will the segments be stored only on the first node (where the Historical service runs) or on both? What if I had two Historical nodes?

Thanks,

Hi Fede, you can’t use ‘local’ deep storage for a distributed cluster. To run a distributed cluster, the following docs might help:
http://druid.io/docs/0.9.1.1/tutorials/cluster.html
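
For context, deep storage is configured in common.runtime.properties on every node, and ‘local’ means each node writes segments to its own filesystem, so the other nodes can’t load them. A rough sketch of the two options (the paths and extension list below are illustrative placeholders, not exact values for your setup):

# Single machine only: segments stay on that machine's local disk.
druid.storage.type=local
druid.storage.storageDirectory=var/druid/segments

# Distributed cluster: point at a shared store instead, e.g. HDFS:
# druid.extensions.loadList=["druid-hdfs-storage"]
# druid.storage.type=hdfs
# druid.storage.storageDirectory=hdfs://your-namenode:9000/druid/segments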

Alternatively, if you were looking to package Druid in an easy way for clustering, Imply has already done this and you might find this helpful:

http://imply.io/docs/latest/cluster

The Druid clustering tutorial is based on the Imply docs so you may find similarities :)

Thank you FY!

We’ll try GCS as deep storage.

Now I’d like to ask: what do you think of the distribution of the services across the nodes? Does it look right? If I run Historical on both nodes, will performance improve? Does that make sense?

Thanks,

On that note, I can’t get Google Cloud Storage working as deep storage.

I set

druid.storage.type=hdfs

druid.storage.storageDirectory=gs://druid-deep-storage/test

I also put the gs-connector.jar inside /extensions/druid-hdfs-extension and /lib.

Do I need to configure anything else? It’s not a Hadoop cluster; we just set up GCS.

Thanks in advance.

Hi Fede,

What kind of error do you see when trying to use GCS?

I’ve never tried using GCS with Druid, but you may need to set some extra GCS-specific configuration in the jobProperties section under tuningConfig in your indexing task. For example, here’s how it looks for S3:

"tuningConfig" : {
  "type" : "hadoop",
  "partitionsSpec" : {
    "type" : "hashed",
    "targetPartitionSize" : 5000000
  },
  "jobProperties" : {
    "fs.s3.awsAccessKeyId" : "access",
    "fs.s3.awsSecretAccessKey" : "secret",
    "fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "fs.s3n.awsAccessKeyId" : "access",
    "fs.s3n.awsSecretAccessKey" : "secret",
    "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
    "io.compression.codecs" : "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
  }
}


Looks like GCS may have similar required config:
https://cloud.google.com/hadoop/google-cloud-storage-connector#configuringhadoop
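
If so, the GCS analogue might look roughly like this in jobProperties (untested on my end; the class names come from that connector doc, and "your-gcp-project-id" is a placeholder):

"jobProperties" : {
  "fs.gs.impl" : "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
  "fs.AbstractFileSystem.gs.impl" : "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
  "fs.gs.project.id" : "your-gcp-project-id"
}

The connector jar would also need to be on the classpath of the Hadoop job, which it sounds like you’ve already done.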


- Jon