Storing segments over HDFS

I am trying to ingest data into Druid, using HDFS as deep storage. I have 5 nodes in the cluster, and my HDFS storage directory is “/data” on every node (so there are directories like ‘/data/dfs’, ‘/data/dfs/nn’, ‘/data/dfs/dn’ on different nodes).
So,

  1. What should be the value of the property druid.storage.storageDirectory ?

  2. Does Druid automatically understand that HDFS is spread over 5 nodes, including the namenodes and datanodes?

  3. How does Druid interact with HDFS?

  4. Where is the data stored by the Realtime and Historical nodes?

Hi Saksham,

This may be helpful to you: https://groups.google.com/forum/#!topic/druid-user/ZlONEHlJs6g

You’ll need to do 3 things to use HDFS for deep storage.

  1. Include the HDFS extension in your list of extensions.

  2. Set the proper configs for HDFS.

  3. Include the relevant Hadoop configuration files in the classpath of the nodes you are using.
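As a sketch, those three steps usually translate into properties like the following (the extension coordinates, hostnames, and paths below are placeholders; check them against the Druid version you are running):

```properties
# 1. Load the HDFS deep-storage extension (coordinates vary by Druid version)
druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:<version>"]

# 2. Point deep storage at HDFS
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://<namenode-host>:<port>/druid/segments

# 3. In addition, put the Hadoop config files (core-site.xml, hdfs-site.xml)
#    on the classpath of each node, e.g. by adding the Hadoop conf directory.
```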

On Tuesday, September 1, 2015 at 3:35:30 PM UTC+8, Saksham Garg wrote:

Please find answers inline:

I am trying to ingest data into Druid, using HDFS as deep storage. I have 5 nodes in the cluster, and my HDFS storage directory is “/data” on every node (so there are directories like ‘/data/dfs’, ‘/data/dfs/nn’, ‘/data/dfs/dn’ on different nodes).
So,

  1. What should be the value of the property druid.storage.storageDirectory ?

Its value should be a fully qualified HDFS path, i.e. hdfs://<host>:<port>/<path>.
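For instance (hostname, port, and path here are purely illustrative, not values from your cluster):

```properties
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://namenode.example.com:8020/druid/segments
```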

  2. Does Druid automatically understand that HDFS is spread over 5 nodes, including the namenodes and datanodes?

Druid interacts with HDFS via the HDFS client; hence we need to add all the Hadoop dependencies to the classpath of Druid. The client reads the namenode and datanode layout from the Hadoop configuration files, so Druid does not need to know about the individual nodes itself.
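As one way to do that (the paths and the main class shown are typical for Druid of this era, but adjust them to your installation), the Hadoop conf directory can simply be appended to the classpath when starting a node:

```shell
# Start a historical node with the Hadoop conf directory on the classpath
java -cp "config/_common:config/historical:lib/*:/etc/hadoop/conf" \
  io.druid.cli.Main server historical
```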

  3. How does Druid interact with HDFS?

  4. Where is the data stored by the Realtime and Historical nodes?

Realtime data is kept in memory for the duration of the window period; afterwards it is handed over to the Historical node and persisted to deep storage, in this case at the specified HDFS location.
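For illustration, the window period is set in the realtime ingestion spec's tuning config (the values below are placeholders, not recommendations):

```json
{
  "tuningConfig": {
    "type": "realtime",
    "windowPeriod": "PT10M",
    "intermediatePersistPeriod": "PT10M",
    "maxRowsInMemory": 500000
  }
}
```

Once the window closes, the segment is handed off to deep storage and served from the Historical node.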