Druid Historical Nodes and Deep Storage

So, I presented Druid to my team and they had a lot of questions, some of which I couldn’t find the answer to.

Let’s imagine that we have a production cluster: several historical nodes, a coordinator node, a real-time node, and a broker node.

I understand the basic flow: the real-time node ingests data, creates segments, pushes them to deep storage (HDFS), and announces to the other nodes that the segments are available.

First question: does the coordinator ensure that all segments in HDFS are loaded onto the historical nodes?

If so, what happens when you have more data in deep storage than you have space on the historical nodes? For instance, say we have 1 TB of data in HDFS and five historical nodes, each of which can hold 100 GB. How does Druid handle this: does it load as much as it can onto the historical nodes?

My other question regards querying. I know the broker receives the query, but how does it know which historical nodes to send it to? Does it ask the coordinator for this information, or does it simply send the query blindly to all historical nodes?

Does a historical node ever load data from HDFS to service a query, or does it only load data when the coordinator tells it to (when a new segment is published to HDFS)?

Lastly, looking at Druid from a high level, I still don’t understand exactly why deep storage is needed. If the historical nodes can be configured to replicate data (a capability that already exists), why can’t they simply be used to hold all of the data?

Hey Austin, hope this helps:

  1. Yes, the coordinator does ensure that segments from HDFS are loaded on historicals. It decides which historical should load which segment and sends them commands to make that happen.

  2. If you have more data in deep storage than you can fit on your historical disks, then Druid will load as much as it can and then start logging warnings on the coordinator that there is not enough capacity to load everything.

  3. Historical nodes advertise which segments they’re serving in ZooKeeper, and the broker maintains a local cache of the segment -> historical mapping.

  4. Historicals only pull data from deep storage when instructed to do so by the coordinator. They won’t pull it dynamically in response to a query.

  5. Good question :). The simple answer is that Druid was designed to run in an environment where a persistent blob storage service exists, like HDFS, S3, or the analogous Google and Azure offerings. In that world it’s natural to leverage the blob storage service rather than trying to invent one’s own distributed file system. Using an external blob store also makes Druid much easier to run elastically: you can safely spin up or terminate historical nodes at any time without worrying about permanent data loss.
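To make point 3 a bit more concrete, here is a toy sketch (illustrative only; the names, segment ids, and structure are made up, and the real broker builds its view from ZooKeeper announcements rather than a plain dict) of the segment -> historical routing table a broker keeps in memory:

```python
# Toy model of the broker's in-memory routing table (illustrative only;
# in real Druid, historicals announce segments via ZooKeeper and the
# broker caches the resulting segment -> server mapping).

from collections import defaultdict

# segment id -> set of historicals currently serving it
routing = defaultdict(set)

# Historicals announce the segments they serve.
routing["wikipedia_2015-09-12"].update({"historical-1", "historical-3"})
routing["wikipedia_2015-09-13"].add("historical-2")

def route(segment_id):
    """Pick a historical that serves this segment. Note the broker never
    asks the coordinator at query time; it only consults its own cache."""
    servers = routing.get(segment_id)
    return sorted(servers)[0] if servers else None

print(route("wikipedia_2015-09-12"))  # historical-1
print(route("wikipedia_2015-09-14"))  # None: nothing serves it yet
```

The key point the sketch captures is that routing is a local lookup on the broker, populated by the historicals’ own announcements, with no coordinator involvement per query.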


To add on to why deep storage was first created: the first Druid deployments ran in AWS, where failures could occur that wiped out half your cluster. Without data backed up elsewhere, you would lose data any time there was an AWS outage.

One follow-up question on this: when we say it can scale to petabytes of data, do we need that much space on the historical nodes? In other words, if I have 1 petabyte of data to analyze, do I need at least 1 petabyte (without replication) on HDFS (deep storage) and 1 petabyte of total disk space across the historical nodes?

Amey, reading http://static.druid.io/docs/druid.pdf might help

Segments are served from the historicals, and raw data/segments are backed up to HDFS.

Interesting thread. We’ve been having exactly the same questions.

Even with what has been explained here, there are two things I still don’t get:

1.) While trying to find out whether segments can be loaded from deep storage at query time (which, according to the information in this thread, is not possible), I found one statement in the Druid documentation that I interpreted as proof that such on-demand loading of segments is possible:


This page describes the property druid.segmentCache.locations:

Segments assigned to a Historical node are first stored on the local file system (in a disk cache) and then served by the Historical node. These locations define where that local cache resides.

Default value: none (no caching)

So, if it is possible, and even the default, to disable caching of segments on local disk, and if segments cannot be loaded from deep storage on demand at query time, then how would a historical be able to serve queries when local disk caching is disabled?


I have the same question, more specifically:

One follow-up question on this: when we say it can scale to petabytes of data, do we need that much space on the historical nodes? In other words, if I have 1 petabyte of data to analyze, do I need at least 1 petabyte (without replication) on HDFS (deep storage) and 1 petabyte of total disk space across the historical nodes?

When using Tranquility for real-time indexing, we ran into an issue: index tasks were stuck in the “Running” state forever. After some debugging, it turned out the coordinator logs were showing: Not enough [_default_tier] servers or node capacity to assign segment.

On the historicals we had only allotted 30 GB for the segment cache maxSize:


druid.segmentCache.locations=[{"path": "/var/druid/cache/historical", "maxSize": 30000000000}]

But HDFS is much larger than this (around 1 TB), and it seems the index tasks are not completing the handoff because the historicals don’t have capacity. So all data in HDFS should also fit on the historicals, right?
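If the historical tier really is the bottleneck, the usual fix is to grow the segment cache (or add nodes). A sketch of the relevant properties, with illustrative sizes and paths only (the 300 GB figure is an example, not a recommendation):

```properties
# Illustrative only: give each historical 300 GB of segment cache instead of 30 GB.
druid.segmentCache.locations=[{"path": "/var/druid/cache/historical", "maxSize": 300000000000}]
# druid.server.maxSize (the capacity the historical advertises to the
# coordinator) should be kept in sync with the total cache size.
druid.server.maxSize=300000000000
```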

The sizing relationship could be something like:

(HistoricalNodes * HistoricalCapacity) / DruidReplication = HDFSCapacity / HDFSReplication

where both sides reduce to the size of the raw, unreplicated segment data.
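A quick way to sanity-check that sizing relationship is to work through it with toy numbers (all figures below are made-up examples, not values from this thread):

```python
# Toy capacity-planning check: the unique (unreplicated) segment data
# must fit on the historical tier *after* Druid-level replication is
# applied, and on HDFS after HDFS block replication is applied.
# All numbers are illustrative.

raw_segment_gb = 1000          # 1 TB of unique segment data
druid_replication = 2          # each segment loaded on 2 historicals
hdfs_replication = 3           # HDFS block replication factor

historical_nodes = 5
historical_capacity_gb = 500   # per-node segment cache (maxSize)

needed_on_historicals = raw_segment_gb * druid_replication            # 2000 GB
available_on_historicals = historical_nodes * historical_capacity_gb  # 2500 GB
needed_on_hdfs = raw_segment_gb * hdfs_replication                    # 3000 GB

# The cluster can hold everything only if this is True:
print(needed_on_historicals <= available_on_historicals)  # True
```

The point of the exercise: historical disk and HDFS disk are sized independently, each from the raw data size times its own replication factor.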


We also had a similar question when thinking about capacity planning :slight_smile:

Druid is a fantastic product, no complaints there, and this could be an inherent design decision: the local disk of the historical nodes must equal the HDFS/S3 space (including replication).

But we wanted to understand the implications of:

(historical nodes’ disk space < deep storage requirement)



My 2 cents here.

Most of these analytical tools (ELK, say) are not recommended for use as a single source of truth; i.e., you are free to use them for your analytics/OLAP needs, but not to trust them entirely for the safe long-term preservation of data the way you would trust a full-fledged database.

Hence the philosophy of a dedicated, fully redundant backup.

I guess Druid follows the same philosophy.


I had the same question, so I read all the comments and links.

Finally, my understanding is:
Deep storage is not for data scalability. It is just for data backup and recovery.

All segments need to be on the historicals before they can be queried.

So, if you have 1 TB of segments to be queried, your historicals need more than 1 TB of disk in total.