Druid cache in historical node

If we specify a cache location, the historical node always pulls data from there.

An issue I found is that if the cached file is gone, the historical node won’t go to deep storage for the content; it acts as if the segment is empty. This can have unwanted side effects if the cache disk is corrupted: we can’t be sure the served data represents the complete data universe. Ideally the historical node, when loading data from cache, should compare the data size in the cache location against deep storage, or a tool is needed to alter the ZooKeeper entry to signal that the cache needs to be refreshed.
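The size comparison suggested above could be sketched roughly as follows; the function name and paths are hypothetical, not part of Druid’s actual API:

```python
import os

def segment_sizes_match(cache_path: str, deep_storage_path: str) -> bool:
    """Hypothetical sanity check: trust the cached segment file only if it
    exists and matches the size of the copy in deep storage."""
    if not os.path.exists(cache_path):
        return False  # cache file is gone; segment should be re-pulled
    return os.path.getsize(cache_path) == os.path.getsize(deep_storage_path)
```

A historical could run a check like this at load time and fall back to deep storage on a mismatch, rather than serving the segment as empty.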

If I’m reading this correctly, you have some sort of filesystem problem during runtime (for example, a mount point gets swapped out), and hence the local cache does not match up with the cache surveyed at startup time. Does that sound like what you’re describing?

When you restart the historical node, do the segments get properly downloaded?

When a historical node does NOT find any cached segments upon reboot, it will wait until the coordinator tells it to load something before it will return any meaningful results. Once the coordinator tells the blank historical node to load segments, it will load them and announce their readiness (and return results for those segments).

When you are querying and get empty results, are you querying the historical node directly, or are you going through a broker?
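One way to pin this down is to send the identical native query to both nodes and diff the results. A minimal sketch, assuming a hypothetical datasource and host names (8082 and 8083 are Druid’s default broker and historical ports):

```python
import json

# A minimal native timeseries query; the datasource name is hypothetical.
query = {
    "queryType": "timeseries",
    "dataSource": "my_datasource",
    "granularity": "all",
    "intervals": ["2015-01-01/2015-02-01"],
    "aggregations": [{"type": "count", "name": "rows"}],
}

body = json.dumps(query)

# Default ports: 8082 for the broker, 8083 for the historical.
broker_url = "http://broker-host:8082/druid/v2/"
historical_url = "http://historical-host:8083/druid/v2/"

# POST `body` to each endpoint (e.g. with urllib.request) and compare the
# row counts; a difference points at a segment the historical is not serving.
```

If the broker’s count is higher, the broker is merging results from replicas that the suspect historical no longer serves.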

Also, do you have replication of your segments in case one node goes down?
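For reference, replication is configured through coordinator load rules. A minimal example that keeps two replicas of every segment (assuming the stock `_default_tier` tier name) looks like:

```json
[
  {
    "type": "loadForever",
    "tieredReplicants": { "_default_tier": 2 }
  }
]
```

With two replicants, losing one historical’s copy of a segment should not produce empty results at the broker.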

Also, are there any errors in the historical logs when it tries to service the query for a segment missing the file?


Charles Allen

Charles, you are right: once I restarted the historical node, the missing segment was pulled over. Before the restart I ran identical queries, one against the broker and one against the historical node, and noticed a difference.

When you say the cache, do you mean the actual segment location, or the file that is created on the filesystem to indicate that a historical has downloaded a segment?

If you are talking about the file that is created when a historical downloads a segment, losing that file should not impact queries.

If you are talking about the actual segment file on the historical, losing the disk should cause all sorts of exceptions to occur, and Druid should eventually recover. Filesystem corruption is something we’ve seen fairly often.

Are you able to produce any problems when querying the broker with replication turned on in your cluster?

While it is true that Druid could do a better job of reporting when query results may be incomplete in terms of segments, the use case you are describing should not be an issue.