What is the role of the hdfs://tmp/druid-indexing/ directory, and is it safe to delete after ingestion?


hdfs://tmp/druid-indexing/data-source takes 3.1TB.

The Druid manual says that it is the workingPath: “The working path to use for intermediate results (results between Hadoop jobs)”.

I’m wondering what “intermediate results” means exactly (the output of the Mapper?), and whether it is safe to delete after ingestion has completed successfully.



It’s intermediate data for Druid Hadoop indexing jobs, and it’s safe to delete after the job is complete. However, Druid should delete it automatically by default, so if you find that it doesn’t, something may need fixing.
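
One thing worth checking is whether the ingestion spec overrides the cleanup behaviour. Below is a rough sketch, assuming a Hadoop-based index task whose tuningConfig uses the `workingPath`, `leaveIntermediate` and `cleanupOnFailure` fields as I remember them from the Druid Hadoop ingestion docs (defaults `/tmp/druid-indexing`, `false` and `true` respectively; please verify against your Druid version). The helper name `cleanup_settings` and the example spec are made up for illustration.

```python
def cleanup_settings(task: dict) -> dict:
    """Return the effective cleanup-related settings of a Hadoop index task.

    Field names and defaults are assumptions taken from the Druid Hadoop
    ingestion documentation; check them against your Druid version.
    """
    tuning = task.get("spec", {}).get("tuningConfig", {})
    return {
        # Where intermediate results between Hadoop jobs are written.
        "workingPath": tuning.get("workingPath", "/tmp/druid-indexing"),
        # If true, intermediate files are kept after the job finishes.
        "leaveIntermediate": tuning.get("leaveIntermediate", False),
        # If true, intermediate files are removed when the job fails.
        "cleanupOnFailure": tuning.get("cleanupOnFailure", True),
    }


# Hypothetical task spec that keeps intermediate files (e.g. for debugging).
example_task = {
    "type": "index_hadoop",
    "spec": {
        "tuningConfig": {
            "type": "hadoop",
            "leaveIntermediate": True,
        }
    },
}

settings = cleanup_settings(example_task)
if settings["leaveIntermediate"] or not settings["cleanupOnFailure"]:
    print("Intermediate files may be left under", settings["workingPath"])
```

If both settings are at their defaults and the directory still keeps growing, the leftovers probably come from jobs that were killed or crashed before cleanup could run, which would be the "something needs fixing" case.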