Smoosh files on historical nodes

Hi Group,

Does the smoosh file on the historical nodes contain data? If yes, is it encrypted?

The 00000.smoosh file is the one with data in it.

The data is not encrypted. The data is compressed and dictionary encoded.

Is there any way to encrypt those files without many changes?

Depends on what kind of encryption you want. If you want OS-level transparent file encryption, then you should be able to use whatever filesystem supports the features you need.
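As a concrete illustration of the OS-level route, block-level encryption with dm-crypt/LUKS keeps everything transparent to Druid. This is only a sketch; the device `/dev/sdb1` and the mount point `/var/druid` are assumptions, and you would substitute the partition and directory your Druid processes actually use:

```shell
# Sketch: OS-level transparent encryption with dm-crypt/LUKS.
# /dev/sdb1 and /var/druid are hypothetical placeholders.
sudo cryptsetup luksFormat /dev/sdb1          # initialize the LUKS volume
sudo cryptsetup open /dev/sdb1 druid_data     # unlock as /dev/mapper/druid_data
sudo mkfs.ext4 /dev/mapper/druid_data         # create a filesystem on it
sudo mount /dev/mapper/druid_data /var/druid  # mount where Druid stores data

# Everything Druid writes under /var/druid is now encrypted at rest;
# Druid itself sees ordinary files and needs no code change.
```

The trade-off is that anyone with access to the running node sees the mounted, decrypted filesystem, which matches the caveat about node-level access below.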

If you’re asking whether the transport protocol between nodes supports HTTPS… then… sometimes?

If you want the data encrypted while it’s on deep storage, and that deep storage has a transparent passthrough, then it should be pretty straightforward to get an implementation going that uses that passthrough, though it would probably require a relatively simple code change.

If you want the file contents encrypted so that Druid has to decrypt them at the app level as it reads them off disk, then no, there is no such mechanism currently in place, and that would be a pretty involved code change. And if I’m honest, Druid is not really architected to make heavy use of the security policies within the JVM, so the security of the JVM itself (against someone who has access to the node) should be considered minimal at best. As such, I’m not sure encrypting the files for decryption during a segment scan would amount to anything more than security theater unless there were a thorough overhaul of Druid’s security features and testing.

Hopefully that answers your questions; let me know if you have any more.

Thanks for your answers.

When you say the data will be encrypted while it’s on deep storage (in my case, HDFS files) and the deep storage has a transparent passthrough, does that mean there will be some mechanism on the Hadoop side that writes files in encrypted form on HDFS and decrypts them before that data is loaded into Druid?

Also, I think I am interested in encrypting the data in the Druid files themselves, the last option you mentioned. Can you explain that in some more detail (I mean, would there be some extension to Druid, or would the Druid code itself have to be changed)? Also, if I go with that option, what should I do with the files already present in Druid? Will those get encrypted automatically?

Hi Vishal,
For HDFS, you can try using transparent encryption supported by HDFS (https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html).

In this case the Druid segment files will be stored in HDFS in encrypted form.
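For concreteness, setting up an HDFS encryption zone over the deep storage path might look like the sketch below. The key name `druid-segments-key` and the path `/druid/segments` are assumptions; the path should match whatever your `druid.storage.storageDirectory` points at, and a Hadoop KMS must already be configured for the cluster:

```shell
# Sketch: HDFS transparent encryption for Druid deep storage.
# Assumes a Hadoop KMS is already set up; names/paths are hypothetical.
hadoop key create druid-segments-key           # create a key in the KMS

hdfs dfs -mkdir -p /druid/segments             # zone directory must be empty
hdfs crypto -createZone -keyName druid-segments-key -path /druid/segments

hdfs crypto -listZones                         # verify the zone exists

# Segments Druid pushes under /druid/segments are now encrypted at rest
# in HDFS and transparently decrypted when historicals pull them down.
```

Note that an encryption zone can only be created on an empty directory, so existing segments would need to be copied into the zone rather than encrypted in place.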

However, the segment files present locally in the historical node’s segmentCache directory will NOT be encrypted, so you will have to look into setting up transparent encryption there as well.

Thanks Nishant, is there any way I can have the segmentCache encrypted too?