Rewriting the Druid index code without local filesystem mmap and temp files

We need to run the batch indexing job on our own in-house MR system. For security reasons it does not allow writing data to local disk, but it provides a distributed filesystem API for writing data. So the question now is how to rewrite the indexing logic in IndexMerger, IndexIO, and the smoosh-file-related code. Druid also depends on mmap: most of the smoosh index reading code is written against ByteBuffer, but our distributed filesystem only offers a stream-based read interface, with no mmap. Is there any advice on how to rewrite the Druid index code? Lucene supports different storage backends by providing a Directory interface, so in my opinion Druid could offer a similar storage abstraction to be more extensible.
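To make the Directory idea concrete, here is a minimal Java sketch of what such a storage abstraction might look like. The names `SegmentDirectory` and `HeapSegmentDirectory` are hypothetical and are not part of Druid or Lucene; the in-memory implementation is only a stand-in for a distributed filesystem client so that the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction, loosely modeled on the idea behind Lucene's
// Directory. This is not an existing Druid interface.
interface SegmentDirectory {
    // Opens a stream that writes a new named file in this directory.
    OutputStream createOutput(String name) throws IOException;

    // Opens a stream that reads a previously written file.
    InputStream openInput(String name) throws IOException;
}

// In-memory stand-in for a distributed filesystem client (assumption:
// a real implementation would delegate to that client's stream API).
class HeapSegmentDirectory implements SegmentDirectory {
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    @Override
    public OutputStream createOutput(String name) {
        return new ByteArrayOutputStream() {
            @Override
            public void close() throws IOException {
                super.close();
                // Publish the file contents when the writer closes the stream.
                files.put(name, toByteArray());
            }
        };
    }

    @Override
    public InputStream openInput(String name) throws IOException {
        byte[] data = files.get(name);
        if (data == null) {
            throw new FileNotFoundException(name);
        }
        return new ByteArrayInputStream(data);
    }
}
```

With an interface like this, the merge code would ask the directory for named outputs instead of creating local temp files, and a local-disk implementation could sit behind the same interface.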

Can any committers give some advice? Druid seems to depend heavily on mmap, and I am finding it very hard to rewrite the indexing logic without it. Loading all the bytes into memory is not acceptable.

On Monday, November 21, 2016 at 1:35:06 PM UTC+8, weijie tong wrote:

It would be some work, I think, to rewrite the indexing code to avoid using mmapped files at all. Would a solution like tmpfs work for you?

What do you mean by tmpfs? To be precise, the local temp filesystem is also restricted in our deployment environment.

On Wednesday, November 23, 2016 at 7:03:15 AM UTC+8, Gian Merlino wrote:

I want to do the refactoring so that the Druid indexing module does not depend directly on mmap. mmap's performance is attractive, but it restricts Druid's storage layer to local disk. I will use the Lucene Directory interface as a reference. My plan is to file a feature issue and, if it is accepted, submit PRs step by step: first a basic storage interface that abstracts ByteBuffer, then changes to code such as ColumnPartSerde to use the new interface instead of java.nio.ByteBuffer, and then an mmap implementation of the new storage interface.
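As a rough illustration of the first and last steps of that plan, the sketch below shows a small read-only interface that column readers could target instead of java.nio.ByteBuffer, together with an mmap-backed implementation. The names `SegmentBuffer` and `ByteBufferSegmentBuffer` are hypothetical and do not exist in Druid; a stream-backed implementation for a distributed filesystem would be a second class behind the same interface.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical read-only abstraction that readers could use instead of
// java.nio.ByteBuffer. This is not an existing Druid interface.
interface SegmentBuffer {
    long size();

    byte getByte(long position);

    int getInt(long position);
}

// Implementation backed by any ByteBuffer, including a MappedByteBuffer.
class ByteBufferSegmentBuffer implements SegmentBuffer {
    private final ByteBuffer buffer;

    ByteBufferSegmentBuffer(ByteBuffer buffer) {
        // Take a read-only view that starts at the source's current position.
        // The view is big-endian, which is the default order for new views.
        this.buffer = buffer.slice().asReadOnlyBuffer();
    }

    // Maps a local file into memory, keeping the existing mmap behavior
    // available behind the new interface.
    static SegmentBuffer mmap(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return new ByteBufferSegmentBuffer(
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()));
        }
    }

    @Override
    public long size() {
        return buffer.capacity();
    }

    @Override
    public byte getByte(long position) {
        return buffer.get(Math.toIntExact(position));
    }

    @Override
    public int getInt(long position) {
        return buffer.getInt(Math.toIntExact(position));
    }
}
```

The interface uses long positions so that an implementation is not tied to the 2 GB limit of a single ByteBuffer; this particular implementation still narrows them with Math.toIntExact.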

On Monday, November 21, 2016 at 1:35:06 PM UTC+8, weijie tong wrote:

Just an FYI, this discussion continues in