Batch ingestion of a 100 GB file

We are using Hadoop-based indexing for a 150 GB CSV file covering 10 years of data.
We are using local storage only, so I know indexing is going to take a long time.
We have 5 MiddleManagers, and each can run 8 peons.

Is there any way to make the indexing faster?


Hi Dilip,

If you ingest that file as a single input, only one peon will be working on it.

Since you have 8 peons per MiddleManager, split the file into 40 fragments and send 8 fragments to each MiddleManager. That way all 40 peons can work in parallel, and it will be much faster.