Problems ingesting large data volumes with Hadoop Indexer


I am trying to load 100 GB of data through the Hadoop indexer, but the task is taking a very long time. On top of that, the task consumes a lot of disk space with temporary files.

Can you please give me some help optimizing the ingestion?

I set these properties:

"maxRowsInMemory": 200000,

"partitionsSpec": {
  "type": "hashed",
  "numShards": 3
}
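
For context, here is a sketch of the full tuningConfig I would end up with if I add the knobs I am asking about (the numBackgroundPersistThreads, workingPath, leaveIntermediate, and cleanupOnFailure values are only what I am considering trying, not settings I have confirmed):

```json
{
  "type": "hadoop",
  "maxRowsInMemory": 200000,
  "partitionsSpec": {
    "type": "hashed",
    "numShards": 3
  },
  "numBackgroundPersistThreads": 1,
  "workingPath": "/tmp/druid-indexing",
  "leaveIntermediate": false,
  "cleanupOnFailure": true
}
```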


Should I increase "numBackgroundPersistThreads"?

How can I avoid using so much disk space for temporary files?

I’m doing some benchmarks, so this is really important.

Thanks in advance.

Best Regards,

José Correia