I’m using the standard index task to index quite a large amount of data.
My current setup is:
1 Middle Manager with MAX_DIRECT_MEMORY: 3g
Up to 4 Peons, each with MAX_HEAP: 1g, MIN_HEAP: 256m, MAX_DIRECT_MEMORY: 3g
I’m inserting data one day at a time. Processing one day of data takes about 30 seconds on a Peon. I have data going back ~25 years. Once it’s all in Druid, it shows as ~2.5GB.
Back of the envelope, it would take ~3 days to insert all of the data into Druid. (This is consistent with my testing: it takes ages!)
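For reference, here is the arithmetic behind that estimate as a quick Python sketch. It assumes 365 days per year, a flat 30 seconds per day of data, and tasks running one after another. The parallel_tasks parameter is a hypothetical knob of mine, and dividing by it is only an ideal-case guess at what running several Peons at once might give; I haven't measured that.

```python
# Rough ingest-time estimate from the figures in this post.
SECONDS_PER_DAY_OF_DATA = 30   # observed time for one index task on a Peon
YEARS_OF_DATA = 25             # approximate history to load
DAYS_PER_YEAR = 365            # assumption: leap days ignored
SECONDS_PER_WALL_DAY = 86_400


def estimate_ingest_days(parallel_tasks: int = 1) -> float:
    """Return the estimated wall-clock days needed to index everything.

    parallel_tasks is a hypothetical parameter: it assumes perfect scaling
    across that many Peons, which real tasks are unlikely to achieve.
    """
    total_days_of_data = YEARS_OF_DATA * DAYS_PER_YEAR
    total_seconds = total_days_of_data * SECONDS_PER_DAY_OF_DATA
    return total_seconds / parallel_tasks / SECONDS_PER_WALL_DAY


if __name__ == "__main__":
    print(f"sequential: {estimate_ingest_days(1):.1f} days")
    print(f"4 Peons (ideal case): {estimate_ingest_days(4):.1f} days")
```

Run sequentially, this comes out at a little over 3 days, which matches what I'm seeing.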
Is this a normal speed for the index task? Or is it perhaps not configured correctly? Would switching to hadoop_index be likely to improve things, and if so, how much of a speed increase does Hadoop bring over the standard index task?
Any advice / suggestions welcome!