Batch Data ingestion using parallel index task taking a long time

Hello - I am evaluating Druid for some of our needs. I set up a cluster, and data ingestion is taking a long time. I am doing native batch ingestion using the parallel index task, and it takes 15 minutes to ingest a 2MB file.

I am starting 2 Peons with 2G each in the MiddleManager. The Peon buffer size is ~500MB.

I did not find anything unusual in the logs. Can someone help me find what might be causing this issue?

Thanks!

Hi,

If by "Peon buffer size" you mean the memory configuration for the Peon, I think it should be larger. 2G would probably be fine.

MiddleManagers themselves are fine with a small amount of memory, such as 64MB.

Also, could you check in the task logs which step is taking a long time?

Jihoon

Thanks Jihoon. The property druid.indexer.fork.property.druid.processing.buffer.sizeBytes is set to 500MB. Memory for each Peon is 2G. I did not find anything unusual in the logs. I will attach the logs shortly.
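
To make the sizing above concrete, here is a minimal Python sketch of the memory arithmetic. It assumes Druid's general direct-memory guideline, buffer size * (numThreads + numMergeBuffers + 1), applies to each Peon. The thread and merge-buffer counts below are placeholder assumptions, not values from this cluster, and 500MB is taken as 500,000,000 bytes.

```python
def peon_direct_memory_bytes(buffer_size_bytes, num_threads, num_merge_buffers):
    """Direct memory one Peon may need for its processing buffers.

    Assumes the guideline: sizeBytes * (numThreads + numMergeBuffers + 1).
    """
    return buffer_size_bytes * (num_threads + num_merge_buffers + 1)


# Values stated in this thread.
buffer_size = 500_000_000   # druid.processing.buffer.sizeBytes (~500MB)
peon_heap = 2 * 1024 ** 3   # 2G heap per Peon
peon_count = 2              # Peons started on the MiddleManager

# Assumed values (not given in the thread); substitute your own settings.
num_threads = 2
num_merge_buffers = 2

direct = peon_direct_memory_bytes(buffer_size, num_threads, num_merge_buffers)
per_peon = peon_heap + direct
total = per_peon * peon_count

gib = 1024 ** 3
print(f"direct memory per Peon: {direct / gib:.2f} GiB")
print(f"heap + direct per Peon: {per_peon / gib:.2f} GiB")
print(f"all {peon_count} Peons: {total / gib:.2f} GiB")
```

If the total is larger than the RAM available on the MiddleManager host, swapping could be one possible contributor to slow tasks, though the task logs remain the better place to confirm where the time goes.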

I analyzed the logs and found that 95% of the time is spent in "dim conversions" and "dim inverted". I also see dim conversions and dim inverted happening twice for the same dimensions.

Would someone please help me understand why the dim conversions are happening twice and taking such a long time? Thanks!