Loading Data into Segments taking too long

Hi,
I'm trying to load 100 rows of data into Druid, but loading the data into segments takes too long: approximately 12-18 s. Is there any way to speed this up?

Hi mebinjoy,

There is a certain fixed overhead to ingestion, and even a single row of data can sometimes take about 6-8 s to complete.

What is the source of your data? Network latency may play a role here.

Have you checked the task log to see if there is anything that stands out?

Hi, I tried to ingest the example wikipedia data and it took over 20 s. My configuration is an AWS EC2 instance with 16 GB of RAM and a 4-core CPU.

Hi @mebinjoy, I think that what Vijeth is trying to convey is that the batch ingestion process has multiple steps:

  • Overlord parses the ingestion spec and hands out tasks to middle managers.
  • Middle managers execute the tasks and locally create the segment files from the input (this portion is likely very fast for 100 rows).
  • Middle managers publish the segment file to Deep Storage.
  • Coordinator picks up newly published segments asynchronously and assigns them to one or more of the available Historicals.
  • Historicals load the assigned segments into their local segment cache and only then does the segment become available for query.
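
If you want to see which of these steps the time is going to, one option is to time the task itself separately from the segment handoff. Below is a rough sketch, not a definitive tool. It assumes the Router is reachable at http://localhost:8888 (quickstart default), and it uses the Overlord task status endpoint and the Coordinator loadstatus endpoint with the response shapes as I understand them from the API docs, so please verify both against your Druid version. The function names are made up for illustration.

```python
import json
import time
import urllib.request

ROUTER = "http://localhost:8888"  # assumption: local quickstart Router address


def task_status_url(task_id, base=ROUTER):
    # Overlord task status endpoint, reached through the Router.
    return f"{base}/druid/indexer/v1/task/{task_id}/status"


def load_status_url(datasource, base=ROUTER):
    # Coordinator load status endpoint (percent of segments loaded).
    return (f"{base}/druid/coordinator/v1/datasources/{datasource}"
            "/loadstatus?forceMetadataRefresh=false")


def get_json(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def task_finished(payload):
    # Assumed shape: {"status": {"status": "RUNNING" | "SUCCESS" | "FAILED", ...}}
    return payload.get("status", {}).get("status") in ("SUCCESS", "FAILED")


def segments_loaded(payload, datasource):
    # Assumed shape: {"<datasource>": <percent loaded>}
    return payload.get(datasource) == 100


def time_phases(task_id, datasource, poll_seconds=0.5):
    """Poll until the task ends, then until Historicals have loaded the segments."""
    start = time.monotonic()
    while not task_finished(get_json(task_status_url(task_id))):
        time.sleep(poll_seconds)
    task_done = time.monotonic()
    while not segments_loaded(get_json(load_status_url(datasource)), datasource):
        time.sleep(poll_seconds)
    loaded = time.monotonic()
    return {"task_seconds": task_done - start,
            "handoff_seconds": loaded - task_done}
```

Calling time_phases with the task ID returned when you submit the spec and the datasource name (for example "wikipedia") gives a rough split between task execution and the Coordinator/Historical handoff. The timings are only as precise as the polling interval.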

Given this flow, batch ingestion in production is usually done with much larger datasets and therefore achieves much greater throughput, because the fixed per-task overhead is spread over many more rows. With such a small dataset, most of the time is overhead.
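
To make the overhead point concrete, here is a small back-of-the-envelope model. The numbers are assumptions for illustration only, not measurements: a fixed overhead of 12 s (the low end of what was reported in this thread) and a made-up processing rate of 50,000 rows per second.

```python
FIXED_OVERHEAD_SECONDS = 12.0  # assumption: low end of the times reported above
ROWS_PER_SECOND = 50_000.0     # assumption: hypothetical per-task processing rate


def ingestion_seconds(rows):
    # Total time = fixed scheduling/publish/load overhead + per-row work.
    return FIXED_OVERHEAD_SECONDS + rows / ROWS_PER_SECOND


def effective_throughput(rows):
    # Rows per second as seen end to end, overhead included.
    return rows / ingestion_seconds(rows)


if __name__ == "__main__":
    for rows in (100, 1_000_000, 100_000_000):
        print(f"{rows:>11,} rows: {ingestion_seconds(rows):9.1f} s total, "
              f"{effective_throughput(rows):10.1f} rows/s effective")
```

Under these assumptions, 100 rows come out at well under 10 rows per second end to end, while a 100-million-row load gets close to the raw processing rate. The total time barely changes for tiny inputs because almost all of it is the fixed part.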

With streaming ingestion, data is made available for query as soon as it is ingested by a middle manager task. The middle manager responds to queries on that data as the segments are being built.
