Druid ingestion task creating thousands of small segments for a single file

I have a small Parquet file, about 125 MB with around 1 million rows.
When I ingest it, the task takes a long time and creates around 100,000 segments.
What can I tune so the ingestion produces fewer, larger segments and runs faster? I don't want to run a compaction task after ingestion, since that also takes a long time.

hey @ranjan,

3 things,

  • reduce `maxNumConcurrentSubTasks` (a single file does not need more than 1 subtask)

  • look at the segment granularity (e.g. HOUR → DAY; it can be coarsened depending on how your data is spread over time)

  • ingest with dynamic partitioning (you can switch to another partitioning strategy later via compaction if needed)
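Putting the three tips together, the relevant parts of a native batch ingestion spec might look something like this (a sketch only — the datasource name, granularities, and row limits are illustrative placeholders you should adjust for your data):

```json
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "my_datasource",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "none",
        "rollup": false
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumConcurrentSubTasks": 1,
      "partitionsSpec": {
        "type": "dynamic",
        "maxRowsPerSegment": 5000000
      }
    }
  }
}
```

With ~1 million rows and `maxRowsPerSegment` well above that, this should produce on the order of one segment per day of data rather than thousands of tiny ones.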

