partitionSpec

Hi guys,

So I just skipped the tests (as they fail for me), and 0.17 seems to run fine. I'm currently trying to adjust to the new index_parallel task.

"partitionsSpec": {
  "type": "dynamic",
  "maxRowsPerSegment": 419430400,
  "maxTotalRows": 2097152000
},

Those are pretty much the biggest numbers I can stick in, yet I still only get 3 MB per segment. I can force a shard count via numShards, but I'd rather not.

Any idea how I can push the segment size to 500-1000 MB?

Thanks!

Hagen

Hi Hagen,

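The partitionsSpec looks OK. If you still see small segments, it may be because of the input file size. Currently, in the index_parallel task, each subtask processes one input file, and with a dynamic partitionsSpec the segments it creates are not merged across subtasks. So if your input files are small, the segments cannot be large, no matter how high maxRowsPerSegment is set.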

Jihoon

That’s unfortunate; it should probably get a documentation note. But yes, I see how I can work around it.

Yeah, it should get better in the future. By the way, if you use a hashed or single_dim partitionsSpec, you can control the segment size without touching the input files. Check out
https://druid.apache.org/docs/0.17.0/ingestion/native-batch.html#partitionsspec for more details.
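
For illustration, here is a rough sketch of what a single_dim setup might look like in the tuningConfig of an index_parallel task. The dimension name "my_dimension" and the row target of 5000000 are placeholders I made up, not values from this thread, so pick them for your own data. As far as I understand the 0.17 docs, hashed and single_dim both need forceGuaranteedRollup set to true, which in turn needs intervals set in the granularitySpec; please check the linked page before relying on this.

```json
{
  "tuningConfig": {
    "type": "index_parallel",
    "forceGuaranteedRollup": true,
    "partitionsSpec": {
      "type": "single_dim",
      "partitionDimension": "my_dimension",
      "targetRowsPerSegment": 5000000
    }
  }
}
```

With this kind of spec, the task shuffles rows between subtasks before building segments, so segment size should follow the row target rather than the size of each input file.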