Expected behavior of druid.coordinator.merge.on=true

I have set the config option druid.coordinator.merge.on=true, but small segments are not being merged; instead, I see the following log line every half-hour:

2015-08-12T22:53:54,650 INFO [Coordinator-Exec--0] io.druid.server.coordinator.DruidCoordinator - Issued merge requests for 0 segments

2015-08-12T23:23:54,665 INFO [Coordinator-Exec--0] io.druid.server.coordinator.DruidCoordinator - Issued merge requests for 0 segments

2015-08-12T23:53:54,680 INFO [Coordinator-Exec--0] io.druid.server.coordinator.DruidCoordinator - Issued merge requests for 0 segments

2015-08-13T00:23:54,695 INFO [Coordinator-Exec--0] io.druid.server.coordinator.DruidCoordinator - Issued merge requests for 0 segments

Did I miss something?

Hi,
the coordinator merges segments only if -

  1. they are not sharded

  2. they are smaller than the configured mergeBytesLimit

Make sure the segments you expect to be merged satisfy both conditions.
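For reference, mergeBytesLimit is part of the coordinator's dynamic configuration rather than a startup property; a minimal sketch of that dynamic config, assuming the defaults of this Druid era (values here are illustrative, not prescriptive):

```json
{
  "mergeBytesLimit": 524288000,
  "mergeSegmentsLimit": 100
}
```

mergeBytesLimit caps the total byte size of the segments combined into one merged segment, and mergeSegmentsLimit caps how many segments a single merge request may cover.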

Hi,
the coordinator merges segments only if -

  1. they are not sharded

What do you mean by "not sharded"?

Is this a sharded segment or not?

  2. they are smaller than the configured mergeBytesLimit

The default value for mergeBytesLimit is around 500MB, and my segments are smaller than that, roughly 20MB or less each.

Hi,

It seems the segment you sent was ingested with a HashedShardSpec with one shard.

The current merging code only works with NoneShardSpec. (I have opened https://github.com/druid-io/druid/issues/1623 to handle cases where data is ingested with another shardSpec but has only one shard.)

Until this is resolved, you can use NoneShardSpec when ingesting your data, and the coordinator should merge your segments:

"shardSpec": {"type": "none"}

More details on sharding are documented here -

http://druid.io/docs/latest/ingestion/realtime-ingestion.html#sharding

Thank you very much, Nishant.

Currently, what can I do to merge the segments already produced with HashedShardSpec?

Does the Append task or Merge task (http://druid.io/docs/latest/misc/tasks.html) make sense?

BTW, when the task is running, is the segment data involved available for query?

One more question: I'm using Tranquility to send events to Druid. How can I set:

  1. the rejection policy, and

  2. the shard spec

for the ingestion task? I haven't found an API for these settings.

Thanks.

Hi Zhihui,

The idea behind merging is that small segments can be combined into a larger segment, since Druid works best when segments are roughly 250MB to 900MB. If your data for an interval is additionally partitioned, that means there is too much data for the interval and it has to be sharded to produce segments of an ideal size. Tranquility's API should have a tuning component (https://github.com/druid-io/tranquility) where these parameters can be set.

Hi Fangjin,

Thanks for your kind response, but I didn't find the "tuning component" that can set these parameters.

I found https://github.com/druid-io/tranquility/blob/master/src/main/scala/com/metamx/tranquility/beam/ClusteredBeamTuning.scala

and https://github.com/druid-io/tranquility/blob/master/src/main/scala/com/metamx/tranquility/druid/DruidTuning.scala,

but no API for setting the rejection policy or the shard spec.

Hmmm, I will ask Gian to take a look.

Same here. I can't specify the rejection policy for Tranquility. Can someone help?

On Sunday, August 16, 2015 at 18:10:35 UTC+3, Fangjin Yang wrote:

You can't change the rejection policy or shard spec with Tranquility; it always uses server-time rejection and linear shard specs.
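In other words, the realtime task specs Tranquility generates are expected to contain settings along these lines (a sketch based on the statement above; the exact surrounding field layout may differ by version):

```json
"rejectionPolicy": {"type": "serverTime"},
"shardSpec": {"type": "linear", "partitionNum": 0}
```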

Is there any workaround for this? Or does Tranquility have any plans to support configuring the shard spec?
The merge-on feature is very useful.

Thanks very much

On Saturday, April 9, 2016 at 6:52:11 AM UTC+8, Gian Merlino wrote:

I think odds are the Druid community will address this not by adjusting Tranquility’s shard specs, but by making it possible to do automated merging of more kinds of shard specs.