Scheduled compaction does not trigger any task

Hi,
I’m running Druid 0.14 with the Kafka indexing service.
I’m trying to run scheduled compaction, and I would like the segments to be around 500MB, as recommended in the documentation.
My middle manager has 2 workers and the ratio is 0.5, which means 1 compaction job at a time.
This is my configuration, and I have an interval with the following segment sizes.
I expect the first segment to be merged with the second one,
but the tasks just do not start…
Any ideas?

{
    "compactionConfigs": [
        {
            "dataSource": "events",
            "keepSegmentGranularity": true,
            "taskPriority": 25,
            "inputSegmentSizeBytes": 536870912,
            "targetCompactionSizeBytes": 419430400,
            "maxRowsPerSegment": null,
            "maxNumSegmentsToCompact": 150,
            "skipOffsetFromLatest": "P3D",
            "tuningConfig": null,
            "taskContext": null
        }
    ],
    "compactionTaskSlotRatio": 0.5,
    "maxCompactionTaskSlots": 1
}

The default config allows 10% of task slots to be used by compaction jobs. In your case, try submitting the job manually and monitor the log. Hope this helps.
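As a rough sketch of the slot arithmetic discussed above (2 workers, a ratio of 0.5, and a cap of 1), the calculation can be written out as follows. The function name and the floor-then-cap rule are assumptions made for illustration, not Druid's actual code, and how Druid treats very small products (such as the 10% default on 2 workers) is not covered in this thread.

```python
import math


def compaction_task_slots(total_worker_capacity, slot_ratio, max_slots):
    """Hypothetical helper: share of worker slots usable by compaction, capped.

    Assumption: the ratio is applied to the total worker capacity, rounded
    down, and then limited by maxCompactionTaskSlots.
    """
    return min(math.floor(total_worker_capacity * slot_ratio), max_slots)


# Values from the original post: 2 workers, compactionTaskSlotRatio 0.5,
# maxCompactionTaskSlots 1.
poster_slots = compaction_task_slots(2, 0.5, 1)

# Raw share of slots under the 10% default mentioned in the reply.
default_share = 2 * 0.1

print("slots with ratio 0.5:", poster_slots)
print("raw share with ratio 0.1:", default_share)
```

With the posted values this gives one compaction slot, which matches the poster's own reading of the configuration.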

Hi Alon,

Currently, auto compaction happens atomically per time chunk, which means that either all segments in the same time chunk are compacted together or none of them are.

From the screenshot you shared, this time chunk appears to have 5 segments, and their total size is greater than the configured “inputSegmentSizeBytes”, which is 512MB.

You need to raise this to more than the total size of the segments in each time chunk (probably 1.5 - 2GB?).

Jihoon
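The per-time-chunk rule described above can be sketched as a simple size check. This is a minimal illustration, not Druid's implementation: the function name is hypothetical, the five segment sizes are invented (the real ones were only in the screenshot), and whether the boundary comparison is strict is an assumption.

```python
MB = 1024 * 1024


def chunk_is_eligible(segment_sizes_bytes, input_segment_size_bytes):
    """Hypothetical check: the whole time chunk must fit under the limit.

    All segments of a time chunk are considered together, so the total size
    is compared against inputSegmentSizeBytes. Using <= at the boundary is
    an assumption.
    """
    return sum(segment_sizes_bytes) <= input_segment_size_bytes


# Invented sizes for a time chunk with five segments (1500MB in total).
hypothetical_chunk = [300 * MB, 200 * MB, 400 * MB, 350 * MB, 250 * MB]

configured_limit = 536870912   # inputSegmentSizeBytes from the posted config (512MB)
raised_limit = 2 * 1024 * MB   # 2GB, the upper end of the suggested range

eligible_with_configured = chunk_is_eligible(hypothetical_chunk, configured_limit)
eligible_with_raised = chunk_is_eligible(hypothetical_chunk, raised_limit)

print("eligible at 512MB:", eligible_with_configured)
print("eligible at 2GB:", eligible_with_raised)
```

Under these invented sizes the chunk is skipped at 512MB and becomes eligible once the limit exceeds the chunk's total size, which is the behaviour the reply describes.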

Hi,
If that is right, the coordinator will compact the same interval over and over (I have already had this issue),
because the sum of the segments will always be less than or equal to 2.5G.

According to the documentation and its example here:

Look at the “bar” datasource: it has 3 segments, and the sum of only two of them is 20MB, so it will merge those two and not the third one.
https://druid.apache.org/docs/0.15.0-incubating/design/coordinator.html#segment-search-policy

What do you say? Any idea?


When I compact manually it works, but only for a specific time frame, and I want an automatic process…


I think it’s a documentation error. It should have been targetCompactionSizeBytes rather than inputSegmentSizeBytes
in the documentation link you referred to:
https://druid.apache.org/docs/0.15.0-incubating/design/coordinator.html#segment-search-policy
I would wait for Jihoon’s reply, though.

The same interval would be compacted again and again until the optimal size (targetCompactionSizeBytes) is reached. If this is not the case, then it may be an issue.

As I understand it,
the segment scan algorithm looks at one interval and its segments. If it finds two segments in the same interval whose sum is less than inputSegmentSizeBytes,
it will merge them.
In the next iteration, if there are still candidates for merging in the same interval, it merges them.
If not, it looks for the next interval that has segments it can merge (with a sum lower than inputSegmentSizeBytes).

Does that make sense?

Yes. It makes sense to me.

Note: I re-read the documentation and it is correct.

So is it a bug???
Who can handle it?

Hi Alon

I meant that there is no bug in the documentation.

But yes, Druid keeps checking for compaction possibilities whenever the inputSegmentSizeBytes condition is met, and sometimes Druid might end up compacting the same interval again.

This is expected to be fixed in the next version of auto compaction.

Thanks & Rgds

Venkat

But in my case the conditions are met and the tasks are still not executed.
You can take a look at my first post and my scenario; I expect Druid to compact my first two segments.

What do you think?

Alon

Could you please let me know the number of segments for the given interval and the total size of the segments?

Is the total size of all the segments in the interval < inputSegmentSizeBytes? That condition is required.

Thanks & Rgds

Venkat


From: druid-user@googlegroups.com on behalf of Alon Shoshani alon@oribi.io
Reply-To: druid-user@googlegroups.com
Date: Wednesday, July 31, 2019 at 2:31 PM
To: druid-user@googlegroups.com
Subject: Re: [druid-user] Re: Scheduled Compaction is not trigger any Task


You received this message because you are subscribed to the Google Groups “Druid User” group.
To unsubscribe from this group and stop receiving emails from it, send an email to
druid...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/druid-user/8bb64b43-b5fa-4b6b-be84-833310d17c0a%40googlegroups.com
.


Can you please set it to 2.5GB and check whether your segments are being auto compacted?

Yes, when I set it to 2.5G, the segments are compacted.
But Druid keeps executing the same task on the same interval again and again. Why?

That’s a bug expected to be fixed in an upcoming version.

OK,
maybe you can help me with another issue with Druid 0.14.
I’m not able to send Graphite metrics using the graphite emitter; it just ignores the host I provide in the configuration,
whereas with the logging emitter the metrics are logged to a file…

I would appreciate it.
I also opened a bug on GitHub, but got no response…
https://groups.google.com/forum/#!searchin/druid-user/graphite$20emitter%7Csort:date/druid-user/ikI6aKkjmc0/ealRd390AgAJ

Alon,

Please post a new thread for this, so that someone who is working on the graphite emitter can also reply.

I have already posted twice… no comments…