Segment announcement fails

I have 10 datasources and 12 workers (middle managers/peons).

I have a continuous flow of data.

The ingestion task for one of the datasources waits in the segment-announcing step for a long time and then fails.

Some of the errors I found in the coordinator log are about the balancer cost strategy failing.

Do you think I need to tune the following parameters in the spec?

maxRowsInMemory: 500000
maxRowsPerSegment: 5000000

Any other suggestions to avoid these failures and keep segments from ending up in a failed state?

Here is the spec for my datasource:
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "service2-requests",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "millis"
        },
        "dimensionsSpec": {
          "dimensions": [
            "site",
            "env",
            "host",
            "method",
            "statuscode",
            "bytes",
            "duration",
            "resource_type",
            "repo",
            "clientip",
            "timestamp",
            "username"
          ],
          "dimensionExclusions": [],
          "spatialDimensions": []
        }
      }
    },
    "metricsSpec": [
      { "type": "count", "name": "count" },
      { "type": "longSum", "name": "bytesSum", "fieldName": "bytes", "expression": null },
      { "type": "longSum", "name": "durationSum", "fieldName": "duration", "expression": null }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": { "type": "none" },
      "rollup": true,
      "intervals": null
    },
    "transformSpec": {
      "filter": null,
      "transforms": []
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsInMemory": 500000,
    "maxRowsPerSegment": 5000000,
    "intermediatePersistPeriod": "PT15M",
    "basePersistDirectory": "/data/druid/tmp/1563382339563-0",
    "maxPendingPersists": 0,
    "indexSpec": {
      "bitmap": { "type": "concise" },
      "dimensionCompression": "lz4",
      "metricCompression": "lz4",
      "longEncoding": "longs"
    },
    "buildV9Directly": true,
    "reportParseExceptions": true,
    "handoffConditionTimeout": 0,
    "resetOffsetAutomatically": true,
    "segmentWriteOutMediumFactory": null,
    "workerThreads": null,
    "chatThreads": null,
    "chatRetries": 8,
    "httpTimeout": "PT10S",
    "shutdownTimeout": "PT80S",
    "offsetFetchPeriod": "PT30S"
  },
  "ioConfig": {
    "topic": "log.service2.json.request_log",
    "replicas": 1,
    "taskCount": 1,
    "taskDuration": "PT900S",
    "consumerProperties": {
      "bootstrap.servers": "stats1:9092,stats2:9092,stats3:9092"
    },
    "startDelay": "PT5S",
    "period": "PT30S",
    "useEarliestOffset": false,
    "completionTimeout": "PT1800S",
    "lateMessageRejectionPeriod": null,
    "earlyMessageRejectionPeriod": null,
    "skipOffsetGaps": false
  },
  "context": null
}

The ingestion task for one of the datasources waits in the segment-announcing step for a long time and then fails.

Hm, if the task was able to create the segment successfully and is attempting to announce it, the maxRows* settings are probably okay.

To troubleshoot, I think you could look at the segment ID that’s being announced and search the coordinator and historical logs for events that mention that segment ID. There may be some issue loading the segment onto the historicals.
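
If it helps, the log search could be scripted roughly like this. It is only a minimal sketch: the function name `find_segment_mentions` is made up for illustration, and where your coordinator and historical logs live depends on your installation, so the paths you pass in are up to you. It does a plain substring match on the segment ID and reports the file and line number of every hit.

```python
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def find_segment_mentions(
    segment_id: str, log_files: Iterable[Path]
) -> Iterator[Tuple[Path, int, str]]:
    """Yield (file, line number, line) for every log line containing segment_id.

    Hypothetical helper: it knows nothing about Druid's log format and
    simply does a substring match on each line.
    """
    for log_file in log_files:
        # errors="replace" keeps the scan going if a log contains odd bytes.
        with open(log_file, encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if segment_id in line:
                    yield Path(log_file), line_number, line.rstrip("\n")


# Example use (the segment ID and log paths are placeholders you would replace):
#
#   logs = [Path("coordinator.log"), Path("historical.log")]
#   for path, number, text in find_segment_mentions("<segment id>", logs):
#       print(f"{path}:{number}: {text}")
```

Reading the hits in time order across the coordinator and historical logs should show whether the coordinator ever assigned the segment to a historical and whether the historical reported a problem loading it.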