Real-time ingestion through Tranquility continuously loops on “Persist and Merge” on MiddleManagers

Hi All,

We are seeing real-time ingestion tasks going into a continuous loop of “Persist and Merge” on the MiddleManagers.

No exceptions appear in the MiddleManager, Historical, or HDFS (deep storage) logs.

We can see the segments created in the Postgres metadata store, and the indexing logs of all the tasks are getting stored in HDFS.

Snippet from Task logs:

2018-02-15T11:04:25,003 INFO [watchtower-analytics-overseer-0] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.
2018-02-15T11:04:25,003 INFO [watchtower-analytics-overseer-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [1] segments. Attempting to hand off segments that start before [1970-01-01T00:00:00.000Z].
2018-02-15T11:04:25,004 INFO [watchtower-analytics-overseer-0] io.druid.segment.realtime.plumber.RealtimePlumber - Skipping persist and merge for entry [1518692580000=Sink{interval=2018-02-15T11:03:00.000Z/2018-02-15T11:04:00.000Z, schema=DataSchema{dataSource='watchtower-analytics', parser={type=map, parseSpec={format=json, timestampSpec={column=t, format=millis, missingValue=null}, dimensionsSpec={dimensions=[b, d, z, n, v, vi, i, e, m], spatialDimensions=[]}}}, aggregators=[CountAggregatorFactory{name='count'}, ApproximateHistogramFoldingAggregatorFactory{name='resp_time', fieldName='p', resolution=50, numBuckets=7, lowerLimit=-Infinity, upperLimit=Infinity}], granularitySpec=UniformGranularitySpec{segmentGranularity={type=period, period=PT1M, timeZone=UTC, origin=null}, queryGranularity=NoneGranularity, rollup=true, inputIntervals=null, wrappedSpec=null}}}] : Start time [2018-02-15T11:03:00.000Z] >= [1970-01-01T00:00:00.000Z] min timestamp required in this run. Segment will be picked up in a future run.
2018-02-15T11:04:25,004 INFO [watchtower-analytics-overseer-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] sinks to persist and merge

We tried setting the rejection policy to “serverTime”, but it gets overridden to “none” and still results in the same loop behavior.
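For context, the rejection policy was set in the tuningConfig section of tranquility-kafka.json roughly like this (a sketch; the surrounding fields are illustrative, not our full spec):

```json
{
  "tuningConfig": {
    "type": "realtime",
    "windowPeriod": "PT25S",
    "intermediatePersistPeriod": "PT25S",
    "rejectionPolicy": { "type": "serverTime" }
  }
}
```

(As observed, Tranquility appears to override this to “none”, since it enforces the windowPeriod itself rather than relying on the task's rejection policy.)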

Can someone let us know the root cause of this issue?

Thanks,

Ramya

tranquility-kafka.json (2.43 KB)

Hi Ramya,

See if this FAQ helps (“my tasks are never exiting”): https://github.com/druid-io/tranquility/blob/master/docs/trouble.md#my-tasks-are-never-exiting

Hi Gian,

We are able to load the data into both the Historical segment cache and HDFS.

For segments of minute granularity, it takes ~7-8 minutes for the complete indexing process to finish.

The indexing task keeps looping through the log messages below 5-6 times before the actual handoff happens:

2018-03-01T10:45:05,002 INFO [watchtower-analytics-overseer-1] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.

2018-03-01T10:45:05,002 INFO [watchtower-analytics-overseer-1] io.druid.segment.realtime.plumber.RealtimePlumber - Found [1] segments. Attempting to hand off segments that start before [1970-01-01T00:00:00.000Z].

2018-03-01T10:45:05,002 INFO [watchtower-analytics-overseer-1] io.druid.segment.realtime.plumber.RealtimePlumber - Skipping persist and merge for entry [1519900980000=Sink{interval=2018-03-01T10:43:00.000Z/2018-03-01T10:44:00.000Z, schema=DataSchema{dataSource='watchtower-analytics', parser={type=map, parseSpec={format=json, timestampSpec={column=t, format=millis, missingValue=null}, dimensionsSpec={dimensions=[b, d, z, n, v, vi, i, e, m], spatialDimensions=[]}}}, aggregators=[CountAggregatorFactory{name='count'}, ApproximateHistogramFoldingAggregatorFactory{name='resp_time', fieldName='p', resolution=50, numBuckets=7, lowerLimit=-Infinity, upperLimit=Infinity}], granularitySpec=UniformGranularitySpec{segmentGranularity={type=period, period=PT1M, timeZone=UTC, origin=null}, queryGranularity=NoneGranularity, rollup=true, inputIntervals=null, wrappedSpec=null}}}] : Start time [2018-03-01T10:43:00.000Z] >= [1970-01-01T00:00:00.000Z] min timestamp required in this run. Segment will be picked up in a future run.

2018-03-01T10:45:05,002 INFO [watchtower-analytics-overseer-1] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] sinks to persist and merge

We tried the approaches below to reduce the time taken, but didn’t see any improvement:

  1. Reduced the windowPeriod from 25s to 5s (and the intermediatePersistPeriod as well)

  2. Increased the capacity on Historical nodes

  3. Increased the task.partitions parameter from 1 to 2
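For reference, approaches 1 and 3 correspond to settings like these in the per-datasource section of the Tranquility spec (a sketch with the values described above; other fields omitted):

```json
{
  "spec": {
    "tuningConfig": {
      "type": "realtime",
      "windowPeriod": "PT5S",
      "intermediatePersistPeriod": "PT5S"
    }
  },
  "properties": {
    "task.partitions": "2"
  }
}
```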

Can you suggest what we are missing to reduce this time?

Or is this the expected time for the indexing process to take for each segment?

Thanks,

Ramya

Check this property:
“druidBeam.firehoseGracePeriod”: “PT1S”
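That property goes in the properties section of the Tranquility spec, e.g. (a sketch; other properties omitted):

```json
{
  "properties": {
    "druidBeam.firehoseGracePeriod": "PT1S"
  }
}
```

The grace period controls how long the task’s firehose stays open after the windowPeriod closes; per the Tranquility configuration docs the default is PT5M, which lines up with the ~5-minute shutoff below.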

A task lives for roughly the segmentGranularity period + the windowPeriod + however long pushing takes + however long handoff takes.
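As a rough worked example with the numbers from this thread (the push and handoff durations are illustrative assumptions, not measured values):

```python
# Rough estimate of a realtime task's lifetime (all values in seconds).
segment_granularity = 60   # segmentGranularity = PT1M
window_period = 5          # windowPeriod, after reducing from 25s
push_time = 120            # assumed: merging + pushing to deep storage
handoff_time = 240         # assumed: Historical load + handoff confirmation
lifetime = segment_granularity + window_period + push_time + handoff_time
print(f"~{lifetime / 60:.1f} minutes")  # → ~7.1 minutes
```

With minute-granularity segments and a tiny windowPeriod, most of the observed 7-8 minutes would have to come from the push and handoff stages (plus the firehose grace period), not from the window itself.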

The default shutoff time may be around 5 minutes.

Check your overlord console and look at the task payload link for the ‘shutoffTime’ field value.