Kinesis ingestion tasks fail to publish subsequent partitions (segment update)

My supervisor spec is as follows:

```
{
  "type": "kinesis",
  "dataSchema": {
    "dataSource": "my-ds",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "dimensionsSpec": {
          "dimensions": [
            "eventId",
            "providedHashes",
            "s2sEnabled",
            "source",
            "platform",
            {
              "name": "advertiserId",
              "type": "long"
            },
            {
              "name": "pixelId",
              "type": "long"
            },
            "decisionIds",
            "appId",
            "pageUrl",
            "xForwardedFor",
            "trackerVersion",
            "resolvedHashes",
            "userDetails",
            "contentDetails",
            "transactionDetails"
          ]
        },
        "timestampSpec": {
          "column": "timestamp",
          "format": "iso"
        }
      }
    },
    "metricsSpec": [],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": {
        "type": "none"
      },
      "rollup": false,
      "intervals": null
    },
    "transformSpec": {
      "filter": null,
      "transforms": []
    }
  },
  "tuningConfig": {
    "type": "kinesis",
    "maxRowsInMemory": 100000,
    "maxBytesInMemory": 100000000,
    "maxRowsPerSegment": 5000000,
    "intermediatePersistPeriod": "PT10M",
    "maxPendingPersists": 0,
    "indexSpec": {
      "bitmap": {
        "type": "concise"
      },
      "dimensionCompression": "lz4",
      "metricCompression": "lz4",
      "longEncoding": "longs"
    },
    "buildV9Directly": true,
    "reportParseExceptions": false,
    "handoffConditionTimeout": 0,
    "resetOffsetAutomatically": false,
    "skipSequenceNumberAvailabilityCheck": false,
    "workerThreads": null,
    "chatThreads": null,
    "chatRetries": 8,
    "httpTimeout": "PT10S",
    "shutdownTimeout": "PT80S",
    "recordBufferSize": 10000,
    "recordBufferOfferTimeout": 5000,
    "recordBufferFullWait": 5000,
    "fetchSequenceNumberTimeout": 60000,
    "fetchThreads": 20,
    "logParseExceptions": true,
    "maxParseExceptions": 100,
    "maxSavedParseExceptions": 10
  },
  "ioConfig": {
    "stream": "my-stream",
    "endpoint": "kinesis.us-east-1.amazonaws.com",
    "replicas": 1,
    "taskCount": 1,
    "taskDuration": "PT300S",
    "startDelay": "PT5S",
    "period": "PT30S",
    "useEarliestSequenceNumber": false,
    "completionTimeout": "PT21600S",
    "lateMessageRejectionPeriod": null,
    "earlyMessageRejectionPeriod": null,
    "recordsPerFetch": 1000,
    "fetchDelayMillis": 200,
    "awsAccessKeyId": null,
    "awsSecretAccessKey": null,
    "awsAssumedRoleArn": "arn:aws:iam::XXXXXXXXXXXXXX:role/kinesis-reader",
    "awsExternalId": null,
    "deaggregate": false
  }
}
```
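
For reference, this is roughly how I drive the supervisor API (a minimal sketch using Python's standard library; the overlord address, spec filename, and supervisor ID are placeholders for my setup, and the endpoints are the overlord's `/druid/indexer/v1/supervisor` routes):

```python
# Sketch of submitting the spec and resetting the supervisor via the
# overlord HTTP API. Host, port, file name, and supervisor ID are placeholders.
import json
import urllib.request

OVERLORD = "http://overlord:8090"  # placeholder overlord address

def submit_supervisor(spec: dict) -> dict:
    """POST the supervisor spec; creates the supervisor or updates it in place."""
    req = urllib.request.Request(
        f"{OVERLORD}/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def reset_supervisor(supervisor_id: str) -> dict:
    """POST to the reset endpoint; clears the supervisor's stored offsets so
    tasks restart from the stream's earliest/latest sequence numbers."""
    req = urllib.request.Request(
        f"{OVERLORD}/druid/indexer/v1/supervisor/{supervisor_id}/reset",
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    with open("kinesis-spec.json") as f:  # the spec shown above
        print(submit_supervisor(json.load(f)))
```

The reset call is what I mean below when I say "reset supervisor".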

If I reset the supervisor, the first task builds the segment, uploads it to deep storage, and is marked SUCCESSFUL. However, every subsequent task fails as follows:

```
2019-03-27T00:43:53,934 INFO [publish-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Transaction failure while publishing segments, removing them from deep storage and checking if someone else beat us to publishing.
2019-03-27T00:43:53,954 INFO [publish-0] org.apache.druid.storage.s3.S3DataSegmentKiller - Removing index file[s3://REDACTED/2019-03-27T00:00:00.000Z_2019-03-27T01:00:00.000Z/2019-03-27T00:00:43.546Z/2/bddb8231-cf6f-42c8-b5f8-35890a6965d9/index.zip] from s3!
2019-03-27T00:43:53,972 INFO [publish-0] org.apache.druid.storage.s3.S3DataSegmentKiller - Removing descriptor file[s3://REDACTED/2019-03-27T00:00:00.000Z_2019-03-27T01:00:00.000Z/2019-03-27T00:00:43.546Z/2/bddb8231-cf6f-42c8-b5f8-35890a6965d9/descriptor.json] from s3!
2019-03-27T00:43:53,983 INFO [publish-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_kinesis_my-stream_8564b0432c6bf9e_gcbpghdb]: SegmentListUsedAction{dataSource='my-stream', intervals=[2019-03-27T00:00:00.000Z/2019-03-27T01:00:00.000Z]}
2019-03-27T00:43:53,985 INFO [publish-0] org.apache.druid.indexing.common.actions.RemoteTaskActionClient - Submitting action for task[index_kinesis_my-stream_8564b0432c6bf9e_gcbpghdb] to overlord: [SegmentListUsedAction{dataSource='my-ds', intervals=[2019-03-27T00:00:00.000Z/2019-03-27T01:00:00.000Z]}].
2019-03-27T00:43:53,996 WARN [publish-0] org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver - Failed publish, not removing segments: [DataSegment{size=192323428, shardSpec=NumberedShardSpec{partitionNum=2, partitions=0}, metrics=[], dimensions=[eventId, providedHashes, s2sEnabled, source, platform, advertiserId, pixelId, decisionIds, appId, pageUrl, xForwardedFor, trackerVersion, resolvedHashes, userDetails, contentDetails, transactionDetails], version='2019-03-27T00:00:43.546Z', loadSpec={type=>s3_zip, bucket=REDACTED, key=>base/v1/REDACTED/2019-03-27T00:00:00.000Z_2019-03-27T01:00:00.000Z/2019-03-27T00:00:43.546Z/2/bddb8231-cf6f-42c8-b5f8-35890a6965d9/index.zip, S3Schema=>s3n}, interval=2019-03-27T00:00:00.000Z/2019-03-27T01:00:00.000Z, dataSource='REDACTED', binaryVersion='9'}]
org.apache.druid.java.util.common.ISE: Failed to publish segments.
        at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.lambda$publishInBackground$8(BaseAppenderatorDriver.java:579) ~[druid-server-0.14.1-incubating-SNAPSHOT.jar:0.14.1-incubating-SNAPSHOT]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_202]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
2019-03-27T00:43:53,998 ERROR [publish-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Error while publishing segments for sequenceNumber[SequenceMetadata{sequenceName='index_kinesis_my-stream_8564b0432c6bf9e_0', sequenceId=0, startOffsets={shardId-000000000049=49590109558860927907438470706516573389254189571400270610, shardId-000000000038=49590109540061399705076439576449114390504682798106804834, shardId-000000000079=49591705748747982211861215407710837226827532335113569522, shardId-000000000068=49591053885945727499747396412322882923314596216517428290, shardId-000000000046=49590194961371026041540280598913219515173163055544533730, shardId-000000000056=49590109574003133897240047998125187726873019653596119938, shardId-000000000045=49592563485674197377115608238804929948321224555086480082, shardId-000000000034=49590115304982439486846207884047085831459939978697507362, shardId-000000000076=49590955254745081323300975290848036548561715289900713154, shardId-000000000086=49590415333588552727812113668457307111371967328054936930, shardId-000000000075=49590986837930160545079571702239810556203764165482906802, shardId-000000000064=49592104602771931563317714845989338678786014171503789058, shardId-000000000053=49590337630276929915209163381407177632101816953567445842, shardId-000000000041=49590109545837292711494047653995351368762799183372485266, shardId-000000000083=49590109643826767113839448631937400894218156822757377330, shardId-000000000061=49590392151674009167640127832878298700123797962157458386, shardId-000000000071=49590338266874002352470472843531810734413632068489577586, shardId-000000000060=49590986837595649367101612355035776752199858519468409794, shardId-000000000091=49591705772921990007071426902414714402714041731430483378, shardId-000000000090=49590109658255349257293145203671937588340852360420001186}, endOffsets={shardId-000000000038=49590109540061399705076439669961944389335499954297242210, shardId-000000000049=49590109558860927907438470777588113398578645288800486162, shardId-000000000079=49591705748747982211861215413807450135144128428755649778, shardId-000000000068=49591053885945727499747396755659024619688917284247569474, shardId-000000000046=49590194961371026041540280680427460754328785652776174306, shardId-000000000056=49590109574003133897240048089669886491371220101048042370, shardId-000000000034=49590115304982439486846208647492203398835587444188906018, shardId-000000000045=49592563485674197377115608247663938354457248312932172498, shardId-000000000076=49590955254745081323300975309429226396038586939452949698, shardId-000000000075=49590986837930160545079571720092018134453014422687319218, shardId-000000000053=49590337630276929915209163435139094610693646899193316178, shardId-000000000086=49590415333588552727812113954930413956352590946622244194, shardId-000000000064=49592104602771931563317715148984834700439774652060926978, shardId-000000000041=49590109545837292711494047670439160367161006452044202642, shardId-000000000083=49590109643826767113839448710292718966720753317762303282, shardId-000000000061=49590392151674009167640127850647090396819638737587667922, shardId-000000000071=49590338266874002352470473173828478540424563407594718322, shardId-000000000060=49590986837595649367101612372972609137822132887621731266, shardId-000000000091=49591705772921990007071427228219013863037011057327474098, shardId-000000000090=49590109658255349257293145349306395374036791823611987362}, assignments=[], sentinel=false, checkpointed=true}]
org.apache.druid.java.util.common.ISE: Failed to publish segments.
        at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.lambda$publishInBackground$8(BaseAppenderatorDriver.java:579) ~[druid-server-0.14.1-incubating-SNAPSHOT.jar:0.14.1-incubating-SNAPSHOT]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_202]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
2019-03-27T00:43:54,003 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Shutting down immediately...
2019-03-27T00:43:54,005 INFO [task-runner-0-priority-0] org.apache.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[my-stream_2019-03-27T00:00:00.000Z_2019-03-27T01:00:00.000Z_2019-03-27T00:00:43.546Z_2] at path[/druid/segments/dev-druid-middlemanager-0.dev-druid-middlemanager-headless.dev-druid.svc.cluster.local:8100/dev-druid-middlemanager-0.dev-druid-middlemanager-headless.dev-druid.svc.cluster.local:8100_indexer-executor__default_tier_2019-03-27T00:41:57.504Z_43340307fa134b13b8c7a471cd56fc430]
2019-03-27T00:43:54,005 INFO [task-runner-0-priority-0] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/segments/dev-druid-middlemanager-0.dev-druid-middlemanager-headless.dev-druid.svc.cluster.local:8100/dev-druid-middlemanager-0.dev-druid-middlemanager-headless.dev-druid.svc.cluster.local:8100_indexer-executor__default_tier_2019-03-27T00:41:57.504Z_43340307fa134b13b8c7a471cd56fc430]
2019-03-27T00:43:54,043 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.firehose.ServiceAnnouncingChatHandlerProvider - Unregistering chat handler[index_kinesis_my-stream_8564b0432c6bf9e_gcbpghdb]
2019-03-27T00:43:54,043 INFO [task-runner-0-priority-0] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannouncing [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/middleManager', host='dev-druid-middlemanager-0.dev-druid-middlemanager-headless.dev-druid.svc.cluster.local', bindOnHost=false, port=-1, plaintextPort=8100, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='PEON', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=0, type=indexer-executor, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}].
2019-03-27T00:43:54,043 INFO [task-runner-0-priority-0] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/internal-discovery/PEON/dev-druid-middlemanager-0.dev-druid-middlemanager-headless.dev-druid.svc.cluster.local:8100]
2019-03-27T00:43:54,047 INFO [task-runner-0-priority-0] org.apache.druid.curator.discovery.CuratorDruidNodeAnnouncer - Unannounced [DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/middleManager', host='dev-druid-middlemanager-0.dev-druid-middlemanager-headless.dev-druid.svc.cluster.local', bindOnHost=false, port=-1, plaintextPort=8100, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeType='PEON', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=0, type=indexer-executor, priority=0}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}].
2019-03-27T00:43:54,047 INFO [task-runner-0-priority-0] org.apache.druid.server.coordination.CuratorDataSegmentServerAnnouncer - Unannouncing self[DruidServerMetadata{name='dev-druid-middlemanager-0.dev-druid-middlemanager-headless.dev-druid.svc.cluster.local:8100', hostAndPort='dev-druid-middlemanager-0.dev-druid-middlemanager-headless.dev-druid.svc.cluster.local:8100', hostAndTlsPort='null', maxSize=0, tier='_default_tier', type=indexer-executor, priority=0}] at [/druid/announcements/dev-druid-middlemanager-0.dev-druid-middlemanager-headless.dev-druid.svc.cluster.local:8100]
2019-03-27T00:43:54,047 INFO [task-runner-0-priority-0] org.apache.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/dev-druid-middlemanager-0.dev-druid-middlemanager-headless.dev-druid.svc.cluster.local:8100]
2019-03-27T00:43:54,050 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception while running task.
java.util.concurrent.ExecutionException: org.apache.druid.java.util.common.ISE: Failed to publish segments.
        at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-16.0.1.jar:?]
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
        at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:720) ~[druid-indexing-service-0.14.1-incubating-SNAPSHOT.jar:0.14.1-incubating-SNAPSHOT]
        at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:246) [druid-indexing-service-0.14.1-incubating-SNAPSHOT.jar:0.14.1-incubating-SNAPSHOT]
        at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:166) [druid-indexing-service-0.14.1-incubating-SNAPSHOT.jar:0.14.1-incubating-SNAPSHOT]
        at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:419) [druid-indexing-service-0.14.1-incubating-SNAPSHOT.jar:0.14.1-incubating-SNAPSHOT]
        at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:391) [druid-indexing-service-0.14.1-incubating-SNAPSHOT.jar:0.14.1-incubating-SNAPSHOT]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_202]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_202]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_202]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_202]
Caused by: org.apache.druid.java.util.common.ISE: Failed to publish segments.
        at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.lambda$publishInBackground$8(BaseAppenderatorDriver.java:579) ~[druid-server-0.14.1-incubating-SNAPSHOT.jar:0.14.1-incubating-SNAPSHOT]
        ... 4 more
2019-03-27T00:43:54,055 INFO [task-runner-0-priority-0] org.apache.druid.indexing.overlord.TaskRunnerUtils - Task [index_kinesis_my-stream_8564b0432c6bf9e_gcbpghdb] status changed to [FAILED].
2019-03-27T00:43:54,058 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_kinesis_ny-stream_8564b0432c6bf9e_gcbpghdb",
  "status" : "FAILED",
  "duration" : 317796,
  "errorMsg" : "java.util.concurrent.ExecutionException: org.apache.druid.java.util.common.ISE: Failed to publish se..."
}

```

Any guidance is much appreciated!

Hi, a couple of PRs have been merged recently to fix this problem:

Would you check that your build includes all of them?
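
For background, and roughly speaking (this is a schematic sketch, not the actual Druid code): the task publishes segments transactionally. The metadata store keeps the stream offsets from the last successful publish, and a new publish only commits if those stored offsets match the offsets the publishing task started from; if they don't, you get the "Transaction failure while publishing segments" in your log, and the task removes its segments from deep storage again. Schematically:

```python
# Schematic sketch of a transactional (compare-and-swap) segment publish.
# All names here are hypothetical; the real logic runs inside a metadata
# store transaction on the overlord.
def publish_segments(metadata_store, segments, start_offsets, end_offsets):
    # Offsets recorded by the last successful publish for this datasource.
    committed = metadata_store.get_datasource_metadata()

    # Compare-and-swap: commit only if the stored offsets match what this
    # task started from; otherwise the bookkeeping has diverged and the
    # transaction must fail.
    if committed is not None and committed != start_offsets:
        return False  # -> "Transaction failure while publishing segments"

    metadata_store.insert_segments(segments)
    metadata_store.set_datasource_metadata(end_offsets)
    return True
```

This compare-and-swap is what gives exactly-once publishing across task generations, so a bug on either side of the comparison makes every publish after the first one fail.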

Jihoon

Yeah, just checked… I compiled Druid the day before yesterday, so some of these fixes are missing from my build…

I just want to make sure it's not something in the spec itself… I'm going to rebuild and try with a fresher snapshot.

I will post an update later today.

Ingestion works fine now. Thanks Jihoon!

I will continue my tests for data consistency.

Thank you for testing!

Jihoon