Druid realtime Kafka ingestion fails with java.util.TreeMap$NavigableSubMap error

All realtime ingestion tasks fail with this error (stack trace below).

I have also hard-reset the supervisor and tried setting useEarliestOffset to both true and false. In every scenario, the same error occurs. Another data source running in parallel is doing fine.

```
2022-10-27T14:00:47,590 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception while running task.

java.lang.IllegalArgumentException: fromKey > toKey
	at java.util.TreeMap$NavigableSubMap.<init>(TreeMap.java:1368) ~[?:1.8.0_282]
	at java.util.TreeMap$AscendingSubMap.<init>(TreeMap.java:1855) ~[?:1.8.0_282]
	at java.util.TreeMap.subMap(TreeMap.java:913) ~[?:1.8.0_282]
	at org.apache.druid.timeline.partition.OvershadowableManager.entryIteratorGreaterThan(OvershadowableManager.java:423) ~[druid-core-0.22.1.jar:0.22.1]
	at org.apache.druid.timeline.partition.OvershadowableManager.findOvershadowedBy(OvershadowableManager.java:299) ~[druid-core-0.22.1.jar:0.22.1]
	at org.apache.druid.timeline.partition.OvershadowableManager.findOvershadowedBy(OvershadowableManager.java:275) ~[druid-core-0.22.1.jar:0.22.1]
	at org.apache.druid.timeline.partition.OvershadowableManager.moveNewStandbyToVisibleIfNecessary(OvershadowableManager.java:456) ~[druid-core-0.22.1.jar:0.22.1]
	at org.apache.druid.timeline.partition.OvershadowableManager.determineVisibleGroupAfterAdd(OvershadowableManager.java:432) ~[druid-core-0.22.1.jar:0.22.1]
	at org.apache.druid.timeline.partition.OvershadowableManager.addAtomicUpdateGroupWithState(OvershadowableManager.java:629) ~[druid-core-0.22.1.jar:0.22.1]
	at org.apache.druid.timeline.partition.OvershadowableManager.addChunk(OvershadowableManager.java:699) ~[druid-core-0.22.1.jar:0.22.1]
	at org.apache.druid.timeline.partition.PartitionHolder.add(PartitionHolder.java:70) ~[druid-core-0.22.1.jar:0.22.1]
	at org.apache.druid.timeline.partition.PartitionHolder.<init>(PartitionHolder.java:52) ~[druid-core-0.22.1.jar:0.22.1]
	at org.apache.druid.timeline.VersionedIntervalTimeline.addAll(VersionedIntervalTimeline.java:210) ~[druid-core-0.22.1.jar:0.22.1]
	at org.apache.druid.timeline.VersionedIntervalTimeline.add(VersionedIntervalTimeline.java:189) ~[druid-core-0.22.1.jar:0.22.1]
	at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.getOrCreateSink(StreamAppenderator.java:482) ~[druid-server-0.22.1.jar:0.22.1]
	at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.add(StreamAppenderator.java:264) ~[druid-server-0.22.1.jar:0.22.1]
	at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.append(BaseAppenderatorDriver.java:410) ~[druid-server-0.22.1.jar:0.22.1]
	at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.add(StreamAppenderatorDriver.java:187) ~[druid-server-0.22.1.jar:0.22.1]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:639) ~[druid-indexing-service-0.22.1.jar:0.22.1]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:263) [druid-indexing-service-0.22.1.jar:0.22.1]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:146) [druid-indexing-service-0.22.1.jar:0.22.1]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:471) [druid-indexing-service-0.22.1.jar:0.22.1]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:443) [druid-indexing-service-0.22.1.jar:0.22.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
	Suppressed: java.lang.RuntimeException: java.lang.IllegalArgumentException: fromIndex(0) > toIndex(-1)
		at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.persist(StreamAppenderatorDriver.java:239) ~[druid-server-0.22.1.jar:0.22.1]
		at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:757) ~[druid-indexing-service-0.22.1.jar:0.22.1]
		at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:263) [druid-indexing-service-0.22.1.jar:0.22.1]
		at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:146) [druid-indexing-service-0.22.1.jar:0.22.1]
		at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:471) [druid-indexing-service-0.22.1.jar:0.22.1]
		at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:443) [druid-indexing-service-0.22.1.jar:0.22.1]
		at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
		at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
	Caused by: java.lang.IllegalArgumentException: fromIndex(0) > toIndex(-1)
		at java.util.ArrayList.subListRangeCheck(ArrayList.java:1016) ~[?:1.8.0_282]
		at java.util.ArrayList.subList(ArrayList.java:1006) ~[?:1.8.0_282]
		at org.apache.druid.segment.realtime.appenderator.StreamAppenderator.persistAll(StreamAppenderator.java:585) ~[druid-server-0.22.1.jar:0.22.1]
		at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.persist(StreamAppenderatorDriver.java:231) ~[druid-server-0.22.1.jar:0.22.1]
		... 9 more
```

Has there been any resharding of the failing Kafka topic?

No, but there has been some pausing and probably some offset displacement/gaps.

Interestingly, I cannot get the supervisor to work at all.
It just errors (probably due to empty segments, which in turn are caused by incorrect offsets).

Things I have tried:

Hard-resetting the supervisor
Deleting and re-creating the datasource
Changing all offset-related configs to use the earliest offset (and vice versa, turning the earliest-offset config off to read from the latest offset)

None of these has resolved the supervisor's constant errors.

Eventually, after waiting and turning off useEarliestOffset, the supervisor started working again.

Can you share the ingestion spec?
I think I’ve seen this reported before when there are too many partitions per time interval (segment granularity). There is a maximum of roughly 32k partitions per time interval.

So, it could be that your ingestion had accumulated that many partitions, and a clean restart will work for a while until it reaches that point again.

We’ve seen this when the __time field is static or the segment granularity is too wide (e.g. MONTH), causing the creation of too many partitions for the same time bucket.
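For illustration, the knob in question is segmentGranularity inside the ingestion spec's granularitySpec. A minimal sketch with DAY instead of MONTH (the other field values here are illustrative defaults, not taken from this thread):

```json
{
  "granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "NONE",
    "rollup": true
  }
}
```

With DAY granularity, partitions are spread across ~30x more time buckets per month, so each bucket stays much further from the partition ceiling.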

I guess you are right. The issue just came back.
This is probably due to having many partitions (?).
My segment granularity is set to MONTH, yes.

Also, where in the documentation can I see the 32k limit?

If it helps, my tasks are throwing these errors: org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner - Encountered exception in run() before persisting.

```
shardSpec=NumberedShardSpec{partitionNum=65570, partitions=0}, lastCompactionState=null, size=81195766}}]}]"}]. Check overlord logs for details.
	at org.apache.druid.indexing.common.actions.RemoteTaskActionClient.submit(RemoteTaskActionClient.java:98) ~[druid-indexing-service-0.22.1.jar:0.22.1]
	at org.apache.druid.indexing.appenderator.ActionBasedSegmentAllocator.allocate(ActionBasedSegmentAllocator.java:57) ~[druid-indexing-service-0.22.1.jar:0.22.1]
	at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.getSegment(BaseAppenderatorDriver.java:337) ~[druid-server-0.22.1.jar:0.22.1]
	at org.apache.druid.segment.realtime.appenderator.BaseAppenderatorDriver.append(BaseAppenderatorDriver.java:406) ~[druid-server-0.22.1.jar:0.22.1]
	at org.apache.druid.segment.realtime.appenderator.StreamAppenderatorDriver.add(StreamAppenderatorDriver.java:187) ~[druid-server-0.22.1.jar:0.22.1]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.runInternal(SeekableStreamIndexTaskRunner.java:639) [druid-indexing-service-0.22.1.jar:0.22.1]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskRunner.run(SeekableStreamIndexTaskRunner.java:263) [druid-indexing-service-0.22.1.jar:0.22.1]
	at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTask.run(SeekableStreamIndexTask.java:146) [druid-indexing-service-0.22.1.jar:0.22.1]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:471) [druid-indexing-service-0.22.1.jar:0.22.1]
	at org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner$SingleTaskBackgroundRunnerCallable.call(SingleTaskBackgroundRunner.java:443) [druid-indexing-service-0.22.1.jar:0.22.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
```

The partition number is a 2-byte signed integer, hence the ~32k limit. I couldn’t find it in the docs either; I’ve made a note to get that updated.

If you look at the partitionNum=65570, that is indicative of this issue.
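The arithmetic behind that limit, as a quick sketch (the partition number is taken from the shardSpec in the stack trace above):

```python
# A 2-byte (16-bit) signed integer can hold at most 2**15 - 1,
# i.e. Java's Short.MAX_VALUE.
SHORT_MAX = 2**15 - 1  # 32767

partition_num = 65570  # value from the failing task's shardSpec

print(SHORT_MAX)                  # 32767
print(partition_num > SHORT_MAX)  # True: well past the limit
```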

Why do you have so many segment partitions in a month? How many rows per month do you expect?
Segment file size optimization is an important part of tuning Druid. Too many segment files generate overhead; segment partitions should ideally have 1-5 million rows each.
If you have a lot more data than that per month, then perhaps daily or hourly segment granularity makes more sense.

With any realtime ingestion that runs multiple tasks, you will get a partition from each task at every handoff interval (every taskDuration, or every intermediateHandoffPeriod).
This can generate a lot of segments quickly. Real-time ingestion benefits from the parallelism in order to increase ingestion throughput, but it has this side-effect. The best practice is to use auto-compaction such that these segment partitions are merged often.
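As a back-of-envelope illustration of how quickly partitions can pile up in a single MONTH bucket without compaction (the taskCount and taskDuration values below are assumptions for the sake of the arithmetic, not values from this thread):

```python
# Each handoff produces one segment partition per running task.
task_count = 4            # assumed supervisor taskCount
task_duration_hours = 1   # assumed taskDuration of PT1H
hours_per_month = 24 * 30

handoffs_per_month = hours_per_month // task_duration_hours
partitions_per_month = task_count * handoffs_per_month
print(partitions_per_month)  # 2880 partitions landing in one MONTH bucket
```

At that rate, a time bucket that keeps receiving data (e.g. because __time is static) would cross the ~32k ceiling in under a year, which is why frequent auto-compaction matters.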

Also, if you still end up with a lot of data per time chunk, look at the documentation section on partitioning, which helps significantly with segment partition pruning at query time, improving performance and enabling higher concurrency.
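As a sketch, an auto-compaction rule can be submitted per datasource through the Coordinator API (`POST /druid/coordinator/v1/config/compaction`); the datasource name and thresholds below are placeholders, not values from this thread:

```json
{
  "dataSource": "my_datasource",
  "skipOffsetFromLatest": "P1D",
  "tuningConfig": {
    "partitionsSpec": {
      "type": "dynamic",
      "maxRowsPerSegment": 5000000
    }
  }
}
```

The 5M-row target matches the 1-5 million rows-per-segment guidance above; `skipOffsetFromLatest` keeps compaction from contending with segments that realtime tasks are still writing.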

Well, it seems this limitation is causing issues.
Because the coordinator can no longer return all segments, the Datasources tab in the web console is no longer accessible (along with many other problems).

I had to manage it by temporarily excluding the data source.

Thanks!