Druid realtime unable to hand off segments

Our Druid realtime nodes had been running fine for some time, but recently we hit what looks like a hard limit related to druid.processing.buffer.sizeBytes. We have it configured to 2GB, but that does not seem to be adequate for merging some of the segments. I see the following errors in the log for the broken/missing segments:
```
2016-05-02T12:33:50,893 ERROR [weaver_events-2016-04-29T02:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Failed to persist merged index[weaver_events]: {class=io.druid.segment.realtime.plumber.RealtimePlumber, exceptionType=class com.metamx.common.IAE, exceptionMessage=Asked to add buffers[2,439,613,871] larger than configured max[2,147,483,647], interval=2016-04-29T02:00:00.000Z/2016-04-29T03:00:00.000Z}
com.metamx.common.IAE: Asked to add buffers[2,439,613,871] larger than configured max[2,147,483,647]
    at com.metamx.common.io.smoosh.FileSmoosher.addWithSmooshedWriter(FileSmoosher.java:152) ~[java-util-0.27.7.jar:?]
    at io.druid.segment.IndexIO$DefaultIndexIOHandler.convertV8toV9(IndexIO.java:744) ~[druid-processing-0.9.0.jar:0.9.0]
    at io.druid.segment.IndexMerger.makeIndexFiles(IndexMerger.java:1009) ~[druid-processing-0.9.0.jar:0.9.0]
    at io.druid.segment.IndexMerger.merge(IndexMerger.java:421) ~[druid-processing-0.9.0.jar:0.9.0]
    at io.druid.segment.IndexMerger.mergeQueryableIndex(IndexMerger.java:242) ~[druid-processing-0.9.0.jar:0.9.0]
    at io.druid.segment.IndexMerger.mergeQueryableIndex(IndexMerger.java:215) ~[druid-processing-0.9.0.jar:0.9.0]
    at io.druid.segment.realtime.plumber.RealtimePlumber$4.doRun(RealtimePlumber.java:536) [druid-server-0.9.0.jar:0.9.0]
    at io.druid.common.guava.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:42) [druid-common-0.9.0.jar:0.9.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_60]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_60]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
```

I thought OK, I just need to raise the buffer size a bit, no problem; but upon setting the new value I get the following exception. It seems like there is a hard limit at 2GB? How do I fix this?

```
1) Error in custom provider, java.lang.NumberFormatException: For input string: "3073741824"
  at io.druid.guice.ConfigProvider.bind(ConfigProvider.java:44)
  at io.druid.guice.ConfigProvider.bind(ConfigProvider.java:44)
  while locating io.druid.query.DruidProcessingConfig
  at io.druid.guice.DruidProcessingModule.getProcessingExecutorService(DruidProcessingModule.java:92)
  at io.druid.guice.DruidProcessingModule.getProcessingExecutorService(DruidProcessingModule.java:92)
  while locating java.util.concurrent.ExecutorService annotated with @io.druid.guice.annotations.Processing()
    for parameter 0 at io.druid.query.IntervalChunkingQueryRunnerDecorator.<init>(IntervalChunkingQueryRunnerDecorator.java:37)
  while locating io.druid.query.IntervalChunkingQueryRunnerDecorator
    for parameter 0 at io.druid.query.timeseries.TimeseriesQueryQueryToolChest.<init>(TimeseriesQueryQueryToolChest.java:71)
  at io.druid.guice.QueryToolChestModule.configure(QueryToolChestModule.java:74)
  while locating io.druid.query.timeseries.TimeseriesQueryQueryToolChest
    for parameter 0 at io.druid.query.timeseries.TimeseriesQueryRunnerFactory.<init>(TimeseriesQueryRunnerFactory.java:53)
  at io.druid.guice.QueryRunnerFactoryModule.configure(QueryRunnerFactoryModule.java:82)
  while locating io.druid.query.timeseries.TimeseriesQueryRunnerFactory
  while locating io.druid.query.QueryRunnerFactory annotated with @com.google.inject.multibindings.Element(setName=,uniqueId=18, type=MAPBINDER)
  at io.druid.guice.DruidBinders.queryRunnerFactoryBinder(DruidBinders.java:38)
  while locating java.util.Map<java.lang.Class<? extends io.druid.query.Query>, io.druid.query.QueryRunnerFactory>
    for parameter 0 at io.druid.query.DefaultQueryRunnerFactoryConglomerate.<init>(DefaultQueryRunnerFactoryConglomerate.java:36)
  while locating io.druid.query.DefaultQueryRunnerFactoryConglomerate
  at io.druid.guice.StorageNodeModule.configure(StorageNodeModule.java:55)
  while locating io.druid.query.QueryRunnerFactoryConglomerate
```
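(Both limits in these errors are Integer.MAX_VALUE, i.e. 2,147,483,647. Druid memory-maps its smoosh files, and Java addresses a mapped ByteBuffer with a 32-bit signed int, so a single mapped file cannot exceed 2GB; the NumberFormatException likewise indicates the property is parsed as a Java int, so any value above 2,147,483,647 is rejected before it is ever applied. In other words, 2GB is a hard ceiling on this setting. Illustratively:)

```properties
# runtime.properties -- illustrative values, not a recommendation
# 2147483647 bytes (Integer.MAX_VALUE) is the largest value that will parse:
druid.processing.buffer.sizeBytes=2147483647
# 3073741824 overflows a 32-bit signed int, hence the NumberFormatException above
```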

How big are your segments? How many rows are in each?

Druid segments should be around 300-700MB in size, with around 5M rows each.

Segments are around 2-4GB in size, and this is what I see in the logs for the persist size:

```
2016-05-03T07:11:30,582 INFO [events-incremental-persist] io.druid.segment.IndexMerger - Starting persist for interval[2016-05-03T07:00:00.000Z/2016-05-03T08:00:00.000Z], rows[500,000]
```
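(The rows[500,000] above is presumably the intermediate-persist threshold, which in a realtime ingestion spec corresponds to maxRowsInMemory in the tuningConfig; a minimal sketch with assumed values:)

```json
{
  "tuningConfig": {
    "type": "realtime",
    "maxRowsInMemory": 500000,
    "intermediatePersistPeriod": "PT10M"
  }
}
```

(At the hourly row counts below, each hour's segment would then be merged from dozens of such 500K-row persists.)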

Here is the result of a count query by hour:

```
user@druid-realtime:~$ curl -X POST 'http://localhost:8084/druid/v2/?pretty' -H 'content-type: application/json' -d @query.q
[ { "timestamp" : "2016-05-03T00:00:00.000Z", "result" : { "count" : 41555535 } },
  { "timestamp" : "2016-05-03T01:00:00.000Z", "result" : { "count" : 46411458 } },
  { "timestamp" : "2016-05-03T02:00:00.000Z", "result" : { "count" : 40967300 } },
  { "timestamp" : "2016-05-03T03:00:00.000Z", "result" : { "count" : 32833436 } },
  { "timestamp" : "2016-05-03T04:00:00.000Z", "result" : { "count" : 29186762 } },
  { "timestamp" : "2016-05-03T05:00:00.000Z", "result" : { "count" : 24195599 } },
  { "timestamp" : "2016-05-03T06:00:00.000Z", "result" : { "count" : 20235289 } },
  { "timestamp" : "2016-05-03T07:00:00.000Z", "result" : { "count" : 16825411 } },
  { "timestamp" : "2016-05-03T08:00:00.000Z", "result" : { "count" : 14114823 } },
  { "timestamp" : "2016-05-03T09:00:00.000Z", "result" : { "count" : 13307833 } },
  { "timestamp" : "2016-05-03T10:00:00.000Z", "result" : { "count" : 15092930 } },
  { "timestamp" : "2016-05-03T11:00:00.000Z", "result" : { "count" : 20319835 } },
  { "timestamp" : "2016-05-03T12:00:00.000Z", "result" : { "count" : 20064067 } },
  { "timestamp" : "2016-05-03T13:00:00.000Z", "result" : { "count" : 20766580 } },
  { "timestamp" : "2016-05-03T14:00:00.000Z", "result" : { "count" : 24863256 } },
  { "timestamp" : "2016-05-03T15:00:00.000Z", "result" : { "count" : 24225975 } },
  { "timestamp" : "2016-05-03T16:00:00.000Z", "result" : { "count" : 31565259 } },
  { "timestamp" : "2016-05-03T17:00:00.000Z", "result" : { "count" : 34518830 } },
  { "timestamp" : "2016-05-03T18:00:00.000Z", "result" : { "count" : 25000000 } } ]
```
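(The contents of query.q aren't shown in the thread; given the hourly buckets, it was presumably a timeseries count along these lines. The datasource name and interval here are assumptions:)

```json
{
  "queryType": "timeseries",
  "dataSource": "weaver_events",
  "granularity": "hour",
  "aggregations": [ { "type": "count", "name": "count" } ],
  "intervals": [ "2016-05-03T00:00:00.000Z/2016-05-03T19:00:00.000Z" ]
}
```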

You need way more partitions and much smaller segments.

Thanks Fangjin!

Is there any way to fix the data that has already been collected? I was under the assumption that modifying these parameters is not retroactive. I currently have only one realtime node consuming; I am switching over to Tranquility soon, but I am trying to minimize data loss.

-Pere

I forgot to ask: will Tranquility handle these issues automatically? Or is this more of a problem with the segment length?

Thanks,

Pere

With Tranquility you can define the number of partitions you want, and it'll create the additional required segments and manage the sharding for you. You can do the same thing with realtime nodes; however, it requires much more manual tuning. You can set up multiple realtime nodes, each with a different partition number in its shardSpec.
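(As a rough sizing check against the counts above: the peak hour holds ~46M rows versus the recommended ~5M rows per segment, so something on the order of 8-10 partitions per hour would be in range. For the realtime-node route, each node's spec carries a distinct partition number; a minimal sketch of the relevant tuningConfig fragment, other fields omitted:)

```json
{
  "tuningConfig": {
    "type": "realtime",
    "shardSpec": { "type": "linear", "partitionNum": 0 }
  }
}
```

(A second node ingesting the same datasource would use "partitionNum": 1, and so on.)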

Thanks again. Will go down that path then!

@Pere I am also facing the same issue. Were you able to resolve this? What did you change?