Committing offsets and repeated consumption

I have multiple realtime nodes reading from Kafka with the same groupId. When a node is lost, the other consumers (realtime nodes) rebalance, but offsets have not been committed, which leads to partial repeated consumption of data.

Why not use Kafka's auto.commit.enable=true instead of committing manually?

When query load is heavy, the realtime node is easily lost. Why?


We disable autocommit because otherwise we would end up committing offsets for a lot of data that is still in memory and hasn't been written to disk.
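The trade-off above can be illustrated with a minimal simulation (plain Python, not Druid code; the offsets and persist point are made up for illustration): committing only persisted offsets means a rebalance replays some rows (at-least-once), while autocommit can acknowledge rows that exist only in memory, which would be lost on a crash.

```python
# Minimal sketch of why committing only persisted offsets is preferred.
messages = list(range(10))       # the partition's log, offsets 0..9
persisted_up_to = 6              # offsets 0..5 have been written to disk
in_memory = messages[6:8]        # offsets 6..7 exist only in memory

# Manual commit after persist: committed offset tracks the persisted offset.
committed = persisted_up_to
# The node dies; after rebalance, a consumer resumes from the committed offset.
replayed = messages[committed:]  # offsets 6..9 are re-read: 6 and 7 are duplicates
assert in_memory[0] in replayed  # at-least-once: nothing is lost

# With autocommit, the commit can run ahead of the persist:
auto_committed = 8               # offsets 6..7 committed but never persisted
lost = messages[persisted_up_to:auto_committed]
assert lost == [6, 7]            # those rows would vanish if the node crashed
```

So the duplicate consumption you observed is the deliberate price of not losing in-memory data.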

Thank you for your answer.

Do you have any good suggestions for the problems described above?

On Friday, October 9, 2015 at 10:40:16 PM UTC+8, Gian Merlino wrote:

Right now, the main suggestion is to use a hybrid batch/realtime setup to reload data periodically in batches. The batch loaded data will replace realtime loaded data for the same interval.
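The "batch replaces realtime" behavior works through Druid-style segment versioning: for the same interval, a segment with a newer version shadows the older one. A minimal sketch of that idea (the dict fields here are illustrative, not Druid's actual API):

```python
# Segments loaded for the same interval; the batch reindex produces a
# newer version, which shadows the realtime-ingested one.
segments = [
    {"interval": "2015-10-09T00/2015-10-10T00", "version": "v1", "source": "realtime"},
    {"interval": "2015-10-09T00/2015-10-10T00", "version": "v2", "source": "batch"},
]

def visible(segments):
    """Return, per interval, only the segment with the highest version."""
    best = {}
    for s in segments:
        cur = best.get(s["interval"])
        if cur is None or s["version"] > cur["version"]:
            best[s["interval"]] = s
    return list(best.values())

assert visible(segments)[0]["source"] == "batch"  # batch data wins
```

This is why periodic batch reloads repair any duplicates introduced by realtime re-consumption: queries only ever see the newest version for an interval.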

For the future, there is some work underway to switch off of the Kafka high-level consumer and onto a consumer that will give us more control over how to handle committing and rebalances. The tracking issue is:

Thanks, Merlino.

But I don’t know what it is.
When query load is heavy, the realtime node is easily lost. Why?

This is the broker node log:
2015-10-09T20:49:36,279 INFO [ServerInventoryView-0] io.druid.client.BatchServerInventoryView - Server Disappeared[DruidServerMetadata{name='xxx', host='xxx', maxSize=0, tier='_default_tier', type='realtime', priority='0'}]

On Saturday, October 10, 2015 at 9:41:53 AM UTC+8, Gian Merlino wrote:

What do you mean the realtime node is lost due to queries? Does it go offline? That should never happen. Can you describe a bit more what is happening?

realtime.spec configuration:

"intermediatePersistPeriod": "PT10s",

"windowPeriod": "PT50m"

So the last two hours of data will reside on the realtime node.
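The "two hours" follows from the handoff timing. A hedged model (assuming segmentGranularity=HOUR, which isn't stated in the thread): a segment covering hour [H, H+1) can only be handed off to deep storage once windowPeriod has elapsed past the segment's end, so just before handoff the node holds that segment plus the in-progress one.

```python
from datetime import datetime, timedelta

def handoff_time(segment_start, window_period):
    # Segment for [H, H+1) becomes eligible for handoff at H + 1h + windowPeriod.
    return segment_start + timedelta(hours=1) + window_period

window = timedelta(minutes=50)          # "windowPeriod": "PT50m"
segment = datetime(2015, 10, 9, 20, 0)  # segment covering 20:00-21:00
# Handoff at 21:50; until then the node serves the 20:00 segment plus the
# in-progress 21:00 segment, i.e. roughly the last two hours of data.
assert handoff_time(segment, window) == datetime(2015, 10, 9, 21, 50)
```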

I tried to reproduce the problem. When I select the last hour, or the last day, of data, it is not triggered.

But when I select the last week's data with queryType=groupBy, it is.

The amount of data in this datasource is about 56 GB.

I have 7 historical nodes distributed across three servers.

I found that on one server, 1 historical node and 3 realtime nodes disappeared, but a few minutes later they were restored.

Is my cluster configuration unreasonable, resulting in too much pressure on that server?

You can see the state of the historical node cluster in the attachment.


On Monday, October 12, 2015 at 4:05:14 AM UTC+8, Fangjin Yang wrote:


My problem is solved. I added a server and reduced the previously overloaded server to a single historical node.

