Historical node not serving segments

Hi all,

I was trying out Druid and following the tutorials. For the Druid cluster tutorial I started the coordinator, historical, broker, and realtime nodes, and I am using Kafka to load data into the Druid cluster.
The problem I see is that every time I query for data, it is served from the realtime node only. I am using exactly the settings suggested in the tutorial here.

I am attaching a screenshot of what is happening on all four services. I am running all of them on one VM.

The nodes in the screenshot are in the following order:

[Coordinator, broker]
[historical, realtime]

Please help me figure out the reason for this.

I also see this in the realtime node

Hi Amey,
I guess your rejectionPolicy is set to noop, which is causing handoff not to happen.

Try changing the rejectionPolicy to serverTime.

Also, with the serverTime rejection policy, the realtime node will only hand off segments to the historical nodes every segmentGranularity + windowPeriod.
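
For example, a tuningConfig along these lines should hand off reliably. This is just a sketch based on the tutorial's spec; the period values are illustrative, and segmentGranularity itself is set in the granularitySpec, not here:

"tuningConfig": {
  "type": "realtime",
  "maxRowsInMemory": 500000,
  "intermediatePersistPeriod": "PT10m",
  "windowPeriod": "PT10m",
  "basePersistDirectory": "/tmp/realtime/basePersist",
  "rejectionPolicy": {
    "type": "serverTime"
  }
}

With an HOUR segmentGranularity and a PT10m windowPeriod, you would expect each segment to be handed off roughly ten minutes after its hour closes.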

Thanks for the reply. I am following the tutorial, and I see in the wikipedia_realtime.spec file that rejectionPolicy is set to messageTime. Please see the config below:
"tuningConfig": {
  "type": "realtime",
  "maxRowsInMemory": 500000,
  "intermediatePersistPeriod": "PT3m",
  "windowPeriod": "PT1m",
  "basePersistDirectory": "/tmp/realtime/basePersist",
  "rejectionPolicy": {
    "type": "messageTime"
  }
}

Hi Amey, this rejectionPolicy is really there as an example and we don't recommend using it in production. The feature you are really looking for, the ability to stream events with any timestamp into Druid, is currently in development. Right now, we recommend using batch ingestion for historical data.

http://druid.io/docs/latest/ingestion/overview.html
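
For a POC-sized dataset, a simple index task is usually enough. Below is a minimal sketch; the dataSource, dimensions, file paths, and interval are placeholders for your own data, not taken from your setup:

{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "wikipedia",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": { "column": "timestamp", "format": "auto" },
          "dimensionsSpec": { "dimensions": ["page", "language", "user"] }
        }
      },
      "metricsSpec": [ { "type": "count", "name": "count" } ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2015-09-12/2015-09-13"]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": {
        "type": "local",
        "baseDir": "/tmp/data",
        "filter": "wikipedia-sample.json"
      }
    }
  }
}

You would submit this to the indexing service (overlord) at /druid/indexer/v1/task; the resulting segments are loaded by the historical nodes directly, with no realtime handoff involved.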

Yes, I understand that for production I will need the serverTime rejection policy, but for a POC I just wanted to see if everything works. I have narrowed this problem down to some extent. I started deploying from scratch again, and this time I found that the realtime node writes segment metadata to the metadata store; once that is written, the coordinator picks up the segments and pushes them to the historical node. But when I tried deploying again on a multi-node cluster with the same MySQL database, the realtime node did not write anything to the metadata store, which I think is the reason the historical node is not getting any segments. Do you see any problem with this? Does the order in which I start the nodes matter, e.g. starting the realtime node before starting the coordinator?
Thanks

Hi Amey,

Try http://druid.io/docs/latest/ingestion/faq.html for handoff issues.

I already went through it. It was a weird problem; I started everything from scratch with a new database and now it is working. I still cannot figure out what the problem was. Thanks for looking into it.

The messageTime rejection policy will not hand off segments unless there is a constant stream of events, so it is really flaky with handoff. You will need to use batch ingestion for historical data.

FWIW, if you are getting started with Druid for the first time, you’ll probably have a lot more luck with this quickstart: http://imply.io/docs/latest/quickstart

The current Druid docs don't go into a lot of detail about what you are supposed to do in various cases.

Thanks will have a look.

Hello,

I am having a similar issue with the “messageTime” rejection policy: ingestion via a realtime node using a Kafka firehose on a single-partition Kafka topic with strictly chronological messages. Some segments were handed off, others not!

Checking the logs, it seems that handoffs started happening about 30 minutes after ingestion stopped (the ingest/events/processed metric dropped to 0) and continued for about an hour afterwards, at which point 17 of 30 segments had been handed off. I will start ingesting more data and see if I can encourage the remaining ones; could that work?

What exactly triggers the handoff?

If this rejection policy is really as flaky as Fangjin mentioned, maybe it should be removed from the documentation :-/

I know that batch ingestion is the way to go in that case, but for POC purposes it has a really high ramp-up compared to just reading from Kafka, which is a pity.

Regards,

/David

I agree that messageTime should be removed from the documentation and should generally not be recommended. Batch ingestion should be used to ingest historical data, and streaming ingestion to ingest live data.