Old data in Realtime node

Hi,

We initially configured the window period as 1 month, since certain old data needed to be processed and we were also not ready with deep storage. We have now changed the window period to 20 minutes, and I can see that only recent data has moved to deep storage (S3), but the older data is still on the local drive of the realtime node.

Please advise on this: is there any configuration I am missing, or is there any way to forcefully move the realtime data to the S3 location?

Thanks in advance.

-Suresh

Hey Suresh,

There isn’t a way to force handoff, but it should happen automatically for segments that are old enough. Do you see any logs like “Starting merge and push.” or “Found [%,d] sinks to persist and merge”? Or any references to doing anything with those older segments?

Hey Suresh,

Do you have any segments that might have been old enough to hand off before you configured deep storage? If so, it’s possible those segments were “handed off” to local deep storage. In that case there should also be some useless entries in the metadata store pointing to the realtime node’s local disk (useless because there is no way for any other node to get at this data). If this happened, you should be able to get handoff to happen again by stopping the realtime node, removing those rows from the metadata store, removing the isPushedMarker file from the relevant sinks on its disk, and starting it back up again.
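
For illustration, a rough sketch of those cleanup steps, assuming a MySQL metadata store with the default druid_segments table; the datasource name, the payload filter, and the persist directory path are all hypothetical, so verify which rows match before deleting anything:

    # Stop the realtime node first.
    # 1) Remove the stale metadata rows that point at local "deep storage"
    #    (datasource name and payload filter are examples, not the real values):
    mysql -u druid -p druid -e \
      "DELETE FROM druid_segments WHERE dataSource = 'my_datasource' AND payload LIKE '%\"type\":\"local\"%';"
    # 2) Remove the isPushedMarker files from the affected sinks on disk
    #    (the basePersistDirectory path is an example):
    find /tmp/realtime/basePersist -name isPushedMarker -delete
    # 3) Start the realtime node back up.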

Thanks Gian for the reply. I can see that the old data is handed off to deep storage after removing the isPushedMarker files from the segments.

-Suresh

Hi Gian,

There is one other issue I am facing.

I can see the segments are moved to S3, but the older segments that moved to S3 still stay on the local disk of the realtime node. The isPushedMarker file is created and the merged folder is available in the segments folder. Because of this, when I query the broker node, it takes time to respond to queries. I have the historical node running on a separate instance.

Please guide me on this.

-Suresh

Hi Suresh, what version of Druid is this?

Usually the situation you are describing happens because the realtime node didn’t see the handoff complete. Can you pick a segment where this occurred and attach the realtime node logs for that segment?

Hi Fangjin,

Sorry for the delayed response. I am using version 0.8.0 and am attaching the realtime logs for a particular segment.

Please help with this.

-Suresh

Realtime-Segment-Logs.txt (4.25 KB)

Do the logs end there? It seems like the object is pushed to S3 but never picked up by the historicals. I’d imagine that either your cluster is out of space, or the historicals are trying to load segments and failing. Searching for this same segment’s ID in the coordinator logs will help. Please post those. If you see messages about which historical the segment is assigned to, please also post the logs of that historical.
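
For example, something like this (the segment ID and log path are placeholders):

    # Pull every coordinator log line that mentions the segment in question:
    grep 'my_datasource_2015-08-01T00:00:00.000Z_2015-08-02T00:00:00.000Z' \
      /var/log/druid/coordinator.log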

I see the below error in the coordinator log, and my historical configuration is as below. I can see that 10 GB has already been reached on the historicals. Is this occurring due to the space issue?

If so, will increasing the segment location size help?

Coordinator Log

Yes, you need more capacity in your cluster. Add another historical node.
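
For reference, a minimal sketch of the capacity settings involved on each historical, in its runtime.properties; the path and the 50 GB values are illustrative only, so size them to your disk:

    # Historical segment cache capacity (illustrative values).
    druid.segmentCache.locations=[{"path": "/mnt/druid/indexCache", "maxSize": 50000000000}]
    # Total size the coordinator will assign to this node; should generally
    # match the sum of maxSize across the segment cache locations.
    druid.server.maxSize=50000000000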

Thanks Fangjin. Increasing the segment location size helped in loading the segments onto the historical node. I have the below clarification:

1. Is there an option so that the historical node keeps locally only the data that is being queried? At any point in time, I will be querying only one month of data, but I see previous months’ data (non-queried data) lying in the historical index cache.

I am using the property druid.segmentCache.deleteOnRemove=true.

Druid data retention: http://druid.io/docs/latest/operations/rule-configuration.html
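
As a sketch of how that could look for your one-month case, a load/drop rule set can be posted to the coordinator’s rules API; the datasource name, host, and port below are placeholders:

    # Keep the most recent month loaded on the default tier, drop anything older.
    # Rules are evaluated top to bottom, so dropForever only applies to segments
    # the loadByPeriod rule did not match.
    curl -X POST -H 'Content-Type: application/json' \
      'http://coordinator-host:8081/druid/coordinator/v1/rules/my_datasource' \
      -d '[
            {"type": "loadByPeriod", "period": "P1M", "tieredReplicants": {"_default_tier": 1}},
            {"type": "dropForever"}
          ]'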

Thanks Fangjin, I am able to delete segments by setting the rules.