Delayed Data Handling

Hi all,

I’m new to Druid but have a strong background in similar systems.

In the domain I’m working in, it is common for clients to be offline for hours or days, during which time they buffer events. When they regain connectivity, they upload the buffered events.

Since time is so integral to Druid, can it handle this well? It seems this would require rebuilding a lot of historical data chunks and could cause problems for the chunks held by the real-time nodes.

Ben

I did some searching for “delayed data” (https://groups.google.com/forum/#!searchin/druid-user/delayed$20data) and got enough of an answer.

Hi Benjamin, with streaming ingestion, Druid currently accepts events only within a time window. Events whose timestamps fall outside this configurable window are dropped. We are working hard on a new feature that removes this window so that delayed events can still be streamed into Druid at any time, and this feature should be completed in the coming weeks.

You can also use batch ingestion to load any historical data into Druid. With batch ingestion, you don’t have to worry about any windows.
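To make the window concrete, here is a minimal sketch of how a client might split its buffered backlog after reconnecting: events whose timestamps fall inside the streaming window get streamed, and older ones are set aside for batch ingestion. It assumes a symmetric window around the current time; the function names and the 10-minute window are hypothetical, not Druid APIs, and the real bounds depend on how your streaming ingestion is configured.

```python
from datetime import datetime, timedelta, timezone

def within_window(event_time, now, window):
    """Return True if event_time lies in [now - window, now + window].

    Hypothetical model of a streaming acceptance window; the actual
    bounds depend on the ingestion setup's configured window period.
    """
    return now - window <= event_time <= now + window

def split_buffered(events, now, window):
    """Split buffered (timestamp, payload) pairs into two lists:
    events that can still be streamed, and events that would be
    dropped by the stream and should go through batch ingestion."""
    stream, batch = [], []
    for ts, payload in events:
        target = stream if within_window(ts, now, window) else batch
        target.append((ts, payload))
    return stream, batch

# Example: a client reconnects with one recent and one 30-hour-old event.
now = datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)
window = timedelta(minutes=10)  # hypothetical window size
events = [
    (now - timedelta(minutes=3), {"clicks": 1}),
    (now - timedelta(hours=30), {"clicks": 4}),
]
stream, batch = split_buffered(events, now, window)
print(len(stream), len(batch))  # 1 1
```

The recent event is inside the window and can be streamed; the day-old one would be dropped by streaming ingestion, so it belongs in a batch load instead.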

That’s great news!