Some questions about realtime "specFile"

Hi,
I am using the realtime node for streaming ingestion, and I have some questions about its “specFile”.

"tuningConfig": {
  "type" : "realtime",
  "maxRowsInMemory": 50000,
  "intermediatePersistPeriod": "PT10m",
  "windowPeriod": "PT10m",
  "basePersistDirectory": "\/data\/realtime\/basePersist",
  "rejectionPolicy": {
    "type": "serverTime"
  },
  "shardSpec": {
    "type": "linear",
    "partitionNum": 1
  }
}

Given the config above:
1. If I set "rejectionPolicy" to "serverTime" and "windowPeriod" to "PT10m", and the realtime node's current server time is "2015-06-24T15:00:00Z", does it mean the realtime node only accepts events whose timestamps fall between "2015-06-24T14:50:00Z" and "2015-06-24T15:10:00Z"?
2. If "intermediatePersistPeriod" is set to "PT10m" and the realtime node's current server time is "2015-06-24T15:00:00Z", does it mean the segment covering "2015-06-24T14:50:00Z" to "2015-06-24T15:00:00Z" will be flushed to disk and then loaded to deep storage? If so, I haven't seen any realtime segments loaded to deep storage.

1. If I set "rejectionPolicy" to "serverTime" and "windowPeriod" to "PT10m", and the realtime node's current server time is "2015-06-24T15:00:00Z", does it mean the realtime node only accepts events whose timestamps fall between "2015-06-24T14:50:00Z" and "2015-06-24T15:10:00Z"?

You have a lower bound, yes. About the upper bound, I am not sure.
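
To sketch how I picture the “serverTime” check (this is not Druid's actual code; the helper and the symmetric upper bound are my own assumptions for illustration):

from datetime import datetime, timedelta

def accepts(event_ts, server_time, window_period=timedelta(minutes=10)):
    # Sketch of a "serverTime" rejection check: accept events whose timestamp
    # lies within windowPeriod of the server clock. The upper bound is an
    # assumption on my side, not verified against the Druid source.
    return (server_time - window_period) <= event_ts <= (server_time + window_period)

server_time = datetime(2015, 6, 24, 15, 0, 0)
print(accepts(datetime(2015, 6, 24, 14, 55, 0), server_time))  # True: inside the window
print(accepts(datetime(2015, 6, 24, 14, 40, 0), server_time))  # False: older than serverTime minus PT10m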

2. If "intermediatePersistPeriod" is set to "PT10m" and the realtime node's current server time is "2015-06-24T15:00:00Z", does it mean the segment covering "2015-06-24T14:50:00Z" to "2015-06-24T15:00:00Z" will be flushed to disk and then loaded to deep storage? If so, I haven't seen any realtime segments loaded to deep storage.

The base persist directory is only for storing the in-memory data on disk every “PT10m”, bucket by bucket. It has nothing to do with loading it to deep storage.
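
Roughly how I picture that persist trigger, as a sketch only (not the actual Druid implementation; the constants just mirror your tuningConfig, and it is my understanding that hitting maxRowsInMemory also forces a persist):

from datetime import datetime, timedelta

MAX_ROWS_IN_MEMORY = 50000                            # from your tuningConfig
INTERMEDIATE_PERSIST_PERIOD = timedelta(minutes=10)   # "PT10m"

def should_persist(rows_in_memory, last_persist, now):
    # Sketch: flush the in-memory bucket to basePersistDirectory when either
    # the persist period elapses or the row limit is reached.
    hit_row_limit = rows_in_memory >= MAX_ROWS_IN_MEMORY
    period_elapsed = now - last_persist >= INTERMEDIATE_PERSIST_PERIOD
    return hit_row_limit or period_elapsed

now = datetime(2015, 6, 24, 15, 0, 0)
print(should_persist(12000, datetime(2015, 6, 24, 14, 48, 0), now))  # True: PT10m elapsed
print(should_persist(12000, datetime(2015, 6, 24, 14, 55, 0), now))  # False: neither limit reached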

But when it is time to ship to deep storage for whatever reason, Druid takes all the intermediately persisted “data” (bucket by bucket), merges it into one segment (per “segmentGranularity”), and ships it to deep storage.
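
And the merge step, sketched (made-up data layout, not Druid's actual code):

from collections import defaultdict

def merge_for_handoff(persisted_chunks):
    # Sketch: every intermediate persist belongs to one segmentGranularity
    # interval; at handoff time all chunks of an interval are merged into a
    # single segment, which is then pushed to deep storage.
    by_interval = defaultdict(list)
    for interval, rows in persisted_chunks:
        by_interval[interval].extend(rows)
    return by_interval

chunks = [
    ("2015-06-24T14:00/15:00", ["row1", "row2"]),   # persisted at ~14:10
    ("2015-06-24T14:00/15:00", ["row3"]),           # persisted at ~14:20
]
print(dict(merge_for_handoff(chunks)))  # {'2015-06-24T14:00/15:00': ['row1', 'row2', 'row3']}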

IMHO :slight_smile:

Thanks, Olaf :slight_smile:
But what is the condition that triggers the realtime node to upload the locally persisted “data” to deep storage?

Following the above example, if I set “segmentGranularity” to “Hour”, does it mean that when the serving time reaches 15:00, the realtime node will merge the persisted “data” into one segment whose interval is 14:00~15:00 and then load it to deep storage?

On Wednesday, June 24, 2015 at 4:20:17 PM UTC+8, Olaf Krische wrote:

Hey,

  • your segment granularity is “hour”,

  • all incoming data whose timestamp falls within that “hour” is added to it,

  • the “hour” is kept open for an extra “windowPeriod” of time, so late data can still be added to this “hour”,

  • then, if either “serverTime” or “messageTime” is past that “hour” + “windowPeriod”, the segment is closed and shipped to deep storage (see the sketch after this list),

  • all further incoming data for that “hour” will from then on be ignored by the realtime node
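
To make the timing concrete for your “Hour” example (a sketch of my understanding above, not Druid's actual code): the 14:00~15:00 bucket is not merged and shipped at 15:00, but only once the time is past the end of the hour plus the windowPeriod, so around 15:10 in your case.

from datetime import datetime, timedelta

def handoff_time(segment_end, window_period=timedelta(minutes=10)):
    # Sketch: the bucket stays open until the end of its interval plus the
    # windowPeriod; only then are the intermediate persists merged into one
    # segment and shipped to deep storage.
    return segment_end + window_period

segment_start = datetime(2015, 6, 24, 14, 0, 0)    # "hour" bucket 14:00-15:00
segment_end = segment_start + timedelta(hours=1)
print(handoff_time(segment_end))                   # 2015-06-24 15:10:00, not 15:00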

I really have no clue how to describe this better. :slight_smile:

I imagine it as a bucket. I write on the bucket, “everything that matches 24th of June 2015”. Then I leave this bucket open for some time. But after a while I say, enough, I close the bucket and ship it to deep storage.

Sometimes there are delays in the data pipeline. With this approach you can still capture “late” data.

(Nonetheless, with the batch process you can later re-create segments by adding the data that did not arrive while the bucket was open.)

Good luck.

Thanks very much :slight_smile:

On Wednesday, June 24, 2015 at 8:34:46 PM UTC+8, ol…@adsquare.com wrote: