I want to run a daily scheduled batch indexing job that indexes all the data that arrived on the previous day. The problem is that yesterday's data will contain a few events belonging to the 2-3 days before it.
Hence I am not sure what the value of the “intervals” field in the “granularitySpec” should be.
Should it cover only yesterday's date, or a larger interval spanning the entire period in which events could be present in the data?
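To make the two options concrete, here is a minimal sketch that builds both candidate values as ISO 8601 interval strings (`start/end`, end exclusive). The helper name `ingestion_interval`, the run date, and the 3-day lookback are illustrative assumptions, not part of any Druid API.

```python
from datetime import date, timedelta


def ingestion_interval(run_date: date, lookback_days: int = 0) -> str:
    """Hypothetical helper: return an ISO 8601 interval 'start/end' (end exclusive)
    that ends at the start of run_date and covers yesterday plus
    `lookback_days` additional earlier days."""
    end = run_date
    start = run_date - timedelta(days=1 + lookback_days)
    return f"{start.isoformat()}/{end.isoformat()}"


run = date(2024, 3, 10)  # hypothetical run date of the daily job
print(ingestion_interval(run))     # option 1: yesterday only
print(ingestion_interval(run, 3))  # option 2: yesterday plus 3 earlier days
```

With the run date above, the first call yields `2024-03-09/2024-03-10` and the second `2024-03-06/2024-03-10`.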
What would happen to events that fall outside the interval? Will they be dropped?
If I set the segment granularity to “Day”, will even a single event belonging to an older day create a new segment for that day and overwrite the existing data for that day?
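To see which day buckets a single batch would touch, the sketch below groups event timestamps by UTC calendar day and counts those falling outside a given interval. It assumes DAY buckets are UTC calendar days (Druid's default timezone) and uses a hypothetical helper `day_buckets`. It only shows which events the questions above concern; it does not model what Druid actually does with them.

```python
from collections import Counter
from datetime import datetime, timezone


def day_buckets(timestamps, interval):
    """Hypothetical helper: count events per UTC day inside `interval`
    ('start/end', end exclusive) and count the events outside it."""
    start_s, end_s = interval.split("/")
    start = datetime.fromisoformat(start_s).replace(tzinfo=timezone.utc)
    end = datetime.fromisoformat(end_s).replace(tzinfo=timezone.utc)
    inside = Counter()
    outside = 0
    for ts in timestamps:  # timestamps are assumed to be timezone-aware UTC
        if start <= ts < end:
            inside[ts.date().isoformat()] += 1  # one bucket per calendar day
        else:
            outside += 1
    return dict(inside), outside


utc = timezone.utc
events = [  # made-up sample: yesterday's data with a few late events
    datetime(2024, 3, 9, 12, tzinfo=utc),
    datetime(2024, 3, 9, 23, tzinfo=utc),
    datetime(2024, 3, 7, 5, tzinfo=utc),
    datetime(2024, 3, 6, 1, tzinfo=utc),
]
print(day_buckets(events, "2024-03-09/2024-03-10"))  # yesterday only
print(day_buckets(events, "2024-03-07/2024-03-10"))  # wider interval
```

With the yesterday-only interval, two events land in the `2024-03-09` bucket and two are outside; widening the interval pulls the `2024-03-07` event into its own day bucket.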