We are using Imply-1.2.0.
One of our data sources receives events whose timestamps are spread across roughly the previous 3 days at indexing time, and it needs to be batch-indexed using delta ingestion every 15 minutes.
We tried 3 different combinations of intervals in granularitySpec and ingestionSpec for indexing:
1. A single interval spanning 3 days, with the minimum and maximum timestamps as the interval start and end. Segment size: 15 minutes.
2. 15-minute intervals, listing only the intervals that contain data points. Segment size: 15 minutes.
3. 1-minute intervals, listing only the intervals that contain data points. Segment size: 1 minute (to handle sparse data more efficiently).
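For reference, here is a rough sketch of how the second combination could be generated. It is only an illustration: the helper names (`bucket_start`, `data_intervals`, `delta_spec_fragment`), the data source name, and the input path are made up, and the field names follow my understanding of Druid's Hadoop batch spec for delta ingestion (a `multi` inputSpec with a `dataSource` child and a `static` child), so they should be checked against the docs for the Druid version bundled with Imply 1.2.0. The fragment omits the rest of the task (parser, metricsSpec, tuningConfig), and `granularitySpec` normally sits inside `dataSchema`.

```python
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(minutes=15)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, bucket=BUCKET):
    """Floor a timezone-aware UTC datetime to the start of its bucket."""
    return ts - ((ts - EPOCH) % bucket)


def _iso(ts):
    """Format a UTC datetime as an ISO-8601 string with a trailing Z."""
    return ts.strftime("%Y-%m-%dT%H:%M:%SZ")


def data_intervals(timestamps, bucket=BUCKET):
    """Return sorted, de-duplicated 'start/end' intervals that contain data."""
    starts = sorted({bucket_start(ts, bucket) for ts in timestamps})
    return [f"{_iso(s)}/{_iso(s + bucket)}" for s in starts]


def delta_spec_fragment(datasource, new_data_paths, intervals):
    """Build the interval-related parts of a delta ingestion spec.

    Assumption: the same interval list is used both for the segments to be
    written (granularitySpec) and for the existing segments to be re-read
    (ingestionSpec of the dataSource child).
    """
    return {
        "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "FIFTEEN_MINUTE",
            "queryGranularity": "NONE",
            "intervals": intervals,
        },
        "ioConfig": {
            "type": "hadoop",
            "inputSpec": {
                "type": "multi",
                "children": [
                    {
                        "type": "dataSource",
                        "ingestionSpec": {
                            "dataSource": datasource,
                            "intervals": intervals,
                        },
                    },
                    {"type": "static", "paths": new_data_paths},
                ],
            },
        },
    }


# Example with three sparse events spread over ~3 days (made-up values).
events = [
    datetime(2016, 5, 1, 10, 7, tzinfo=timezone.utc),
    datetime(2016, 5, 1, 10, 12, tzinfo=timezone.utc),
    datetime(2016, 5, 3, 23, 59, tzinfo=timezone.utc),
]
intervals = data_intervals(events)
fragment = delta_spec_fragment("my_datasource", "/tmp/new-data.json", intervals)
```

With this input, the two events at 10:07 and 10:12 fall into the same 15-minute bucket, so only two intervals are listed instead of one 3-day range.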
In all three cases, indexing takes a couple of minutes, whereas the same amount of data takes only a few seconds when all of it is indexed with the current timestamp.
Is there a guideline for getting better performance out of delta ingestion when sparse data is spread across a long time range?