I’m importing data into Druid using the Hadoop index task.
I have 3 paths, like:
The 3 paths are created at different times, e.g. 13:00, 13:20, and 13:34, and the data in the paths does not overlap.
Assume segmentGranularity=day, and the timestamp column in timestampSpec is “ds”.
In the worst case, I can rebuild from all available paths each time a new path is ready, i.e. [path1], then [path1, path2], then [path1, path2, path3], but that is inefficient.
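To make the worst case concrete, here is a minimal sketch of the cumulative rebuilds. It assumes the task reads its input through a static inputSpec whose "paths" value is a comma-separated string; the names path1, path2, path3 are placeholders, and the rest of the task spec is omitted.

```python
import json


def cumulative_input_specs(paths):
    """Return one static inputSpec per rebuild, each covering all paths seen so far."""
    specs = []
    for i in range(1, len(paths) + 1):
        # Assumption: "paths" is a single comma-separated string of input paths.
        specs.append({"type": "static", "paths": ",".join(paths[:i])})
    return specs


# Placeholder path names; the real paths are whatever the upstream job produces.
specs = cumulative_input_specs(["path1", "path2", "path3"])

# Count how many path reads the rebuilds cost in total.
total_reads = sum(len(spec["paths"].split(",")) for spec in specs)

for spec in specs:
    print(json.dumps(spec))
print("total path reads:", total_reads)
```

With three paths this reads six paths in total instead of three, and the gap grows quadratically as more paths arrive during the day.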
Does Druid have a better way to append data to existing data, something like the following?
When the first path is ready, build the dataSource for 20160929 from path1.
When the second path is ready, I’d like to update the dataSource for 20160929 by just adding the segments built from the second path.
The same goes for the third path.
One thought is to use shardSpec, but I don’t know how to do this in the Hadoop index task.