What's the best way to append data to a datasource in batch mode without a rebuild?

I'm importing data into Druid using the Hadoop index task.

I have three paths:
path1: /datasourceA/ds=20160929,type=a

path2: /datasourceA/ds=20160929,type=b

path3: /datasourceA/ds=20160929,type=c

The three paths are created at different times (e.g. 13:00, 13:20, 13:34), and the data in the paths don't intersect.

Assume segmentGranularity=day and that the timestamp column in the timestampSpec is "ds".

In the worst case, I can rebuild from all paths each time a new path is ready: [path1], then [path1, path2], then [path1, path2, path3]. But that is inefficient.

Does Druid have a better way to append data to existing data? Something like the following:

When the first path is ready, build the dataSource for 20160929 from path1.

When the second path is ready, I'd like to update the dataSource for 20160929 by just adding segments built from the second path.

Likewise for the third path.

One thought is to use shardSpec, but I don't know how to do this in a Hadoop index task.

You could accomplish this by changing your segmentGranularity to something smaller (like FIVE_MINUTE, TEN_MINUTE, or FIFTEEN_MINUTE) if you know your paths will be created at regular time intervals. If you ultimately want DAY granularity, you can then run a daily job every 24 hours to either merge these indexes together or create a new index from the raw data now that all your data for the day is available.
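For example, the smaller granularity is set in the granularitySpec of the index task; a rough sketch (the interval and queryGranularity here are illustrative assumptions, not values from your setup) might look like:

```json
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "FIFTEEN_MINUTE",
  "queryGranularity": "NONE",
  "intervals": ["2016-09-29/2016-09-30"]
}
```

Each batch run would then produce segments covering only the fifteen-minute buckets its input falls into, and the daily job would reindex the whole day at DAY granularity.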

Sorry, I just re-read your question and realized that even though your folders are created at different times, they all have the same timestamp, which contains just the day. You can look into delta ingestion, which creates a new segment by merging data from existing segments with other sources: http://druid.io/docs/latest/ingestion/update-existing-data.html
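As a sketch of what that looks like (based on the linked docs, using your datasource name and second path; the interval and exact field values are assumptions you'd adapt), the ioConfig of the Hadoop index task uses a "multi" inputSpec that combines the existing segments with the new raw data:

```json
"ioConfig": {
  "type": "hadoop",
  "inputSpec": {
    "type": "multi",
    "children": [
      {
        "type": "dataSource",
        "ingestionSpec": {
          "dataSource": "datasourceA",
          "intervals": ["2016-09-29/2016-09-30"]
        }
      },
      {
        "type": "static",
        "paths": "/datasourceA/ds=20160929,type=b"
      }
    ]
  }
}
```

Running this when path2 is ready replaces the day's segments with new ones built from the already-indexed data plus path2, so you never re-read path1's raw files; repeat with path3 when it arrives.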