Question about batch ingestion and reprocessing data

Hi, list.

I have a use case that I can’t figure out how to solve. Below I describe a simplified version.

Suppose we ingested these 4 rows and Druid put them in one segment:

{"timestamp": "2014-10-10T01:00:00Z", "page": "A", "language": "en", … } //old data
{"timestamp": "2014-10-10T01:01:00Z", "page": "B", "language": "en", … } //old data
{"timestamp": "2014-10-10T01:02:00Z", "page": "C", "language": "en", "country": "USA", … } //old data
{"timestamp": "2014-10-10T01:03:00Z", "page": "D", "language": "en", "country": "USA", … } //old data

Note that the “country” field is missing (null) in the first and second rows.

A few days later, we receive an updated version of the data with more fields filled in, for instance “country”. Note that the updated rows have the same timestamps as the first and second rows in the example above:

{"timestamp": "2014-10-10T01:00:00Z", "page": "A", "language": "en", "country": "USA", … } //updated data
{"timestamp": "2014-10-10T01:01:00Z", "page": "B", "language": "en", "country": "USA", … } //updated data

I tried a batch ingestion with the updated version of the data, but Druid replaced the entire segment with these rows and I lost the third and fourth rows. Is there a batch ingestion type that can produce the result below?

{"timestamp": "2014-10-10T01:00:00Z", "page": "A", "language": "en", "country": "USA", … } //updated data
{"timestamp": "2014-10-10T01:01:00Z", "page": "B", "language": "en", "country": "USA", … } //updated data
{"timestamp": "2014-10-10T01:02:00Z", "page": "C", "language": "en", "country": "USA", … } //old data
{"timestamp": "2014-10-10T01:03:00Z", "page": "D", "language": "en", "country": "USA", … } //old data
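A minimal Python sketch of why the rows were lost and what would give the result above. It uses a simplified model (an assumption, not Druid's actual code) in which a batch ingestion task for an interval replaces all data previously stored for that interval. It also assumes that (timestamp, page) identifies a row and that the old raw rows are still available outside Druid. The interval string and the helpers `batch_ingest`, `row_key` and `merge_rows` are hypothetical, and the elided “…” fields are left out.

```python
# Toy model, NOT Druid's implementation: a datasource is a dict mapping a
# time interval to the rows stored for it, and a batch ingestion task for an
# interval replaces everything previously stored for that interval.

INTERVAL = "2014-10-10T01:00:00Z/2014-10-10T02:00:00Z"  # hypothetical segment interval

old_rows = [
    {"timestamp": "2014-10-10T01:00:00Z", "page": "A", "language": "en"},
    {"timestamp": "2014-10-10T01:01:00Z", "page": "B", "language": "en"},
    {"timestamp": "2014-10-10T01:02:00Z", "page": "C", "language": "en", "country": "USA"},
    {"timestamp": "2014-10-10T01:03:00Z", "page": "D", "language": "en", "country": "USA"},
]
updated_rows = [
    {"timestamp": "2014-10-10T01:00:00Z", "page": "A", "language": "en", "country": "USA"},
    {"timestamp": "2014-10-10T01:01:00Z", "page": "B", "language": "en", "country": "USA"},
]


def batch_ingest(datasource, interval, rows):
    """Hypothetical: replace all rows stored for `interval` with exactly `rows`."""
    datasource[interval] = list(rows)


def row_key(row):
    # ASSUMPTION: (timestamp, page) uniquely identifies a row in this data.
    return (row["timestamp"], row["page"])


def merge_rows(old, updated):
    """Overlay updated rows on old rows; the updated version wins on key clashes."""
    merged = {row_key(r): r for r in old}
    for r in updated:
        merged[row_key(r)] = r
    return sorted(merged.values(), key=row_key)


# Re-ingesting only the updated rows drops C and D, as described in the question.
naive = {}
batch_ingest(naive, INTERVAL, old_rows)
batch_ingest(naive, INTERVAL, updated_rows)
print([r["page"] for r in naive[INTERVAL]])  # ['A', 'B']

# Re-ingesting the merged set for the whole interval keeps all four rows.
fixed = {}
batch_ingest(fixed, INTERVAL, old_rows)
batch_ingest(fixed, INTERVAL, merge_rows(old_rows, updated_rows))
print([r["page"] for r in fixed[INTERVAL]])  # ['A', 'B', 'C', 'D']
print([r.get("country") for r in fixed[INTERVAL]])  # ['USA', 'USA', 'USA', 'USA']
```

Under this model the merge has to happen before ingestion: the task is then run over the full interval with all four rows, so nothing already stored for that interval is dropped.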

Does this doc help answer the question?
http://druid.io/docs/0.9.0/ingestion/update-existing-data.html

Hi, Fangjin.