Multiple dimension tuples for the same timestamp - how to handle

Hi,

Just working my way through the tutorials at this point, so it's more than likely that I haven't understood some basic concept correctly. I'm including my ingestion spec for some JSON data stored in a bunch of files (Wikipedia edits made on 2018-12-01). In the data, I see this:

{"type":"categorize","ns":14,"title":"Category:Musicals by Otto Harbach","user":"RL0919","timestamp":"2018-12-01T21:25:00Z","comment":"[[:Sunny (musical)]] added to category"},

{"type":"categorize","ns":14,"title":"Category:Musicals by Oscar Hammerstein II","user":"RL0919","timestamp":"2018-12-01T21:25:00Z","comment":"[[:Sunny (musical)]] added to category"},

{"type":"edit","ns":0,"title":"Parker (2013 film)","user":"2A00:23C5:E28B:8900:3494:BCA:7605:3D82","anon":"","timestamp":"2018-12-01T21:25:00Z","comment":""},

{"type":"edit","ns":0,"title":"Sunny (musical)","user":"RL0919","timestamp":"2018-12-01T21:25:00Z","comment":"+[[Category:Musicals by Oscar Hammerstein II]]; +[[Category:Musicals by Otto Harbach]] using [[WP:HC|HotCat]]"}

A query like this:

{
  "query": "select * from december1 where __time = timestamp '2018-12-01 21:25:00'"
}

…returns only a single row:

[{"__time":"2018-12-01T21:25:00.000Z","comment":"[[:Sunny (musical)]] added to category","count":1,"ns":"14","title":"Category:Musicals by Otto Harbach","type":"categorize","user":"RL0919"}]

Is this a limitation in how Druid processes the data (i.e. the timestamps have to be unique, otherwise rows get rolled up even when "rollup": "false" is set), or is there a way for me to ingest all the rows?
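In case it helps, here is a sketch of how I understand the relevant part of the spec should look if rollup is truly disabled (the interval is from my data; per the Druid docs, "rollup" must be the JSON boolean false rather than the string "false", and a "queryGranularity" of "none" keeps timestamps untruncated):

```json
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "day",
  "queryGranularity": "none",
  "rollup": false,
  "intervals": ["2018-12-01/2018-12-02"]
}
```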

Regards,

Kenneth

wikipedia-edits.json (2.48 KB)

Hi,

Does anyone have any thoughts on my problem? I went ahead and faked the data, adding millisecond values so that all the timestamps came out unique, and then all the rows loaded. But it seems like I'm missing something obvious: being able to load rows that share the same timestamp, without them being rolled up or discarded, is a basic need. That should be a choice the person deploying and configuring Druid gets to make, not one forced on them. If that's not the case, and Druid has good reasons for this behavior, could someone please explain why?
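As a sanity check (same query shape as the SQL request earlier in the thread, against my december1 datasource), counting rows at the shared timestamp shows whether everything was ingested:

```json
{
  "query": "select count(*) as cnt from december1 where __time = timestamp '2018-12-01 21:25:00'"
}
```

With the four sample records shown above, this should return 4 if nothing was rolled up or discarded.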

Thanks,

Kenneth