Select result includes segments with repeating offsets

Druid version: 0.9.2

We are seeing an odd issue when running select queries. We run two queries over the same interval but with different filters. The first request limits the dimension to one value, so the result set is much smaller (~1000 events). The second request includes the dimension value from the first plus others, which allows for more results (~10000 events). If we then filter the larger result down to the common dimension value, we expect to see the same events as in the smaller result, but we do not. It turns out that the smaller request includes events that the larger request doesn’t.

The responses from the smaller request seem to have some inconsistencies in the segment and offset. If you take a look at request 1, the offsets for the segment run in this order: 0, 1, 2, 3, 0, 0, 1, 0, 1. Even the paging identifier seems confused: it returns 1, even though the offset went up to 3.
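For anyone who wants to run the same check against their own results, here is a minimal Python sketch of how the offset sequence can be tested. It assumes each result item has already been parsed into a dict with `segmentId` and `offset` keys; the function name `find_offset_breaks` is only illustrative and not part of Druid.

```python
def find_offset_breaks(events):
    """Return (index, segment_id, previous_offset, offset) for every event whose
    offset is not exactly one more than the previous offset of the same segment."""
    last_seen = {}  # segment id -> most recent offset seen for that segment
    breaks = []
    for index, item in enumerate(events):
        segment = item["segmentId"]
        offset = item["offset"]
        if segment in last_seen and offset != last_seen[segment] + 1:
            breaks.append((index, segment, last_seen[segment], offset))
        last_seen[segment] = offset
    return breaks


SEGMENT = "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z"

# The offset order observed in request 1.
observed = [0, 1, 2, 3, 0, 0, 1, 0, 1]
events = [{"segmentId": SEGMENT, "offset": o} for o in observed]

breaks = find_offset_breaks(events)
break_positions = [b[0] for b in breaks]  # positions where the sequence resets

# The paging identifier returned for this segment in request 1.
paging_identifier = 1
paging_behind_max = paging_identifier < max(observed)
```

With the sequence from request 1, the sketch flags the fifth, sixth, and eighth events (zero-based positions 4, 5, and 7) as resets, and it shows that the paging identifier is lower than the highest offset returned.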

Request 2 makes more sense, since its offsets increase continuously, but the events with the duplicated offsets do not show up at all in the second request. For instance, event 02321c59-9125-485c-986a-1a252a017d9e is not returned at all in the second result.
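The comparison between the two result sets can be sketched roughly as follows, in Python. It assumes both responses have been fully paged and parsed into lists of dicts that carry the event under an `event` key with an `id` field, as in the responses below. The function name and the sample ids are made up for illustration.

```python
def missing_from_larger(smaller_events, larger_events):
    """Return the sorted event ids that appear in the smaller result
    but are absent from the larger result."""
    smaller_ids = {item["event"]["id"] for item in smaller_events}
    larger_ids = {item["event"]["id"] for item in larger_events}
    return sorted(smaller_ids - larger_ids)


# Placeholder data: ids "a", "b", "c" are invented, not real events.
smaller = [{"event": {"id": "a"}}, {"event": {"id": "b"}}, {"event": {"id": "c"}}]
larger = [{"event": {"id": "a"}}, {"event": {"id": "c"}}, {"event": {"id": "d"}}]

missing = missing_from_larger(smaller, larger)
```

If the larger query is a superset of the smaller one, the returned list should be empty. In our case it is not.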

Request 1

[
  "pagingIdentifiers": {
    "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z": 1
  }
  …some events
  {
    "offset": 0,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "2a288d7e-8bb5-4191-954a-70ad8f8508db",
      "timestamp": "2017-10-11T16:00:20.365Z"
      …event data
    }
  },
  {
    "offset": 1,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "10880bd6-7d15-4f04-a7ab-deead6db9bb0",
      "timestamp": "2017-10-11T16:02:03.203Z",
      …event data
    }
  },
  {
    "offset": 2,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "1d016321-a2ab-45de-a580-5810f385c8c0",
      "timestamp": "2017-10-11T16:04:37.789Z",
      …event data
    }
  },
  {
    "offset": 3,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "cbdd6a84-f556-4561-824c-134edb5a3270",
      "timestamp": "2017-10-11T16:05:03.298Z",
      …event data
    }
  },
  {
    "offset": 0,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "011054f7-c630-421f-b3b2-16dd1fd38bb8",
      "timestamp": "2017-10-11T16:06:39.207Z",
      …event data
    }
  },
  {
    "offset": 0,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "f869f356-8662-47fb-9ae3-888a48cadf28",
      "timestamp": "2017-10-11T16:10:50.884Z",
      …event data
    }
  },
  {
    "offset": 1,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "f1f9b592-6c51-474d-9a7c-c55f54c299b4",
      "timestamp": "2017-10-11T16:10:56.542Z",
      …event data
    }
  },
  {
    "offset": 0,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "02321c59-9125-485c-986a-1a252a017d9e",
      "timestamp": "2017-10-11T16:15:24.146Z",
      …event data
    }
  },
  {
    "offset": 1,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "b32b3029-7cd2-40fd-880f-2433e62464c5",
      "timestamp": "2017-10-11T16:15:24.195Z",
      …event data
    }
  }
]

Request 2

[
  {
    "offset": 0,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "7646828d-6865-4a34-a069-86857d8152ea",
      "timestamp": "2017-10-11T16:00:03.688Z"
    }
  },
  {
    "offset": 1,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "9d00da40-226a-476e-9c38-fc4d6a75e554",
      "timestamp": "2017-10-11T16:00:04.936Z"
    }
  },
  {
    "offset": 2,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "26d0c27d-0857-4d8b-9083-cba1809bff16",
      "timestamp": "2017-10-11T16:00:06.238Z"
    }
  },
  {
    "offset": 3,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "2df9d482-3099-4a20-bfc3-232cc2adb465",
      "timestamp": "2017-10-11T16:00:07.475Z"
    }
  },
  {
    "offset": 4,
    "segmentId": "source1_2017-10-11T16:00:00.000Z_2017-10-11T16:18:14.000Z_2017-10-11T16:00:15.256Z",
    "event": {
      "id": "400f32c0-0ee7-4b66-b1f0-da65413c7d50",
      "timestamp": "2017-10-11T16:00:11.120Z"
    }
  }
]

I’ve omitted a lot of the data, but we have validated our pagination logic over and over. This is causing us to miss data in our production queries. Is it expected behavior for a segment to repeat offsets within one select query?

We only see this behavior when we query real-time segments, and it does seem to fix itself after some time (~10-15 mins). When I query the segment again after that, the events that had duplicated offsets are returned with recalculated offsets that are consistent.