Data not available to be queried until kafka_indexing_task is complete

Hello.

I have a Druid 0.10.0 cluster running the Kafka indexing service with the following supervisor spec:

{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "clickstreamkafka",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "yyyy/MM/dd HH:mm:ssZ"
        },
        "dimensionsSpec": {
          "dimensions": ["name", "id", "task", "value"],
          "dimensionExclusions": ["timestamp"]
        }
      }
    },
    "metricsSpec": [
      {
        "name": "count",
        "type": "count"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "MINUTE",
      "queryGranularity": "SECOND"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 5000000
  },
  "ioConfig": {
    "topic": "clickstream.demo.V2",
    "consumerProperties": {
      "bootstrap.servers": "10.54.52.3:9092"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}
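The `timestampSpec` above uses the Joda-Time pattern `yyyy/MM/dd HH:mm:ssZ`, where `Z` is a numeric UTC offset such as `-0500`. Druid stores each row's time as a UTC instant, so the offset carried in the string decides which instant, and therefore which minute segment, a row is assigned to. One quick way to check what instant an event string maps to is to mirror the pattern in Python. This is a sketch only: the `%Y/%m/%d %H:%M:%S%z` translation is my own approximation of the Joda pattern, not something Druid provides, and the sample strings are illustrative.

```python
from datetime import datetime, timezone

# Assumed strptime equivalent of the Joda pattern "yyyy/MM/dd HH:mm:ssZ".
# %z accepts numeric offsets such as "-0500" or "+0000".
EVENT_FORMAT = "%Y/%m/%d %H:%M:%S%z"


def event_time_utc(raw: str) -> datetime:
    """Parse an event timestamp string and return the same instant in UTC."""
    parsed = datetime.strptime(raw, EVENT_FORMAT)
    return parsed.astimezone(timezone.utc)


if __name__ == "__main__":
    # Two spellings of the same instant: a UTC-5 local time and UTC.
    for raw in ("2017/11/21 14:02:11-0500", "2017/11/21 19:02:11+0000"):
        print(raw, "->", event_time_utc(raw).isoformat())
```

Both strings resolve to the same UTC instant. If the offset written into the string does not match the clock that produced the time, the row is placed in a different minute than the one the producer intended.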

This is the query I am using to see what data is available:
{
  "queryType": "select",
  "dataSource": "clickstreamkafka",
  "descending": false,
  "dimensions": [],
  "metrics": [],
  "granularity": "all",
  "intervals": ["2017-11-21/2017-11-22"],
  "pagingSpec": {"pagingIdentifiers": {}, "threshold": 1000}
}
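In the query above, empty `dimensions` and `metrics` lists ask Druid to return all dimensions and metrics. To read past the first 1000 rows, the `pagingIdentifiers` returned by one page are sent back in the next request. Below is a minimal sketch of building those requests on the client side. The helper name is hypothetical, and the `fromNext` flag reflects my understanding of the select query's `pagingSpec`; check it against the documentation for your Druid version before relying on it.

```python
import json
from typing import Dict, Optional


def build_select_query(data_source: str, interval: str,
                       paging_identifiers: Optional[Dict[str, int]] = None,
                       threshold: int = 1000) -> dict:
    """Build a Druid select query; pass the previous page's
    pagingIdentifiers to request the following page."""
    return {
        "queryType": "select",
        "dataSource": data_source,
        "descending": False,
        "dimensions": [],  # empty list: return all dimensions
        "metrics": [],     # empty list: return all metrics
        "granularity": "all",
        "intervals": [interval],
        "pagingSpec": {
            "pagingIdentifiers": paging_identifiers or {},
            "threshold": threshold,
            # Assumption: with fromNext, Druid advances past the returned
            # offsets itself instead of the client incrementing them.
            "fromNext": True,
        },
    }


if __name__ == "__main__":
    first_page = build_select_query("clickstreamkafka", "2017-11-21/2017-11-22")
    print(json.dumps(first_page, indent=2))
```

Keeping the query as a Python dict and serializing it with `json.dumps` also guarantees straight ASCII quotes, which matters when specs are copied through editors or mail clients that substitute typographic quotes.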

The pagingIdentifiers returned in the query result:
"pagingIdentifiers": {
  "clickstreamkafka_2017-11-21T10:34:00.000Z_2017-11-21T10:35:00.000Z_2017-11-21T15:34:03.640Z": 136,
  "clickstreamkafka_2017-11-21T10:35:00.000Z_2017-11-21T10:36:00.000Z_2017-11-21T15:35:02.116Z": 109,
  "clickstreamkafka_2017-11-21T10:36:00.000Z_2017-11-21T10:37:00.000Z_2017-11-21T15:35:53.902Z": 31,
  "clickstreamkafka_2017-11-21T10:37:00.000Z_2017-11-21T10:38:00.000Z_2017-11-21T15:36:53.857Z": 56,
  "clickstreamkafka_2017-11-21T10:38:00.000Z_2017-11-21T10:39:00.000Z_2017-11-21T15:38:05.626Z": 82,
  "clickstreamkafka_2017-11-21T10:39:00.000Z_2017-11-21T10:40:00.000Z_2017-11-21T15:38:57.605Z": 133,
  "clickstreamkafka_2017-11-21T10:40:00.000Z_2017-11-21T10:41:00.000Z_2017-11-21T15:39:53.815Z": 13,
  "clickstreamkafka_2017-11-21T11:59:00.000Z_2017-11-21T12:00:00.000Z_2017-11-21T18:54:08.096Z": 4,
  "clickstreamkafka_2017-11-21T12:00:00.000Z_2017-11-21T12:01:00.000Z_2017-11-21T18:54:08.253Z": 6,
  "clickstreamkafka_2017-11-21T12:01:00.000Z_2017-11-21T12:02:00.000Z_2017-11-21T18:54:08.312Z": 6,
  "clickstreamkafka_2017-11-21T12:07:00.000Z_2017-11-21T12:08:00.000Z_2017-11-21T18:54:08.338Z": 9,
  "clickstreamkafka_2017-11-21T12:39:00.000Z_2017-11-21T12:40:00.000Z_2017-11-21T18:54:08.374Z": 15,
  "clickstreamkafka_2017-11-21T13:19:00.000Z_2017-11-21T13:20:00.000Z_2017-11-21T18:54:08.404Z": 5
}
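Each key in `pagingIdentifiers` is a segment identifier of the form `dataSource_intervalStart_intervalEnd_version`. Read that way, the first segment covers 10:34 to 10:35 UTC but carries a version of 15:34 UTC, and the first seven segments all show a gap of about five hours between interval start and version. Assuming the version records roughly when the segment was created, that is a five-hour gap between event time and ingestion time. The sketch below splits an identifier from the right (so underscores in a dataSource name survive) and computes that gap. The helper names are hypothetical, and it assumes identifiers without a trailing partition number.

```python
from datetime import datetime, timedelta
from typing import NamedTuple

# Layout of the times inside a segment identifier, e.g. 2017-11-21T10:34:00.000Z
SEGMENT_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


class SegmentId(NamedTuple):
    data_source: str
    start: datetime
    end: datetime
    version: datetime


def _parse_time(text: str) -> datetime:
    """Parse one ISO-8601 time from a segment identifier (requires Python 3.7+ for 'Z')."""
    return datetime.strptime(text, SEGMENT_TIME_FORMAT)


def parse_segment_id(segment_id: str) -> SegmentId:
    """Split 'dataSource_start_end_version' from the right so that
    underscores inside the dataSource name are preserved."""
    data_source, start, end, version = segment_id.rsplit("_", 3)
    return SegmentId(data_source, _parse_time(start), _parse_time(end), _parse_time(version))


def version_lag(segment_id: str) -> timedelta:
    """How long after the start of its interval a segment's version was stamped."""
    seg = parse_segment_id(segment_id)
    return seg.version - seg.start


if __name__ == "__main__":
    sid = ("clickstreamkafka_2017-11-21T10:34:00.000Z_"
           "2017-11-21T10:35:00.000Z_2017-11-21T15:34:03.640Z")
    print(parse_segment_id(sid).data_source, version_lag(sid))
```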

These are the task's creation and queue insertion times:

CreatedTime: 2017-11-21T18:56:18.470Z
QueueInsertionTime: 2017-11-21T18:58:08.393Z

I have been inserting data since 19:02:11 and it is now 19:24:04, but none of the data I inserted after the new indexing task started is queryable. Is this a bug, or did I misconfigure the supervisor spec?

Thanks,

Gregory

In case anyone sees this and has the same problem: I solved it by changing the timestamps embedded in my Druid data to UTC throughout. That fixed all the odd behavior with data not showing up when I expected it to.

Thanks
Gregory
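The fix described above amounts to stamping every event in UTC before it is written to Kafka. Here is a minimal producer-side sketch, assuming events start out as timezone-aware datetimes and that the `timestamp` field should keep the `yyyy/MM/dd HH:mm:ssZ` layout from the supervisor spec. The helper names, the strftime translation of the Joda pattern, and the sample field values are illustrative assumptions; only the field names come from the `dimensionsSpec`.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumed strftime counterpart of the Joda pattern "yyyy/MM/dd HH:mm:ssZ".
EVENT_FORMAT = "%Y/%m/%d %H:%M:%S%z"


def to_utc_timestamp(moment: datetime) -> str:
    """Render a timezone-aware datetime as a UTC string in the spec's layout."""
    if moment.tzinfo is None:
        # A naive datetime has no zone, so converting it would be a guess.
        raise ValueError("refusing to guess the zone of a naive datetime")
    return moment.astimezone(timezone.utc).strftime(EVENT_FORMAT)


def build_event(name: str, event_id: str, task: str, value: str,
                moment: datetime) -> str:
    """Serialize one clickstream event (field names taken from dimensionsSpec)."""
    return json.dumps({
        "timestamp": to_utc_timestamp(moment),
        "name": name,
        "id": event_id,
        "task": task,
        "value": value,
    })


if __name__ == "__main__":
    utc_minus_5 = timezone(timedelta(hours=-5))  # fixed offset, for illustration
    local = datetime(2017, 11, 21, 14, 2, 11, tzinfo=utc_minus_5)
    print(build_event("click", "42", "demo", "1", local))
```

Rejecting naive datetimes outright is a deliberate choice: silently treating them as UTC (or as local time) is exactly the kind of mismatch that makes rows land in unexpected segments.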