Kafka indexing service: Could not allocate segment for row with timestamp

Hi all,

Currently, I’m testing the Kafka indexing task on a Kubernetes Druid cluster. The cluster works fine and the indexing task is running on my MiddleManager.

I send Kafka messages from another node, but only the first message works. After sending one message I can query it:

```json
{
  "queryType": "topN",
  "dataSource": "test-data-4",
  "granularity": "all",
  "dimension": "dim1",
  "threshold": 1000,
  "metric": "value",
  "aggregations": [
    {
      "type": "doubleSum",
      "name": "value",
      "fieldName": "value_sum"
    }
  ],
  "intervals": [
    "2016-12-01T08:30:00/2016-12-01T12:00:00"
  ]
}
```

The query returns:

```json
[ {
  "timestamp" : "2016-12-01T08:58:00.000Z",
  "result" : [ {
    "value" : 200.0,
    "dim1" : "val2"
  } ]
} ]
```

But if I send another message, the task throws this exception and dies:

```
2016-11-30T12:02:21,156 INFO [test-data-1-incremental-persist] io.druid.segment.realtime.appenderator.AppenderatorImpl - Committing metadata[FiniteAppenderatorDriverMetadata{activeSegments={index_kafka_test-data-1_b1d9f9f90e48493_0=[test-data-1_2016-11-30T11:00:00.000Z_2016-11-30T12:00:00.000Z_2016-11-30T11:31:00.880Z]}, lastSegmentIds={index_kafka_test-data-1_b1d9f9f90e48493_0=test-data-1_2016-11-30T11:00:00.000Z_2016-11-30T12:00:00.000Z_2016-11-30T11:31:00.880Z}, callerMetadata={nextPartitions=KafkaPartitions{topic='druid-testing', partitionOffsetMap={0=2, 1=3, 2=2, 3=3}}}}] for sinks[test-data-1_2016-11-30T11:00:00.000Z_2016-11-30T12:00:00.000Z_2016-11-30T11:31:00.880Z:1].
2016-11-30T12:02:21,163 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.FiniteAppenderatorDriver - Persisted pending data in 157ms.
2016-11-30T12:02:21,167 INFO [task-runner-0-priority-0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Shutting down...
2016-11-30T12:02:21,179 INFO [appenderator_persist_0] io.druid.server.coordination.BatchDataSegmentAnnouncer - Unannouncing segment[test-data-1_2016-11-30T11:00:00.000Z_2016-11-30T12:00:00.000Z_2016-11-30T11:31:00.880Z] at path[/druid/segments/10.0.4.20:7081/10.0.4.20:7081_indexer-executor__default_tier_2016-11-30T12:02:20.942Z_ef8717f4c2d04819905b17ab9378ca140]
2016-11-30T12:02:21,180 INFO [appenderator_persist_0] io.druid.curator.announcement.Announcer - unannouncing [/druid/segments/10.0.4.20:7081/10.0.4.20:7081_indexer-executor__default_tier_2016-11-30T12:02:20.942Z_ef8717f4c2d04819905b17ab9378ca140]
2016-11-30T12:02:21,199 INFO [appenderator_persist_0] io.druid.segment.realtime.appenderator.AppenderatorImpl - Removing sink for segment[test-data-1_2016-11-30T11:00:00.000Z_2016-11-30T12:00:00.000Z_2016-11-30T11:31:00.880Z].
2016-11-30T12:02:21,207 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_test-data-1_b1d9f9f90e48493_njmjinao, type=index_kafka, dataSource=test-data-1}]
com.metamx.common.ISE: Could not allocate segment for row with timestamp[2016-11-30T11:40:36.000Z]
    at io.druid.indexing.kafka.KafkaIndexTask.run(KafkaIndexTask.java:427) ~[?:?]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
    at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]
2016-11-30T12:02:21,231 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_test-data-1_b1d9f9f90e48493_njmjinao] status changed to [FAILED].
2016-11-30T12:02:21,233 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_kafka_test-data-1_b1d9f9f90e48493_njmjinao",
  "status" : "FAILED",
  "duration" : 1093
}
2016-11-30T12:02:21,238 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@4538856f].
```

I did this test several times and the result was the same every time.

This is my Kafka supervisor spec:

```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "test-data-4",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "timestamp",
          "format": "ruby"
        },
        "dimensionsSpec": {
          "dimensions": ["dim1"]
        }
      }
    },
    "metricsSpec": [
      {
        "name": "count",
        "type": "count"
      },
      {
        "name": "value_sum",
        "fieldName": "value",
        "type": "doubleSum"
      },
      {
        "name": "value_min",
        "fieldName": "value",
        "type": "doubleMin"
      },
      {
        "name": "value_max",
        "fieldName": "value",
        "type": "doubleMax"
      }
    ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "minute"
    }
  },
  "tuningConfig": {
    "type": "kafka",
    "maxRowsPerSegment": 5000000
  },
  "ioConfig": {
    "topic": "druid-testing",
    "consumerProperties": {
      "bootstrap.servers": "kafka:9092"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }
}
```
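For context, a message matching this spec would look roughly like the sketch below (the field values are illustrative, not my exact payload). With the "ruby" timestamp format the column is interpreted as seconds since the Unix epoch, optionally with a decimal part, so 1480506036 corresponds to 2016-11-30T11:40:36Z, the timestamp mentioned in the error above:

```json
{
  "timestamp": 1480506036,
  "dim1": "val2",
  "value": 100.0
}
```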

I have read in other threads that this exception may be thrown when you have Hadoop-generated segments, which don't support appending data, but this is an empty cluster without Hadoop and without data.

Any idea about the problem?

Regards,

Andrés

My guess is that the time format depends on the timezone, so if it is not UTC, some serialization/deserialization mismatch will occur.

Can you post your overlord logs as well? The actual exception thrown while trying to allocate the segment should be found there.

Hi!!

Apparently the problem was solved when I added -Duser.timezone=UTC -Dfile.encoding=UTF-8 to the Java args and also added the druid-s3-extensions extension, which I had forgotten! hahaha
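For reference, the change ends up looking roughly like this in the config files (the property names are the standard Druid 0.9.x ones; the exact extension list shown here is just an example of what gets loaded, not a required set):

```properties
# common.runtime.properties: include the extension that was missing from the load list
druid.extensions.loadList=["druid-kafka-indexing-service", "druid-s3-extensions"]

# middleManager/runtime.properties: pass the timezone/encoding flags down to the peon JVMs
druid.indexer.runner.javaOpts=-server -Duser.timezone=UTC -Dfile.encoding=UTF-8
```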

I suppose that makes sense, doesn't it?

Regards,

Andrés Gómez

Big Data Development Manager

agomez@redborder.com
+34 606224922 | +34 955 601 160