Tranquility rejectionPolicy set to none but drops everything

Hi

My local timezone is UTC+8. I’ve set all timezones in the config files to UTC, and started the services like this:

Coordinator / Overlord / Broker on Node1

MiddleManager on Node2

Historical / Tranquility on Node3

I set

  1. rejectionPolicy.type to none

  2. druid.storage.type = noop

This is how I configured data parsing in the Tranquility config file:

    "timestampSpec" : {
       "format" : "auto",
       "column" : "druid_tstamp"
    },
    "tuningConfig" : {
       "maxRowsInMemory" : "10000",
       "type" : "realtime",
       "windowPeriod" : "PT60M",
       "intermediatePersistPeriod" : "PT10M",
       "rejectionPolicy" : {
          "type" : "none"
       }
    }

And this is a sample event with its timestamp:

{"druid_tstamp": "2016-10-20T05:59:40Z", "tracker": "iOS"}

The druid_tstamp value is very close to the current UTC time (not local time).

But Tranquility still drops everything:

2016-10-20 05:54:46,735 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {events-v2={receivedCount=104623, sentCount=0, droppedCount=104623, unparseableCount=0}} pending messages in 0ms and committed offsets in 48ms.

2016-10-20 05:55:01,781 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {events-v2={receivedCount=106874, sentCount=0, droppedCount=106874, unparseableCount=0}} pending messages in 0ms and committed offsets in 46ms.

Is there anything I can do to find out why Tranquility drops the messages?

Is it because the timestamps are all out of the window period?
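If I understand the window check correctly, it is roughly this (my own sketch to reason about it, not Tranquility’s actual code):

```python
from datetime import datetime, timedelta, timezone

def within_window(event_time: datetime, now: datetime,
                  window: timedelta = timedelta(minutes=60)) -> bool:
    # Tranquility keeps an event only if its timestamp is within
    # windowPeriod (here PT60M) of the current time, on either side.
    return abs(event_time - now) <= window

now = datetime(2016, 10, 20, 5, 54, 46, tzinfo=timezone.utc)

# The sample event above is about 5 minutes from "now", so it should pass:
ok = within_window(datetime(2016, 10, 20, 5, 59, 40, tzinfo=timezone.utc), now)

# A timestamp 8 hours off (e.g. local UTC+8 time parsed as UTC) would be dropped:
late = within_window(now - timedelta(hours=8), now)
```

By this check my sample event should be accepted, which is why the drops confuse me.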

I pushed data with timestamps ranging from 10 hours earlier to 10 hours later, but still no rows were sent. This is some of the Tranquility log:

2016-10-20 11:36:32,426 [ClusteredBeam-ZkFuturePool-5f11d753-6c4d-4942-94fc-7c31b84163a6] INFO c.m.tranquility.beam.ClusteredBeam - Global latestCloseTime[2016-10-20T12:00:00.000Z] for identifier[druid:overlord/druid-test] has moved past timestamp[2016-10-20T10:00:00.000Z], not creating merged beam

2016-10-20 11:36:32,429 [ClusteredBeam-ZkFuturePool-5f11d753-6c4d-4942-94fc-7c31b84163a6] INFO c.m.tranquility.beam.ClusteredBeam - Turns out we decided not to actually make beams for identifier[druid:overlord/druid-test] timestamp[2016-10-20T10:00:00.000Z]. Returning None.

2016-10-20 11:36:32,446 [ClusteredBeam-ZkFuturePool-5f11d753-6c4d-4942-94fc-7c31b84163a6] INFO c.m.tranquility.beam.ClusteredBeam - Global latestCloseTime[2016-10-20T12:00:00.000Z] for identifier[druid:overlord/druid-test] has moved past timestamp[2016-10-20T11:00:00.000Z], not creating merged beam

2016-10-20 11:36:32,447 [ClusteredBeam-ZkFuturePool-5f11d753-6c4d-4942-94fc-7c31b84163a6] INFO c.m.tranquility.beam.ClusteredBeam - Turns out we decided not to actually make beams for identifier[druid:overlord/druid-test] timestamp[2016-10-20T11:00:00.000Z]. Returning None.

2016-10-20 11:36:39,516 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {druid-test={receivedCount=104920, sentCount=0, droppedCount=104920, unparseableCount=0}} pending messages in 4ms and committed offsets in 20ms.

2016-10-20 11:36:54,533 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {druid-test={receivedCount=15018, sentCount=0, droppedCount=15018, unparseableCount=0}} pending messages in 1ms and committed offsets in 15ms.

Hey Luan,

Feels like a timestamp-out-of-window-period issue to me (if I remember correctly, Tranquility doesn’t read the rejectionPolicy parameter). You can try setting "reportDropsAsExceptions": true in your ingestion spec, which might give more information about what’s happening. Also check that -Duser.timezone=UTC is set across all your processes, including tranquility-kafka.
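From memory, the flag goes in the top-level "properties" section of the tranquility-kafka config, something like the sketch below (the zookeeper.connect value is a placeholder; double-check the exact placement against the Tranquility docs for your version):

```json
{
  "properties" : {
    "zookeeper.connect" : "localhost:2181",
    "reportDropsAsExceptions" : "true"
  }
}
```

With that set, dropped events should surface as exceptions in the tranquility-kafka log instead of silently incrementing droppedCount.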

If you’re just starting out with Druid ingestion through Kafka, I recommend you take a look at the Kafka indexing service here: http://druid.io/docs/0.9.1.1/development/extensions-core/kafka-ingestion.html. The Kafka indexing service can handle events with arbitrary timestamps, so you won’t see issues like this one.
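To give an idea of the shape, here is a minimal supervisor spec sketch based on your datasource and column names (the topic and bootstrap.servers values are placeholders; see the doc linked above for the full set of fields). You POST it to the overlord at /druid/indexer/v1/supervisor:

```json
{
  "type": "kafka",
  "dataSchema": {
    "dataSource": "druid-events-v2",
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": { "format": "auto", "column": "druid_tstamp" },
        "dimensionsSpec": { "dimensions": ["tracker"] }
      }
    },
    "metricsSpec": [ { "type": "count", "name": "count" } ],
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "HOUR",
      "queryGranularity": "NONE"
    }
  },
  "tuningConfig": { "type": "kafka", "maxRowsInMemory": 10000 },
  "ioConfig": {
    "topic": "events-v2",
    "consumerProperties": { "bootstrap.servers": "localhost:9092" },
    "taskCount": 1
  }
}
```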

Thanks. I tried the Kafka indexing service, but unfortunately it didn’t work. This is what happens:

When I try to check the task log in the coordinator web console, it says:

No log was found for this task. The task may not exist, or it may not have begun running yet.

BTW:

I tested Druid with Tranquility on a single node two weeks ago; Tranquility worked well and sent all the data to Druid.

Now I’m deploying Druid & Tranquility on 3 nodes, and the only difference is that I set deep storage to noop.

This is some of the overlord service log:

2016-10-21T07:05:05,381 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.RemoteTaskRunner - Task index_kafka_druid-events-v2_ccb76cd9ba5f6b9_kindddfj switched from pending to running (on [hd-olap-02:8091])

2016-10-21T07:05:05,381 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_druid-events-v2_ccb76cd9ba5f6b9_kindddfj] status changed to [RUNNING].

2016-10-21T07:05:05,382 INFO [KafkaSupervisor-druid-events-v2-0] io.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker: hd-olap-02:8091, status 200 OK, response: {"task":"index_kafka_druid-events-v2_46229226ef7afab_jahkikib"}

2016-10-21T07:05:05,382 ERROR [KafkaSupervisor-druid-events-v2-0] io.druid.indexing.overlord.RemoteTaskRunner - Shutdown failed for index_kafka_druid-events-v2_46229226ef7afab_jahkikib! Are you sure the task was running?

2016-10-21T07:05:05,382 INFO [KafkaSupervisor-druid-events-v2-0] io.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_druid-events-v2_46229226ef7afab_jahkikib] from activeTasks

2016-10-21T07:05:05,384 INFO [KafkaSupervisor-druid-events-v2-0] io.druid.indexing.overlord.MetadataTaskStorage - Updating task index_kafka_druid-events-v2_46229226ef7afab_jahkikib to status: TaskStatus{id=index_kafka_druid-events-v2_46229226ef7afab_jahkikib, status=FAILED, duration=-1}

2016-10-21T07:05:05,385 INFO [KafkaSupervisor-druid-events-v2-0] io.druid.indexing.overlord.TaskQueue - Task done: KafkaIndexTask{id=index_kafka_druid-events-v2_46229226ef7afab_jahkikib, type=index_kafka, dataSource=druid-events-v2}

2016-10-21T07:05:05,385 INFO [TaskQueue-Manager] io.druid.indexing.overlord.TaskQueue - Asking taskRunner to clean up 1 tasks.

2016-10-21T07:05:05,387 INFO [TaskQueue-Manager] io.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker: hd-olap-02:8091, status 200 OK, response: {"task":"index_kafka_druid-events-v2_46229226ef7afab_jahkikib"}

2016-10-21T07:05:05,387 ERROR [TaskQueue-Manager] io.druid.indexing.overlord.RemoteTaskRunner - Shutdown failed for index_kafka_druid-events-v2_46229226ef7afab_jahkikib! Are you sure the task was running?

2016-10-21T07:05:05,387 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Worker[hd-olap-02:8091] wrote RUNNING status for task [index_kafka_druid-events-v2_ccb76cd9ba5f6b9_kindddfj] on [TaskLocation{host='null', port=-1}]

2016-10-21T07:05:05,387 WARN [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.RemoteTaskRunner - No worker selections strategy set. Using default.

2016-10-21T07:05:05,387 WARN [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.RemoteTaskRunner - No worker selections strategy set. Using default.

2016-10-21T07:05:05,390 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Worker[hd-olap-02:8091] wrote FAILED status for task [index_kafka_druid-events-v2_46229226ef7afab_jahkikib] on [TaskLocation{host='hd-olap-02', port=8102}]

2016-10-21T07:05:05,390 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Worker[hd-olap-02:8091] completed task[index_kafka_druid-events-v2_46229226ef7afab_jahkikib] with status[FAILED]

2016-10-21T07:05:05,390 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.TaskQueue - Received FAILED status for task: index_kafka_druid-events-v2_46229226ef7afab_jahkikib

2016-10-21T07:05:05,390 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Cleaning up task[index_kafka_druid-events-v2_46229226ef7afab_jahkikib] on worker[hd-olap-02:8091]

2016-10-21T07:05:05,389 WARN [KafkaSupervisor-druid-events-v2-0] io.druid.indexing.kafka.supervisor.KafkaSupervisor - Task [index_kafka_druid-events-v2_7b3acbabf02a985_lmjafgbd] failed to return start time, killing task

java.lang.RuntimeException: java.net.ConnectException: Connection refused

at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]

at io.druid.indexing.kafka.KafkaIndexTaskClient.submitRequest(KafkaIndexTaskClient.java:328) ~[druid-kafka-indexing-service-0.9.1.1.jar:0.9.1.1]

at io.druid.indexing.kafka.KafkaIndexTaskClient.submitRequest(KafkaIndexTaskClient.java:233) ~[druid-kafka-indexing-service-0.9.1.1.jar:0.9.1.1]

at io.druid.indexing.kafka.KafkaIndexTaskClient.getStartTime(KafkaIndexTaskClient.java:156) ~[druid-kafka-indexing-service-0.9.1.1.jar:0.9.1.1]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.getTaskStartTime(KafkaSupervisor.java:1218) ~[druid-kafka-indexing-service-0.9.1.1.jar:0.9.1.1]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.updateTaskStatus(KafkaSupervisor.java:708) [druid-kafka-indexing-service-0.9.1.1.jar:0.9.1.1]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor.runInternal(KafkaSupervisor.java:457) [druid-kafka-indexing-service-0.9.1.1.jar:0.9.1.1]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor$RunNotice.handle(KafkaSupervisor.java:409) [druid-kafka-indexing-service-0.9.1.1.jar:0.9.1.1]

at io.druid.indexing.kafka.supervisor.KafkaSupervisor$2.run(KafkaSupervisor.java:261) [druid-kafka-indexing-service-0.9.1.1.jar:0.9.1.1]

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_102]

at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_102]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_102]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_102]

at java.lang.Thread.run(Thread.java:745) [?:1.8.0_102]

Caused by: java.net.ConnectException: Connection refused

at java.net.PlainSocketImpl.socketConnect(Native Method) ~[?:1.8.0_102]

at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) ~[?:1.8.0_102]

at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) ~[?:1.8.0_102]

at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) ~[?:1.8.0_102]

at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.8.0_102]

at java.net.Socket.connect(Socket.java:589) ~[?:1.8.0_102]

at java.net.Socket.connect(Socket.java:538) ~[?:1.8.0_102]

at java.net.Socket.&lt;init&gt;(Socket.java:434) ~[?:1.8.0_102]

at java.net.Socket.&lt;init&gt;(Socket.java:211) ~[?:1.8.0_102]

at io.druid.indexing.kafka.KafkaIndexTaskClient.checkConnection(KafkaIndexTaskClient.java:228) ~[?:?]

at io.druid.indexing.kafka.KafkaIndexTaskClient.submitRequest(KafkaIndexTaskClient.java:264) ~[?:?]

… 12 more

2016-10-21T07:05:05,394 WARN [Curator-PathChildrenCache-0] io.druid.indexing.overlord.TaskQueue - Unknown task completed: index_kafka_druid-events-v2_46229226ef7afab_jahkikib

2016-10-21T07:05:05,394 INFO [Curator-PathChildrenCache-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-10-21T07:05:05.394Z","service":"druid/overlord","host":"hd-olap-01:8090","metric":"task/run/time","value":38,"dataSource":"druid-events-v2","taskStatus":"FAILED","taskType":"index_kafka"}]

2016-10-21T07:05:05,394 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.TaskQueue - Task FAILED: KafkaIndexTask{id=index_kafka_druid-events-v2_46229226ef7afab_jahkikib, type=index_kafka, dataSource=druid-events-v2} (38 run duration)

2016-10-21T07:05:05,394 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_druid-events-v2_46229226ef7afab_jahkikib] status changed to [FAILED].

2016-10-21T07:05:05,394 WARN [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.RemoteTaskRunner - No worker selections strategy set. Using default.

2016-10-21T07:05:05,395 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.RemoteTaskRunner - Coordinator asking Worker[hd-olap-02:8091] to add task[index_kafka_druid-events-v2_dc9f6e34e7303c6_klchladl]

2016-10-21T07:05:05,396 INFO [KafkaSupervisor-druid-events-v2-0] io.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker: hd-olap-02:8091, status 200 OK, response: {"task":"index_kafka_druid-events-v2_7b3acbabf02a985_lmjafgbd"}

2016-10-21T07:05:05,396 ERROR [KafkaSupervisor-druid-events-v2-0] io.druid.indexing.overlord.RemoteTaskRunner - Shutdown failed for index_kafka_druid-events-v2_7b3acbabf02a985_lmjafgbd! Are you sure the task was running?

2016-10-21T07:05:05,396 INFO [KafkaSupervisor-druid-events-v2-0] io.druid.indexing.overlord.TaskLockbox - Removing task[index_kafka_druid-events-v2_7b3acbabf02a985_lmjafgbd] from activeTasks

2016-10-21T07:05:05,398 INFO [KafkaSupervisor-druid-events-v2-0] io.druid.indexing.overlord.MetadataTaskStorage - Updating task index_kafka_druid-events-v2_7b3acbabf02a985_lmjafgbd to status: TaskStatus{id=index_kafka_druid-events-v2_7b3acbabf02a985_lmjafgbd, status=FAILED, duration=-1}

2016-10-21T07:05:05,400 INFO [KafkaSupervisor-druid-events-v2-0] io.druid.indexing.overlord.TaskQueue - Task done: KafkaIndexTask{id=index_kafka_druid-events-v2_7b3acbabf02a985_lmjafgbd, type=index_kafka, dataSource=druid-events-v2}

2016-10-21T07:05:05,400 INFO [TaskQueue-Manager] io.druid.indexing.overlord.TaskQueue - Asking taskRunner to clean up 1 tasks.

2016-10-21T07:05:05,401 INFO [TaskQueue-Manager] io.druid.indexing.overlord.RemoteTaskRunner - Sent shutdown message to worker: hd-olap-02:8091, status 200 OK, response: {"task":"index_kafka_druid-events-v2_7b3acbabf02a985_lmjafgbd"}

2016-10-21T07:05:05,401 ERROR [TaskQueue-Manager] io.druid.indexing.overlord.RemoteTaskRunner - Shutdown failed for index_kafka_druid-events-v2_7b3acbabf02a985_lmjafgbd! Are you sure the task was running?

2016-10-21T07:05:05,401 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.RemoteTaskRunner - Task index_kafka_druid-events-v2_dc9f6e34e7303c6_klchladl switched from pending to running (on [hd-olap-02:8091])

2016-10-21T07:05:05,401 INFO [rtr-pending-tasks-runner-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_druid-events-v2_dc9f6e34e7303c6_klchladl] status changed to [RUNNING].

2016-10-21T07:05:05,401 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Task[index_kafka_druid-events-v2_46229226ef7afab_jahkikib] went bye bye.

2016-10-21T07:05:05,402 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Worker[hd-olap-02:8091] wrote RUNNING status for task [index_kafka_druid-events-v2_ccb76cd9ba5f6b9_kindddfj] on [TaskLocation{host='hd-olap-02', port=8100}]

2016-10-21T07:05:05,402 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_druid-events-v2_ccb76cd9ba5f6b9_kindddfj] location changed to [TaskLocation{host='hd-olap-02', port=8100}].

2016-10-21T07:05:05,404 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Worker[hd-olap-02:8091] wrote FAILED status for task [index_kafka_druid-events-v2_7b3acbabf02a985_lmjafgbd] on [TaskLocation{host='hd-olap-02', port=8101}]

2016-10-21T07:05:05,404 INFO [Curator-PathChildrenCache-0] io.druid.indexing.overlord.RemoteTaskRunner - Worker[hd-olap-02:8091] completed task[index_kafka_druid-events-v2_7b3acbabf02a985_lmjafgbd] with status[FAILED]

Which documents should I read if I want to know how data flows between the overlord / coordinator / broker / middleManager?

Thanks, David.

Hi Luan,

Ah, can you try running 0.9.2-rc1 and let me know if the Kafka indexing service works for you? The issue that I think you’re having should be fixed in 0.9.2. You can download the RC here: http://druid.io/downloads.html

As for documentation, the pages under the ‘Design’ category would be a good place to start. Check out http://druid.io/docs/0.9.1.1/design/design.html and http://druid.io/docs/0.9.1.1/design/indexing-service.html.

The Kafka indexing service in 0.9.2-rc1 works for me.
Thanks so much!