Tranquility dropping all the data before sending it to Druid

Hello folks:

I need some direction on where to start troubleshooting this issue. It keeps coming back and I have no idea why. This is what I see on the Tranquility side when my producer starts pushing data to the Kafka server (note that Tranquility runs on the same box as Kafka, ZooKeeper, PostgreSQL, the Overlord and the MiddleManager):

#sudo systemctl -l status tranq-broadsoft

May 30 05:45:19 lxlyxf1001.XXXXX.com tranq-broadsoft.sh[9360]: 2018-05-29 19:45:19,783 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {broadsoft={receivedCount=0, sentCount=0, droppedCount=0, unparseableCount=0}} pending messages in 0ms and committed offsets in 0ms.

May 30 05:45:34 lxlyxf1001.XXXXX.com tranq-broadsoft.sh[9360]: 2018-05-29 19:45:34,784 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {broadsoft={receivedCount=0, sentCount=0, droppedCount=0, unparseableCount=0}} pending messages in 0ms and committed offsets in 0ms.

May 30 05:45:49 lxlyxf1001.XXXXX.com tranq-broadsoft.sh[9360]: 2018-05-29 19:45:49,785 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {broadsoft={receivedCount=0, sentCount=0, droppedCount=0, unparseableCount=0}} pending messages in 0ms and committed offsets in 0ms.

May 30 05:46:04 lxlyxf1001.XXXXX.com tranq-broadsoft.sh[9360]: 2018-05-29 19:46:04,786 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {broadsoft={receivedCount=0, sentCount=0, droppedCount=0, unparseableCount=0}} pending messages in 0ms and committed offsets in 0ms.

May 30 05:46:19 lxlyxf1001.XXXXXX.com tranq-broadsoft.sh[9360]: 2018-05-29 19:46:19,787 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {broadsoft={receivedCount=0, sentCount=0, droppedCount=0, unparseableCount=0}} pending messages in 0ms and committed offsets in 0ms.

May 30 05:46:34 lxlyxf1001.XXXXX.com tranq-broadsoft.sh[9360]: 2018-05-29 19:46:34,788 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {broadsoft={receivedCount=0, sentCount=0, droppedCount=0, unparseableCount=0}} pending messages in 0ms and committed offsets in 0ms.

May 30 05:46:49 lxlyxf1001.XXXXXX.com tranq-broadsoft.sh[9360]: 2018-05-29 19:46:49,791 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {broadsoft={receivedCount=0, sentCount=0, droppedCount=0, unparseableCount=0}} pending messages in 1ms and committed offsets in 0ms.

May 30 05:47:04 lxlyxf1001.XXXXXX.com tranq-broadsoft.sh[9360]: 2018-05-29 19:47:04,792 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {broadsoft={receivedCount=0, sentCount=0, droppedCount=0, unparseableCount=0}} pending messages in 0ms and committed offsets in 1ms.

May 30 05:47:19 lxlyxf1001.XXXXXX.com tranq-broadsoft.sh[9360]: 2018-05-29 19:47:19,795 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {broadsoft={receivedCount=60, sentCount=0, droppedCount=60, unparseableCount=0}} pending messages in 0ms and committed offsets in 3ms.

May 30 05:47:34 lxlyxf1001.XXXXXX.com tranq-broadsoft.sh[9360]: 2018-05-29 19:47:34,798 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {broadsoft={receivedCount=806, sentCount=0, droppedCount=806, unparseableCount=0}} pending messages in 0ms and committed offsets in 2ms.

My producer has always been on the UTC timezone. The Tranquility server is on AEST (UTC+10). No tasks are assigned to the Overlord.
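
For what it's worth, this is how I sanity-check the clocks from the CLI; as I understand it, with "windowPeriod": "PT2H" Tranquility will drop any event whose timestamp falls more than roughly two hours outside the server's current time, so the producer's timestamps need to line up with UTC "now":

#date -u

#date -u -d '2 hours ago'

The first prints the server's current time in UTC, which is what I compare against the timestamps my producer emits; the second prints the approximate lower edge of a PT2H window.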

This is the part of the Tranquility (0.8.2) spec file that I believe is relevant:

"ioConfig": {

"type": "realtime"

},

"tuningConfig": {

"type": “realtime”,

"maxRowsInMemory": “75000”,

"intermediatePersistPeriod": “PT10M”,

"windowPeriod": "PT2H"

}

},

"properties": {

"task.partitions": “1”,

"task.replicants": “1”,

"topicPattern": "broadsoft"

}

}

},

"properties": {

"zookeeper.connect": “10.98.54.122”,

"druid.discovery.curator.path": “/druid/discovery”,

"druid.selectors.indexing.serviceName": “druid/overlord”,

"commit.periodMillis": “15000”,

"consumer.numThreads": “2”,

"kafka.zookeeper.connect": “10.98.54.122”,

"kafka.group.id": “tranquility-kafka”,

"reportDropsAsExceptions": "false"

}

}
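
In case it matters, this is roughly how I validate and start it (assuming the stock Tranquility distribution layout, run from the Tranquility install directory; the spec path below is just a placeholder for mine):

#python -m json.tool /path/to/spec.json

#bin/tranquility kafka -configFile /path/to/spec.json

The first command only confirms the file is valid JSON; the second is the standard way to launch the Tranquility Kafka consumer against a spec file.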

The issue comes and goes. It has happened twice so far, and I suspect it is triggered when we send visualization queries to Druid too frequently.

Any help is much appreciated.

Thanks

I think my Overlord is functional, but no tasks are assigned to it from Tranquility. Unfortunately, my web UI access to the server that hosts the services (Overlord, MiddleManager, ZooKeeper, PostgreSQL, Kafka) is blocked; I can only reach the CLI via SSH, and I do know how to manage tasks (list, cancel, etc.) via commands.
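
For reference, these are the commands I run against the Overlord's task API over SSH (I am assuming the default Overlord port 8090 here; substitute your own host and port, and <taskId> comes from the first command's output):

#curl http://localhost:8090/druid/indexer/v1/runningTasks

#curl http://localhost:8090/druid/indexer/v1/pendingTasks

#curl -X POST http://localhost:8090/druid/indexer/v1/task/<taskId>/shutdown

The first two list the running and pending indexing tasks (both have been coming back empty for me, which matches no tasks being assigned), and the last one shuts down a specific task by ID.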

Hello folks,

I ended up removing /tranquility/beams/ from my ZooKeeper and restarting all the relevant Druid and Kafka services on my server. When I restarted Tranquility as a service, I noticed that it kept stopping. When I checked the logs, I saw that Tranquility was trying to create tasks with names that already existed on the Overlord, so I changed my Tranquility spec file and added a new attribute under properties: druidBeam.randomizeTaskId

I set it to "true", and that fixed my issue.
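
In case anyone hits the same thing, this is roughly what I did (the ZooKeeper address is the one from my spec, with the default client port 2181 assumed):

#zkCli.sh -server 10.98.54.122:2181

rmr /tranquility/beams

(rmr is the recursive delete in the older ZooKeeper CLI; newer versions call it deleteall.) And this is the line I added under the properties block of the spec:

"druidBeam.randomizeTaskId": "true"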

However, I have now found a way to reproduce my problem consistently. Every time I query a large amount of data on the Broker, my system breaks: Tranquility starts dropping all incoming messages, and I can no longer query my real-time data (the last 24 hours). I can query my historical data fine. To fix it, I have to start over.
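
When it breaks again, my plan is to capture the state and log of the real-time task through the Overlord API before restarting anything (again assuming the default port 8090; <taskId> comes from the runningTasks output):

#curl http://localhost:8090/druid/indexer/v1/runningTasks

#curl http://localhost:8090/druid/indexer/v1/task/<taskId>/status

#curl http://localhost:8090/druid/indexer/v1/task/<taskId>/log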

Where should I start looking for the root cause of this?

Thanks,