Tranquility Kafka issues in Realtime Ingestion

I’m trying to setup production cluster setup. I’m getting the following error:

Dev                       Instance ID       Machine     Private IP
Master Servers       i-0cd6fed0db062e59f   m4.large    10.2.1.98     # Overlord, Coordinator
Data Servers         i-062654d423a3acc1f   r3.large    10.2.1.229    # Middle Managers, Historical
Broker Servers       i-0a98a683398eceffd   m4.large    10.2.1.42     # Broker Server
Capeve Server        i-083eb4965531ab5df   m4.large    10.2.1.239    # Running Zookeeper, Kafka, Node JS App to feed data to kafka

What might be the cause of the following issue?

2017-01-09 00:25:17,004 [Curator-Framework-0-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x0

2017-01-09 00:25:17,004 [Curator-Framework-0-SendThread(10.2.1.239:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.2.1.239/10.2.1.239:2181. Will not attempt to authenticate using SASL (unknown error)

2017-01-09 00:25:17,004 [Curator-Framework-0] WARN o.a.c.retry.ExponentialBackoffRetry - Sleep extension too large (1287000). Pinning to 30000

2017-01-09 00:25:17,005 [Curator-Framework-0] WARN o.a.c.retry.ExponentialBackoffRetry - Sleep extension too large (289000). Pinning to 30000

2017-01-09 00:25:17,005 [Curator-Framework-0-SendThread(10.2.1.239:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 10.2.1.239/10.2.1.239:2181, initiating session

2017-01-09 00:25:22,187 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {vnk-clst={receivedCount=0, sentCount=0, droppedCount=0, unparseableCount=0}} pending messages in 0ms and committed offsets in 0ms.

2017-01-09 00:25:33,415 [main-SendThread(10.2.1.239:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 20015ms for sessionid 0x1597c96f6cd0dfa

2017-01-09 00:25:33,415 [main-SendThread(10.2.1.239:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 20015ms for sessionid 0x1597c96f6cd0dfa, closing socket connection and attempting reconnect

2017-01-09 00:25:34,534 [main-SendThread(10.2.1.239:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.2.1.239/10.2.1.239:2181. Will not attempt to authenticate using SASL (unknown error)

2017-01-09 00:25:34,534 [main-SendThread(10.2.1.239:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 10.2.1.239/10.2.1.239:2181, initiating session

2017-01-09 00:25:37,187 [KafkaConsumer-CommitThread] INFO c.m.tranquility.kafka.KafkaConsumer - Flushed {vnk-clst={receivedCount=0, sentCount=0, droppedCount=0, unparseableCount=0}} pending messages in 0ms and committed offsets in 0ms.

2017-01-09 00:25:47,004 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (10.2.1.239) and timeout (20000) / elapsed (31099)

2017-01-09 00:25:47,004 [Curator-Framework-0] WARN o.a.c.retry.ExponentialBackoffRetry - Sleep extension too large (4035000). Pinning to 30000

2017-01-09 00:25:47,005 [Curator-Framework-0] ERROR org.apache.curator.ConnectionState - Connection timed out for connection string (10.2.1.239) and timeout (20000) / elapsed (31100)

2017-01-09 00:25:47,006 [Curator-Framework-0] WARN o.a.c.retry.ExponentialBackoffRetry - Sleep extension too large (1803000). Pinning to 30000

2017-01-09 00:25:47,619 [main-SendThread(10.2.1.239:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x1597c96f6cd0dfa for server 10.2.1.239/10.2.1.239:2181, unexpected error, closing socket connection and attempting reconnect

2017-01-09 00:25:49,279 [main-SendThread(10.2.1.239:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.2.1.239/10.2.1.239:2181. Will not attempt to authenticate using SASL (unknown error)

2017-01-09 00:25:49,280 [main-SendThread(10.2.1.239:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 10.2.1.239/10.2.1.239:2181, initiating session

2017-01-09 00:25:49,280 [main-SendThread(10.2.1.239:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x1597c96f6cd0dfa has expired

2017-01-09 00:25:49,280 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Expired)

2017-01-09 00:25:49,280 [main-SendThread(10.2.1.239:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x1597c96f6cd0dfa has expired, closing socket connection

2017-01-09 00:25:49,280 [main-EventThread] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=10.2.1.239 sessionTimeout=20000 watcher=org.I0Itec.zkclient.ZkClient@7164ca4c

2017-01-09 00:25:49,281 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x1597c96f6cd0dfa

2017-01-09 00:25:49,283 [ZkClient-EventThread-12-10.2.1.239] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], ZK expired; release old broker parition ownership; re-register consumer tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c

2017-01-09 00:25:49,283 [ZkClient-EventThread-12-10.2.1.239] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], begin registering consumer tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c in ZK

2017-01-09 00:25:49,283 [main-SendThread(10.2.1.239:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.2.1.239/10.2.1.239:2181. Will not attempt to authenticate using SASL (unknown error)

2017-01-09 00:25:49,283 [main-SendThread(10.2.1.239:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 10.2.1.239/10.2.1.239:2181, initiating session

2017-01-09 00:25:49,290 [main-SendThread(10.2.1.239:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server 10.2.1.239/10.2.1.239:2181, sessionid = 0x1597c96f6cd0e22, negotiated timeout = 20000

2017-01-09 00:25:49,291 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (SyncConnected)

2017-01-09 00:25:49,292 [ZkClient-EventThread-12-10.2.1.239] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], end registering consumer tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c in ZK

2017-01-09 00:25:49,292 [ZkClient-EventThread-12-10.2.1.239] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], begin rebalancing consumer tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c try #0

2017-01-09 00:25:49,292 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Verifying properties

2017-01-09 00:25:49,292 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Property client.id is overridden to tranquility-kafka

2017-01-09 00:25:49,292 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Property metadata.broker.list is overridden to

2017-01-09 00:25:49,292 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Property request.timeout.ms is overridden to 30000

2017-01-09 00:25:49,292 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] WARN k.c.ConsumerFetcherManager$LeaderFinderThread - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread], Failed to find leader for Set([vnk-clst,0])

2017-01-09 00:25:49,293 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO k.consumer.ConsumerFetcherManager - [ConsumerFetcherManager-1483901646202] Added fetcher for partitions ArrayBuffer()

2017-01-09 00:25:49,293 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.consumer.ConsumerFetcherThread - [ConsumerFetcherThread-tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-0-0], Shutting down

2017-01-09 00:25:49,293 [ConsumerFetcherThread-tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-0-0] INFO kafka.consumer.ConsumerFetcherThread - [ConsumerFetcherThread-tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-0-0], Stopped

2017-01-09 00:25:49,293 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.consumer.ConsumerFetcherThread - [ConsumerFetcherThread-tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-0-0], Shutdown completed

2017-01-09 00:25:49,295 [ZkClient-EventThread-12-10.2.1.239] WARN k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], no brokers found when trying to rebalance.

2017-01-09 00:25:49,295 [ZkClient-EventThread-12-10.2.1.239] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], end rebalancing consumer tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c try #0

2017-01-09 00:25:49,295 [ZkClient-EventThread-12-10.2.1.239] INFO k.c.ZookeeperTopicEventWatcher - ZK expired: resubscribing topic event listener to topic registry

2017-01-09 00:25:49,296 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c_watcher_executor] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], begin rebalancing consumer tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c try #0

2017-01-09 00:25:49,298 [ZkClient-EventThread-12-10.2.1.239] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], Topics to consume = List(vnk-clst)

2017-01-09 00:25:49,298 [ZkClient-EventThread-12-10.2.1.239] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], Topic info for path /brokers/topics/vnk-clst changed to {“version”:1,“partitions”:{“0”:[0]}}, triggering rebalance

2017-01-09 00:25:49,300 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c_watcher_executor] WARN k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], no brokers found when trying to rebalance.

2017-01-09 00:25:49,301 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c_watcher_executor] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], end rebalancing consumer tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c try #0

2017-01-09 00:25:49,301 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c_watcher_executor] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], begin rebalancing consumer tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c try #0

2017-01-09 00:25:49,303 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c_watcher_executor] WARN k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], no brokers found when trying to rebalance.

2017-01-09 00:25:49,303 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c_watcher_executor] INFO k.c.ZookeeperConsumerConnector - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c], end rebalancing consumer tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c try #0

2017-01-09 00:25:49,493 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Verifying properties

2017-01-09 00:25:49,494 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Property client.id is overridden to tranquility-kafka

2017-01-09 00:25:49,494 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Property metadata.broker.list is overridden to

2017-01-09 00:25:49,494 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Property request.timeout.ms is overridden to 30000

2017-01-09 00:25:49,494 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] WARN k.c.ConsumerFetcherManager$LeaderFinderThread - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread], Failed to find leader for Set([vnk-clst,0])

2017-01-09 00:25:49,494 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO k.consumer.ConsumerFetcherManager - [ConsumerFetcherManager-1483901646202] Added fetcher for partitions ArrayBuffer()

2017-01-09 00:25:49,694 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Verifying properties

2017-01-09 00:25:49,694 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Property client.id is overridden to tranquility-kafka

2017-01-09 00:25:49,694 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Property metadata.broker.list is overridden to

2017-01-09 00:25:49,694 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] INFO kafka.utils.VerifiableProperties - Property request.timeout.ms is overridden to 30000

2017-01-09 00:25:49,695 [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread] WARN k.c.ConsumerFetcherManager$LeaderFinderThread - [tranquility-kafka_ip-10-2-1-229-1483901645795-14e14e6c-leader-finder-thread], Failed to find leader for Set([vnk-clst,0])

``

And when I restarted the data servers, I lost around 3.2 M events. The log says waiting to handoff the segments. You can see the logs here: http://52.77.7.243:8090/
Fetching the logs from S3 is not working.

Historical Node Log:

2017-01-09T01:52:47,832 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[vnk-clst_2017-01-07T11:00:00.000Z_2017-01-07T12:00:00.000Z_2017-01-07T11:00:19.097Z], segment=DataSegment{size=155738, shardSpec=LinearShardSpec{partitionNum=0}, metrics=[count, session_count, hyper_user_id], dimensions=[Brand, advertiser_id, carrier, city, country, device_brand, device_id, device_manufacturer, device_model, email, event_id, event_source, event_type, gps_enabled, isCouponAvailable, language, lat, like_status, line_items, lng, order_number, os_name, os_version, payment_method, payment_switched_to, platform, product_category, product_id, product_images, product_size, profile_menu, refund_through, region, search_term, seller, seller_rating, sequence_number, session_id, tinder_action, utm_campaign, utm_medium, utm_source, vendor_order_id, version_name], version=‘2017-01-07T11:00:19.097Z’, loadSpec={type=s3_zip, bucket=common-druid, key=common-druid/segments/vnk-clst/2017-01-07T11:00:00.000Z_2017-01-07T12:00:00.000Z/2017-01-07T11:00:19.097Z/0/index.zip}, interval=2017-01-07T11:00:00.000Z/2017-01-07T12:00:00.000Z, dataSource=‘vnk-clst’, binaryVersion=‘9’}}

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[vnk-clst_2017-01-07T11:00:00.000Z_2017-01-07T12:00:00.000Z_2017-01-07T11:00:19.097Z]

    at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:310) ~[druid-server-0.9.2.jar:0.9.2]

    at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:351) [druid-server-0.9.2.jar:0.9.2]

    at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44) [druid-server-0.9.2.jar:0.9.2]

    at io.druid.server.coordination.ZkCoordinator$1.childEvent(ZkCoordinator.java:153) [druid-server-0.9.2.jar:0.9.2]

    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:522) [curator-recipes-2.11.0.jar:?]

    at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.11.0.jar:?]

    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:93) [curator-framework-2.11.0.jar:?]

    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]

    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:84) [curator-framework-2.11.0.jar:?]

    at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:513) [curator-recipes-2.11.0.jar:?]

    at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.11.0.jar:?]

    at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:773) [curator-recipes-2.11.0.jar:?]

    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_111]

    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_111]

    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_111]

    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_111]

    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]

    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]

    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]

Caused by: io.druid.segment.loading.SegmentLoadingException: var/druid/segment-cache/vnk-clst/2017-01-07T11:00:00.000Z_2017-01-07T12:00:00.000Z/2017-01-07T11:00:19.097Z/0/index.drd (No such file or directory)

    at io.druid.segment.loading.MMappedQueryableIndexFactory.factorize(MMappedQueryableIndexFactory.java:52) ~[druid-server-0.9.2.jar:0.9.2]

    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:96) ~[druid-server-0.9.2.jar:0.9.2]

    at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.2.jar:0.9.2]

    at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:306) ~[druid-server-0.9.2.jar:0.9.2]

    ... 18 more

Caused by: java.io.FileNotFoundException: var/druid/segment-cache/vnk-clst/2017-01-07T11:00:00.000Z_2017-01-07T12:00:00.000Z/2017-01-07T11:00:19.097Z/0/index.drd (No such file or directory)

    at java.io.FileInputStream.open0(Native Method) ~[?:1.8.0_111]

    at java.io.FileInputStream.open(FileInputStream.java:195) ~[?:1.8.0_111]

    at java.io.FileInputStream.<init>(FileInputStream.java:138) ~[?:1.8.0_111]

    at io.druid.segment.SegmentUtils.getVersionFromDir(SegmentUtils.java:43) ~[druid-api-0.9.2.jar:0.9.2]

    at io.druid.segment.IndexIO.loadIndex(IndexIO.java:217) ~[druid-processing-0.9.2.jar:0.9.2]

    at io.druid.segment.loading.MMappedQueryableIndexFactory.factorize(MMappedQueryableIndexFactory.java:49) ~[druid-server-0.9.2.jar:0.9.2]

    at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:96) ~[druid-server-0.9.2.jar:0.9.2]

    at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:152) ~[druid-server-0.9.2.jar:0.9.2]

    at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:306) ~[druid-server-0.9.2.jar:0.9.2]

    ... 18 more

2017-01-09T01:52:47,832 INFO [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - zNode[/druid/loadQueue/10.2.1.229:8083/vnk-clst_2017-01-07T11:00:00.000Z_2017-01-07T12:00:00.000Z_2017-01-07T11:00:19.097Z] was removed

2017-01-09T01:52:47,851 INFO [ZkCoordinator-Exec–0] io.druid.server.coordination.ServerManager - Told to delete a queryable for a dataSource[vnk-clst] that doesn’t exist.

2017-01-09T01:52:47,852 WARN [ZkCoordinator-Exec–0] io.druid.server.coordination.ZkCoordinator - Unable to delete segmentInfoCacheFile[var/druid/segment-cache/info_dir/vnk-clst_2017-01-07T18:00:00.000Z_2017-01-07T19:00:00.000Z_2017-01-07T18:59:35.774Z]

2017-01-09T01:52:47,887 INFO [ZkCoordinator-Exec–0] io.druid.server.coordination.ServerManager - Told to delete a queryable for a dataSource[vnk-clst] that doesn’t exist.

2017-01-09T01:52:47,887 WARN [ZkCoordinator-Exec–0] io.druid.server.coordination.ZkCoordinator - Unable to delete segmentInfoCacheFile[var/druid/segment-cache/info_dir/vnk-clst_2017-01-07T11:00:00.000Z_2017-01-07T12:00:00.000Z_2017-01-07T11:00:19.097Z]

``

middleManager Log stopped here:

2017-01-09T00:25:07,147 INFO [Curator-Framework-0-SendThread(10.2.1.239:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to 10.2.1.239/10.2.1.239:2181, initiating session

2017-01-09T00:25:20,227 WARN [Curator-Framework-0-SendThread(10.2.1.239:2181)] org.apache.zookeeper.ClientCnxn - Session 0x0 for server 10.2.1.239/10.2.1.239:2181, unexpected error, closing socket connection and attempting reconnect

java.io.IOException: Connection reset by peer

    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_111]

    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_111]

    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_111]

    at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_111]

    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_111]

    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.9.jar:3.4.9-1757313]

    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.9.jar:3.4.9-1757313]

    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.9.jar:3.4.9-1757313]

2017-01-09T00:25:21,839 INFO [Curator-Framework-0-SendThread(10.2.1.239:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.2.1.239/10.2.1.239:2181. Will not attempt to authenticate using SASL (unknown error)

2017-01-09T00:25:21,840 INFO [Curator-Framework-0-SendThread(10.2.1.239:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to 10.2.1.239/10.2.1.239:2181, initiating session

2017-01-09T00:25:34,931 WARN [Curator-Framework-0-SendThread(10.2.1.239:2181)] org.apache.zookeeper.ClientCnxn - Session 0x0 for server 10.2.1.239/10.2.1.239:2181, unexpected error, closing socket connection and attempting reconnect

java.io.IOException: Connection reset by peer

    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_111]

    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_111]

    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_111]

    at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_111]

    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_111]

    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.9.jar:3.4.9-1757313]

    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.9.jar:3.4.9-1757313]

    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.9.jar:3.4.9-1757313]

2017-01-09T00:25:36,465 INFO [Curator-Framework-0-SendThread(10.2.1.239:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.2.1.239/10.2.1.239:2181. Will not attempt to authenticate using SASL (unknown error)

2017-01-09T00:25:36,465 INFO [Curator-Framework-0-SendThread(10.2.1.239:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to 10.2.1.239/10.2.1.239:2181, initiating session

2017-01-09T00:25:49,555 WARN [Curator-Framework-0-SendThread(10.2.1.239:2181)] org.apache.zookeeper.ClientCnxn - Session 0x0 for server 10.2.1.239/10.2.1.239:2181, unexpected error, closing socket connection and attempting reconnect

java.io.IOException: Connection reset by peer

    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_111]

    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_111]

    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_111]

    at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_111]

    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_111]

    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68) ~[zookeeper-3.4.9.jar:3.4.9-1757313]

    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[zookeeper-3.4.9.jar:3.4.9-1757313]

    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141) [zookeeper-3.4.9.jar:3.4.9-1757313]

2017-01-09T00:25:51,428 INFO [Curator-Framework-0-SendThread(10.2.1.239:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server 10.2.1.239/10.2.1.239:2181. Will not attempt to authenticate using SASL (unknown error)

2017-01-09T00:25:51,429 INFO [Curator-Framework-0-SendThread(10.2.1.239:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to 10.2.1.239/10.2.1.239:2181, initiating session

2017-01-09T00:25:51,430 INFO [Curator-Framework-0-SendThread(10.2.1.239:2181)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server 10.2.1.239/10.2.1.239:2181, sessionid = 0x1597c96f6cd0e25, negotiated timeout = 30000

2017-01-09T00:25:51,430 INFO [Curator-Framework-0-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: RECONNECTED

2017-01-09T00:25:51,432 INFO [Curator-Framework-0-EventThread] io.druid.curator.announcement.Announcer - Node[/druid/indexer/announcements/10.2.1.229:8091] dropped, reinstating.

``

Error from overlord when request a log file:

17-01-09T01:55:45,718 INFO [qtp1595472338-58] com.metamx.http.client.pool.ChannelResourceFactory - Generating: http://10.2.1.229:8091

2017-01-09T01:55:56,661 INFO [TaskQueue-StorageSync] io.druid.indexing.overlord.TaskQueue - Synced 12 tasks from storage (0 tasks added, 0 tasks removed).

2017-01-09T01:56:39,572 WARN [qtp1595472338-58] io.druid.indexing.overlord.http.OverlordResource - Failed to stream log for task index_realtime_vnk-clst_2017-01-08T15:00:00.000Z_0_0

java.io.IOException: Failed to stream logs from: common-druid/indexing-logs/index_realtime_vnk-clst_2017-01-08T15:00:00.000Z_0_0/log

    at io.druid.storage.s3.S3TaskLogs.streamTaskLog(S3TaskLogs.java:105) ~[?:?]

    at io.druid.indexing.common.tasklogs.SwitchingTaskLogStreamer.streamTaskLog(SwitchingTaskLogStreamer.java:48) ~[druid-indexing-service-0.9.2.jar:0.9.2]

    at io.druid.indexing.overlord.http.OverlordResource.doGetLog(OverlordResource.java:592) [druid-indexing-service-0.9.2.jar:0.9.2]

    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_111]

    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_111]

    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_111]

    at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_111]

    at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409) [jersey-server-1.19.jar:1.19]

    at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409) [jersey-servlet-1.19.jar:1.19]

    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558) [jersey-servlet-1.19.jar:1.19]

    at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733) [jersey-servlet-1.19.jar:1.19]

    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) [javax.servlet-api-3.1.0.jar:3.1.0]

    at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:286) [guice-servlet-4.1.0.jar:?]

    at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:276) [guice-servlet-4.1.0.jar:?]

    at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:181) [guice-servlet-4.1.0.jar:?]

    at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) [guice-servlet-4.1.0.jar:?]

    at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85) [guice-servlet-4.1.0.jar:?]

    at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120) [guice-servlet-4.1.0.jar:?]

    at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:135) [guice-servlet-4.1.0.jar:?]

    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [jetty-servlet-9.2.5.v20141112.jar:9.2.5.v20141112]

    at io.druid.server.http.RedirectFilter.doFilter(RedirectFilter.java:71) [druid-server-0.9.2.jar:0.9.2]

    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [jetty-servlet-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:83) [jetty-servlets-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:364) [jetty-servlets-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652) [jetty-servlet-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585) [jetty-servlet-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515) [jetty-servlet-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.server.Server.handle(Server.java:497) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248) [jetty-server-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540) [jetty-io-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:620) [jetty-util-9.2.5.v20141112.jar:9.2.5.v20141112]

    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:540) [jetty-util-9.2.5.v20141112.jar:9.2.5.v20141112]

    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]

Caused by: org.jets3t.service.ServiceException: Request Error.

    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:426) ~[jets3t-0.9.4.jar:0.9.4]

    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:279) ~[jets3t-0.9.4.jar:0.9.4]

    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:1052) ~[jets3t-0.9.4.jar:0.9.4]

    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2264) ~[jets3t-0.9.4.jar:0.9.4]

    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2193) ~[jets3t-0.9.4.jar:0.9.4]

    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1120) ~[jets3t-0.9.4.jar:0.9.4]

    at io.druid.storage.s3.S3TaskLogs.streamTaskLog(S3TaskLogs.java:61) ~[?:?]

    ... 51 more

Caused by: org.jets3t.service.impl.rest.HttpException: 403 Forbidden

    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:425) ~[jets3t-0.9.4.jar:0.9.4]

    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:279) ~[jets3t-0.9.4.jar:0.9.4]

    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestHead(RestStorageService.java:1052) ~[jets3t-0.9.4.jar:0.9.4]

    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectImpl(RestStorageService.java:2264) ~[jets3t-0.9.4.jar:0.9.4]

    at org.jets3t.service.impl.rest.httpclient.RestStorageService.getObjectDetailsImpl(RestStorageService.java:2193) ~[jets3t-0.9.4.jar:0.9.4]

    at org.jets3t.service.StorageService.getObjectDetails(StorageService.java:1120) ~[jets3t-0.9.4.jar:0.9.4]

    at io.druid.storage.s3.S3TaskLogs.streamTaskLog(S3TaskLogs.java:61) ~[?:?]

    ... 51 more

``

What might be the problem?

I’m trying to benchmark the real time ingestion. Please help me to resolve the issues?

Hi,
looks like an s3 config issue that is causing both the handoff and task logs to not show up. Make sure the s3 access keys you are setting have required access to the s3 buckets.

Caused by: org.jets3t.service.impl.rest.HttpException: 403 Forbidden

Hi Nishant,