Ingesting a .tsv file in S3 into Druid using the indexing service

We are trying to ingest a .tsv file in S3 into Druid using the indexing service. I have captured the task log from the middle manager.

It seems indexing has finished successfully, as below:

```
dataSource='data_audience', binaryVersion='9'}]}

2017-07-23T06:02:56,457 INFO [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Removing task directory: /tmp/persistent/task/index_data_audience_2017-07-22T23:47:27.419Z/work
2017-07-23T06:02:56,461 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_data_audience_2017-07-22T23:47:27.419Z",
  "status" : "SUCCESS",
  "duration" : 12678
}
```

But the data ingestion does not seem to be finished, because of the error below:

```
2017-07-23T06:05:18,016 INFO [ServerInventoryView-0] io.druid.client.BatchServerInventoryView - Server Disappeared[DruidServerMetadata{name='52.34.6.148:8092', host='52.34.6.148:8092', maxSize=0, tier='_default_tier', type='indexer-executor', priority='0'}]
2017-07-23T06:05:18,016 WARN [main-SendThread(52.33.70.80:8020)] org.apache.zookeeper.ClientCnxn - Session 0x4521ac795dd1e4a for server 52.33.70.80/52.33.70.80:8020, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Xid out of order. Got Xid 2108 with err 0 expected Xid 2107 for a packet with details: clientPath:null serverPath:null finished:false header:: 2107,14 replyHeader:: 0,0,-4 request:: org.apache.zookeeper.MultiTransactionRecord@c08d7b1 response:: org.apache.zookeeper.MultiResponse@0
	at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:798) ~[druid-services-0.8.0-selfcontained.jar:0.8.0]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94) ~[druid-services-0.8.0-selfcontained.jar:0.8.0]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[druid-services-0.8.0-selfcontained.jar:0.8.0]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) [druid-services-0.8.0-selfcontained.jar:0.8.0]
2017-07-23T06:05:18,121 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
2017-07-23T06:05:18,583 INFO [main-SendThread(52.24.28.235:8020)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server 52.24.28.235/52.24.28.235:8020. Will not attempt to authenticate using SASL (unknown error)
2017-07-23T06:05:18,584 INFO [main-SendThread(52.24.28.235:8020)] org.apache.zookeeper.ClientCnxn - Socket connection established to 52.24.28.235/52.24.28.235:8020, initiating session
2017-07-23T06:05:18,586 WARN [main-SendThread(52.24.28.235:8020)] org.apache.zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
2017-07-23T06:05:18,586 INFO [main-SendThread(52.24.28.235:8020)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server 52.24.28.235/52.24.28.235:8020, sessionid = 0x4521ac795dd1e4a, negotiated timeout = 180000
```

Please find attached the full task log from the middle manager.

I don't know why I got the Xid out of order exception, and I am wondering whether the ingestion is done or not.

Can anyone help resolve this issue?

index_data_audience_2017-07-22T23%3A47%3A27.419Z.txt (79.7 KB)

Hi Eric,

In Druid, a successfully finished batch ingestion means that Druid generated segments for all input data and published them successfully. Publishing segments involves pushing them to deep storage and updating the metadata store.

So, if you saw the success message in the log, the ingestion task finished successfully, meaning that the segments for the input data were generated and published. You should be able to see them in your deep storage, likely S3.
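If you want to double-check, the S3 deep-storage layout makes pushed segments easy to find. A minimal sketch, assuming S3 deep storage; the bucket name and base key below are placeholders, and `segment_prefix` is just a hypothetical convenience helper:

```shell
# Druid lays segments out in S3 roughly as:
#   <baseKey>/<dataSource>/<intervalStart>_<intervalEnd>/<version>/<partitionNum>/index.zip
# Build the listing prefix for a datasource.
segment_prefix() {
  local base_key="$1" datasource="$2"
  printf '%s/%s/' "$base_key" "$datasource"
}

prefix=$(segment_prefix "druid/segments" "data_audience")
echo "$prefix"

# With the AWS CLI and credentials configured (bucket name is hypothetical):
# aws s3 ls "s3://my-druid-bucket/${prefix}" --recursive
```

If objects show up under that prefix for your ingestion interval, the publish step reached deep storage.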

The ZooKeeper error looks like the peon lost its ZooKeeper connection and reconnected. This might happen when your ZooKeeper cluster is under high load. Please check your ZooKeeper cluster.
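A quick way to check each server is ZooKeeper's standard four-letter-word commands (`ruok`, `srvr`); `srvr` also reports the server version, which is worth comparing across the quorum and against the client. This is a sketch: the loop over your ZooKeeper hosts is commented out, and the `zk_version` helper is a hypothetical convenience for pulling the version out of `srvr` output:

```shell
# Extract e.g. "3.3.2" from the first line of "srvr" output on stdin.
zk_version() {
  sed -n 's/^Zookeeper version: \([0-9.]*\).*/\1/p'
}

# Against a live quorum, using the client port from your logs (8020):
# for h in 52.33.70.80 52.24.28.235; do
#   echo ruok | nc "$h" 8020              # healthy servers answer "imok"
#   echo srvr | nc "$h" 8020 | zk_version
# done

# Sample "srvr" first line, piped through the helper:
printf 'Zookeeper version: 3.3.2-1031432, built on 11/05/2010 05:32 GMT\n' | zk_version
```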

Thanks,

Jihoon

On Sun, Jul 23, 2017 at 3:38 PM, Eric Kim ericygkim@gmail.com wrote:

Hi Son,

Thank you for your comments. Let me verify those data are ingested into deep storage.

One more issue on the middle manager: it seems the peon task never finishes. It keeps trying to connect to the ZooKeeper cluster. How is a peon task supposed to finish? i.e., does the peon need to publish certain information before it can exit?

{code}

2017-07-24T09:11:52,517 INFO [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Removing task directory: /tmp/persistent/task/index_data_audience_2017-07-23T06:58:16.344Z/work
2017-07-24T09:11:52,521 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_data_audience_2017-07-23T06:58:16.344Z",
  "status" : "SUCCESS",
  "duration" : 19323
}
2017-07-24T09:11:52,524 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@28e325e4].
2017-07-24T09:11:52,524 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.BatchDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig@58d3f4be]
2017-07-24T09:11:52,525 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/52.34.6.148:8093]
2017-07-24T09:59:07,527 WARN [main-SendThread(52.32.158.115:8020)] org.apache.zookeeper.ClientCnxn - Session 0x55d73588a6d0001 for server 52.32.158.115/52.32.158.115:8020, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Xid out of order. Got Xid 1880 with err 0 expected Xid 1879 for a packet with details: clientPath:null serverPath:null finished:false header:: 1879,14  replyHeader:: 0,0,-4  request:: org.apache.zookeeper.MultiTransactionRecord@c08d7b2 response:: org.apache.zookeeper.MultiResponse@0
	at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:798) ~[druid-services-0.8.0-selfcontained.jar:0.8.0]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94) ~[druid-services-0.8.0-selfcontained.jar:0.8.0]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[druid-services-0.8.0-selfcontained.jar:0.8.0]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) [druid-services-0.8.0-selfcontained.jar:0.8.0]
2017-07-24T09:59:07,637 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
2017-07-24T09:59:08,003 INFO [main-SendThread(52.25.189.175:8020)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server 52.25.189.175/52.25.189.175:8020. Will not attempt to authenticate using SASL (unknown error)
2017-07-24T09:59:08,005 INFO [main-SendThread(52.25.189.175:8020)] org.apache.zookeeper.ClientCnxn - Socket connection established to 52.25.189.175/52.25.189.175:8020, initiating session
2017-07-24T09:59:08,009 WARN [main-SendThread(52.25.189.175:8020)] org.apache.zookeeper.ClientCnxnSocket - Connected to an old server; r-o mode will be unavailable
2017-07-24T09:59:08,009 INFO [main-SendThread(52.25.189.175:8020)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server 52.25.189.175/52.25.189.175:8020, sessionid = 0x55d73588a6d0001, negotiated timeout = 60000
2017-07-24T09:59:08,010 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: RECONNECTED
2017-07-24T09:59:08,638 WARN [main-SendThread(52.25.189.175:8020)] org.apache.zookeeper.ClientCnxn - Session 0x55d73588a6d0001 for server 52.25.189.175/52.25.189.175:8020, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Xid out of order. Got Xid 2833 with err 0 expected Xid 2832 for a packet with details: clientPath:null serverPath:null finished:false header:: 2832,14  replyHeader:: 0,0,-4  request:: org.apache.zookeeper.MultiTransactionRecord@c08d7b2 response:: org.apache.zookeeper.MultiResponse@0
	at org.apache.zookeeper.ClientCnxn$SendThread.readResponse(ClientCnxn.java:798) ~[druid-services-0.8.0-selfcontained.jar:0.8.0]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:94) ~[druid-services-0.8.0-selfcontained.jar:0.8.0]
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366) ~[druid-services-0.8.0-selfcontained.jar:0.8.0]
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) [druid-services-0.8.0-selfcontained.jar:0.8.0]
{code}

Hi Son,

For your information, I found the pushed segments in our AWS S3 bucket (deep storage). Here is further information: when I send a query for that segment data, the broker does not send the query to any of the historical nodes. It looks like the broker does not know where the ingested segments are (if I understand correctly).

This raises a question: why does the middle manager need to connect to ZooKeeper again after pushing segments into deep storage?

I would appreciate any further help.

Eric,
the peon log you attached looks weird, because the historical unannounced itself when the task finished. Would you share your cluster configuration?

On Mon, Jul 24, 2017 at 10:54 PM, Eric Kim eric.kim@streamlyzer.com wrote:

Hi Jihoon,

Thank you for your help.

The configuration files include our production server information and credentials. Can I send those files to your personal e-mail, if you don't mind?

Eric, sorry for the late reply.
I don't think that's a good idea.
Please check your cluster configurations first. Maybe this will be helpful for you:

http://druid.io/docs/latest/configuration/production-cluster.html

On Wed, Jul 26, 2017 at 12:22 AM, Eric Kim eric.kim@streamlyzer.com wrote:

Hi Jihoon,

I have already looked over it.

Attached are the historical, common, middle manager, and overlord properties. Sorry, I removed our server addresses and SQL credentials.

I would appreciate any further help.

Eric Kim

Properties.zip (10.3 KB)

I have checked the logs on the ZooKeeper side and got the error below:

```
2017-07-30 08:17:12,468 [myid:5] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:8020:NIOServerCnxn@368] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x55d928c14710001, likely client has closed socket
	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:239)
	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:203)
	at java.lang.Thread.run(Thread.java:745)
2017-07-30 08:17:12,470 [myid:5] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:8020:NIOServerCnxn@1044] - Closed socket connection for client /52.34.6.148:54852 which had sessionid 0x55d928c14710001
```

I upgraded the ZooKeeper version from 3.3.2 to 3.4.10 but got the same issue. It looks like the middle manager does not send additional data after it finishes pushing segments into deep storage.

Any help?

After upgrading all ZooKeeper servers in the quorum to 3.4.10, this issue is gone.

Nice! Did you find what the root problem was?

On Mon, Jul 31, 2017 at 1:13 AM, Eric Kim ericygkim@gmail.com wrote:

Yes, we did. There were two issues. One was the old ZooKeeper version, as I mentioned. The other was that the coordinator was not working properly, so we restarted it. Thank you for your help.