Error on realtime ingestion using tranquility (Server module) and druid

Hi Guys,

I’m having trouble ingesting data in real time using Tranquility and Druid. I couldn’t get much information from the exception in the task log:

Basically this part (full task log enclosed):

2016-02-17T23:26:05,983 INFO [Thread-44] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.http.client.NettyHttpClient.stop()] on object[com.metamx.http.client.NettyHttpClient@35fe7ec2].

2016-02-17T23:26:05,983 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[RealtimeIndexTask{id=index_realtime_000C7221A34CA9DD8444662FFFB2EFD0_2016-02-17T00:00:00.000Z_0_0, type=index_realtime, dataSource=000C7221A34CA9DD8444662FFFB2EFD0}]

java.lang.RuntimeException: java.lang.InterruptedException

at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]

at io.druid.segment.realtime.firehose.EventReceiverFirehoseFactory$EventReceiverFirehose.hasMore(EventReceiverFirehoseFactory.java:209) ~[druid-server-0.8.3.jar:0.8.3]

at io.druid.segment.realtime.firehose.TimedShutoffFirehoseFactory$TimedShutoffFirehose.hasMore(TimedShutoffFirehoseFactory.java:99) ~[druid-server-0.8.3.jar:0.8.3]

at io.druid.segment.realtime.firehose.PredicateFirehose.hasMore(PredicateFirehose.java:50) ~[druid-server-0.8.3.jar:0.8.3]

at io.druid.indexing.common.task.RealtimeIndexTask.run(RealtimeIndexTask.java:314) ~[druid-indexing-service-0.8.3.jar:0.8.3]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:285) [druid-indexing-service-0.8.3.jar:0.8.3]

at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:265) [druid-indexing-service-0.8.3.jar:0.8.3]

at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_95]

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_95]

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_95]

at java.lang.Thread.run(Thread.java:745) [?:1.7.0_95]

Caused by: java.lang.InterruptedException

at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2017) ~[?:1.7.0_95]

at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2095) ~[?:1.7.0_95]

at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:389) ~[?:1.7.0_95]

at io.druid.segment.realtime.firehose.EventReceiverFirehoseFactory$EventReceiverFirehose.hasMore(EventReceiverFirehoseFactory.java:201) ~[druid-server-0.8.3.jar:0.8.3]

… 9 more

2016-02-17T23:26:05,988 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {

"id" : "index_realtime_000C7221A34CA9DD8444662FFFB2EFD0_2016-02-17T00:00:00.000Z_0_0",

"status" : "FAILED",

"duration" : 434577

}

2016-02-17T23:26:05,991 INFO [Thread-44] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.metrics.MonitorScheduler.stop()] on object[com.metamx.metrics.MonitorScheduler@3d75caf9].
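The root cause in the trace is an InterruptedException thrown while the firehose blocks polling its event queue, which usually means something shut the task down rather than a problem with the poll itself. A minimal, self-contained Java sketch of that pattern (class and method names are mine, not Druid’s):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Miniature reproduction of the stack trace's failure mode: a consumer
// blocked in ArrayBlockingQueue.poll(timeout) throws InterruptedException
// when its thread is interrupted (e.g. during task shutdown).
public class InterruptDemo {
    public static boolean pollWasInterrupted() throws Exception {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        final boolean[] interrupted = {false};
        Thread consumer = new Thread(() -> {
            try {
                // Blocks until an element arrives, the timeout expires,
                // or the thread is interrupted.
                queue.poll(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                interrupted[0] = true; // the path shown in the task log
            }
        });
        consumer.start();
        Thread.sleep(100);     // let the consumer block inside poll()
        consumer.interrupt();  // what a shutdown does to the task thread
        consumer.join();
        return interrupted[0];
    }
}
```

This mirrors the call chain in the log (ArrayBlockingQueue.poll throwing InterruptedException, propagated as a RuntimeException), so the interesting question is what interrupted the task, not the poll itself.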

The task is received by the overlord

The temporary files are created in /tmp/druid/zk_druid

The task shows up in the indexing tab of the coordinator console (web)

The task shows up in the overlord console (web) always with status RUNNING

The log of the task is created in /usr/local/druid/log

About my environment, I’m running:

Druid 0.8.3

Tranquility 0.7.3

Zookeeper 3.4.6

Mysql 5.7.11

My current topology is:

Machine A (7.5 GB RAM, 14 ECU) running ZooKeeper, MySQL, and the coordinator node.

Machine B (16 GB RAM, 13 ECU) running the historical and overlord (indexer) nodes.

My deep storage is S3.

Metadata storage is MySQL.

My current nodes configuration for machine A:

druid.extensions.coordinates=["io.druid.extensions:druid-examples","io.druid.extensions:druid-kafka-eight", "io.druid.extensions:mysql-metadata-storage", "io.druid.extensions:druid-hdfs-storage", "io.druid.extensions:$

druid.extensions.localRepository=extensions-repo

druid.zk.service.host=localhost

druid.zk.paths.base=/druid

druid.metadata.storage.type=mysql

druid.metadata.storage.connector.connectURI=jdbc:mysql://mysqlip/druid

druid.metadata.storage.connector.user=druid

druid.metadata.storage.connector.password=(somepass)

druid.storage.type=s3

druid.s3.accessKey=(some access key)

druid.s3.secretKey=(some secret)

druid.storage.bucket=(bucket)

druid.storage.baseKey=(key)

druid.storage.archiveBucket=druid-archive

druid.storage.archiveBaseKey=bc

druid.cache.type=local

druid.cache.sizeInBytes=10000000

druid.selectors.indexing.serviceName=overlord

druid.selectors.coordinator.serviceName=coordinator

druid.emitter=noop

coordinator:

JVM parameters: -Xmx4g -Duser.timezone=UTC -Dfile.encoding=UTF-8

druid.host=(machine A IP)

druid.port=8081

druid.service=coordinator

druid.coordinator.startDelay=PT70s

Common configuration for machine B:

druid.extensions.coordinates=["io.druid.extensions:druid-examples","io.druid.extensions:druid-kafka-eight", "io.druid.extensions:mysql-metadata-storage", "io.druid.extensions:druid-hdfs-storage", "io.druid.extensions:druid-s3-extensions"]

druid.extensions.localRepository=extensions-repo

druid.zk.service.host=(machine A’s ip address)

druid.zk.paths.base=/druid

druid.metadata.storage.type=mysql

druid.metadata.storage.connector.connectURI=jdbc:mysql://mysqlip/druid

druid.metadata.storage.connector.user=druid

druid.metadata.storage.connector.password=(somepass)

druid.storage.type=s3

druid.s3.accessKey=(some access key)

druid.s3.secretKey=(some secret)

druid.storage.bucket=(bucket)

druid.storage.baseKey=(key)

druid.storage.archiveBucket=druid-archive

druid.storage.archiveBaseKey=bc

druid.cache.type=local

druid.cache.sizeInBytes=10000000

druid.selectors.indexing.serviceName=overlord

druid.selectors.coordinator.serviceName=coordinator

druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor"]

druid.emitter=noop

historical:

JVM parameters: -server -Xmx12g -Xms12g -XX:NewSize=6g -XX:MaxNewSize=6g -XX:MaxDirectMemorySize=16g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8

druid.host=(machine B ip)

druid.port=8083

druid.service=historical

druid.processing.numThreads=15

druid.historical.cache.useCache=true

druid.historical.cache.populateCache=true

druid.server.http.numThreads=50

druid.server.maxSize=300000000000

druid.segmentCache.locations=[{"path": "/tmp/druid/zk_druid", "maxSize": 300000000000}]

overlord (not running middle manager):

JVM parameters:

druid.host=(machine B ip)

druid.port=8090

druid.service=overlord

Thanks in advance for any help!

index_realtime_000C7221A34CA9DD8444662FFFB2EFD0_2016-02-17T00%3A00%3A00.000Z_0_0.log (134 KB)

It seems the task was interrupted; I’m not sure what caused that. Did you send a cancel-task request?

No, I didn’t send any cancel task.
I just pushed the data to Tranquility and collected the logs.

The task is stuck as RUNNING, even after the exception.

Hi Rodrigo, do you see this happen often or was this only with the 1 task?

Hi Fangjin,

It happened to every task I submitted via the Tranquility HTTP server (I tried different configurations and different data sources).

I connected to zookeeper via zkCli and saw that the segments were under /druid/segments but the /druid/loadQueue was empty.

I also tried running an instance of the middle manager and saw in the indexing service web console that the worker showed up as available and was then allocated to my task.

On the indexing service console, the task hangs as RUNNING for about an hour and then moves to the “completed” bucket, but no segments are created.

I checked on my task temp directories /tmp/task and the files for the task are there…

I really can’t figure out what part I’m missing.

Thanks for the help!

You likely have misconfiguration in your setup. Can you attach a task log?

Hi All ,

Any resolution for this error? I’m also seeing the exact same error during ingestion via Tranquility Server.

Thank you .

For the “task hangs as RUNNING for an hour and then goes to ‘completed’” issue, try checking out: https://github.com/druid-io/tranquility/blob/master/docs/trouble.md#no-data-is-showing-up-in-druid-even-though-events-seem-to-be-successfully-sent
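One common cause discussed on that page is events whose timestamps fall outside the task’s windowPeriod; Tranquility drops those silently, so the task runs but no segments appear. A rough sketch of the window check, assuming a PT10M windowPeriod for illustration (the actual value comes from your Tranquility config):

```java
import java.time.Duration;
import java.time.Instant;

// Sketch (not Tranquility's actual code) of the windowPeriod rule:
// events with timestamps outside now ± windowPeriod are rejected,
// which looks like "task runs but no data shows up".
public class WindowCheck {
    public static boolean inWindow(Instant eventTime, Instant now, Duration windowPeriod) {
        return !eventTime.isBefore(now.minus(windowPeriod))
            && !eventTime.isAfter(now.plus(windowPeriod));
    }
}
```

So if you replay old data (or your clocks are skewed), every event can be rejected even though the send itself appears to succeed.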

Biswajit, is that what you’re seeing too? If not could you describe what you’re seeing in more detail?

Hi Guys,

Sorry for the late reply. I ended up upgrading to version 0.9.0.rc3 and everything started working.

Thanks for the support!