Indexing Service Extremely Slow

Hi, I found that my indexing service takes a very long time to complete tasks. For example, it takes more than 5 hours to finish a task whose segment covers the interval "2015-07-14T12:00:00.000Z/2015-07-14T13:00:00.000Z" (1 hour, about 35,000,000 original records, 1 of the 5 shards) — in other words, fewer than roughly 2,000 rows per second.

So if ingestion continues, the many "running" tasks will lock up all of the peon workers, and then no more tasks can be assigned to any peon.

I don't know why it is so slow, or how to improve it.

I have only 1 middleManager node for testing. The server has a 12-core Intel E5-2620 v3 @ 2.40GHz CPU, 192 GB RAM, and HDD storage.

My JDK version is "1.8.040".

My config files:

_common/common.runtime.properties:

druid.extensions.searchCurrentClassloader=true
druid.extensions.coordinates=["io.druid.extensions:druid-kafka-eight","io.druid.extensions:mysql-metadata-storage","io.druid.extensions:druid-hdfs-storage"]
druid.zk.service.host=192.168.105.240:2181
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://192.168.105.240:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=durid
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://192.168.105.240:9000/druid
druid.cache.type=local
druid.cache.sizeInBytes=100000000
druid.selectors.indexing.serviceName=overlord
druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
druid.monitoring.emissionPeriod=PT10S
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://192.168.105.240:19999/metrics/
druid.request.logging.feed=druidrequests
druid.emitter.logging.loggerClass=HttpPostEmitter
druid.request.logging.type=file
druid.request.logging.dir=/data1/logs/druid

middleManager/runtime.properties:

druid.host=192.168.105.17:3029
druid.port=3029
druid.service=middlemanager
druid.indexer.queue.startDelay=PT1M
druid.indexer.task.baseDir=/data1/druid
druid.indexer.task.baseTaskDir=/data1/druid/persistent/tasks
druid.indexer.runner.javaOpts=-server -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
druid.indexer.fork.property.druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=536870912
druid.indexer.fork.property.druid.processing.numThreads=10
druid.indexer.fork.property.druid.segmentCache.locations=[{"path": "/data1/druid/persistent/segmentcache", "maxSize": 1024000000000}]
druid.indexer.fork.property.druid.server.http.numThreads=50
druid.worker.capacity=15
druid.indexer.runner.startPort=41000
druid.worker.ip=192.168.105.17
druid.worker.version=0
druid.indexer.task.chathandler.type=announce
druid.indexer.logs.type=file
druid.indexer.logs.directory=/data1/logs/druid/indexer
druid.indexer.fork.property.log4j.configurationFile=file:///opt/druid/config/middleManager/log4j2.xml

config/coordinator/runtime.properties:

druid.host=192.168.105.17:3031
druid.port=3031
druid.service=coordinator
druid.coordinator.startDelay=PT5M
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://192.168.105.240:19999/metrics/

config/historical/runtime.properties:

druid.host=192.168.105.17:3033
druid.port=3033
druid.service=historical
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numThreads=4
druid.segmentCache.locations=[{"path": "/data1/druid/indexCache", "maxSize": 1024000000000}]
druid.server.maxSize=1024000000000
druid.server.http.numThreads=50

overlord/runtime.properties:

druid.host=192.168.105.17:3030
druid.port=3030
druid.service=overlord
druid.indexer.queue.startDelay=PT1M
druid.indexer.runner.taskAssignmentTimeout=PT1M
druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata
druid.indexer.logs.type=file
druid.indexer.logs.directory=/data1/logs/druid/indexer

broker/runtime.properties:

druid.host=192.168.105.17:3032
druid.port=3032
druid.service=broker
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.broker.http.numConnections=20
druid.broker.http.readTimeout=PT5M
druid.processing.buffer.sizeBytes=512000000
druid.processing.numThreads=2
druid.server.http.numThreads=50

My start commands:

middleManager node:

java -Xmx64m -Xms64m -server -XX:+UseConcMarkSweepGC -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.emitter.logging.logLevel=info -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Djava.io.tmpdir=/data1/druid -Ddruid.segmentCache.locations=[{"path":"/data1/druid/indexCache","maxSize":"500000000000"}] -Ddruid.server.maxSize=500000000000 -classpath /opt/druid/config/common:/opt/druid/config/middleManager:/opt/druid/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/common/lib/* io.druid.cli.Main server middleManager

overlord node:

java -Xmx4g -Xms4g -server -XX:+UseConcMarkSweepGC -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.emitter.logging.logLevel=info -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Djava.io.tmpdir=/data1/druid -Ddruid.segmentCache.locations=[{"path":"/data1/druid/indexCache","maxSize":"500000000000"}] -Ddruid.server.maxSize=500000000000 -classpath /opt/druid/config/common:/opt/druid/config/overlord:/opt/druid/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/common/lib/* io.druid.cli.Main server overlord

historical node:

java -Xmx8g -Xms2g -server -XX:+UseConcMarkSweepGC -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.emitter.logging.logLevel=info -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Djava.io.tmpdir=/data1/druid -Ddruid.segmentCache.locations=[{"path":"/data1/druid/indexCache","maxSize":"500000000000"}] -Ddruid.server.maxSize=500000000000 -classpath /opt/druid/config/common:/opt/druid/config/historical:/opt/druid/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/common/lib/* io.druid.cli.Main server historical

coordinator node:

java -Xmx8g -Xms2g -server -XX:+UseConcMarkSweepGC -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.emitter.logging.logLevel=info -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Djava.io.tmpdir=/data1/druid -Ddruid.segmentCache.locations=[{"path":"/data1/druid/indexCache","maxSize":"500000000000"}] -Ddruid.server.maxSize=500000000000 -classpath /opt/druid/config/_common:/opt/druid/config/coordinator:/opt/druid/lib/*:/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/common/lib/* io.druid.cli.Main server coordinator

My task payload looks like this:

{
  "task": "index_realtime_datasource2_2015-07-14T20:00:00.000+08:00_0_0",
  "payload": {
    "id": "index_realtime_datasource2_2015-07-14T20:00:00.000+08:00_0_0",
    "resource": {
      "availabilityGroup": "datasource2-12-0000",
      "requiredCapacity": 1
    },
    "spec": {
      "dataSchema": {
        "dataSource": "datasource2",
        "parser": {
          "type": "map",
          "parseSpec": {
            "format": "json",
            "timestampSpec": {
              "column": "timestamp",
              "format": "auto",
              "missingValue": null
            },
            "dimensionsSpec": {
              "dimensions": [
                "brand",
                "country",
                "cpa",
                "cpc",
                "cpm",
                "currency",
                "deal_type",
                "from_jpa",
                "from_rmb",
                "from_twd",
                "height",
                "id",
                "impression_type",
                "media_type",
                "payment_rule_id",
                "payment_type",
                "placement_group_id",
                "placement_id",
                "revenue_type",
                "timestamp",
                "to_jpa",
                "to_rmb",
                "to_twd"
              ],
              "dimensionExclusions": [
                "count"
              ],
              "spatialDimensions": []
            }
          }
        },
        "metricsSpec": [
          {
            "type": "count",
            "name": "row"
          },
          {
            "type": "longSum",
            "name": "count",
            "fieldName": "count"
          },
          {
            "type": "doubleSum",
            "name": "sum_cpm",
            "fieldName": "cpm"
          },
          {
            "type": "doubleSum",
            "name": "sum_cpc",
            "fieldName": "cpc"
          },
          {
            "type": "doubleSum",
            "name": "sum_cpa",
            "fieldName": "cpa"
          },
          {
            "type": "javascript",
            "name": "rmb_cpm",
            "fieldNames": [
              "cpm",
              "to_rmb"
            ],
            "fnAggregate": "function(current, a, b) { return current + a * b; }",
            "fnReset": "function() { return 0; }",
            "fnCombine": "function(partialA, partialB) { return partialA + partialB; }"
          }
        ],
        "granularitySpec": {
          "type": "uniform",
          "segmentGranularity": "HOUR",
          "queryGranularity": {
            "type": "duration",
            "duration": 1000,
            "origin": "1970-01-01T00:00:00.000Z"
          },
          "intervals": null
        }
      },
      "ioConfig": {
        "type": "realtime",
        "firehose": {
          "type": "clipped",
          "delegate": {
            "type": "timed",
            "delegate": {
              "type": "receiver",
              "serviceName": "druid:firehose:datasource2-12-0000-0000",
              "bufferSize": 100000
            },
            "shutoffTime": "2015-07-14T21:06:00.000+08:00"
          },
          "interval": "2015-07-14T12:00:00.000Z/2015-07-14T13:00:00.000Z"
        }
      },
      "tuningConfig": {
        "type": "realtime",
        "maxRowsInMemory": 10000000,
        "intermediatePersistPeriod": "PT2M",
        "windowPeriod": "PT1M",
        "basePersistDirectory": "/tmp/1436876000335-0",
        "versioningPolicy": {
          "type": "intervalStart"
        },
        "rejectionPolicy": {
          "type": "none"
        },
        "maxPendingPersists": 0,
        "shardSpec": {
          "type": "linear",
          "partitionNum": 0
        },
        "indexSpec": {
          "bitmap": {
            "type": "concise"
          },
          "dimensionCompression": null,
          "metricCompression": null
        },
        "persistInHeap": false,
        "ingestOffheap": false,
        "bufferSize": 134217728
      }
    },
    "groupId": "index_realtime_datasource2",
    "dataSource": "datasource2"
  }
}

The failed task logs show these exceptions:

2015-07-14T13:55:26,104 INFO [main] io.druid.initialization.Initialization - Loading extension[io.druid.extensions:druid-kafka-eight] for class[io.druid.initialization.DruidModule]
2015-07-14T13:55:26,109 INFO [main] io.druid.initialization.Initialization - Adding extension module[class io.druid.firehose.kafka.KafkaEightDruidModule] for class[io.druid.initialization.DruidModule]
2015-07-14T13:55:26,110 INFO [main] io.druid.initialization.Initialization - Loading extension[io.druid.extensions:mysql-metadata-storage] for class[io.druid.initialization.DruidModule]
2015-07-14T13:55:26,114 INFO [main] io.druid.initialization.Initialization - Adding extension module[class io.druid.metadata.storage.mysql.MySQLMetadataStorageModule] for class[io.druid.initialization.DruidModule]
2015-07-14T13:55:26,115 INFO [main] io.druid.initialization.Initialization - Loading extension[io.druid.extensions:druid-hdfs-storage] for class[io.druid.initialization.DruidModule]
2015-07-14T13:55:26,119 INFO [main] io.druid.initialization.Initialization - Adding extension module[class io.druid.storage.hdfs.HdfsStorageDruidModule] for class[io.druid.initialization.DruidModule]
2015-07-14T13:55:28,129 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.initialization.HttpEmitterConfig] from props[druid.emitter.http.] as [io.druid.server.initialization.HttpEmitterConfig@2e53b094]
Exception in thread "main" com.google.inject.CreationException: Guice creation errors:

Error injecting method, java.lang.OutOfMemoryError: unable to create new native thread
  at io.druid.server.initialization.EmitterModule$EmitterProvider.inject(EmitterModule.java:101)
  at io.druid.server.initialization.EmitterModule.configure(EmitterModule.java:72)

1 error
at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:448)
at com.google.inject.internal.InternalInjectorCreator.injectDynamically(InternalInjectorCreator.java:176)
at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:110)
at com.google.inject.Guice.createInjector(Guice.java:96)
at com.google.inject.Guice.createInjector(Guice.java:73)
at com.google.inject.Guice.createInjector(Guice.java:62)
at io.druid.initialization.Initialization.makeInjectorWithModules(Initialization.java:369)
at io.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:55)
at io.druid.cli.CliPeon.run(CliPeon.java:207)
at io.druid.cli.Main.main(Main.java:88)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
at org.jboss.netty.util.internal.DeadLockProofWorker.start(DeadLockProofWorker.java:38)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:368)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:100)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:52)
at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28)
at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:143)
at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:81)
at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39)
at com.metamx.http.client.HttpClientInit.createBootstrap(HttpClientInit.java:134)
at com.metamx.http.client.HttpClientInit.createClient(HttpClientInit.java:89)
at io.druid.server.initialization.HttpEmitterModule.getEmitter(HttpEmitterModule.java:79)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:105)
at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86)
at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:55)
at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:66)
at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:47)
at com.google.inject.internal.ProviderToInternalFactoryAdapter1.get(LifecycleScope.java:49)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
at com.google.inject.internal.InjectorImpl1.call(InjectorImpl.java:1005)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058)
at com.google.inject.internal.InjectorImplEmitterProvider.findEmitter(EmitterModule.java:125)
at io.druid.server.initialization.EmitterModule$EmitterProvider$$FastClassByGuice$$c521ec5e.invoke()
at com.google.inject.internal.cglib.reflect.1.invoke(SingleMethodInjector.java:57)
at com.google.inject.internal.SingleMethodInjector.inject(SingleMethodInjector.java:91)
at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:132)
at com.google.inject.internal.MembersInjectorImpl1.call(MembersInjectorImpl.java:80)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1051)
at com.google.inject.internal.MembersInjectorImpl.injectAndNotify(MembersInjectorImpl.java:80)
at com.google.inject.internal.Initializer$InjectableReference.get(Initializer.java:174)
at com.google.inject.internal.Initializer.injectAll(Initializer.java:108)
at com.google.inject.internal.InternalInjectorCreator.injectDynamically(InternalInjectorCreator.java:174)
... 8 more
Heap
garbage-first heap total 2097152K, used 82228K [0x00000005c0000000, 0x00000005c0202000, 0x00000007c0000000)
region size 2048K, 41 young (83968K), 6 survivors (12288K)
Metaspace used 26540K, capacity 26950K, committed 27008K, reserved 1073152K
class space used 3298K, capacity 3414K, committed 3456K, reserved 1048576K
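In case it is related, here is roughly how the OS-level per-user thread/process limit can be checked on the middleManager host (just a sketch for a Linux box; the user name and the limits.conf values below are only examples, not my actual settings):

# show the "max user processes" limit for the account that runs the middleManager and peons
ulimit -u

# count how many threads all Druid-related processes are currently using
ps -eLf | grep -c druid

# if that limit is the bottleneck, it can be raised in /etc/security/limits.conf, e.g.:
#   druid  soft  nproc  65536
#   druid  hard  nproc  65536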

I also see many 'Cannot shut down yet!' messages in the peon logs:

/data1/logs/druid % grep 'Cannot shut down yet!' peon_indexing.log

2015-07-14T13:06:00,075 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T12:00:00.000Z_2015-07-14T13:00:00.000Z_2015-07-14T12:14:34.966Z_2
2015-07-14T13:06:00,282 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T12:00:00.000Z_2015-07-14T13:00:00.000Z_2015-07-14T12:14:34.966Z
2015-07-14T13:06:00,283 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T12:00:00.000Z_2015-07-14T13:00:00.000Z_2015-07-14T12:14:34.966Z_3
2015-07-14T13:06:00,290 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T12:00:00.000Z_2015-07-14T13:00:00.000Z_2015-07-14T12:14:34.966Z_1
2015-07-14T13:06:00,311 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T12:00:00.000Z_2015-07-14T13:00:00.000Z_2015-07-14T12:14:34.966Z_4
2015-07-14T14:06:00,036 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T13:00:00.000Z_2015-07-14T14:00:00.000Z_2015-07-14T13:00:08.657Z_4
2015-07-14T14:06:00,468 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T13:00:00.000Z_2015-07-14T14:00:00.000Z_2015-07-14T13:00:08.657Z_2
2015-07-14T14:06:00,475 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T13:00:00.000Z_2015-07-14T14:00:00.000Z_2015-07-14T13:00:08.657Z_1
2015-07-14T14:06:00,495 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T13:00:00.000Z_2015-07-14T14:00:00.000Z_2015-07-14T13:00:08.657Z
2015-07-14T14:06:00,500 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T13:00:00.000Z_2015-07-14T14:00:00.000Z_2015-07-14T13:00:08.657Z_3
2015-07-14T19:06:00,141 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T18:00:00.000Z_2015-07-14T19:00:00.000Z_2015-07-14T18:00:08.748Z_2
2015-07-14T19:06:00,147 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T18:00:00.000Z_2015-07-14T19:00:00.000Z_2015-07-14T18:00:08.748Z_4
2015-07-14T19:06:00,148 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T18:00:00.000Z_2015-07-14T19:00:00.000Z_2015-07-14T18:00:08.748Z_1
2015-07-14T19:06:00,150 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T18:00:00.000Z_2015-07-14T19:00:00.000Z_2015-07-14T18:00:08.748Z_3
2015-07-14T19:06:00,150 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T18:00:00.000Z_2015-07-14T19:00:00.000Z_2015-07-14T18:00:08.748Z
2015-07-14T21:06:00,343 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T20:00:00.000Z_2015-07-14T21:00:00.000Z_2015-07-14T20:00:08.808Z_3
2015-07-14T23:06:00,169 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T22:00:00.000Z_2015-07-14T23:00:00.000Z_2015-07-14T22:00:09.014Z_1
2015-07-15T00:06:00,097 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T23:00:00.000Z_2015-07-15T00:00:00.000Z_2015-07-14T23:00:08.898Z_4
2015-07-15T01:06:00,314 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-15T00:00:00.000Z_2015-07-15T01:00:00.000Z_2015-07-15T00:00:08.880Z_3
2015-07-15T01:06:00,322 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-15T00:00:00.000Z_2015-07-15T01:00:00.000Z_2015-07-15T00:00:08.880Z_1
2015-07-15T01:06:00,324 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-15T00:00:00.000Z_2015-07-15T01:00:00.000Z_2015-07-15T00:00:08.880Z_4
2015-07-15T01:06:00,339 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-15T00:00:00.000Z_2015-07-15T01:00:00.000Z_2015-07-15T00:00:08.880Z
2015-07-15T01:06:00,341 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-15T00:00:00.000Z_2015-07-15T01:00:00.000Z_2015-07-15T00:00:08.880Z_2
2015-07-15T04:06:00,126 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-15T03:00:00.000Z_2015-07-15T04:00:00.000Z_2015-07-15T03:00:09.029Z_3

The indexer log is here: https://drive.google.com/a/weiwen.org/file/d/0B7biMMdOHoEkYnNPSFNfaVpmanc/view?usp=sharing

Other log files are attached to this post.

Any help would be really appreciated.

Regards,

Weiwen

druid.log.tar.bz2 (154 KB)

coordinator_20150714201311.log.tar.bz2 (64.3 KB)

historical_20150714201311.log.tar.bz2 (15.7 KB)

middleManager_20150714201311.log.tar.bz2 (23.6 KB)

overlord_20150714201311.log.tar.bz2 (32.2 KB)

peon_indexing.log.tar.bz2 (629 KB)

Update: my Druid version is 0.7.3.

Hi Weiwen, it looks like your peon_indexing.log.tar.bz2 actually mixes together the logs from a bunch of different tasks. Is it possible for you to split that out for one of the slow tasks? The "log (all)" link next to a task on the overlord console should get you that.

A couple of things that stand out just from what you have:

  • The one segment that your tasks managed to insert was really large (task=index_realtime_datasource2_2015-07-14T20:00:00.000+08:00_3_0 published one with size=2881622658). Generally it's best to have segments a third to half that size. So, you may need either more parallel tasks or a shorter segmentGranularity (you could try FIFTEEN_MINUTE with tranquility 0.5.0; see the sketch after this list).

  • It looks like some of your tasks are OOMEing on startup by running out of threads (java.lang.OutOfMemoryError: unable to create new native thread). This seems unrelated to the realtime task slowness, but it's hard to tell because the logs are jumbled up.
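For the granularity change, a granularitySpec along these lines would produce quarter-hour segments (only a sketch based on the payload you posted; the queryGranularity shown simply matches your current 1-second duration granularity):

"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "FIFTEEN_MINUTE",
  "queryGranularity": "SECOND",
  "intervals": null
}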