Indexer of Druid 0.7.0 is slower than Druid 0.6.169

Hi Guys,

I am using Druid 0.7.0 with the local indexer.

On one node, I ran the local indexer on Druid 0.7.0 with an 8 GB file and it took 8 hours. With the same file on Druid 0.6.169, it took 40 minutes.

Please check the following config for Druid 0.7.0:

Overlord Config

-server

-Duser.timezone=UTC

-Dfile.encoding=UTF-8

druid.host=xxx

druid.port=48080

druid.service=overlord

druid.selectors.indexing.serviceName=overlord

druid.indexer.queue.startDelay=PT0M

#druid.indexer.runner.javaOpts=

druid.indexer.fork.property.druid.processing.numThreads=50

druid.indexer.fork.property.druid.computation.buffer.size=100000000

Curator/Zookeeper

druid.zk.service.host=xxx:2181

druid.discovery.curator.path=/druid/discNew

druid.extensions.coordinates=["io.druid.extensions:druid-s3-extensions:0.7.0","io.druid.extensions:druid-mssql-metadata-storage:0.7.0"]

druid.worker.capacity=1

Announcer

druid.announcer.type=batch

DB

druid.metadata.storage.type=mssql

druid.metadata.storage.connector.connectURI=xxx

druid.metadata.storage.connector.user=xxx

druid.metadata.storage.connector.password=xxx

druid.metadata.storage.connector.useValidationQuery=true

druid.metadata.storage.tables.base=ec2_druid

druid.metadata.storage.tables.segments=ec2_segment_table

druid.metadata.storage.tables.rules=ec2_rule_table

druid.metadata.storage.connector.primaryKey=pk_ec2_table

druid.metadata.storage.tables.config=ec2_config_table

druid.metadata.storage.tables.tasks=ec2_test_task_table

druid.metadata.storage.tables.taskLog=ec2_test_tasklog_table

druid.metadata.storage.tables.taskLock=ec2_test_tasklock_table

Seg. Loader

druid.storage.type=s3

druid.s3.accessKey=xxxx

druid.s3.secretKey=xxxx

druid.storage.bucket=realtime-segment-bucket

druid.storage.baseKey=realtime-segement

druid.storage.disableAcl=true

Local Storage

#druid.storage.storageDirectory=/druid/localStorage

druid.publish.type=db

Overlord

druid.indexer.runner.type=local

druid.indexer.storage.type=metadata

#druid.indexer.storage.recentlyFinishedThreshold

#druid.indexer.queue.maxSize

druid.indexer.queue.startDelay=PT0M

druid.indexer.queue.restartDelay=PT30S

druid.indexer.queue.storageSyncRate=PT1M

Overlord in Remote Mode

druid.indexer.runner.taskAssignmentTimeout=PT1H

druid.indexer.runner.minWorkerVersion=0

#druid.indexer.runner.compressZnodes

#druid.indexer.runner.maxZnodeBytes

Autoscale Config

druid.indexer.autoscale.strategy=noop

druid.indexer.autoscale.doAutoscale=false

druid.indexer.autoscale.provisionPeriod=PT1M

druid.indexer.autoscale.terminatePeriod=PT11M

##druid.indexer.autoscale.originTime

druid.indexer.autoscale.workerIdleTimeout=PT1M

druid.indexer.autoscale.maxScalingDuration=PT1H

druid.indexer.autoscale.numEventsToTrack=10

druid.indexer.autoscale.pendingTaskTimeout=PT1M

druid.indexer.autoscale.workerVersion=0

druid.indexer.autoscale.workerPort=8091

druid.indexer.runner.javaOpts=-Xms5g -Xmx13g -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

druid.indexer.task.baseDir=/druid/temp

druid.indexer.task.baseTaskDir=/druid/temp/persistent/tasks

Why is it taking so much time to create a segment?

Thanks,

Jitesh Mogre

2015-04-27T18:53:57,253 INFO  [task-runner-0] io.druid.segment.LoggingProgressIndicator [] - [/druid/temp/persistent/tasks/index_test_analytics_2015-04-27T17:16:25.331Z/work/test_analytics_2015-04-27T16:00:00.000Z_2015-04-27T17:00:00.000Z_2015-04-27T17:16:25.337Z_102/test_analytics_2015-04-27T16:00:00.000Z_2015-04-27T17:00:00.000Z_2015-04-27T16:00:00.000Z_102/spill0]: Starting [make aid]
2015-04-27T18:54:04,315 INFO  [task-runner-0] io.druid.segment.IndexMaker [] - Dimension[aid] has null rows.
2015-04-27T18:54:04,315 INFO  [task-runner-0] io.druid.segment.IndexMaker [] - Dimension[aid] has no null value in the dictionary, expanding...
2015-04-27T18:54:04,334 INFO  [task-runner-0] io.druid.segment.IndexMaker [] - Completed dimension[aid] with cardinality[145]. Starting write.
2015-04-27T18:54:04,336 INFO  [task-runner-0] io.druid.segment.LoggingProgressIndicator [] - [/druid/temp/persistent/tasks/index_test_analytics_2015-04-27T17:16:25.331Z/work/test_analytics_2015-04-27T16:00:00.000Z_2015-04-27T17:00:00.000Z_2015-04-27T17:16:25.337Z_102/test_analytics_2015-04-27T16:00:00.000Z_2015-04-27T17:00:00.000Z_2015-04-27T16:00:00.000Z_102/spill0]: [make aid] has completed. Elapsed time: [7,082] millis

This is the log from when the indexer builds one of the dimensions.
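To see which steps account for most of the time, the "has completed. Elapsed time" lines can be totaled per step. This is a minimal sketch, assuming every completion line follows the LoggingProgressIndicator format shown above. The function names and the shortened path in the sample line are illustrative only, not part of Druid.

```python
import re
from collections import defaultdict

# Matches the tail of a completion line, for example:
#   ...: [make aid] has completed. Elapsed time: [7,082] millis
# Assumption: all completion lines use this exact wording.
COMPLETED = re.compile(
    r"\[(?P<step>[^\[\]]+)\] has completed\. "
    r"Elapsed time: \[(?P<ms>[\d,]+)\] millis"
)


def elapsed_by_step(lines):
    """Sum elapsed milliseconds per step name across the given log lines."""
    totals = defaultdict(int)
    for line in lines:
        match = COMPLETED.search(line)
        if match:
            # Elapsed time is logged with thousands separators, e.g. "7,082".
            totals[match.group("step")] += int(match.group("ms").replace(",", ""))
    return dict(totals)


def slowest_steps(totals, limit=10):
    """Return (step, total_ms) pairs, slowest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]


# Sample line modeled on the log above (path shortened for illustration).
sample = (
    "2015-04-27T18:54:04,336 INFO  [task-runner-0] "
    "io.druid.segment.LoggingProgressIndicator [] - "
    "[/druid/temp/spill0]: [make aid] has completed. "
    "Elapsed time: [7,082] millis"
)
for step, ms in slowest_steps(elapsed_by_step([sample])):
    print(f"{step}: {ms} ms")
```

Passing the lines of a full task log to `elapsed_by_step` in place of the single sample line gives the total per step across all spills.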

In the old Druid we were using IndexIO.java for indexing.

Is there any difference in indexing between the two Druid versions?

What is the difference between IndexMaker and IndexMerger?

Why was IndexMaker created?

IndexMaker was used for some experimentation with building Druid v9 segments directly. It is much slower than IndexMerger. I am extremely surprised you are hitting that code, as you don’t seem to be setting any configs that should trigger the switch logic. Are you using a modified version of Druid?