Merging tasks submitted over and over to the indexing service.

Hi all,

I’m experiencing the following problem with my indexing service. My coordinator keeps trying to merge the same segments and submit the same merge tasks to my indexing service, even though they always fail with this exception:

2015-10-06T11:19:04,691 ERROR [task-runner-0] io.druid.indexing.common.task.MergeTaskBase - Exception merging[rtb_auctions]: {class=io.druid.indexing.common.task.MergeTaskBase, exceptionType=class com.metamx.common.ISE, exceptionMessage=Cannot merge columns of type[LONG] and [FLOAT], interval=2015-03-22T00:00:00.000-07:00/2015-03-24T16:00:00.000-07:00}
com.metamx.common.ISE: Cannot merge columns of type[LONG] and [FLOAT]
	at io.druid.segment.column.ColumnCapabilitiesImpl.merge(ColumnCapabilitiesImpl.java:124) ~[druid-processing-0.7.0.jar:0.7.0]
	at io.druid.segment.IndexMerger.makeIndexFiles(IndexMerger.java:431) ~[druid-processing-0.7.0.jar:0.7.0]
	at io.druid.segment.IndexMerger.append(IndexMerger.java:399) ~[druid-processing-0.7.0.jar:0.7.0]
	at io.druid.segment.IndexMerger.append(IndexMerger.java:326) ~[druid-processing-0.7.0.jar:0.7.0]
	at io.druid.indexing.common.task.AppendTask.merge(AppendTask.java:106) ~[druid-indexing-service-0.7.0.jar:0.7.0]
	at io.druid.indexing.common.task.MergeTaskBase.run(MergeTaskBase.java:146) [druid-indexing-service-0.7.0.jar:0.7.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235) [druid-indexing-service-0.7.0.jar:0.7.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214) [druid-indexing-service-0.7.0.jar:0.7.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:262) [?:1.7.0_80]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_80]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_80]
	at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
2015-10-06T11:19:04,703 INFO [task-runner-0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"alerts","timestamp":"2015-10-06T11:19:04.697-07:00","service":"druid/prod/worker","host":"ec2-54-226-66-29.compute-1.amazonaws.com:8081","severity":"component-failure","description":"Exception merging[rtb_auctions]","data":{"class":"io.druid.indexing.common.task.MergeTaskBase","exceptionType":"com.metamx.common.ISE","exceptionMessage":"Cannot merge columns of type[LONG] and [FLOAT]","exceptionStackTrace":"com.metamx.common.ISE: Cannot merge columns of type[LONG] and [FLOAT]\n\tat io.druid.segment.column.ColumnCapabilitiesImpl.merge(ColumnCapabilitiesImpl.java:124)\n\tat io.druid.segment.IndexMerger.makeIndexFiles(IndexMerger.java:431)\n\tat io.druid.segment.IndexMerger.append(IndexMerger.java:399)\n\tat io.druid.segment.IndexMerger.append(IndexMerger.java:326)\n\tat io.druid.indexing.common.task.AppendTask.merge(AppendTask.java:106)\n\tat io.druid.indexing.common.task.MergeTaskBase.run(MergeTaskBase.java:146)\n\tat io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235)\n\tat io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:262)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)\n\tat java.lang.Thread.run(Thread.java:745)\n","interval":"2015-03-22T00:00:00.000-07:00/2015-03-24T16:00:00.000-07:00"}}]
2015-10-06T11:19:04,704 INFO [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Removing task directory: /mnt/tmp/persistent/tasks/merge_rtb_auctions_810e5984ab980bbb3360f407e2d6cc336705c5f5_2015-10-06T11:18:48.788-07:00/work
2015-10-06T11:19:04,711 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "merge_rtb_auctions_810e5984ab980bbb3360f407e2d6cc336705c5f5_2015-10-06T11:18:48.788-07:00",
  "status" : "FAILED",
  "duration" : 8889
}

It looks like the column types are inconsistent from one segment to another. I have attached a log of the merge task causing the issue. All my segments appear to have the same metrics, but one of them has a different metric order (the segment with interval 2015-03-23T12:00:00.000-07:00_2015-03-24T16:00:00.000-07:00).

My segments are initially created via Hadoop index tasks and always have one-hour time granularity. The segment with the different payload must therefore have been created by a merge task (its interval is 28 hours). I have checked my unmerged segments and they all have the same payload with the same metric order.
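For context, one way a LONG/FLOAT mismatch like this can arise is if the `metricsSpec` changed between ingestions, e.g. a metric originally defined with a `longSum` aggregator was later redefined as `doubleSum`. This is a hypothetical sketch (the metric name `bid_price` is made up, not from my actual spec), just to illustrate the kind of difference that would make two segments unmergeable:

```json
// Segment A ingested with this metricsSpec -> column stored as LONG
[ { "type": "longSum", "name": "bid_price", "fieldName": "bid_price" } ]

// Segment B ingested with this metricsSpec -> column stored as FLOAT
[ { "type": "doubleSum", "name": "bid_price", "fieldName": "bid_price" } ]
```

A merge task covering both segments would then hit exactly the `Cannot merge columns of type[LONG] and [FLOAT]` error above.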

Here are my questions:

  1. Was there any bug related to the merging tasks which could explain this issue? I am running Druid 0.7.0.

  2. Why does my coordinator keep trying to merge these segments even though the merge will apparently never succeed?

  3. Is there a way to tell my coordinators to stop trying to merge these segments?

  4. Does a merge task have the same priority as a real-time indexing task? For example, if I need to assign 2 real-time indexing tasks and 2 merge tasks at the same time but only have 2 peons available, how will Druid assign the tasks to the peons?

Thanks for your help.

merge-task-logs-loop-submit (74.9 KB)

Please see inline.

Ref for hadoop converter task: druid.io/docs/0.8.1/misc/tasks.html#hadoop-convert-segment-task
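For anyone finding this thread later, a convert-segment task spec looks roughly like the sketch below. This is an assumption based on the 0.8.1 docs linked above, not a tested spec: the `dataSource` and `interval` values are placeholders, and the exact `indexSpec` fields should be verified against the documentation for your version.

```json
{
  "type": "convert_segment",
  "dataSource": "rtb_auctions",
  "interval": "2015-03-22/2015-03-25",
  "indexSpec": {
    "bitmap": { "type": "concise" },
    "dimensionCompression": "lz4",
    "metricCompression": "lz4"
  },
  "force": false,
  "validate": true
}
```

The Hadoop variant described at the link runs the same conversion as a Hadoop job, which is preferable for large intervals.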

Thanks for your answer! We were already planning to upgrade our cluster, so we will use the converter as you suggested once the upgrade is done.

Guillaume