Hi,
I’m trying to merge three or four old data sources into a single new one. For some of them, reindexing works as expected; some threw OOM errors (maxDirectMemory), and some failed without any error at all. I’m trying to understand the latter.
Attached is my task spec, which migrates data from the “events_app1” data source into “events” while injecting an additional field (orgId) into the table. While this spec worked for many apps, it doesn’t work for some. The parent task and a child task are submitted successfully, but the child task dies immediately, so two more child tasks are spawned, and they end up failing too.
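In case it helps, this is roughly the shape of the spec — heavily simplified, with placeholder intervals, dimensions, and orgId value; the exact spec is in the attached example.json:

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "events",
      "timestampSpec": { "column": "__time", "format": "millis" },
      "dimensionsSpec": { "dimensions": [] },
      "transformSpec": {
        "transforms": [
          { "type": "expression", "name": "orgId", "expression": "'app1'" }
        ]
      },
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "NONE",
        "rollup": false,
        "intervals": ["2020-04-01/2020-06-01"]
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "druid",
        "dataSource": "events_app1",
        "interval": "2020-04-01/2020-06-01"
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumConcurrentSubTasks": 2
    }
  }
}

In the sketch above, orgId is injected with a transformSpec expression and the old data source is read back through the druid inputSource.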
Below is the error output I see for the submitted (parent) task. There are no logs at all for the child tasks. I also checked the Coordinator, Overlord, and MiddleManager logs but couldn’t find anything relevant.
Any ideas on how to enable logging for the child tasks, and what might be the reason for this behaviour?
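To be concrete about what I mean by “no logs for child tasks”: even pulling a sub-task log directly from the Overlord’s task-log endpoint (host and port below are placeholders; the task ID is one of the sub-task IDs from the output further down) gives me nothing:

curl "http://<overlord-host>:8090/druid/indexer/v1/task/single_phase_sub_task_events_events_ipangpal_2020-05-15T07:45:46.776Z/log"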
WindowedSegmentId{segmentId='app1_events_2020-04-13T13:00:00.000Z_2020-04-13T14:00:00.000Z_2020-04-13T16:33:48.071Z', intervals=[2020-04-13T13:00:00.000Z/2020-04-13T14:00:00.000Z]}, WindowedSegmentId{segmentId='app1_events_2020-05-04T12:00:00.000Z_2020-05-04T13:00:00.000Z_2020-05-04T11:22:41.715Z_2', intervals=[2020-05-04T12:00:00.000Z/2020-05-04T13:00:00.000Z]}]}]
2020-05-15T07:45:46,942 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Submitted a new task[single_phase_sub_task_events_events_ipangpal_2020-05-15T07:45:46.776Z] for spec[index_parallel_events_events_cegnibin_2020-05-15T07:45:38.318Z_0]
2020-05-15T07:45:47,043 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner - Waiting for subTasks to be completed
2020-05-15T07:45:47,781 WARN [task-monitor-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - task[single_phase_sub_task_events_events_ipangpal_2020-05-15T07:45:46.776Z] failed!
2020-05-15T07:45:47,781 INFO [task-monitor-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - We still have more chances[1/3] to process the spec[index_parallel_events_events_cegnibin_2020-05-15T07:45:38.318Z_0].
2020-05-15T07:45:47,880 INFO [task-monitor-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Submitted a new task[single_phase_sub_task_events_events_dhgpmndj_2020-05-15T07:45:47.781Z] for retrying spec[index_parallel_events_events_cegnibin_2020-05-15T07:45:38.318Z_0]
2020-05-15T07:45:48,770 WARN [task-monitor-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - task[single_phase_sub_task_events_events_dhgpmndj_2020-05-15T07:45:47.781Z] failed!
2020-05-15T07:45:48,770 INFO [task-monitor-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - We still have more chances[2/3] to process the spec[index_parallel_events_events_cegnibin_2020-05-15T07:45:38.318Z_0].
2020-05-15T07:45:48,850 INFO [task-monitor-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Submitted a new task[single_phase_sub_task_events_events_ahhfnoch_2020-05-15T07:45:48.770Z] for retrying spec[index_parallel_events_events_cegnibin_2020-05-15T07:45:38.318Z_0]
2020-05-15T07:45:49,770 WARN [task-monitor-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - task[single_phase_sub_task_events_events_ahhfnoch_2020-05-15T07:45:48.770Z] failed!
2020-05-15T07:45:49,771 ERROR [task-monitor-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - spec[index_parallel_events_events_cegnibin_2020-05-15T07:45:38.318Z_0] failed after [3] tries
2020-05-15T07:45:49,774 ERROR [task-runner-0-priority-0] org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner - Failed because of the failed sub task[single_phase_sub_task_events_events_ahhfnoch_2020-05-15T07:45:48.770Z]
2020-05-15T07:45:49,774 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner - Cleaning up resources
2020-05-15T07:45:49,775 INFO [task-runner-0-priority-0] org.apache.druid.indexing.common.task.batch.parallel.TaskMonitor - Stopped taskMonitor
2020-05-15T07:45:49,780 INFO [task-runner-0-priority-0] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_parallel_events_events_cegnibin_2020-05-15T07:45:38.318Z",
  "status" : "FAILED",
  "duration" : 4842,
  "errorMsg" : null,
  "location" : {
    "host" : null,
    "port" : -1,
    "tlsPort" : -1
  }
}
2020-05-15T07:45:49,789 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [ANNOUNCEMENTS]
2020-05-15T07:45:49,792 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [SERVER]
2020-05-15T07:45:49,797 INFO [main] org.eclipse.jetty.server.AbstractConnector - Stopped ServerConnector@6bdedbbd{HTTP/1.1,[http/1.1]}{0.0.0.0:8130}
2020-05-15T07:45:49,798 INFO [main] org.eclipse.jetty.server.session - node0 Stopped scavenging
2020-05-15T07:45:49,800 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@2cdb5974{/,null,UNAVAILABLE}
2020-05-15T07:45:49,804 INFO [main] org.apache.druid.java.util.common.lifecycle.Lifecycle - Stopping lifecycle [module] stage [NORMAL]
2020-05-15T07:45:49,805 INFO [main] org.apache.druid.indexing.overlord.SingleTaskBackgroundRunner - Starting graceful shutdown of task[index_parallel_events_events_cegnibin_2020-05-15T07:45:38.318Z].
2020-05-15T07:45:49,807 INFO [main] org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexPhaseRunner - Cleaning up resources
Thanks!
example.json (2.17 KB)