Task running without location

Hello,

We’re facing some ingestion issues: some tasks start without any location, and it takes several minutes before a location is assigned to the task (sometimes never, and the task ends up FAILED on timeout).

That causes some delays…
In the Overlord and MiddleManager logs we can see that the task is assigned to a MiddleManager but never starts.

What could be the root cause?

Are the MM nodes actually busy in terms of CPU/memory utilization? What servers are they running on (CPU and memory characteristics), and how are the MMs and Peon JVMs configured?

Hello, thanks for your reply.
No, there is no CPU or RAM issue. We have 13 MMs with 18 workers each, and each MM runs on a separate node.

What do you mean, and what details do you want about how the MMs and Peon JVMs are configured?

In case it helps:

 middlemanagers:
      jvm:
        xms_value: -Xms1G
        xmx_value: -Xmx8G
      nodeRoles:
        value: allow
      replicasCount: 13
      runtime:
        buffer_sizeBytes: '786432000'
        fork_buffer_sizeBytes: '786432000'
        javaOpts: -Daws.region=us-east-1 -XX:MaxDirectMemorySize=1000g
        worker_capacity: 18

@OliveBZH Thanks for the details. I’m not familiar with that deployment configuration; is that a Druid Operator setup?

Regardless, if I am reading it correctly, you have:

  • the MM JVM set to -Xms1G -Xmx8G; the recommendation is that these two values be the same, and 1G is probably enough for the MM itself.
  • the javaOpts: I’m assuming these translate into druid.indexer.runner.javaOpts or druid.indexer.runner.javaOptsArray, which control the Peon JVMs. If you do not specify -Xms/-Xmx here, the Peons will inherit those values from the MM JVM. It is recommended that these be set explicitly (see the sketch after this list).
  • each MM pod will use a total of the MM JVM heap + (worker_capacity * (Peon JVM Heap + Peon JVM DirectMemorySize) )
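As a rough sketch only (the values below are placeholders, not a recommendation for your exact workload), making the Peon settings explicit in the MiddleManager runtime properties would look something like this:

    # MiddleManager runtime.properties (illustrative values)
    druid.indexer.runner.javaOptsArray=["-server","-Xms1g","-Xmx1g","-XX:MaxDirectMemorySize=4g","-Daws.region=us-east-1"]
    # Peon processing settings can also be pinned via the fork-property prefix:
    druid.indexer.fork.property.druid.processing.buffer.sizeBytes=786432000

With explicit values like these, each Peon’s footprint is bounded and predictable, instead of inheriting the MM’s -Xmx8G and running with the MaxDirectMemorySize=1000g currently in javaOpts.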

Given your current settings:

  • the demand for memory on each MM will be up to 9GB per Peon plus up to 8GB for the MM itself, resulting in (18 * 9) + 8 = 170GB of memory.
  • the demand on CPU will be worker_capacity + 1 = 19 vCPUs.

Do the MM pods have those resources available? If not, you can reduce the number of workers or adjust the memory settings accordingly.
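As a purely illustrative example: if the Peons were capped at -Xmx1g with, say, 3GB of direct memory each, the per-pod demand would drop to roughly 8 + 18 * (1 + 3) = 80GB instead of 170GB.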

Hope this helps. Let us know how it goes.

I found the following memory usage formula in the Druid basic cluster tuning docs, which also includes the buffer size bytes. Is that right?

#### Total memory usage

To estimate total memory usage of a Task under these guidelines:

* Heap: `1GiB + (2 * total size of lookup maps)`
* Direct Memory: `(druid.processing.numThreads + druid.processing.numMergeBuffers + 1) * druid.processing.buffer.sizeBytes`

The total memory usage of the MiddleManager + Tasks:

`MM heap size + druid.worker.capacity * (single task memory usage)`
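
If I apply that to our settings (our buffer_sizeBytes of 786432000 ≈ 750MiB) and assume, for example, 2 processing threads and 2 merge buffers per task and no lookups (I would need to check our real values), a single task would need roughly `1GiB + (2 + 2 + 1) * 750MiB ≈ 4.7GiB`, so the total would be about `8GB MM heap + 18 * 4.7GiB ≈ 92GB` per MiddleManager.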

But yes, we have enough resources to cover that. My question is also: can these tasks without any location be caused by a lack of resources?

I’m not sure, just a theory. I’m trying to understand your setup and the workload on the system to see if that might be interfering with normal processing.

Are there any errors in the overlord or MM logs?

No, no errors.

For instance, for task index_kafka_ms_export_diameter_wing_1min_9495568f8e20a6e_hlpeebif, there is only this in the Overlord log:

2022-11-17T12:21:26,847 INFO [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Coordinator asking Worker[10.244.13.176:8091] to add task[index_kafka_ms_export_diameter_wing_1min_9495568f8e20a6e_hlpeebif]
2022-11-17T12:21:26,850 INFO [rtr-pending-tasks-runner-0] org.apache.druid.indexing.overlord.RemoteTaskRunner - Task index_kafka_ms_export_diameter_wing_1min_9495568f8e20a6e_hlpeebif switched from pending to running (on [10.244.13.176:8091])
2022-11-17T12:21:26,867 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[10.244.13.176:8091] wrote RUNNING status for task [index_kafka_ms_export_diameter_wing_1min_9495568f8e20a6e_hlpeebif] on [TaskLocation{host='null', port=-1, tlsPort=-1}]

and this in the MM log:

2022-11-17T12:21:26,864 INFO [WorkerTaskManager-NoticeHandler] org.apache.druid.indexing.worker.WorkerTaskManager - Task[index_kafka_ms_export_diameter_wing_1min_9495568f8e20a6e_hlpeebif] started.

but as you can see, the {host='null', port=-1, tlsPort=-1} indicates that no Peon has been assigned.

And after 2 minutes, a new log line appears in the Overlord:

2022-11-17T12:23:22,925 INFO [Curator-PathChildrenCache-3] org.apache.druid.indexing.overlord.RemoteTaskRunner - Worker[10.244.13.176:8091] wrote RUNNING status for task [index_kafka_ms_export_diameter_wing_1min_9495568f8e20a6e_hlpeebif] on [TaskLocation{host='10.244.13.176', port=8115, tlsPort=-1}]

So here it took 2 minutes to assign a host to the task, but sometimes it takes 10 or 20 minutes, or it never happens.
So which mechanism assigns a Peon to a task? It’s not a matter of Peon availability, as we have a lot of free Peons…

Thanks for the details @OliveBZH, I will research what the execution path is for that…

Hey @OliveBZH, I don’t really have a full explanation for you, but one suggestion is that it could be a ZooKeeper issue, which could be circumvented by changing the runner type to:
druid.indexer.runner.type=httpRemote

This results in direct communication between the Overlord and the MMs/tasks, so ZooKeeper is no longer involved.
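For reference, druid.indexer.runner.type is an Overlord runtime property, so (as a minimal sketch) the change would go into the Overlord’s runtime.properties and take effect after an Overlord restart:

    # Overlord runtime.properties
    druid.indexer.runner.type=httpRemote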

Another suggestion is to look at the MiddleManager log to see if there is any additional info about these tasks, even if it isn’t an error.
