Increasing the number of workers for a real-time ingestion task

Hi,

I am using the docker-druid image to run a Druid cluster, and I am ingesting streaming data using Tranquility.

The segment granularity is set to 15 minutes and the window period to 10 minutes. Tranquility also has a grace period, druidBeam.firehoseGracePeriod, which defaults to 5 minutes. This means each realtime task runs for a total of 30 minutes (15 + 10 + 5).
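The 30-minute figure is simply the sum of the three periods. Below is a minimal Python sketch of that arithmetic. It assumes, as described above, that a task shuts down at its segment end plus the window period plus the grace period, and that segments are aligned to 15-minute boundaries of the clock. The constant names, function names and the date used are illustrative only, not Tranquility or Druid APIs.

```python
from datetime import datetime, timedelta

# Values from this thread; the names are illustrative, not Tranquility settings.
SEGMENT_GRANULARITY = timedelta(minutes=15)
WINDOW_PERIOD = timedelta(minutes=10)
FIREHOSE_GRACE_PERIOD = timedelta(minutes=5)  # druidBeam.firehoseGracePeriod default


def segment_start(ts: datetime) -> datetime:
    """Round ts down to the start of its 15-minute segment (assumed clock-aligned)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # timedelta // timedelta gives the number of whole segments since midnight
    return midnight + ((ts - midnight) // SEGMENT_GRANULARITY) * SEGMENT_GRANULARITY


def task_lifetime(event_ts: datetime):
    """Return (segment start, segment end, task shutdown) for an event timestamp."""
    start = segment_start(event_ts)
    end = start + SEGMENT_GRANULARITY
    # Assumed shutdown rule: segment end + window period + grace period
    shutdown = end + WINDOW_PERIOD + FIREHOSE_GRACE_PERIOD
    return start, end, shutdown


for ts in (datetime(2016, 1, 1, 11, 0), datetime(2016, 1, 1, 11, 17)):
    start, end, shutdown = task_lifetime(ts)
    print(f"event {ts:%H:%M}: segment {start:%H:%M}-{end:%H:%M}, "
          f"task runs until {shutdown:%H:%M}")
```

For the two timestamps used in this thread, it reports a segment of 11:00-11:15 with the task running until 11:30, and a segment of 11:15-11:30 with the task running until 11:45. In practice the task is only created when the first event for a segment arrives, so its actual start time can differ from the segment start.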

As and when the first event is received by the application, Tranquility creates a realtime task and data is ingested into Druid. This realtime task is responsible for ingesting data for the next 15 minutes. This works fine, and events with timestamps within those 15 minutes (the segment granularity) are ingested without any problem.

For example, the first event is received at 11:00, which leads to the creation of a task, as shown below:

However, this task will not index any events with timestamps after 11:15, but it will keep running until 11:30.

When events with timestamps not covered by the above task are received, a new task is created. For example, for an event with timestamp 11:17, a new task is created as shown below:

However, Tranquility is not able to load data into Druid for the new events, because the earlier task is still running and the task created to serve the next 15 minutes of data is stuck in a pending state.

This leads to the loss of messages for those 15 minutes.
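Why one slot is not enough can be seen by counting how many tasks overlap in time. The sketch below is a simplified model, not Druid's scheduler: it assumes each task starts exactly at its segment start and lives for 30 minutes, with a new segment every 15 minutes. Under those assumptions at most ceil(30 / 15) = 2 tasks are alive at once, so a worker with capacity 1 leaves the second task pending. The function names are hypothetical. If a task is created before its segment starts (for example for events stamped slightly ahead of the current time), or several partitions or replicas run per segment, the real peak can be higher.

```python
import math
from datetime import datetime, timedelta

SEGMENT_GRANULARITY = timedelta(minutes=15)
TASK_LIFETIME = timedelta(minutes=15 + 10 + 5)  # granularity + window + grace


def task_intervals(first_segment_start: datetime, n_segments: int):
    """(start, stop) for consecutive realtime tasks, one per segment (simplified model)."""
    return [
        (first_segment_start + i * SEGMENT_GRANULARITY,
         first_segment_start + i * SEGMENT_GRANULARITY + TASK_LIFETIME)
        for i in range(n_segments)
    ]


def peak_concurrency(intervals) -> int:
    """Maximum number of half-open [start, stop) intervals alive at the same time."""
    events = []
    for start, stop in intervals:
        events.append((start, 1))   # task occupies a slot
        events.append((stop, -1))   # task frees its slot
    # At equal timestamps -1 sorts before +1: a task stopping at 11:30
    # frees its slot before a task starting at 11:30 takes one.
    events.sort()
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


tasks = task_intervals(datetime(2016, 1, 1, 11, 0), n_segments=8)
print(peak_concurrency(tasks))                          # 2
print(math.ceil(TASK_LIFETIME / SEGMENT_GRANULARITY))   # 2
```

The sweep and the closed-form ceil agree here; the sweep is only useful once the simple "start at segment start" assumption no longer holds.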

I believe this is something that can be handled by increasing the number of workers. Is this correct?

[Attachment: workers.PNG]

How can the number of workers be increased when using the docker-druid image?

Thanks,

Prathamesh

Hey Prathamesh,

Yeah, you’ll need an additional slot for that task to run.

You can either:

  • Increase the capacity of the worker that’s running so that it can run more tasks simultaneously

  • Change the indexing configuration to remote and spawn multiple middleManager processes.

The second option requires changing more configuration, so if you just want to get things working, I’d suggest going for the first.

I don’t have much experience using the indexing service in local mode, but you should be able to use the configuration option druid.worker.capacity to set how many tasks a worker can run at once. This can be passed in via the supervisor configuration that spawns the druid-coordinator process.
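As a rough illustration of where such a setting goes: a JVM system property has to appear as a -D flag before the main class on the java command line, otherwise it is passed to the program as an ordinary argument. The helper below is a hypothetical sketch that rewrites a command string accordingly. The example command, classpath and main class name (io.druid.cli.Main) are assumptions for illustration, not the actual command from the docker-druid supervisor configuration.

```python
import shlex


def with_system_property(command: str, key: str, value: str, main_class: str) -> str:
    """Return `command` with -D<key>=<value> placed just before `main_class`.

    Hypothetical helper. Raises ValueError if `main_class` is not in the command.
    """
    args = shlex.split(command)
    prefix = f"-D{key}="
    # Drop any existing setting for the same key so only the new value remains
    args = [a for a in args if not a.startswith(prefix)]
    # JVM options must come before the main class, not after it
    args.insert(args.index(main_class), prefix + value)
    return shlex.join(args)  # shell-quoted, e.g. the classpath wildcard stays quoted


# Assumed, illustrative coordinator command line
cmd = ("java -server -Xmx256m -Ddruid.indexer.runner.type=local "
       "-cp 'lib/*' io.druid.cli.Main server coordinator")
print(with_system_property(cmd, "druid.worker.capacity", "3", "io.druid.cli.Main"))
```

After the command in the supervisor configuration has been changed, the druid-coordinator process has to be restarted for the new capacity to take effect.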

Best regards,

Dylan

Thanks a lot, Dylan.

After increasing the worker capacity, I could see multiple tasks running simultaneously, and the event data was indexed successfully. Thanks again.

Just out of curiosity, what did you mean by “using the indexing service in local mode”? What’s local mode here?

Also, is there a way to increase the “number of workers” as well?

Regards,

Prathamesh