I am using the docker-druid image to run a Druid cluster, and I am ingesting streaming data using Tranquility.
The segment granularity is set to 15 minutes and the window period to 10 minutes. Tranquility also applies a grace period, druidBeam.firehoseGracePeriod, of 5 minutes by default. This means each realtime task runs for a total of 30 minutes (15 + 10 + 5).
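To make the arithmetic explicit, here is a minimal sketch of how I am computing the task lifetime. The function and variable names are my own and are not part of Druid or Tranquility; the formula simply assumes the lifetime is the sum of the three durations described above.

```python
from datetime import timedelta

# Values from my setup; the grace period is the default I am assuming applies.
SEGMENT_GRANULARITY = timedelta(minutes=15)
WINDOW_PERIOD = timedelta(minutes=10)
FIREHOSE_GRACE_PERIOD = timedelta(minutes=5)


def task_lifetime(segment_granularity, window_period, grace_period):
    """Return how long one realtime task keeps running (hypothetical helper)."""
    return segment_granularity + window_period + grace_period


lifetime = task_lifetime(SEGMENT_GRANULARITY, WINDOW_PERIOD, FIREHOSE_GRACE_PERIOD)
print(lifetime)  # 0:30:00
```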
As soon as the application receives the first event, Tranquility creates a realtime task and the data is ingested into Druid. This realtime task is responsible for ingesting data over the next 15 minutes. This works fine, and events with timestamps within the next 15 minutes (the segment granularity) are ingested without any problem.
E.g., the first event is received at 11:00, leading to the creation of a task as shown below:
However, this task will not index any events with timestamps beyond 11:15, but it will keep running until 11:30.
When events arrive with timestamps not covered by the above task, a new task is created. E.g., for an event with timestamp 11:17, a new task is created as shown below:
However, Tranquility is not able to load the new events into Druid, because the earlier task is still running and the task created to serve the next 15 minutes of data stays in the pending state.
This leads to the loss of the messages sent during those 15 minutes.
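My understanding of why the second task stays pending can be sketched as follows. This is only an estimate under my own assumptions: one partition, one replicant, and one worker slot per running task. The helper name is hypothetical and not a Druid or Tranquility API.

```python
import math
from datetime import timedelta


def required_task_slots(segment_granularity, task_lifetime, partitions=1, replicants=1):
    """Estimate how many realtime tasks run at the same time (hypothetical helper).

    A new task starts every segment_granularity, and each one lives for
    task_lifetime, so roughly lifetime / granularity tasks overlap.
    """
    overlapping = math.ceil(task_lifetime / segment_granularity)
    return overlapping * partitions * replicants


slots = required_task_slots(
    segment_granularity=timedelta(minutes=15),
    task_lifetime=timedelta(minutes=30),
)
print(slots)  # 2
```

With a 30-minute lifetime and 15-minute segments, this suggests at least two task slots are needed, so a single available slot would leave the 11:15 task pending until 11:30.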
I believe this can be handled by increasing the number of workers. Is this correct?
How can the number of workers be increased when using the docker-druid image?