Questions about Druid EC2 auto scaling configuration

Hi Druid,

I found that Druid has a built-in autoscaling mechanism, and I am confused about a few points.

First, what monitoring metrics does Druid use?

Second, the official documentation says that a JSON object “nodeData” inside the worker config spec is required for EC2 autoscaling. The example there doesn’t show how to set other EC2 parameters such as the subnet, tags, or ebs_size. Are there any further instructions on how to write “nodeData”?

Third, why is “nodeData” a required field? Once a MiddleManager has been launched, we could scale based on that instance by using the existing “Launch more like this” option.

Looking forward to your reply.

For your convenience, I have posted the sample worker config spec below.

Best,

Xuanyi

link: https://druid.apache.org/docs/latest/configuration/index.html#overlord

{
  "selectStrategy": {
    "type": "fillCapacity",
    "affinityConfig": {
      "affinity": {
        "datasource1": ["host1:port", "host2:port"],
        "datasource2": ["host3:port"]
      }
    }
  },
  "autoScaler": {
    "type": "ec2",
    "minNumWorkers": 2,
    "maxNumWorkers": 12,
    "envConfig": {
      "availabilityZone": "us-east-1a",
      "nodeData": {
        "amiId": "{AMI}",
        "instanceType": "c3.8xlarge",
        "minInstances": 1,
        "maxInstances": 1,
        "securityGroupIds": ["{IDs}"],
        "keyName": "{KEY_NAME}"
      },
      "userData": {
        "impl": "string",
        "data": "{SCRIPT_COMMAND}",
        "versionReplacementString": ":VERSION:",
        "version": null
      }
    }
  }
}
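
For anyone who wants to submit a spec like the one above programmatically, here is a minimal Python sketch. It assumes the Overlord accepts the worker config as a POST to /druid/indexer/v1/worker, as the linked configuration page describes. The helper names (build_worker_config, make_request) and the Overlord address in the usage note are hypothetical, and only the autoScaler part of the spec is built. The request is constructed but not sent.

```python
import json
import urllib.request


def build_worker_config(min_workers, max_workers, availability_zone,
                        node_data, user_data):
    """Assemble the autoScaler part of a worker config spec as a dict.

    Hypothetical helper: it only mirrors the field names in the sample spec.
    """
    if min_workers > max_workers:
        raise ValueError("minNumWorkers must not exceed maxNumWorkers")
    return {
        "autoScaler": {
            "type": "ec2",
            "minNumWorkers": min_workers,
            "maxNumWorkers": max_workers,
            "envConfig": {
                "availabilityZone": availability_zone,
                "nodeData": node_data,
                "userData": user_data,
            },
        }
    }


def make_request(overlord_url, config):
    """Build (but do not send) the POST request carrying the config."""
    body = json.dumps(config).encode("utf-8")
    return urllib.request.Request(
        overlord_url.rstrip("/") + "/druid/indexer/v1/worker",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result of make_request("http://overlord.example:8090", config) to urllib.request.urlopen would actually submit it. Because the selectStrategy block is left out of this sketch, check what the Overlord does with an omitted field before relying on it.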

Anybody there?

Hi Xuanyi. I’m looking into this for you and will have some more information for you in a couple of days.

Hey Xuanyi,

The built-in autoscaling works on MiddleManagers and scales based on the task queue. It is pretty simple: if there are tasks queued up, it will launch more MiddleManagers, and if there are empty MiddleManagers, it will terminate them.
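
To make that rule concrete, here is a rough Python sketch of the decision. It is not Druid's actual code. The names (ScalingDecision, decide) are made up for illustration, and "launch one worker per pending task" is a simplifying assumption. The min/max bounds correspond to minNumWorkers and maxNumWorkers in the worker config.

```python
from dataclasses import dataclass


@dataclass
class ScalingDecision:
    launch: int = 0      # number of workers to start
    terminate: int = 0   # number of idle workers to stop


def decide(pending_tasks, idle_workers, current_workers,
           min_workers, max_workers):
    """Illustrative queue-based scaling rule (an assumption, not Druid's logic)."""
    # Always bring the pool back up to the configured minimum first.
    if current_workers < min_workers:
        return ScalingDecision(launch=min_workers - current_workers)

    # Tasks are waiting: add workers, but never exceed the maximum.
    if pending_tasks > 0:
        room = max_workers - current_workers
        return ScalingDecision(launch=max(0, min(pending_tasks, room)))

    # Nothing is waiting: remove idle workers, but keep the minimum.
    surplus = current_workers - min_workers
    return ScalingDecision(terminate=max(0, min(idle_workers, surplus)))
```

For example, with minNumWorkers 2 and maxNumWorkers 12, five running workers and twenty pending tasks would yield a request for seven more workers, since the pool cannot grow past twelve.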

This system was added pretty early in Druid’s life, and was meant for a pretty narrow use case (basically, something that would be useful for one particular team). I am sure the design could be improved in various ways. If you are interested, check out “EC2AutoScaler” in the Druid codebase.

Hi Gian, thanks for your reply. I have looked at the PendingTaskBasedStrategy inside the codebase. Just my two cents: Druid should have something like a ThroughputBasedStrategy and a dynamic taskCount.

Best,

Xuanyi

Hi,

I am also wondering whether this autoscaling feature could conflict with AWS Auto Scaling groups?

Hi Guillaume,

The Druid EC2AutoScaler currently operates directly on EC2 instances, without an autoscaling group. The overlord tracks and manages the lifecycle of these instances itself.