EC2AutoScaler not working properly - Potential Bug

Hi, I enabled the EC2AutoScaler for the middle managers. The autoscaler brings up a node and the node joins the cluster, but after the timeout period the node is killed.

Here is my JSON input:

{
  "selectStrategy": {
    "type": "equalDistribution"
  },
  "autoScaler": {
    "type": "ec2",
    "minNumWorkers": 2,
    "maxNumWorkers": 5,
    "envConfig": {
      "availabilityZone": "us-east-1c",
      "nodeData": {
        "amiId": "XXXXXXXXXXXXXXX",
        "instanceType": "r5.4xlarge",
        "minInstances": 1,
        "maxInstances": 1,
        "securityGroupIds": ["XXXXXXXXXXXXXXX"],
        "subnetId": "XXXXXXXXXXXXXXX",
        "iamRole": "XXXXXXXXXXXXXXX",
        "keyName": "XXXXXXXXXXXXXXX"
      },
      "userData": {
        "impl": "string",
        "data": "XXXXXXXXXXXXXXX",
        "versionReplacementString": ":VERSION:",
        "version": 1
      }
    }
  }
}

I narrowed the issue down to the ipToIdLookup method of EC2AutoScaler.java. The method calls describeInstances with “private-ip-address” as the filter, but the values it passes in are private DNS names. As a result the Reservation list comes back empty, and retVal ends up as an empty list.
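To make the mismatch concrete, here is a minimal sketch that does not use the AWS SDK or any Druid code. The Instance class, the lookupByPrivateIp method, and the sample ID and addresses are all hypothetical stand-ins for a describeInstances call filtered on private-ip-address; it only illustrates why DNS names produce an empty result.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterMismatchSketch {
    // Hypothetical stand-in for an EC2 instance record; not an AWS SDK type.
    static final class Instance {
        final String id;
        final String privateIp;
        final String privateDnsName;

        Instance(String id, String privateIp, String privateDnsName) {
            this.id = id;
            this.privateIp = privateIp;
            this.privateDnsName = privateDnsName;
        }
    }

    // Mimics a lookup filtered on private-ip-address: only instances whose
    // private IP appears in `values` are returned.
    static List<String> lookupByPrivateIp(List<Instance> fleet, List<String> values) {
        List<String> retVal = new ArrayList<>();
        for (Instance instance : fleet) {
            if (values.contains(instance.privateIp)) {
                retVal.add(instance.id);
            }
        }
        return retVal;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<Instance> fleet = List.of(
            new Instance("i-0abc", "10.0.1.23", "ip-10-0-1-23.ec2.internal"));

        // DNS names never match an IP filter, so the result is empty.
        System.out.println(lookupByPrivateIp(fleet, List.of("ip-10-0-1-23.ec2.internal"))); // prints []
        // The private IP matches as expected.
        System.out.println(lookupByPrivateIp(fleet, List.of("10.0.1.23"))); // prints [i-0abc]
    }
}
```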

When retVal is empty, the node that has already been provisioned is never removed from currentlyProvisioning in SimpleWorkerProvisioningStrategy.java (line 147):

currentlyProvisioning.removeAll(workerNodeIds);

Eventually, every instance still left in currentlyProvisioning is terminated at line 176:

workerConfig.getAutoScaler().terminateWithIds(Lists.newArrayList(currentlyProvisioning));
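The effect of those two lines can be sketched in isolation. This is not the Druid source: the stillProvisioning helper and the instance ID are hypothetical, and the sketch assumes that whatever remains in currentlyProvisioning once the timeout expires is what gets handed to terminateWithIds.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ProvisioningSketch {
    // Returns the IDs that remain pending after removing the IDs the lookup
    // recognised, mirroring currentlyProvisioning.removeAll(workerNodeIds).
    static Set<String> stillProvisioning(Set<String> currentlyProvisioning,
                                         List<String> workerNodeIds) {
        Set<String> pending = new HashSet<>(currentlyProvisioning);
        pending.removeAll(workerNodeIds);
        return pending;
    }

    public static void main(String[] args) {
        // Made-up instance ID for illustration only.
        Set<String> currentlyProvisioning = Set.of("i-0abc");

        // Lookup returned nothing: the node stays pending and would be
        // terminated once the provisioning timeout expires.
        System.out.println(stillProvisioning(currentlyProvisioning, List.of())); // prints [i-0abc]

        // Lookup returned the ID: nothing is left to terminate.
        System.out.println(stillProvisioning(currentlyProvisioning, List.of("i-0abc"))); // prints []
    }
}
```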

Solution

1. Either pass the proper IP address to describeInstances,

or

2. Change the describeInstances filter to private-dns-name.

Either change would fix the issue.

Is there any other way to make it work? How is it working for others?

Thanks,

Mohan

Hi,

Is anyone else facing this issue?

Regards,

Mohan

Hi Mohan,

Did you try setting the following in your overlord config?

{
  "selectStrategy": {
    "type": "fillCapacity",
    "affinityConfig": {
      "affinity": {
        "datasource1": ["host1:port", "host2:port"],
        "datasource2": ["host3:port"]
      }
    }
  },
  "autoScaler": {

The middle manager doesn’t talk to DNS so you may need to resolve host to IP in a local hosts file on the middle manager.

Eric

Hey Mohan,

I think what’s going on here is the EC2AutoScaler is assuming that the AMI you provide is setting “druid.host” in common.runtime.properties equal to the private IP address of the host. Under that assumption, the current code would work fine. As I recall, the environment that the EC2AutoScaler was originally developed for did work like that.

To move past this, I would suggest either doing the same thing in your AMI, or if you are interested in contributing to Druid, working on a patch to fix this to be more flexible. Perhaps resolve the druid.host if it’s a hostname and then look that up as a private-ip-address. (You wouldn’t want to just switch to private-dns-name, since then if someone had specified the private IP address for druid.host, it would break their setup.)
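A rough sketch of that "resolve first, then look up by private IP" idea follows. It is not a patch against Druid: the toPrivateIp and dnsResolver names are hypothetical, the IPv4 check is a deliberately loose regex, and it assumes druid.host carries a bare host with no port. The resolver is passed in so the logic can be exercised without a network.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

public class HostToIpSketch {
    // Loose IPv4 shape check; does not validate octet ranges.
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    // If druidHost already looks like an IPv4 address, use it unchanged so
    // existing IP-based setups keep working; otherwise resolve it first.
    static String toPrivateIp(String druidHost, UnaryOperator<String> resolver) {
        if (IPV4.matcher(druidHost).matches()) {
            return druidHost;
        }
        return resolver.apply(druidHost);
    }

    // One possible resolver, backed by the JVM's normal name resolution.
    static UnaryOperator<String> dnsResolver() {
        return host -> {
            try {
                return InetAddress.getByName(host).getHostAddress();
            } catch (UnknownHostException e) {
                throw new UncheckedIOException(e);
            }
        };
    }

    public static void main(String[] args) {
        // A fake resolver with a made-up mapping, so no DNS call is needed here.
        UnaryOperator<String> fake = host -> "10.0.1.23";

        System.out.println(toPrivateIp("10.0.1.23", fake));                 // prints 10.0.1.23
        System.out.println(toPrivateIp("ip-10-0-1-23.ec2.internal", fake)); // prints 10.0.1.23
    }
}
```

The resulting address could then be used with the existing private-ip-address filter, which keeps setups that already put an IP in druid.host working.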

Hi Gian,

The issue is resolved by adding druid.worker.ip to the middle manager runtime.properties and druid.host to common.runtime.properties.

With these set, Druid picks up the IP addresses instead of the private DNS names, which solves the problem.

Thanks for your help.

Regards,

Mohan

Thanks for the update Mohan.

Eric Graham

Solutions Engineer - Imply
