[druid-user] druid data server: why ec2 i3?

Is there a reason i3 is recommended? The SSD that comes with i3 is an instance store volume.
If we Stop/Start the i3 instance, the data on the instance store volume is lost, which means a Stop/Start of the EC2 instance loses the entire local disk cache.

For Druid it’s good to have a high-speed SSD (e.g. NVMe) for the segment cache, plus good network bandwidth. i3 instances have both and are also very cost-effective compared to other instance types.

Regarding your second point: in my understanding, there is an option you can enable when creating the instance to persist the disk when the node is stopped. Another option is to format and mount the disk when the node starts. This is even better in the case of Druid, as Druid can pull the segments from deep storage. In fact, the latter option will save some money if your use case calls for keeping the nodes down while not in use.
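
A minimal sketch of the format-and-mount option, assuming the instance store shows up as /dev/nvme0n1 and the segment cache should live under /mnt/druid. Both names are hypothetical, so check the real device with lsblk on your instance. The function only builds the commands a boot script would run as root; nothing is executed here.

```python
import shlex


def boot_commands(device="/dev/nvme0n1", mount_point="/mnt/druid"):
    """Return the commands a boot script could run to prepare the disk.

    device and mount_point are hypothetical defaults, not values from
    this thread.
    """
    return [
        # The volume comes back empty after a Stop/Start, so create a
        # fresh filesystem (-f overwrites anything already on the device).
        ["mkfs.xfs", "-f", device],
        # Make sure the mount point exists.
        ["mkdir", "-p", mount_point],
        # Mount the freshly formatted volume.
        ["mount", device, mount_point],
    ]


if __name__ == "__main__":
    for cmd in boot_commands():
        print(shlex.join(cmd))
```

With druid.segmentCache.locations pointing at a directory under that mount point, the Historical should start with an empty cache and download its assigned segments from deep storage again.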

Yeah, I would agree that the high performance of the disk is the reason to use those instance types. We are using i3en instances currently.

Instance store volumes can’t be persisted after the instance is stopped. They survive a reboot, but there is no option to keep the volume after a stop or termination. Therefore we rely on deep storage to persist the data. There is a danger: if too many instances are lost at once, the data may be unavailable while it is reloaded from deep storage.
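
To get a feel for how long that unavailability window could be, here is a rough back-of-the-envelope sketch. It assumes the reload is limited only by one node's network bandwidth; the data size, link speed, and efficiency factor are made-up illustration values, not measurements from our cluster.

```python
def reload_seconds(data_bytes, network_bits_per_sec, efficiency=0.7):
    """Estimate the time to pull data_bytes from deep storage onto one node.

    efficiency is a guessed fraction of the line rate that is actually
    achieved; real reloads also depend on deep storage throughput and disk
    write speed, which this sketch ignores.
    """
    if data_bytes < 0:
        raise ValueError("data_bytes must be non-negative")
    if network_bits_per_sec <= 0:
        raise ValueError("network_bits_per_sec must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Convert bytes to bits, then divide by the usable bandwidth.
    return data_bytes * 8 / (network_bits_per_sec * efficiency)


if __name__ == "__main__":
    # Hypothetical node: 2 TB of cached segments, 10 Gbit/s network.
    secs = reload_seconds(2e12, 10e9)
    print(f"estimated reload time: {secs / 60:.1f} minutes")
```

Plugging in your own per-node segment size and bandwidth gives a first estimate of how long queries might miss data if every replica of a segment is lost at the same time.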

Thanks, doug and Tijo!