AWS Database Blog
Automate Avalanche node deployment using the AWS CDK: Part 1
Avalanche is an EVM-compatible, layer-1 blockchain network. The protocol is built on a novel consensus mechanism, paired with subnets that run their own virtual machines. Subnets enable the creation of custom, app-specific blockchains for different use cases and allow the Avalanche network to scale horizontally.
At its core, a blockchain is a set of replicated state machines that run the same set of operations in the same order (for example, smart contract execution, asset transfers), where blocks are ordered globally through consensus. At peak, the Avalanche network produces over 500 transactions per block with sub-second finality (see the block explorer), which can make node operation resource-intensive. Although Avalanche fully supports on-premises deployments, depending on your organization’s needs regarding locality and cost, a cloud-native environment can be more cost-effective and reliable.
In this post, we address operational challenges using AWS primitives and the Rust SDK, and showcase how to do a single-command deployment of an Avalanche node using the AWS Cloud Development Kit (AWS CDK). A future post will cover more advanced topics like private and custom network deployments and Avalanche Subnet installation.
Before we start, we want to highlight some key technologies from AWS that Avalanche utilizes to create a consistent node deployment.
Amazon EBS volume mapper
As a chain makes progress, its historical states grow, utilizing more disk space over time. A full archival node for a chain can take anywhere from a few hours to days to sync the full chain states. Therefore, you need static Amazon Elastic Block Store (Amazon EBS) volumes per Availability Zone, in case the underlying Amazon Elastic Compute Cloud (Amazon EC2) instance goes offline (for example, for hardware maintenance). That way, on its recovery, the new EC2 instance can simply reload the existing data from the volume, without going through another cycle of state sync. aws-volume-provisioner maintains the static mapping between the Availability Zone and newly launched EC2 instances in order to retain the node state.
In the case of a single-node deployment, we pin the node to one Availability Zone, because the EBS volume is a zone-specific resource. This is particularly useful when the machine is a Spot Instance with more frequent node restarts.
Elastic IP mapper
An Avalanche node is not required to have a fixed IP host, because the peer uptime is tracked based on its node ID by the randomly sampled peers. However, having an IP statically mapped to a node ID makes node monitoring easy and enables more stable anchor-node discovery mechanisms for private network use cases. aws-ip-provisioner maintains the static mapping between the EBS volume and the Elastic IP.
Note that peer discovery via hard-coded IPs is in general an anti-pattern in cloud networking and system management. However, decentralized networks like Avalanche require vendor-neutral mechanisms for beacon/anchor node discovery. Therefore, Avalanche maintains a well-known list of anchor node IPs in its code base (see the code on GitHub).
Avalanche agent
avalanched is an agent (or daemon) that runs on every remote machine and creates and installs Avalanche-specific resources (for example, generating TLS certificates, discovering anchor nodes, and writing Avalanche node service files). After the basic AWS resources are provisioned with the AWS CDK and AWS CloudFormation, avalanched auto-generates staking TLS certificates and stores them encrypted in Amazon Simple Storage Service (Amazon S3), based on user-provided configuration (see the default avalanche-ops configuration). The TLS certificates uniquely identify each node (each maps to a node ID), so it’s important to back them up safely. avalanched uses the user-provided AWS Key Management Service (AWS KMS) key to envelope-encrypt the keys before uploading them to Amazon S3.
Avalanche telemetry
Each Avalanche node provides metrics that monitor the overall health and performance of the validators. avalanche-telemetry-cloudwatch is an agent that routinely collects such metrics and reports back to a dedicated telemetry recording service (for example, Amazon CloudWatch). Node operators can create alarms that alert the team if any abnormal conditions are found. The agent is written in Rust using the AWS Rust SDK. An Avalanche node exposes its metrics via a Prometheus endpoint, and the agent periodically queries and parses the metrics data based on the regex-based configuration.
Solution overview
The following system diagram illustrates our solution architecture.
We walk you through the following high-level steps to set up this solution:
- Clone the GitHub repo.
- Create an S3 bucket for backing up the envelope encrypted node certificate.
- Create a KMS key for envelope encrypting the node certificate.
- Create an EC2 key pair for SSH access to the node.
- Create an EC2 instance role for the Avalanche node.
- Create a VPC for the Avalanche node.
- Create an EC2 auto scaling group for the Avalanche node.
Prerequisites
For this walkthrough, the following are required:
- An AWS account
- An AWS Identity and Access Management (IAM) user with administrator access
- The AWS CDK
- The AWS CLI
- Configured AWS credentials for AWS CDK commands
Clone the GitHub repo
Use the following code to clone the GitHub repo:
Set AWS Region
For example, set the AWS_REGION environment variable as follows. You can also run the aws configure command to set your Region.
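A minimal sketch; us-east-1 is only an example Region:

```shell
# Set the default Region for subsequent AWS CLI and AWS CDK commands
export AWS_REGION=us-east-1
```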
Create an S3 bucket for backing up the envelope encrypted node certificate
Use the following code to create your S3 bucket (replace S3_BUCKET_NAME with your own S3 bucket name):
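A sketch using the AWS CLI; the bucket name below is a hypothetical example, so substitute your own globally unique name:

```shell
# Replace with your own globally unique bucket name
S3_BUCKET_NAME=avalanche-ops-node-backup-example

# Create the bucket that will hold the envelope-encrypted node certificate
aws s3 mb "s3://${S3_BUCKET_NAME}"
```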
Create a KMS key for envelope encrypting the node certificate
Create your KMS key with the following code (replace KMS_CMK_ARN with the ARN of your own KMS key):
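A sketch using the AWS CLI; the key description is an example, and the returned ARN is what later steps refer to as KMS_CMK_ARN:

```shell
# Create a symmetric KMS key and capture its ARN for later steps
KMS_CMK_ARN=$(aws kms create-key \
  --description "avalanche-ops node certificate envelope encryption" \
  --query KeyMetadata.Arn \
  --output text)
echo "${KMS_CMK_ARN}"
```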
To delete the key later, use the following code:
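As a sketch, KMS keys are not deleted immediately; deletion is scheduled with a mandatory waiting period:

```shell
# Schedule the key for deletion (KMS enforces a 7 to 30 day waiting period);
# KMS_CMK_ARN is the ARN captured when the key was created
aws kms schedule-key-deletion \
  --key-id "${KMS_CMK_ARN}" \
  --pending-window-in-days 7
```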
Create an EC2 key pair for SSH access to the nodes
Use the following code to create your EC2 key pair (replace EC2_KEY_PAIR_NAME with your own EC2 key pair name):
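A sketch using the AWS CLI; the key pair name is a hypothetical example:

```shell
# The key pair name is an example; choose your own
EC2_KEY_PAIR_NAME=avalanche-ops-key

# Create the key pair and save the private key locally for SSH access
aws ec2 create-key-pair \
  --key-name "${EC2_KEY_PAIR_NAME}" \
  --query KeyMaterial \
  --output text > "${EC2_KEY_PAIR_NAME}.pem"
chmod 400 "${EC2_KEY_PAIR_NAME}.pem"
```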
Create an EC2 instance role for the Avalanche node
Set the following parameters:
- CDK_REGION – The Region in which to create resources. The following command uses the default Region configured in your AWS CLI.
- CDK_ACCOUNT – The AWS account to create resources
- ID – The unique identifier for the node
- KMS_CMK_ARN – The KMS customer managed key (CMK) ARN for envelope encryption of your node certificate
- S3_BUCKET_NAME – The S3 bucket name to back up the node certificate. S3 bucket names are globally unique, so append a unique suffix (such as your name) to the string.
For example:
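The following sketch shows what setting these parameters and deploying might look like; the values are hypothetical examples, and exactly how the repository's CDK app consumes them is an assumption, so follow the repository's instructions for the actual invocation:

```shell
export CDK_REGION=$(aws configure get region)
export CDK_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
export ID=my-avalanche-node
export KMS_CMK_ARN=arn:aws:kms:us-east-1:111122223333:key/EXAMPLE   # your key ARN
export S3_BUCKET_NAME=avalanche-ops-node-backup-example             # your bucket name

# Deploy the instance role stack (backed by ec2_instance_role.yaml)
cdk deploy
```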
See ec2_instance_role.yaml for the CloudFormation template.
Create a VPC for the Avalanche node
Set the following parameters:
- CDK_REGION – The Region to create resources
- CDK_ACCOUNT – The AWS account to create resources
- ID – The unique identifier for the node
For example:
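As a sketch, with hypothetical example values (the account ID and node identifier are placeholders):

```shell
export CDK_REGION=us-east-1          # your Region
export CDK_ACCOUNT=111122223333      # your 12-digit account ID
export ID=my-avalanche-node

# Deploy the VPC stack (backed by vpc.yaml)
cdk deploy
```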
See vpc.yaml for the CloudFormation template.
Create an EC2 auto scaling group for the Avalanche node
Set the following parameters:
- CDK_REGION – The Region to create resources
- CDK_ACCOUNT – The AWS account to create resources
- ID – The unique identifier for the node
- KMS_CMK_ARN – The KMS CMK ARN for envelope encryption of your node certificate
- S3_BUCKET_NAME – The S3 bucket name to back up the node certificate
- EC2_KEY_PAIR_NAME – The EC2 key pair name for SSH access
- AAD_TAG – The additional authenticated data (AAD) tag used for envelope encryption
- INSTANCE_PROFILE_ARN – The EC2 instance profile ARN
- SECURITY_GROUP_ID – The VPC security group
- PUBLIC_SUBNET_IDS – The public subnet IDs created for the VPC
- NETWORK_ID – The network ID: 1 for mainnet, 5 for the Fuji testnet
- NLB_VPC_ID – The VPC ID, used for setting up a Network Load Balancer
For example:
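As a sketch, with hypothetical placeholder values throughout (ARNs, IDs, and names must come from the resources you created in the previous steps):

```shell
export CDK_REGION=us-east-1
export CDK_ACCOUNT=111122223333                                     # your account ID
export ID=my-avalanche-node
export KMS_CMK_ARN=arn:aws:kms:us-east-1:111122223333:key/EXAMPLE   # your key ARN
export S3_BUCKET_NAME=avalanche-ops-node-backup-example             # your bucket name
export EC2_KEY_PAIR_NAME=avalanche-ops-key
export AAD_TAG=avalanche-ops-aad-tag
export INSTANCE_PROFILE_ARN=arn:aws:iam::111122223333:instance-profile/EXAMPLE
export SECURITY_GROUP_ID=sg-0123456789abcdef0
export PUBLIC_SUBNET_IDS=subnet-0123456789abcdef0,subnet-0123456789abcdef1
export NETWORK_ID=1                            # 1 for mainnet, 5 for Fuji
export NLB_VPC_ID=vpc-0123456789abcdef0

# Deploy the auto scaling group stack (backed by asg_amd64_ubuntu.yaml)
cdk deploy
```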
See asg_amd64_ubuntu.yaml for the CloudFormation template.
Check the created resources
By default, the solution creates an Elastic IP per node. If NlbEnabled is set to true (found in the CloudFormation template), use the NlbDnsName output from the preceding stack; otherwise, use the Elastic IP to check the metrics and RPC endpoints. For instance, http://<NlbDnsName or Elastic IP>:9650/ext/metrics returns the current metrics of the node, as shown in the following screenshot.
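For example, with curl (the hostname below is a placeholder; substitute your NlbDnsName output or the node's Elastic IP):

```shell
# Replace with your NlbDnsName output or the node's Elastic IP
NODE_HOST="example.elb.us-east-1.amazonaws.com"

# Query the node's Prometheus-style metrics endpoint
curl -s "http://${NODE_HOST}:9650/ext/metrics"
```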
Go to CloudWatch Logs to see the logs being published from avalanched.
The following screenshot shows a list of log events for this cluster.
Go to the CloudWatch Metrics page to see the metrics being published from avalanched.
The following screenshot shows an example of graphed metrics.
The default setup creates SSH and HTTP inbound rules that are open to the public. Because the SSH and HTTP ports aren’t used for peer-to-peer communication and are only needed for host machine access, we strongly advise limiting the CIDR range of these rules to your own IP address.
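As a sketch, the inbound SSH rule could be tightened with the AWS CLI (the security group ID is a placeholder, and checkip.amazonaws.com is one way to discover your public IP):

```shell
SECURITY_GROUP_ID=sg-0123456789abcdef0        # your node's security group
MY_IP=$(curl -s https://checkip.amazonaws.com)

# Remove the open-to-the-world SSH rule, then re-add it for your IP only
aws ec2 revoke-security-group-ingress --group-id "${SECURITY_GROUP_ID}" \
  --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id "${SECURITY_GROUP_ID}" \
  --protocol tcp --port 22 --cidr "${MY_IP}/32"
```

The same pattern applies to the HTTP (9650) rule if you don't need public RPC access.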
Clean up
To clean up your resources, run cdk destroy for each stack. First, run the following command to get your account number and substitute it into the CDK_ACCOUNT variable for each of the stacks.
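A sketch of the teardown, assuming cdk destroy is run once per stack from each stack's directory:

```shell
# Look up your account number for the CDK_ACCOUNT variable
export CDK_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
export CDK_REGION=$(aws configure get region)

# Destroy the stacks in reverse order of creation:
# the auto scaling group, then the VPC, then the instance role
cdk destroy
```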
Using a Rust-based CLI
If you are looking for a Rust-based CLI, check out avalancheup-aws/recipes.
Conclusion
In this post, we showed how to use the AWS CDK to deploy an Avalanche node on AWS. The avalanched agent implements Avalanche-specific installation logic that would otherwise be harder to express in bash scripts. aws-volume-provisioner retains the existing volumes in case an EC2 instance is deleted. avalanche-telemetry-cloudwatch collects Avalanche node metrics that can surface client-perceived latencies and other performance and reliability issues.
To learn more about Avalanche, check out the avalanche-ops recipes, AWS CDK deployment instructions, and official Avalanche documentation.
About the authors
Raj Seshadri is a Senior Partner Solutions Architect with AWS and a valued member of the Technical Field Community for both containers and blockchain. With an insatiable appetite for exploring blockchain technology, Raj is particularly drawn to Ethereum, Web3, NFTs, and DeFi. Before joining AWS, Raj acquired significant industry experience with notable companies such as Aqua Security, Red Hat, Dell, and EMC. In his spare time, he plays tennis and enjoys traveling around the world. Follow him on Twitter @texanraj to stay up-to-date on his latest thoughts and insights.
Gyuho Lee is a staff software engineer at Ava Labs, working on consensus protocols and various tooling. Previously, he worked at AWS as a senior software engineer on Amazon EKS and was a lead maintainer of etcd.