AWS Compute Blog
Deploying your first 5G enabled application with AWS Wavelength
This post was written by Mike Coleman, Senior Developer Advocate (Twitter: @mikegcoleman).
Today, AWS released AWS Wavelength. Wavelength allows you to deploy applications and services at the edge of a mobile carrier’s 5G network. By combining the benefits of 5G, such as high bandwidth and low latency, with the ability to use AWS tools and services you’re already familiar with, you’re able to build next generation edge applications quickly and easily.
Rather than go into more depth about Wavelength in this blog, I’d recommend reading Jeff Barr’s blog post. His post goes into detail about why we built Wavelength, and how you can get started with deploying AWS resources in a Wavelength Zone.
In this blog, I walk you through deploying one of the most common Wavelength use cases: machine learning inference.
Why inference at the edge?
One of the tradeoffs with machine learning applications is system responsiveness. If your application must be highly responsive, you may need to deploy your inference processing application close to the end user. In the case of mobile devices, this could mean that the inference processing takes place on the device itself. This type of additional processing demand on the device often results in reduced device battery life among other tradeoffs. Additionally, if you need to update your machine learning model, you must push out an update to all the devices running your application.
As I mentioned earlier, one of the key benefits of 5G and Wavelength is significantly lower latencies compared to previous generation mobile networks. For edge applications, this means you can perform inference processing in a Wavelength Zone with near real-time responsiveness to the mobile device. By moving the inference processing to the Wavelength Zone, you reduce power consumption and battery drain on the mobile device. Additionally, you can simplify application updates. If you need to make a change to your training model, you simply update your servers in the Wavelength Zone instead of having to ship a new version to all the devices running your code.
Solution Overview
The following tutorial guides you through deploying an object detection application composed of the following components:
- A Wavelength-hosted API endpoint (using Flask)
- A Wavelength-hosted Inference server (running Torchserve)
- A React web app accessed via the browser of a mobile device running on the carrier’s 5G network.
- A server that acts both as a bastion host, allowing you to SSH into your other instances, and as a web server for the React web application.
The API server is built using Python and Flask, and runs on a t3.medium instance based on a standard Ubuntu 18.04 image. It accepts an image from the client application running on a device connected to the carrier’s 5G mobile network, which it then forwards to the inference server. The inference server returns the detected objects along with coordinates for each object (or an error if it can’t detect any objects). The API server adds text labels and bounding boxes to the image and returns it to the mobile client.
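As a rough sketch of that flow, the following minimal Flask app accepts an uploaded image and forwards it to a Torchserve prediction endpoint. This is not the code from the demo repository: the /detect route, the image form field, and the INFERENCE_URL value are illustrative assumptions, and this simplified version returns the raw prediction JSON rather than drawing boxes on the image.

# Minimal illustrative sketch of an API server like the one described above.
# Assumptions (not from the demo repository): the /detect route, the 'image'
# form field, and the INFERENCE_URL placeholder. It returns the raw
# prediction JSON rather than a marked-up image.
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
INFERENCE_URL = "http://<inference-server-private-ip>:8080/predictions/fasterrcnn"

@app.route("/detect", methods=["POST"])
def detect():
    image_bytes = request.files["image"].read()
    # Torchserve accepts the raw image bytes as the request body
    resp = requests.post(INFERENCE_URL, data=image_bytes)
    if resp.status_code != 200:
        return jsonify({"error": "no objects detected"}), 502
    return jsonify(resp.json())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)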
The inference server runs Torchserve, an open source project that provides a flexible and easy way to serve PyTorch models. Object detection is done using a Faster R-CNN model. The inference server is deployed on a g4dn.2xlarge instance running the AWS Deep Learning AMI.
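If you’d like a feel for the kind of output that model produces before setting up Torchserve, here is a small standalone sketch that runs torchvision’s Faster R-CNN locally on a single image. The 0.5 score threshold and the kitten.jpg file name are arbitrary illustrative choices.

# Standalone sketch: run torchvision's Faster R-CNN on one image to see the
# kind of predictions the Torchserve object_detector handler works with.
# The 0.5 score threshold and the file name are illustrative choices.
import torch
import torchvision
from torchvision import transforms
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

img = Image.open("kitten.jpg").convert("RGB")
tensor = transforms.ToTensor()(img)

with torch.no_grad():
    prediction = model([tensor])[0]  # dict with 'boxes', 'labels', 'scores'

for box, label, score in zip(prediction["boxes"], prediction["labels"], prediction["scores"]):
    if float(score) >= 0.5:
        print(int(label), [round(float(v), 1) for v in box])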
You use the web browser on your mobile device to access the web server, which hosts the client application written in React.
Wavelength is designed to provide access to services and applications that require low latency. It’s important to note that you don’t need to deploy your entire application in a Wavelength Zone. You only need to deploy parts of your application that benefit from being deployed in the Wavelength Zone – such as application components requiring low latency.
In the case of the demo application, the API and inference servers are located in the Wavelength Zone because one of the design goals of the application is low-latency processing of the inference requests.
On the other hand, because the web server is only serving a small single page React web app, it does not have the same latency requirements as the inference processing. For that reason, it’s hosted in the Region instead of the Wavelength Zone.
Prerequisites
To complete the walkthrough below, you need:
- To be familiar with working from the command line, including editing text files.
- The AWS CLI installed on your local machine. Ensure it’s the latest version so it supports Wavelength.
- An administrative account with sufficient permissions to create VPC resources (instances, subnets, etc.).
- In order to access resources in a Wavelength Zone, you need a mobile device on a carrier’s 5G mobile network in a city that has access to the Zone. The following tutorial is written to be deployed in the Boston Wavelength Zone, but you can adjust the environment variables for the Zone and Region to deploy it in other areas.
- An SSH key pair in the us-east-1 Region.
- The commands below work on Mac and Linux machines. If you are on a Windows machine, the easiest way to run through the tutorial is to spin up a Linux-based EC2 instance, install and configure the AWS CLI, and run the commands from the EC2 instance’s command line.
Create the VPC and associated resources
The first step in this tutorial is deploying the VPC, internet gateway, and carrier gateway.
Start by configuring some environment variables, and then deploying the resources.
- In order to get started, you need to first set some environment variables.
Note: replace the value for KEY_NAME with the name of the key pair you wish to use.
Note: these values are specific to the us-east-1 Region. If you wish to deploy into another region, you’ll need to modify them as appropriate. Check the documentation for more info.

export REGION="us-east-1"
export WL_ZONE="us-east-1-wl1-bos-wlz-1"
export NBG="us-east-1-wl1-bos-wlz-1"
export INFERENCE_IMAGE_ID="ami-029510cec6d69f121"
export API_IMAGE_ID="ami-0ac80df6eff0e70b5"
export BASTION_IMAGE_ID="ami-027b7646dafdbe9fa"
export KEY_NAME=<your key name>
- Use the AWS CLI to create the VPC.
export VPC_ID=$(aws ec2 --region $REGION \
--output text \
create-vpc \
--cidr-block 10.0.0.0/16 \
--query 'Vpc.VpcId') \
&& echo '\nVPC_ID='$VPC_ID
- Create an internet gateway and attach it to the VPC.
export IGW_ID=$(aws ec2 --region $REGION \
--output text \
create-internet-gateway \
--query 'InternetGateway.InternetGatewayId') \
&& echo '\nIGW_ID='$IGW_ID

aws ec2 --region $REGION \
attach-internet-gateway \
--vpc-id $VPC_ID \
--internet-gateway-id $IGW_ID
- Add the carrier gateway.
export CAGW_ID=$(aws ec2 --region $REGION \
--output text \
create-carrier-gateway \
--vpc-id $VPC_ID \
--query 'CarrierGateway.CarrierGatewayId') \
&& echo '\nCAGW_ID='$CAGW_ID
Deploy the security groups
In this section, you add three security groups:
- Bastion SG allows SSH traffic from your local machine to the bastion host as well as HTTP web traffic from the Internet
- API SG allows SSH traffic from the Bastion SG and opens up port 5000 to accept incoming API requests
- Inference SG allows SSH traffic from the bastion host and communications on ports 8080 and 8081 (the ports used by the inference server) from the API SG.
- Create the bastion security group and add the ingress SSH rule.
Note: SSH access is only allowed from your current IP address. You can adjust this if needed by changing the --cidr parameter in the second command.
export BASTION_SG_ID=$(aws ec2 --region $REGION \
--output text \
create-security-group \
--group-name bastion-sg \
--description "Security group for bastion host" \
--vpc-id $VPC_ID \
--query 'GroupId') \
&& echo '\nBASTION_SG_ID='$BASTION_SG_ID

aws ec2 --region $REGION \
authorize-security-group-ingress \
--group-id $BASTION_SG_ID \
--protocol tcp \
--port 22 \
--cidr $(curl https://checkip.amazonaws.com)/32

aws ec2 --region $REGION \
authorize-security-group-ingress \
--group-id $BASTION_SG_ID \
--protocol tcp \
--port 80 \
--cidr 0.0.0.0/0
- Create the API security group along with two ingress rules: one for SSH from the bastion security group and one opening up the port the API server communicates on (5000).
export API_SG_ID=$(aws ec2 --region $REGION \
--output text \
create-security-group \
--group-name api-sg \
--description "Security group for API host" \
--vpc-id $VPC_ID \
--query 'GroupId') \
&& echo '\nAPI_SG_ID='$API_SG_ID

aws ec2 --region $REGION \
authorize-security-group-ingress \
--group-id $API_SG_ID \
--protocol tcp \
--port 22 \
--source-group $BASTION_SG_ID

aws ec2 --region $REGION \
authorize-security-group-ingress \
--group-id $API_SG_ID \
--protocol tcp \
--port 5000 \
--cidr 0.0.0.0/0
- Create the security group for the inference server along with three ingress rules: one for SSH from the bastion security group, and two opening the ports the inference server communicates on (8080 and 8081) to the API security group.
export INFERENCE_SG_ID=$(aws ec2 --region $REGION \
--output text \
create-security-group \
--group-name inference-sg \
--description "Security group for inference host" \
--vpc-id $VPC_ID \
--query 'GroupId') \
&& echo '\nINFERENCE_SG_ID='$INFERENCE_SG_ID

aws ec2 --region $REGION \
authorize-security-group-ingress \
--group-id $INFERENCE_SG_ID \
--protocol tcp \
--port 22 \
--source-group $BASTION_SG_ID

aws ec2 --region $REGION \
authorize-security-group-ingress \
--group-id $INFERENCE_SG_ID \
--protocol tcp \
--port 8080 \
--source-group $API_SG_ID

aws ec2 --region $REGION \
authorize-security-group-ingress \
--group-id $INFERENCE_SG_ID \
--protocol tcp \
--port 8081 \
--source-group $API_SG_ID
Add the subnets and routing tables
In the following steps you’ll create two subnets along with their associated routing tables and routes.
- Create the subnet for the Wavelength Zone
export WL_SUBNET_ID=$(aws ec2 --region $REGION \
--output text \
create-subnet \
--cidr-block 10.0.0.0/24 \
--availability-zone $WL_ZONE \
--vpc-id $VPC_ID \
--query 'Subnet.SubnetId') \
&& echo '\nWL_SUBNET_ID='$WL_SUBNET_ID
- Create the route table for the Wavelength subnet
export WL_RT_ID=$(aws ec2 --region $REGION \
--output text \
create-route-table \
--vpc-id $VPC_ID \
--query 'RouteTable.RouteTableId') \
&& echo '\nWL_RT_ID='$WL_RT_ID
- Associate the route table with the Wavelength subnet, and add a route directing traffic to the carrier gateway, which in turn routes traffic to the carrier’s mobile network.
aws ec2 --region $REGION \
associate-route-table \
--route-table-id $WL_RT_ID \
--subnet-id $WL_SUBNET_ID

aws ec2 --region $REGION create-route \
--route-table-id $WL_RT_ID \
--destination-cidr-block 0.0.0.0/0 \
--carrier-gateway-id $CAGW_ID
Next, repeat the same process to create the subnet and routing for the bastion subnet.
- Create the bastion subnet
export BASTION_SUBNET_ID=$(aws ec2 --region $REGION \
--output text \
create-subnet \
--cidr-block 10.0.1.0/24 \
--vpc-id $VPC_ID \
--query 'Subnet.SubnetId') \
&& echo '\nBASTION_SUBNET_ID='$BASTION_SUBNET_ID
- Deploy the bastion subnet route table and a route to direct traffic to the internet gateway
export BASTION_RT_ID=$(aws ec2 --region $REGION \
--output text \
create-route-table \
--vpc-id $VPC_ID \
--query 'RouteTable.RouteTableId') \
&& echo '\nBASTION_RT_ID='$BASTION_RT_ID

aws ec2 --region $REGION \
create-route \
--route-table-id $BASTION_RT_ID \
--destination-cidr-block 0.0.0.0/0 \
--gateway-id $IGW_ID

aws ec2 --region $REGION \
associate-route-table \
--subnet-id $BASTION_SUBNET_ID \
--route-table-id $BASTION_RT_ID
- Modify the bastion’s subnet to assign public IPs by default
aws ec2 --region $REGION \
modify-subnet-attribute \
--subnet-id $BASTION_SUBNET_ID \
--map-public-ip-on-launch
Create the Elastic IPs and networking interfaces
The final step before deploying the actual instances is to create two carrier IPs: IP addresses associated with the carrier network. These IP addresses will be assigned to two elastic network interfaces (ENIs), and the ENIs will be attached to the API and inference servers (the bastion host has its public IP assigned at launch by the bastion subnet).
- Create two carrier IPs, one for the API server and one for the inference server
export INFERENCE_CIP_ALLOC_ID=$(aws ec2 --region $REGION \
--output text \
allocate-address \
--domain vpc \
--network-border-group $NBG \
--query 'AllocationId') \
&& echo '\nINFERENCE_CIP_ALLOC_ID='$INFERENCE_CIP_ALLOC_ID

export API_CIP_ALLOC_ID=$(aws ec2 --region $REGION \
--output text \
allocate-address \
--domain vpc \
--network-border-group $NBG \
--query 'AllocationId') \
&& echo '\nAPI_CIP_ALLOC_ID='$API_CIP_ALLOC_ID
- Create two elastic network interfaces (ENIs)
export INFERENCE_ENI_ID=$(aws ec2 --region $REGION \
--output text \
create-network-interface \
--subnet-id $WL_SUBNET_ID \
--groups $INFERENCE_SG_ID \
--query 'NetworkInterface.NetworkInterfaceId') \
&& echo '\nINFERENCE_ENI_ID='$INFERENCE_ENI_ID

export API_ENI_ID=$(aws ec2 --region $REGION \
--output text \
create-network-interface \
--subnet-id $WL_SUBNET_ID \
--groups $API_SG_ID \
--query 'NetworkInterface.NetworkInterfaceId') \
&& echo '\nAPI_ENI_ID='$API_ENI_ID
- Associate the carrier IPs with the ENIs
aws ec2 --region $REGION associate-address \
--allocation-id $INFERENCE_CIP_ALLOC_ID \
--network-interface-id $INFERENCE_ENI_ID

aws ec2 --region $REGION associate-address \
--allocation-id $API_CIP_ALLOC_ID \
--network-interface-id $API_ENI_ID
Deploy the API and inference instances
With the VPC and underlying networking and security deployed, you can now move on to deploying your API and inference instances. The API server is a t3.medium instance based on a standard Ubuntu 18.04 AMI. The inference server is a g4dn.2xlarge running the AWS Deep Learning AMI. You install and configure the software components in subsequent steps.
- Deploy the API instance
aws ec2 --region $REGION \
run-instances \
--instance-type t3.medium \
--network-interface '[{"DeviceIndex":0,"NetworkInterfaceId":"'$API_ENI_ID'"}]' \
--image-id $API_IMAGE_ID \
--key-name $KEY_NAME
- Deploy the inference instance
aws ec2 --region $REGION \
run-instances \
--instance-type g4dn.2xlarge \
--network-interface '[{"DeviceIndex":0,"NetworkInterfaceId":"'$INFERENCE_ENI_ID'"}]' \
--image-id $INFERENCE_IMAGE_ID \
--key-name $KEY_NAME
Deploy the bastion / web server
You must deploy a bastion server in order to SSH into your application instances. Remember that the carrier gateway in a Wavelength Zone only allows ingress from the carrier’s 5G network. This means that in order to SSH into the API and inference servers you need to first SSH into the bastion host, and then from there, SSH into your Wavelength instances.
You are also going to install the client front end application onto the bastion host. You can use the webserver to test the application if you don’t want to install the React Native version of the application onto a mobile device. Remember that even though you’re not using the native application, the website must still be accessed from a device on the carrier’s 5G network.
- Issue the command below to create your bastion host
aws ec2 --region $REGION run-instances \
--instance-type t3.medium \
--associate-public-ip-address \
--subnet-id $BASTION_SUBNET_ID \
--image-id $BASTION_IMAGE_ID \
--security-group-ids $BASTION_SG_ID \
--key-name $KEY_NAME
Note: It takes a few minutes for your instances to be ready. Even when the status check in the EC2 console reads 2/2 checks passed, it may still be a few minutes before the instance finishes installing additional software packages and configuring itself. If you receive a lock error while running apt-get, wait several minutes and try again.
Configure the bastion host / web server
The last server you deployed serves two purposes. It acts as the bastion host allowing you to SSH into your other two servers, and it serves the client web app. In this section you’ll install that web app.
- SSH into the bastion host (the user name is bitnami).
Note: To be able to easily SSH from the bastion host to the inference server, use the -A (agent forwarding) parameter when starting your SSH session, e.g.:
ssh -i /path/to/key.pem -A bitnami@<bastion ip address>
- Clone the GitHub repo with the React code
git clone https://github.com/mikegcoleman/react-wavelength-inference-demo.git
- Install the dependencies
cd react-wavelength-inference-demo && npm install
- Build the webpage
npm run build
- Copy the page into the web server’s root directory
cp -r ./build/* /home/bitnami/htdocs
- Test that the web app is running correctly by navigating to the public IP address of your bastion instance
Configure the inference server
In this section you deploy a Torchserve server running on EC2. Torchserve is configured with the fasterrcnn model. It receives the image from the API server, runs the inference, and returns the labels and bounding boxes for the items found in the image.
I’m not going to spend time going into the inner workings of Torchserve in this post. However, if you’re interested in learning more, check out my colleague Shashank’s blog.
- SSH into the bastion host and then SSH into the inference server instance.
Note: To be able to easily SSH from the bastion host to the inference server, use the -A (agent forwarding) parameter when starting your SSH session with the bastion host, e.g.:
ssh -i /path/to/key.pem -A bitnami@<bastion public ip>
To SSH from the bastion host to the inference server you do not need the -i or -A parameters e.g.:
ssh ubuntu@<inference server private ip>
- Update the packages on the server and install the necessary prerequisite packages.
sudo apt-get update -y \
&& sudo apt-get install -y virtualenv openjdk-11-jdk gcc python3-dev
- Create a virtual environment.
mkdir inference && cd inference

virtualenv --python=python3 inference

source inference/bin/activate
- Install Torchserve and its related components
pip3 install \
torch torchtext torchvision sentencepiece psutil \
future wheel requests torchserve torch-model-archiver
- Install the inference model that the application will use.
mkdir torchserve-examples && cd torchserve-examples

git clone https://github.com/pytorch/serve.git

mkdir model_store

wget https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth

torch-model-archiver --model-name fasterrcnn --version 1.0 \
--model-file serve/examples/object_detector/fast-rcnn/model.py \
--serialized-file fasterrcnn_resnet50_fpn_coco-258fb6c6.pth \
--handler object_detector \
--extra-files serve/examples/object_detector/index_to_name.json

mv fasterrcnn.mar model_store/
- Create a configuration file for Torchserve (config.properties) and configure Torchserve to listen on your instance’s private IP.
Be sure to substitute the private IP of your instance below; you can find the private IP for your instance in the EC2 console.
The contents of config.properties should look as follows:
inference_address=http://<your instance private IP>:8080
management_address=http://<your instance private IP>:8081
For example:
inference_address=http://10.0.0.253:8080
management_address=http://10.0.0.253:8081
- Start the Torchserve server.
torchserve --start \
--model-store model_store \
--models fasterrcnn=fasterrcnn.mar \
--ts-config config.properties
It takes a few seconds for the server to start up. When it’s ready, you should see a line that ends with:
State change WORKER_STARTED -> WORKER_MODEL_LOADED
Leave this SSH session running so you can watch the inference server’s logs to see when it receives requests from the API server.
Configure the API server
In this section, you deploy the Flask-based API server.
- SSH into the bastion host and then SSH into the API server instance.
Note: To be able to easily SSH from the bastion host to the API server, use the -A (agent forwarding) parameter when starting your SSH session with the bastion host, e.g.:
ssh -i /path/to/key.pem -A bitnami@<bastion public ip>
To SSH from the bastion host to the API server you do not need the -i or -A parameters, e.g.:
ssh ubuntu@<api server private ip>
- Test your inference server (being sure to substitute the INTERNAL IP of the inference instance in the second line below):
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg

curl -X POST \
http://<your_inf_server_internal_IP>:8080/predictions/fasterrcnn \
-T kitten.jpg
You should see something similar to the following:
[ { "cat": "[(228.7825, 82.63463), (583.77545, 677.3058)]" }, { "car": "[(124.427414, 69.34327), (270.15457, 205.53458)]" } ]
The inference server returns the labels of the objects it detected, and the corner coordinates of boxes that surround those objects.
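As a rough illustration of what the API server then does with that response, the sketch below parses the label and coordinate pairs and draws boxes on the original image with Pillow. It assumes the response format shown above; the actual drawing code in the demo repository may differ.

# Illustrative sketch: parse a response like the one above and draw labeled
# bounding boxes with Pillow. Assumes the response is a list of
# {label: "[(x1, y1), (x2, y2)]"} entries; the demo repository's code may differ.
import ast
from PIL import Image, ImageDraw

def draw_predictions(image_path, predictions, out_path="labeled.jpg"):
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    for entry in predictions:
        for label, coords in entry.items():
            (x1, y1), (x2, y2) = ast.literal_eval(coords)
            draw.rectangle([x1, y1, x2, y2], outline="red", width=3)
            draw.text((x1, max(0, y1 - 12)), label, fill="red")
    img.save(out_path)

draw_predictions("kitten.jpg", [{"cat": "[(228.7825, 82.63463), (583.77545, 677.3058)]"}])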
Now that you have verified the API server can connect to the inference server, you can configure the API server.
- Run the following command to update system package information and install necessary prerequisites.
sudo apt-get update -y \
&& sudo apt-get install -y \
libsm6 libxrender1 libfontconfig1 virtualenv
- Clone the Python code into the application directory
mkdir apiserver && cd apiserver

git clone https://github.com/mikegcoleman/flask_wavelength_api .
- Create and activate a virtual environment.
virtualenv --python=python3 apiserver

source apiserver/bin/activate
- Install necessary Python packages.
pip3 install opencv-python flask pillow requests flask-cors
- Create a configuration file (config_values.txt) with the following line (substituting the INTERNAL IP of your inference server):
http://<your_inf_server_internal_IP>:8080/predictions/fasterrcnn
- Start the application.
python api.py
You should see output similar to the following:
* Serving Flask app "api" (lazy loading)
* Environment: production
  WARNING: This is a development server. Do not use it in a production deployment.
  Use a production WSGI server instead
* Debug mode: on
* Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
* Restarting with stat
* Debugger is active!
* Debugger PIN: 311-750-351
Test the client application
To test the application, you need to have a device on the carrier’s 5G network. From your device’s web browser, navigate to the bastion / web server’s public IP address. In the text box at the top of the app, enter the public IP of your API server.
Next, choose an existing photo from your camera roll, or take a photo with the camera and press the process object button underneath the preview photo (you may need to scroll down).
The client sends the image to the API server, which forwards it to the inference server for detection. The API server then receives the prediction back from the inference server, adds labels and bounding boxes, and returns the marked-up image to the client, where it is displayed.
If the inference server cannot detect any objects in the image, you will receive a message indicating the prediction failed.
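If you’d like to exercise the API server without the web app, a request along the lines of the sketch below should work from any machine that can reach the API server’s address. The /detect route and the image form field mirror the illustrative Flask sketch earlier in this post; they are assumptions, not the demo repository’s actual API contract.

# Hypothetical test client for the API server. The /detect route and the
# 'image' form field mirror the earlier illustrative Flask sketch and are
# assumptions, not the demo repository's actual API contract.
import requests

API_URL = "http://<api-server-address>:5000/detect"

with open("kitten.jpg", "rb") as f:
    resp = requests.post(API_URL, files={"image": f})

print(resp.status_code)
print(resp.json() if resp.ok else resp.text)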
Conclusion and next steps
In this blog, I covered some of the architectural considerations when deploying applications into Wavelength Zones. You then deployed a sample application designed to give you an idea of how you might architect an inference-at-the-edge solution. I hope this has inspired you to go off and build something new to take advantage of the exciting capabilities that Wavelength and 5G enable. Visit https://aws.amazon.com/wavelength/ to request access and check out documentation and other resources.