Compute Abstractions on AWS: A Visual Story

When I joined AWS last year, I wanted to find a way to explain, in the easiest way possible, all the options it offers to users from a compute perspective. There are many ways to peel this onion, but I want to share a “visual story” that I have created.

I define the compute domain as “anything that has CPU and Memory capacity that allows you to run an arbitrary piece of code written in a specific programming language.” Your mileage may vary in how you define it, but this is broad enough that it should cover a lot of different interpretations.

A key part of my story is around the introduction of different levels of compute abstractions this industry has witnessed in the last 20 or so years.

Separation of duties

The start of my story is a line. In a cloud environment, this line defines the perimeter between the consumer role and the provider role. In the cloud, there are things that AWS will do and things that the consumer will do. The perimeter of these responsibilities varies depending on the services you opt to use. If you want to understand more about this concept, read the AWS Shared Responsibility Model documentation.

The different abstraction levels

The reason why the line above is oblique is because it needs to intercept different compute abstraction levels. If you think about what happened in the last 20 years of IT, we have seen a surge of different compute abstractions that changed the way people consume CPU and Memory resources. It all started with physical (x86) servers back in the 80s, and then we have seen the industry adding abstraction layers over the years (for example, hypervisors, containers, functions).

The higher you go in the abstraction levels, the more the cloud provider can add value and can offload the consumer from non-strategic activities. A lot of these activities tend to be “undifferentiated heavy lifting.” We define this as something that AWS customers have to do but that don’t necessarily differentiate them from their competitors (because those activities are table-stakes in that particular industry).

What we found is that supporting millions of customers on AWS requires a certain degree of flexibility in the services we offer because there are many different patterns, use cases, and requirements to satisfy. Giving our customers choices is something AWS always strives for.

A couple of final notes before we dig deeper. The way this story builds up through the blog post is aligned to the progression of the launch dates of the various services, with a few noted exceptions. Also, the services mentioned are all generally available and production-grade. For full transparency, the integration among some of them may still be work-in-progress, which I’ll call out explicitly as we go.

The instance (or virtual machine) abstraction

This is the very first abstraction we introduced on AWS back in 2006. Amazon Elastic Compute Cloud (Amazon EC2) is the service that allows AWS customers to launch instances in the cloud. When customers intercept us at this level, they retain responsibility of the guest operating system and above (middleware, applications, etc.) and their lifecycle. AWS has the responsibility for managing the hardware and the hypervisor including their lifecycle.

At the very same level of the stack there is also Amazon Lightsail, which “is the easiest way to get started with AWS for developers, small businesses, students, and other users who need a simple virtual private server (VPS) solution. Lightsail provides developers compute, storage, and networking capacity and capabilities to deploy and manage websites and web applications in the cloud.”

And this is how these two services appear in our story:

The container abstraction

With the rise of microservices, a new abstraction took the industry by storm in the last few years: containers. Containers are not a new technology, but the rise of Docker a few years ago democratized access. You can think of a container as a self-contained environment with soft boundaries that includes both your own application as well as the software dependencies to run it. Whereas an instance (or VM) virtualizes a piece of hardware so that you can run dedicated operating systems, a container technology virtualizes an operating system so that you can run separated applications with different (and often incompatible) software dependencies.

And now the tricky part. Modern containers-based solutions are usually implemented in two main logical pieces:

A containers control plane that is responsible for exposing the API and interfaces to define, deploy, and lifecycle containers. This is also sometimes referred to as the container orchestration layer.
A containers data plane that is responsible for providing capacity (as in CPU/Memory/Network/Storage) so that those containers can actually run and connect to a network. From a practical perspective this is typically a Linux host or less often a Windows host where the containers get started and wired to the network.

Arguably, in a specific compute abstraction discussion, the data plane is key, but it is as important to understand what’s happening for the control plane piece.

In 2014, Amazon launched a production-grade containers control plane called Amazon Elastic Container Service (ECS), which “is a highly scalable, high performance container management service that supports Docker … Amazon ECS eliminates the need for you to install, operate, and scale your own cluster management infrastructure.”

In 2017, Amazon also announced the intention to release a new service called Amazon Elastic Container Service for Kubernetes (EKS) based on Kubernetes, a successful open source containers control plane technology. Amazon EKS was made generally available in early June 2018.

Just like for ECS, the aim for this service is to free AWS customers from having to manage a containers control plane. In the past, AWS customers would spin up EC2 instances and deploy/manage their own Kubernetes masters (masters is the name of the Kubernetes hosts running the control plane) on top of an EC2 abstraction. However, we believe many AWS customers will leave to AWS the burden of managing this layer by either consuming ECS or EKS, depending on their use cases. A comparison between ECS and EKS is beyond the scope of this blog post.

You may have noticed that what we have discussed so far is about the container control plane. How about the containers data plane? This is typically a fleet of EC2 instances managed by the customer. In this particular setup, the containers control plane is managed by AWS while the containers data plane is managed by the customer. One could argue that, with ECS and EKS, we have raised the abstraction level for the control plane, but we have not yet really raised the abstraction level for the data plane as the data plane is still comprised of regular EC2 instances that the customer has responsibility for.

There is more on that later on but, for now, this is how the containers control plane and the containers data plane services appear:

The function abstraction

At re:Invent 2014, AWS introduced another abstraction layer: AWS Lambda. Lambda is an execution environment that allows an AWS customer to run a single function. So instead of having to manage and run a full-blown OS instance to run your code, or having to track all software dependencies in a user-built container to run your code, Lambda allows you to upload your code and let AWS figure out how to run it at scale.

What makes Lambda so special is its event-driven model. Not only can you invoke Lambda directly (for example, via the Amazon API Gateway), but you can trigger a Lambda function upon an event in another AWS service (for example, an upload to Amazon S3 or a change in an Amazon DynamoDB table).

The key point about Lambda is that you don’t have to manage the infrastructure underneath the function you are running. No need to track the status of the physical hosts, no need to track the capacity of the fleet, no need to patch the OS where the function will be running. In a nutshell, no need to spend time and money on the undifferentiated heavy lifting.

And this is how the Lambda service appears:

The bare metal abstraction

Also known as the “no abstraction.”

As recently as re:Invent 2017, we announced (the preview of) the Amazon EC2 bare metal instances. We made this service generally available to the public in May 2018.

This announcement is part of Amazon’s strategy to provide choice to our customers. In this case, we are giving customers direct access to hardware. To quote from Jeff Barr’s post:

“…. (AWS customers) wanted access to the physical resources for applications that take advantage of low-level hardware features such as performance counters and Intel® VT that are not always available or fully supported in virtualized environments, and also for applications intended to run directly on the hardware or licensed and supported for use in non-virtualized environments.”

This is how the bare metal Amazon EC2 i3.metal instance appears:

As a side note, and also as alluded to by Jeff, i3.metal is the foundational EC2 instance type on top of which VMware created their own VMware Cloud on AWS service. We are now offering the ability to any AWS user to provision bare metal instances. This doesn’t necessarily mean you can load your hypervisor of choice out of the box, but you can certainly do things you wouldn’t be able to do with a traditional EC2 instance (note: this was just a Saturday afternoon hack).

More seriously, a question I get often asked is whether users could install ESXi on i3.metal on their own. Today this cannot be done, but I’d be interested in hearing your use case for this.

The full container abstraction (for lack of a better term)

Now that we covered all the abstractions, it is time to go back and see if there are other optimizations we can provide for AWS customers. When we discussed the container abstraction, we called out that while there are two different fully managed containers control planes (ECS and EKS), there wasn’t a managed option for the data plane.

Some customers were (and still are) happy about being in full control of said instances. Others have been very vocal that they wanted to get out of the (undifferentiated heavy-lifting) business of managing the lifecycle of that piece of infrastructure.

Enter AWS Fargate, a production-grade service that provides compute capacity to AWS containers control planes. Practically speaking, Fargate is making the containers data plane fall into the “Provider space” responsibility. This means the compute unit exposed to the user is the container abstraction, while AWS will manage transparently the data plane abstractions underneath.

This is how the Fargate service appears:

Now ECS has two “launch types”: one called “EC2” (where your tasks get deployed on a customer-managed fleet of EC2 instances), and the other one called “Fargate” (where your tasks get deployed on an AWS-managed fleet of EC2 instances).

For EKS, the strategy will be very similar, but as of this writing it was not yet available. If you’re interested in some of the exploration being done to make this happen, this is a good read.

Conclusions

We covered the spectrum of abstraction levels available on AWS and how AWS customers can intercept them depending on their use cases and where they sit on their cloud maturity journey. Customers with a “lift & shift” approach may be more akin to consume services on the left-hand side of the slide, whereas customers with a more mature cloud native approach may be more interested in consuming services on the right-hand side of the slide.

In general, customers tend to use higher-level services to get out of the business of managing non-differentiating activities. For example, I recently talked to a customer interested in using Fargate. The trigger there was the fact that Fargate is ISO, PCI, SOC and HIPAA compliant, which was a huge time and money saver for them because it’s easier to point to an AWS document during an audit than having to architect and document for compliance the configuration of a DIY containers data plane.

As a recap, here’s our visual story with all the abstractions available:

I hope you found it useful. Any feedback is greatly appreciated.

AWS Architecture Blog