AWS Open Source Blog

Sustainability with Rust

Rust is a programming language implemented as a set of open source projects. It combines the performance and resource efficiency of systems programming languages like C with the memory safety of languages like Java. Rust started in 2006 as a personal project of Graydon Hoare before becoming a research project at Mozilla in 2010. Rust 1.0 launched in 2015, and in 2020, support for Rust moved from Mozilla to the Rust Foundation, a non-profit organization created as a partnership between Amazon Web Services, Inc. (AWS), Google, Huawei, Microsoft, and Mozilla. The Foundation’s mission is to support the growth and innovation of Rust, and its membership has grown from the five founding companies to 27 in the Foundation’s first year.

At AWS, Rust has quickly become critical to building infrastructure at scale. Firecracker is an open source virtualization technology that powers AWS Lambda and other serverless offerings. It launched publicly in 2018 as our first notable product implemented in Rust. We use Rust to deliver services such as Amazon Simple Storage Service (Amazon S3), Amazon Elastic Compute Cloud (Amazon EC2), Amazon CloudFront, and more. In 2020, we launched Bottlerocket, a Linux-based container operating system written in Rust, and our Amazon EC2 team uses Rust as the language of choice for new AWS Nitro System components, including sensitive applications, such as Nitro Enclaves.

At AWS, we believe leaders create more than they consume and always leave things better than they found them. In 2019, AWS was proud to become a sponsor of the Rust project. In 2020, we started hiring Rust maintainers and contributors, and we partnered with Google, Huawei, Microsoft, and Mozilla to create the Rust Foundation with a mission to support Rust. AWS is investing in the sustainability of Rust, a language we believe should be used to build sustainable and secure solutions.

Energy Efficiency in the Cloud

A chart showing energy efficiency in the cloud, comparing Traditional, Cloud (non hyperscale), and Hyperscale data centers.

Source: IEA (2021), Global data centre energy demand by data centre type, 2010-2022, https://www.iea.org/data-and-statistics/charts/global-data-centre-energy-demand-by-data-centre-type-2010-2022. All rights reserved.

Worldwide, data centers consume about 200 terawatt hours per year. That’s roughly 1% of all energy consumed on our planet. There are a couple of really interesting things about the details of that energy use. If you look at the graph of energy consumption, the top line is basically flat going back as far as 2010. That’s incredibly counter-intuitive given the tremendous growth of big data, machine learning, and edge devices our industry has experienced over that same period of time.

The second interesting detail is that while the top line of the graph is flat, inside the graph, the distribution over traditional, cloud, and hyperscale data centers has changed dramatically in the same period. Those cloud and hyperscale data centers have been implementing huge energy efficiency improvements, and the migration to that cloud infrastructure has been keeping the total energy use of data centers in balance despite massive growth in storage and compute for more than a decade.

There have been too many data center efficiency improvements to list, but here are a few examples. In compute, we’ve made efficiency improvements in hardware and implemented smarter utilization of resources to reduce idle time. We’ve slowed the growth of our server fleet with support for multi-instance and multi-tenant architectures, and we’ve improved drive density and efficiency for storage. We’ve also adopted more energy efficient building materials and cooling systems.

As incredible as that success story is, there are two questions it raises. First, is the status quo good enough? Is keeping data center energy use to 1% of worldwide energy consumption adequate? The second question is whether innovations in energy efficiency will continue to keep pace with growth in storage and compute in the future. Given the explosion we know is coming in autonomous drones, delivery robots, and vehicles, and the incredible amount of data consumption, processing, and machine learning training and inference required to support those technologies, it seems unlikely that energy efficiency innovations will be able to keep pace with demand.

Image showing how Customers are responsible for sustainability in the cloud and how AWS is responsible for sustainability of the cloud

The energy efficiency improvements we’ve talked about so far have been the responsibility of AWS, but just like security, sustainability is a shared responsibility. AWS customers are responsible for energy efficient choices in storage policies, software design, and compute utilization, while AWS owns efficiencies in hardware, utilization features, and cooling systems. We are also making huge investments in renewable energy.

AWS is on a path to have 100% of our data centers powered with renewable energy by 2025, but even renewables have an environmental impact. It will take about half a million acres of solar panels to generate the 200 terawatt hours of energy used by data centers today. The mining, manufacturing, and management of that many solar panels has substantial environmental impact. So, while we’re really proud of our success with renewable energy, as Peter DeSantis, SVP, AWS said at re:Invent 2020, “The greenest energy is the energy we don’t use.”

Renewables should not replace energy efficiency as a design principle. In the same way that operational excellence, security, and reliability have been principles of traditional software design, sustainability must be a principle in modern software design. That’s why AWS announced a sixth pillar for sustainability to the AWS Well-Architected Framework.

What that looks like in practice is choices like relaxing SLAs for non-critical functions and prioritizing resource use efficiency. We can take advantage of virtualization and allow for longer device upgrade cycles. We can leverage caching and longer TTLs whenever possible. We can classify our data and implement automated lifecycle policies that delete data as soon as possible. When we choose algorithms for cryptography and compression, we can include efficiency in our decision criteria. Last, but not least, we can choose to implement our software in energy efficient programming languages.

Energy Efficient Programming Languages

There was a really interesting study a few years ago that looked at the correlation between energy consumption, performance, and memory use. This is a really common conversation in sustainability. Given how little visibility we have into the energy or carbon use of our services, is there a metric that can serve as a proxy? Can I look at my existing service dashboards with infrastructure costs, performance, memory, etc., and use the trends I see to infer something about the trends in my service’s energy consumption?

What the study did is implement 10 benchmark problems in 27 different programming languages and measure execution time, energy consumption, and peak memory use. C and Rust significantly outperformed other languages in energy efficiency. In fact, they were roughly 50% more efficient than Java and 98% more efficient than Python.

Table showing energy efficiency scores for various programming languages. C and Rust are circled in green at the top of the list as the most energy efficient. Java is circled in red, it is in 5th place. Python is circled in red, it is in 26th place.

It’s not a surprise that C and Rust are more efficient than other languages. What is shocking is the magnitude of the difference. Broad adoption of C and Rust could reduce energy consumption of compute by 50% – even with a conservative estimate.

So the question is: why not use more C? The language and developer tools are extremely mature, and the size of the developer community is much larger than Rust’s. During his keynote at Open Source Summit in 2021, Linus Torvalds, the creator of Linux, acknowledged that implementing code in C can be like juggling chainsaws. As a lifelong C programmer, Torvalds knows that, “[C’s subtle type interactions] are not always logical [and] are pitfalls for pretty much anybody.”

Torvalds called Rust the first language he’s seen that might actually be a solution. Rust delivers the energy efficiency of C without the risk of undefined behavior. We can cut energy use in half without losing the benefits of memory safety.

Several analyses have concluded that more than 70% of the high severity CVEs that occur in C/C++ would be prevented by implementing those same solutions in Rust. In fact, the Internet Security Research Group (ISRG), the nonprofit that supports the Let’s Encrypt project, the Certificate Authority for 260 million websites, has a goal to move all internet security sensitive infrastructure to memory safe languages. The projects underway include support for Rust in the Linux kernel and migrating curl to Rust implementations of TLS and HTTP.

Table showing Time efficiency for various programming languages. C and Rust are circled in green at the top of the list as the most efficient.

Looking again at that study on correlations, we have measurements for more than just energy consumption. The middle column shows the results for execution time, and the times for Rust and C are really similar; both languages execute faster than the others. That means, when you choose to implement your software in Rust for the sustainability and security benefits, you also get the optimized performance of C.

Rust Customer Success Stories

Bar chart showing Javascript compared to Rust in terms of Latency per datagram. In both Median Latency and P95 Latency, Javascript has significantly higher latency than Rust.

https://medium.com/tenable-techblog/optimizing-700-cpus-away-with-rust-dc7a000dbdb2

Tenable is a cybersecurity solutions provider focused on exposure visibility tools, and they had a sidecar agent that filtered out unnecessary metrics. It was written in JavaScript and had been working in production for a few months when performance started to degrade due to scaling. Tenable decided to rewrite the filter in a more efficient language, and they chose Rust for its performance and safety. The result was about a 50% improvement in latency at both the median and the P95.

Line graph showing CPU Usage, Cores vs Time, the line starts out flat at around 1000 Cores at 0 time, then at roughly the 14:00 mark, it trends downward and stabilizes at slightly above 300 Cores.
Line graph showing Memory Usage, GB vs Time, the line starts out relatively flat around 650 GB, then at roughly the 14:00 mark, it trends downward and stabilizes at approximately 100 GB

50% performance improvements are great, but here are some other graphs from that migration. Tenable also saw a 75% reduction in CPU usage and a 95% reduction in memory usage. That is substantial savings, and that’s not just dollars saved – that’s energy saved. These are the graphs of an energy efficient, sustainable implementation.

Rust is being used today to ship real world production software, but developers aren’t choosing Rust to reduce carbon emissions. When we ask Rust developers why they started using Rust, by far the most common answer is some variant of runtime performance, whether it is because Rust is faster or because Rust has more reliable tail latencies. It’s almost always about performance.

Four charts, the first showing System CPU usage over time. The line shows regular spikes from approximately 20% usage to 40% usage on a roughly 2 minute interval. The second chart shows Average Response time in milliseconds, with a baseline just above 0, but with spikes up to 10 ms on a roughly 2 minute interval. The third chart shows Response Time in the 95th percentile and it has a similar line to the second chart except that the spikes go up to 200 and 300 ms. The final chart shows Max @mentions per second over time, and is extremely erratic but averages out at about 2.

https://discord.com/blog/why-discord-is-switching-from-go-to-rust

Discord started as a mostly Python, Go, and Elixir shop, and they had a problem with one of their key Go services. It was a pretty simple service, but it had slow tail latencies. Because Go is a garbage-collected (GC) language, as objects are created and released, every so often, the garbage collector needs to stop execution of the program and run a garbage collection pass. While the GC is running, the process is unable to respond to requests, and you can see the spikes on the CPU and response time graphs when it’s running.

Go Rust

Four charts, the first two showing results with Go and the second two showing results with Rust. Chart one shows average response times in milliseconds (it is identical to the second chart in the previous image), with a baseline just above 0, but with spikes up to 10 ms on a roughly 2 minute interval. The second Go chart shows Max @mentions per second over time (it is identical to the fourth chart in the previous image), and is extremely erratic but averages out at about 2. The same metrics are graphed in two additional charts for Rust, with the key difference that Rust's charts are shown in units of nanoseconds and microseconds respectively. This illustrates how significant the improvement in these areas is for Rust.

To fix the issue, Discord decided to try rewriting the service in Rust, and these are the results. The Go implementation is on the left and the Rust implementation is on the right. While the GC spike pattern is gone on the Rust graph, the really amazing difference is the magnitude of the change. The Go and Rust graphs are actually using different units.

The Rust version is more than 10 times faster overall, with the worst tail latencies reduced 100 times. These are incredible improvements, and because the server is able to respond to requests far more efficiently, fewer servers are needed, which means that less energy is used. While Discord didn’t decide to start using Rust to reduce energy consumption, that’s the impact.

Again, Rust isn’t the first efficient language. C has been around for a long time, but Rust is the first mainstream programming language that is efficient without sacrificing safety. 70% of all high severity security vulnerabilities in C and C++ code are due to memory unsafety, and Rust gives you efficiency without feeling like you’re playing with fire.

Revealing the Rust Secret Sauce

Most languages achieve memory safety by automatically managing memory at runtime with a garbage collector. Garbage collectors track outstanding references to a piece of memory and when all references go out of scope, the associated memory can be freed.

Instead of using a garbage collector to maintain safety, Rust uses ownership and borrow checking. Ownership is fairly simple but has deep implications for the rest of the Rust programming language. In Rust, all memory is owned by a single variable. That variable is called its owner. There can be only one owner at a time, but ownership of the data can be passed around.

Go code showing an example of message passing

First, here is an example of message passing in Go. On the left side, we create a gift, then send it via the channel. On some other goroutine on the right side, the gift is received and opened. Go’s garbage collector manages the memory for us. However, in the code on the left side, we accidentally opened the gift after sending it into the channel. The gift is going to be opened twice, resulting in a bug.

Additional Go code showing a bug.

Here is the same message passing example with Rust. The gift is created and assigned. We say that the `gift` variable owns the data. Ownership of the gift is passed into the channel. The channel consumer receives the gift, taking ownership, and is able to open it. If we try to open the gift after sending it into the channel, the compiler will shout at us, because we are violating the ownership rules. Already, we are seeing how Rust helps us prevent bugs.

Rust code showing the equivalent code to the previous Go example code.

Because Rust enforces the rule that only one variable owns data, when that variable goes out of scope without passing off ownership, there is no possible way for the data to be accessed. Rust takes advantage of that and will automatically free the memory at that point. There is no need to manually free the memory.

Rust’s ownership model is part of the type system and based on a concept called affine types. An affine type imposes a rule that every variable is used at most once. The key is to define what “used” means. In the context of Rust, a use is either moving the data or dropping it. By using affine types, the Rust compiler is able to reason about a program and enforce its ownership rules.

The affine type system used by Rust is based on work done in the early 1990s, when researchers attempted to design a garbage-collector-free Lisp. While successful, they found that they lost a lot of runtime performance due to the excessive copying introduced by not being able to hold multiple references to the same piece of data.

An iteration on the previous Rust example code which introduces a function.

And this gets us to the second innovation that has enabled Rust: the borrow checker. When writing larger programs we tend to use abstractions to help organize ideas. One abstraction that you’re probably familiar with is a function. Functions often require arguments. With only ownership, to call a function, we would need to pass ownership of the data into the function, and the function would need to pass ownership of the data back when returning. This requires copying memory around and was the source of the garbage-collector-free Lisp’s performance challenges.

Another iteration on the Rust last example code showing the use of a reference.

To solve this, Rust lets you borrow data. So, if we have a gift, we own it. It is ours. If our friend wants to admire it, she can borrow it for a moment, but then she has to give it back to us. Also, while our friend is borrowing the gift, we cannot hand off ownership of the gift to anyone else, because it is currently being borrowed. Most crucially, the Rust compiler enforces these rules, so our friend can’t just run off with the gift. And because the Rust compiler enforces that guarantee, when borrowing data, memory doesn’t have to be copied. The memory stays where it is, and a pointer is passed around. The pointer is guaranteed to be valid. When you put it all together, you have a system that is efficient and prevents bugs, even as the program gets larger and more complex.

And the same system that prevents memory unsafety can also prevent data races, a category of concurrency bug. A data race happens when two or more threads are concurrently accessing the same data and at least one of those accesses is a mutation. The type system that models ownership and borrowing is able to uphold the same guarantee across multiple threads, enabling more aggressive use of concurrency.

Synchronous Concurrent

Two examples of similar Rust code, one showing a synchronous model (on the left), the other showing a concurrent model (on the right)

Here is an example of how easy it can be to safely add concurrency to a Rust application. We have a function that iterates through an array of numbers and sums all the even numbers. This is a highly parallelizable operation, and for very large arrays, we could see the function getting significantly faster by adding concurrency.

The left side shows a single threaded version and the right side shows you the parallel version using the rayon library. And look at how similar the functions are. You get all the power of concurrency, without the hazards, by basically writing the same code. The only difference is that we use the par_iter() method instead of iter().

The parallel version will spread the computation across many threads, all while avoiding copying the array of numbers being passed as the argument. Rayon is able to provide this API safely thanks to Rust’s ownership and borrow checking system. All the checks to guarantee safety happen at compile time.

Getting Started with Rust

Hopefully, by now we have gotten you interested in Rust and in starting your journey toward sustainability in the cloud. So where do you start? The good news is that all the content you need is available online, and there are places you can go to get started.

First, you will need to learn the Rust programming language. The Rust book is an excellent resource to get started learning the language. It will help you get the Rust toolchain installed and teach you the language. The website also has exercises and lots of code examples to read. If you get stuck at any point, have questions, or need clarification, you can post on the user forum or talk directly on the community Discord server. The Discord server is usually the fastest way to get help. There are always people active there who can answer questions in real time.

Once you have gone through the Rust website, you should be comfortable enough to start building things, but there is another resource we want to call out for diving deeper. Crust of Rust is a great YouTube series by Jon Gjengset. He does really deep dives on various Rust-related topics, popping the hood and explaining how things work. His videos are multiple hours long, but we keep hearing from people how valuable they are for learning Rust.

The Future of Rust

A bar chart titled How would you rate your expertise in Rust. The highest bar is at a 7, with almost 1200 responses, while Beginner (or 1) is at almost 250 responses, and Expert (or 10) is below 100 responses. The distribution is roughly shaped like a Bell curve.

https://blog.rust-lang.org/2020/12/16/rust-survey-2020.html

Rust is challenging to learn. Of the more than 8,000 developers responding to the 2020 Rust user survey, only about 100 identified as “expert”, and of the respondents that said they were no longer using Rust, 55% cited learning or productivity as their reason for abandoning the language.

It takes experienced engineers 3-6 months of study, supported by access to subject matter experts, to become productive with Rust. Some engineers have likened learning Rust to learning to eat your vegetables, and while many of them love it once they are productive, a lot of engineers are deciding against learning it or abandoning the effort before they become productive. The potential impact of Rust on sustainability and security will only materialize if we turn the broccoli into a brownie.

A screenshot showing a visualization of the size of programming languages communities in Q3 2021. Rust is the second from the last in the list, showing 1.1M active developers globally.

SlashData, State of the Developer Nation Q3 2021

No one developer, service, or corporation can deliver substantial impact on sustainability. Adoption of Rust is like recycling; it only has impact if we all participate. To achieve broad adoption, we are going to have to grow the developer community.

The Rust developer community has been the fastest growing over the last two years, but based on historical trends, we know that of the half million developers that joined the Rust community in the last 12 months, most of them are not yet proficient with the language. We have some work to do on the Rust developer experience.

The question that raises is: which developer experience? Engineers working on the Linux kernel have a very different ideal developer experience than an engineer building a database service or an engineer delivering a retail website. We can identify Rust user personas by looking at three dimensions.

The first distinction is their reason for coming to Rust. Are they choosing Rust for performance? For security? For sustainability? The second distinction is domain. Are they working in an embedded environment with restricted resources? Are they working in machine learning with long-running jobs that process huge amounts of data in incremental computations? The third distinction is the developer’s background. Are they a systems programmer, or have they only worked with dynamically typed languages?

We need to evolve those permutations of priority, domain, and developer experience into personas that allow us to develop a robust understanding, a common vocabulary, and an explicit set of engineering trade offs. We usually give these personas names, so let’s consider an example we’ll call “Bob”.

Bob is building a cryptographic solution, and he’s choosing Rust for the security properties. Bob has a distinct set of engineering trade offs. Bob prioritizes security over performance; he prioritizes security over operations. What that means in practice is that Bob would rather have a slow response than plain text, and he would rather have an outage than respond to an unsigned request.

For each of these personas, there are unique engineering trade offs, and what we want to do is to create a space in the Rust landscape that’s well-defined and easily discoverable and empowers all the Bobs to collaborate on building the best, whole developer experience for themselves without negatively impacting other personas.

Rust is an amazing technology to sustain and secure our industry, and you can start doing that today. We have a lot of work to do before everyone can use Rust, and the Rust Foundation is working to create platforms for effective, cross industry collaboration on that work. We hope you’ll join us.

Shane Miller


Shane Miller leads the AWS Rust team, and she is chair of the Rust Foundation. Shane started working in software development as an engineer nearly 30 years ago. Since then, she’s held the roles of principal engineer, university faculty, business owner, principal technical program manager, and senior engineering manager.

Carl Lerche


Carl Lerche has been involved in open source for the past fifteen years. He first got involved with Ruby on Rails and co-authored the Bundler package manager for Ruby. Carl was an early contributor to the Rust programming language ecosystem and authored much of its asynchronous I/O infrastructure, including Tokio. Carl is currently a Principal Engineer at AWS, where he leads development of the Tokio stack.