Amazon Lab126 Creates HPC Solution to Help Teams Speed Development and Innovation
2020
Some of today’s most popular consumer technology devices were born at Amazon Lab126. The California-based research and development organization has created such high-profile devices as the Amazon Kindle e-reader and the Amazon Echo smart speaker.
Amazon Devices teams at Amazon Lab126 use high-performance computing (HPC) capacity and machine learning to scale their design environments, accelerate product development, gain efficiencies, and speed time to market. However, the lab's aging, costly on-premises HPC environment could not deliver the scalability and ease of use the teams required. “We run large simulations with long runtimes, such as looking at mechanical and thermal responses of consumer devices under certain conditions,” says Shankar Ganapathysubramanian, senior manager of the architecture team at Amazon Lab126. “We needed more compute capacity to support these workloads.” Amit Gaikwad, senior manager of wireless engineering for Amazon Lab126, adds, “We were architecting and building more customer-facing solutions, and the on-premises HPC environment didn’t give us the scalability and speed we needed.”
Amazon design and engineering teams perform simulation and modeling on a range of applications such as computational fluid dynamics, finite element analysis, electronic design automation, and computational electromagnetics. Self-service capability was an important requirement to support these diverse teams. Mickael Crozes, senior system/software developer engineer for Amazon Lab126, says, “Different teams have different compute capacity needs, and we lacked the flexibility to accommodate them all. We wanted to centralize HPC resources so each team could access their own environment on demand. We didn’t have the ability to launch new HPC clusters for each team when they needed them.”
Building a Scalable HPC Framework on AWS
To address its internal customer needs, the Amazon Lab126 team chose to create a new cloud HPC environment on Amazon Web Services (AWS) in late 2017. “We evaluated third-party HPC services, but AWS ultimately offered the best technology in terms of scalability and flexibility of compute instance types,” says Crozes. “We also trusted AWS to own our compute and host our data.”
In 2018, Amazon Lab126 built a flexible HPC reference framework on AWS that replaces its on-premises HPC solution and provides a multi-user R&D environment for scale-out workloads such as HPC and machine learning. The framework combines compute-intensive Amazon Elastic Compute Cloud (Amazon EC2) instances with a fast network backbone, virtually unlimited storage, and budget and cost management, and it simplifies how teams use these resources. It relies on Amazon Elastic Block Store (Amazon EBS) and Amazon Elastic File System (Amazon EFS) for data storage. Amazon Lab126 also uses Amazon FSx for Lustre for its most I/O-intensive workloads and AWS Backup to make the cluster more fault-resilient. Crozes says, “AWS Backup was the perfect solution for automating the protection of the production environment. It would have taken us many iterations to create a solution like that, which protects all the teams’ data, manages retention/lifecycle, and is simple to use.”
Running HPC Jobs Three Times Faster
Lab126 product designers and engineers have seen performance gains on the new HPC cluster. For example, the wireless device connectivity team improved cycle times for structural device drop simulations that study how cell phones behave when they hit the ground or another surface. “We saw a threefold increase in speed for our entire design cycle by using a scale-out computing HPC framework on AWS,” says Ganapathysubramanian. “We can run more simulations now because it’s easier to parallelize the workloads. Using the on-premises HPC solution, it would often take weeks to generate data. Now we can do it in hours.”
With the new framework on AWS, Amazon Devices designers and engineers can scale on demand to meet the requirements of specific workloads. “We have very large runtimes that require a lot of compute just to analyze wireless connectivity data,” says Gaikwad. “Using this solution, our engineers across the globe can scale the solution three times faster than before. And they can scale down just as easily, so if they don’t need 100 GPUs for a job, they don’t have to use them.”
Simplifying the Onboarding Process
The Amazon Lab126 Design Technologies team can also onboard and support new Amazon Devices engineering teams on the HPC cluster without help from IT, and it can do so in less than a day instead of the weeks onboarding used to take. “We now have a centralized, flexible HPC environment that works seamlessly for all users, regardless of their workload requirements,” says Crozes. “This has greatly reduced the complexity of the onboarding process. Many people here are not HPC experts, so this ease of use helps them focus on their specific design expertise.”
Amazon Devices teams can now perform the full computer-aided engineering workflow (model design and meshing, simulation, and post-processing visualization) on AWS. This is possible because engineers and designers, even when working from home, now have their own dedicated supercomputers and powerful cloud-based workstations a click away.
Driving Product Innovation
Because of the scalability and simplicity of the AWS-based HPC environment, Amazon Devices teams are spending less time on hardware management and more time on innovation. “With HPC on AWS, we can now support more devices, explore new technologies, and better understand how devices behave in the field,” says Gaikwad. For instance, the Amazon Devices wireless connectivity team recently won a DesignCon Best Paper Award for its research into optimizing wireless systems with minimal radio frequency interference.
Ganapathysubramanian says, “A lot of work is required before we can even do simulations, such as building models out of geometric calculations. Using the automation of the scale-out computing framework on AWS, we have reduced the complexity of some of this manual work so our engineers can focus on more value-added work. HPC on AWS is helping us imagine new opportunities. For example, in some of the newer Amazon Echo products, we have been able to integrate thermal design structures by more easily connecting different data stored on AWS, to optimize the design of multiple product functions.”
Amazon Lab126 is now entering the next phase of its HPC solution, powered by the scale-out computing framework on AWS. “We will continue to address the needs of our customers,” says Jake Boswell, senior manager of design technology for Amazon Lab126. “We’re looking to make the reference architecture even simpler and to extend the framework into additional areas to support innovation.”
To learn more, visit aws.amazon.com/solutions/implementations/scale-out-computing-on-aws and aws.amazon.com/hpc.
About Amazon Lab126
Amazon Lab126, based in Sunnyvale, California, is an Amazon research and development team that designs and engineers consumer electronic devices for Amazon. The lab, which includes Amazon Devices hardware, software, and operations teams, has developed high-profile products such as the Amazon Echo and Amazon Kindle.
Benefits of AWS
- Runs HPC jobs and scales workloads 3 times faster
- Onboards new users in less than a day instead of weeks
- Launches new HPC clusters for each team on demand
- Drives product design innovation
AWS Services Used
Amazon EC2
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.
Amazon Elastic File System
Amazon Elastic File System (Amazon EFS) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources.
Amazon Elastic Block Store
Amazon Elastic Block Store (Amazon EBS) is an easy-to-use, high-performance block storage service designed for use with Amazon Elastic Compute Cloud (Amazon EC2) for both throughput- and transaction-intensive workloads at any scale.
AWS Backup
AWS Backup is a fully managed backup service that makes it easy to centralize and automate the backup of data across AWS services.