Breaking Down High Performance Computing Barriers with Amazon FSx for Lustre
By Jason Cutrer, CEO at Six Nines
Popular in the high performance computing (HPC) community and a range of industry verticals, Lustre is a powerful, open source, Linux-compatible, parallel file system.
However, operating a high-performance file system like Lustre requires in-house expertise that can handle the complicated provisioning, configuration, and tuning, along with ongoing support and maintenance.
With Amazon FSx file systems, you don’t have to worry about managing file servers and storage volumes, updating hardware, configuring software, or tuning performance. Amazon FSx automates these time-consuming administration tasks.
Amazon FSx for Lustre is a fully managed, scalable shared storage solution to power your compute workloads. It’s designed for workloads that demand high performance computing, such as artificial intelligence, video transcoding, financial modeling, genomics sequencing, and anything else that needs a lot of compute power and data storage.
Six Nines is an AWS Premier Consulting Partner with AWS Competencies in DevOps and Microsoft Workloads. We specialize in HPC and have an AWS Service Delivery validation in Amazon EC2 for Microsoft Windows Server.
Six Nines is deeply familiar with HPC storage pain points, and has helped customers address this very issue over the past five years. Before Amazon FSx for Lustre became available, our engineers would perform difficult and complex configurations using self-managed storage solutions.
In this post, I will describe Six Nines’ experience with FSx for Lustre, particularly the high availability, performance, and integration features of its latest releases.
High-Performance Storage Made Easy
When Six Nines first began using Amazon FSx for Lustre, we liked how easy it was to launch from the console, and were surprised at the sub-millisecond access to data. We saw data throughput reach speeds we never thought possible in the cloud.
Mostly, we liked that Amazon FSx for Lustre is a fully managed service; we just spun it up, used it, and spun it down. In our opinion, it was super easy, cost effective, and perfect for workloads that would run for a few hours or days. After running those workloads, we could spin it down and stop the charges immediately.
Figure 1 – Amazon FSx for Lustre is a fully managed service you can easily spin up and tear down.
Amazon FSx has made many enhancements since its initial launch. For instance, the recent capabilities with Amazon Simple Storage Service (Amazon S3) metadata management helped Six Nines better use Amazon S3 as our durable storage tier. It works hand-in-hand with Amazon FSx for Lustre, seamlessly exchanging metadata and POSIX permissions.
Below, I describe three categories of improvements:
- Integrations with other AWS services.
- Persistent file system for high availability.
- Higher performance.
Integrations with Other AWS Services
Amazon FSx for Lustre is still just as simple to set up and use as before, but under the hood, it can do much more. First, it has some great integrations with AWS services that are important to the HPC world, such as Amazon Elastic Kubernetes Service (Amazon EKS), AWS Batch, AWS ParallelCluster, and Amazon SageMaker.
The game changer isn’t one specific feature or capability, but continues to be the combination of capabilities. Six Nines has maintained AWS Well-Architected Partner status for years. To deliver the right solutions to our customers, we need the individual AWS services to continue to grow and mature in their features, but we also need them to integrate together effectively.
Persistent File System for High Availability
To further improve the Amazon FSx for Lustre service, AWS delivered a persistent file system option, together with upgrades for burst throughput to both their existing scratch storage tier and the new persistent storage tier. Those enhancements provide high availability so workloads can run longer with more confidence and uptime.
The new persistent tier is not just important for long-running workloads; it has great performance and cost savings as well. For instance, you can connect it to Amazon EKS as a persistent storage tier and, if you have a pod or set of containers go down for some reason, the storage on Amazon FSx for Lustre remains intact.
The same is true for Amazon EC2 Spot Auto Scaling groups. You can attach Amazon FSx for Lustre and incur almost no latency penalty compared with the local Amazon EC2 storage. And if an Amazon EC2 instance is lost for any reason, your customer’s data remains intact and ready to connect to the next instance.
Many of Six Nines’ customers require extremely heavy computations, and while the original scratch-only file system option suited their needs, their workloads evolved over time. Eventually, the scratch storage tier was not practical for workloads that needed to run for weeks, or even months.
When the Amazon FSx for Lustre team delivered the persistent tier earlier this year, those customers were ready for it. Some of them are now running file systems with hundreds of terabytes of storage and tens of gigabytes per second of throughput.
Figure 2 – Performance improvements with Amazon FSx for Lustre.
The persistent storage tier offers three performance and price options: 50, 100, or 200 MB per second per TB of storage. Like the scratch tier, the persistent tier scales linearly with capacity, so aggregate throughput grows quickly as you add storage. We now see burst rates that can reach 6X the normal sustained rates.
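To make the linear scaling concrete, here is a minimal Python sketch of the arithmetic. It uses only the figures quoted in this post (50, 100, and 200 MB per second per TB, and burst rates of up to 6X). The function names are hypothetical, and treating 6X as a simple multiplier is an assumption for illustration; actual burst behavior depends on factors not covered here.

```python
# Per-TB throughput options quoted in this post (MB/s per TB of storage).
PERSISTENT_OPTIONS_MB_S_PER_TB = (50, 100, 200)


def sustained_throughput_mb_s(capacity_tb: float, per_tb_mb_s: int) -> float:
    """Aggregate sustained throughput, assuming purely linear scaling."""
    if per_tb_mb_s not in PERSISTENT_OPTIONS_MB_S_PER_TB:
        raise ValueError(f"unknown throughput option: {per_tb_mb_s}")
    if capacity_tb <= 0:
        raise ValueError("capacity_tb must be positive")
    return capacity_tb * per_tb_mb_s


def burst_ceiling_mb_s(sustained_mb_s: float, multiplier: float = 6.0) -> float:
    """Upper bound on burst throughput.

    Assumption: the "up to 6X" figure is applied as a flat multiplier.
    """
    return sustained_mb_s * multiplier


if __name__ == "__main__":
    capacity_tb = 100  # hypothetical file system size
    for option in PERSISTENT_OPTIONS_MB_S_PER_TB:
        sustained = sustained_throughput_mb_s(capacity_tb, option)
        burst = burst_ceiling_mb_s(sustained)
        print(f"{option} MB/s/TB: sustained {sustained / 1000:.0f} GB/s, "
              f"burst ceiling {burst / 1000:.0f} GB/s")
```

Under these assumptions, a 100 TB file system on the 200 MB per second per TB option works out to 20 GB per second sustained, which is consistent with the tens of gigabytes per second our larger customers see.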
Through our early experiences, we found that Amazon FSx for Lustre made it easy to process cloud data sets at high performance throughputs. With the recent improvements in burst rates, we’re already seeing even greater performance across the bioinformatics, financial services, insurance, manufacturing, research and universities, aerospace, and oil and gas industries.
Summary of Benefits
With its recent enhancements, Amazon FSx for Lustre now provides these benefits:
- Eliminates storage bottlenecks – Amazon FSx for Lustre is high performing, parallel, optimized for data processing, and scalable to hundreds of gigabytes per second.
- Reduces overall storage cost – It integrates closely with Amazon S3, which can serve as the durable storage tier.
- Reduces complexity – As a fully managed service, it eliminates the need for provisioning and maintenance, configuration, and performance tuning.
- Easy to get started – Launch and run a file system within minutes.
- Cost-optimized – Accelerate time to discovery, reduce time and money spent on compute resources, and only pay for the resources you use.
- Native interface – POSIX-compliant so you can use your current Linux-based applications.
- Secure and compliant – Automatically encrypts your data-at-rest; PCI-DSS, ISO, and SOC compliant; and HIPAA eligible.
Amazon FSx for Lustre Use Cases
Overall, we’re seeing great benefits from Amazon FSx for Lustre for our HPC customers in manufacturing, the semiconductor industry, and the data science community for healthcare and bioinformatics. Following are just a few of the use cases that can benefit from Amazon FSx for Lustre.
Many applications within R&D and data science divisions require very high performance and low latencies of scale-out parallel file systems for their compute clusters or cluster farms. Amazon FSx for Lustre is well suited to these workloads, which process massive amounts of data that need to be accessed by multiple compute instances with high levels of throughput.
For HPC customers, it’s a hard requirement that fast storage be available. The improvements AWS has made to Amazon FSx for Lustre are ideal for HPC workloads because they provide a file system that’s optimized for performance. Because Amazon FSx for Lustre also integrates well with many other AWS cost saving services, it lowers total cost of ownership (TCO).
In bioinformatics, customers are using Amazon FSx for Lustre to replace traditional on-premises storage nodes, significantly increasing performance and backup efficiencies, and decreasing retention costs due to the Amazon S3 integrations.
In pharmaceutical research and genomics, data scientists are making incredible strides toward improving modern healthcare by combining big data and high-performance computing to advance the ability to predict and treat life-threatening illnesses, from cancer to Alzheimer’s disease.
Running workloads to identify these risks requires massive data sets and compute power to learn from common images while correlating conditions and clearly identified patient data. Amazon FSx for Lustre’s integration with Amazon S3 and close metadata management accelerates this process. It enables researchers to quickly tap into a data lake so they can form decisions, study data, and ultimately get to conclusions faster.
Amazon FSx for Lustre benefits AI workloads by allowing more users to run higher-performing workloads on demand, as opposed to waiting for a supercomputer or resource-constrained clusters on-premises.
We are seeing first-hand how the integration of AWS ParallelCluster and Amazon FSx for Lustre unlocks additional innovation by integrating compute farms and storage capacity.
Coupled with the ability of Amazon FSx for Lustre to seamlessly integrate with Amazon S3, this integration allows many industries—from ad tech to healthcare, manufacturing, and security—to flourish with product development and research at a pace that has not been possible until now.
No one wants to be saddled with managing an extremely critical component of their infrastructure. When we have discussions with our customers, we often ask them, “Why worry about undifferentiated heavy lifting when you can use Amazon FSx for Lustre and be up and running in a few minutes?”
With Amazon FSx for Lustre, what would have taken weeks to configure natively within AWS now takes hours to days. This is where Six Nines is working day in and day out, helping to accelerate and strategically position our customers for their cloud journey.
To learn how to get started with Amazon FSx for Lustre, contact Six Nines.
The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.
Six Nines – AWS Partner Spotlight
Six Nines is an AWS Premier Consulting Partner specializing in helping businesses move to the AWS Cloud responsibly. They help customers adopt AWS by focusing on reliability, scalability, operations, and cost savings.