AWS Startups Blog
Accelerating AI/ML scaling and AI development with Anyscale and AWS
Building a cloud-distributed and scalable artificial intelligence (AI) application is a cross-team effort that requires complicated management of resources and comes with numerous production concerns such as code changes, refactoring, setting up the infrastructure, and complex developer operations (DevOps). These can confuse the development process, slow down time-to-market, and keep developers from focusing on product innovation.
“With the lack of established development-to-production pathways, some estimates place the AI development project failure rate at nearly 50%,” explains Rob Ferguson, Amazon Web Services (AWS) global head of AI and machine learning (ML) for startups.
That’s where Anyscale, Ray (the first unified, distributed compute framework for scaling ML or Python workloads), and the Anyscale Platform (a fully-managed Ray platform provided by the creators of Ray) help AI teams speed development and experimentation, and effortlessly scale applications to ensure the success of their AI projects.
Ray was conceived at the University of California Berkeley in the RISELab and open-sourced in 2017. Today, Ray is widely adopted by organizations globally for developing AI and Python applications at scale.
Improve scalability and developer ergonomics with the Anyscale Platform
To advance AI development and scaling for every organization, the Anyscale Platform extends the capabilities of Ray by enabling developers and cross-functional teams to accelerate experimentation and speed up the development of ML applications at scale. The Anyscale Platform and Ray increase developer velocity by providing scalable compute for data ingestion and preprocessing, ML training, hyperparameter tuning, model serving, and more, all while integrating seamlessly with the rest of the ML ecosystem.
The Anyscale Platform significantly extends the capabilities of Ray. Anyscale Workspaces provides developers with a unified and seamless development experience to scale ML workloads from a laptop to the cloud with no code changes. Developers can now leverage a single environment to build, test, and deploy workloads to production while leveraging the tools they are familiar with. The Anyscale Platform improves iteration speed by reducing cluster setup time by 5X over Ray. Finally, the Anyscale Platform provides observability, monitoring, and job scheduling out of the box.
“The capabilities of the Anyscale Platform go far beyond the capabilities of Ray and make AI/ML and Python workload development, experimentation, and scaling even easier,” said Ion Stoica, co-founder and executive chairman of Anyscale. “Thousands of organizations rely on Ray to scale their ML and Python applications, and a growing number of organizations are leveraging the Anyscale Platform to build, test, and deploy these applications to production faster than ever.”
From AI developers, to ML practitioners, to data scientists and engineers; all contributors can benefit from the ease, scalability, and developer ergonomics that the Anyscale Platform provides for the full AI journey: from model development, tuning, and training to inference and scalable model serving.
Anyscale’s mission is to ease the building and scaling of AI/ML and Python workloads, as well as AI applications. Their team democratizes AI by making it easier for data scientists and ML engineers to work with project structures that make sense for building production ML systems. Scalable, unified, and open distributed computing, such as what Anyscale offers for Ray, helps lead to the mass democratization and industrialization of AI.
Anyscale’s commitment to open source and the freedom of shared technology is evident: The Anyscale team leads the Ray open source community by managing operations, reviewing contributions, and growing and developing the community. Additionally, they are the biggest contributor to Ray open-source software (OSS) and recently released Ray 2.1.
Building mature, reliable, and easily scalable ML models
With 90% of new applications expected to be cloud-native by 2025, Anyscale envisions the cloud as the default substrate in which the next generation of applications will be developed. They expect the next generation of cloud infrastructure to be optimized frameworks which ease the development and management of applications.
Anyscale leverages AWS’ mature suite of products as components for rapid development of their platform. Anyscale and Ray are full-featured on AWS, with highly efficient performance and scaling. In addition to AWS’ scale being a tremendous asset to Anyscale and Ray, some of their largest customers have a long history of using AWS and this facilitates their customer interactions.
“Inference is 90% of any ML project. Machine learning teams need novel ways of working with distributed computing to deliver the next generation of ML at scale,” explains Rob. “The golden era of machine learning is taking place and AWS and Anyscale help companies to build ML models the way they’re supposed to be: mature, reliable, and easily scalable.”
Amazon.com and AWS have worked with the Ray community to integrate Ray with many AWS services, including:
- AWS Identity and Access Management (IAM) for fine-grained access control
- AWS Certificate Manager (ACM) for SSL/TLS in-transit encryption
- AWS Key Management Service (AWS KMS) for at-rest encryption
- Amazon Simple Storage Service (Amazon S3) for object storage
- Amazon Elastic File System (Amazon EFS) for distributed-file access
- Amazon CloudWatch for observability
These service integrations allow AWS customers to scale their Ray-based workloads with secure, cost-efficient, and enterprise-ready AWS services across the AI and ML pipeline.
Solutions that fit within the AWS Well-Architected Machine Learning Lifecycle–such as the Anyscale Platform for the deployment stage—can help companies to spend more time on product development rather than resource provisioning and management.
The AWS Well-Architected Framework, which provides operational and architectural best practices for designing and operating workloads in the cloud, includes pillars for operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability. With the Anyscale platform, developers can increase their performance efficiency.
“Anyscale simplifies a typical Python ML code and helps it fit within the framework within as little as two lines of code,” explains Rob. “AWS is an incredible home for Anyscale with its industry-leading tools for production systems as well as broadest and deepest compute options.”