Some of the best features of Anyscale Platform include the Unified Ray Runtime, which helps to refine and tune the Ray engine for all kinds of workload types, such as batch, streaming, training, and serving. The fully managed cluster is something that I personally love as it helps with automatic provisioning, auto-scaling from 1 to 1,000+ nodes, auto-retries, and job scheduling. Clusters help us get the job done on time, and they can start up to 5x faster compared to stock Ray according to Anyscale benchmark. Additionally, it supports multi-node back IDEs and notebooks. Observability, log access, metrics, and debugging controls are specifically tailored for Ray workflows, which Anyscale Platform provided us.
Personally, I feel the biggest impact has been provided by the fully managed clusters, which help with automatic provisioning, auto-scaling from nodes, auto-retries, and job scheduling. These features help serve our purpose for data collection and tuning with Ray Train.
Anyscale Platform has helped us reduce computing costs by 67%, which is our official figure. As we used to manage EC2 for batch clusters by leveraging spot instances and aggressive features, we have saved almost 60% in costs compared to manual management. It has also helped with reductions in training and data processing times. We used to have a cycle time of two weeks, which was reduced to almost half a week, representing more than 100% improvement in training times. We have seen significant productivity gains in our developers as they were able to run large experiments independently without waiting for infrastructure provisioning, which reduced the time to market and increased our total organizational throughput by 77%.
Anyscale Platform is best suited for organizations that are committed to Ray as their distributed compute layer and want a production-ready and managed platform that can help maximize Ray advantages while minimizing operational burden.