Anyscale Platform, Powered by Ray
Anyscale
External reviews
External reviews are not included in the AWS star rating for the product.
Streamlined data pipelines have cut training cycles and now empower rapid experimentation
What is our primary use case?
We use Anyscale Platform for large-scale data processing, such as ETL, feature engineering, and embedding generation, which lets us outgrow the limitations of single-node Spark or Pandas. It also supports training and fine-tuning of LLMs across many GPUs: with Ray Train, our fine-tuning cycles went from weeks on one GPU to days on 8+ GPUs. In addition, Ray Serve gives us high-throughput, low-latency model serving for a RAG agent pipeline behind our online interface at 10,000 QPS. Anyscale Platform also integrates with Kubernetes, so we can offload orchestration and focus on model and application logic.
What is most valuable?
Some of the best features of Anyscale Platform include the Unified Ray Runtime, which refines and tunes the Ray engine for all kinds of workloads: batch, streaming, training, and serving. I personally love the fully managed clusters, which handle automatic provisioning, auto-scaling from 1 to 1,000+ nodes, auto-retries, and job scheduling. The clusters help us get jobs done on time, and according to Anyscale's benchmarks they can start up to 5x faster than stock Ray. The platform also supports IDEs and notebooks backed by multi-node clusters, and it gives us observability, log access, metrics, and debugging controls tailored specifically to Ray workflows.
Personally, I feel the biggest impact has come from the fully managed clusters, with their automatic provisioning, auto-scaling, auto-retries, and job scheduling. These features serve our needs for data collection and tuning with Ray Train.
Anyscale Platform has helped us reduce computing costs by 67%, which is our official figure. By leveraging spot instances and aggressive cost-saving features for batch clusters, we have saved almost 60% compared to when we managed EC2 manually. It has also reduced our training and data processing times. Our cycle time used to be two weeks and is now almost half a week, which is a reduction of about 75%, or roughly four times faster. We have also seen significant productivity gains among our developers, who can now run large experiments independently without waiting for infrastructure provisioning. This has reduced our time to market and increased our total organizational throughput by 77%.
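For anyone who wants to check the cycle-time figures, the arithmetic can be sketched as below. This is only an illustration: it assumes seven-day weeks and treats "almost half a week" as exactly 3.5 days, and the function names are made up for this example rather than taken from any Anyscale or Ray API.

```python
def percent_reduction(before: float, after: float) -> float:
    """Share of the original value that was eliminated, in percent."""
    return (before - after) / before * 100.0


def speedup(before: float, after: float) -> float:
    """How many times faster the new cycle is than the old one."""
    return before / after


# Assumed values: two weeks before, about half a week after (7-day weeks).
cycle_before_days = 14.0
cycle_after_days = 3.5

reduction = percent_reduction(cycle_before_days, cycle_after_days)
factor = speedup(cycle_before_days, cycle_after_days)

print(f"Cycle time reduced by {reduction:.0f}%")
print(f"Training cycles are {factor:.0f}x faster")
```

A reduction in time can never exceed 100%, so the percentage and the speedup factor are best quoted separately.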
Anyscale Platform is best suited for organizations that are committed to Ray as their distributed compute layer and want a production-ready and managed platform that can help maximize Ray advantages while minimizing operational burden.
What needs improvement?
Anyscale Platform could integrate AI better and add more workflows in the future.
For how long have I used the solution?
Our organization has been using Anyscale Platform for almost three years now.
What do I think about the stability of the solution?
Anyscale Platform is definitely a stable solution. We have not faced any significant downtime, lags, glitches, or bugs, even during updates.
What do I think about the scalability of the solution?
The scalability is really good. Anyscale Platform comes with a pay-as-you-go model, so we can scale as much as we want. We have run it at a very large scale and did not face any particular problems while scaling.
How are customer service and support?
Customer support has been really good. We last contacted the team almost six months ago about an issue with a deployment for one of our new team structures. They were very prompt and resolved it within 16 minutes. In my experience, customer support deserves a 10 out of 10.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We did not use any solution for this purpose previously, so Anyscale Platform is the first one we have used.
How was the initial setup?
Deploying Anyscale Platform in our environment was definitely very easy. The support from the team during deployment was commendable and made the process smooth.
What about the implementation team?
The configuration is very easy, and we were able to handle it ourselves because the learning curve was not steep. We did have to monitor GPU-heavy systems carefully to avoid overspending. Dependency and environment management for complex Python stacks, which Anyscale Platform mitigates, still required good DevOps practices, and this was handled really well.
The procurement was really easy and we did not face any challenges. The teams involved were very positive about it, the outcomes were good, and no issues were reported at all.
What was our ROI?
We have seen a return on investment: compute costs have gone down, and time to market has been reduced by at least 50 to 60%. These are the relevant metrics showing our ROI.
What's my experience with pricing, setup cost, and licensing?
Pricing, setup cost, and licensing have been very transparent for us. We are on a custom-priced, upper-tier plan, which adds governance support and SLAs with dedicated resources. We can scale up anytime because it is based on the pay-as-you-go model, so there are no problems at all.
Regarding the metering and billing experience, Anyscale Platform uses consumption-based, pay-as-you-go pricing, which is a game-changer for us. Whenever we require GPU hours or storage, we get them through the AWS Marketplace and receive good discounts according to our needs. The metering and billing experience has been very transparent, and we have maximized output while paying the least amount.
Which other solutions did I evaluate?
We evaluated DIY Ray on Kubernetes, SageMaker, Vertex AI, and Azure ML as alternatives.
What other advice do I have?
The features that Anyscale Platform provides are really exceptional, and I personally have not faced any particular challenges or problems with it. That is why I really admire it and why we keep using it. We have seen good ROI rather than problems, and the features we wished for have been there. It integrates really well, and whatever we use with it works really well. I am very positive and feel that they are headed in the right direction. I give this product a five out of five because it has solved our problems and delivered a good return on investment.