Overview
Customizable AI Inference Tailored To You
- BentoCloud lets you deploy custom AI APIs backed by any open-source, fine-tuned, or custom model. You can choose the right model for the task, easily configure scaling behaviors, and leverage inference optimizations. This flexibility lets you decide how to balance cost/latency trade-offs, giving you faster response times and lower inference costs.
Delightful Developer Experience
- We have simplified the entire model deployment workflow with a focus on developer experience. Our rich open-source ecosystem lowers the learning curve and integrates seamlessly with your existing systems. This helps accelerate development iteration cycles, production operations, and CI/CD processes; promotes standardization across teams; and empowers AI teams to ship models to market faster with greater confidence.
State-of-the-Art Inference Optimizations
- Powered by BentoML, the leading open-source serving engine, BentoCloud simplifies AI model inference optimization. You can fully customize the inference setup to meet your specific needs (see the sketch below). We provide a suite of templates to help you jumpstart your AI project, leveraging best-in-class inference optimizations while following best design practices. For example, you can explore our benchmarks of various LLM inference backends on BentoCloud, such as vLLM, MLC-LLM, LMDeploy, and TensorRT-LLM, to see how they perform.
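As a rough illustration of what a custom inference setup looks like, here is a minimal BentoML service sketch (the service and method names are illustrative, and the model call is a stand-in you would replace with your backend of choice, such as vLLM):

```python
import bentoml

# Minimal sketch of a custom BentoML service (BentoML 1.2+ style API).
@bentoml.service(
    traffic={"timeout": 60},  # per-request timeout in seconds
)
class LLMService:  # illustrative name
    @bentoml.api
    def generate(self, prompt: str) -> str:
        # Stand-in for a real model call (e.g. a vLLM engine).
        return prompt.upper()
```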
Fast and Scalable Infrastructure
- BentoCloud offers advanced scaling capabilities like scaling-to-zero, optimized cold starts, concurrency-based auto-scaling, external queuing, and streaming model loading. These features enable rapid scale-up in response to demand, improved resource utilization, and lower inference costs (see the sketch below).
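As a sketch of how concurrency-based autoscaling is expressed in BentoML's Python SDK (the exact fields available depend on your BentoML/BentoCloud version, and `external_queue` is assumed to take effect only on BentoCloud deployments):

```python
import bentoml

# Sketch: autoscaling hints declared on the service itself.
@bentoml.service(
    traffic={
        "concurrency": 16,       # target number of in-flight requests per replica
        "external_queue": True,  # assumption: BentoCloud buffers excess requests
    },
)
class ScalableService:  # illustrative name
    @bentoml.api
    def predict(self, text: str) -> str:
        # Stand-in for real model inference.
        return text[::-1]
```

Replica bounds, including a minimum of zero replicas for scale-to-zero, are typically configured on the deployment itself when you deploy to BentoCloud.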
Highlights
- Autoscaling Deployments - Easily configure scaling behaviors and leverage inference optimizations. This flexibility lets you decide how to balance cost/latency trade-offs, giving you faster response times and lower inference costs.
- Simplified model deployment workflow - Accelerate development iteration cycles, production operations, and CI/CD processes; promote standardization across teams; and empower AI teams to ship models to market faster with greater confidence.
- Inference optimizations - Fully customize the inference setup to meet your specific needs. We provide a suite of templates to help you jumpstart your AI project, leveraging best-in-class inference optimizations while following best design practices.
Details
Features and programs
Financing for AWS Marketplace purchases
Pricing
| Dimension | Description | Cost/12 months |
| --- | --- | --- |
| BentoCloud Cluster | BentoCloud can deploy into many different regions and clusters | $35,000.00 |
Vendor refund policy
Once under contract, the order form determines the termination conditions.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Software as a Service (SaaS)
SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.
Resources
Vendor resources
Support
Vendor support
BentoCloud is an AI inference platform for deploying any AI model at production scale. Check out our How-To guides for more information. If you have any questions or issues, you may contact us at bentocloud-support@bentoml.com.
You can also join our Slack group, where you can get support from the community or reach us by direct message.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.