Overview
Customizable AI Inference Tailored To You
- BentoCloud lets you deploy custom AI APIs backed by any open-source, fine-tuned, or custom model. You can choose the right model for the task, easily configure scaling behaviors, and leverage inference optimizations. This flexibility lets you decide how to balance cost/latency trade-offs, giving you faster response times and lower inference costs.
Delightful Developer Experience
- We have simplified the entire model deployment workflow with a focus on developer experience. Our rich open-source ecosystem lowers the learning curve and integrates seamlessly with your existing systems. This helps accelerate development iteration cycles, production operations, and CI/CD processes; promotes standardization across teams; and empowers AI teams to ship models to market faster with greater confidence.
State-of-the-Art Inference Optimizations
- Powered by BentoML, the leading open-source serving engine, BentoCloud simplifies AI model inference optimization. You can fully customize the inference setup to meet your specific needs (see the sketch below). We provide a suite of templates to help you jumpstart your AI project, leveraging best-in-class inference optimizations while following best design practices. For example, you can explore our benchmarks of various LLM inference backends on BentoCloud, such as vLLM, MLC-LLM, LMDeploy, and TensorRT-LLM, to see how they perform.
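As a rough illustration of what a custom inference setup looks like, here is a minimal BentoML service sketch (the service and method names are illustrative, and the model call is a stand-in you would replace with your backend of choice, such as vLLM):

```python
import bentoml

# Minimal sketch of a custom BentoML service (BentoML 1.2+ style API).
@bentoml.service(
    traffic={"timeout": 60},  # per-request timeout in seconds
)
class LLMService:  # illustrative name
    @bentoml.api
    def generate(self, prompt: str) -> str:
        # Stand-in for a real model call (e.g. a vLLM engine).
        return prompt.upper()
```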
Fast and Scalable Infrastructure
- BentoCloud offers advanced scaling capabilities like scaling-to-zero, optimized cold starts, concurrency-based auto-scaling, external queuing, and streaming model loading. These features enable rapid scale-up in response to demand, improved resource utilization, and lower inference costs (see the sketch below).
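As a sketch of how concurrency-based autoscaling is expressed in BentoML's Python SDK (the exact fields available depend on your BentoML/BentoCloud version, and `external_queue` is assumed to take effect only on BentoCloud deployments):

```python
import bentoml

# Sketch: autoscaling hints declared on the service itself.
@bentoml.service(
    traffic={
        "concurrency": 16,       # target number of in-flight requests per replica
        "external_queue": True,  # assumption: BentoCloud buffers excess requests
    },
)
class ScalableService:  # illustrative name
    @bentoml.api
    def predict(self, text: str) -> str:
        # Stand-in for real model inference.
        return text[::-1]
```

Replica bounds, including a minimum of zero replicas for scale-to-zero, are typically configured on the deployment itself when you deploy to BentoCloud.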
Highlights
- Autoscaling Deployments - Easily configure scaling behaviors and leverage inference optimizations. This flexibility lets you decide how to balance cost/latency trade-offs, giving you faster response times and lower inference costs.
- Simplified model deployment workflow - Accelerate development iteration cycles, production operations, and CI/CD processes; promote standardization across teams; and empower AI teams to ship models to market faster with greater confidence.
- Inference optimizations - Fully customize the inference setup to meet your specific needs. We provide a suite of templates to help you jumpstart your AI project, leveraging best-in-class inference optimizations while following best design practices.
Details
Features and programs
Financing for AWS Marketplace purchases
Pricing
| Dimension | Description | Cost/12 months |
| --- | --- | --- |
| BentoCloud Cluster | BentoCloud can deploy into many different regions and clusters | $35,000.00 |
Vendor refund policy
Once under contract, the order form determines the termination conditions.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Software as a Service (SaaS)
SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.
Resources
Vendor resources
Support
Vendor support
BentoCloud is an AI inference platform for deploying any AI model at production scale. Check out our How-To guides for more information. If you have any questions or issues, you may contact us at bentocloud-support@bentoml.com.
You can also join our Slack group, where you can get support from the community or reach us by direct message.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.