Overview
The Modular Stack
A unified platform for AI development and deployment, including MAX and Mojo.
Write once, deploy everywhere
MAX Deployment Instructions
The Modular Platform is an open and fully-integrated suite of AI libraries and tools that accelerates model serving and scales GenAI deployments. It abstracts away hardware complexity so you can run the most popular open models with industry-leading GPU and CPU performance without any code changes.
Our ready-to-deploy Docker container removes the complexity of deploying your own GenAI endpoint. And unlike other serving solutions, Modular enables customization across the entire stack. You can customize everything from the serving pipeline and model architecture all the way down to the metal by writing custom ops and GPU kernels in Mojo. Most importantly, Modular is hardware-agnostic and free from vendor lock-in (no CUDA required), so your code runs seamlessly across diverse systems.
MAX is a high-performance AI serving framework tailored for GenAI workloads. It provides low-latency, high-throughput inference via advanced model serving optimizations like prefix caching and speculative decoding. An OpenAI-compatible serving endpoint executes native MAX and PyTorch models across GPUs and CPUs, and can be customized at the model and kernel level.
The MAX Container (max-nvidia-full) is a Docker image that packages the MAX Platform, pre-configured to serve hundreds of popular GenAI models on NVIDIA GPUs. This container is ideal for users seeking a fully optimized, out-of-the-box solution for deploying AI models.
Key capabilities include:
- High-performance serving: Serve 500+ AI models from Hugging Face with industry-leading performance across NVIDIA GPUs.
- Flexible, portable serving: Deploy with a single Docker container across various GPUs (B200, H200, H100, A100, A10, L40 and L4) and compute services (EC2, EKS, AWS Batch, etc.) without compatibility issues; see the example launch command after this list.
- OpenAI API Compatibility: Seamlessly integrate with applications adhering to the OpenAI API specification.
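As a concrete sketch, a single-container launch has the general shape below. The image path and tag, the --model-path flag, and the model placeholder are illustrative assumptions; confirm the exact, current command against the container documentation linked below.

    # Sketch only: image name, tag, and entrypoint flags are assumptions;
    # verify at https://docs.modular.com/max/container/#get-started
    docker run --rm --gpus all -p 8000:8000 \
      docker.modular.com/modular/max-nvidia-full:latest \
      --model-path <huggingface-model-id>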
For detailed information on container contents and instance compatibility, refer to the MAX Containers Documentation (https://docs.modular.com/max/container).
To access our full Modular platform, check out https://www.modular.com/
Highlights
- 500+ Pre-Optimized Models: Deploy popular models like Llama 3.3, DeepSeek, Qwen2.5, and Mistral with individual optimizations for maximum performance
- OpenAI API Compatible: Drop-in replacement for the OpenAI API with full compatibility for existing applications and tools; see the redirect sketch after this list
- Advanced GPU Acceleration: Optimized performance across NVIDIA B200, H200, H100, A100, A10, L40 and L4 GPUs with intelligent batching and memory management
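Because the endpoint follows the OpenAI API specification, existing OpenAI SDK-based tools can often be redirected without code changes. A minimal sketch, assuming the container is serving on localhost:8000 and your tooling honors the standard OpenAI SDK environment variables:

    # Redirect OpenAI SDK tooling to the local MAX endpoint (sketch).
    export OPENAI_BASE_URL=http://localhost:8000/v1
    export OPENAI_API_KEY=EMPTY   # placeholder; we assume the local endpoint does not validate keys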
Details
Pricing
Vendor refund policy
Please refer to our licensing agreement for more details.
Delivery details
max-nvidia
- Amazon ECS
- Amazon EKS
- Amazon ECS Anywhere
- Amazon EKS Anywhere
Container image
Containers are lightweight, portable execution environments that wrap server application software in a filesystem that includes everything it needs to run. Container applications run on supported container runtimes and orchestration services, such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). Both eliminate the need for you to install and operate your own container orchestration software by managing and scheduling containers on a scalable cluster of virtual machines.
Version release notes
MAX delivers state-of-the-art performance on NVIDIA Blackwell (B200)!
We've been describing our Blackwell bring-up over a series of blog posts, and we recently published Part 4: Breaking SOTA, in which we share our latest matmul benchmarks compared to NVIDIA's cuBLAS library.
MAX provides industry-leading performance on AMD MI355X!
In a matter of weeks, we got MAX running on the brand-new MI355X system and have already produced early benchmarks that go head-to-head with Blackwell. If you have access to an MI355X, you can try it yourself today by following our quickstart guide.
Benchmarking endpoints is easier than ever with the new max benchmark command, which accepts YAML configuration files so you can easily share and reproduce your benchmarks.
Additional details
Usage instructions
Follow these steps within your EC2 instance to launch an LLM using the MAX container on an NVIDIA GPU.
1. Install Docker. If Docker is not already installed, go to https://docs.docker.com/get-started/get-docker/ and follow the instructions for your operating system.
2. Start the MAX container. Use the docker run command to launch the container. For example commands and configuration options, see https://docs.modular.com/max/container/#get-started
3. Test the endpoint. After the container starts, it serves an OpenAI-compatible endpoint on port 8000. You can send a request to the /v1/chat/completions endpoint using a tool like cURL; see the sample request after these steps. Be sure to replace any placeholder values such as the model ID and message content.
4. Next steps. For more configuration options, troubleshooting help, or performance tuning tips, see the full documentation at https://docs.modular.com/max/container
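A minimal test request might look like the following sketch; the model ID is a placeholder and must match the model the container was launched with.

    # Send a simple chat completion to the local OpenAI-compatible endpoint.
    curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "<model-id>",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'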
Note: The MAX container is not currently compatible with macOS.
Support
Vendor support
Standard Support (Included): Professional support for deployment, configuration, and optimization questions through our dedicated AWS Marketplace support channel at aws-marketplace@modular.com. Our support team includes AI infrastructure specialists with deep expertise in production deployments.
Enterprise Premium Support: Our expert services team provides end-to-end assistance for large-scale AI infrastructure projects including architecture design, performance optimization, and integration with existing enterprise systems.
To access Enterprise Premium Support or Professional Services, book a call with us: https://modul.ar/talk-to-us. Our enterprise team will design a custom support package tailored to your organization's specific requirements and scale.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.