Overview
The Modular Stack
A unified platform for AI development and deployment, including MAX and Mojo.
Write once, deploy everywhere
MAX Deployment Instructions
The Modular Platform is an open and fully-integrated suite of AI libraries and tools that accelerates model serving and scales GenAI deployments. It abstracts away hardware complexity so you can run the most popular open models with industry-leading GPU and CPU performance without any code changes.
Our ready-to-deploy Docker container removes the complexity of deploying your own GenAI endpoint. And unlike other serving solutions, Modular enables customization across the entire stack. You can customize everything from the serving pipeline and model architecture all the way down to the metal by writing custom ops and GPU kernels in Mojo. Most importantly, Modular is hardware-agnostic and free from vendor lock-in (no CUDA required), so your code runs seamlessly across diverse systems.
MAX is a high-performance AI serving framework tailored for GenAI workloads. It provides low-latency, high-throughput inference via advanced model serving optimizations like prefix caching and speculative decoding. An OpenAI-compatible serving endpoint executes native MAX and PyTorch models across GPUs and CPUs, and can be customized at the model and kernel level.
The MAX Container (max-nvidia-full) is a Docker image that packages the MAX Platform, pre-configured to serve hundreds of popular GenAI models on NVIDIA GPUs. This container is ideal for users seeking a fully optimized, out-of-the-box solution for deploying AI models.
Key capabilities include:
- High-performance serving: Serve 500+ AI models from Hugging Face with industry-leading performance across NVIDIA GPUs.
- Flexible, portable serving: Deploy with a single Docker container across various GPUs (B200, H200, H100, A100, A10, L40 and L4) and compute services (EC2, EKS, AWS Batch, etc.) without compatibility issues; see the example launch command after this list.
- OpenAI API Compatibility: Seamlessly integrate with applications adhering to the OpenAI API specification.
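As a concrete sketch, a single-container launch has the general shape below. The image path and tag, the --model-path flag, and the model placeholder are illustrative assumptions; confirm the exact, current command against the container documentation linked below.

    # Sketch only: image name, tag, and entrypoint flags are assumptions;
    # verify at https://docs.modular.com/max/container/#get-started
    docker run --rm --gpus all -p 8000:8000 \
      docker.modular.com/modular/max-nvidia-full:latest \
      --model-path <huggingface-model-id>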
For detailed information on container contents and instance compatibility, refer to the MAX Containers Documentation (https://docs.modular.com/max/container).
To access our full Modular platform, check out https://www.modular.com/
Highlights
- 500+ Pre-Optimized Models: Deploy popular models like Llama 3.3, DeepSeek, Qwen2.5, and Mistral with individual optimizations for maximum performance
- OpenAI API Compatible: Drop-in replacement for the OpenAI API with full compatibility for existing applications and tools; see the redirect sketch after this list
- Advanced GPU Acceleration: Optimized performance across NVIDIA B200, H200, H100, A100, A10, L40 and L4 GPUs with intelligent batching and memory management
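Because the endpoint follows the OpenAI API specification, existing OpenAI SDK-based tools can often be redirected without code changes. A minimal sketch, assuming the container is serving on localhost:8000 and your tooling honors the standard OpenAI SDK environment variables:

    # Redirect OpenAI SDK tooling to the local MAX endpoint (sketch).
    export OPENAI_BASE_URL=http://localhost:8000/v1
    export OPENAI_API_KEY=EMPTY   # placeholder; we assume the local endpoint does not validate keys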
Details
Pricing
Vendor refund policy
Please refer to our licensing agreement for more details.
Delivery details
max-nvidia
- Amazon ECS
- Amazon EKS
- Amazon ECS Anywhere
- Amazon EKS Anywhere
Container image
Containers are lightweight, portable execution environments that wrap server application software in a filesystem that includes everything it needs to run. Container applications run on supported container runtimes and orchestration services, such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). Both eliminate the need for you to install and operate your own container orchestration software by managing and scheduling containers on a scalable cluster of virtual machines.
Version release notes
MAX delivers state-of-the-art performance on NVIDIA Blackwell (B200)!
We've been describing our Blackwell bring-up over a series of blog posts, and we recently published Part 4: Breaking SOTA, in which we share our latest matmul benchmarks compared to NVIDIA's cuBLAS library.
MAX provides industry-leading performance on AMD MI355X!
In a matter of weeks, we got MAX running on the brand-new MI355X system and have already produced early benchmarks that go head-to-head with Blackwell. If you have access to an MI355X, you can try it yourself today by following our quickstart guide.
Benchmarking endpoints is easier than ever with the new max benchmark command, which accepts YAML configuration files so you can easily share and reproduce your benchmarks.
Additional details
Usage instructions
Follow these steps within your EC2 instance to launch an LLM using the MAX container on an NVIDIA GPU.
1. Install Docker. If Docker is not already installed, go to https://docs.docker.com/get-started/get-docker/ and follow the instructions for your operating system.
2. Start the MAX container. Use the docker run command to launch the container. For example commands and configuration options, see https://docs.modular.com/max/container/#get-started
3. Test the endpoint. After the container starts, it serves an OpenAI-compatible endpoint on port 8000. You can send a request to the /v1/chat/completions endpoint using a tool like cURL; see the sample request after these steps. Be sure to replace any placeholder values such as the model ID and message content.
4. Next steps. For more configuration options, troubleshooting help, or performance tuning tips, see the full documentation at https://docs.modular.com/max/container
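A minimal test request might look like the following sketch; the model ID is a placeholder and must match the model the container was launched with.

    # Send a simple chat completion to the local OpenAI-compatible endpoint.
    curl http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "<model-id>",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'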
Note: The MAX container is not currently compatible with macOS.
Support
Vendor support
Standard Support (Included): Professional support for deployment, configuration, and optimization questions through our dedicated AWS Marketplace support channel at aws-marketplace@modular.com. Our support team includes AI infrastructure specialists with deep expertise in production deployments.
Enterprise Premium Support: Our expert services team provides end-to-end assistance for large-scale AI infrastructure projects including architecture design, performance optimization, and integration with existing enterprise systems.
To access Enterprise Premium Support or Professional Services, book a call with us: https://modul.ar/talk-to-us. Our enterprise team will design a custom support package tailored to your organization's specific requirements and scale.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.