Overview
dstack is a streamlined alternative to Kubernetes and Slurm, specifically designed for AI. It simplifies container orchestration for AI workloads both in the cloud and on-prem, speeding up the development, training, and deployment of AI models.
dstack is easy to use with any cloud provider as well as with on-prem servers.
dstack supports NVIDIA GPUs, AMD GPUs, and Google Cloud TPUs out of the box.
Highlights
- dstack is a streamlined alternative to Kubernetes and Slurm, designed to simplify the development and deployment of AI.
- It simplifies container orchestration for AI workloads across multiple clouds and on-prem, speeding up the development, training, and deployment of AI models.
- dstack enables AI teams to work with any tools, frameworks, and hardware across multiple cloud platforms and on-premises.
Details
Pricing
- Monthly subscription: $3,000.00/month
Vendor refund policy
No refund
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Container image
- Amazon EKS Anywhere
- Amazon ECS
- Amazon EKS
- Amazon ECS Anywhere
Container image
Containers are lightweight, portable execution environments that wrap server application software in a filesystem that includes everything it needs to run. Container applications run on supported container runtimes and orchestration services, such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). Both eliminate the need for you to install and operate your own container orchestration software by managing and scheduling containers on a scalable cluster of virtual machines.
Version release notes
Clusters
Simplified use of MPI
startup_order and stop_criteria
New run configuration properties have been introduced:
- startup_order (any/master-first/workers-first) specifies the order in which the master and worker jobs are started.
- stop_criteria (all-done/master-done) specifies when a multi-node run is considered finished.
These properties simplify running certain multi-node workloads. For example, MPI requires that the workers are up and running when the master runs mpirun, so you'd use startup_order: workers-first. An MPI workload can be considered done when the master is done, so you'd use stop_criteria: master-done and dstack won't wait for the workers to exit.
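As a minimal sketch (the name, image, and resources below are placeholders rather than details from the release notes), a task combining both properties could look like this:

type: task
name: mpi-task                 # placeholder name
nodes: 2

# Start the worker jobs before the master job
startup_order: workers-first
# Consider the run finished as soon as the master job is done
stop_criteria: master-done

image: dstackai/efa            # placeholder image
commands:
  - |
    if [ ${DSTACK_NODE_RANK} -eq 0 ]; then
      echo "master: run mpirun here"   # placeholder for the master command
    else
      sleep infinity                   # workers stay up until the master finishes
    fi

resources:
  gpu: nvidia:1                # placeholder resources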
DSTACK_MPI_HOSTFILE
dstack now automatically creates an MPI hostfile and exposes the DSTACK_MPI_HOSTFILE environment variable with the hostfile path. It can be passed directly to mpirun: mpirun --hostfile $DSTACK_MPI_HOSTFILE.
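For example (a sketch only; the surrounding task configuration is omitted), the commands section of a distributed task could consume the hostfile like this — DSTACK_GPUS_NUM is the same variable used in the NCCL example below:

commands:
  # DSTACK_MPI_HOSTFILE points to the hostfile that dstack generates for the run
  - mpirun --hostfile $DSTACK_MPI_HOSTFILE -n $DSTACK_GPUS_NUM hostname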
CLI
We've also updated how the CLI displays run and job status. Previously, the CLI displayed an internal status code that was hard to interpret. Now, the STATUS column in dstack ps and dstack apply displays a status that makes it easy to understand why a run or job was terminated.
dstack ps -n 10

 NAME                BACKEND             RESOURCES                            PRICE    STATUS        SUBMITTED
 oom-task                                                                              no offers     yesterday
 oom-task            nebius (eu-north1)  cpu=2 mem=8GB disk=100GB             $0.0496  exited (127)  yesterday
 oom-task            nebius (eu-north1)  cpu=2 mem=8GB disk=100GB             $0.0496  exited (127)  yesterday
 heavy-wolverine-1                                                                     done          yesterday
   replica=0 job=0   aws (us-east-1)     cpu=4 mem=16GB disk=100GB T4:16GB:1  $0.526   exited (0)    yesterday
   replica=0 job=1   aws (us-east-1)     cpu=4 mem=16GB disk=100GB T4:16GB:1  $0.526   exited (0)    yesterday
 cursor              nebius (eu-north1)  cpu=2 mem=8GB disk=100GB             $0.0496  stopped       yesterday
 cursor              nebius (eu-north1)  cpu=2 mem=8GB disk=100GB             $0.0496  error         yesterday
 cursor              nebius (eu-north1)  cpu=2 mem=8GB disk=100GB             $0.0496  interrupted   yesterday
 cursor              nebius (eu-north1)  cpu=2 mem=8GB disk=100GB             $0.0496  aborted       yesterday

Examples
Simplified NCCL tests
With these improvements, it has become much easier to run MPI workloads with dstack. This includes NCCL tests, which can now be run using the following configuration:
type: task
name: nccl-tests
nodes: 2
startup_order: workers-first
stop_criteria: master-done

image: dstackai/efa

env:
  - NCCL_DEBUG=INFO
commands:
  - cd /root/nccl-tests/build
  - |
    if [ ${DSTACK_NODE_RANK} -eq 0 ]; then
      mpirun \
        --allow-run-as-root --hostfile $DSTACK_MPI_HOSTFILE \
        -n ${DSTACK_GPUS_NUM} \
        -N ${DSTACK_GPUS_PER_NODE} \
        --mca btl_tcp_if_exclude lo,docker0 \
        --bind-to none \
        ./all_reduce_perf -b 8 -e 8G -f 2 -g 1
    else
      sleep infinity
    fi

resources:
  gpu: nvidia:4:16GB
  shm_size: 16GB

See the updated NCCL tests example for more details.
Distributed training
TRL
The new TRL example walks you through how to run distributed fine-tuning using TRL, Accelerate, and DeepSpeed.
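This is not the example's actual configuration, but a rough sketch of how a multi-node Accelerate launch fits into a dstack task; the name, image, script, and the DSTACK_NODES_NUM and DSTACK_MASTER_NODE_IP variables are assumptions here rather than details from the release notes:

type: task
name: trl-distributed              # placeholder name
nodes: 2

image: placeholder/pytorch-image   # placeholder image
commands:
  # DSTACK_NODES_NUM and DSTACK_MASTER_NODE_IP are assumed dstack-provided variables;
  # DSTACK_NODE_RANK and DSTACK_GPUS_NUM also appear in the NCCL example above
  - |
    accelerate launch \
      --num_machines $DSTACK_NODES_NUM \
      --machine_rank $DSTACK_NODE_RANK \
      --num_processes $DSTACK_GPUS_NUM \
      --main_process_ip $DSTACK_MASTER_NODE_IP \
      --main_process_port 29500 \
      train.py                     # placeholder training script

resources:
  gpu: nvidia:8:40GB               # placeholder resources

See the TRL example itself for the exact image, script, and DeepSpeed settings.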
Axolotl
The new Axolotl example walks you through how to run distributed fine-tuning using Axolotl with dstack.
What's changed
- [Feature] Update .gitignore logic to catch more cases by @colinjc in https://github.com/dstackai/dstack/pull/2695
- [Bug] Increase upload_code client timeout by @r4victor in https://github.com/dstackai/dstack/pull/2709
- [Bug] Fix missing apt-get update by @r4victor in https://github.com/dstackai/dstack/pull/2710
- [Internal] Update git hooks and package.json by @olgenn in https://github.com/dstackai/dstack/pull/2706
- [Examples] Add distributed Axolotl and TRL example by @Bihan in https://github.com/dstackai/dstack/pull/2703
- [Docs] Update dstack-proxy contributing guide by @jvstme in https://github.com/dstackai/dstack/pull/2683
- [Feature] Implement DSTACK_MPI_HOSTFILE by @r4victor in https://github.com/dstackai/dstack/pull/2718
- [Feature] Implement startup_order and stop_criteria by @r4victor in https://github.com/dstackai/dstack/pull/2714
- [Bug] Fix CLI exiting while master starting by @r4victor in https://github.com/dstackai/dstack/pull/2720
- [Examples] Simplify NCCL tests example by @r4victor in https://github.com/dstackai/dstack/pull/2723
- [Examples] Update TRL Single Node example to uv by @Bihan in https://github.com/dstackai/dstack/pull/2715
- [Bug] Fix backward compatibility when creating fleets by @jvstme in https://github.com/dstackai/dstack/pull/2727
- [UX] Make run status in UI and CLI easier to understand by @peterschmidt85 in https://github.com/dstackai/dstack/pull/2716
- [Bug] Fix relative paths in dstack apply --repo by @jvstme in https://github.com/dstackai/dstack/pull/2733
- [Internal] Drop hardcoded regions from the backend template by @jvstme in https://github.com/dstackai/dstack/pull/2734
- [Internal] Update backend template to match ruff formatting by @jvstme in https://github.com/dstackai/dstack/pull/2735

Full changelog: https://github.com/dstackai/dstack/compare/0.19.11...0.19.12
Additional details
Usage instructions
Here's the simplest way to run the container image (a hedged sketch of these commands follows the list):
- Log in to the container registry.
- Pull the container image.
- Run the container image.
- Click the URL in the container output (e.g., http://localhost:3000).
- Copy the admin token from the container output to log in to the UI.
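As a minimal sketch only: the registry host, repository path, and tag below are placeholders (use the exact values shown on your AWS Marketplace fulfillment page), and the volume mount follows the open-source dstack server's convention rather than anything stated above.

# Log in to the registry (placeholder host; AWS Marketplace container images
# are typically hosted in Amazon ECR)
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin <registry-host>

# Pull the container image (placeholder repository and tag)
docker pull <registry-host>/<repository>:<tag>

# Run the container image; port 3000 matches the URL mentioned above, and the
# admin token is printed in the container output
docker run -p 3000:3000 \
  -v $PWD/.dstack/server:/root/.dstack/server \
  <registry-host>/<repository>:<tag>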
 
For more advanced deployment configurations, see https://dstack.ai/docs/guides/server-deployment/.
dstack Standard is fully compatible with the open-source dstack CLI. More details can be found in the dstack documentation: https://dstack.ai/docs/.
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.