Dive Deep into Mixture of Experts and Disaggregated Architecture on vLLM & NVIDIA
AWS GenAI Loft | San Francisco
IN PERSON
English
300 - Advanced
Hear from vLLM contributors (Meta), NVIDIA Dynamo, and AWS as they explore the Mixture of Experts architecture and the key benefits of separating the prefill and decode phases of an LLM (disaggregated architecture), which allows each phase to be scaled and optimized independently. You'll learn how expert-parallel and disaggregated system architectures can dramatically improve GPU utilization, cost-efficiency, and scalability, making them a critical piece of infrastructure for deploying LLMs, chatbots, and agentic workflows at scale.
Drawing on DeepSeek's innovation and our collaboration with PyTorch, vLLM, and NVIDIA on AWS, we'll share insights into the architecture, implementation details, and the steps we took to optimize vLLM disaggregation for performance.
Important information for attendees:
You must bring a valid physical government-issued ID to enter the AWS Builder Loft (digital IDs will not be accepted).
There is no scooter/bike parking available in the building. If you plan to bring one, you will need to find parking options nearby.
AWS GenAI Loft, 525 Market Street, 2nd Floor Courtyard Entrance, San Francisco
Attendees should enter via the courtyard entrance (up the stairs by the circular water fountain). For accessible entry, building staff will provide elevator access; please enter by the reception desk.
By registering for this event, you acknowledge that you are 18 years or older and agree to follow the AWS Community Codes of Conduct and the AWS Event Terms and Conditions.