Guidance for Media Super Resolution on AWS
Machine learning powers video resolution upscaling
Overview
How it works
The architecture diagram illustrates how to use this solution effectively. It shows the key components and their interactions, walking through the architecture's structure and functionality step by step.
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
This Guidance supports your operational needs through end-to-end visibility and scalable automation across the video upscaling pipeline. For example, Amazon CloudWatch metrics and DynamoDB task tracking provide process oversight so you can monitor performance, identify issues, and troubleshoot root causes. Lambda facilitates automated, zero-downtime deployments, so infrastructure updates happen without manual overhead. Additionally, ParallelCluster automates consistent infrastructure provisioning across environments using infrastructure as code (IaC) for simplified change control. Lastly, Amazon Elastic Container Registry (Amazon ECR) centralizes machine learning (ML) model containers, so large-scale inference deployments can be launched from a unified script.
Security
Security protections span the entire workflow while simplifying access control. Input and output Amazon S3 buckets use presigned URLs or AWS Identity and Access Management (IAM) roles to grant temporary access, keeping data locked down. Authenticated users connect through Amazon Cognito to the private Application Load Balancer (ALB). ParallelCluster confines ML inferencing to an isolated VPC that is reachable only through Systems Manager. Administrators can also restrict access to the FSx for Lustre file system with VPC security groups, and the data can be encrypted through AWS KMS. Lastly, AWS CloudTrail centralizes activity logging for audit visibility.
Reliability
The serverless frontend built on Elastic Load Balancing (ELB), Fargate, and DynamoDB provides high availability within a Region. In addition, you can deploy the transcoding and ML compute nodes across multiple Availability Zones (AZs) for high availability. You can also deploy the Slurm controllers in a primary and secondary configuration across multiple AZs for resilience against failures. The FSx for Lustre file system stores data in cost-optimized storage suited to short-term, process-heavy workloads such as transcoding, while the source and final material are stored in Amazon S3 for high durability.
Performance Efficiency
This Guidance achieves scalable performance efficiency by using ParallelCluster, which auto-scales GPU resources to match dynamic batch processing needs, avoiding overprovisioning costs. Just-in-time job placement further improves infrastructure utilization by assigning video workflows across the heterogeneous cluster. Lambda functions scale the invocation count for video frame extractions, while provisioned concurrency keeps response latency low. Finally, a shared FSx for Lustre file system gives every node in the cluster low-latency read and write access to individual video frames.
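The idea of matching GPU capacity to the batch backlog can be illustrated with a deliberately simplified sizing rule. This is only a sketch under stated assumptions: the function name, the `jobs_per_node` packing factor, and the `max_nodes` cap are hypothetical, and this is not ParallelCluster's actual scaling logic.

```python
import math


def desired_node_count(pending_jobs: int, jobs_per_node: int, max_nodes: int) -> int:
    """Return how many compute nodes a queue would want for its backlog.

    Simplified, hypothetical model: every job is the same size and
    `jobs_per_node` of them fit on one node at a time.
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    if pending_jobs <= 0:
        return 0  # empty queue: no nodes are needed
    # Round up so a partial batch still gets a node, then respect the cap.
    return min(max_nodes, math.ceil(pending_jobs / jobs_per_node))
```

With these assumptions, a backlog of 9 jobs at 4 jobs per node asks for 3 nodes, and an empty queue asks for none.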
Cost Optimization
The event-driven architecture behind this Guidance means that compute and network resources are consumed only when needed. Additionally, Lambda bills only by the millisecond of processing time, while ParallelCluster compute nodes can be configured to reduce capacity to zero when there are no jobs in the queue, saving you compute costs.
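To make the per-millisecond billing concrete, the duration portion of a Lambda charge can be estimated as memory (in GB) multiplied by run time (in seconds) multiplied by a per-GB-second rate. The sketch below is illustrative arithmetic only: the function name is hypothetical, the rate is passed in by the caller rather than taken from any price list, and per-request charges and free-tier allowances are left out.

```python
def lambda_duration_cost(duration_ms: int, memory_mb: int, price_per_gb_second: float) -> float:
    """Estimate the duration charge for one Lambda invocation.

    `price_per_gb_second` is supplied by the caller; check current AWS
    pricing for your Region and architecture before relying on a value.
    """
    if duration_ms < 0 or memory_mb <= 0:
        raise ValueError("duration_ms must be >= 0 and memory_mb must be > 0")
    gigabytes = memory_mb / 1024      # configured memory in GB
    seconds = duration_ms / 1000      # billed duration in seconds
    return gigabytes * seconds * price_per_gb_second
```

Because the charge scales with milliseconds of run time, an invocation that never happens costs nothing for duration, which is the behavior the event-driven design relies on.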
Sustainability
Achieving sustainability requires optimizing resources and infrastructure. The ParallelCluster Slurm scheduler enables intelligent job placement matching specific workload needs. Video extraction and transcoding tasks are efficiently assigned to cost-effective CPU nodes, while GPU fleets are reserved just for compute-intensive upscaling. This minimizes over-provisioned resources, speeds time to output, and lowers energy demands.
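The split between CPU and GPU work could be expressed when jobs are submitted to Slurm. The following is a minimal sketch, assuming hypothetical partition names (`cpu`, `gpu`) and stage names (`extract`, `transcode`, `upscale`) that do not come from this Guidance; it only builds the `sbatch` argument list and does not submit anything.

```python
from typing import Dict, List

# Hypothetical mapping of pipeline stage to Slurm partition name.
STAGE_PARTITIONS: Dict[str, str] = {
    "extract": "cpu",
    "transcode": "cpu",
    "upscale": "gpu",
}


def build_sbatch_command(stage: str, command: str, gpus: int = 1) -> List[str]:
    """Build an sbatch argument list that targets the partition for a stage."""
    partition = STAGE_PARTITIONS[stage]  # raises KeyError for unknown stages
    argv = ["sbatch", f"--job-name={stage}", f"--partition={partition}"]
    if partition == "gpu":
        # Request GPUs only for the upscaling stage.
        argv.append(f"--gres=gpu:{gpus}")
    # --wrap lets sbatch run a single command without a separate script file.
    argv.append(f"--wrap={command}")
    return argv
```

The returned list can be passed to `subprocess.run` on a node where `sbatch` is available; keeping it as a list avoids shell quoting issues in the wrapped command.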