Video Podcast Generator - GPU Talking Head Video Creator

GPU-accelerated talking head video generator. Script + headshots = split-screen podcast video with lip sync and karaoke captions. Pre-patched Wav2Lip, ready to use.

Overview

Turn any two-speaker script into a professional split-screen talking head video with GPU-accelerated lip sync and karaoke-style captions.

Features:

Script or topic input via Web UI or CLI
GPU lip sync with pre-patched Wav2Lip
Split-screen and PiP layouts, landscape and portrait (9:16)
Karaoke captions with word-level highlighting
Web UI on port 8080, auto-starts on boot

Text-to-Speech Options:

Default: Microsoft Edge TTS (free, no API key, high-quality voices). Note: this is an unofficial API and may occasionally be rate-limited.
Optional: Amazon Polly (reliable, AWS-native). To enable: set tts_backend to polly in config.yaml and attach an IAM role with polly:SynthesizeSpeech permission. Polly costs ~$4 per 1M characters billed to your AWS account.
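To get a feel for the Polly figure quoted above, the following minimal sketch estimates the charge for a script. It assumes the ~$4 per 1M characters rate from this listing and that every character of the script is billed; the function name and constant are illustrative and not part of the product.

```python
# Approximate rate quoted in this listing (assumption: it applies to the
# voice you select; check current AWS pricing for your region and voice).
POLLY_USD_PER_MILLION_CHARS = 4.0


def estimate_polly_cost(script: str,
                        usd_per_million: float = POLLY_USD_PER_MILLION_CHARS) -> float:
    """Return the approximate Polly charge in USD for synthesizing `script`.

    Assumes every character in `script` is billed, which slightly
    overestimates if speaker labels are stripped before synthesis.
    """
    return len(script) * usd_per_million / 1_000_000


if __name__ == "__main__":
    sample = "Andrew: Welcome to the show.\nAva: Thanks for having me."
    print(f"Estimated cost: ${estimate_polly_cost(sample):.6f}")
```

At this rate a typical short podcast script costs a small fraction of a cent, so the GPU instance is likely to dominate the bill.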

IMPORTANT - Security & Networking:

The web UI listens on port 8080. By default this is accessible to anyone who can reach the instance.
YOU MUST configure your Security Group to restrict port 8080 access to your IP address only. Do not leave port 8080 open to 0.0.0.0/0 in production.
To restrict: EC2 Console > Security Groups > Edit inbound rules > Set port 8080 source to YOUR_IP/32.
SSH (port 22) should also be restricted to your IP only.
The application includes rate limiting (10 jobs/hour) and input validation as defense-in-depth, but network-level restriction is your primary security control.
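The Security Group restriction described above can also be scripted. The sketch below uses boto3's EC2 client call authorize_security_group_ingress; the helper names, the group ID, and the example IP address are placeholders, and it assumes AWS credentials with permission to modify the Security Group are already configured.

```python
import ipaddress


def build_ingress_rule(my_ip: str, port: int) -> dict:
    """Build an EC2 IpPermissions entry allowing TCP `port` from one IPv4 address."""
    addr = ipaddress.IPv4Address(my_ip)  # raises ValueError on malformed input
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": f"{addr}/32", "Description": "operator IP only"}],
    }


def restrict_access(group_id: str, my_ip: str) -> None:
    """Allow the web UI (8080) and SSH (22) from `my_ip` only.

    This only ADDS rules. Any existing 0.0.0.0/0 rule on these ports must be
    removed separately (for example with revoke_security_group_ingress).
    """
    import boto3  # imported here so build_ingress_rule works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[build_ingress_rule(my_ip, 8080),
                       build_ingress_rule(my_ip, 22)],
    )


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-range address used as a placeholder.
    print(build_ingress_rule("203.0.113.7", 8080))
```

Calling restrict_access with your own Security Group ID and public IP applies both rules in one request. Afterwards, confirm in the EC2 Console that no rule on port 8080 or 22 still lists 0.0.0.0/0 as its source.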

Pre-installed: NVIDIA CUDA 11.8, Wav2Lip (patched), model checkpoints, ffmpeg with libass, Python deps pinned, DejaVu fonts.

Recommended instances: g4dn.xlarge ($0.53/hr) or g5.xlarge ($1.01/hr).

Security: CIS-hardened Amazon Linux 2023, auto-updates enabled, non-root application user, SSM Agent for patch management.

Third-party components: This AMI includes open-source software (Wav2Lip, PyTorch, ffmpeg) provided AS-IS. Users are responsible for OS-level security patches on running instances.

Highlights

GPU lip sync with pre-patched Wav2Lip - zero setup required
Split-screen and PiP layouts with karaoke captions
Web UI and CLI - paste script or topic, download MP4

Details

Sold by

Waltsoft Inc.

Pricing

Pricing and entitlements for this product are managed through an external billing relationship between you and the vendor. You activate the product by supplying a license purchased outside of AWS Marketplace, while AWS provides the infrastructure required to launch the product. AWS Subscriptions have no end date and may be canceled any time. However, the cancellation won't affect the status of the external license.

Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

Vendor refund policy

We do not support refunds. You can cancel your subscription at any time through AWS Marketplace. For support, email support@waltsoft.net.

Legal

Vendor terms and conditions

Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

Content disclaimer

Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

Usage information

Delivery details

64-bit (x86) Amazon Machine Image (AMI)

Version release notes

Includes all fixes: Wav2Lip lip sync enabled by default, edge-tts v7, fonts, speaker labels, multi-speaker parsing, and pre-downloaded model checkpoints.

Additional details

Usage instructions

Launch on a GPU instance (g4dn.xlarge or g5.xlarge).
SECURITY: Restrict Security Group port 8080 to your IP only.
Open http://INSTANCE_IP:8080 in a browser.
Upload the Speaker A image (Andrew, male voice) and the Speaker B image (Ava, female voice).
Enter the script in "SPEAKER: text" format, for example: Andrew: Welcome to the show. Ava: Thanks for having me.
Click Generate. The lip-synced video is ready in 1-2 minutes.
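To illustrate the SPEAKER: text script format, here is a minimal sketch of how such a script might be split into speaker turns. It is not the product's own parser. It assumes one turn per line, with any line that lacks a "Name:" prefix treated as a continuation of the previous turn; parse_script is a hypothetical helper.

```python
import re
from typing import List, Tuple

# A turn is "<speaker name>: <spoken text>" on a single line (assumption).
_TURN = re.compile(r"^\s*([A-Za-z][\w ]*?)\s*:\s*(.+?)\s*$")


def parse_script(text: str) -> List[Tuple[str, str]]:
    """Split a script into (speaker, spoken_text) turns."""
    turns: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between turns
        match = _TURN.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns:
            # No "Name:" prefix, so treat the line as more text for the last speaker.
            speaker, spoken = turns[-1]
            turns[-1] = (speaker, f"{spoken} {line.strip()}")
        else:
            raise ValueError(f"Script must start with 'SPEAKER: text', got: {line!r}")
    return turns


if __name__ == "__main__":
    script = "Andrew: Welcome to the show.\nAva: Thanks for having me."
    for speaker, spoken in parse_script(script):
        print(f"{speaker} says: {spoken}")
```

One limitation of this sketch: a continuation line that itself begins with a word followed by a colon would be read as a new speaker, so a stricter version could check names against the two configured speakers.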

Support

Vendor support

Email support at support@waltsoft.net. Documentation is included in the product.
