Overview
Turn any two-speaker script into a professional split-screen talking-head video with GPU-accelerated lip sync and karaoke-style captions.
Features:
- Script or topic input via Web UI or CLI
- GPU lip sync with pre-patched Wav2Lip
- Split-screen and PiP layouts, landscape and portrait (9:16)
- Karaoke captions with word-level highlighting
- Web UI on port 8080, auto-starts on boot
Text-to-Speech Options:
- Default: Microsoft Edge TTS (free, no API key, high-quality voices). Note: this is an unofficial API and may occasionally be rate-limited.
- Optional: Amazon Polly (reliable, AWS-native). To enable it, set tts_backend to polly in config.yaml and attach an IAM role with the polly:SynthesizeSpeech permission. Polly costs about $4 per 1M characters, billed to your AWS account.
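To get a feel for what the Polly option might cost, the per-character rate quoted above can be turned into a quick estimate. This is a minimal sketch: the function name and the default rate are illustrative assumptions based only on the "about $4 per 1M characters" figure in this listing, and the number of characters actually billed may differ (for example, if speaker labels are stripped before synthesis).

```python
def estimate_polly_cost(script_text: str, usd_per_million_chars: float = 4.0) -> float:
    """Approximate Polly charge in USD for synthesizing script_text.

    The default rate is the approximate figure quoted in this listing,
    not an authoritative price; check current AWS pricing before relying on it.
    """
    return len(script_text) / 1_000_000 * usd_per_million_chars


if __name__ == "__main__":
    sample = "Andrew: Welcome to the show. Ava: Thanks for having me."
    cost = estimate_polly_cost(sample)
    print(f"{len(sample)} characters -> about ${cost:.6f}")
```

At this rate a short two-speaker script costs a small fraction of a cent, so the instance's hourly charge is likely to dominate the bill.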
IMPORTANT - Security & Networking:
- The web UI listens on port 8080. By default this is accessible to anyone who can reach the instance.
- YOU MUST configure your Security Group to restrict port 8080 access to your IP address only. Do not leave port 8080 open to 0.0.0.0/0 in production.
- To restrict: EC2 Console > Security Groups > Edit inbound rules > Set port 8080 source to YOUR_IP/32.
- SSH (port 22) should also be restricted to your IP only.
- The application includes rate limiting (10 jobs/hour) and input validation as defense-in-depth, but network-level restriction is your primary security control.
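The YOUR_IP/32 restriction described above can also be prepared in code. The sketch below only builds and checks the rule locally with Python's standard ipaddress module; it does not call AWS. The helper names are hypothetical, and the dictionary layout follows the IpPermissions shape accepted by the EC2 AuthorizeSecurityGroupIngress API, which is an assumption about how you would apply it rather than something this product ships.

```python
import ipaddress


def build_ingress_rule(my_ip: str, port: int = 8080) -> dict:
    """Build a single-host (/32) TCP ingress rule for the given IPv4 address."""
    addr = ipaddress.ip_address(my_ip)  # raises ValueError on malformed input
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 addresses only")
    if addr.is_unspecified:
        raise ValueError("0.0.0.0 is not a usable source address")
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [{"CidrIp": f"{addr}/32", "Description": "my IP only"}],
    }


def is_open_to_world(cidr: str) -> bool:
    """Return True if the CIDR covers every address, e.g. 0.0.0.0/0."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


if __name__ == "__main__":
    # 203.0.113.7 is a documentation-range placeholder, not a real client IP.
    rule = build_ingress_rule("203.0.113.7")
    print(rule["IpRanges"][0]["CidrIp"])
    print(is_open_to_world("0.0.0.0/0"))
```

The same helper can be reused for SSH by passing port=22, and is_open_to_world offers a simple check to run against existing rules before going to production.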
Pre-installed: NVIDIA CUDA 11.8, Wav2Lip (patched), model checkpoints, ffmpeg with libass, Python deps pinned, DejaVu fonts.
Recommended instances: g4dn.xlarge ($0.53/hr) or g5.xlarge ($1.01/hr).
Security: CIS-hardened Amazon Linux 2023, auto-updates enabled, non-root application user, SSM Agent for patch management.
Third-party components: This AMI includes open-source software (Wav2Lip, PyTorch, ffmpeg) provided AS-IS. Users are responsible for OS-level security patches on running instances.
Highlights
- GPU lip sync with pre-patched Wav2Lip - zero setup required
- Split-screen and PiP layouts with karaoke captions
- Web UI and CLI - paste script or topic, download MP4
Pricing
Vendor refund policy
We do not offer refunds. You can cancel your subscription at any time through AWS Marketplace. For support, email support@waltsoft.net.
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
Includes all fixes to date: Wav2Lip lip sync enabled by default, edge-tts v7, fonts, speaker labels, multi-speaker parsing, and pre-downloaded model checkpoints.
Additional details
Usage instructions
- Launch the AMI on a GPU instance (g4dn.xlarge or g5.xlarge).
- SECURITY: Restrict Security Group port 8080 to YOUR IP only.
- Open http://INSTANCE_IP:8080 in a browser.
- Upload the Speaker A image (Andrew - Male Voice) and the Speaker B image (Ava - Female Voice).
- Enter the script in "SPEAKER: text" format, for example: Andrew: Welcome to the show. Ava: Thanks for having me.
- Click Generate. The lip-synced video is ready in 1-2 minutes.
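Before pasting a script into the web UI, it can help to check that it follows the "SPEAKER: text" format. The following is a minimal sketch of such a check, not the product's actual parser: it assumes one speaker turn per line, and the function name and the pattern for speaker names are illustrative choices.

```python
import re
from typing import List, Tuple

# A speaker name (letters, digits, underscores, spaces), a colon, then the line.
LINE_RE = re.compile(r"^\s*([A-Za-z][\w ]*?)\s*:\s*(.+?)\s*$")


def parse_script(script: str) -> List[Tuple[str, str]]:
    """Split a script into (speaker, text) turns, one turn per non-empty line."""
    turns = []
    for raw in script.splitlines():
        if not raw.strip():
            continue  # skip blank lines between turns
        match = LINE_RE.match(raw)
        if match is None:
            raise ValueError(f"expected 'SPEAKER: text', got: {raw!r}")
        turns.append((match.group(1), match.group(2)))
    return turns


if __name__ == "__main__":
    script = "Andrew: Welcome to the show.\nAva: Thanks for having me."
    for speaker, text in parse_script(script):
        print(f"{speaker} says: {text}")
```

Only the first colon on a line separates the speaker from the text, so dialogue that itself contains a colon is kept intact.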
Support
Vendor support
Email support@waltsoft.net. Documentation is included in the product.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.