Overview
This is a repackaged software product wherein additional charges apply for hardening, security configuration, and support.
WHAT IS VLLM
vLLM is an open-source, high-throughput inference and serving engine for large language models, built in Python on top of PyTorch. It implements PagedAttention, continuous batching, and tensor parallelism to serve HuggingFace-compatible transformer models (Llama, Mistral, Qwen, Phi, Gemma, OPT, GPT-J, MPT, Falcon, and 100+ more) through an OpenAI-compatible REST API. Clients built for OpenAI (openai-python, openai-node, LangChain, LlamaIndex, AnythingLLM) connect without code changes: point the base URL at this instance and pass the local Bearer token. This AMI ships the CPU build of vLLM 0.21.0, bundled with Open WebUI 0.9.5 as a browser chat front end pre-wired to the local vLLM. Nothing is persisted externally: models cache to /var/lib/vllm/hf-cache, and chats and accounts are stored in /var/lib/open-webui. Licensed under Apache-2.0 (vLLM) and MIT (Open WebUI), with no vendor lock-in.
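As a rough illustration of what "point the base URL at this instance and pass the local Bearer token" means on the wire, the sketch below builds a request to the OpenAI-style /v1/completions endpoint using only the Python standard library. The function name build_completion_request, the example address, and the max_tokens default are assumptions for illustration, not part of the AMI; the model name is the default one shipped with this image.

```python
import json
import urllib.request


def build_completion_request(base_url: str, api_key: str, prompt: str,
                             model: str = "facebook/opt-125m",
                             max_tokens: int = 32) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-style /v1/completions endpoint."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Same Bearer scheme that OpenAI clients already send.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder address and key: substitute the instance's public IP and
    # the key from /root/vllm-credentials.txt.
    req = build_completion_request("https://203.0.113.10", "YOUR_API_KEY", "Hello")
    print(req.get_method(), req.full_url)
```

Sending the request with urllib.request.urlopen additionally needs a TLS context that accepts the instance's certificate while the self-signed one is in place.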
WHAT THIS AMI ADDS
Security hardening:
- Random 32-byte API key generated at first boot, written to /root/vllm-credentials.txt - never baked into the AMI; the same key is injected into Open WebUI so the chat UI authenticates to vLLM transparently
- vLLM API server bound to 127.0.0.1:8000 only - reachable only through Nginx with TLS, with --api-key Bearer auth enforced on every /v1/* request
- Open WebUI bound to 127.0.0.1:8080 only - reachable only through Nginx with TLS
- First registered user in Open WebUI becomes the workspace administrator; no admin baked in
- Nginx reverse proxy with TLS, HTTP-to-HTTPS redirect, WebSocket upgrade for streaming chat, security headers (X-Content-Type-Options, X-Frame-Options, Referrer-Policy)
- Loading splash page served while the model warms up on first request
- Anonymous telemetry disabled (VLLM_NO_USAGE_STATS, DO_NOT_TRACK, ANONYMIZED_TELEMETRY)
- UFW firewall pre-configured - only TCP 22, 80, 443 are exposed; 8000 and 8080 explicitly denied
- fail2ban, AppArmor
- CVE scan - every image is scanned for vulnerabilities before release
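One way to sanity-check the exposure described above is to probe the instance from an outside machine and confirm that nothing beyond TCP 22, 80, and 443 accepts connections. The following is a minimal sketch under that assumption; the helper names is_port_open, scan, and unexpectedly_open are hypothetical, and the result also depends on the Security Group, which may block ports that UFW allows.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def scan(host: str, ports: list[int], timeout: float = 3.0) -> dict[int, bool]:
    """Map each port to whether it accepted a connection."""
    return {port: is_port_open(host, port, timeout) for port in ports}


def unexpectedly_open(results: dict[int, bool],
                      allowed: frozenset[int] = frozenset({22, 80, 443})) -> list[int]:
    """List ports that accepted a connection but are not in the allowed set."""
    return sorted(p for p, is_open in results.items() if is_open and p not in allowed)


if __name__ == "__main__":
    # Placeholder address: replace with the instance's public IP.
    results = scan("203.0.113.10", [22, 80, 443, 8000, 8080])
    print("unexpectedly open:", unexpectedly_open(results))
```

An empty list from unexpectedly_open is the outcome consistent with the firewall rules listed above.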
Out of the box, with no external services:
- Default tiny model facebook/opt-125m (~250 MB) loaded at first boot - chat and API work immediately, no HuggingFace token required
- Swap to any HuggingFace-compatible model by editing /etc/vllm/server.env and restarting the service - no rebuild needed
- No third-party LLM API keys baked in or required - everything runs on-instance
OS hardening (CIS Level 1):
- CIS Ubuntu 24.04 LTS Level 1 benchmark applied via ansible-lockdown
- auditd, SSH hardening, kernel hardening, IMDSv2 enforced
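With IMDSv2 enforced, scripts on the instance that read EC2 metadata must first obtain a session token and then present it on each metadata request; unauthenticated IMDSv1-style GETs are rejected. The sketch below shows the two requests involved, assuming the standard IMDS address and header names; the function names are hypothetical, and the requests only succeed when sent from inside an EC2 instance.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: PUT to the token endpoint with the requested session TTL."""
    return urllib.request.Request(
        IMDS_BASE + "/latest/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
        method="PUT",
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: GET a metadata path, presenting the session token."""
    return urllib.request.Request(
        IMDS_BASE + "/latest/meta-data/" + path.lstrip("/"),
        headers={"X-aws-ec2-metadata-token": token},
    )


if __name__ == "__main__":
    # Only works on an EC2 instance; elsewhere the address is unreachable.
    with urllib.request.urlopen(build_token_request(), timeout=2) as resp:
        token = resp.read().decode("ascii")
    with urllib.request.urlopen(build_metadata_request("instance-id", token), timeout=2) as resp:
        print(resp.read().decode("ascii"))
```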
Compliance artifacts:
- SBOM - CycloneDX 1.6 at /etc/lynxroute/sbom.json
- CIS Conformance Report at /etc/lynxroute/cis-report.html
- CIS Tailored Profile at /usr/share/doc/lynxroute/CIS_TAILORED_PROFILE.md
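Because the SBOM is CycloneDX JSON, it can be inspected with a few lines of Python. The sketch below lists the name and version of each top-level entry in the document's "components" array, which is the standard CycloneDX field for this. The helper names are hypothetical, which packages actually appear in this AMI's SBOM is not stated here, and nested sub-components are not traversed.

```python
import json
from pathlib import Path


def load_sbom(path: str = "/etc/lynxroute/sbom.json") -> dict:
    """Read the CycloneDX JSON document from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def list_components(sbom: dict) -> list[tuple[str, str]]:
    """Return sorted (name, version) pairs for top-level components."""
    return sorted(
        (c.get("name", "?"), c.get("version", "?"))
        for c in sbom.get("components", [])
    )


if __name__ == "__main__":
    for name, version in list_components(load_sbom()):
        print(f"{name} {version}")
```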
Highlights
- vLLM security baked in: random 32-byte API key generated at first boot, vLLM and Open WebUI bound to 127.0.0.1 behind Nginx TLS, Bearer-token auth enforced on every /v1 request - unlike bare vLLM AMIs that expose port 8000 on 0.0.0.0 with no auth and no TLS.
- CIS Level 1 hardened Ubuntu 24.04 LTS: auditd, fail2ban, AppArmor, SSH key-only, IMDSv2 enforced. CVE-scanned before every release. SBOM (CycloneDX) and CIS Conformance Report included.
- Two ways to use the same engine: browser chat via Open WebUI for analysts and non-developers, OpenAI-compatible REST API at /v1 for openai-python, LangChain, LlamaIndex, AnythingLLM, and any other OpenAI client. Apache-2.0 (vLLM) and MIT (Open WebUI) - fully auditable, no vendor lock-in.
Details
Pricing
Free trial
| Dimension | Cost/hour |
|---|---|
| m6i.xlarge (Recommended) | $0.05 |
| t3.large | $0.03 |
| m6i.large | $0.03 |
| m6i.2xlarge | $0.07 |
Vendor refund policy
We do not offer refunds for this product. AWS infrastructure charges (EC2, EBS, data transfer) are billed separately by AWS and are not refundable by us.
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
Version 0.21.0 - Initial release (May 2026)
- vLLM 0.21.0 (CPU build from upstream wheel) + Open WebUI 0.9.5 on Ubuntu 24.04 LTS
- CIS Level 1 hardening applied (ansible-lockdown/UBUNTU24-CIS)
- CVE-scanned before every release
- Random 32-byte API key generated at first boot - written to /root/vllm-credentials.txt and shared between vLLM and Open WebUI
- vLLM API server bound to 127.0.0.1:8000 with --api-key Bearer auth on every /v1/* request
- Open WebUI bound to 127.0.0.1:8080, pre-wired to local vLLM via OPENAI_API_BASE_URL
- First registered user in Open WebUI becomes the workspace administrator
- Default model facebook/opt-125m preloaded - chat and API work immediately, no HuggingFace token required
- Nginx reverse proxy with self-signed TLS, HTTP-to-HTTPS redirect, WebSocket upgrade for streaming chat
- Loading splash page during model warmup
- Anonymous telemetry disabled (VLLM_NO_USAGE_STATS, DO_NOT_TRACK, ANONYMIZED_TELEMETRY)
- UFW firewall pre-configured (TCP 22, 80, 443 only; 8000 and 8080 explicitly denied)
- fail2ban, auditd, AppArmor pre-configured
- SBOM (CycloneDX 1.6) at /etc/lynxroute/sbom.json
- CIS Conformance Report (OpenSCAP) at /etc/lynxroute/cis-report.html
- IMDSv2 enforced
Additional details
Usage instructions
- Launch instance (m6i.xlarge recommended; minimum t3.large or m6i.large with 8 GB RAM)
- In the instance's Security Group, allow TCP 443 from your IP only until you have registered as the admin
- Open https://<PUBLIC_IP>/ in your browser and accept the self-signed certificate warning
- Click "Sign up" and register with your real email - the first registered user becomes the workspace administrator
- Start chatting; the default model facebook/opt-125m is preloaded
- SSH if needed: ssh -i key.pem ubuntu@<PUBLIC_IP> ; credentials are in /root/vllm-credentials.txt (the file is under /root, so read it with sudo)
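To call the OpenAI-compatible API directly:
    curl https://<PUBLIC_IP>/v1/models -H "Authorization: Bearer <API_KEY>" -k
The API key is the Bearer token shown in /root/vllm-credentials.txt. The -k flag skips certificate verification, which is needed while the self-signed certificate is in place.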
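The same /v1/models call can be made from Python with only the standard library. This is a minimal sketch, assuming the response follows the OpenAI list shape (a "data" array of objects with an "id" field); the function names are hypothetical. The TLS context deliberately disables certificate verification, mirroring curl -k, and should be dropped once a CA-signed certificate is installed.

```python
import json
import ssl
import urllib.request


def insecure_tls_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks (equivalent of curl -k)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def model_ids(body: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in body.get("data", [])]


def list_models(host: str, api_key: str, timeout: float = 10.0) -> list[str]:
    """GET https://<host>/v1/models with the Bearer token and return model IDs."""
    req = urllib.request.Request(
        f"https://{host}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, context=insecure_tls_context(),
                                timeout=timeout) as resp:
        return model_ids(json.load(resp))


if __name__ == "__main__":
    # Placeholder address and key: use the public IP and the key from
    # /root/vllm-credentials.txt.
    print(list_models("203.0.113.10", "YOUR_API_KEY"))
```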
To serve a different HuggingFace-compatible model:
    sudo nano /etc/vllm/server.env    # change VLLM_MODEL=<huggingface-id>
    sudo systemctl restart vllm
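For scripted model swaps, the edit to /etc/vllm/server.env can be done programmatically instead of in an editor. The sketch below assumes the file consists of plain KEY=VALUE lines with optional # comments, which is typical for systemd environment files but is not confirmed for this AMI; set_env_var is a hypothetical helper, and the model ID in the usage example is only an example.

```python
from pathlib import Path


def set_env_var(text: str, key: str, value: str) -> str:
    """Return env-file text with KEY set to VALUE, appending it if absent."""
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # leave comments untouched
        name, sep, _ = stripped.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Run as root on the instance, then: sudo systemctl restart vllm
    env_path = Path("/etc/vllm/server.env")
    updated = set_env_var(env_path.read_text(encoding="utf-8"),
                          "VLLM_MODEL", "Qwen/Qwen2.5-0.5B-Instruct")
    env_path.write_text(updated, encoding="utf-8")
```

Keeping the text transformation separate from the file I/O makes it easy to preview the change before writing it and restarting the service.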
After registration, restrict Security Group TCP 443 to your team's IP range. Replace the self-signed TLS certificate with a CA-signed certificate for production use.
Resources
Vendor resources
Support
Vendor support
Visit us online: https://lynxroute.com
For vLLM documentation: https://docs.vllm.ai/en/latest/getting_started/quickstart/
For vLLM upstream issues: https://github.com/vllm-project/vllm/issues
For Open WebUI documentation: https://docs.openwebui.com/getting-started/
For Open WebUI upstream issues:
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
