vLLM + Open WebUI - Hardened Self-Hosted OpenAI-Compatible LLM Server

This product has charges associated with it for hardening, security configuration, and support. vLLM is a high-throughput OpenAI-compatible inference server for open-source LLMs (Python + PyTorch CPU build), bundled here with Open WebUI as a browser chat front end. Unlike bare vLLM AMIs that ship without TLS, expose the API server on 0.0.0.0:8000, enforce no Bearer-token auth, and provide no web UI for non-developers, this Lynxroute build is ready out of the box: a random 32-byte API key is generated at first boot, vLLM and Open WebUI are bound to loopback behind Nginx TLS, a default tiny model (facebook/opt-125m) is preloaded so the API and chat work immediately, the UFW firewall is pre-configured, and the base is CIS Level 1 hardened Ubuntu 24.04 LTS. Apache-2.0 (vLLM) and MIT (Open WebUI) - fully auditable, no vendor lock-in.

Overview

This is a repackaged software product wherein additional charges apply for hardening, security configuration, and support.

WHAT IS VLLM

vLLM is an open-source, high-throughput inference and serving engine for large language models, built in Python on top of PyTorch. It implements PagedAttention, continuous batching, and tensor parallelism to serve HuggingFace-compatible transformer models (Llama, Mistral, Qwen, Phi, Gemma, OPT, GPT-J, MPT, Falcon, and 100+ more) through an OpenAI-compatible REST API. Clients built for OpenAI (openai-python, openai-node, LangChain, LlamaIndex, AnythingLLM, the OpenAI ChatGPT SDK) connect unchanged - just point the base URL at this instance and pass the local Bearer token. This AMI ships the CPU build of vLLM bundled with Open WebUI as a browser chat front end pre-wired to the local vLLM; the versions in the current image are listed under Version release notes. Nothing is persisted externally - models cache to /var/lib/vllm/hf-cache, and chats and accounts are stored in /var/lib/open-webui. Apache-2.0 (vLLM) and MIT (Open WebUI), no vendor lock-in.
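
As a rough illustration of the "point the base URL at this instance" step, the Python sketch below builds (but does not send) a request to the OpenAI-style /v1/completions route using only the standard library. The address 203.0.113.10, the key EXAMPLE_KEY, and the helper name build_completion_request are placeholders invented for this example; substitute the instance address and the key from /root/vllm-credentials.txt.

```python
import json
import urllib.request


def build_completion_request(base_url, api_key, model, prompt, max_tokens=32):
    """Build (without sending) a POST to the OpenAI-style /v1/completions route."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The listing states Bearer auth is enforced on every /v1/* request.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values, not real credentials or a real host.
request = build_completion_request(
    "https://203.0.113.10", "EXAMPLE_KEY", "facebook/opt-125m", "Hello"
)
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen would additionally require trusting the instance's self-signed certificate, or installing a CA-signed one first.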

WHAT THIS AMI ADDS

Security hardening:

Random 32-byte API key generated at first boot, written to /root/vllm-credentials.txt - never baked into the AMI; the same key is injected into Open WebUI so the chat UI authenticates to vLLM transparently
vLLM API server bound to 127.0.0.1:8000 only - reachable only through Nginx with TLS, with --api-key Bearer auth enforced on every /v1/* request
Open WebUI bound to 127.0.0.1:8080 only - reachable only through Nginx with TLS
First registered user in Open WebUI becomes the workspace administrator; no admin baked in
Nginx reverse proxy with TLS, HTTP-to-HTTPS redirect, WebSocket upgrade for streaming chat, security headers (X-Content-Type-Options, X-Frame-Options, Referrer-Policy)
Loading splash page served while the model warms up on first request
Anonymous telemetry disabled (VLLM_NO_USAGE_STATS, DO_NOT_TRACK, ANONYMIZED_TELEMETRY)
UFW firewall pre-configured - only TCP 22, 80, 443 are exposed; 8000 and 8080 explicitly denied
fail2ban and AppArmor enabled
CVE scan - every image is scanned for vulnerabilities before release
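
One way to sanity-check the binding rules in this list is to compare the instance's listening sockets against the stated policy: loopback-only for 8000 and 8080, and only TCP 22, 80, and 443 exposed. The sketch below assumes the (address, port) pairs have already been collected by some other means (for example from a socket listing on the instance); the function name and sample data are hypothetical.

```python
import ipaddress

# Ports the listing says are exposed through the firewall.
ALLOWED_PUBLIC_PORTS = {22, 80, 443}


def unexpected_listeners(listeners):
    """Return (address, port) pairs bound off-loopback on a port outside the allowed set."""
    flagged = []
    for address, port in listeners:
        is_loopback = ipaddress.ip_address(address).is_loopback
        if not is_loopback and port not in ALLOWED_PUBLIC_PORTS:
            flagged.append((address, port))
    return flagged


# Hypothetical listing that matches the described configuration.
expected_layout = [("127.0.0.1", 8000), ("127.0.0.1", 8080), ("0.0.0.0", 443), ("0.0.0.0", 22)]
print(unexpected_listeners(expected_layout))

# Hypothetical misconfiguration: the API server bound to all interfaces.
print(unexpected_listeners([("0.0.0.0", 8000)]))
```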

Out of the box, with no external services:

Default tiny model facebook/opt-125m (~250 MB) loaded at first boot - chat and API work immediately, no HuggingFace token required
Swap to any HuggingFace-compatible model by editing /etc/vllm/server.env and restarting the service - no rebuild needed
No third-party LLM API keys baked in or required - everything runs on-instance
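
The model swap can also be scripted. The sketch below edits env-file text in memory, assuming /etc/vllm/server.env uses plain KEY=value lines and that the model is selected by the VLLM_MODEL variable named in the usage instructions. The helper name set_env_value and the replacement model ID are examples only; the vLLM service still needs a restart after the file is written.

```python
def set_env_value(env_text, key, value):
    """Return env_text with KEY=value replaced, or appended if the key is absent."""
    lines = env_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        # Commented-out lines such as "# KEY=..." are left untouched.
        if line.strip().startswith(key + "="):
            lines[index] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical file contents; the real file may hold other settings.
original = "# model served by vLLM\nVLLM_MODEL=facebook/opt-125m\n"
updated = set_env_value(original, "VLLM_MODEL", "Qwen/Qwen2.5-0.5B-Instruct")
print(updated, end="")
```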

OS hardening (CIS Level 1):

CIS Ubuntu 24.04 LTS Level 1 benchmark applied via ansible-lockdown
auditd, SSH hardening, kernel hardening, IMDSv2 enforced

Compliance artifacts:

SBOM - CycloneDX 1.6 at /etc/lynxroute/sbom.json
CIS Conformance Report at /etc/lynxroute/cis-report.html
CIS Tailored Profile at /usr/share/doc/lynxroute/CIS_TAILORED_PROFILE.md
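
The SBOM can be inspected with a few lines of Python. The sketch below relies only on the standard CycloneDX JSON layout, in which packages appear in a top-level "components" array with "name" and "version" fields. The sample document is made up for illustration; on the instance, the text would instead be read from /etc/lynxroute/sbom.json.

```python
import json


def list_components(sbom_text):
    """Return sorted (name, version) pairs from a CycloneDX JSON document."""
    sbom = json.loads(sbom_text)
    return sorted(
        (component.get("name", ""), component.get("version", ""))
        for component in sbom.get("components", [])
    )


# Made-up sample; component names and versions are illustrative only.
sample_sbom = json.dumps({
    "bomFormat": "CycloneDX",
    "specVersion": "1.6",
    "components": [
        {"type": "library", "name": "example-lib-b", "version": "2.0.0"},
        {"type": "library", "name": "example-lib-a", "version": "1.4.2"},
    ],
})
for name, version in list_components(sample_sbom):
    print(name, version)
```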

Highlights

vLLM security baked in: random 32-byte API key generated at first boot, vLLM and Open WebUI bound to 127.0.0.1 behind Nginx TLS, Bearer-token auth enforced on every /v1 request - unlike bare vLLM AMIs that expose port 8000 on 0.0.0.0 with no auth and no TLS.
CIS Level 1 hardened Ubuntu 24.04 LTS: auditd, fail2ban, AppArmor, SSH key-only, IMDSv2 enforced. CVE-scanned before every release. SBOM (CycloneDX) and CIS Conformance Report included.
Two ways to use the same engine: browser chat via Open WebUI for analysts and non-developers, OpenAI-compatible REST API at /v1 for openai-python, LangChain, LlamaIndex, AnythingLLM, and any other OpenAI client. Apache-2.0 (vLLM) and MIT (Open WebUI) - fully auditable, no vendor lock-in.

Details

Sold by

Lynxroute

Pricing

Free trial

Try this product free for 5 days according to the free trial terms set by the vendor. Usage-based pricing is in effect for usage beyond the free trial terms. Your free trial gets automatically converted to a paid subscription when the trial ends, but may be canceled any time before that.

Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time.

Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

Usage costs (4)

Dimension	Cost/hour
m6i.xlarge (recommended)	$0.05
m6i.2xlarge	$0.05
m6i.large	$0.05
t3.large	$0.05
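
For budgeting, the software charge in the table can be turned into a quick estimate. The sketch below assumes an always-on instance and roughly 730 hours in a month, which is an approximation introduced for this example. It covers the software fee only; EC2, EBS, and data transfer are billed separately by AWS, and any free trial hours are ignored.

```python
SOFTWARE_RATE_PER_HOUR = 0.05  # USD, the same for all four listed instance types
HOURS_PER_MONTH = 730          # assumption: average month, instance running continuously


def software_cost(hours, rate=SOFTWARE_RATE_PER_HOUR):
    """Return the software charge in USD for the given number of running hours."""
    return round(hours * rate, 2)


print(software_cost(HOURS_PER_MONTH))
```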

Vendor refund policy

We do not offer refunds for this product. AWS infrastructure charges (EC2, EBS, data transfer) are billed separately by AWS and are not refundable by us.

Legal

Vendor terms and conditions

Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

Content disclaimer

Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

Usage information

Delivery details

64-bit (x86) Amazon Machine Image (AMI)

Amazon Machine Image (AMI)

An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.

Version release notes

vLLM 0.24.0 + Open WebUI 0.9.6

vLLM upgraded to 0.24.0 (from 0.23.0) - OpenAI-compatible LLM inference server with a chat UI. Additive API server changes (API-key auth, CORS, model metadata); no torch/CUDA bump (torch 2.11.0 unchanged). Open WebUI remains at 0.9.6.
Certbot pre-installed - enable a trusted HTTPS certificate with one command: sudo certbot --nginx -d yourdomain.com
Rebuilt on the latest CIS Level 1 hardened Ubuntu 24.04 LTS base

Additional details

Usage instructions

1. Launch an instance (m6i.xlarge recommended; minimum t3.large or m6i.large, each with 8 GB RAM).
2. In the instance's Security Group, allow inbound TCP 443 from your IP only until you have registered as the admin.
3. Open https://<PUBLIC_IP>/ in your browser and accept the self-signed certificate warning.
4. Click "Sign up" and register with your real email - the first registered user becomes the workspace administrator.
5. Start chatting; the default model facebook/opt-125m is preloaded.
6. SSH if needed: ssh -i key.pem ubuntu@<PUBLIC_IP> ; the credentials are in /root/vllm-credentials.txt (under /root, so read the file with sudo).

To call the OpenAI-compatible API directly:

    curl https://<PUBLIC_IP>/v1/models -H "Authorization: Bearer <API_KEY>" -k

The API key is the Bearer token shown in /root/vllm-credentials.txt.

To serve a different HuggingFace-compatible model:

    sudo nano /etc/vllm/server.env    # change VLLM_MODEL=<huggingface-id>
    sudo systemctl restart vllm

After registration, restrict Security Group TCP 443 to your team's IP range. Replace the self-signed TLS certificate with a CA-signed certificate for production use.
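
Until a CA-signed certificate is in place, one option is to record the self-signed certificate's SHA-256 fingerprint once and compare it on later connections, rather than disabling verification entirely. The sketch below uses only the Python standard library; the function names are hypothetical, and fetch_fingerprint needs network access to the instance, so it is defined here but not called.

```python
import hashlib
import ssl


def sha256_fingerprint(der_bytes):
    """Format the SHA-256 digest of a DER-encoded certificate as colon-separated hex."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_fingerprint(host, port=443):
    """Retrieve the server certificate without validating it and return its fingerprint."""
    pem = ssl.get_server_certificate((host, port))
    return sha256_fingerprint(ssl.PEM_cert_to_DER_cert(pem))


def matches_pinned(der_bytes, pinned_fingerprint):
    """Return True when the certificate bytes hash to the previously recorded fingerprint."""
    return sha256_fingerprint(der_bytes) == pinned_fingerprint.upper()
```

A typical flow would be to call fetch_fingerprint("<PUBLIC_IP>") once from a trusted network, store the result, and check it again before sending the API key.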

Resources

Vendor resources

WebUI documentation

vLLM documentation

Support

Vendor support

Visit us online: https://lynxroute.com

For vLLM documentation: https://docs.vllm.ai/en/latest/getting_started/quickstart/
For vLLM upstream issues: https://github.com/vllm-project/vllm/issues
For Open WebUI documentation: https://docs.openwebui.com/getting-started/
For Open WebUI upstream issues:

AWS infrastructure support

AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
