Listing Thumbnail

    Open WebUI, vLLM, Ollama: Secure AI Inference Sandbox & GPU Optimized

     Info
    Deployed on AWS
    Deploy a fully secure, enterprise-grade Private AI environment in 5 minutes. Pre-configured with Open WebUI, vLLM, and Ollama on a hardened Ubuntu 22.04 LTS base. Optimized for AWS NVIDIA GPU instances (G6, G5, P4) to deliver high-throughput local inference with 100% data privacy. Perfect for running Llama 3, Qwen, and Mistral models securely within your own VPC without third-party data leaks.

    Overview

    Deploy a production-ready, security-hardened Private AI Sandbox on Amazon EC2 GPU instances. Built on Ubuntu 22.04 LTS with a CIS-oriented baseline, this AMI eliminates hours of manual work configuring NVIDIA drivers, CUDA 12.4, Docker, NVIDIA Container Toolkit, Nginx, and Open WebUI.

    WHAT YOU GET

    CoreNova Enterprise Secure AI Sandbox ships with a decoupled, secure three-layer architecture:

    • 1. Open WebUI (Default User Interface) Browser-based chat, RAG, and administrative controls. Served securely over HTTPS on port 443 via Nginx. (Note: Due to the private environment setup, a self-signed TLS certificate is used on first boot. It is perfectly safe to bypass the browser's privacy warning to proceed; you can replace this with your own public SSL certificate for production).
    • 2. Ollama (DEFAULT Inference Engine) GPU-accelerated GGUF model serving. This acts as your default out-of-the-box engine for chat. It follows a strict Bring Your Own Model (BYOM) approach, allowing you to pull any open-source or custom models directly via the WebUI or CLI without any pre-bloated weights. Ollama listens on localhost only and is never exposed to the public internet.
    • 3. vLLM (OPTIONAL Second Engine) A high-throughput, OpenAI-compatible API listening on localhost port 8000. It is NOT started by default. Enable it manually when you need to deploy HuggingFace transformers or require higher serving throughput. Multi-GPU instances will auto-configure the tensor parallel size.

    FIRST BOOT (Typical 5 to 10 minutes on g4dn.xlarge)

    After launching the instance, please allow 5 to 10 minutes for initialization before attempting to log in. The instance will automatically:

    • Configure the Docker NVIDIA container runtime.
    • Initialize the private inference engine environment.
    • Seed the secure administrator account and start Open WebUI behind the Nginx reverse proxy.

    Access URL: https://YOUR_PUBLIC_IP/ (Direct HTTPS access, no port 3000 exposure)

    First Login Credentials (Public signup is strictly disabled):

    • Email: <admin@local.host>
    • Password: Your unique EC2 Instance ID (e.g., i-0abcdef123456789f)

    > CRITICAL SECURITY NOTE: Please change your administrator password immediately after your first successful login.

    STORAGE (Recommended Expansion)

    To accommodate large LLM weights, attach an additional gp3 EBS volume (100 GB or larger) during launch. On the first boot, our background script will automatically detect, partition, and mount any blank/unformatted secondary volume to /mnt/models. (Note: To protect your data assets, existing formatted volumes containing data will not be modified or overwritten. Ollama models will persist under /mnt/models/ollama).

    SECURITY AND PRIVACY

    • Network Isolation: HTTPS 443 for WebUI access; Ollama and vLLM are bound strictly to localhost to narrow the attack surface.
    • Access Control: Admin seeded securely from your EC2 Instance ID; ENABLE_SIGNUP=false prevents rogue registrations.
    • OS-Level Hardening: Pre-configured with UFW firewall, fail2ban, auditd, and unattended security upgrades.
    • 100% Data Privacy: 100% on-instance inference. Your prompts, embeddings, and uploaded documents stay strictly within your own VPC.
    • Audit Readiness: Includes a pre-configured CloudWatch Agent configuration template for Nginx audit logs (requires attaching an appropriate IAM role to the EC2 instance).

    REQUIREMENTS

    • Amazon EC2 GPU Instance: Compatible with g4dn, g5, g6, p3, p4d, or p5 families.
    • Security Group Rules: Inbound TCP 443 for WebUI users, TCP 22 for administrative SSH.
    • EC2 Key Pair: Required for SSH access (password-based SSH login is strictly disabled).

    OPTIONAL vLLM SWITCHING & VRAM SAFETY

    On single-GPU instances, running Ollama and vLLM simultaneously will cause a VRAM Out-of-Memory (OOM) crash. To safely switch from the default Ollama engine to the high-throughput vLLM engine, you MUST stop Ollama first to free up GPU memory:

    To safely switch to vLLM:

    sudo systemctl stop ollama

    sudo systemctl start corenova-ai-vllm

    To safely switch back to Ollama:

    sudo systemctl stop corenova-ai-vllm

    sudo systemctl start ollama


    Part of the CoreNova Hardened AMI series. Technical Support & Inquiries: CoreNovaLabs@aipalnet.cn 

    Highlights

    • All-in-One AI Stack: Pre-integrated Open WebUI, vLLM, and Ollama for instant local LLM deployment.
    • Anti-Hijack Security Lock: Public registration is disabled; your unique AWS Instance ID is the default admin key.
    • Auto Elastic Multi-GPU Tuning: Native CUDA optimization with automatic tensor parallelism for multi-card scaling.

    Details

    Delivery method

    Delivery option
    64-bit (x86) Amazon Machine Image (AMI)

    Latest version

    Operating system
    Ubuntu 22.04 LTS

    Deployed on AWS
    New

    Introducing multi-product solutions

    You can now purchase comprehensive solutions tailored to use cases and industries.

    Multi-product solutions

    Features and programs

    Financing for AWS Marketplace purchases

    AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.
    Financing for AWS Marketplace purchases

    Pricing

    Open WebUI, vLLM, Ollama: Secure AI Inference Sandbox & GPU Optimized

     Info
    Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator  to estimate your infrastructure costs.

    Usage costs (7)

     Info
    Dimension
    Cost/hour
    g6.xlarge
    Recommended
    $0.25
    g5.12xlarge
    $0.25
    g6.48xlarge
    $0.25
    p5.48xlarge
    $0.25
    g5.xlarge
    $0.25
    g6.12xlarge
    $0.25
    g4dn.xlarge
    $0.25

    Vendor refund policy

    30-day refund on AWS Marketplace software fees for this product. Email CoreNovaLabs@aipalnet.cn  with your AWS account ID and purchase date. Software fees only; Amazon EC2 charges are not refundable by the seller. We reply within 5 business days. Free trial: no software charges during the 30-day trial.

    How can we make this page better?

    Tell us how we can improve this page, or report an issue with this product.
    Tell us how we can improve this page, or report an issue with this product.

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA) .

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

     Info

    Delivery details

    64-bit (x86) Amazon Machine Image (AMI)

    Amazon Machine Image (AMI)

    An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.

    Version release notes

    Initial release - Enterprise Secure AI Sandbox on Ubuntu 22.04 LTS (HVM).

    Default chat engine: Ollama (GPU-accelerated). Open WebUI is preconfigured to use Ollama on first boot. A small default model is downloaded automatically so you can chat without manual setup.

    Optional second engine: vLLM (OpenAI-compatible API on port 8000). vLLM is NOT started by default. Enable it manually when you need higher throughput or HuggingFace models. On single-GPU instances, do not run Ollama and vLLM at the same time (VRAM limit).

    Stack:

    • Open WebUI (HTTPS 443 via Nginx, self-signed TLS on first boot)
    • Ollama (default, localhost only, models stored under /mnt/models/ollama)
    • vLLM (optional profile, default model Qwen when enabled)
    • CIS-oriented hardened baseline (UFW, fail2ban, auditd, unattended-upgrades)
    • NVIDIA Driver 550 + CUDA 12.4 + Docker + NVIDIA Container Toolkit
    • Admin bootstrap: admin@local.host , password = EC2 Instance ID, public signup disabled
    • Optional EBS data volume auto-mount to /mnt/models
    • CloudWatch Agent config template for audit logs

    First boot typically takes 5 to 10 minutes on a GPU instance (download Ollama container, pull default model, seed admin, start WebUI). See Usage instructions for the step-by-step timeline.

    AMI: ami-0f54023ca05f20159 (us-east-1)

    Additional details

    Usage instructions

    OVERVIEW

    • Default Engine: Ollama. Open WebUI communicates with Ollama locally on the instance.
    • Clean Workspace (BYOM): Strict "Bring Your Own Model" policy. No pre-downloaded weights included. Pull models via WebUI or CLI when ready.
    • vLLM Engine: Optional and turned OFF by default. Use it for OpenAI-compatible API endpoints or heavy HuggingFace models.

    LAUNCH

    1. Subscribe to this product in AWS Marketplace, then click Launch in your target Region.
    2. Instance Type: MUST be an Amazon EC2 GPU instance family (e.g., g4dn, g5, g6, p3, p4, p5). Non-GPU instances will fail.
    3. Storage (Recommended): Attach an additional gp3 EBS volume (100 GB+). On first boot, it safely auto-mounts blank/unformatted secondary volumes to /mnt/models. Existing formatted data volumes will not be modified.
    4. Key Pair: Select your EC2 SSH key pair. Password-based SSH login is strictly disabled.
    5. Security Group: Allow inbound TCP 443 for WebUI users, and inbound TCP 22 strictly from your administrator IP.

    FIRST BOOT TIMELINE (Typical g4dn.xlarge)

    Please allow 5 to 7 minutes after the instance reaches "running" state before logging in.

    • Minute 0-2: SSH becomes available, NVIDIA driver loads, and Docker GPU runtime configures.
    • Minute 2-5: Local Ollama container service launches and prepares the inference environment.
    • Minute 5-7: Open WebUI initializes, admin account is seeded, and Nginx serves HTTPS on port 443.

    If the dashboard fails to load via https://YOUR_PUBLIC_IP/ after 10 minutes, SSH in and run these health checks:

    sudo systemctl status corenova-docker-gpu corenova-bootstrap-admin nginx

    sudo docker ps

    curl -s http://127.0.0.1:11434/api/tags

    WEBUI LOGIN

    1. Open WebUI: Go to https://YOUR_PUBLIC_IP/. Note: The system uses a self-signed TLS certificate. It is safe to click "Advanced" and bypass the browser privacy warning. Replace with your own public SSL certificate later.
    2. First Login (Public signup disabled):
    • Email: <admin@local.host>
    • Password: Your unique EC2 Instance ID (e.g., i-0abcdef123456789f)

    Change your administrator password immediately after your first successful login.

    1. Pulling Models: In WebUI, go to Settings -> Models, enter a model identifier (e.g., llama3.2 or qwen2.5), and click Download. Select it from the top dropdown to chat.

    SSH VERIFICATION

    1. To inspect the infrastructure directly, connect via terminal:

    ssh -i your-key.pem ubuntu@YOUR_PUBLIC_IP

    nvidia-smi

    sudo docker exec corenova-ollama ollama list

    sudo systemctl status corenova-ai-stack nginx

    PULLING MODELS VIA CLI (Optional)

    1. Pull GGUF models directly via command line:

    sudo docker exec corenova-ollama ollama pull <model_name>

    Models persist under /mnt/models/ollama if the secondary data volume is attached.

    OPTIONAL vLLM ENGINE (Advanced)

    1. vLLM is stopped by default. Running Ollama and vLLM simultaneously on single-GPU instances will cause a VRAM Out-of-Memory (OOM) crash.

    To safely switch to vLLM, run these commands sequentially:

    sudo systemctl stop ollama

    sudo systemctl start corenova-ai-vllm

    To switch back to Ollama, run these commands sequentially:

    sudo systemctl stop corenova-ai-vllm

    sudo systemctl start ollama

    When vLLM is running, Open WebUI can interface with it at: http://127.0.0.1:8000/v1

    CLOUDWATCH AUDIT LOGS

    1. To stream Nginx audit records, attach an IAM instance profile with CloudWatch Logs write permissions to this instance.

    SUPPORT AND CONTACT

    Support Email: CoreNovaLabs@aipalnet.cn 

    Support Website: https://aipalnet.cn/en/ 

    When contacting support, please include your AWS Region, AMI ID, EC2 Instance ID, Instance Type, nvidia-smi output, and steps to reproduce the issue.

    Support

    Vendor support

    Email: CoreNovaLabs@aipalnet.cn  Web: https://aipalnet.cn 

    [Support & Refund Policy]

    • Support Hours: Email support on business days (5x8).
    • Refund Policy: 30-day software-fee refunds per AWS Marketplace Standard Contract (SCMP).

    [Support Scope]

    • What is Covered: Technical guidance and documentation for launching, configuring, and verifying this AMI (including SSH access, baseline container services, and local firewall setup).
    • What is NOT Covered: We do not provide 24/7 managed production operations, custom LLM troubleshooting, or application-level code support unless separately agreed.

    [When Contacting Support] To help us diagnose your issue faster, please include the following in your email:

    1. AWS Region & AMI ID
    2. EC2 Instance ID & Product Version
    3. A clear description of the issue along with relevant system or Docker logs.

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Similar products

    Customer reviews

    Ratings and reviews

     Info
    0 ratings
    5 star
    4 star
    3 star
    2 star
    1 star
    0%
    0%
    0%
    0%
    0%
    0 reviews
    No customer reviews yet
    Be the first to review this product . We've partnered with PeerSpot to gather customer feedback. You can share your experience by writing or recording a review, or scheduling a call with a PeerSpot analyst.