Overview
Deploy a production-ready, security-hardened Private AI Sandbox on Amazon EC2 GPU instances. Built on Ubuntu 22.04 LTS with a CIS-oriented baseline, this AMI eliminates hours of manual work configuring NVIDIA drivers, CUDA 12.4, Docker, NVIDIA Container Toolkit, Nginx, and Open WebUI.
WHAT YOU GET
CoreNova Enterprise Secure AI Sandbox ships with a decoupled, secure three-layer architecture:
1. Open WebUI (Default User Interface): Browser-based chat, RAG, and administrative controls, served over HTTPS on port 443 via Nginx. (Note: A self-signed TLS certificate is used on first boot, so your browser will show a privacy warning. This is expected for the initial setup; proceed past the warning, and replace the certificate with your own public SSL certificate for production.)
2. Ollama (DEFAULT Inference Engine): GPU-accelerated GGUF model serving, and the default out-of-the-box engine for chat. It follows a strict Bring Your Own Model (BYOM) approach: no model weights are bundled, and you pull any open-source or custom model directly via the WebUI or CLI. Ollama listens on localhost only and is never exposed to the public internet.
3. vLLM (OPTIONAL Second Engine): A high-throughput, OpenAI-compatible API listening on localhost port 8000. It is NOT started by default. Enable it manually when you need to serve Hugging Face Transformers models or require higher serving throughput. On multi-GPU instances, the tensor parallel size is configured automatically.
FIRST BOOT (Typical 5 to 10 minutes on g4dn.xlarge)
After launching the instance, please allow 5 to 10 minutes for initialization before attempting to log in. The instance will automatically:
- Configure the Docker NVIDIA container runtime.
- Initialize the private inference engine environment.
- Seed the secure administrator account and start Open WebUI behind the Nginx reverse proxy.
Access URL: https://YOUR_PUBLIC_IP/ (Direct HTTPS access, no port 3000 exposure)
First Login Credentials (Public signup is strictly disabled):
- Email: admin@local.host
- Password: Your unique EC2 Instance ID (e.g., i-0abcdef123456789f)
> CRITICAL SECURITY NOTE: Please change your administrator password immediately after your first successful login.
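The instance ID used as the initial password is shown in the EC2 console. From an SSH session on the instance, it can also be read from the instance metadata service. The following is a minimal sketch, assuming IMDSv2 is reachable at its standard address; the function names are illustrative and are not part of the AMI.

```shell
#!/bin/sh
# Sketch: read this instance's ID from the EC2 instance metadata service (IMDSv2).
# get_instance_id and is_instance_id are illustrative names, not AMI tooling.

IMDS="http://169.254.169.254/latest"

# Print the instance ID of the machine this runs on.
get_instance_id() {
    # IMDSv2 requires a short-lived session token before any metadata read.
    token=$(curl -s -X PUT "$IMDS/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
    curl -s -H "X-aws-ec2-metadata-token: $token" "$IMDS/meta-data/instance-id"
}

# Return 0 if the argument looks like an EC2 instance ID
# ("i-" followed by 8 or 17 lowercase hex characters).
is_instance_id() {
    printf '%s\n' "$1" | grep -Eq '^i-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# Usage on the instance:
#   id=$(get_instance_id) && is_instance_id "$id" && echo "Initial password: $id"
```

The format check is only a guard against pasting an empty or truncated value into the login form.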
STORAGE (Recommended Expansion)
To accommodate large LLM weights, attach an additional gp3 EBS volume (100 GB or larger) during launch. On the first boot, our background script will automatically detect, partition, and mount any blank/unformatted secondary volume to /mnt/models. (Note: To protect your data assets, existing formatted volumes containing data will not be modified or overwritten. Ollama models will persist under /mnt/models/ollama).
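Before pulling large weights, it can help to confirm that the data volume is mounted and has room. The following is a small sketch using standard Linux tools; free_gb is a hypothetical helper, not something shipped with the AMI.

```shell
#!/bin/sh
# Sketch: report free space on the filesystem that holds a given path.
# free_gb is an illustrative helper, not a tool shipped with the AMI.

# Print whole gigabytes available on the filesystem containing the path.
free_gb() {
    # -P forces one POSIX-format line per filesystem;
    # column 4 is the available space in 1K blocks.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Usage on the instance:
#   mountpoint -q /mnt/models && echo "models volume is mounted"
#   echo "Free space: $(free_gb /mnt/models) GB"
```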
SECURITY AND PRIVACY
- Network Isolation: HTTPS 443 for WebUI access; Ollama and vLLM are bound strictly to localhost to narrow the attack surface.
- Access Control: Admin seeded securely from your EC2 Instance ID; ENABLE_SIGNUP=false prevents rogue registrations.
- OS-Level Hardening: Pre-configured with UFW firewall, fail2ban, auditd, and unattended security upgrades.
- Data Privacy: All inference runs on the instance. Your prompts, embeddings, and uploaded documents stay within your own VPC.
- Audit Readiness: Includes a pre-configured CloudWatch Agent configuration template for Nginx audit logs (requires attaching an appropriate IAM role to the EC2 instance).
REQUIREMENTS
- Amazon EC2 GPU Instance: Compatible with g4dn, g5, g6, p3, p4d, or p5 families.
- Security Group Rules: Inbound TCP 443 for WebUI users, TCP 22 for administrative SSH.
- EC2 Key Pair: Required for SSH access (password-based SSH login is strictly disabled).
OPTIONAL vLLM SWITCHING & VRAM SAFETY
On single-GPU instances, running Ollama and vLLM simultaneously will exhaust GPU memory and cause a VRAM Out-of-Memory (OOM) crash. Before starting vLLM, you MUST stop the default Ollama engine to free GPU memory, and do the reverse when switching back.
To safely switch to vLLM:
sudo systemctl stop ollama
sudo systemctl start corenova-ai-vllm
To safely switch back to Ollama:
sudo systemctl stop corenova-ai-vllm
sudo systemctl start ollama
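The stop-then-start order above can be wrapped in one helper so the two engines are never started together by accident. This is a sketch only: the unit names are taken from the commands above, while switch_engine, run, and the DRY_RUN variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch: switch inference engines, always stopping one before starting the other.
# Unit names come from the commands above; switch_engine and DRY_RUN are illustrative.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# switch_engine vllm|ollama
switch_engine() {
    case "$1" in
        vllm)   stop_unit=ollama;           start_unit=corenova-ai-vllm ;;
        ollama) stop_unit=corenova-ai-vllm; start_unit=ollama ;;
        *)      echo "usage: switch_engine vllm|ollama" >&2; return 2 ;;
    esac
    # Stop first so GPU memory is released before the other engine loads.
    run sudo systemctl stop "$stop_unit" &&
        run sudo systemctl start "$start_unit"
}

# Usage:
#   DRY_RUN=1 switch_engine vllm   # print the commands without running them
#   switch_engine ollama           # switch back to the default engine
```

Because the start command is chained with `&&`, the second engine is not started if stopping the first one fails.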
Part of the CoreNova Hardened AMI series. Technical Support & Inquiries: CoreNovaLabs@aipalnet.cn
Highlights
- All-in-One AI Stack: Pre-integrated Open WebUI, vLLM, and Ollama for instant local LLM deployment.
- Anti-Hijack Security Lock: Public registration is disabled; your unique AWS Instance ID is the default admin key.
- Auto Elastic Multi-GPU Tuning: Native CUDA optimization with automatic tensor parallelism for multi-card scaling.
Pricing
| Dimension | Cost/hour |
|---|---|
| g6.xlarge (Recommended) | $0.25 |
| g5.12xlarge | $0.25 |
| g6.48xlarge | $0.25 |
| p5.48xlarge | $0.25 |
| g5.xlarge | $0.25 |
| g6.12xlarge | $0.25 |
| g4dn.xlarge | $0.25 |
Vendor refund policy
30-day refund on AWS Marketplace software fees for this product. Email CoreNovaLabs@aipalnet.cn with your AWS account ID and purchase date. Software fees only; Amazon EC2 charges are not refundable by the seller. We reply within 5 business days. Free trial: no software charges during the 30-day trial.
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
Initial release - Enterprise Secure AI Sandbox on Ubuntu 22.04 LTS (HVM).
Default chat engine: Ollama (GPU-accelerated). Open WebUI is preconfigured to use Ollama on first boot. A small default model is downloaded automatically so you can chat without manual setup.
Optional second engine: vLLM (OpenAI-compatible API on port 8000). vLLM is NOT started by default. Enable it manually when you need higher throughput or HuggingFace models. On single-GPU instances, do not run Ollama and vLLM at the same time (VRAM limit).
Stack:
- Open WebUI (HTTPS 443 via Nginx, self-signed TLS on first boot)
- Ollama (default, localhost only, models stored under /mnt/models/ollama)
- vLLM (optional profile, default model Qwen when enabled)
- CIS-oriented hardened baseline (UFW, fail2ban, auditd, unattended-upgrades)
- NVIDIA Driver 550 + CUDA 12.4 + Docker + NVIDIA Container Toolkit
- Admin bootstrap: admin@local.host , password = EC2 Instance ID, public signup disabled
- Optional EBS data volume auto-mount to /mnt/models
- CloudWatch Agent config template for audit logs
First boot typically takes 5 to 10 minutes on a GPU instance (download Ollama container, pull default model, seed admin, start WebUI). See Usage instructions for the step-by-step timeline.
AMI: ami-0f54023ca05f20159 (us-east-1)
Additional details
Usage instructions
OVERVIEW
- Default Engine: Ollama. Open WebUI communicates with Ollama locally on the instance.
- Clean Workspace (BYOM): Strict "Bring Your Own Model" policy. No pre-downloaded weights included. Pull models via WebUI or CLI when ready.
- vLLM Engine: Optional and turned OFF by default. Use it for OpenAI-compatible API endpoints or heavy HuggingFace models.
LAUNCH
- Subscribe to this product in AWS Marketplace, then click Launch in your target Region.
- Instance Type: MUST be an Amazon EC2 GPU instance family (e.g., g4dn, g5, g6, p3, p4, p5). Non-GPU instances will fail.
- Storage (Recommended): Attach an additional gp3 EBS volume (100 GB+). On first boot, it safely auto-mounts blank/unformatted secondary volumes to /mnt/models. Existing formatted data volumes will not be modified.
- Key Pair: Select your EC2 SSH key pair. Password-based SSH login is strictly disabled.
- Security Group: Allow inbound TCP 443 for WebUI users, and inbound TCP 22 strictly from your administrator IP.
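The security group rules described in the list above could be created with the AWS CLI instead of the console. The sketch below only prints the commands so they can be reviewed first; sg_rule_commands is a hypothetical helper, and the group ID and CIDR ranges in the usage line are placeholders to replace with your own values.

```shell
#!/bin/sh
# Sketch: print AWS CLI commands that open HTTPS to WebUI users
# and SSH to a single administrator address.
# sg_rule_commands is illustrative; IDs and CIDRs below are placeholders.

# sg_rule_commands GROUP_ID WEBUI_CIDR ADMIN_CIDR
sg_rule_commands() {
    base="aws ec2 authorize-security-group-ingress --group-id $1 --protocol tcp"
    echo "$base --port 443 --cidr $2"   # WebUI over HTTPS
    echo "$base --port 22 --cidr $3"    # SSH from the administrator address only
}

# Usage: review the printed commands, then run them from a machine
# with AWS CLI credentials:
#   sg_rule_commands sg-0123456789abcdef0 198.51.100.0/24 203.0.113.10/32
```

A /32 CIDR for port 22 restricts SSH to exactly one address, which matches the "strictly from your administrator IP" guidance.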
FIRST BOOT TIMELINE (Typical g4dn.xlarge)
Please allow 5 to 10 minutes after the instance reaches the "running" state before logging in. A typical run completes in about 7 minutes:
- Minute 0-2: SSH becomes available, NVIDIA driver loads, and Docker GPU runtime configures.
- Minute 2-5: Local Ollama container service launches and prepares the inference environment.
- Minute 5-7: Open WebUI initializes, admin account is seeded, and Nginx serves HTTPS on port 443.
If the dashboard fails to load via https://YOUR_PUBLIC_IP/ after 10 minutes, SSH in and run these health checks:
sudo systemctl status corenova-docker-gpu corenova-bootstrap-admin nginx
sudo docker ps
curl -s http://127.0.0.1:11434/api/tags
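Rather than rerunning the checks by hand during first boot, they can be polled until they succeed. This is a sketch of a generic retry helper; retry_until is a hypothetical name, and the usage lines assume that Nginx answers on the instance's loopback address with the self-signed certificate (hence curl's -k flag).

```shell
#!/bin/sh
# Sketch: poll until a command succeeds, for waiting out first-boot initialization.
# retry_until is an illustrative helper, not part of the AMI.

# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Succeed as soon as the command does.
        if "$@"; then
            return 0
        fi
        # Sleep between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Usage on the instance (up to 10 minutes, checking every 10 seconds):
#   retry_until 60 10 curl -ksf -o /dev/null https://127.0.0.1/ && echo "WebUI is up"
#   retry_until 60 10 curl -sf -o /dev/null http://127.0.0.1:11434/api/tags && echo "Ollama is up"
```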
WEBUI LOGIN
- Open WebUI: Go to https://YOUR_PUBLIC_IP/. Note: The system uses a self-signed TLS certificate on first boot, so the browser shows a privacy warning. This is expected; click "Advanced" and proceed. Replace the certificate with your own public SSL certificate later.
- First Login (Public signup disabled):
- Email: admin@local.host
- Password: Your unique EC2 Instance ID (e.g., i-0abcdef123456789f)
Change your administrator password immediately after your first successful login.
- Pulling Models: In WebUI, go to Settings -> Models, enter a model identifier (e.g., llama3.2 or qwen2.5), and click Download. Select it from the top dropdown to chat.
SSH VERIFICATION
- To inspect the infrastructure directly, connect via terminal:
ssh -i your-key.pem ubuntu@YOUR_PUBLIC_IP
nvidia-smi
sudo docker exec corenova-ollama ollama list
sudo systemctl status corenova-ai-stack nginx
PULLING MODELS VIA CLI (Optional)
- Pull GGUF models directly via command line:
sudo docker exec corenova-ollama ollama pull <model_name>
Models persist under /mnt/models/ollama if the secondary data volume is attached.
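To fetch more than one model in a single pass, the pull command above can be looped. The following is a sketch: the container name comes from the command above, the model names are the examples given in the WebUI section, and pull_models and DRY_RUN are hypothetical names introduced here.

```shell
#!/bin/sh
# Sketch: pull several models in one pass through the Ollama container.
# The container name comes from the command above; pull_models and DRY_RUN are illustrative.

# pull_models MODEL [MODEL...]
pull_models() {
    for model in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of running it.
            echo "sudo docker exec corenova-ollama ollama pull $model"
        else
            # Stop at the first failed pull.
            sudo docker exec corenova-ollama ollama pull "$model" || return 1
        fi
    done
}

# Usage:
#   DRY_RUN=1 pull_models llama3.2 qwen2.5   # preview the commands
#   pull_models llama3.2                     # pull for real
```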
OPTIONAL vLLM ENGINE (Advanced)
- vLLM is stopped by default. Running Ollama and vLLM simultaneously on single-GPU instances will cause a VRAM Out-of-Memory (OOM) crash.
To safely switch to vLLM, run these commands sequentially:
sudo systemctl stop ollama
sudo systemctl start corenova-ai-vllm
To switch back to Ollama, run these commands sequentially:
sudo systemctl stop corenova-ai-vllm
sudo systemctl start ollama
When vLLM is running, Open WebUI can interface with it at: http://127.0.0.1:8000/v1
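A quick way to confirm the endpoint responds is to list the served model and send one chat request from an SSH session. This sketch relies on the /v1/models and /v1/chat/completions routes of vLLM's OpenAI-compatible server; first_model_id is a hypothetical helper that extracts the model ID without requiring jq.

```shell
#!/bin/sh
# Sketch: find the model ID served by vLLM through its OpenAI-compatible API.
# first_model_id is an illustrative helper; it avoids a jq dependency.

# Read a /v1/models JSON response on stdin and print the first model "id".
first_model_id() {
    grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' | head -n 1 |
        sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage on the instance while vLLM is running:
#   model=$(curl -s http://127.0.0.1:8000/v1/models | first_model_id)
#   curl -s http://127.0.0.1:8000/v1/chat/completions \
#       -H "Content-Type: application/json" \
#       -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}"
```

The text-matching approach is deliberately simple; for anything beyond a smoke test, a real JSON parser is the safer choice.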
CLOUDWATCH AUDIT LOGS
- To stream Nginx audit records, attach an IAM instance profile with CloudWatch Logs write permissions to this instance.
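One way to grant that permission is to attach the AWS managed policy CloudWatchAgentServerPolicy to a role and associate the role's instance profile with the instance. The sketch below only prints the AWS CLI commands for review. It assumes the role and instance profile already exist and that the role is already added to the profile; cloudwatch_role_commands and the names in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch: print AWS CLI commands that give the instance CloudWatch Logs write access.
# cloudwatch_role_commands is illustrative; role, profile, and instance names are placeholders.

# cloudwatch_role_commands ROLE_NAME PROFILE_NAME INSTANCE_ID
cloudwatch_role_commands() {
    # CloudWatchAgentServerPolicy is the AWS managed policy for the CloudWatch agent.
    echo "aws iam attach-role-policy --role-name $1" \
         "--policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
    # Bind the instance profile that contains the role to the running instance.
    echo "aws ec2 associate-iam-instance-profile --instance-id $3" \
         "--iam-instance-profile Name=$2"
}

# Usage: review the printed commands, then run them from a machine
# with AWS CLI credentials:
#   cloudwatch_role_commands MyLogsRole MyLogsProfile i-0abcdef123456789f
```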
SUPPORT AND CONTACT
Support Email: CoreNovaLabs@aipalnet.cn
Support Website: https://aipalnet.cn/en/
When contacting support, please include your AWS Region, AMI ID, EC2 Instance ID, Instance Type, nvidia-smi output, and steps to reproduce the issue.
Support
Vendor support
Email: CoreNovaLabs@aipalnet.cn
Web: https://aipalnet.cn
[Support & Refund Policy]
- Support Hours: Email support on business days (5x8).
- Refund Policy: 30-day software-fee refunds per the Standard Contract for AWS Marketplace (SCMP).
[Support Scope]
- What is Covered: Technical guidance and documentation for launching, configuring, and verifying this AMI (including SSH access, baseline container services, and local firewall setup).
- What is NOT Covered: We do not provide 24/7 managed production operations, custom LLM troubleshooting, or application-level code support unless separately agreed.
[When Contacting Support] To help us diagnose your issue faster, please include the following in your email:
- AWS Region & AMI ID
- EC2 Instance ID & Product Version
- A clear description of the issue along with relevant system or Docker logs.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.