Overview
Enterprise Private LLM Server with Ollama is a preconfigured Amazon Linux 2023 environment designed to accelerate Generative AI, machine learning, and data science workloads on AWS. The solution combines Ollama for local Large Language Model (LLM) deployment, GPU acceleration through NVIDIA CUDA, modern development tools, and industry-leading AI frameworks in a single ready-to-use platform.
Organizations can rapidly build, test, and deploy private AI assistants, Retrieval-Augmented Generation (RAG) applications, document intelligence solutions, and machine learning models without spending time configuring operating systems, drivers, development environments, and AI dependencies.
The platform is optimized for AWS deployments and provides a secure, scalable environment for developers, researchers, data scientists, and enterprise AI teams.
Key Features
Private LLM Deployment with Ollama: Run open-source Large Language Models directly within your AWS environment using Ollama, enabling private AI workloads while maintaining control over data and infrastructure.
GPU-Accelerated AI Environment: The preconfigured NVIDIA driver, CUDA Toolkit, and cuDNN support accelerated model inference, fine-tuning, and machine learning workloads on AWS GPU instances.
Generative AI Development Stack: Includes Ollama, Hugging Face Transformers, LangChain, LlamaIndex, FAISS, LoRA/PEFT, FastAPI, MLflow, MONAI, and Weights & Biases for developing modern AI applications.
Retrieval-Augmented Generation (RAG) Ready: Build enterprise search, document intelligence, knowledge management, and conversational AI solutions using vector search and retrieval frameworks.
Development and Collaboration Tools: Includes Visual Studio Code, PyCharm Community Edition, RStudio Desktop, Jupyter Notebook, and JupyterLab for end-to-end AI development.
Containerized AI Workloads: Docker and Docker Compose are preconfigured for deploying scalable AI applications and microservices.
AWS-Native Integration: Compatible with Amazon Bedrock, Amazon SageMaker, Amazon OpenSearch Service, Amazon S3, and other AWS services commonly used in AI and machine learning architectures.
Secure Remote Access: Amazon NICE DCV provides high-performance remote desktop access for development, experimentation, and visualization workloads.
Technical Details
Operating System: Amazon Linux 2023
Remote Access: Amazon NICE DCV
Browsers and Utilities: Google Chrome, Git, AWS CLI, 7-Zip
Programming Languages: Python 3.x, R
Development Tools: Visual Studio Code, PyCharm Community Edition, RStudio Desktop
Notebook Environments: Jupyter Notebook, JupyterLab
Containerization: Docker, Docker Compose
AI and Generative AI Frameworks: Ollama, Hugging Face Transformers, LangChain, LlamaIndex, FAISS, LoRA/PEFT, FastAPI, MLflow, MONAI, Weights & Biases
Machine Learning and Data Science Libraries: PyTorch, TensorFlow, Scikit-learn, PySpark, Dask, Vowpal Wabbit
Productivity Tools: LibreOffice
GPU Software Stack: NVIDIA Driver, CUDA Toolkit, cuDNN
Highlights
- Private LLM Deployment with Ollama: Run open-source Large Language Models securely within your AWS environment using Ollama, enabling private AI workloads without relying on external AI services.
- GPU-Ready Generative AI Platform: Preconfigured with NVIDIA drivers, CUDA Toolkit, cuDNN, and leading AI frameworks to accelerate model inference, experimentation, and AI application development.
- Build AI Applications in Minutes: Includes JupyterLab, VS Code, Docker, LangChain, LlamaIndex, FAISS, and machine learning frameworks, allowing teams to rapidly develop RAG solutions, AI assistants, and data science workloads.
Pricing
Vendor refund policy
NA
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
NA
Additional details
Usage instructions
Quick Usage Summary
- Subscribe to the AWS Marketplace product and launch an EC2 instance.
- For optimal AI and LLM performance, use a GPU-enabled instance such as g4dn.xlarge, g4dn.2xlarge, g4dn.4xlarge, g5.xlarge, g5.2xlarge, or larger.
- Configure a root EBS volume of at least 100 GB; 200 GB or more is recommended for AI models, datasets, and development workloads.
- Connect to the instance using Amazon NICE DCV or SSH.
- Verify the installed AI environment:
python3 --version
docker --version
ollama --version
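The verification commands above can also be scripted, which is convenient when checking several instances. The following is a minimal sketch, not part of the product: it assumes only that the three tools are expected on the `PATH`, and the helper names `check_tool` and `check_environment` are hypothetical.

```python
import shutil
import subprocess

# The same three commands listed in the verification step.
CHECKS = [
    ["python3", "--version"],
    ["docker", "--version"],
    ["ollama", "--version"],
]


def check_tool(command):
    """Return (ok, message) for one version command."""
    executable = command[0]
    # Skip the subprocess call entirely if the executable is not on PATH.
    if shutil.which(executable) is None:
        return False, f"{executable}: not found on PATH"
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, f"{executable}: {exc}"
    # Some tools print their version to stderr, so fall back to it.
    output = (result.stdout or result.stderr).strip()
    return result.returncode == 0, output


def check_environment(checks=CHECKS):
    """Run every check and map each tool name to its (ok, message) pair."""
    return {command[0]: check_tool(command) for command in checks}


if __name__ == "__main__":
    for tool, (ok, message) in check_environment().items():
        print(f"[{'OK' if ok else 'MISSING'}] {message}")
```

Any line reported as MISSING points to a tool that is not installed or not on the current user's `PATH`.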
Connect via NICE DCV
- Open a browser and navigate to:
https://PUBLIC_DNS_NAME:8443
- Log in using your configured Linux user credentials.
- Access the Amazon Linux desktop environment.
- Launch Visual Studio Code, JupyterLab, RStudio Desktop, Google Chrome, or Terminal from the desktop.
Note: Ensure TCP port 8443 is allowed in the EC2 Security Group.
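If the DCV page does not load, a quick TCP check from your workstation can help tell a closed Security Group port from a browser or certificate problem. This is a rough sketch using only the Python standard library; `port_is_open` is a hypothetical helper, and `PUBLIC_DNS_NAME` remains the placeholder from the URL above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example usage (replace the placeholder with the instance's public DNS name):
#   port_is_open("PUBLIC_DNS_NAME", 8443)
```

A `False` result only shows that the port could not be reached from where the script ran; it does not say whether the Security Group, a network ACL, or the DCV service itself is the cause.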
Using Ollama
- Open a terminal window.
- Verify the Ollama installation:
ollama --version
- List available models:
ollama list
- Run a model:
ollama run llama3
- Download additional models:
ollama pull mistral
ollama pull qwen3
ollama pull deepseek-r1
- Access the Ollama API endpoint locally:
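The listing does not show the endpoint address. As a sketch, assuming this image keeps Ollama's default listen address of `http://localhost:11434` and that the `llama3` model from the earlier step has been pulled, a request to Ollama's `/api/generate` endpoint could look like this. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt, model="llama3", base_url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3", base_url=OLLAMA_URL, timeout=120):
    """Send the prompt to Ollama and return the generated text."""
    request = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the full completion arrives in the "response" field.
    return body["response"]


# Example usage on the instance:
#   print(generate("Summarize what an AMI is in one sentence."))
```

Setting `"stream": False` keeps the example short by returning a single JSON object; streaming responses would need to be read line by line instead.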
Using JupyterLab
- Launch JupyterLab from the desktop menu.
- Create a new notebook.
- Import and use preinstalled AI and machine learning libraries.
- Develop AI applications, machine learning workflows, and data science projects.
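A first notebook cell can confirm which libraries are importable from the notebook's kernel before any real work starts. The sketch below is illustrative only: the import names are assumptions based on each project's usual Python module name, and the listing does not state which Python environment the packages are installed in.

```python
import importlib.util

# Display name -> assumed import name (the usual module name for each project).
LIBRARIES = {
    "PyTorch": "torch",
    "TensorFlow": "tensorflow",
    "Scikit-learn": "sklearn",
    "Hugging Face Transformers": "transformers",
    "LangChain": "langchain",
    "LlamaIndex": "llama_index",
    "FAISS": "faiss",
    "PEFT": "peft",
    "FastAPI": "fastapi",
    "MLflow": "mlflow",
    "MONAI": "monai",
    "Weights & Biases": "wandb",
}


def installed_libraries(libraries=LIBRARIES):
    """Map each display name to True if its module can be located (without importing it)."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in libraries.items()
    }


def cuda_available():
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily because loading it is slow

    return torch.cuda.is_available()


# Example usage in a notebook cell:
#   for name, found in installed_libraries().items():
#       print(f"{name}: {'available' if found else 'not found'}")
#   print("CUDA available:", cuda_available())
```

If a library shows as not found, the notebook kernel may be using a different Python environment than the one the packages were installed into.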
Using Development Tools
- Launch Visual Studio Code, PyCharm Community Edition, or RStudio Desktop.
- Create or open existing projects.
- Build AI assistants, RAG applications, machine learning models, APIs, and analytics solutions.
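To make the RAG workflow more concrete, here is a deliberately small retrieval sketch. It is not the product's implementation: it uses a brute-force cosine-similarity search in plain Python as a stand-in for a vector index such as FAISS, and it assumes Ollama's default address plus an embedding model (`nomic-embed-text` here) that you have pulled yourself. All function names are hypothetical.

```python
import json
import math
import urllib.request


def ollama_embed(text, model="nomic-embed-text", base_url="http://localhost:11434"):
    """Embed text via Ollama's /api/embeddings endpoint (model must already be pulled)."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(question, documents, embed=ollama_embed, top_k=2):
    """Return the top_k documents most similar to the question."""
    question_vector = embed(question)
    scored = [(cosine_similarity(question_vector, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question, context_docs):
    """Combine retrieved passages and the question into one prompt string."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The resulting prompt can be passed to any model served by Ollama. For more than a few hundred documents, embedding every document on each query becomes wasteful, which is where a persistent index such as FAISS is the more natural fit.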
Using Docker
- Verify Docker installation:
docker --version
- Start containerized applications:
docker compose up -d
- Deploy scalable AI services and development environments.
Preinstalled AI Frameworks
Ollama, Hugging Face Transformers, LangChain, LlamaIndex, FAISS, PyTorch, TensorFlow, FastAPI, MLflow, MONAI, LoRA/PEFT
AWS Service Integration
This environment can be integrated with:
Amazon Bedrock, Amazon SageMaker, Amazon OpenSearch Service, Amazon S3, AWS IAM
These services can be used together with Ollama and the preinstalled AI frameworks to build enterprise-grade Generative AI solutions on AWS.
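As one possible illustration of such an integration, the sketch below stores a prompt and its model response as a JSON object in Amazon S3. It rests on several assumptions: the `boto3` SDK is not listed among the preinstalled packages and may need to be installed, the instance needs an IAM role that allows `s3:PutObject`, and the bucket name and key in the usage comment are hypothetical.

```python
import json


def save_response_to_s3(s3_client, bucket, key, prompt, response_text):
    """Store one prompt/response pair as a JSON object and return its S3 URI."""
    record = {"prompt": prompt, "response": response_text}
    # put_object is the standard S3 client call for uploading a single object.
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(record).encode("utf-8"),
        ContentType="application/json",
    )
    return f"s3://{bucket}/{key}"


# Example usage (bucket and key are placeholders):
#   import boto3
#   s3 = boto3.client("s3")
#   save_response_to_s3(s3, "my-example-bucket", "responses/001.json",
#                       "What is an AMI?", "An AMI is ...")
```

Passing the client in as an argument keeps credentials and region configuration outside the function, and makes it straightforward to substitute a stub when testing without AWS access.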
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.