- Version Inference-2023.12.26-NVIDIA-535.104-CUDA12.2.2-LLAMA.CPP-Ubu22
- By NI SP - High-End Remote Desktop and HPC
From $0.06/hr to $0.56/hr for software, plus AWS usage fees
The Inference server provides the full infrastructure to run fast inference on GPUs. It includes llama.cpp for inference, the latest CUDA release, and the NVIDIA Container Toolkit for GPU-enabled Docker containers. Leverage the multitude of freely available models and run inference with 8-bit or lower quantized models, which reduces memory requirements and makes inference feasible even on GPUs with limited memory.
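As an illustration, a quantized GGUF model can be run on the GPU through llama.cpp's Python bindings (llama-cpp-python). This is a minimal sketch, not the AMI's documented workflow: it assumes the bindings are installed with CUDA support and that a quantized model file has already been downloaded; the model path below is hypothetical.

```python
# Minimal sketch: run a quantized GGUF model on the GPU with llama-cpp-python.
# Assumes `pip install llama-cpp-python` (built with CUDA support) and a
# downloaded 8-bit quantized model; the file path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="/opt/models/llama-2-7b.Q8_0.gguf",  # hypothetical path to a quantized model
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=4096,       # context window size
)

output = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:"],  # stop generation before the model starts a new question
)
print(output["choices"][0]["text"])
```

The quantization level (Q8_0 here) trades accuracy for memory; lower-bit variants such as Q4_K_M shrink the footprint further and let larger models fit on a single GPU.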
Linux/Unix, Ubuntu 22 - 64-bit Amazon Machine Image (AMI)