Overview
Hugging Face Generative AI Microservices (HUGS Inference) lets you rapidly deploy and scale open-source generative AI models with zero configuration. HUGS ships inference engines optimized for leading hardware, including NVIDIA GPUs, AMD GPUs, Intel GPUs, AWS Inferentia, Habana Gaudi, and Google TPUs. Models are served through the industry-standard OpenAI API, so they plug directly into tools like LangChain and LlamaIndex, and deployments come with enterprise-grade security, control, and compliance features, including SLAs and SOC2. Focus on building applications, not on managing complex deployments.
HUGS Inference supports a wide range of popular open-source LLMs, multimodal models, and embedding models, including Meta-Llama, Mistral, Qwen, Gemma, and more. Choose between the turbo and light container variants to balance performance against resource requirements. Pre-configured microservices are tailored to your hardware, eliminating manual setup, so you can go from concept to production in minutes rather than weeks.
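Because every model is served behind the OpenAI API, any OpenAI-compatible client can talk to a HUGS deployment. The sketch below uses LangChain's ChatOpenAI wrapper; the endpoint address, API key, and model name are illustrative assumptions that will vary per deployment.

```python
# Minimal sketch: calling a HUGS endpoint through LangChain.
# Assumes `pip install langchain-openai` and a HUGS container reachable
# at the hypothetical address below.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",  # assumed HUGS endpoint address
    api_key="-",                          # assumed placeholder; no hosted-API key needed
    model="hugs",                         # assumed model identifier
)

print(llm.invoke("What is zero-configuration deployment?").content)
```

The same endpoint also works with the plain `openai` client or LlamaIndex's OpenAI-compatible connectors, since only the base URL differs from OpenAI's hosted service.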
Keywords: Generative AI, Inference, Microservices, Open-Source Models, LLMs, Multimodal Models, Embedding Models, Zero-Configuration, Optimized Inference, NVIDIA GPUs, AMD GPUs, Intel GPUs, AWS Inferentia, Habana Gaudi, Google TPUs, OpenAI API, LangChain, LlamaIndex, Kubernetes, Scalability, Security, Compliance, SLA, SOC2, Enterprise-Ready, Hugging Face, Text Generation Inference, Transformers, Meta-Llama, Mistral, Qwen, Gemma.
Highlights
- Optimized to run open LLMs on NVIDIA GPUs and on AWS accelerators (Inferentia and Trainium)
- Powered by Hugging Face open-source technologies such as Text Generation Inference
- Built for enterprise and standardized on the OpenAI API, enabling companies to switch from closed models to open models with a single configuration change (see the sketch below)
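To make that single configuration change concrete, the hedged sketch below repoints an existing OpenAI client from the hosted API to a HUGS endpoint; the base URL, key, and model identifier are assumptions for illustration, not documented values.

```python
from openai import OpenAI

# Hosted OpenAI:
# client = OpenAI(api_key="sk-...")
#
# Same application code against a HUGS deployment -- only the client
# configuration changes (both values below are placeholders):
client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed HUGS endpoint address
    api_key="-",                          # assumed: no hosted-API key required
)
# Every subsequent chat.completions / embeddings call stays unchanged.
```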
Details
Pricing
Free trial
| Dimension | Description | Cost/unit/hour |
| --- | --- | --- |
| Hours | Container Hours | $1.00 |
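For example, a single container running around the clock for a 30-day month accrues 24 × 30 = 720 container-hours, or $720 in software charges at the rate above, billed on top of the underlying AWS infrastructure cost.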
Vendor refund policy
If you are not satisfied with your purchase, you may request a refund by providing a detailed explanation and contacting us at api-enterprise@huggingface.co.
Delivery details
Delivery method: open LLMs for NVIDIA GPUs
Supported services:
- Amazon EKS
- Amazon ECS
- Amazon ECS Anywhere
- Amazon EKS Anywhere
Container image
Containers are lightweight, portable execution environments that wrap server application software in a filesystem that includes everything it needs to run. Container applications run on supported container runtimes and orchestration services, such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). Both eliminate the need for you to install and operate your own container orchestration software by managing and scheduling containers on a scalable cluster of virtual machines.
Version release notes
Initial release. Adds support for the HUGS container for NVIDIA GPUs, serving models including Meta Llama, Google Gemma, Mistral, Qwen, and more.
Additional details
Usage instructions
Instructions for deploying the container on Amazon EKS with our Helm chart: https://github.com/huggingface/hugs-helm-chart/blob/0.0.1/aws/README.md
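Once the chart is installed, a quick smoke test can confirm the service is answering. The sketch below is assumption-heavy: it presumes the service has been port-forwarded to localhost:8080 (for example with kubectl port-forward) and that the container exposes a /health route, as Text Generation Inference does.

```python
# Hypothetical post-deployment smoke test for a HUGS container.
# Assumes the service was exposed locally first, e.g.:
#   kubectl port-forward svc/<your-hugs-service> 8080:80
# The /health route is an assumption carried over from Text Generation Inference.
import requests

resp = requests.get("http://localhost:8080/health", timeout=10)
print("service healthy" if resp.ok else f"unexpected status: {resp.status_code}")
```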
Support
Vendor support
Please contact api-enterprise@huggingface.co for support.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.