
    Intel® AI for Enterprise Inference - Qwen3-14B

    Sold by: Intel 
    Deployed on AWS
    This deployment package enables seamless hosting of the Qwen/Qwen3-14B language model on Intel® Xeon® processors using the vLLM CPU-optimized Docker image. Designed for efficient inference in CPU-only environments, this solution leverages vLLM's lightweight architecture to deliver fast and scalable performance without requiring GPU acceleration. Ideal for enterprise-grade NLP tasks, it offers a cost-effective and accessible way to run large language models on Intel-powered infrastructure.

    Overview

    This solution enables high-performance deployment of the Qwen/Qwen3-14B model on Intel® Xeon® 6 processors using a vLLM CPU-optimized Docker image. Qwen3 is the latest generation in the Qwen LLM series, featuring both dense and Mixture-of-Experts (MoE) models. It introduces seamless switching between reasoning-intensive and general-purpose dialogue modes, significantly improving performance in math, coding, and logical tasks. Qwen3 also excels in human alignment, multilingual support (100+ languages), and agent-based tool integration, making it one of the most versatile open-source models available.

    The deployment leverages vLLM, a high-throughput inference engine optimized for CPU environments. vLLM uses PagedAttention, Tensor Parallelism, and PyTorch 2.0 to deliver efficient memory usage and low-latency inference. The Docker image is tuned for Intel® Xeon® 6 processors, which feature advanced architectural enhancements including Efficient-cores (E-cores) and Performance-cores (P-cores), support for Intel® Advanced Matrix Extensions (Intel® AMX), and Intel® Deep Learning Boost (DL Boost). These features accelerate AI workloads and enable scalable deployment of LLMs in cloud, edge, and enterprise environments.
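    Because Intel® AMX is what the CPU-optimized image relies on for accelerated matrix operations, it can be worth verifying that a provisioned instance actually exposes the AMX instruction set. A quick check on Linux (the amx_* flags appear in /proc/cpuinfo on AMX-capable Xeon processors; expect amx_bf16, amx_int8, and amx_tile on Xeon 6):

    # List AMX-related CPU flags exposed by the processor
    $ grep -o 'amx[a-z0-9_]*' /proc/cpuinfo | sort -u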

    This containerized solution provides a plug-and-play experience for deploying Qwen3-14B on CPU-only infrastructure, eliminating the need for GPUs while maintaining competitive performance. It supports RESTful APIs, batch inference, and integration into existing ML pipelines, making it ideal for developers, researchers, and enterprises seeking cost-effective, scalable, and production-ready LLM deployment.

    Highlights

    • Run Qwen3-14B on Intel® Xeon® 6: Deploy this Hugging Face instruction-tuned LLM efficiently on CPU-only infrastructure using Intel® AMX and DL Boost.
    • vLLM-Powered CPU Inference: Use vLLM with PyTorch 2.0 and PagedAttention for fast, scalable inference - no GPU required.

    Details

    Sold by: Intel

    Delivery method: CloudFormation Template (CFT)

    Delivery option: Production-grade LLM inference service via CloudFormation.

    Latest version

    Operating system: Ubuntu 22.04

    Deployed on AWS



    Pricing

    Intel® AI for Enterprise Inference - Qwen3-14B

    Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled at any time.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Usage costs (12)

    Dimension            Cost/hour
    r8i.metal-96xl       $0.00
    r8i.96xlarge         $0.00
    r8i.metal-48xl       $0.00
    r8i.12xlarge         $0.00
    r8i-flex.12xlarge    $0.00
    r8i.32xlarge         $0.00
    r8i.48xlarge         $0.00
    r8i.16xlarge         $0.00
    r8i-flex.16xlarge    $0.00
    r8i-flex.8xlarge     $0.00

    Vendor refund policy

    NA


    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information


    Delivery details

    Production-grade LLM inference service via CloudFormation.


    CloudFormation Template (CFT)

    AWS CloudFormation templates are JSON or YAML-formatted text files that simplify provisioning and management on AWS. The templates describe the service or application architecture you want to deploy, and AWS CloudFormation uses those templates to provision and configure the required services (such as Amazon EC2 instances or Amazon RDS DB instances). The deployed application and associated resources are called a "stack."

    Version release notes

    New Features:

    1. Model Deployment: Integrated support for deploying the Qwen/Qwen3-14B model from Hugging Face, the latest generation in the Qwen LLM series, featuring both dense and Mixture-of-Experts (MoE) models.

    2. Intel® Xeon® 6 Optimization: Enhanced performance on Intel® Xeon® 6 processors using Intel® AMX, DL Boost, and AVX-512 for accelerated CPU inference.

    3. vLLM Inference Engine: Utilizes vLLM with PyTorch 2.0, PagedAttention, and Tensor Parallelism for efficient memory usage and low-latency inference.

    4. Containerized Setup: Docker-based deployment with REST API support for easy integration into existing ML workflows and backend services.

    Additional details

    Usage instructions

    This product uses an AWS CloudFormation template to deploy the Qwen/Qwen3-14B model on an EC2 instance using a vLLM CPU-optimized Docker image. Follow the steps below to ensure a successful setup:

    1. Prerequisites: Before launching the CloudFormation stack, ensure the following resources are available in your AWS account:

    1a. Subnet ID and Security Group ID: Required for provisioning the EC2 instance within your VPC. Ensure the Security Group has appropriate inbound rules configured to allow traffic on port 8000 (TCP) from your IP or trusted sources; this is necessary to access the model endpoint. A minimal example of adding such a rule with the AWS CLI follows below.

    1b. Hugging Face Access Token: Required to authenticate and pull the model from Hugging Face Hub. You can generate a token from your Hugging Face account at https://huggingface.co/settings/tokens .
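    The following is a minimal sketch of opening port 8000 on an existing security group using the AWS CLI; the group ID and CIDR shown are placeholders, and you should restrict the CIDR to your own IP rather than opening the port broadly:

    # Allow inbound TCP on port 8000 from a single trusted IP (placeholder values)
    $ aws ec2 authorize-security-group-ingress \
        --group-id sg-0123456789abcdef0 \
        --protocol tcp \
        --port 8000 \
        --cidr 203.0.113.10/32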

    2. Launch the CloudFormation Stack: Subscribe to the product via AWS Marketplace and proceed to launch the CloudFormation template. Enter the required parameters: SubnetId, SecurityGroupId, HuggingFaceToken. Click Submit to deploy the stack.
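    If you prefer scripting the launch over the console workflow, the stack can also be created with the AWS CLI. This is a sketch only: the stack name is arbitrary, and the template URL shown is a placeholder for the template location provided by AWS Marketplace during fulfillment.

    $ aws cloudformation create-stack \
        --stack-name qwen3-14b-inference \
        --template-url "https://<MARKETPLACE_TEMPLATE_URL>" \
        --parameters \
            ParameterKey=SubnetId,ParameterValue=subnet-0123456789abcdef0 \
            ParameterKey=SecurityGroupId,ParameterValue=sg-0123456789abcdef0 \
            ParameterKey=HuggingFaceToken,ParameterValue=<YOUR_HF_TOKEN>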

    3. Access the Model Endpoint: Once the CloudFormation stack reaches the CREATE_COMPLETE state, navigate to the EC2 Console, locate the instance created by the stack, and copy its Public IP address. The model server will be accessible on port 8000. Because the template pulls the vLLM CPU-optimized Docker image and loads the model, the inference service may take a few minutes to fully initialize - please allow some time before sending requests.
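    These steps can also be performed from the CLI. A sketch, assuming the stack name used above and that the vLLM OpenAI-compatible server exposes its usual /health route on port 8000:

    # Wait for the stack to finish creating
    $ aws cloudformation wait stack-create-complete --stack-name qwen3-14b-inference

    # Look up the instance's public IP via its CloudFormation stack tag
    $ aws ec2 describe-instances \
        --filters "Name=tag:aws:cloudformation:stack-name,Values=qwen3-14b-inference" \
                  "Name=instance-state-name,Values=running" \
        --query 'Reservations[].Instances[].PublicIpAddress' --output text

    # Poll until the inference server responds with HTTP 200
    $ until curl -sf "http://<EC2_PUBLIC_IP>:8000/health"; do sleep 15; done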

    4. Query the Model: You can interact with the model using a simple HTTP POST request.

    Example using curl:

    $ curl -X POST "http://<EC2_PUBLIC_IP>:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{ "model": "Qwen/Qwen3-14B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'

    Note: Replace <EC2_PUBLIC_IP> with the actual public IP of your EC2 instance.
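    Because the endpoint follows the OpenAI-compatible chat-completions response schema that vLLM serves, the generated text can be extracted directly from the JSON response. A sketch using jq, assuming jq is installed on the client machine:

    $ curl -s -X POST "http://<EC2_PUBLIC_IP>:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{ "model": "Qwen/Qwen3-14B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' \
        | jq -r '.choices[0].message.content'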

    Support

    Vendor support

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.


    Customer reviews

    Ratings and reviews

    0 ratings (0 AWS reviews)

    5 star: 0%
    4 star: 0%
    3 star: 0%
    2 star: 0%
    1 star: 0%

    No customer reviews yet
    Be the first to review this product. We've partnered with PeerSpot to gather customer feedback. You can share your experience by writing or recording a review, or scheduling a call with a PeerSpot analyst.