Overview
The model's reasoning capabilities can be configured through a flag in the chat template. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.
The model employs a hybrid Mixture-of-Experts (MoE) architecture, consisting of 23 Mamba-2 and MoE layers, along with 6 Attention layers. Each MoE layer includes 128 experts plus 1 shared expert, with 5 experts activated per token. The model has 3.5B active parameters and 30B parameters in total.
The supported languages include: English, German, Spanish, French, Italian, and Japanese. Improved using Qwen.
This model is ready for commercial use.
What is Nemotron?
NVIDIA Nemotron™ is a family of open models with open weights, training data, and recipes, delivering leading efficiency and accuracy for building specialized AI agents. To get started, use the quickstart guide below.
Highlights
- Quick Start Guide: https://build.nvidia.com/nvidia/nemotron-3-nano-30b-a3b/modelcard#quick-start-guide
Details
Pricing
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g5.12xlarge Inference (Batch), Recommended | Model inference on the ml.g5.12xlarge instance type, batch mode | $1.00 |
| ml.g5.12xlarge Inference (Real-Time), Recommended | Model inference on the ml.g5.12xlarge instance type, real-time mode | $1.00 |
| ml.g5.24xlarge Inference (Batch) | Model inference on the ml.g5.24xlarge instance type, batch mode | $1.00 |
| ml.g5.48xlarge Inference (Batch) | Model inference on the ml.g5.48xlarge instance type, batch mode | $1.00 |
| ml.g5.24xlarge Inference (Real-Time) | Model inference on the ml.g5.24xlarge instance type, real-time mode | $1.00 |
| ml.g5.48xlarge Inference (Real-Time) | Model inference on the ml.g5.48xlarge instance type, real-time mode | $1.00 |
| ml.g6e.12xlarge Inference (Real-Time) | Model inference on the ml.g6e.12xlarge instance type, real-time mode | $1.00 |
| ml.g6e.24xlarge Inference (Real-Time) | Model inference on the ml.g6e.24xlarge instance type, real-time mode | $1.00 |
| ml.g6e.48xlarge Inference (Real-Time) | Model inference on the ml.g6e.48xlarge instance type, real-time mode | $1.00 |
| ml.p4d.24xlarge Inference (Real-Time) | Model inference on the ml.p4d.24xlarge instance type, real-time mode | $1.00 |
Vendor refund policy
No refunds.
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
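As a sketch of the deployment flow, the three SageMaker API payloads used to turn a subscribed model package into a real-time endpoint could look like the following. The model package ARN, role ARN, and resource names are placeholders, not values from this listing; the boto3 calls are shown commented out since they require an AWS account:

```python
# Placeholder identifiers -- substitute your subscribed model package ARN
# and your own SageMaker execution role.
MODEL_PACKAGE_ARN = "arn:aws:sagemaker:us-east-1:123456789012:model-package/example"
ROLE_ARN = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

def deployment_requests(model_name: str,
                        endpoint_name: str,
                        instance_type: str = "ml.g5.12xlarge"):
    """Build the CreateModel, CreateEndpointConfig, and CreateEndpoint payloads."""
    create_model = {
        "ModelName": model_name,
        "PrimaryContainer": {"ModelPackageName": MODEL_PACKAGE_ARN},
        "ExecutionRoleArn": ROLE_ARN,
        # Marketplace model packages are typically run with network isolation.
        "EnableNetworkIsolation": True,
    }
    create_config = {
        "EndpointConfigName": f"{endpoint_name}-config",
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
        }],
    }
    create_endpoint = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": f"{endpoint_name}-config",
    }
    return create_model, create_config, create_endpoint

# To deploy (requires AWS credentials and a region configured):
# import boto3
# sm = boto3.client("sagemaker")
# m, c, e = deployment_requests("nemotron-model", "nemotron-endpoint")
# sm.create_model(**m)
# sm.create_endpoint_config(**c)
# sm.create_endpoint(**e)
```

The instance types in the pricing table above are the ones this listing supports; pick one of them for `instance_type`.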
Version release notes
Additional details
Inputs
- Summary
Model Input summary Nemotron-3-Nano-30B-A3B is a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be configured through a flag in the chat template. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so, albeit with a slight decrease in accuracy for harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.
- Input MIME type
- application/json
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| model | The specific model name, e.g., "nvidia/nemotron-3-nano". | Type: String | Yes |
| messages | List of messages comprising the conversation. Each object must have a role and content. | Type: Array of Objects | Yes |
| messages[].role | The role of the message author (e.g., "system", "user", or "assistant"). | Type: String | Yes |
| messages[].content | The text content of the message. | Type: String | Yes |
| temperature | Controls randomness. Lower values (e.g., 0.2) make output more deterministic. | Type: Float | No |
| max_tokens | The maximum number of tokens to generate in the response. | Type: Integer | No |
| stream | Set to true for streaming chunks or false for a single response. | Type: Boolean | No |
| chat_template_kwargs | Additional parameters for the chat template, such as {"enable_thinking": true} to enable reasoning mode. | Type: Object | No |
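The fields above can be assembled into a request body as follows. This is a minimal sketch: the endpoint name in the commented invocation is a placeholder, and the reasoning toggle is passed through `chat_template_kwargs` as described in the table:

```python
import json

def build_request(user_prompt: str, enable_thinking: bool = True) -> str:
    """Assemble a JSON request body matching the input fields described above."""
    payload = {
        "model": "nvidia/nemotron-3-nano",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": 0.2,
        "max_tokens": 1024,
        "stream": False,
        # Toggle reasoning traces; set enable_thinking=False for a direct answer.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
    return json.dumps(payload)

# Invoking a deployed real-time endpoint (endpoint name is a placeholder):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName="nemotron-endpoint",
#     ContentType="application/json",
#     Body=build_request("Explain Mixture-of-Experts in two sentences."),
# )
# print(response["Body"].read().decode())
```

For streaming (`"stream": true`), the analogous call is `invoke_endpoint_with_response_stream`, which returns the response as an event stream rather than a single body.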
Support
Vendor support
Free support is available via the NVIDIA NIM Developer Forum.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.