Overview
nemoretriever-parse provides comprehensive text understanding and document structure understanding. It is used in retriever and curator solutions. Its text-extraction datasets and capabilities support LLM and VLM training and improve run-time inference accuracy of VLMs. The model extracts text from PDF and PPT documents, classifies the objects in a given document (title, section, caption, index, footnote, list, table, bibliography, image), and provides bounding boxes with coordinates.
Highlights
- Architecture Type: Transformer-based vision-encoder-decoder model
- Network Architecture:
  - Vision Encoder: ViT-H model (https://huggingface.co/nvidia/C-RADIO)
  - Adapter Layer: 1D convolutions and norms that compress the dimensionality and sequence length of the latent space (1280 tokens to 320 tokens)
  - Decoder: mBart [1], 10 blocks
  - Tokenizer: Galactica (https://arxiv.org/abs/2211.09085); same as the Nougat tokenizer
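The adapter's 4x sequence-length compression (1280 tokens down to 320) can be illustrated with a minimal sketch. The strided pooling below is an assumption standing in for the actual learned 1D convolutions and norms, which this listing does not specify:

```python
def compress_sequence(tokens, factor=4):
    """Pool each group of `factor` consecutive token embeddings into one.

    This is a stand-in for the adapter's strided 1D convolutions; the real
    layer uses learned weights rather than a plain average.
    """
    assert len(tokens) % factor == 0, "sequence length must divide evenly"
    return [
        [sum(vals) / factor for vals in zip(*tokens[i:i + factor])]
        for i in range(0, len(tokens), factor)
    ]

# A dummy latent sequence: 1280 tokens, each an 8-dim embedding.
latent = [[float(i % 7)] * 8 for i in range(1280)]
compressed = compress_sequence(latent)
print(len(latent), "->", len(compressed))  # 1280 -> 320
```

With stride-4 pooling the 1280-token encoder output shrinks to the 320 tokens the decoder attends over, which is the compression ratio stated above.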
Details
Pricing
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g5.2xlarge Inference (Batch), Recommended | Model inference on the ml.g5.2xlarge instance type, batch mode | $1.00 |
| ml.g5.2xlarge Inference (Real-Time), Recommended | Model inference on the ml.g5.2xlarge instance type, real-time mode | $1.00 |
| ml.g5.4xlarge Inference (Batch) | Model inference on the ml.g5.4xlarge instance type, batch mode | $1.00 |
| ml.g5.8xlarge Inference (Batch) | Model inference on the ml.g5.8xlarge instance type, batch mode | $1.00 |
| ml.g5.16xlarge Inference (Batch) | Model inference on the ml.g5.16xlarge instance type, batch mode | $1.00 |
| ml.g5.12xlarge Inference (Batch) | Model inference on the ml.g5.12xlarge instance type, batch mode | $1.00 |
| ml.g5.24xlarge Inference (Batch) | Model inference on the ml.g5.24xlarge instance type, batch mode | $1.00 |
| ml.g5.48xlarge Inference (Batch) | Model inference on the ml.g5.48xlarge instance type, batch mode | $1.00 |
| ml.g5.4xlarge Inference (Real-Time) | Model inference on the ml.g5.4xlarge instance type, real-time mode | $1.00 |
| ml.g5.8xlarge Inference (Real-Time) | Model inference on the ml.g5.8xlarge instance type, real-time mode | $1.00 |
Vendor refund policy
No refund
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Additional details
Inputs
- Summary
Given an image, nemotron-parse extracts formatted text together with bounding boxes and the corresponding semantic classes. It accepts JSON requests via the /invocations API, where the image content is provided as a Base64-encoded data URI.
- Input MIME type
- application/json
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Type | Required |
|---|---|---|---|
| model | The specific model name, e.g., "nvidia/nemotron-parse". | String | Yes |
| messages | Conversation history, typically containing a single "user" message. | Array of objects | Yes |
| messages[].content | Must contain an object with "type": "image_url" and an "image_url" object. The image data can be provided as a Base64-encoded data URI (e.g., data:image/png;base64,...) for local files (especially in air-gapped deployments), or as a direct public URL to an online image file. | Array of objects | Yes |
| max_tokens | The maximum number of tokens to generate in the response. | Integer | No |
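The fields above can be assembled into a request body as in the sketch below. The helper name, the placeholder image bytes, and the `max_tokens` default are illustrative assumptions, not part of the documented API:

```python
import base64
import json

def build_invocations_payload(image_bytes, mime="image/png", max_tokens=1024):
    """Build a JSON body for the /invocations API, embedding the image as a
    Base64-encoded data URI (suitable for air-gapped deployments).

    `max_tokens` default is an arbitrary example value, not a documented one.
    """
    data_uri = f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "nvidia/nemotron-parse",   # required: model name
        "messages": [{                      # required: single "user" turn
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": {"url": data_uri},
            }],
        }],
        "max_tokens": max_tokens,           # optional
    }

# Placeholder bytes stand in for a real page image.
payload = build_invocations_payload(b"\x89PNG placeholder")
print(json.dumps(payload)[:60])
```

For a deployed SageMaker endpoint, a body like this would typically be sent with content type `application/json` (matching the input MIME type above), for example via the SageMaker runtime's `invoke_endpoint` call.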
Support
Vendor support
Free support is available via the NVIDIA NIM Developer Forum.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel staffed 24x7x365 with experienced technical support engineers. The service helps customers of all sizes and technical abilities to successfully use the products and features provided by Amazon Web Services.