
Overview
Palmyra-X-004 is a powerful large language model ranking among the top-performing AI models in the HELM evaluation framework. It demonstrates exceptional capabilities in knowledge tasks, coding, and mathematics, while excelling in instruction following and structured data handling. The model features a 32K context window and broad multilingual capabilities. Benchmarks show Palmyra-X-004 achieves performance levels comparable to industry leaders GPT-4o and Claude 3.5 Sonnet.
Highlights
- Palmyra-X-004 is built for enterprises that need to leverage their internal data and documents for accurate and reliable language processing. It excels at understanding structured data (such as tables) and generating structured outputs (especially JSON), and it offers enhanced role-play and condition-setting capabilities for chatbots.
- The model delivers exceptional speed and performance through optimized inference engines, enabling low-latency inference on AWS accelerators. This optimization gives Amazon SageMaker customers flexible cost and performance options while maintaining high-throughput production capabilities.
- Features 32K context window support, enabling processing of extensive documents. Includes comprehensive tool-use capabilities for function calling and automation. Offers robust multilingual support across major global languages, including English, Japanese, Chinese, French, and more.
Details
Pricing
Free trial
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g5.48xlarge inference (batch), recommended | Model inference on the ml.g5.48xlarge instance type, batch mode | $57.08 |
| ml.p4d.24xlarge inference (real-time), recommended | Model inference on the ml.p4d.24xlarge instance type, real-time mode | $57.08 |
Vendor refund policy
All fees are non-refundable and non-cancellable except as required by law.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
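As an illustration, a subscribed model package can be deployed to a real-time endpoint with the SageMaker Python SDK. The sketch below is a minimal, non-authoritative example: the model package ARN, IAM role, and endpoint name are placeholders you would replace with the values from your own subscription, and the recommended instance type is taken from the pricing table above.

```python
# Minimal sketch: deploy a subscribed model package to a real-time
# SageMaker endpoint. The ARN, role, and endpoint name are placeholders,
# not values from this listing.
import sagemaker
from sagemaker import ModelPackage, get_execution_role

session = sagemaker.Session()
role = get_execution_role()  # or an explicit IAM role ARN

model = ModelPackage(
    role=role,
    model_package_arn="arn:aws:sagemaker:<region>:<account>:model-package/<palmyra-x-004-package>",
    sagemaker_session=session,
)

# ml.p4d.24xlarge is the recommended real-time instance type in the pricing table.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.p4d.24xlarge",
    endpoint_name="palmyra-x-004",  # hypothetical endpoint name
)
```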
Version release notes
Initial release: Batch transform is not supported.
Additional details
Inputs
- Summary
The model accepts JSON requests with parameters that control the generated text. See the field descriptions and example below.
- Input MIME type
- application/json
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| messages | Text input for the model to respond to. | Type: FreeText | Yes |
| stream | If set to `true`, the response is returned as a stream of JSON events. The stream concludes with a final event whose `event_type` is `"stream-end"` and which contains the full response. Streaming is useful for user interfaces that display content incrementally as it is generated, allowing for a more dynamic and responsive experience. | Default: `false`; Type: Categorical; Allowed values: `true`, `false` | No |
| temperature | Controls randomness. Choose a lower value to make the response more predictable and less random. To increase variety in the output, raise `top_p` instead. | Default: 1.0; Type: Continuous; Minimum: 0; Maximum: 2 | No |
| max_tokens | The maximum number of tokens the model generates in the response. Note: setting a low value may result in incomplete generations. | Default: 4096; Type: Integer; Minimum: 0; Maximum: 32768 | No |
| top_p | Use a lower value to ignore less probable tokens. | Default: 0.99; Type: Continuous; Minimum: 0.01; Maximum: 0.99 | No |
| presence_penalty | Reduces repetition of generated tokens. Similar to `frequency_penalty`, except the penalty is applied equally to all tokens that have already appeared, regardless of their frequency. | Default: 0.0; Type: Continuous; Minimum: -2.0; Maximum: 2.0 | No |
| frequency_penalty | Reduces repetition in the output. Higher values apply a stronger penalty to tokens that have already appeared, based on their frequency in the prompt or prior generation. | Default: 0.0; Type: Continuous; Minimum: -2.0; Maximum: 2.0 | No |
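The sketch below shows a real-time invocation using the fields documented above. It is an assumption-laden example: the endpoint name is a placeholder, and the exact response schema is not specified in this listing, so consult the vendor's sample notebook for the authoritative output format.

```python
# Minimal sketch: invoke a deployed Palmyra-X-004 endpoint with the
# request fields documented above. The endpoint name is a placeholder;
# the response structure is an assumption.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

payload = {
    "messages": "Summarize the key terms of the attached contract in three bullet points.",
    "temperature": 0.7,   # lower = more predictable output
    "max_tokens": 512,    # cap the length of the generation
    "top_p": 0.9,         # nucleus sampling threshold
    "stream": False,      # set True for incremental JSON events ending with "stream-end"
}

response = runtime.invoke_endpoint(
    EndpointName="palmyra-x-004",      # hypothetical endpoint name
    ContentType="application/json",    # input MIME type from this listing
    Body=json.dumps(payload),
)

result = json.loads(response["Body"].read())
print(result)
```

For streaming responses (`stream` set to `true`), the boto3 `invoke_endpoint_with_response_stream` call can be used instead to consume the JSON events incrementally until the `"stream-end"` event arrives.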
Resources
Vendor resources
Support
Vendor support
Email support is available Monday through Friday: support@writer.com
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.