
Overview
A dataset designed for aligning language models with safety and ethical guidelines. It contains 8,361 curated triplets, each made up of a prompt, a response, and a safe response, across various risk categories. Each entry includes safety scores, judge reasoning, and harm probability assessments, which makes the dataset useful for model alignment, testing, and benchmarking.
Features and programs
Open Data Sponsorship Program
Pricing
This is a publicly available dataset. No subscription is required.
Delivery details
AWS Data Exchange (ADX)
AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.
Open data resources
Available with or without an AWS account.
- How to use: To access these resources, reference the Amazon Resource Name (ARN) using the AWS Command Line Interface (CLI).
- Description: The dataset is available as three Parquet files: train (6,000 records), validation (1,200 records), and test (1,161 records). Each file contains 14 columns, including prompts, responses, safety scores, and probability assessments. The data was generated using Apache 2.0 licensed models, including ibm-granite/granite-3.0-8b, Qwen2.5-7B, and Mistral-Nemo-Instruct-2407.
- Resource type: S3 bucket
- Amazon Resource Name (ARN): arn:aws:s3:::gretel-datasets-public/gretel-synthetic-safety-alignment-en-v1
- AWS region: us-west-2
- AWS CLI access (No AWS account required): aws s3 ls --no-sign-request s3://gretel-datasets-public/gretel-synthetic-safety-alignment-en-v1/
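The same anonymous access can probably be scripted as well. The following is a minimal Python sketch, not an official client: it assumes the boto3 package is installed, and the helper names `split_s3_arn` and `list_keys` are hypothetical, introduced here for illustration. The ARN, region, and split sizes are copied from the listing above. The object names inside the bucket are not given in the listing, so the sketch only lists keys rather than guessing file paths.

```python
"""Sketch: anonymous access to the public bucket named in this listing."""

ARN = "arn:aws:s3:::gretel-datasets-public/gretel-synthetic-safety-alignment-en-v1"
REGION = "us-west-2"

# Record counts per split, as stated in the listing's description.
SPLIT_SIZES = {"train": 6000, "validation": 1200, "test": 1161}


def split_s3_arn(arn: str) -> tuple[str, str]:
    """Split an S3 ARN of the form arn:aws:s3:::bucket/prefix into (bucket, prefix)."""
    marker = "arn:aws:s3:::"
    if not arn.startswith(marker):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    bucket, _, prefix = arn[len(marker):].partition("/")
    return bucket, prefix


def list_keys(arn: str = ARN, region: str = REGION) -> list[str]:
    """List object keys under the ARN's prefix without AWS credentials."""
    # Imported lazily so the helpers above work even without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    bucket, prefix = split_s3_arn(arn)
    # UNSIGNED is the boto3 counterpart of the CLI's --no-sign-request flag.
    s3 = boto3.client(
        "s3", region_name=region, config=Config(signature_version=UNSIGNED)
    )
    keys = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix + "/"):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

Calling `list_keys()` with network access should return the same object keys that the `aws s3 ls` command shows. The three split sizes add up to the 8,361 triplets stated in the overview, which gives a simple sanity check after download.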
How to cite
Gretel Synthetic Safety Alignment Dataset was accessed on DATE from https://registry.opendata.aws/gretel-synthetic-safety-alignment-en-v1.
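As an illustration of filling in the DATE placeholder, here is a small Python sketch. The function name `format_citation` is hypothetical, and the ISO 8601 date format is an assumption, since the citation template does not prescribe one.

```python
from datetime import date

REGISTRY_URL = "https://registry.opendata.aws/gretel-synthetic-safety-alignment-en-v1"


def format_citation(accessed: date) -> str:
    """Fill the DATE placeholder of the citation template with an access date."""
    # ISO 8601 (YYYY-MM-DD) is an assumption; the listing names no date format.
    return (
        "Gretel Synthetic Safety Alignment Dataset was accessed on "
        f"{accessed.isoformat()} from {REGISTRY_URL}."
    )
```

For example, `format_citation(date.today())` produces the citation sentence with the current date filled in.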
License
Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)