
Sold by: EssentialAI
Open data | Deployed on AWS
Overview
A 24-trillion-token dataset in which every document is annotated with a twelve-category taxonomy covering topic, format, content complexity, and quality.
Features and programs
Open Data Sponsorship Program
This dataset is part of the Open Data Sponsorship Program, an AWS program that covers the cost of storage for publicly available high-value cloud-optimized datasets.
Pricing
This is a publicly available dataset. No subscription is required.
Legal
Content disclaimer
Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.
Delivery details
AWS Data Exchange (ADX)
AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.
Open data resources
Available with or without an AWS account.
- How to use: To access these resources, reference the Amazon Resource Name (ARN) using the AWS Command Line Interface (CLI).
- Description: Essential-Web v1.0: 24T tokens of organized web data
  - Resource type: S3 bucket
  - Amazon Resource Name (ARN): arn:aws:s3:::essential-web-v1.0
  - AWS region: us-west-2
  - AWS CLI access (No AWS account required): aws s3 ls --no-sign-request s3://essential-web-v1.0/
- Description: Notifications for new Essential-Web v1.0 data
  - Resource type: SNS topic
  - Amazon Resource Name (ARN): arn:aws:sns:us-west-2:021391128517:essential-web-v10-object_created
  - AWS region: us-west-2
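For readers who would rather script access than use the CLI, the following is a minimal Python sketch using only the standard library. It splits the two ARNs listed above into their components and builds an unsigned S3 `ListObjectsV2` request URL, roughly the HTTP equivalent of `aws s3 ls --no-sign-request`. The helper names (`parse_arn`, `list_url`, `parse_listing`, `fetch_listing`) are hypothetical and not part of any AWS SDK. The sketch assumes the bucket permits anonymous listing, as the CLI command above implies, and uses a path-style URL because the bucket name contains a dot.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import NamedTuple

# XML namespace used by S3 REST API responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str


def parse_arn(arn: str) -> Arn:
    """Split 'arn:partition:service:region:account-id:resource' into fields."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


def list_url(bucket: str, region: str, prefix: str = "") -> str:
    """Build an unsigned, path-style ListObjectsV2 URL for one 'directory' level."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "delimiter": "/", "prefix": prefix}
    )
    return f"https://s3.{region}.amazonaws.com/{bucket}?{query}"


def parse_listing(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (sub-prefixes, object keys) from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    prefixes = [e.text for e in root.findall(f"{S3_NS}CommonPrefixes/{S3_NS}Prefix")]
    keys = [e.text for e in root.findall(f"{S3_NS}Contents/{S3_NS}Key")]
    return prefixes, keys


def fetch_listing(bucket_arn: str, region: str, prefix: str = "") -> tuple[list[str], list[str]]:
    """Fetch the first page of a listing over the network (not paginated)."""
    bucket = parse_arn(bucket_arn).resource
    with urllib.request.urlopen(list_url(bucket, region, prefix)) as resp:
        return parse_listing(resp.read().decode("utf-8"))
```

Calling `fetch_listing("arn:aws:s3:::essential-web-v1.0", "us-west-2")` should return the top-level prefixes and keys of the bucket, limited to the first page of results. Note that S3 bucket ARNs carry no region or account ID, so the region has to come from the listing above.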
How to cite
Essential-Web v1.0: 24T tokens of organized web data was accessed on DATE from https://registry.opendata.aws/eai-essential-web-v1.
License
Essential-Web-v1.0 contributions are made available under the ODC attribution license; however, users should also abide by the Common Crawl Terms of Use. We do not alter the license of any of the underlying data.