Provision of Web-Scale Parallel Corpora for Official European Languages (ParaCrawl)

ParaCrawl is a set of large parallel corpora to/from English for all official EU languages, built through a broad web-crawling effort. State-of-the-art methods are applied across the entire processing chain, from identifying websites with translated text to collecting, cleaning, and delivering parallel corpora that are ready for use as training data for CEF.AT and as translation memories for DG Translation.

Overview

Features and programs

Open Data Sponsorship Program

This dataset is part of the Open Data Sponsorship Program, an AWS program that covers the cost of storage for publicly available high-value cloud-optimized datasets.

Pricing

This is a publicly available dataset. No subscription is required.

Legal

Content disclaimer

Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

Usage information

Delivery details

AWS Data Exchange (ADX)

AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.

Open data resources

Available with or without an AWS account.

How to use: To access these resources, reference the Amazon Resource Name (ARN) using the AWS Command Line Interface (CLI).

Description: Parallel Corpora to/from English for all official EU languages
Resource type: S3 bucket
Amazon Resource Name (ARN): arn:aws:s3:::web-language-models
AWS region: us-east-1
AWS CLI access (No AWS account required): aws s3 ls --no-sign-request s3://web-language-models/
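
The relationship between the ARN and the CLI command listed above can be sketched in a few lines of Python. This is a minimal illustration, not part of the listing: the helper names `bucket_from_arn` and `list_command` are hypothetical, and the only values taken from this page are the ARN and the `aws s3 ls --no-sign-request` invocation. The sketch only builds the command; it does not contact AWS.

```python
import shlex

# ARN as given in the listing above.
ARN = "arn:aws:s3:::web-language-models"


def bucket_from_arn(arn: str) -> str:
    """Extract the bucket name from an S3 ARN (hypothetical helper).

    S3 bucket ARNs have the form arn:<partition>:s3:::<bucket>, with empty
    region and account fields.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "s3":
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # The resource field may be "bucket" or "bucket/key"; keep the bucket.
    bucket = parts[5].split("/", 1)[0]
    if not bucket:
        raise ValueError(f"ARN has no bucket name: {arn!r}")
    return bucket


def list_command(arn: str, prefix: str = "") -> list[str]:
    """Build the unsigned `aws s3 ls` command for a bucket ARN (hypothetical helper).

    --no-sign-request sends the request anonymously, which is why no AWS
    account is needed for this public bucket.
    """
    bucket = bucket_from_arn(arn)
    return ["aws", "s3", "ls", "--no-sign-request", f"s3://{bucket}/{prefix}"]


if __name__ == "__main__":
    # Print the command as a shell-ready string.
    print(shlex.join(list_command(ARN)))
```

The returned argument list can be passed to `subprocess.run` on a machine with the AWS CLI installed. The optional `prefix` argument is an assumption added for convenience; this page does not document the bucket's internal layout.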

Resources

Vendor resources

View this dataset on GitHub

Support

Contact

For questions regarding the datasets, contact Kenneth Heafield (kheafiel@inf.ed.ac.uk). To report issues with the Bitextor pipeline, visit https://github.com/bitextor/bitextor/issues.

Managed By

ParaCrawl

How to cite

Provision of Web-Scale Parallel Corpora for Official European Languages (ParaCrawl) was accessed on DATE from https://registry.opendata.aws/paracrawl.

License

Creative Commons CC0 license ("no rights reserved").

Similar products

Automated Storage Provision and Performance Tuning with StorageClasses

By Onedata Software Solutions

OneData Software leverages Kubernetes StorageClasses to automate the provisioning of persistent storage, aligning storage resources with application requirements. By integrating with cloud-native storage solutions like Amazon EBS and MinIO, OneData enables dynamic provisioning of storage volumes, ensuring optimal performance and scalability. This approach simplifies storage management, enhances application performance, and supports efficient data handling across hybrid and multi-cloud environments.

InfraQ-Amazon-MSK-Provisioning-Accelerator

By SitadConsulting

InfraQ for Amazon MSK provides a simplified self-service infrastructure provisioning capability for accelerating the delivery of production-grade Amazon MSK cluster infrastructure in KRaft mode. The provisioned Amazon MSK infrastructure is secure out of the box, enabled with Open Monitoring and Observability, and workload ready. InfraQ for Amazon MSK significantly reduces time-to-value for delivering streaming workloads, from months to minutes.

Symphonica OSS - Telecom provisioning, activation and automation

By Intraway

Symphonica is a no-code telecom provisioning, activation and automation platform. Connect any BSS to any network technology or cloud service with just a few clicks. Join dozens of agile telecom companies that use Symphonica to create, test and launch new TM Forum-certified automation workflows and network connectors in minutes at www.intraway.com

AWS Marketplace Provisioning Services

By Opsfleet

Designed for SaaS companies looking to expand their reach by selling on AWS Marketplace. Complex deployment and onboarding processes can often prove a significant barrier to going live, so our AWS Marketplace Provisioning Services are designed to get a solution up and running quickly without overburdening R&D and DevOps teams. By handling the complexities of deployment automation and providing guidance through the onboarding process, we ensure a seamless, hassle-free experience. With our expertise, you can quickly get your solution on AWS Marketplace, accessing millions of potential customers while allowing your teams to focus on core innovation. Maximize your sales potential with a fast, scalable, and simplified deployment process.

Backflipt Provisioning Center - Axway MFT/B2Bi

By Backflipt

A Self-Service Partner Onboarding and File Route Configuration application that allows non-technical users to easily onboard partners and set up file routes while hiding technical complexity. Enforces standards through approval workflows while logging all actions for audit reviews. The application integrates with enterprise change management processes, ensuring that deployments, such as account creation and route setup, are approved before completion.
