
Overview
TAUS is a leading global provider of high-quality data with 13+ years of experience. We specialize in domain-specific language data for Machine Learning in addition to data generation by global communities of data contributors, data crawling, annotation, and custom data solutions. The various data types we support include text, audio, and image with text being the main focus. In this listing we provide language datasets for MT training.
To build intelligent MT systems capable of understanding human language, machine learning models need to digest large amounts of structured, domain-specific bilingual language data. Finding suitable datasets is the first step in solving any language-based machine learning problem. We currently offer off-the-shelf datasets in the healthcare, e-commerce, and finance domains that are cleaned and optimized for MT engine training. You can now access them through AWS Marketplace to improve your results.
Currently, the following datasets are available through AWS Marketplace:
* 9 datasets in the Retail & Wholesale Distribution/E-Commerce Domain in the following language pairs: English (US) to Danish, Dutch, French, Finnish, German, Italian, Polish, Spanish, and Swedish.
* 17 datasets in the Pharmaceuticals & Biotechnology Domain in the following language pairs: English (US) to Bulgarian, Czech, Danish, German, Greek, Spanish, Estonian, Finnish, French, Hungarian, Italian, Latvian, Dutch, Norwegian, Slovenian, and Swedish.
* 4 datasets in the Financial Services Domain in the following language pairs: English (US) to Czech, Hungarian, Dutch, and Romanian.

TAUS offers 35 billion words of language data in 600+ language pairs across many domains. Based on specific requirements, data generation solutions are provided through the TAUS HLP Platform, where over 10K global data contributors perform language data tasks from data generation to annotation and evaluation. The [TAUS Data Marketplace](https://datamarketplace.taus.net/) is an alternative platform for sourcing the data you need.
If your project requires a dataset other than those already offered through AWS Marketplace, contact us at sales@taus.net.
Highlights
- Leading global provider of scalable data solutions for MT and AI applications with 13+ years of experience in the industry
- Wide range of available data products, backed by in-house NLP, data preparation, and quality assurance expertise, supporting a broad spectrum of MT training use cases
- Scalable data generation solutions in common and long-tail languages with a dedicated community-based platform
Details
Pricing
Vendor refund policy
Refunds will not be provided for this subscription.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
AWS Data Exchange (ADX)
AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.
Additional details
You will receive access to the following data sets.
| Data set name | Type | Historical revisions | Future revisions | Sensitive information | Data dictionaries | Data samples |
|---|---|---|---|---|---|---|
| TAUS MT training data report | | All historical revisions | All future revisions | Not included | Not included | |



