    Domain-specific Language Datasets For Machine Translation Training

    Sold by: TAUS 
    Deployed on AWS
    TAUS is a leading global provider of high-quality data with 13+ years of experience. We specialize in domain-specific language data for machine learning, and we also offer data generation by global communities of data contributors, data crawling, annotation, and custom data solutions. We support text, audio, and image data, with text as the main focus. This listing provides language datasets for MT training.

    Overview

    To build intelligent MT systems capable of understanding human language, machine learning models need to digest large amounts of structured, domain-specific bilingual language data. Finding suitable datasets is the first step in solving any language-based machine learning problem. We currently offer off-the-shelf datasets in the healthcare, e-commerce, and finance domains that are cleaned and optimized for MT engine training. You can now access them through AWS Marketplace to improve your MT results.

    The following datasets are currently available through AWS Marketplace:

    * 9 datasets in the Retail & Wholesale Distribution/E-Commerce Domain in the following language pairs: English (US) to Danish, Dutch, French, Finnish, German, Italian, Polish, Spanish, and Swedish.
    * 17 datasets in the Pharmaceuticals & Biotechnology Domain in the following language pairs: English (US) to Bulgarian, Czech, Danish, German, Greek, Spanish, Estonian, Finnish, French, Hungarian, Italian, Latvian, Dutch, Norwegian, Slovenian, and Swedish.
    * 4 datasets in the Financial Services Domain in the following language pairs: English (US) to Czech, Hungarian, Dutch, and Romanian.

    TAUS offers 35 billion words of language data in 600+ language pairs across many domains. For specific requirements, data generation solutions are provided through the TAUS HLP Platform, where over 10K global data contributors perform language data tasks ranging from data generation to annotation and evaluation. The [TAUS Data Marketplace](https://datamarketplace.taus.net/) is an alternative platform for sourcing the data you need.

    If your project requires a dataset other than those already offered through AWS Marketplace, contact us at sales@taus.net.

    Highlights

    • Leading global provider of scalable data solutions for MT and AI applications with 13+ years of experience in the industry
    • Wide range of available data products, backed by in-house NLP, data preparation, and quality assurance expertise, supporting a broad spectrum of MT training use cases
    • Scalable data generation solutions in common and long-tail languages with a dedicated community-based platform


    Pricing

    This product is available free of charge. Free subscriptions have no end date and may be canceled at any time.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Vendor refund policy

    Refunds will not be provided for this subscription.

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    AWS Data Exchange (ADX)

    AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.

    Additional details

    Data sets (1)

    You will receive access to the following data sets.

    Data set name
    Type
    Historical revisions
    Future revisions
    Sensitive information
    Data dictionaries
    Data samples
    TAUS MT training data report
    All historical revisions
    All future revisions
    Not included
    Not included
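
Once a subscription is active, the entitled data set should be visible through the AWS Data Exchange API. The following is a minimal sketch, not an official TAUS or AWS procedure, of how one might locate it with the `boto3` `dataexchange` client using `list_data_sets(Origin="ENTITLED")`. The helper name `find_entitled_data_sets`, the search string `"TAUS"`, and the region `us-east-1` are assumptions made for illustration.

```python
def find_entitled_data_sets(client, name_fragment):
    """Return (id, name) pairs for entitled data sets whose name
    contains name_fragment (case-insensitive).

    `client` is expected to behave like boto3.client("dataexchange").
    """
    matches = []
    kwargs = {"Origin": "ENTITLED"}  # only data sets you are subscribed to
    while True:
        page = client.list_data_sets(**kwargs)
        for data_set in page.get("DataSets", []):
            if name_fragment.lower() in data_set["Name"].lower():
                matches.append((data_set["Id"], data_set["Name"]))
        token = page.get("NextToken")
        if not token:  # no further pages
            return matches
        kwargs["NextToken"] = token


# Example usage (requires boto3 and configured AWS credentials;
# the region below is an assumption, not taken from the listing):
#
#   import boto3
#   client = boto3.client("dataexchange", region_name="us-east-1")
#   for data_set_id, name in find_entitled_data_sets(client, "TAUS"):
#       print(data_set_id, name)
```

The returned data set ID can then be passed to `list_data_set_revisions` to enumerate the historical and future revisions included in the subscription.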
