Data preparation

Select and transform data so you can feed your machine learning (ML) algorithms the right data for the problems you want to solve.

Machine learning outcomes are only as good as the data they are built on, but preparing that data is time-consuming. Data preparation, also called data wrangling, can consume the majority of the effort a project requires.

Data wrangling solutions running on Amazon Web Services (AWS) can help streamline machine learning applications so that your teams can focus on the work that really matters: creating accurate predictions that improve your products, services, and your organization’s efficiency.

Trifacta Wrangler Pro makes the process of wrangling diverse datasets faster and more intuitive by enabling data engineers and analysts to explore and prepare data more efficiently. With Wrangler Pro, your team can connect to more data, schedule workflows, and share their work with colleagues within a managed cloud platform.

Wrangler Pro is specifically designed to accelerate data wrangling for teams that work with small-to-intermediate-sized datasets that don't require the parallel computing power of big data platforms.

Powered by Trifacta's in-memory Photon Compute Framework, Wrangler Pro lets analysts collaborate on exploring, structuring, and publishing analysis-ready datasets for faster, more accurate analysis.

  • Visually explore and prepare diverse datasets, guided by machine learning suggestions
  • Create repeatable data wrangling workflows that are easy to operationalize for scheduled execution
  • Collaborate and share datasets, wrangling recipes, and entire workflows across your organization
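
To make the idea of a repeatable wrangling recipe concrete, here is a minimal sketch in plain Python. It is not Trifacta's API or recipe format. The sample data, the step functions, and the names `RECIPE`, `apply_recipe`, and `wrangle` are all hypothetical and exist only for illustration. The sketch treats a recipe as an ordered list of small cleaning steps that can be rerun unchanged on each new batch of data.

```python
import csv
import io

# Hypothetical raw export: stray whitespace, mixed casing, one missing amount.
RAW_CSV = """store_id,amount,state
 101 ,19.99,ca
102,,NY
103,5.00, tx
"""


def strip_whitespace(row):
    """Trim leading and trailing whitespace from every field."""
    return {key: value.strip() for key, value in row.items()}


def drop_missing_amount(row):
    """Discard rows that have no amount by returning None."""
    return row if row["amount"] else None


def normalize_types(row):
    """Convert fields to the types a downstream model would expect."""
    return {
        "store_id": int(row["store_id"]),
        "amount": float(row["amount"]),
        "state": row["state"].upper(),
    }


# The recipe is an ordered list of steps, so it can be rerun on new data.
RECIPE = [strip_whitespace, drop_missing_amount, normalize_types]


def apply_recipe(rows, recipe):
    """Run each row through every step; a step returning None drops the row."""
    cleaned = []
    for row in rows:
        for step in recipe:
            row = step(row)
            if row is None:
                break
        else:
            # Reached only when no step dropped the row.
            cleaned.append(row)
    return cleaned


def wrangle(csv_text, recipe=RECIPE):
    """Parse CSV text and apply the recipe to each record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return apply_recipe(reader, recipe)


if __name__ == "__main__":
    for record in wrangle(RAW_CSV):
        print(record)
```

Because the steps live in one list, the same recipe can be shared with colleagues or run on a schedule against each new file, which is the property the list above describes.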

Wrangler Pro is available for deployment through AWS Marketplace and integrates with a variety of AWS services: it connects and publishes to Amazon Simple Storage Service (Amazon S3) and Amazon Redshift, and it deploys on Amazon Elastic Compute Cloud (Amazon EC2).

Consensus Corporation

Consensus Corporation uses Trifacta to reduce data discovery and preparation time. Consensus, a subsidiary of Target, needed to streamline the time-intensive data preparation behind a machine learning algorithm that predicts fraudulent activity at point-of-sale locations for a large national retailer.

Consensus used Trifacta to wrangle large amounts of structured historical data stored in Amazon S3 (anywhere from 500 MB to 1.2 GB at a time) much faster than in previous efforts. Data discovery time fell from two or three days to under 24 hours, and data preparation time fell from eight hours to under one hour.

“With better and more accurate data, we knew we had the potential to save retailers a lot more money. Trifacta is really intuitive, and it’s much easier to use than SQL/R solutions. Trifacta also complements our machine learning model solution very well.”
- Harrison Lynch, Senior Director of Product Development, Consensus Corporation

AWS Marketplace

AWS Marketplace is a digital catalog with thousands of software listings from independent software vendors. It makes it easy to find, test, buy, and deploy software that runs on AWS.

Computer vision

Computer vision allows machines to identify people, places, and things in images.

Data science tools

Machine learning and data science tools leverage machine learning algorithms.

Natural language

Compare texts against each other for similarity or for efficient categorization.

Have questions? Have tips?

We're here to help you get started with AWS Marketplace. Ask for or give advice on the AWS Marketplace discussion forum.
