What’s the key to developing accurate, game-changing predictions from data that can improve your business and open up new opportunities? Good, clean, structured data. Machine learning (ML) outcomes are only as good as the data they’re built upon.

Anyone who has undertaken data wrangling (the process of discovering, structuring, cleaning, enriching, validating, and publishing a dataset) knows how challenging it can be, particularly as datasets grow in size and data formats and sources become more varied. The difficulties of data preparation can keep even the most seasoned subject matter experts from analyzing their data.

The team at Trifacta wants to spread the word that it doesn’t have to be this way. The company’s mission is to accelerate the process of preparing data in order to help customers focus on what matters most: creating value for businesses through the power of accurate, data-driven insights.

Trifacta, an AWS Partner Network (APN) Advanced Technology Partner and AWS ML Competency Partner, draws on decades of research in human-computer interaction, scalable data management, and machine learning to make data preparation faster and more intuitive. When you open a dataset, Trifacta's Wrangler application automatically presents visual representations of your data. When you brush over or click on elements of those visualizations, Trifacta suggests relevant transformations, powered by machine learning, that you can select and edit, or you can build your own transformations from scratch, all with real-time feedback. Trifacta then lets you publish the prepared data to a range of applications and business intelligence tools. The process is seamless for users and speeds up overall data preparation by up to 80 percent while maintaining security and governance.

“We find that so many individuals have relevant domain expertise in the data itself. They understand the stories the data should be able to tell, but they’re unable to access and prepare it in a way that allows them to make use of the data and drive predictions to aid in decision-making,” says Michael Minar, head of data science and machine learning at Trifacta. “One of the core questions driving our product roadmap and development is: ‘How can we make data available to more people?’ The business impacts of making data more accessible and usable are substantial: It can drive better decision-making and empower individuals in new ways with fresh insights.”

Trifacta considers its product an intelligent tool that gets better with use. As the company receives more usage data from customers, it continually refines and extends the tool's functionality based on their experience.

“The tool’s improvements are exponential in scale as more and more people start using the platform. As the customer base grows, we’re able to capture patterns of behavior—like what people are doing with data to build intelligent frameworks—and address sticking points for users,” says Minar. “It’s all about how many data points we can collect on what people are doing, what their use cases are, and what they seek to complete.”

The Trifacta team believes in the power of leveraging intelligence for data preparation tasks that require sophisticated judgment. “Building frameworks and continuously unblocking functionality gaps help us make users’ lives easier as we seek to organically embed intelligence as a part of a user’s normal workflow and engage in feedback loops with users. We’re trying to embed probabilistic intelligence into a workflow that users can accept, reject, or refine,” says Minar. “We try to expose why a decision was made for the user as a part of the feedback and augmentation cycle and explain to the user in a plain, natural language why we’re making the suggestions we’re making.”

These efforts to use machine learning to accelerate the product roadmap and deliver innovative capabilities are paying off in broad market and end-user recognition. Trifacta was ranked the #1 data preparation vendor in Dresner's End User Data Prep Market Study for four years in a row, and the company was named a leader in the latest Forrester Wave for data preparation solutions.

Trifacta offers its Wrangler data preparation platform as software as a service (SaaS) through the AWS Marketplace, and offers native support for Amazon Elastic MapReduce (Amazon EMR). Trifacta takes advantage of AWS to drive internal product development, insights, and improvements.

“Internally, Trifacta leverages AWS in some elements of its data pipeline and machine learning architecture,” says Minar. “Trifacta is deployed to over 1,000 customers today across various environments. We collect anonymized metadata from customers and send it to Amazon Simple Storage Service (Amazon S3). Every night, we do our own cleaning and processing jobs to discover valuable intelligence from the data to produce reports for analytics and product teams. We pull down the raw data from AWS and we do processing, cleaning, and derivations on it.”

Trifacta's own teams use the product internally to query the raw data stored in Amazon S3. Once someone identifies an analysis that is valuable to them, they can operationalize it, and the result becomes a source of truth for derived metrics and operational flows. Teams then use reporting tools to query, record, and visualize the data.

The Trifacta team has built strong relationships with other AWS partners to help users tackle the full data lifecycle more intelligently and efficiently. “A good example of one of our strong partnerships is our work with DataRobot, another AWS ML Competency Partner,” says Bertrand Cariou, senior director of partner marketing at Trifacta. DataRobot's automated machine learning platform captures the knowledge, experience, and best practices of the world's leading data scientists, delivering automation and ease of use for machine learning initiatives.

“Combining Trifacta’s data wrangling capabilities with DataRobot’s automated machine learning models enables users of all backgrounds and areas of focus to prepare clean, accurate data and automatically build and deploy predictive models,” says Cariou.

As Trifacta continues to evolve its solution, the company will focus on helping users see data as accessible and empowering rather than burdensome. “We feel we've been able to lower the cost of data wrangling and help users prepare data to get more value out of it, inspiring users to do more with their data,” says Minar. “We hope to continue to see users applying data to problems they wouldn't have in the past, because the cost of doing so, in time, resources, and headache, is so much lower.”

“It’s inspiring to have users bring data to a task they wouldn’t have even thought of using it for before and then collaborating across their organization in new ways thanks to the accessibility of the data,” says Minar.