AWS Big Data Blog

Category: AWS Glue DataBrew

Create single output files for recipe jobs using AWS Glue DataBrew

AWS Glue DataBrew offers over 350 pre-built transformations to automate data preparation tasks (such as filtering anomalies, standardizing formats, and correcting invalid values) that would otherwise require days or weeks writing hand-coded transformations. You can now choose single or multiple output files instead of autogenerated files for your DataBrew recipe jobs. You can generate a […]

A serverless operational data lake for retail with AWS Glue, Amazon Kinesis Data Streams, Amazon DynamoDB, and Amazon QuickSight

Do you want to reduce stockouts at stores? Do you want to improve order delivery timelines? Do you want to provide your customers with accurate product availability, down to the millisecond? A retail operational data lake can help you transform the customer experience by providing deeper insights into a variety of operational aspects of your […]

Trigger an AWS Glue DataBrew job based on an event generated from another DataBrew job

Organizations today have continuous incoming data, and analyzing this data in a timely fashion is becoming a common requirement for data analytics and machine learning (ML) use cases. As part of this, you need clean data in order to gain insights that can enable enterprises to get the most out of their data for business […]

Write prepared data directly into JDBC-supported destinations using AWS Glue DataBrew

AWS Glue DataBrew offers over 250 pre-built transformations to automate data preparation tasks (such as filtering anomalies, standardizing formats, and correcting invalid values) that would otherwise require days or weeks writing hand-coded transformations. You can now write cleaned and normalized data directly into JDBC-supported databases and data warehouses without having to move large amounts of […]

Cover Image

Build a data pipeline to automatically discover and mask PII data with AWS Glue DataBrew

Personally identifiable information (PII) data handling is a common requirement when operating a data lake at scale. Businesses often need to mitigate the risk of exposing PII data to the data science team while not hindering the productivity of the team to get to the data they need in order to generate valuable data insights. […]

Build event-driven data quality pipelines with AWS Glue DataBrew

Businesses collect more and more data every day to drive processes like decision-making, reporting, and machine learning (ML). Before cleaning and transforming your data, you need to determine whether it’s fit for use. Incorrect, missing, or malformed data can have large impacts on downstream analytics and ML processes. Performing data quality checks helps identify issues […]

Transform data and create dashboards using AWS Glue DataBrew and Tableau

Before you can create visuals and dashboards that convey useful information, you need to transform and prepare the underlying data. With AWS Glue DataBrew, you can now easily transform and prepare datasets from Amazon Simple Storage Service (Amazon S3), an Amazon Redshift data warehouse, Amazon Aurora, and other Amazon Relational Database Service (Amazon RDS) databases […]

Enrich datasets for descriptive analytics with AWS Glue DataBrew

Data analytics remains a constantly hot topic. More and more businesses are beginning to understand the potential their data has to allow them to serve customers more effectively and give them a competitive advantage. However, for many small to medium businesses, gaining insight from their data can be challenging because they often lack in-house data […]

Enforce customized data quality rules in AWS Glue DataBrew

GIGO (garbage in, garbage out) is a concept common to computer science and mathematics: the quality of the output is determined by the quality of the input. In modern data architecture, you bring data from different data sources, which creates challenges around volume, velocity, and veracity. You might write unit tests for applications, but it’s […]

Introducing PII data identification and handling using AWS Glue DataBrew

AWS Glue DataBrew, a visual data preparation tool, now allows users to identify and handle sensitive data by applying advanced transformations like redaction, replacement, encryption, and decryption on their personally identifiable information (PII) data, and other types of data they deem sensitive. With exponential growth of data, companies are handling huge volumes and a wide […]