AWS Big Data Blog

Category: AWS Glue

Now Available: Updated guidance on the Data Analytics Lens for AWS Well-Architected Framework

Nearly all businesses today require some form of data analytics processing, from auditing user access to generating sales reports. For all your analytics needs, the Data Analytics Lens for AWS Well-Architected Framework provides prescriptive guidance to help you assess your workloads and identify best practices aligned to the AWS Well-Architected Pillars: Operational Excellence, Security, Reliability, […]

Accelerate large-scale data migration validation using PyDeequ

March 2023: You can now use AWS Glue Data Quality to measure and manage the quality of your data. AWS Glue Data Quality is built on Deequ and offers a simplified user experience for customers who want to use this open-source package. Refer to the blog and documentation for additional details. Many enterprises are migrating their […]

Stream data from relational databases to Amazon Redshift with upserts using AWS Glue streaming jobs

Read replicas of relational databases have traditionally served as a data source for non-online transactions of web applications, such as reporting, business analysis, ad hoc queries, operational excellence, and customer services. Due to the exponential growth of data volume, it has become common practice to replace such read replicas with data warehouses or data lakes […]

Build operational metrics for your enterprise AWS Glue Data Catalog at scale

Over the last several years, enterprises have accumulated massive amounts of data. Data volumes have increased at an unprecedented rate, exploding from terabytes to petabytes and sometimes exabytes of data. Increasingly, many enterprises are building highly scalable, available, secure, and flexible data lakes on AWS that can handle extremely large datasets. After data lakes are […]

How Amazon Transportation Service enabled near-real-time event analytics at petabyte scale using AWS Glue with Apache Hudi

This post is co-written with Madhavan Sriram and Diego Menin from Amazon Transportation Services (ATS). The transportation and logistics industry covers a wide range of services, such as multi-modal transportation, warehousing, fulfillment, freight forwarding, and delivery. At Amazon Transportation Service (ATS), the lifecycle of the shipment is digitally tracked and appended to tens of tracking […]

Simplify data integration pipeline development using AWS Glue custom blueprints

June 2023: This post was reviewed and updated for accuracy. August 2021: AWS Glue custom blueprints are now generally available. Please visit https://docs.aws.amazon.com/glue/latest/dg/blueprints-overview.html to learn more. Organizations spend significant time developing and maintaining data integration pipelines that hydrate data warehouses, data lakes, and lake houses. As data volume increases, data engineering teams struggle to keep up with […]

Use Amazon Athena and Amazon QuickSight in a cross-account environment

This blog post was last reviewed and updated in May 2022 to include the AWS Lake Formation resource sharing model. Many AWS customers use a multi-account strategy to host applications for different departments within the same company. However, you might deploy services like Amazon QuickSight using a single-account approach, which raises challenges when you need to use […]

How NortonLifeLock built a serverless architecture for real-time analysis of their VPN usage metrics

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. This post presents a reference architecture and optimization strategies for building serverless data analytics solutions on AWS using Amazon Kinesis Data Analytics. In addition, this post shows […]

How MOIA built a fully automated GDPR-compliant data lake using AWS Lake Formation, AWS Glue, and AWS CodePipeline

This is a guest blog post co-written by Leonardo Pêpe, a Data Engineer at MOIA. MOIA is an independent company of the Volkswagen Group with locations in Berlin and Hamburg, and operates its own ride pooling services in Hamburg and Hanover. The company was founded in 2016 and develops mobility services independently or in partnership […]

Create a custom Amazon S3 Storage Lens metrics dashboard using Amazon QuickSight

Companies use Amazon Simple Storage Service (Amazon S3) for its flexibility, durability, scalability, and ability to do much more than store data. This has led to an exponential rise in the usage of S3 buckets across numerous AWS Regions and across tens or even hundreds of AWS accounts. To optimize costs and analyze security posture, Amazon […]