AWS Big Data Blog

Category: Analytics

Implement slowly changing dimensions in a data lake using AWS Glue and Delta

In a data warehouse, a dimension is a structure that categorizes facts and measures to enable users to answer business questions. For example, in a typical sales domain, customer, time, and product are dimensions, and sales transactions are facts. Attributes within a dimension can change over time: a customer can change […]

AWS Glue crawlers support cross-account crawling to support data mesh architecture

Data lakes have come a long way, and there’s been tremendous innovation in this space. Today’s modern data lakes are cloud native, work with multiple data types, and make this data easily available to diverse stakeholders across the business. Over time, data lakes have grown significantly and have evolved into data meshes […]

Deep Pool boosts software quality control using Amazon QuickSight

Deep Pool Financial Solutions, an investor servicing and compliance solutions supplier, was looking to build key performance indicators to track its software tests, failures, and successful fixes to pinpoint the specific areas for improvement in its client software. Deep Pool was unable to access the large amounts of data that its project management software provided, […]

Visualize Confluent data in Amazon QuickSight using Amazon Athena

This is a guest post written by Ahmed Saef Zamzam and Geetha Anne from Confluent. Businesses are using real-time data streams to gain insights into their company’s performance and make informed, data-driven decisions faster. As real-time data has become essential for businesses, a growing number of companies are adapting their data strategy to focus on […]

Manage your data warehouse cost allocations with Amazon Redshift Serverless tagging

Amazon Redshift Serverless makes it simple to run and scale analytics without having to manage your data warehouse infrastructure. Developers, data scientists, and analysts can work across databases, data warehouses, and data lakes to build reporting and dashboarding applications, perform real-time analytics, share and collaborate on data, and even build and train machine learning (ML) […]

Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation

We recently announced support for AWS Lake Formation fine-grained access control policies in Amazon Athena queries for data stored in any supported file format using table formats such as Apache Iceberg, Apache Hudi and Apache Hive. AWS Lake Formation allows you to define and enforce database, table, and column-level access policies to query Iceberg tables […]

Manage users and group memberships on Amazon QuickSight using SCIM events generated in IAM Identity Center with Azure AD

Amazon QuickSight is a cloud-native, scalable business intelligence (BI) service that supports identity federation. AWS Identity and Access Management (IAM) allows organizations to use the identities managed in their enterprise identity provider (IdP) and federate single sign-on (SSO) to QuickSight. As more organizations are building centralized user identity stores with all their applications, including on-premises apps, […]

Accelerating revenue growth with real-time analytics: Poshmark’s journey

August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. This post was co-written by Mahesh Pasupuleti and Gaurav Shah from Poshmark. Poshmark is a leading social marketplace for new and secondhand styles for women, men, kids, […]

Introducing native support for Apache Hudi, Delta Lake, and Apache Iceberg on AWS Glue for Apache Spark, Part 2: AWS Glue Studio Visual Editor

In the first post of this series, we described how AWS Glue for Apache Spark works with Apache Hudi, Linux Foundation Delta Lake, and Apache Iceberg tables using the native support of those data lake formats. This native support simplifies reading and writing your data for these data lake frameworks so you can more […]

Extend geospatial queries in Amazon Athena with UDFs and AWS Lambda

Amazon Athena is a serverless, interactive query service that allows you to easily analyze data in Amazon Simple Storage Service (Amazon S3) and 25-plus data sources, including on-premises data sources or other cloud systems, using SQL or Python. Athena’s built-in capabilities include querying geospatial data; for example, you can count the number of […]