AWS Big Data Blog

Query Snowflake using Athena Federated Query and join with data in your Amazon S3 data lake

If you use data lakes in Amazon Simple Storage Service (Amazon S3) and use Snowflake as your data warehouse solution, you may need to join your data in your data lake with Snowflake. For example, you may want to build a dashboard by joining historical data in your Amazon S3 data lake and the latest […]

Read More

Build a serverless event-driven workflow with AWS Glue and Amazon EventBridge

Customers are adopting event-driven-architectures to improve the agility and resiliency of their applications. As a result, data engineers are increasingly looking for simple-to-use yet powerful and feature-rich data processing tools to build pipelines that enrich data, move data in and out of their data lake and data warehouse, and analyze data. AWS Glue is a […]

Read More

Auto scaling Amazon Kinesis Data Streams using Amazon CloudWatch and AWS Lambda

This post is co-written with Noah Mundahl, Director of Public Cloud Engineering at United Health Group. In this post, we cover a solution to add auto scaling to Amazon Kinesis Data Streams. Whether you have one stream or many streams, you often need to scale them up when traffic increases and scale them down when […]

Read More

Data preparation using an Amazon RDS for MySQL database with AWS Glue DataBrew

With AWS Glue DataBrew, data analysts and data scientists can easily access and visually explore any amount of data across their organization directly from their Amazon Simple Storage Service (Amazon S3) data lake, Amazon Redshift data warehouse, or Amazon Aurora and Amazon Relational Database Service (Amazon RDS) databases. You can choose from over 250 built-in […]

Read More

Incremental data matching using AWS Lake Formation and AWS Glue

AWS Lake Formation provides a machine learning (ML) capability (FindMatches transform) to identify duplicate or matching records in your dataset, even when the records don’t have a common unique identifier and no fields match exactly. Customers across many industries have come to rely on this feature for linking datasets like patient records, customer databases, and […]

Read More

Query your Oracle database using Athena Federated Query and join with data in your Amazon S3 data lake

If you use data lakes in Amazon Simple Storage Service (Amazon S3) and use Oracle as your transactional data store, you may need to join the data in your data lake with Oracle on Amazon Relational Database Service (Amazon RDS), Oracle running on Amazon Elastic Compute Cloud (Amazon EC2), or an on-premises Oracle database, for […]

Read More

Enable federation to multiple Amazon QuickSight accounts with Microsoft Azure Active Directory

Amazon QuickSight is a scalable, serverless, embeddable, machine learning (ML)-powered business intelligence (BI) service built for the cloud that supports identity federation in both Standard and Enterprise editions. Organizations are working towards centralizing their identity and access strategy across all of their applications, including on-premises, third-party, and applications on AWS. Many organizations use Microsoft Azure […]

Read More

Create a secure data lake by masking, encrypting data, and enabling fine-grained access with AWS Lake Formation

You can build data lakes with millions of objects on Amazon Simple Storage Service (Amazon S3) and use AWS native analytics and machine learning (ML) services to process, analyze, and extract business insights. You can use a combination of our purpose-built databases and analytics services like Amazon EMR, Amazon OpenSearch Service, and Amazon Redshift as […]

Read More

Design a data mesh architecture using AWS Lake Formation and AWS Glue

Organizations of all sizes have recognized that data is one of the key enablers to increase and sustain innovation, and drive value for their customers and business units. They are eagerly modernizing traditional data platforms with cloud-native technologies that are highly scalable, feature-rich, and cost-effective. As you look to make business decisions driven by data, […]

Read More

Case-insensitive collation support for string processing in Amazon Redshift

Amazon Redshift is a fast, fully managed, cloud-native data warehouse. Tens of thousands of customers have successfully migrated their workloads to Amazon Redshift. We hear from customers that they need case-insensitive collation for strings in Amazon Redshift in order to maintain the same functionality and meet their performance goals when they migrate their existing workloads […]

Read More