AWS Big Data Blog

Migrate a large data warehouse from Greenplum to Amazon Redshift using AWS SCT – Part 3

In this third post of a multi-part series, we explore some of the edge cases in migrating a large data warehouse from Greenplum to Amazon Redshift using AWS Schema Conversion Tool (AWS SCT) and how to handle these challenges. Challenges include how best to use virtual partitioning, edge cases for numeric and character fields, and […]

Build an AWS Lake Formation permissions inventory dashboard using AWS Glue and Amazon QuickSight

AWS Lake Formation makes it easier to centrally govern, secure, and share data for analytics with familiar database-style grant features managed through the Glue Data Catalog. Lake Formation provides a single place to define fine-grained access control on catalog resources. These permissions are granted to the principals by a data lake admin, and integrated engines […]

Query cross-account Amazon DynamoDB tables using Amazon Athena Federated Query

Amazon DynamoDB is ideal for applications that need a flexible NoSQL database with low read and write latencies and the ability to scale storage and throughput up or down as needed without code changes or downtime. You can use DynamoDB for use cases including mobile apps, gaming, digital ad serving, live voting, audience interaction for live […]

How Blueshift integrated their customer data environment with Amazon Redshift to unify and activate customer data for marketing

This post was co-written with Vijay Chitoor, Co-Founder & CEO, and Mehul Shah, Co-Founder and CTO from the Blueshift team, as the lead authors. Blueshift is a San Francisco-based startup that helps marketers deliver exceptional customer experiences on every channel, delivering relevant personalized marketing. Blueshift’s SmartHub Customer Data Platform (CDP) empowers marketing teams to activate […]

How dynamic data masking support in Amazon Redshift helps achieve data privacy and compliance

Amazon Redshift is a fast, petabyte-scale cloud data warehouse delivering the best price–performance. It makes it fast, simple, and cost-effective to analyze all your data using standard SQL and your existing business intelligence (BI) tools. Today, tens of thousands of customers run business-critical workloads on Amazon Redshift. Dynamic data masking (DDM) support in Amazon Redshift […]

Simplify data ingestion from Amazon S3 to Amazon Redshift using auto-copy (preview)

Amazon Redshift is a fast, petabyte-scale cloud data warehouse that makes it simple and cost-effective to analyze all of your data using standard SQL and your existing business intelligence (BI) tools. Tens of thousands of customers today rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, delivering the best price-performance. […]

Gain visibility into your Amazon MSK cluster by deploying the Conduktor Platform

This is a guest post by AWS Data Hero and co-founder of Conduktor, Stephane Maarek. Deploying Apache Kafka on AWS is now easier, thanks to Amazon Managed Streaming for Apache Kafka (Amazon MSK). In a few clicks, it provides you with a production-ready Kafka cluster on which you can run your applications and create data […]

Amazon EMR launches support for Amazon EC2 C6i, M6i, I4i, R6i and R6id instances to improve cost performance for Spark workloads by 6–33%

Amazon EMR provides a managed service to easily run analytics applications using open-source frameworks such as Apache Spark, Hive, Presto, Trino, HBase, and Flink. The Amazon EMR runtime for Spark and Presto includes optimizations that provide over two times performance improvements over open-source Apache Spark and Presto, so that your applications run faster and at […]

Enable federation to Amazon QuickSight with automatic provisioning of users between AWS IAM Identity Center and Microsoft Azure AD

Organizations are working towards centralizing their identity and access strategy across all their applications, including on-premises, third-party, and applications on AWS. Many organizations use identity providers (IdPs) based on OIDC or SAML-based protocols like Microsoft Azure Active Directory (Azure AD) and manage user authentication along with authorization centrally. This authorizes users to access Amazon QuickSight […]

Perform multi-cloud analytics using Amazon QuickSight, Amazon Athena Federated Query, and Microsoft Azure Synapse

In this post, we show how to use Amazon QuickSight and Amazon Athena Federated Query to build dashboards and visualizations on data that is stored in Microsoft Azure Synapse databases. Organizations today use data stores that are best suited for the applications they build. Additionally, they may also continue to use some of their legacy […]