AWS Big Data Blog

Accelerate analytics on Amazon OpenSearch Service with AWS Glue through its native connector

As the volume and complexity of analytics workloads continue to grow, customers are looking for more efficient and cost-effective ways to ingest and analyze data. Data is stored from online systems such as databases, CRMs, and marketing systems to data stores such as data lakes on Amazon Simple Storage Service (Amazon S3), data warehouses […]

New Amazon CloudWatch log class to cost-effectively scale your AWS Glue workloads

AWS Glue is a serverless data integration service that makes it easier to discover, prepare, and combine data for analytics, machine learning (ML), and application development. You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. One of […]

Sun King uses Amazon Redshift data sharing to accelerate data analytics and improve user experience

This post is co-authored with Guillaume Saint-Martin at Sun King. Sun King is the world’s leading off-grid solar energy company, and is on a mission to power access to brighter lives through off-grid solar. Sun King designs, distributes, installs, and finances solar home energy products for people currently living without reliable energy access. It serves […]

AWS re:Invent 2023 Amazon Redshift Sessions Recap

Amazon Redshift powers data-driven decisions for tens of thousands of customers every day with a fully managed, AI-powered cloud data warehouse, delivering the best price-performance for your analytics workloads. Customers use Amazon Redshift as a key component of their data architecture to drive use cases from typical dashboarding to self-service analytics, real-time analytics, machine learning […]

Build efficient ETL pipelines with AWS Step Functions distributed map and redrive feature

AWS Step Functions is a fully managed visual workflow service that enables you to build complex data processing pipelines involving a diverse set of extract, transform, and load (ETL) technologies such as AWS Glue, Amazon EMR, and Amazon Redshift. You can visually build the workflow by wiring individual data pipeline tasks and configuring payloads, retries, […]

Automatically detect Personally Identifiable Information in Amazon Redshift using AWS Glue

With the exponential growth of data, companies are handling huge volumes and a wide variety of data including personally identifiable information (PII). PII is a legal term pertaining to information that can identify, contact, or locate a single person. Identifying and protecting sensitive data at scale has become increasingly complex, expensive, and time-consuming. Organizations have […]

Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

Data loses value over time. We hear from our customers that they’d like to analyze the business transactions in real time. Traditionally, customers used batch-based approaches for data movement from operational systems to analytical systems. Batch load can run once or several times a day. A batch-based approach can introduce latency in data movement and […]

Federate IAM-based single sign-on to Amazon Redshift role-based access control with Okta

Amazon Redshift accelerates your time to insights with fast, easy, and secure cloud data warehousing at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries. You can use your preferred SQL clients to analyze your data in an Amazon Redshift data warehouse. Connect seamlessly by […]

Orchestrate Amazon EMR Serverless Spark jobs with Amazon MWAA, and data validation using Amazon Athena

As data engineering becomes increasingly complex, organizations are looking for new ways to streamline their data processing workflows. Many data engineers today use Apache Airflow to build, schedule, and monitor their data pipelines. However, as the volume of data grows, managing and scaling these pipelines can become a daunting task. Amazon Managed Workflows for Apache […]

How FanDuel adopted a modern Amazon Redshift architecture to serve critical business workloads

This post is co-written with Sreenivasa Mungala and Matt Grimm from FanDuel. In this post, we share how FanDuel moved from a DC2 nodes architecture to a modern Amazon Redshift architecture, which includes Redshift provisioned clusters using RA3 instances, Amazon Redshift data sharing, and Amazon Redshift Serverless. About FanDuel: Part of Flutter Entertainment, FanDuel Group […]