AWS Big Data Blog

Dive deep into AWS Glue 4.0 for Apache Spark

Jul 2023: This post was reviewed and updated with Glue 4.0 support in AWS Glue Studio notebook and interactive sessions. Deriving insight from data is hard. It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data […]

Softbrain provides advanced analytics to sales customers with Amazon QuickSight

This is a guest post by Kenta Oda from SOFTBRAIN Co., Ltd. Softbrain is a leading Japanese producer of software for sales force automation (SFA) and customer relationship management (CRM). Our main product, e-Sales Manager (eSM), is an SFA/CRM tool that provides sales support to over 5,500 companies in Japan. We provide our sales customers […]

Stream data with Amazon MSK Connect using an open-source JDBC connector

March 2026: This post was reviewed and updated for accuracy. Customers are adopting Amazon Managed Service for Apache Kafka (Amazon MSK) as a fast and reliable streaming platform to build their enterprise data hub. In addition to streaming capabilities, setting up Amazon MSK enables organizations to use a pub/sub model for data distribution with loosely […]

Jerry Wang, Peloton’s Director of Data Engineering (left), and Evy Kho, Peloton’s Manager of Subscription Analytics, discuss how the company has benefited from using Amazon Redshift.

Peloton embraces Amazon Redshift to unlock the power of data during changing times

If you want to learn how to make the most of your company’s data, especially during today’s uncertain times, don’t miss AWS Data Insights Day on May 24, 2023. Learn more. Credit: Phil Goldstein. Jerry Wang, Peloton’s Director of Data Engineering (left), and Evy Kho, Peloton’s Manager of Subscription Analytics, discuss how the company has benefited from using […]

How Zoom implemented streaming log ingestion and efficient GDPR deletes using Apache Hudi on Amazon EMR

In today’s digital age, logging is a critical aspect of application development and management, but efficiently managing logs while complying with data protection regulations can be a significant challenge. Zoom, in collaboration with the AWS Data Lab team, developed an innovative architecture to overcome these challenges and streamline their logging and record deletion processes. In […]

Improve power utility operational efficiency using smart sensor data and Amazon QuickSight

This blog post is co-written with Steve Alexander at PG&E. In today’s rapidly changing energy landscape, power disturbances cost businesses millions of dollars due to service interruptions and power quality issues. Large utility territories make it difficult to detect and locate faults when power outages occur, leading to longer restoration times, recurring outages, and unhappy […]

Amazon OpenSearch Service Under the Hood: Multi-AZ with Standby

Amazon OpenSearch Service recently announced Multi-AZ with Standby, a new deployment option for managed clusters that enables 99.99% availability and consistent performance for business-critical workloads. With Multi-AZ with Standby, clusters are resilient to infrastructure failures like hardware or networking failure. This option provides improved reliability and the added benefit of simplifying cluster configuration and management […]

Perform secure database write-backs with Amazon QuickSight

Amazon QuickSight is a scalable, serverless, machine learning (ML)-powered business intelligence (BI) solution that makes it easy to connect to your data, create interactive dashboards, get access to ML-enabled insights, and share visuals and dashboards with tens of thousands of internal and external users, either within QuickSight itself or embedded into any application. A write-back […]

Ten new visual transforms in AWS Glue Studio

AWS Glue Studio is a graphical interface that makes it easy to create, run, and monitor extract, transform, and load (ETL) jobs in AWS Glue. It allows you to visually compose data transformation workflows using nodes that represent different data handling steps, which later are converted automatically into code to run. AWS Glue Studio recently […]

Use SAML Identities for programmatic access to Amazon OpenSearch Service

Customers of Amazon OpenSearch Service can already use Security Assertion Markup Language (SAML) to access OpenSearch Dashboards. This post outlines two methods by which programmatic users can now access OpenSearch using SAML identities. This applies to all identity providers (IdPs) that support SAML 2.0, including prevalent ones like Active Directory Federation Services (AD FS), Okta, AWS […]