AWS Big Data Blog

Category: AWS Big Data

The art and science of data product portfolio management

This post is the first in a series dedicated to the art and science of practical data mesh implementation (for an overview of data mesh, read the original whitepaper The data mesh shift). The series attempts to bridge the gap between the tenets of data mesh and its real-life implementation by deep-diving into the functional […]

Near-real-time analytics using Amazon Redshift streaming ingestion with Amazon Kinesis Data Streams and Amazon DynamoDB

Amazon Redshift is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, easy, and secure analytics at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, making it the widely used cloud data warehouse. You can run and […]

Improved scalability and resiliency for Amazon EMR on EC2 clusters

Amazon EMR is the cloud big data solution for petabyte-scale data processing, interactive analytics, and machine learning using open-source frameworks such as Apache Spark, Apache Hive, and Presto. Customers asked us for features that would further improve the resiliency and scalability of their Amazon EMR on EC2 clusters, including their large, long-running clusters. We have […]

How Amazon Finance Automation built a data mesh to support distributed data ownership and centralize governance

Amazon Finance Automation (FinAuto) is the tech organization of Amazon Finance Operations (FinOps). Its mission is to enable FinOps to support the growth and expansion of Amazon businesses. It works as a force multiplier through automation and self-service, while providing accurate and on-time payments and collections. FinAuto has a unique position to look across FinOps […]

How AWS helped Altron Group accelerate their vision for optimized customer engagement

This is a guest post co-authored by Jacques Steyn, Senior Manager Professional Services at Altron Group. Altron is a pioneer of providing data-driven solutions for their customers by combining technical expertise with in-depth customer understanding to provide highly differentiated technology solutions. Alongside their partner AWS, they participated in AWS Data-Driven Everything (D2E) workshops and a […]

With a zero-ETL approach, AWS is helping builders realize near-real-time analytics

Data is at the center of every application, process, and business decision. When data is used to improve customer experiences and drive innovation, it can lead to business growth. According to Forrester, advanced insights-driven businesses are 8.5 times more likely than beginners to report at least 20% revenue growth. However, to realize this growth, managing […]

Choosing an open table format for your transactional data lake on AWS

August 2023: This post was updated to include Apache Iceberg support in Amazon Redshift. Disclaimer: Due to rapid advancements in AWS service support for open table formats, recent developments might not yet be reflected in this post. For the latest information on AWS service support for open table formats, refer to the official AWS service […]

How Cargotec uses metadata replication to enable cross-account data sharing

This is a guest blog post co-written with Sumesh M R from Cargotec and Tero Karttunen from Knowit Finland. Cargotec (Nasdaq Helsinki: CGCBV) is a Finnish company that specializes in cargo handling solutions and services. They are headquartered in Helsinki, Finland, and operates globally in over 100 countries. With its leading cargo handling solutions and […]

Ten new visual transforms in AWS Glue Studio

AWS Glue Studio is a graphical interface that makes it easy to create, run, and monitor extract, transform, and load (ETL) jobs in AWS Glue. It allows you to visually compose data transformation workflows using nodes that represent different data handling steps, which later are converted automatically into code to run. AWS Glue Studio recently […]

How Encored Technologies built serverless event-driven data pipelines with AWS

This post is a guest post co-written with SeonJeong Lee, JaeRyun Yim, and HyeonSeok Yang from Encored Technologies. Encored Technologies (Encored) is an energy IT company in Korea that helps their customers generate higher revenue and reduce operational costs in renewable energy industries by providing various AI-based solutions. Encored develops machine learning (ML) applications predicting […]