AWS Big Data Blog
Category: Analytics
Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB
Customers have been using data warehousing solutions to perform their traditional analytics tasks. Recently, data lakes have gained lot of traction to become the foundation for analytical solutions, because they come with benefits such as scalability, fault tolerance, and support for structured, semi-structured, and unstructured datasets. Data lakes are not transactional by default; however, there […]
Automate alerting and reporting for AWS Glue job resource usage
Data transformation plays a pivotal role in providing the necessary data insights for businesses in any organization, small and large. To gain these insights, customers often perform ETL (extract, transform, and load) jobs from their source systems and output an enriched dataset. Many organizations today are using AWS Glue to build ETL pipelines that bring data […]
Defontana provides business administration solutions to Latin American customers using Amazon QuickSight
This is a guest post by Cynthia Valeriano, Jaime Olivares, and Guillermo Puelles from DeFontana. Defontana develops fully cloud-based business applications for the administration and management of companies. Based in Santiago, Chile, with operations in Peru, Mexico, and most recently Colombia, our main product is a 100% cloud-based enterprise resource planning (ERP) system that has […]
Improve operational efficiencies of Apache Iceberg tables built on Amazon S3 data lakes
Apache Iceberg is an open table format for large datasets in Amazon Simple Storage Service (Amazon S3) and provides fast query performance over large tables, atomic commits, concurrent writes, and SQL-compatible table evolution. When you build your transactional data lake using Apache Iceberg to solve your functional use cases, you need to focus on operational […]
Real-time time series anomaly detection for streaming applications on Amazon Managed Service for Apache Flink
August 30, 2023: Amazon Kinesis Data Analytics has been renamed to Amazon Managed Service for Apache Flink. Read the announcement in the AWS News Blog and learn more. Detecting anomalies in real time from high-throughput streams is key for informing on timely decisions in order to adapt and respond to unexpected scenarios. Stream processing frameworks […]
Simplify AWS Glue job orchestration and monitoring with Amazon MWAA
Organizations across all industries have complex data processing requirements for their analytical use cases across different analytics systems, such as data lakes on AWS, data warehouses (Amazon Redshift), search (Amazon OpenSearch Service), NoSQL (Amazon DynamoDB), machine learning (Amazon SageMaker), and more. Analytics professionals are tasked with deriving value from data stored in these distributed systems […]
Dive deep into AWS Glue 4.0 for Apache Spark
Jul 2023: This post was reviewed and updated with Glue 4.0 support in AWS Glue Studio notebook and interactive sessions. Deriving insight from data is hard. It’s even harder when your organization is dealing with silos that impede data access across different data stores. Seamless data integration is a key requirement in a modern data […]
Softbrain provides advanced analytics to sales customers with Amazon QuickSight
This is a guest post by Kenta Oda from SOFTBRAIN Co., Ltd. Softbrain is a leading Japanese producer of software for sales force automation (SFA) and customer relationship management (CRM). Our main product, e-Sales Manager (eSM), is an SFA/CRM tool that provides sales support to over 5,500 companies in Japan. We provide our sales customers […]
Stream data with Amazon MSK Connect using an open-source JDBC connector
Customers are adopting Amazon Managed Service for Apache Kafka (Amazon MSK) as a fast and reliable streaming platform to build their enterprise data hub. In addition to streaming capabilities, setting up Amazon MSK enables organizations to use a pub/sub model for data distribution with loosely coupled and independent components. To publish and distribute the data […]
Peloton embraces Amazon Redshift to unlock the power of data during changing times
If you want to learn how to make the most of your company’s data, especially during today’s uncertain times, don’t miss AWS Data Insights Day on May 24, 2023. Learn more. Credit: Phil GoldsteinJerry Wang, Peloton’s Director of Data Engineering (left), and Evy Kho, Peloton’s Manager of Subscription Analytics, discuss how the company has benefited from using […]