AWS Big Data Blog

Load CDC data by table and shape using Amazon Kinesis Data Firehose Dynamic Partitioning

Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services. Customers already use Amazon Kinesis Data Firehose to ingest raw data from various data sources using direct API call or by integrating Kinesis Data Firehose with Amazon Kinesis Data Streams including “change data capture” […]

SEEK Asia modernizes search with CI/CD and Amazon OpenSearch Service

This post was written in collaboration with Abdulsalam Alshallah (Salam), Software Architect, and Hans Roessler, Principal Software Engineer at SEEK Asia. SEEK is a market leader in online employment marketplaces with deep and rich insights into the future of work. As a global business, SEEK has a presence in Australia, New Zealand, Hong Kong, Southeast Asia, Brazil and Mexico and its websites attract over 400 million visits per year. SEEK Asia’s business operates across seven countries and includes leading portal brands such as and and leverages data and technology to create innovative solutions for candidates and hirers.

In this post, we share how SEEK Asia modernized their search-based system with a continuous integration and continuous delivery (CI/CD) pipeline and Amazon OpenSearch Service.

Visualize live analytics from Amazon QuickSight connected to Amazon OpenSearch Service

Live analytics refers to the process of preparing and measuring data as soon as it enters the database or persistent store. In other words, you get insights or arrive at conclusions immediately. Live analytics enables businesses to respond to events without delay. You can seize opportunities or prevent problems before they happen. Speed is the […]

Use unsupervised training with K-means clustering in Amazon Redshift ML

Amazon Redshift is a fast, petabyte-scale cloud data warehouse delivering the best price–performance. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads. Data analysts and database developers want to use this data to train machine learning (ML) models, which can then be used to […]

Run queries 3x faster with up to 70% cost savings on the latest Amazon Athena engine

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. In November 2020, Athena announced the General Availability of the V2 […]

Lucerna Health uses Amazon QuickSight embedded analytics to help healthcare customers uncover new insights

This is a guest post by Lucerna Health. Founded in 2018, Lucerna Health is a data technology company that connects people and data to deliver value-based care (VBC) results and operational transformation. At Lucerna Health, data is at the heart of our business. Every day, we use clinical, sales, and operational data to help healthcare […]

Integrate Etleap with Amazon Redshift Streaming Ingestion (preview) to make data available in seconds

Amazon Redshift is a fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using SQL and your extract, transform, and load (ETL), business intelligence (BI), and reporting tools. Tens of thousands of customers use Amazon Redshift to process exabytes of data per day and power analytics workloads. Etleap […]

Announcing Amazon EMR Serverless (Preview): Run big data applications without managing servers

Today we’re happy to announce Amazon EMR Serverless, a new option in Amazon EMR that makes it easy and cost-effective for data engineers and analysts to run petabyte-scale data analytics in the cloud. With EMR Serverless, you can run applications built using open-source frameworks such as Apache Spark and Hive without having to configure, manage, […]

The following diagram shows our solution architecture.

Effective data lakes using AWS Lake Formation, Part 2: Creating a governed table for streaming data sources

February 2023: The content of this blog post can be now be found on AWS Lake Formation public documentation. Please refer to it instead. We announced the general availability of AWS Lake Formation transactions, row-level security, and acceleration at AWS re:Invent 2021. In Part 1 of this series, we explained how to set up a […]

Amazon QuickSight: 2021 in review

With AWS re:Invent just around the corner, we at the Amazon QuickSight team have put together this post to provide you with a handy list of all the key updates this year. We’ve broken this post into three key sections: insights for every user, embedded analytics with QuickSight, scaling and governance. Insights for every user […]