AWS Big Data Blog

Configure alerts of high CPU usage in applications using Amazon OpenSearch Service anomaly detection: Part 1

Amazon OpenSearch Service is a fully managed service that makes it easy to deploy, secure, and run Elasticsearch cost-effectively at scale. Amazon OpenSearch Service supports many use cases, including application monitoring, search, security information and event management (SIEM), and infrastructure monitoring. Amazon OpenSearch Service also offers a rich set of functionalities such as UltraWarm, fine-grained […]

Load CDC data by table and shape using Amazon Kinesis Data Firehose Dynamic Partitioning

February 9, 2024: Amazon Kinesis Data Firehose has been renamed to Amazon Data Firehose. Read the AWS What’s New post to learn more. Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services. Customers already use Amazon Kinesis Data Firehose to ingest raw data […]

SEEK Asia modernizes search with CI/CD and Amazon OpenSearch Service

This post was written in collaboration with Abdulsalam Alshallah (Salam), Software Architect, and Hans Roessler, Principal Software Engineer at SEEK Asia. SEEK is a market leader in online employment marketplaces with deep and rich insights into the future of work. As a global business, SEEK has a presence in Australia, New Zealand, Hong Kong, Southeast Asia, Brazil and Mexico and its websites attract over 400 million visits per year. SEEK Asia’s business operates across seven countries and includes leading portal brands such as jobsdb.com and jobstreet.com and leverages data and technology to create innovative solutions for candidates and hirers.

In this post, we share how SEEK Asia modernized their search-based system with a continuous integration and continuous delivery (CI/CD) pipeline and Amazon OpenSearch Service.

Visualize live analytics from Amazon QuickSight connected to Amazon OpenSearch Service

Live analytics refers to the process of preparing and measuring data as soon as it enters the database or persistent store. In other words, you get insights or arrive at conclusions immediately. Live analytics enables businesses to respond to events without delay. You can seize opportunities or prevent problems before they happen. Speed is the […]

Use unsupervised training with K-means clustering in Amazon Redshift ML

June 2023: This post was reviewed and updated for accuracy. Amazon Redshift is a fast, petabyte-scale cloud data warehouse delivering the best price–performance. Tens of thousands of customers use Amazon Redshift to process exabytes of data every day to power their analytics workloads. Data analysts and database developers want to use this data to train […]

Run queries 3x faster with up to 70% cost savings on the latest Amazon Athena engine

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon Simple Storage Service (Amazon S3) using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. In November 2020, Athena announced the General Availability of the V2 […]

Lucerna Health uses Amazon QuickSight embedded analytics to help healthcare customers uncover new insights

This is a guest post by Lucerna Health. Founded in 2018, Lucerna Health is a data technology company that connects people and data to deliver value-based care (VBC) results and operational transformation. At Lucerna Health, data is at the heart of our business. Every day, we use clinical, sales, and operational data to help healthcare […]

Integrate Etleap with Amazon Redshift Streaming Ingestion (preview) to make data available in seconds

Amazon Redshift is a fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using SQL and your extract, transform, and load (ETL), business intelligence (BI), and reporting tools. Tens of thousands of customers use Amazon Redshift to process exabytes of data per day and power analytics workloads. Etleap […]

Announcing Amazon EMR Serverless (Preview): Run big data applications without managing servers

Today we’re happy to announce Amazon EMR Serverless, a new option in Amazon EMR that makes it easy and cost-effective for data engineers and analysts to run petabyte-scale data analytics in the cloud. With EMR Serverless, you can run applications built using open-source frameworks such as Apache Spark and Hive without having to configure, manage, […]

The following diagram shows our solution architecture.

Effective data lakes using AWS Lake Formation, Part 2: Creating a governed table for streaming data sources

February 2023: The content of this blog post can be now be found on AWS Lake Formation public documentation. Please refer to it instead. We announced the general availability of AWS Lake Formation transactions, row-level security, and acceleration at AWS re:Invent 2021. In Part 1 of this series, we explained how to set up a […]