AWS Big Data Blog

Extract Oracle OLTP data in real time with GoldenGate and query from Amazon Athena

This post describes how you can improve performance and reduce costs by offloading reporting workloads from an online transaction processing (OLTP) database to Amazon Athena and Amazon S3. The architecture described lets you implement a reporting system and understand incoming data by querying it as soon as it arrives.

Automate Amazon Redshift cluster creation using AWS CloudFormation

In this post, I explain how to automate the deployment of an Amazon Redshift cluster in an AWS account. AWS best practices for security and high availability drive the cluster’s configuration, and you can create it quickly by using AWS CloudFormation. I walk you through a set of sample CloudFormation templates, which you can customize to fit your needs.

How to migrate a large data warehouse from IBM Netezza to Amazon Redshift with no downtime

In this article, we explain how a customer performed a large-scale data warehouse migration from IBM Netezza to Amazon Redshift without downtime by following a thoroughly planned migration process and using the AWS Schema Conversion Tool (SCT) and Amazon Redshift best practices.

Perform biomedical informatics without a database using MIMIC-III data and Amazon Athena

This post describes how to make the MIMIC-III dataset available in Athena and provide automated access to an analysis environment for MIMIC-III on AWS. We also compare a MIMIC-III reference bioinformatics study run on a traditional database with the same study run on Athena.

Discovering metadata with AWS Lake Formation: Part 1

In this post, you will create and edit your first data lake using AWS Lake Formation. You will use the service to secure and ingest data into an S3 data lake, catalog the data, and customize the metadata of the data sources. In part 2 of this series, we will show you how to discover your data by using the metadata search capabilities of Lake Formation.

Integrate and deduplicate datasets using AWS Lake Formation FindMatches

AWS Lake Formation FindMatches is a new machine learning (ML) transform that enables you to match records across different datasets as well as identify and remove duplicate records, with little to no human intervention. FindMatches is part of Lake Formation, a new AWS service that helps you build a secure data lake in a few simple steps.

To use FindMatches, you don’t have to write code or know how ML works. Your data doesn’t have to include a unique identifier, nor must fields match exactly.

Analyzing AWS WAF logs with Amazon ES, Amazon Athena, and Amazon QuickSight

This post presents a simple approach to aggregating AWS WAF logs into a central data lake repository, which lets teams better analyze and understand their organization’s security posture. I walk through the steps to aggregate regional AWS WAF logs into a dedicated S3 bucket. I then demonstrate how you can use Amazon ES to visualize the log data. I also present an option to offload and process historical data using AWS Glue ETL. Finally, with the data collected in one place, I show you how you can use Amazon Athena and Amazon QuickSight to query historical data and extract business insights.
