AWS Big Data Blog

Upgrade your resume with the AWS Certified Big Data — Specialty Certification

The AWS Certified Big Data — Specialty certification is a great option to help grow your career. AWS Certification shows prospective employers that you have the technical skills and expertise required to perform complex data analyses using core AWS Big Data services like Amazon EMR, Amazon Redshift, Amazon QuickSight, and more. This certification validates your understanding of data collection, storage, processing, analysis, visualization, and security.

Read More

Secure your Amazon EMR cluster from unintentional network exposure with Block Public Access configuration

AWS security groups act as a network firewall that allows you to control access to your cluster to only whitelisted IP addresses. Proper management of security groups rules is critical to protect your application and data on the cluster. Amazon EMR strongly recommends creating restrictive security group rules that include the necessary network ports, protocols, […]

Read More

Create advanced insights using Level Aware Aggregations in Amazon QuickSight

Amazon QuickSight recently launched Level Aware Aggregations (LAA), which enables you to perform calculations on your data to derive advanced and meaningful insights. In this blog post, we go through examples of applying these calculations to a sample sales dataset so that you can start using these for your own needs.

Read More

Implement perimeter security in Amazon EMR using Apache Knox

Perimeter security helps secure Apache Hadoop cluster resources to users accessing from outside the cluster. It enables a single access point for all REST and HTTP interactions with Apache Hadoop clusters and simplifies client interaction with the cluster. For example, client applications must acquire Kerberos tickets using Kinit or SPNEGO before interacting with services on Kerberos enabled clusters. In this post, we walk through setup of Apache Knox to enable perimeter security for EMR clusters.

Read More

Extract Oracle OLTP data in real time with GoldenGate and query from Amazon Athena

This post describes how you can improve performance and reduce costs by offloading reporting workloads from an online transaction processing (OLTP) database to Amazon Athena and Amazon S3. The architecture described allows you to implement a reporting system and have an understanding of the data that you receive by being able to query it on arrival.

Read More

Automate Amazon Redshift cluster creation using AWS CloudFormation

In this post, I explain how to automate the deployment of an Amazon Redshift cluster in an AWS account. AWS best practices for security and high availability drive the cluster’s configuration, and you can create it quickly by using AWS CloudFormation. I walk you through a set of sample CloudFormation templates, which you can customize as per your needs.

Read More

How to migrate a large data warehouse from IBM Netezza to Amazon Redshift with no downtime

In this article, we explain how this customer performed a large-scale data warehouse migration from IBM Netezza to Amazon Redshift without downtime, by following a thoroughly planned migration process, and leveraging AWS Schema Conversion Tool (SCT) and Amazon Redshift best practices.

Read More

Perform biomedical informatics without a database using MIMIC-III data and Amazon Athena

This post describes how to make the MIMIC-III dataset available in Athena and provide automated access to an analysis environment for MIMIC-III on AWS. We also compare a MIMIC-III reference bioinformatics study using a traditional database to that same study using Athena.

Read More