AWS Big Data Blog

Securely Access Web Interfaces on Amazon EMR Launched in a Private Subnet

Ben Snively is a Solutions Architect with AWS. Private subnets allow you to limit access to deployed components, and to control security and routing of the system. You can also use a private subnet to connect an on-premises local network to AWS through a VPN or AWS Direct Connect. Amazon EMR allows customers to launch […]

Performance Tuning Your Titan Graph Database on AWS

At AWS re:Invent 2017, we announced the preview of Amazon Neptune, a fast and reliable graph database built for the cloud. Neptune is fully managed and highly available, and it includes read replicas, point-in-time recovery, and continuous backups to Amazon S3. If you are about to build an application yourself and need a graph database, […]

Top 10 Performance Tuning Techniques for Amazon Redshift

Ian Meyers is a Solutions Architecture Senior Manager with AWS. Zach Christopherson, an Amazon Redshift Database Engineer, contributed to this post. Amazon Redshift is a fully managed, petabyte-scale, massively parallel data warehouse that offers simple operations and high performance. Customers use Amazon Redshift for everything from accelerating existing database environments that are struggling to […]

Migrating Metadata when Encrypting an Amazon Redshift Cluster

NOTE: The information in this blog post is now outdated. For the most current information, please visit https://aws.amazon.com/about-aws/whats-new/2018/10/encrypt-amazon-redshift-1-click/

John Loughlin is a Solutions Architect with Amazon Web Services. A customer came to us asking for help expanding and modifying their Amazon Redshift cluster. In the course of responding to their request, we made use […]

Big Data AWS Training Course Gets Big Update

Michael Stroh is Communications Manager for AWS Training & Certification. AWS offers a number of in-depth technical training courses, which we’re regularly updating in response to student feedback and changes to the AWS platform. Today I want to tell you about some exciting changes to Big Data on AWS, our most comprehensive training course on […]

Building a Near Real-Time Discovery Platform with AWS

Assaf Mentzer is a Senior Consultant for AWS Professional Services. In the spirit of the U.S. presidential election of 2016, in this post I use Twitter public streams to analyze the candidates’ performance, both Republican and Democrat, in near real time. I show you how to integrate AWS managed services: Amazon Kinesis Firehose, AWS Lambda […]

Building a Recommendation Engine with Spark ML on Amazon EMR using Zeppelin

Guy Ernest is a Solutions Architect with AWS. Many developers want to implement the famous Amazon model that was used to power the “People who bought this also bought these items” feature on Amazon.com. This model is based on a method called Collaborative Filtering. It takes items such as movies, books, and products that were […]

Using AWS Lambda for Event-driven Data Processing Pipelines

Vadim Astakhov is a Solutions Architect with AWS. Some big data customers want to analyze new data in response to a specific event, and they might already have well-defined pipelines to perform batch processing, orchestrated by AWS Data Pipeline. One example of event-triggered pipelines is when data analysts must analyze data as soon as it […]
