AWS Big Data Blog

Category: Analytics

Running an External Zeppelin Instance using S3 Backed Notebooks with Spark on Amazon EMR

Dominic Murphy is an Enterprise Solution Architect with Amazon Web Services. Apache Zeppelin is an open source GUI which creates interactive and collaborative notebooks for data exploration using Spark. You can use Scala, Python, SQL (using Spark SQL), or HiveQL to manipulate data and quickly visualize results. Zeppelin notebooks can be shared among several users, […]

Query Routing and Rewrite: Introducing pgbouncer-rr for Amazon Redshift and PostgreSQL

NOTE: You can now use federated queries in Amazon Redshift to query and analyze data across operational databases, data warehouses, and data lakes. For more information, please review the Amazon Redshift documentation article, “Querying Data with Federated Query in Amazon Redshift.”

Bob Strahan is a senior consultant with AWS Professional Services. Have you ever […]

Securely Access Web Interfaces on Amazon EMR Launched in a Private Subnet

Ben Snively is a Solutions Architect with AWS. Private subnets allow you to limit access to deployed components, and to control security and routing of the system. You can also use a private subnet to connect an on-premises local network to AWS through a VPN or AWS Direct Connect. Amazon EMR allows customers to launch […]

Migrating Metadata when Encrypting an Amazon Redshift Cluster

NOTE: Amazon Redshift now supports enabling and disabling encryption with 1-click. For more information, please review this “What’s New” post.

John Loughlin is a Solutions Architect with Amazon Web Services. A customer came to us asking for help expanding and modifying their Amazon Redshift cluster. In the course of responding to their request, we […]

Building a Near Real-Time Discovery Platform with AWS

Assaf Mentzer is a Senior Consultant for AWS Professional Services. In the spirit of the U.S. presidential election of 2016, in this post I use Twitter public streams to analyze the candidates’ performance, both Republican and Democrat, in near real time. I show you how to integrate AWS managed services (Amazon Kinesis Firehose, AWS Lambda […]

Using AWS Lambda for Event-driven Data Processing Pipelines

Vadim Astakhov is a Solutions Architect with AWS. Some big data customers want to analyze new data in response to a specific event, and they might already have well-defined pipelines to perform batch processing, orchestrated by AWS Data Pipeline. One example of event-triggered pipelines is when data analysts must analyze data as soon as it […]

Persist Streaming Data to Amazon S3 using Amazon Kinesis Firehose and AWS Lambda

Derek Graeber is a Senior Consultant in Big Data Analytics for AWS Professional Services. Streaming data analytics is becoming main-stream (pun intended) in large enterprises as the technology stacks have become more user-friendly to implement. For example, Spark Streaming connected to an Amazon Kinesis stream is a typical model for real-time analytics. But one area that […]

Automating Analytic Workflows on AWS

Wangechi Doble is a Solutions Architect with AWS. Organizations are experiencing a proliferation of data. This data includes logs, sensor data, social media data, and transactional data, and resides in the cloud, on premises, or as high-volume, real-time data feeds. It is increasingly important to analyze this data: stakeholders want information that is timely, accurate, […]