AWS Big Data Blog

Persist Streaming Data to Amazon S3 using Amazon Kinesis Firehose and AWS Lambda

Derek Graeber is a Senior Consultant in Big Data Analytics for AWS Professional Services. Streaming data analytics is becoming main-stream (pun intended) in large enterprises as the technology stacks have become more user-friendly to implement. For example, Spark-Streaming connected to an Amazon Kinesis stream is a typical model for real-time analytics. But one area that […]


Automating Analytic Workflows on AWS

Wangechi Doble is a Solutions Architect with AWS. Organizations are experiencing a proliferation of data. This data includes logs, sensor data, social media data, and transactional data, and resides in the cloud, on premises, or as high-volume, real-time data feeds. It is increasingly important to analyze this data: stakeholders want information that is timely, accurate, […]


AWS Big Data Analytics Sessions at re:Invent 2015

Roy Ben-Alta is a Business Development Manager – Big Data & Analytics. If you will be attending re:Invent 2015 in Las Vegas next week, you know that you’ll have many opportunities to learn more about Big Data & Analytics on AWS at the conference, and this year we have over 20 sessions! The following breakout sessions compose this […]


How Coursera Manages Large-Scale ETL using AWS Data Pipeline and Dataduct

This is a guest post by Sourabh Bajaj, a Software Engineer at Coursera. Coursera in their own words: “Coursera is an online educational startup with over 14 million learners across the globe. We offer more than 1000 courses from over 120 top universities.” At Coursera, we use Amazon Redshift as our primary data warehouse because […]


Scaling Writes on Amazon DynamoDB Tables with Global Secondary Indexes

Ian Meyers is a Solutions Architecture Senior Manager with AWS. Amazon DynamoDB is a fast, flexible, and fully managed NoSQL database service that supports both document and key-value store models that need consistent, single-digit millisecond latency at any scale. In this post, we discuss a technique that can be used with DynamoDB to ensure virtually […]


Introduction to Python UDFs in Amazon Redshift

Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. When your doctor takes out a prescription pad at your yearly checkup, do you ever stop to wonder what goes into her thought process as she decides on which drug to scribble down? We assume that journals of scientific evidence coupled […]


Using BlueTalon with Amazon EMR

This is a guest post by Pratik Verma, Founder and Chief Product Officer at BlueTalon. Leonid Fedotov, Senior Solution Architect at BlueTalon, also contributed to this post. Amazon Elastic MapReduce (Amazon EMR) makes it easy to quickly and cost-effectively process vast amounts of data in the cloud. EMR gets used for log, financial, fraud, and […]


Integrating Amazon Kinesis, Amazon S3 and Amazon Redshift with Cascading on Amazon EMR

This is a guest post by Ryan Desmond, Solutions Architect at Concurrent. Concurrent is an AWS Advanced Technology Partner. With Amazon Kinesis, developers can quickly store, collate, and access large, distributed data streams such as access logs, click streams, and IoT data in real time. The question then becomes, how can we access and leverage this […]
