AWS Big Data Blog

Category: AWS Data Pipeline

Introducing On-Demand Pipeline Execution in AWS Data Pipeline

Marc Beitchman is a Software Development Engineer on the AWS Database Services team. You can now trigger activation of pipelines in AWS Data Pipeline using the new on-demand schedule type. You can access this functionality through the existing AWS Data Pipeline activation API. On-demand schedules make it easy to integrate pipelines in AWS […]

Using AWS Lambda for Event-driven Data Processing Pipelines

Vadim Astakhov is a Solutions Architect with AWS. Some big data customers want to analyze new data in response to a specific event, and they might already have well-defined pipelines to perform batch processing, orchestrated by AWS Data Pipeline. One example of event-triggered pipelines is when data analysts must analyze data as soon as it […]

Automating Analytic Workflows on AWS

Wangechi Doble is a Solutions Architect with AWS. Organizations are experiencing a proliferation of data. This data includes logs, sensor data, social media data, and transactional data, and resides in the cloud, on premises, or as high-volume, real-time data feeds. It is increasingly important to analyze this data: stakeholders want information that is timely, accurate, […]

How Coursera Manages Large-Scale ETL using AWS Data Pipeline and Dataduct

This is a guest post by Sourabh Bajaj, a Software Engineer at Coursera. Coursera in their own words: “Coursera is an online educational startup with over 14 million learners across the globe. We offer more than 1000 courses from over 120 top universities.” At Coursera, we use Amazon Redshift as our primary data warehouse because […]

Using AWS Data Pipeline’s Parameterized Templates to Build Your Own Library of ETL Use-case Definitions

Leena Joseph is an SDE for AWS Data Pipeline. In an earlier post, we introduced you to ETL processing using AWS Data Pipeline and Amazon EMR. This post shows how to build ETL workflow templates with AWS Data Pipeline and assemble a library of recipes that implement common use cases. This is an introduction to […]

ETL Processing Using AWS Data Pipeline and Amazon Elastic MapReduce

Manjeet Chayel is an AWS Solutions Architect. This blog post shows you how to build an ETL workflow that uses AWS Data Pipeline to schedule an Amazon Elastic MapReduce (Amazon EMR) cluster to clean and process web server logs stored in an Amazon Simple Storage Service (Amazon S3) bucket. AWS Data Pipeline is an ETL […]
