AWS Data Pipeline is a web service that provides a simple management system for data-driven workflows. Using AWS Data Pipeline, you define a pipeline composed of the “data sources” that contain your data, the “activities” or business logic such as EMR jobs or SQL queries, and the “schedule” on which your business logic executes. For example, you could define a job that, every hour, runs an Amazon Elastic MapReduce (Amazon EMR)–based analysis on that hour’s Amazon Simple Storage Service (Amazon S3) log data, loads the results into a relational database for future lookup, and then automatically sends you a daily summary email.

AWS Data Pipeline handles:

  • Your jobs' scheduling, execution, and retry logic
  • Tracking the dependencies between your business logic, data sources, and previous processing steps to ensure that your logic does not run until all of its dependencies are met
  • Sending any necessary failure notifications
  • Creating and managing any temporary compute resources your jobs may require
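To make the dependency tracking concrete, here is a minimal sketch of a pipeline definition in the object/field format the service accepts. The object syntax (id, name, and key/stringValue/refValue fields) follows AWS Data Pipeline's pipeline definition format, but the specific ids and values below are hypothetical:

```python
# Hypothetical pipeline definition objects in the format accepted by the
# PutPipelineDefinition API: each object is an id, a name, and a list of
# key/value fields; refValue fields link objects into a dependency graph.
pipeline_objects = [
    {
        "id": "HourlySchedule",
        "name": "HourlySchedule",
        "fields": [
            {"key": "type", "stringValue": "Schedule"},
            {"key": "period", "stringValue": "1 hour"},
            {"key": "startDateTime", "stringValue": "2024-01-01T00:00:00"},
        ],
    },
    {
        "id": "LogAnalysis",
        "name": "LogAnalysis",
        "fields": [
            {"key": "type", "stringValue": "EmrActivity"},
            # refValue makes the activity depend on the schedule object
            {"key": "schedule", "refValue": "HourlySchedule"},
        ],
    },
]

# The service resolves refValue links before anything runs; here we just
# walk the same structure to pull out the schedule's period.
schedule = next(o for o in pipeline_objects if o["id"] == "HourlySchedule")
period = next(f["stringValue"] for f in schedule["fields"] if f["key"] == "period")
print(period)  # 1 hour
```

Because activities reference schedules and data nodes by id rather than by position, the service can hold an activity back until every object it points at is ready.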

The AWS Free Tier includes three low-frequency preconditions and five low-frequency activities with AWS Data Pipeline.

To ensure that data is available prior to the execution of an activity, AWS Data Pipeline allows you to optionally create data availability checks called “preconditions.” These checks will repeatedly attempt to verify data availability and will block any dependent activities from executing until the preconditions succeed.
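Conceptually, a precondition is a bounded polling loop: check, wait, retry, and unblock the dependent activity only on success. The sketch below illustrates that idea in plain Python; the check function, retry count, and delay are illustrative, not the service's actual implementation:

```python
import time

def wait_for_precondition(check, retries=3, delay_seconds=0):
    """Repeatedly run `check` until it succeeds or retries are exhausted.

    Mirrors the idea of a Data Pipeline precondition: dependent work
    stays blocked until the check returns True.
    """
    for _ in range(retries):
        if check():
            return True
        time.sleep(delay_seconds)
    return False

# Illustrative check: pretend the data "arrives" on the second attempt.
state = {"calls": 0}
def s3_key_exists():
    state["calls"] += 1
    return state["calls"] >= 2

ready = wait_for_precondition(s3_key_exists, retries=5)
print(ready)  # True - the dependent activity may now run
```

In the real service, built-in precondition types (such as checks that an Amazon S3 key exists) play the role of the check function, and retry behavior is configured on the pipeline rather than in code.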

To use AWS Data Pipeline, you simply:

  • Use the AWS Management Console, Command Line Interface, or the service APIs to define your data sources, preconditions, activities, the schedule on which you want them to execute, and any optional notification conditions
  • Receive configurable, automatic notifications if your data doesn’t become available when expected or if your activities encounter errors
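With the AWS SDK for Python (boto3), those definition steps map onto three API calls: CreatePipeline, PutPipelineDefinition, and ActivatePipeline. A sketch follows; the pipeline name, unique id, and objects are placeholders, and the boto3 calls themselves are shown as comments because they require AWS credentials:

```python
# Sketch of driving AWS Data Pipeline from code. Names and ids here are
# placeholders; the live API calls are commented out.
def definition_request(pipeline_id, objects):
    """Build the parameters for a PutPipelineDefinition call."""
    return {"pipelineId": pipeline_id, "pipelineObjects": objects}

objects = [
    {"id": "Default", "name": "Default",
     "fields": [{"key": "failureAndRerunMode", "stringValue": "CASCADE"}]},
]
params = definition_request("df-EXAMPLE", objects)

# With credentials configured, the live calls would be:
# import boto3
# client = boto3.client("datapipeline")
# new = client.create_pipeline(name="hourly-log-analysis", uniqueId="hourly-v1")
# client.put_pipeline_definition(**definition_request(new["pipelineId"], objects))
# client.activate_pipeline(pipelineId=new["pipelineId"])
print(sorted(params))  # ['pipelineId', 'pipelineObjects']
```

The same three steps can be performed from the AWS Management Console or the command line; the SDK route is convenient when pipeline definitions are generated or version-controlled as code.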

You can find (and use) a variety of popular AWS Data Pipeline tasks in the AWS Management Console’s template section. These tasks include:

  • Hourly analysis of Amazon S3-based log data
  • Daily replication of Amazon DynamoDB data to Amazon S3
  • Periodic replication of on-premises JDBC database tables into Amazon RDS

For more information, see the AWS Data Pipeline Developer Guide.

Your use of this service is subject to the Amazon Web Services Customer Agreement.