AWS Big Data Blog

Introducing On-Demand Pipeline Execution in AWS Data Pipeline

Marc Beitchman is a Software Development Engineer in the AWS Database Services team

Now it is possible to trigger activation of pipelines in AWS Data Pipeline using the new on-demand schedule type. You can access this functionality through the existing AWS Data Pipeline activation API. On-demand schedules make it easy to integrate pipelines in AWS Data Pipeline with other AWS services and with on-premises orchestration engines.

For example, you can build AWS Lambda functions to activate an AWS Data Pipeline execution in response to Amazon CloudWatch cron expression events or Amazon S3 event notifications. You can also invoke the AWS Data Pipeline activation API directly from the AWS CLI and SDKs.
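As a minimal sketch, a Lambda handler that activates a pipeline might look like the following. This assumes the boto3 SDK (bundled in the AWS Lambda Python runtime), and the pipeline ID shown is a hypothetical placeholder you would replace with your own:

```python
def lambda_handler(event, context):
    """Activate an on-demand pipeline in response to an S3 or CloudWatch event.

    The pipeline ID below is a hypothetical placeholder; substitute the ID of
    your own on-demand pipeline.
    """
    import boto3  # bundled in the AWS Lambda Python runtime

    pipeline_id = "df-EXAMPLE123456"  # hypothetical pipeline ID
    client = boto3.client("datapipeline")
    # activate_pipeline starts an execution of an on-demand pipeline
    client.activate_pipeline(pipelineId=pipeline_id)
    return {"activated": pipeline_id}
```

The same call is what the AWS CLI's `aws datapipeline activate-pipeline` command invokes under the hood.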

To get started, create a new pipeline and, in the default object, set the property "scheduleType": "ondemand". Setting this property enables on-demand activation of the pipeline.

Note: Activating an on-demand pipeline that is already running has no effect. An on-demand pipeline can only be activated from the pending, scheduled, or finished state.
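If you want to honor that restriction explicitly, you can query the pipeline's state before calling the activation API. The sketch below assumes a boto3 "datapipeline" client and reads the @pipelineState field returned by describe_pipelines; the helper name is our own:

```python
def activate_if_idle(client, pipeline_id):
    """Activate an on-demand pipeline only if it is not currently running.

    `client` is a boto3 "datapipeline" client. Returns True if activation was
    requested, False if the pipeline was skipped because it is running.
    """
    desc = client.describe_pipelines(pipelineIds=[pipeline_id])
    fields = desc["pipelineDescriptionList"][0]["fields"]
    # @pipelineState is one of the fields describe_pipelines returns
    state = next(f["stringValue"] for f in fields if f["key"] == "@pipelineState")
    if state in ("PENDING", "SCHEDULED", "FINISHED"):
        client.activate_pipeline(pipelineId=pipeline_id)
        return True
    return False
```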

Below is a simple example of a default object configured for on-demand activation.

 "id": "Default",
 "scheduleType": "ondemand"      

The screenshot below shows an on-demand pipeline with two Hadoop activities. The pipeline has been run three times.

Check out our samples in the AWS Data Pipeline samples GitHub repository. These samples show you how to create an AWS Lambda function that triggers an on-demand pipeline activation in response to ObjectCreated (new file) events in Amazon S3, and how to trigger an on-demand pipeline activation in response to Amazon CloudWatch cron expression events.

If you have questions or suggestions, please leave a comment below.
