AWS Big Data Blog
Introducing On-Demand Pipeline Execution in AWS Data Pipeline
February 2023 Update: Console access to the AWS Data Pipeline service will be removed on April 30, 2023. On this date, you will no longer be able to access AWS Data Pipeline through the console. You will continue to have access to AWS Data Pipeline through the command line interface and API. Please note that the AWS Data Pipeline service is in maintenance mode and we are not planning to expand the service to new regions. For information about migrating from AWS Data Pipeline, please refer to the AWS Data Pipeline migration documentation.
You can now trigger activation of pipelines in AWS Data Pipeline using the new on-demand schedule type. You can access this functionality through the existing AWS Data Pipeline activation API. On-demand schedules make it easy to integrate pipelines in AWS Data Pipeline with other AWS services and with on-premises orchestration engines.
For example, you can build AWS Lambda functions to activate an AWS Data Pipeline execution in response to Amazon CloudWatch scheduled (cron expression) events or Amazon S3 event notifications. You can also invoke the AWS Data Pipeline activation API directly from the AWS CLI and SDKs.
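For illustration, here is a minimal Python (boto3) sketch of calling the activation API directly; the pipeline ID is a placeholder you would replace with your own, and error handling is omitted.

import boto3

# Create a Data Pipeline client; region and credentials come from your environment.
client = boto3.client("datapipeline")

# Placeholder ID of an existing on-demand pipeline.
PIPELINE_ID = "df-EXAMPLE1234567890"

# ActivatePipeline starts a new run of the on-demand pipeline.
response = client.activate_pipeline(pipelineId=PIPELINE_ID)
print(response["ResponseMetadata"]["HTTPStatusCode"])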
To get started, create a new pipeline and, in the default object, specify the property "scheduleType": "ondemand". Setting this parameter enables on-demand activation of the pipeline.
Note: Activating an on-demand pipeline that is already running has no effect; the pipeline can only be activated from the pending, scheduled, or finished state.
Below is a simple example of a default object configured for on-demand activation.
{ "id": "Default", "scheduleType": "ondemand" }
The screenshot below shows an on-demand pipeline with two Hadoop activities. The pipeline has been run three times.
Check out our samples in the AWS Data Pipeline samples GitHub repository. These samples show you how to create an AWS Lambda function that triggers an on-demand pipeline activation in response to ObjectCreated (new file) events in Amazon S3, and how to trigger an on-demand pipeline activation in response to Amazon CloudWatch scheduled (cron expression) events.
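As a rough sketch of the Lambda approach (not the exact code from the samples repository), a handler that activates a pipeline when a new object lands in S3 might look like the following; the pipeline ID is a placeholder.

import boto3

datapipeline = boto3.client("datapipeline")

# Placeholder ID of the on-demand pipeline to run.
PIPELINE_ID = "df-EXAMPLE1234567890"

def lambda_handler(event, context):
    # Log each S3 object that triggered the notification, then activate the pipeline.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print("New object s3://{}/{} received, activating pipeline".format(bucket, key))
    return datapipeline.activate_pipeline(pipelineId=PIPELINE_ID)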
If you have questions or suggestions, please leave a comment below.
Related:
How Coursera Manages Large-Scale ETL using AWS Data Pipeline and Dataduct
Looking to learn more about Big Data or Streaming Data? Check out our Big Data and Streaming Data educational pages.
About the author
Marc Beitchman is a Software Development Engineer on the AWS Database Services team.