Data Pipeline Roles, Tags, and Security Improvements Now Available

Posted on: Feb 23, 2015

You can now assign an Amazon EMR service role and an Amazon EC2 instance profile to an EMR cluster defined in a pipeline, which lets you limit the overall permissions of the cluster. For example, you can control which other AWS services, such as EC2 or S3, the EMR service can access on your behalf. To learn more about assigning AWS Identity and Access Management (IAM) roles to an EMR cluster, visit the documentation.
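As a rough sketch, an EMR cluster is declared in a pipeline definition as an `EmrCluster` object, and its roles are string fields on that object. The Python below builds such an object in the `{"id", "name", "fields"}` shape that boto3's `put_pipeline_definition` accepts for `pipelineObjects`. It assumes that the `role` field carries the EMR service role and `resourceRole` carries the EC2 instance profile; the helper names and the role names `MyEmrServiceRole` and `MyEmrInstanceProfile` are hypothetical, so check the `EmrCluster` reference before relying on this mapping.

```python
from __future__ import annotations

from typing import Any


def emr_cluster_object(
    object_id: str, service_role: str, instance_profile: str
) -> dict[str, Any]:
    """Build an EmrCluster object in the shape put_pipeline_definition expects.

    Assumption: 'role' holds the EMR service role and 'resourceRole' holds the
    EC2 instance profile; confirm against the EmrCluster object reference.
    """
    return {
        "id": object_id,
        "name": object_id,
        "fields": [
            {"key": "type", "stringValue": "EmrCluster"},
            {"key": "role", "stringValue": service_role},
            {"key": "resourceRole", "stringValue": instance_profile},
        ],
    }


def field_value(obj: dict[str, Any], key: str) -> str | None:
    """Return the stringValue of the first field named `key`, or None."""
    for field in obj["fields"]:
        if field["key"] == key:
            return field.get("stringValue")
    return None


# Hypothetical names; replace with the role and instance profile you created.
cluster = emr_cluster_object(
    "MyEmrCluster",
    service_role="MyEmrServiceRole",
    instance_profile="MyEmrInstanceProfile",
)
print(field_value(cluster, "role"), field_value(cluster, "resourceRole"))
```

To apply it, the object would go into the `pipelineObjects` list passed to `boto3.client("datapipeline").put_pipeline_definition(pipelineId=..., pipelineObjects=[...])`, alongside the rest of the pipeline definition.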

You can also now control access to pipelines across IAM users in the same account, so that teams can develop and maintain pipelines collaboratively. With the appropriate permissions, an IAM user can view, edit, and activate pipelines created by other IAM users in the same account. To learn more, visit the documentation.
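One way to express that authorization is an identity-based IAM policy attached to the users who should share pipelines. The minimal sketch below builds such a policy as a Python dictionary. It assumes tag-based access through the `datapipeline:Tag/<key>` condition key and picks four Data Pipeline actions that roughly map to viewing, editing, and activating; the tag `team=analytics` and the function name are hypothetical. Scoping access by creator with the `datapipeline:PipelineCreator` condition key is another option worth checking in the Data Pipeline IAM documentation.

```python
from __future__ import annotations

import json

# Data Pipeline API actions that roughly cover "view, edit, and activate".
SHARED_ACTIONS = [
    "datapipeline:DescribePipelines",
    "datapipeline:GetPipelineDefinition",
    "datapipeline:PutPipelineDefinition",
    "datapipeline:ActivatePipeline",
]


def shared_pipeline_policy(tag_key: str, tag_value: str) -> dict:
    """Allow the shared actions on any pipeline tagged tag_key=tag_value.

    Assumption: tag-based access uses the datapipeline:Tag/<key> condition
    key; confirm against the Data Pipeline IAM documentation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(SHARED_ACTIONS),
                "Resource": "*",
                "Condition": {
                    "StringEquals": {f"datapipeline:Tag/{tag_key}": tag_value}
                },
            }
        ],
    }


# Hypothetical tag shared by every pipeline the team should co-own.
policy = shared_pipeline_policy("team", "analytics")
print(json.dumps(policy, indent=2))
```

The printed JSON is the document you would attach to the relevant IAM users or groups; because the condition is on a tag rather than on a user, any user holding this policy can work on pipelines with that tag, whoever created them.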

You can now tag your pipelines and use those tags to organize them. Tags are propagated to the EMR clusters and EC2 instances that the pipeline launches. For example, you can add tags for owner, workflow, or cost center, and use them to break down your billing statements and allocate resource costs accordingly. To learn more about adding and managing tags, visit the documentation.
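Data Pipeline tags are key/value pairs, which boto3 passes as a list of `{"key": ..., "value": ...}` dictionaries to `create_pipeline(tags=...)` or `add_tags(tags=...)`. The sketch below converts a plain mapping into that shape and then, assuming every resource the pipeline launches carries the same propagated tags, groups resource IDs by one tag to mimic segmenting costs. The function names, tag values, and resource IDs are hypothetical placeholders.

```python
from __future__ import annotations


def to_pipeline_tags(tags: dict[str, str]) -> list[dict[str, str]]:
    """Convert a mapping into the list-of-{key, value} shape used for tags."""
    return [{"key": k, "value": v} for k, v in sorted(tags.items())]


def group_by_tag(
    resources: dict[str, list[dict[str, str]]], tag_key: str
) -> dict[str, list[str]]:
    """Group resource IDs by the value of tag_key; untagged IDs go under ''."""
    groups: dict[str, list[str]] = {}
    for resource_id, resource_tags in resources.items():
        value = next((t["value"] for t in resource_tags if t["key"] == tag_key), "")
        groups.setdefault(value, []).append(resource_id)
    return groups


# Hypothetical owner, workflow, and cost-center tags for one pipeline.
tags = to_pipeline_tags(
    {"owner": "data-eng", "workflow": "nightly-etl", "cost-center": "cc-1234"}
)

# Assume the EMR cluster and EC2 instance received the pipeline's tags.
by_cost_center = group_by_tag(
    {"j-EXAMPLE": tags, "i-EXAMPLE": tags, "i-UNTAGGED": []}, "cost-center"
)
```

Sorting the keys keeps the tag list deterministic, which makes definitions easier to diff between runs.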

If you are an existing Data Pipeline customer, you need to opt in to these features from the Data Pipeline console before you can assign IAM roles to EMR clusters, grant access to pipelines across IAM users, or tag your pipelines.

Lastly, server-side encryption (SSE) is now enabled by default for all data you store in an S3DataNode in your pipeline. To learn more about this feature in Data Pipeline, refer to the S3DataNode documentation.
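Because SSE is now the default, an `S3DataNode` needs no extra field to get it. The sketch below builds an `S3DataNode` object for `put_pipeline_definition` and writes an encryption override only when a caller opts out. It assumes the override field is `s3EncryptionType` with the values `SERVER_SIDE_ENCRYPTION` and `NONE`; the function name, bucket paths, and object IDs are hypothetical.

```python
from __future__ import annotations


def s3_data_node(object_id: str, directory_path: str, disable_sse: bool = False) -> dict:
    """Build an S3DataNode object for put_pipeline_definition().

    SSE is on by default, so the encryption field is written only when opting
    out. Assumption: the field is s3EncryptionType and accepts
    SERVER_SIDE_ENCRYPTION or NONE; confirm in the S3DataNode reference.
    """
    fields = [
        {"key": "type", "stringValue": "S3DataNode"},
        {"key": "directoryPath", "stringValue": directory_path},
    ]
    if disable_sse:
        fields.append({"key": "s3EncryptionType", "stringValue": "NONE"})
    return {"id": object_id, "name": object_id, "fields": fields}


# Hypothetical locations: one node keeps the SSE default, one opts out.
default_node = s3_data_node("OutputData", "s3://example-bucket/output/")
plain_node = s3_data_node("LegacyOutput", "s3://example-bucket/legacy/", disable_sse=True)
```

Leaving the field out keeps the default, which avoids restating `SERVER_SIDE_ENCRYPTION` in every data node.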