AWS Big Data Blog

Integrate Amazon MWAA with Microsoft Entra ID using SAML authentication

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) provides a fully managed solution for orchestrating and automating complex workflows in the cloud. Amazon MWAA offers two network access modes for accessing the Apache Airflow web UI in your environments: public and private. Customers often deploy Amazon MWAA in private mode and want to use their existing login mechanisms and single sign-on (SSO) to integrate seamlessly with the corporate Active Directory (AD). With this approach, end-users don’t need to log in to the AWS Management Console to access the Airflow UI.

In this post, we illustrate how to configure an Amazon MWAA environment deployed in private network access mode with customer managed VPC endpoints, and how to authenticate users through SAML federation with Microsoft Entra ID and an Application Load Balancer (ALB). Users can seamlessly log in to the Airflow UI with their corporate credentials and access the DAGs. This solution can be modified for Amazon MWAA public network access mode as well.

Solution overview

The architectural components involved in authenticating the Amazon MWAA environment using SAML SSO are depicted in the following diagram. The infrastructure components include two public subnets and three private subnets. The public subnets are required for the internet-facing ALB. Two private subnets are used to set up the Amazon MWAA environment, and the third private subnet is used to host the AWS Lambda authorizer function. This subnet routes outbound traffic through a NAT gateway, because the function needs to verify the signer to confirm that the JWT header contains the expected load balancer ARN.

The workflow consists of the following steps:

  1. For SAML configuration, Microsoft Entra ID serves as the identity provider (IdP).
  2. Amazon Cognito serves as the service provider (SP).
  3. ALB has built-in support for Amazon Cognito and authenticates requests.
  4. Post-authentication, ALB forwards the requests to the Lambda authorizer function. The Lambda function decodes the user’s JWT token and validates whether the user’s AD group is mapped to the relevant AWS Identity and Access Management (IAM) role.
  5. If valid, the function creates a web login token and redirects to the Amazon MWAA environment for successful login.
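
The check described in step 4 can be sketched roughly as follows. This is a minimal illustration and not the function shipped with the solution: it decodes the JWT that ALB forwards in the x-amzn-oidc-data header and compares the signer field in the JWT header with an expected load balancer ARN. Signature verification against the ALB public key is deliberately left out, and the function names and the ARN in the usage note are placeholders.

```python
import base64
import json


def decode_segment(segment: str) -> dict:
    """Decode one base64url-encoded JWT segment into a dict."""
    padded = segment + "=" * (-len(segment) % 4)  # restore any stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))


def read_alb_jwt(token: str, expected_alb_arn: str) -> dict:
    """Return the JWT payload if the header names the expected ALB as signer.

    NOTE: this sketch does NOT verify the signature. A real authorizer must
    also validate it with the ALB public key before trusting the payload.
    """
    header_b64, payload_b64, _signature = token.split(".")
    header = decode_segment(header_b64)
    if header.get("signer") != expected_alb_arn:
        raise ValueError("JWT was not signed by the expected load balancer")
    return decode_segment(payload_b64)
```

Inside a Lambda handler, the token would typically be read with something like event["headers"]["x-amzn-oidc-data"] and passed to read_alb_jwt together with the ARN of your own load balancer.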

The following are the high-level steps to deploy the solution:

  1. Create an Amazon Simple Storage Service (Amazon S3) bucket for artifacts.
  2. Create an SSL certificate and upload it to AWS Certificate Manager (ACM).
  3. Deploy the Amazon MWAA infrastructure stack using AWS CloudFormation.
  4. Configure Microsoft Entra ID services and integrate the Amazon Cognito user pool.
  5. Deploy the ALB CloudFormation stack.
  6. Log in to Amazon MWAA using Microsoft Entra ID user credentials.

Prerequisites

Before you get started, make sure you have the following prerequisites:

  • An AWS account
  • Appropriate IAM permissions to deploy AWS CloudFormation stack resources
  • A Microsoft Azure account with Microsoft Entra ID P2, for creating the Microsoft Entra ID app (IdP configuration)
  • A public certificate for the ALB in the AWS Region where the infrastructure is being deployed and a custom domain name relevant to the certificate.

Create an S3 bucket

In this step, we create an S3 bucket to store your Airflow DAGs, custom plugins in a plugins.zip file, and Python dependencies in a requirements.txt file. This bucket is used by the Amazon MWAA environment to fetch DAGs and dependency files.

  1. On the Amazon S3 console, choose the Region where you want to create a bucket.
  2. In the navigation pane, choose Buckets.
  3. Choose Create bucket.
  4. For Bucket type, select General purpose.
  5. For Bucket name, enter a name for your bucket (for this post, mwaa-sso-blog-<your-aws-account-number>).
  6. Choose Create bucket. 



  7. Navigate to the bucket and choose Create folder.
  8. For Folder name, enter a name (for this post, we name the folder dags).
  9. Choose Create folder.
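
If you prefer to script these console steps, a sketch along the following lines may help. It assumes a boto3 S3 client is passed in; the function name is illustrative and not part of the original solution. It covers only the steps listed above (bucket and dags folder), so any additional bucket settings your environment needs are out of scope here.

```python
def create_artifact_bucket(s3_client, bucket_name: str, region: str) -> None:
    """Create the artifact bucket and an empty dags/ prefix (illustrative helper)."""
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        # Outside us-east-1, S3 requires an explicit location constraint.
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3_client.create_bucket(**kwargs)
    # A zero-byte object whose key ends in "/" is what the console shows as a folder.
    s3_client.put_object(Bucket=bucket_name, Key="dags/")
```

With boto3 installed and credentials configured, a call might look like create_artifact_bucket(boto3.client("s3", region_name=region), "mwaa-sso-blog-<your-aws-account-number>", region).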


Import certificates into ACM

ACM is integrated with Elastic Load Balancing, including Application Load Balancers. In this step, you can request a public certificate using ACM or import a certificate into ACM. To import organization certificates linked to a custom DNS into ACM, you must provide the certificate and its private key. To import a certificate signed by a non-AWS Certificate Authority (CA), you must also include the private and public keys of the certificate.

  1. On the ACM console, choose Import certificate in the navigation pane.
  2. For Certificate body, enter the contents of the cert.pem file.
  3. For Certificate private key, enter the contents of the privatekey.pem file.
  4. Choose Next.


  5. Choose Review and import.
  6. Review the metadata about your certificate and choose Import.

After the import is successful, the status of the imported certificate will show as Issued.
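
The same import can be scripted. The following sketch assumes a boto3 ACM client is passed in and that the file names match the cert.pem and privatekey.pem files used above; the helper name and the optional chain file are illustrative assumptions.

```python
from pathlib import Path
from typing import Optional


def import_certificate(acm_client, cert_path: str, key_path: str,
                       chain_path: Optional[str] = None) -> str:
    """Import a PEM certificate into ACM and return its ARN (illustrative helper)."""
    request = {
        "Certificate": Path(cert_path).read_bytes(),
        "PrivateKey": Path(key_path).read_bytes(),
    }
    if chain_path is not None:
        # Only sent when a chain file is supplied by the caller.
        request["CertificateChain"] = Path(chain_path).read_bytes()
    response = acm_client.import_certificate(**request)
    return response["CertificateArn"]
```

The returned ARN is the value to supply later as the certificate ARN for the ALB.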

Create the Azure AD service, users, groups, and enterprise application

For the SSO integration with Azure, an enterprise application is required, which acts as the IdP for the SAML flow. We add relevant users and groups to the application and configure the SP (Amazon Cognito) details.

Airflow comes with five default roles: Public, Admin, Op, User, and Viewer. In this post, we focus on three: Admin, User, and Viewer. We create three groups and three corresponding users and assign memberships appropriately.

  1. Log in to the Azure portal.
  2. Navigate to Enterprise applications and choose New application.



  3. Enter a name for your application (for example, mwaa-environment) and choose Create.



    You can now view the details of your application.



    Now you create three groups.

  4. In the search bar, search for Microsoft Entra ID.

  5. On the Add menu, choose Group.

  6. For Group type, choose a type (for this post, Security).
  7. Enter a group name (for example, airflow-admins) and description.
  8. Choose Create.


  9. Repeat these steps to create two more groups, named airflow-users and airflow-viewers.
  10. Note the object IDs for each group (these are required in a later step).


    Next, you create users.
  11. On the Overview page, on the Add menu, choose User and Create new user.
  12. Enter a name for your user (for example, mwaa-user), display name, and password.
  13. Choose Review + create.


  14. Repeat these steps to create a user called mwaa-admin and a third user for the Viewer persona.
  15. On your airflow-users group details page, choose Members in the navigation pane.
  16. Choose Add members.
  17. Search for and select the users you created and choose Select.


  18. Repeat these steps to add each user to the corresponding group.

  19. Navigate to your application and choose Assign users and groups.

  20. Choose Add user/group.

  21. Search for and select the groups you created, then choose Select.

 

Deploy the Amazon MWAA environment stack

For this solution, we provide two CloudFormation templates that set up the services illustrated in the architecture. Deploying the CloudFormation stacks in your account incurs AWS usage charges.

The first CloudFormation stack creates the following resources:
  • A VPC with two public subnets and three private subnets and relevant route tables, NAT gateway, internet gateway, and security group
  • VPC endpoints required for the Amazon MWAA environment
  • An Amazon Cognito user pool and user pool domain
  • An Application Load Balancer
Deploy the stack by completing the following steps:
  1. Choose Launch Stack to launch the CloudFormation stack.


  2. For Stack name, enter a name (for example, sso-blog-mwaa-infra-stack).

  3. Enter the following parameters:

    1. For MWAAEnvironmentName, enter the environment name.

    2. For MwaaS3Bucket, enter the S3 artifacts bucket you created.

    3. For VpcCIDR, enter the IP range (CIDR notation) for this VPC.

    4. For PrivateSubnet1CIDR, enter the IP range (CIDR notation) for the private subnet in the first Availability Zone.

    5. For PrivateSubnet2CIDR, enter the IP range (CIDR notation) for the private subnet in the second Availability Zone.

    6. For PrivateSubnet3CIDR, enter the IP range (CIDR notation) for the private subnet in the third Availability Zone.

    7. For PublicSubnet1CIDR, enter the IP range (CIDR notation) for the public subnet in the first Availability Zone.

    8. For PublicSubnet2CIDR, enter the IP range (CIDR notation) for the public subnet in the second Availability Zone.

  4. Choose Next.

  5. Review the template and choose Create stack.

After the stack is deployed successfully, you can view the resources on the stack’s Outputs tab on the AWS CloudFormation console. Note the ALB URL, Amazon Cognito user pool ID, and domain.
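
Before launching the stack, it can be worth checking that the subnet ranges fit inside the VPC range and don't overlap. The sketch below uses only the Python standard library; the helper names are illustrative and the CIDR values in the usage note are examples, not values required by the template.

```python
import ipaddress


def check_subnets(vpc_cidr: str, subnet_cidrs: dict) -> None:
    """Raise ValueError if a subnet is outside the VPC or overlaps another subnet."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    for name, network in subnets.items():
        if not network.subnet_of(vpc):
            raise ValueError(f"{name} ({network}) is not inside the VPC range {vpc}")
    names = list(subnets)
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            if subnets[first].overlaps(subnets[second]):
                raise ValueError(f"{first} overlaps {second}")


def to_cfn_parameters(values: dict) -> list:
    """Convert a name/value dict into the parameter list CloudFormation expects."""
    return [{"ParameterKey": key, "ParameterValue": value} for key, value in values.items()]
```

For example, check_subnets("10.192.0.0/16", {"PrivateSubnet1CIDR": "10.192.10.0/24", "PublicSubnet1CIDR": "10.192.20.0/24"}) returns without error, and the list produced by to_cfn_parameters can be passed as the Parameters argument when creating a stack programmatically.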

 

Integrate the Amazon MWAA application with the Azure enterprise application

Next, you configure the SAML configuration in the enterprise application by adding the SP details and redirect URLs (in this case, the Amazon Cognito details and ALB URL).

  1. In the Azure portal, navigate to your enterprise application.
  2. Choose Set up single sign on.


  3. For Identifier, enter urn:amazon:cognito:sp:<your Cognito user pool ID>.
  4. For Reply URL, enter https://<Your user pool domain>/saml2/idpresponse.
  5. For Sign on URL, enter https://<Your application load balancer DNS>.
  6. In the Attributes & Claims section, choose Add a group claim.
  7. Select Security groups.
  8. For Source attribute, choose Group ID.
  9. Choose Save.
  10. Note the values for App Federation Metadata Url and Login URL.
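
The three values entered in the SAML configuration are all derived from the outputs of the first stack. As a small convenience, they could be assembled like this; the function name is illustrative, and the inputs are assumed to be the Amazon Cognito user pool ID, the user pool domain, and the ALB DNS name noted earlier.

```python
def cognito_saml_settings(user_pool_id: str, user_pool_domain: str, alb_dns_name: str) -> dict:
    """Build the Identifier, Reply URL, and Sign on URL for the enterprise application."""
    return {
        "identifier": f"urn:amazon:cognito:sp:{user_pool_id}",
        "reply_url": f"https://{user_pool_domain}/saml2/idpresponse",
        "sign_on_url": f"https://{alb_dns_name}",
    }
```

Printing the result before opening the Azure portal makes it easier to paste the values without typos.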


Deploy the ALB stack

When the SAML configuration is complete on the Azure end, the IdP details have to be configured in Amazon Cognito. When users access the ALB URL, they are authenticated against the corporate identity using SAML through Amazon Cognito. After they’re authenticated, they’re redirected to the Lambda function for authorization against the group they belong to. The user’s group is then validated against a matching IAM role. If it’s valid, the Lambda function adds the web login token to the URL, and the user gains access to the Amazon MWAA environment.

This CloudFormation stack creates the following resources:

  • Two target groups: the Lambda target group and Amazon MWAA target group
  • Listener rules for the ALB to redirect URL requests to the relevant target groups
  • A user pool client and SAML provider (Azure) details added to the Amazon Cognito user pool
  • IAM roles for Admin, User, and Viewer personas required for Airflow
  • The Lambda authorizer function to validate the JWT token and map Azure groups to IAM roles for appropriate Airflow UI access
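
The group-to-role mapping and redirect performed by the Lambda authorizer might look roughly like the following. This is a sketch under several assumptions and not the function deployed by the stack: the mapping dict, the role precedence, and the host parameter are hypothetical, the MWAA client is assumed to already hold the credentials of the mapped IAM role, and the login path should be checked against the Amazon MWAA documentation for your Airflow version.

```python
from typing import Optional

ROLE_PRIORITY = ("Admin", "User", "Viewer")  # assumed precedence, most privileged first


def pick_airflow_role(user_group_ids, group_to_role: dict) -> Optional[str]:
    """Return the most privileged Airflow role mapped to any of the user's groups."""
    roles = {group_to_role[g] for g in user_group_ids if g in group_to_role}
    for role in ROLE_PRIORITY:
        if role in roles:
            return role
    return None  # no mapped group: the caller should deny access


def login_redirect(mwaa_client, environment_name: str, host: str) -> dict:
    """Create a web login token and build an ALB-style 302 response."""
    token = mwaa_client.create_web_login_token(Name=environment_name)["WebToken"]
    # Assumed login path; host is the name users reach the UI through (the ALB).
    location = f"https://{host}/aws_mwaa/aws-console-sso?login=true#{token}"
    return {
        "statusCode": 302,
        "statusDescription": "302 Found",
        "headers": {"Location": location},
        "isBase64Encoded": False,
    }
```

A user who belongs to several mapped groups receives the most privileged role under this precedence, and a user with no mapped group gets None, which the caller should treat as a denial.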

Deploy the stack by completing the following steps:

  1. Choose Launch Stack to launch the CloudFormation stack:


  2. For Stack name, enter a name (for example, sso-blog-mwaa-alb-stack).

  3. Enter the following parameters:

    1. For MWAAEnvironmentName, enter your environment name.

    2. For ALBCertificateArn, enter the certificate ARN required for ALB. 

    3. For AzureAdminGroupID, enter the object ID of the group for the Azure Admin persona.

    4. For AzureUserGroupID, enter the object ID of the group for the Azure User persona.

    5. For AzureViewerGroupID, enter the object ID of the group for the Azure Viewer persona.

    6. For EntraIDLoginURL, enter the Login URL you noted from the enterprise application.

    7. For AppFederationMetadataURL, enter the URL of the metadata file for the SAML provider. 

  4. Choose Next.

  5. Review the template and choose Create stack.
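
Because the group claim configured in the enterprise application emits group IDs, the three group parameters most likely need the object IDs noted when you created the groups. A quick sanity check that the values are GUID-shaped can catch a display name entered by mistake. The helper below is an illustrative sketch using only the standard library.

```python
import uuid


def require_object_ids(group_ids: dict) -> dict:
    """Return the values in canonical GUID form, or raise if one is not a GUID."""
    normalized = {}
    for parameter, value in group_ids.items():
        try:
            normalized[parameter] = str(uuid.UUID(value.strip()))
        except ValueError as err:
            raise ValueError(f"{parameter} is not a GUID-style object ID: {value!r}") from err
    return normalized
```

Passing a value such as airflow-admins for AzureAdminGroupID raises a ValueError, while a well-formed object ID is returned in lowercase canonical form.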

Test the solution

Now that the SAML configuration and relevant AWS services are created, it’s time to access the Amazon MWAA environment.

  1. Open your web browser and enter the ALB DNS name.
    The SP initiates the sign-in request process and the browser redirects you to the Microsoft login page for credentials.

  2. Enter the Admin user credentials.

    The SAML request sign-in process completes and the SAML response is redirected to the Amazon Cognito user pool attached to the ALB.

    The listener rules will validate the query URL and pass the requests to the Lambda authorizer to validate the JWT and assign the appropriate group (Azure) to role (AWS) mapping.


  3. Repeat the steps to log in with User and Viewer credentials and observe the differences in access.

Clean up

When you’re done experimenting with this solution, it’s essential to clean up your resources to avoid incurring AWS charges.

  1. On the AWS CloudFormation console, delete the stacks you created.
  2. Remove the SSM parameters and private webserver and database VPC endpoints (created by the Lambda events function):
    aws ssm delete-parameters --names "MyFirstParameter" "MySecondParameter"
    aws ec2 delete-vpc-endpoints --vpc-endpoint-ids "Endpoint1" "Endpoint2"
  3. Delete the users, groups, and enterprise application in the Azure environment.
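
The two cleanup commands can also be run from Python. The sketch below assumes boto3 SSM and EC2 clients are passed in and that you have already looked up the parameter names and endpoint IDs to delete; the helper names are illustrative. Names are sent in batches of ten, which is the documented limit for a single delete-parameters call.

```python
def chunks(items: list, size: int):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_leftovers(ssm_client, ec2_client, parameter_names, endpoint_ids) -> dict:
    """Delete SSM parameters and VPC endpoints; return whatever could not be deleted."""
    not_deleted = {"parameters": [], "endpoints": []}
    for batch in chunks(list(parameter_names), 10):
        response = ssm_client.delete_parameters(Names=batch)
        not_deleted["parameters"].extend(response.get("InvalidParameters", []))
    if endpoint_ids:
        response = ec2_client.delete_vpc_endpoints(VpcEndpointIds=list(endpoint_ids))
        not_deleted["endpoints"].extend(
            item["ResourceId"] for item in response.get("Unsuccessful", [])
        )
    return not_deleted
```

An empty result for both keys indicates that every requested parameter and endpoint was accepted for deletion.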

Conclusion

In this post, we demonstrated how to integrate Amazon MWAA with your organization’s Azure AD (Microsoft Entra ID) services. We walked through a solution that implements this integration using infrastructure as code. This solution allows different end-user personas in your organization to access the Amazon MWAA Airflow UI using SAML SSO.

For additional details and code examples for Amazon MWAA, visit the Amazon MWAA User Guide and the Amazon MWAA examples GitHub repo.


About the Authors

Satya Chikkala is a Solutions Architect at Amazon Web Services. Based in Melbourne, Australia, he works closely with enterprise customers to accelerate their cloud journey. Beyond work, he is very passionate about nature and photography.

Vijay Velpula is a Data Lake Architect with AWS Professional Services. He assists customers in building modern data platforms by implementing big data and analytics solutions. Outside of his professional responsibilities, Velpula enjoys spending quality time with his family, as well as indulging in travel, hiking, and biking activities.