How do I install libraries in my Amazon MWAA environment?

I want to install libraries in my Amazon Managed Workflows for Apache Airflow (Amazon MWAA) environment.

Short description

Use the requirements.txt and plugins.zip files to install Python libraries in Amazon MWAA. If you use the requirements.txt file to install packages, then the packages are installed from the Python Package Index (PyPI) by default. If you ship libraries (.whl files) with compiled artifacts, then use the plugins.zip file to install these Python wheels.

For Amazon MWAA versions 2.2.2 and 2.4.3, outbound internet access isn't available on the private web server option. Amazon MWAA installs requirements and plugins on the web server in the Amazon MWAA service virtual private cloud (VPC). These requirements include a NAT gateway with outbound internet access. However, for earlier Amazon MWAA versions (2.0.2 and 1.10.12), requirements and plugins aren't installed on the web server by default.

For Amazon MWAA versions 2.2.2 and 2.4.3, the Amazon MWAA local runner provides a command line utility to download and package Python dependencies (.whl) and requirements in the plugins.zip file.

For all versions of Amazon MWAA, you can use the plugins.zip file to install custom Apache Airflow operators, hooks, sensors, or interfaces. You can also use plugins to export environment variables and to distribute authentication and configuration files, such as .crt and .yaml files.

Resolution

Install libraries using .whl on environments with a private web server (Amazon MWAA versions 2.2.2 and 2.4.3)

Set up your Amazon MWAA local environment

1.    Build the Docker image and set up an Amazon MWAA local environment (from the GitHub website). The aws-mwaa-local-runner repository provides a command line interface (CLI) utility that replicates an Amazon MWAA environment locally.

2.    Add the Python library and its dependencies to a requirements.txt file.

3.    Use the following script to test the requirements.txt file:

#aws-mwaa-local-runner % ./mwaa-local-env test-requirements

The output looks similar to the following:

Installing requirements.txt
Collecting aws-batch (from -r /usr/local/airflow/dags/requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/5d/11/3aedc6e150d2df6f3d422d7107ac9eba5b50261cf57ab813bb00d8299a34/aws_batch-0.6.tar.gz
Collecting awscli (from aws-batch->-r /usr/local/airflow/dags/requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/07/4a/d054884c2ef4eb3c237e1f4007d3ece5c46e286e4258288f0116724af009/awscli-1.19.21-py2.py3-none-any.whl (3.6MB)
    100% |████████████████████████████████| 3.6MB 365kB/s 
...
...
...
Installing collected packages: botocore, docutils, pyasn1, rsa, awscli, aws-batch
  Running setup.py install for aws-batch ... done
Successfully installed aws-batch-0.6 awscli-1.19.21 botocore-1.20.21 docutils-0.15.2 pyasn1-0.4.8 rsa-4.7.2

Build the .whl files from the requirements.txt file

Run the following package-requirements local-runner command to build the .whl files from the requirements.txt file:

#aws-mwaa-local-runner % ./mwaa-local-env package-requirements

The command downloads all .whl files into the following folder: aws-mwaa-local-runner/plugins.

Create a plugins.zip file including .whl files and an Amazon MWAA constraint

Download the constraints.txt file and copy it into the plugins directory. Then, run the following commands to create the plugins.zip file:

#aws-mwaa-local-runner % curl -o plugins/constraints.txt "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.2/constraints-3.7.txt"
#aws-mwaa-local-runner % zip -j dags/plugins.zip plugins/constraints.txt

Create a new requirements.txt file that points to the .whl files that are packaged in the plugins.zip file

1. Create the new requirements.txt file. In a text editor of your choice, use a format similar to the following:

=========new requirements.txt==========

--find-links /usr/local/airflow/plugins

--no-index

--constraint "/usr/local/airflow/plugins/constraints.txt"

aws-batch==0.6
====================================

2. Upload the plugins.zip and requirements.txt files to the Amazon Simple Storage Service (Amazon S3) bucket for your Amazon MWAA environment. Then, update the environment.
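
The requirements.txt format in step 1 can also be generated from a list of pinned packages. The following Python sketch is illustrative only: render_requirements and its parameters are hypothetical names, and the sketch assumes the /usr/local/airflow/plugins path and the constraints.txt file name that this article uses.

```python
from pathlib import PurePosixPath

# Path where Amazon MWAA extracts plugins.zip, as stated in this article.
PLUGINS_DIR = PurePosixPath("/usr/local/airflow/plugins")


def render_requirements(pinned_packages, plugins_dir=PLUGINS_DIR,
                        constraints_name="constraints.txt"):
    """Return requirements.txt text that installs only from packaged files.

    render_requirements is a hypothetical helper. It is not part of
    Amazon MWAA or aws-mwaa-local-runner.
    """
    for spec in pinned_packages:
        if "==" not in spec:
            raise ValueError(f"pin an exact version with '==': {spec!r}")
    constraints_path = PurePosixPath(plugins_dir) / constraints_name
    lines = [
        f"--find-links {plugins_dir}",         # look for packages in this folder
        "--no-index",                          # do not contact a package index
        f'--constraint "{constraints_path}"',  # plain ASCII quotes, not curly quotes
        *pinned_packages,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_requirements(["aws-batch==0.6"]), end="")
```

Generating the file this way keeps the quotation marks around the constraints path as plain ASCII characters, which copying from a formatted page can silently change.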

Use Python wheels to install custom libraries

A Python wheel is a package file with compiled artifacts. To install this package, place the .whl file in a plugins.zip file. Then, reference this file in a requirements.txt file. After you add the .whl file to the plugins.zip file, update the environment. The .whl file is shipped to the following location on the Amazon Elastic Container Service (Amazon ECS) Fargate container: /usr/local/airflow/plugins/

Install Python wheels

1.    Create the plugins.zip file. Run the following command to create a local Amazon MWAA plugins directory on your system:

$ mkdir plugins

2.    Copy the .whl file into the plugins directory that you created. Run the following command to change the directory to point to your local Airflow plugins directory:

$ cd plugins

Run the following command to make sure that the contents have executable permissions:

plugins$ chmod -R 755 .

Run the following command to zip the contents of your plugins directory:

plugins$ zip -r plugins.zip .

3.    Include the path of the .whl file in the requirements.txt file (for example, /usr/local/airflow/plugins/example_wheel.whl).

Note: Be sure to turn on versioning for your Amazon S3 bucket.

4.    Upload the plugins.zip and requirements.txt files to an Amazon S3 bucket (for example, s3://example-bucket/plugins.zip).

5.    Specify the plugins.zip version on the Amazon MWAA console.
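
As a rough equivalent of the zip step above, the following Python sketch packages a local plugins directory with the standard zipfile module and lists the requirements.txt lines that point to each wheel. The helper names and the local directory layout are assumptions for illustration. The /usr/local/airflow/plugins prefix is the container path given in this article.

```python
import zipfile
from pathlib import Path

# Container path where the contents of plugins.zip are extracted (from this article).
MWAA_PLUGINS_DIR = "/usr/local/airflow/plugins"


def build_plugins_zip(plugins_dir, zip_path):
    """Zip everything under plugins_dir, with paths relative to that directory.

    Hypothetical helper that mirrors running `zip -r plugins.zip .` from
    inside the plugins directory.
    """
    plugins_dir = Path(plugins_dir)
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(plugins_dir.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in plugins_dir.
            if path.is_file() and path.resolve() != zip_path.resolve():
                archive.write(path, path.relative_to(plugins_dir).as_posix())
    return zip_path


def wheel_requirement_lines(zip_path):
    """Return one requirements.txt line per .whl file found in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [f"{MWAA_PLUGINS_DIR}/{name}"
                for name in archive.namelist() if name.endswith(".whl")]
```

For example, calling build_plugins_zip("plugins", "plugins.zip") and then wheel_requirement_lines("plugins.zip") returns the paths to place in the requirements.txt file, such as /usr/local/airflow/plugins/example_wheel.whl.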

Install custom operators, hooks, sensors, or interfaces

Amazon MWAA supports Apache Airflow’s built-in plugin manager. The plugin manager allows you to use custom Apache Airflow operators, hooks, sensors, or interfaces. Place these custom plugins in the plugins.zip file with either a flat or nested directory structure. The contents of the plugins.zip file are written to the backend Amazon ECS Fargate containers at: /usr/local/airflow/plugins/. For more information, see Examples of custom plugins.

Create a custom plugin to generate runtime environment variables

Create a custom plugin that generates runtime environment variables on your Amazon MWAA environment. Then, use these environment variables in your Directed Acyclic Graph (DAG) code. For more information, see Creating a custom plugin that generates runtime environment variables.

Export PEM, .crt, and configuration files (.yaml)

If you don't need to update certain files continuously while your environment is running, then use the plugins.zip file to distribute the files. You can also use the plugins.zip file to supply your DAGs with files that don't require user access (for example, certificate, PEM, and configuration YAML files).

The following example command bundles a ca-certificates.crt file into a plugins.zip file:

$ zip plugins.zip ca-certificates.crt

After you update the environment with this plugins.zip file, the .crt file is synced to the path /usr/local/airflow/plugins/ca-certificates.crt on each of the worker containers. Your DAGs can then access this file. Follow the same process for other file types.
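
Inside a DAG, the synced file can be located by its path under the plugins directory. The following Python sketch shows one way to do that. bundled_file is a hypothetical helper, and the only detail taken from this article is the /usr/local/airflow/plugins path.

```python
from pathlib import Path

# Path where Amazon MWAA syncs the contents of plugins.zip (from this article).
PLUGINS_DIR = Path("/usr/local/airflow/plugins")


def bundled_file(name, plugins_dir=PLUGINS_DIR):
    """Return the path of a file shipped in plugins.zip, or raise if it is missing.

    Hypothetical helper for DAG code. It is not part of Apache Airflow.
    """
    path = Path(plugins_dir) / name
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; check that {name} is in plugins.zip "
            "and that the environment was updated"
        )
    return path
```

A task could then pass str(bundled_file("ca-certificates.crt")) to any client that accepts a path to a certificate bundle. Failing early with a clear message makes a missing or stale plugins.zip easier to spot in the task logs.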

Install custom plugins and binaries

You can create a custom plugin for Oracle on Amazon MWAA and combine it with other custom plugins and binaries in your plugins.zip file. This includes non-Python packages. For more information, see Creating a custom plugin with Oracle.

To add binaries to your containers, or to set and modify environment variables, see Creating a custom plugin with Apache Hive and Hadoop.

Troubleshoot package installation

Use aws-mwaa-local-runner (from the GitHub website) to test DAGs, custom plugins, and Python dependencies.

View the log file. You can find it in the Apache Airflow worker or scheduler log groups.

Important: Before you install the packages or plugins.zip file, it's a best practice to use the Amazon MWAA CLI utility to test Python dependencies and the plugins.zip file.

Related information

Plugins

AWS OFFICIAL
Updated a year ago