How do I install libraries in my Amazon MWAA environment?
Last updated: 2022-03-02
I want to install libraries in my Amazon Managed Workflows for Apache Airflow (Amazon MWAA) environment.
When you create an Amazon MWAA environment, some basic packages and distributions are included in the Apache Airflow base installation. To install packages beyond those in the base installation, use the requirements.txt or plugins.zip file in your Amazon MWAA environment.
By default, Amazon MWAA installs packages from the public repository (PyPI, pypi.org). You can also install packages from privately hosted repositories, such as JFrog and AWS CodeArtifact. Choose the approach that best suits your use case.
Install from public repositories
To install packages from the default public repository, add each package name and its pinned version (for example, redis==3.5.3) to your requirements.txt file. Then, upload the requirements.txt file to your Amazon S3 bucket. When you update the environment using the Amazon MWAA console, select the version of this file in the Requirements file field.
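The pinned-version format described above can be sketched in Python. This is a minimal illustration, not part of Amazon MWAA: the helper name render_requirements and the package set are hypothetical, and redis==3.5.3 is the example package used later in this article.

```python
def render_requirements(packages):
    """Return requirements.txt content with one pinned 'name==version' per line.

    `packages` maps a package name to the exact version to install.
    """
    lines = [f"{name}=={version}" for name, version in packages.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical package set, for illustration only.
    print(render_requirements({"redis": "3.5.3"}), end="")
```

Save the returned text as requirements.txt before uploading it to your S3 bucket.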
Install from private or local repositories
If your environment doesn't have internet access to install packages from the public repository, then you can install libraries and custom packages from your privately hosted PyPI repository. To do so, download all the packages and add them to your private repository, which might be JFrog, AWS CodeArtifact, or any other repository. Then, specify the URL of that repository (--index-url) in the requirements.txt file to import the packages.
Use one of the following options to install a package (example: redis==3.5.3) from your local repository.
If you're using a private repository, then update your requirements.txt file to include the following:
--trusted-host artifactory-aws.example-org.com
--index-url https://artifactory-aws.example-org.com/artifactory/api/pypi/pypi-virtual/simple
redis==3.5.3
If you're using the JFrog repository, then update your requirements.txt file to include the following:
Instead of adding the URL directly to the requirements.txt file, you can put the --index-url option in a separate text file (for example, codeartifact.txt), and then upload that file to your S3 bucket under the /dags folder.
Then, update the requirements.txt file to include the path of the codeartifact.txt file:
-r /usr/local/airflow/dags/codeartifact.txt
redis==3.5.3
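The two layouts above (the index URL inline, or in a separate file referenced with -r) can be generated with a short Python sketch. The function names are hypothetical. The host, URL, and file path are the example values from this article, so replace them with your own.

```python
def inline_index_requirements(index_url, trusted_host, packages):
    """requirements.txt content with the private index declared inline."""
    lines = [f"--trusted-host {trusted_host}", f"--index-url {index_url}"]
    lines.extend(packages)
    return "\n".join(lines) + "\n"


def split_index_requirements(index_url, include_path, packages):
    """Return (index file content, requirements.txt content).

    The index file holds only the --index-url option. The requirements
    file pulls it in with '-r <path>' and then lists the packages.
    """
    index_file = f"--index-url {index_url}\n"
    requirements = "\n".join([f"-r {include_path}", *packages]) + "\n"
    return index_file, requirements


if __name__ == "__main__":
    # Example values taken from this article; they are placeholders.
    url = "https://artifactory-aws.example-org.com/artifactory/api/pypi/pypi-virtual/simple"
    print(inline_index_requirements(url, "artifactory-aws.example-org.com", ["redis==3.5.3"]))
    index_file, requirements = split_index_requirements(
        url, "/usr/local/airflow/dags/codeartifact.txt", ["redis==3.5.3"]
    )
    print(index_file)
    print(requirements)
```

Keeping the URL in a separate file means that a credential embedded in the URL can be rotated without touching requirements.txt.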
See Amazon MWAA for Analytics workshop - Private PyPi repository to learn how to do the following:
- Create a private PyPI repository with a connection to an external source using AWS CodeArtifact.
- Provision a private Amazon MWAA environment without a connection to the public internet, and use VPC endpoints to connect to AWS CodeArtifact.
- Create an AWS Lambda function to keep the authorization token for your private PyPI repository up to date.
Note: This solution uses the codeartifact.txt file to specify the --index-url.
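As a rough idea of what the token-refresh step might look like, the following Python sketch requests a CodeArtifact authorization token with boto3 and rewrites codeartifact.txt in the S3 /dags folder. It is an assumption about the general approach, not the workshop's actual code. The environment variable names, the default key, and the function names are hypothetical.

```python
import os


def codeartifact_index_url(domain, owner, region, repository, token):
    """Build the pip index URL for a CodeArtifact PyPI repository."""
    return (
        f"https://aws:{token}@{domain}-{owner}.d.codeartifact."
        f"{region}.amazonaws.com/pypi/{repository}/simple/"
    )


def refresh_index_file(domain, owner, region, repository, bucket,
                       key="dags/codeartifact.txt"):
    """Fetch a fresh token and overwrite the index file in S3."""
    import boto3  # imported lazily so the URL helper works without the AWS SDK

    token = boto3.client("codeartifact", region_name=region).get_authorization_token(
        domain=domain, domainOwner=owner, durationSeconds=43200
    )["authorizationToken"]
    url = codeartifact_index_url(domain, owner, region, repository, token)
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=f"--index-url {url}\n".encode("utf-8")
    )


def lambda_handler(event, context):
    # Hypothetical environment variable names; set them on the Lambda function.
    refresh_index_file(
        os.environ["CA_DOMAIN"], os.environ["CA_DOMAIN_OWNER"],
        os.environ["AWS_REGION"], os.environ["CA_REPOSITORY"],
        os.environ["MWAA_BUCKET"],
    )
```

Because CodeArtifact tokens expire (12 hours at most), a function like this would need to run on a schedule shorter than the token lifetime.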
To install packages from either the public or private repository, update your environment to include the requirements.txt file:
1. Create a local copy of the requirements.txt file, and upload this copy to your Amazon Simple Storage Service (Amazon S3) bucket (example: s3://example-bucket/requirements.txt) using either the Amazon S3 console or the AWS Command Line Interface (AWS CLI).
Note: Remember to turn on versioning on your S3 bucket.
2. To edit the environment, open the Environments page on the Amazon MWAA console.
3. Select the environment from the list, and then choose Edit.
Note: If you're uploading the requirements.txt file into your environment for the first time, then follow steps 4, 5, and 6. If you've already uploaded the file and recently updated it, then follow only step 6.
4. On the Specify details page, in the DAG code in Amazon S3 section, choose Browse S3 under the Requirements file - optional field.
5. Select the requirements.txt file in your Amazon S3 bucket, and then choose Choose.
6. For Choose a version under the Requirements file - optional field, select the latest version that you've uploaded.
7. Choose Next, and then choose Save.
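The console steps above can also be scripted. The following Python sketch uses boto3 to upload the file, read back the S3 object version, and point the environment at that version. It assumes that versioning is turned on for the bucket and that your credentials allow these calls. The function names and default paths are hypothetical.

```python
def build_update_request(environment_name, requirements_key, object_version):
    """Arguments for the MWAA UpdateEnvironment call.

    `requirements_key` is the path of requirements.txt relative to the bucket.
    """
    return {
        "Name": environment_name,
        "RequirementsS3Path": requirements_key,
        "RequirementsS3ObjectVersion": object_version,
    }


def upload_and_update(environment_name, bucket,
                      local_path="requirements.txt", key="requirements.txt"):
    """Upload requirements.txt and update the environment to use it."""
    import boto3  # imported lazily so the request builder works without the SDK

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)
    # "VersionId" is present only when versioning is turned on for the bucket.
    version_id = s3.head_object(Bucket=bucket, Key=key)["VersionId"]
    request = build_update_request(environment_name, key, version_id)
    return boto3.client("mwaa").update_environment(**request)
```

For example, upload_and_update("MyAirflowEnvironment", "example-bucket") would start an update that uses the newest uploaded version, where both names are placeholders.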
Updating the environment takes 15 to 20 minutes. After the update finishes, the packages listed in the requirements.txt file are installed, and you can start using them immediately. If the dependencies aren't installed within 10 minutes, the AWS Fargate service might time out and attempt to roll back the environment to a stable state.
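Because the update takes a while, a script might poll the environment status until it settles. The sketch below is one way to do that in Python. The set of final statuses and the polling interval are assumptions, and fetch_status is a hypothetical callable that you supply.

```python
import time

# Assumed set of statuses that mean the update has stopped.
FINAL_STATUSES = {"AVAILABLE", "UPDATE_FAILED", "UNAVAILABLE"}


def wait_for_update(fetch_status, poll_seconds=60, timeout_seconds=3600,
                    sleep=time.sleep):
    """Call fetch_status() until it returns a final status, then return it."""
    waited = 0
    while waited <= timeout_seconds:
        status = fetch_status()
        if status in FINAL_STATUSES:
            return status
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError(f"environment still updating after {timeout_seconds} seconds")


def mwaa_status_fetcher(environment_name):
    """Return a callable that reads the environment status with boto3."""
    import boto3  # imported lazily so wait_for_update works without the SDK

    client = boto3.client("mwaa")
    return lambda: client.get_environment(Name=environment_name)["Environment"]["Status"]
```

A call such as wait_for_update(mwaa_status_fetcher("MyAirflowEnvironment")) returns the final status, and a status other than AVAILABLE suggests checking the logs described in the next section.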
Troubleshoot the installation process
If you have issues during the installation of these packages, then view the requirements_install_ip log stream in either the Apache Airflow worker or scheduler log group.
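The following Python sketch reads those log streams from Amazon CloudWatch Logs with boto3. It assumes that worker or scheduler logging is turned on for the environment and that the log groups follow the airflow-EnvironmentName-Component naming pattern. The function names are hypothetical.

```python
def requirements_log_groups(environment_name):
    """Log groups expected to contain the requirements_install_ip streams.

    The 'airflow-<name>-<component>' pattern is an assumption to verify
    against the log groups listed for your environment.
    """
    return [f"airflow-{environment_name}-{component}"
            for component in ("Worker", "Scheduler")]


def fetch_install_log(environment_name, limit=50):
    """Return recent messages from the requirements install log streams."""
    import boto3  # imported lazily so the name helper works without the SDK

    logs = boto3.client("logs")
    messages = []
    for group in requirements_log_groups(environment_name):
        response = logs.filter_log_events(
            logGroupName=group,
            logStreamNamePrefix="requirements_install_ip",
            limit=limit,
        )
        messages.extend(event["message"] for event in response["events"])
    return messages
```

Lines that mention a failed or conflicting install usually name the package to investigate.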
If the log shows errors for a package, then try installing a different version of that package. It's a best practice to begin the requirements.txt file with a --constraint statement that points to the Apache Airflow constraints file for your environment's version.
Note: Be sure to replace example-Airflow-version with your environment's version number.
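As an illustration of the constraints practice, the Python sketch below builds a --constraint line from the URL pattern that the Apache Airflow project uses for its published constraints files, and places it at the top of the requirements text. The helper names are hypothetical, and the default Python version of 3.7 is an assumption, so check which Python version your Amazon MWAA environment runs.

```python
def constraints_line(airflow_version, python_version="3.7"):
    """Build the --constraint line for a given Apache Airflow version."""
    url = (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )
    return f'--constraint "{url}"'


def with_constraints(airflow_version, packages, python_version="3.7"):
    """requirements.txt content that starts with the constraints line."""
    lines = [constraints_line(airflow_version, python_version), *packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Replace 2.0.2 with your environment's Apache Airflow version.
    print(with_constraints("2.0.2", ["redis==3.5.3"]), end="")
```

With the constraints file in place, pip rejects a pinned version that conflicts with the versions tested for that Apache Airflow release, which surfaces conflicts before they break the environment.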
Important: It's a best practice to test the Python dependencies using the Amazon MWAA CLI utility (aws-mwaa-local-runner) before installing the packages on your Amazon MWAA environment.