Recommended AWS CDK project structure for Python applications
September 22, 2022: Migrated the reference application to AWS CDK v2. Renamed backend/component.py to support multi-component use cases and better emphasize the mapping of AWS Well-Architected Framework component terminology. Renamed toolchain.py to expand the scope to any tools related to the component’s software development life cycle (e.g., continuous deployment pipeline, pull request validation build, etc.). The component is now a cdk.Stack. Moved the definition of the production cdk.Stage to the toolchain implementation.
In this blog post, I describe the recommended AWS Cloud Development Kit (AWS CDK) project structure for Python applications. This is based on Best practices for developing and deploying cloud infrastructure with the AWS CDK.
The AWS CDK application maps to a component as defined by the AWS Well-Architected Framework. A component usually includes logical units (e.g., api, database), and optionally can have a toolchain with a continuous deployment pipeline. The logical units should be implemented as constructs, including the infrastructure (e.g., Amazon S3 buckets, Amazon RDS databases, Amazon VPC network), runtime (e.g., AWS Lambda function code), and configuration code.
As an example, I will walk through a user management backend component that utilizes Amazon API Gateway, AWS Lambda, and Amazon DynamoDB to provide basic CRUD operations for managing users. The project also includes a toolchain with a continuous deployment pipeline. This essentially contains everything required for managing the component as a unit of ownership, including the specific deployment environments.
We recommend organizing the project directory structure based on the component’s logical units. Each logical unit should have a directory and include the related infrastructure, runtime, and configuration code. For example:
.
|-- backend
|   |-- api
|   |   |-- runtime
|   |   |   |-- lambda_function.py
|   |   |   `-- requirements.txt
|   |   `-- infrastructure.py
|   |-- database
|   |   `-- infrastructure.py
This way, if I need to make API changes, then I can easily find the code related to that logical unit. If I need to refactor the code, or make it a separate unit of ownership, it can be changed in a single place. In other words, it is a self-contained unit.
The logical units should be implemented as constructs and not as stacks. Constructs are the basic building blocks of AWS CDK applications, while stacks are deployment units. All of the AWS resources defined within the scope of a stack, either directly or indirectly, are provisioned as a single unit. Implementing logical units as constructs provides the flexibility to support different deployment layouts and enables future reuse as construct libraries. I will further discuss the deployment layout later.
Note: When refactoring constructs, consider logical ID stability to avoid unexpected infrastructure changes.
Before the AWS CDK arrived, runtime and infrastructure code remained two separate concepts. The AWS CDK abstraction lets you combine the infrastructure and runtime code of a logical unit behind a single construct interface.
Let’s look at the recommended project structure in detail. Clone the example I use in this blog post from https://github.com/aws-samples/aws-cdk-project-structure-python. Note that I have left out source code snippets and files in the blog post, such as linters and the full pipeline structure. The focus is on the code that illustrates the project structure recommendations, while the source code still provides a fully functional project for reference. Below is a snapshot of the project structure, excluding files not in the scope of this blog post:
# The recommended project structure example
.
|-- backend
|   |-- api
|   |   |-- runtime
|   |   |   |-- lambda_function.py
|   |   |   `-- requirements.txt
|   |   `-- infrastructure.py
|   |-- database
|   |   `-- infrastructure.py
|   |-- monitoring
|   |   `-- infrastructure.py
|   `-- component.py
|-- app.py
|-- constants.py
|-- requirements.txt
`-- toolchain.py
Three logical units compose the user management backend: API, database, and monitoring. Each logical unit contains an infrastructure.py module. If the infrastructure implementation were more complex, then I would replace the infrastructure.py module with an infrastructure package, which contains multiple modules. Some logical units also have a runtime directory. For example, the API has a runtime directory containing Lambda function code.
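To experiment with this layout, the skeleton can be generated with a few lines of standard-library Python. This is only a minimal sketch: the scaffold and logical_units helpers are hypothetical names introduced here, not part of the reference application, and the files are created empty.

```python
import tempfile
from pathlib import Path
from typing import List

# Files in the recommended layout, relative to the project root.
# The paths mirror the directory tree shown above; the files are left
# empty because this sketch only creates the skeleton.
LAYOUT = [
    "backend/api/runtime/lambda_function.py",
    "backend/api/runtime/requirements.txt",
    "backend/api/infrastructure.py",
    "backend/database/infrastructure.py",
    "backend/monitoring/infrastructure.py",
    "backend/component.py",
    "app.py",
    "constants.py",
    "requirements.txt",
    "toolchain.py",
]


def scaffold(root: Path) -> List[Path]:
    """Create an empty placeholder file for every path in LAYOUT (hypothetical helper)."""
    created = []
    for relative in LAYOUT:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
        created.append(path)
    return created


def logical_units(root: Path) -> List[str]:
    """List the directories under backend/ that contain an infrastructure.py module."""
    return sorted(p.parent.name for p in (root / "backend").glob("*/infrastructure.py"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scaffold(Path(tmp))
        print(logical_units(Path(tmp)))
```

Listing the logical units by looking for infrastructure.py reflects the convention described above: a directory counts as a logical unit when it carries its own infrastructure code.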
Next, I will cover the app.py (the AWS CDK application entry point), backend/component.py (the user management backend deployment layout), and toolchain.py (the continuous deployment pipeline) modules to show the implementation of the recommended project structure. We’ll start with app.py:
# The AWS CDK application entry point
...
import constants
from backend.component import Backend
from toolchain import Toolchain

app = cdk.App()

# Component sandbox stack
Backend(
    app,
    constants.APP_NAME + "Sandbox",
    env=cdk.Environment(
        account=os.environ["CDK_DEFAULT_ACCOUNT"],
        region=os.environ["CDK_DEFAULT_REGION"],
    ),
    api_lambda_reserved_concurrency=1,
    database_dynamodb_billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST,
)

# Toolchain stack (defines the continuous deployment pipeline)
Toolchain(
    app,
    constants.APP_NAME + "Toolchain",
    env=cdk.Environment(account="111111111111", region="eu-west-1"),
)

app.synth()
The module defines the app object, followed by the component sandbox stack and the toolchain stack with a continuous deployment pipeline. constants.APP_NAME is utilized as part of the construct identifier (e.g., constants.APP_NAME + "Sandbox" above) in order to set a unique prefix for a CloudFormation stack name.
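As a rough sketch of how these identifiers come together, constants.py can be a plain module of shared values. The APP_NAME value below is an assumption for illustration, and stack_id is a hypothetical helper. The reference application concatenates the strings inline, as shown in app.py above.

```python
# constants.py (sketch): shared values that do not depend on the environment.
# The value of APP_NAME is an assumption made for this illustration.
APP_NAME = "UserManagementBackend"


def stack_id(suffix: str, app_name: str = APP_NAME) -> str:
    """Compose a construct identifier the same way app.py does: APP_NAME + suffix.

    Hypothetical helper, not part of the reference application.
    """
    if not suffix.isalnum():
        # Keep the sketch conservative: accept only plain alphanumeric suffixes.
        raise ValueError(f"unexpected suffix: {suffix!r}")
    return app_name + suffix


if __name__ == "__main__":
    for suffix in ("Sandbox", "Toolchain"):
        print(stack_id(suffix))
```

Keeping the prefix in one module means that every stack the application defines shares it, so a rename happens in a single place.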
In this case, I utilize CDK Pipelines for continuous deployment, and instantiate the toolchain stack that defines the pipeline here. Then, the pipeline deploys the user management backend to the production environment.
Note: We recommend deploying the toolchain in a separate production deployment account. See Separating CI/CD management capabilities from workloads for more details.
During development, I will iterate quickly and deploy changes to my sandbox environment. The Backend definition above enables this. It defines the user management backend deployment layout for my sandbox environment. The Backend class is imported from the backend/component.py module. Let’s look into it.
# The user management backend deployment layout
...
# In AWS CDK v2, the Construct base class lives in the constructs package
from constructs import Construct

from backend.api.infrastructure import API
from backend.database.infrastructure import Database
from backend.monitoring.infrastructure import Monitoring


class Backend(cdk.Stack):
    def __init__(
        self,
        scope: Construct,
        id_: str,
        *,
        database_dynamodb_billing_mode: dynamodb.BillingMode,
        api_lambda_reserved_concurrency: int,
        **kwargs: Any,
    ):
        super().__init__(scope, id_, **kwargs)

        # Compose the logical units into a single deployment unit
        database = Database(
            self, "Database", dynamodb_billing_mode=database_dynamodb_billing_mode
        )
        api = API(
            self,
            "API",
            dynamodb_table_name=database.dynamodb_table.table_name,
            lambda_reserved_concurrency=api_lambda_reserved_concurrency,
        )
        Monitoring(self, "Monitoring", database=database, api=api)

        # Permissions between the logical units
        database.dynamodb_table.grant_read_write_data(api.lambda_function)

        # Stack outputs
        self.api_endpoint = cdk.CfnOutput(
            self,
            "APIEndpoint",
            value=api.api_gateway_http_api.url,
        )
The Backend class inherits from cdk.Stack, a unit of deployment in the AWS CDK. As I mentioned above, all AWS resources defined within the scope of a stack, either directly or indirectly, are provisioned as a single unit. The Backend class composes the API, Database, and Monitoring constructs into a single deployment unit. The class also defines the permissions between the logical units and the stack outputs.
Finally, let’s look at the toolchain definition.
# The continuous deployment pipeline
...
import aws_cdk as cdk
from aws_cdk import pipelines
from constructs import Construct

import constants
from backend.component import Backend


class Toolchain(cdk.Stack):
    def __init__(self, scope: Construct, id_: str, **kwargs: Any):
        super().__init__(scope, id_, **kwargs)
        ...
        pipeline = pipelines.CodePipeline(...)
        Toolchain._add_production_stage(pipeline)
        ...

    @staticmethod
    def _add_production_stage(pipeline: pipelines.CodePipeline) -> None:
        # The production stage groups the stacks deployed to production
        production = cdk.Stage(
            pipeline,
            PRODUCTION_ENV_NAME,
            env=cdk.Environment(
                account=PRODUCTION_ENV_ACCOUNT, region=PRODUCTION_ENV_REGION
            ),
        )
        backend = Backend(
            production,
            constants.APP_NAME + PRODUCTION_ENV_NAME,
            stack_name=constants.APP_NAME + PRODUCTION_ENV_NAME,
            api_lambda_reserved_concurrency=10,
            database_dynamodb_billing_mode=dynamodb.BillingMode.PROVISIONED,
        )
        ...
        pipeline.add_stage(production, post=[smoke_test])
This time, the Backend stack is added to a cdk.Stage for deployment to the production environment via the pipeline. This lets me keep my sandbox environment similar to the production environment, while still allowing customizations. For example, I use the database_dynamodb_billing_mode argument to set the DynamoDB capacity mode to on-demand for the sandbox environment and to provisioned for the production environment.
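One way to keep these per-environment differences visible is to collect them in a small settings object. The following is a minimal sketch, not how the reference application does it: BackendSettings and differences are hypothetical names, and plain strings stand in for the dynamodb.BillingMode members so that the example runs without the AWS CDK installed.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class BackendSettings:
    """Keyword arguments of the Backend stack that vary by environment."""

    api_lambda_reserved_concurrency: int
    # Assumption of this sketch: a string stands in for the
    # dynamodb.BillingMode member with the same name.
    database_dynamodb_billing_mode: str


# Values taken from app.py (sandbox) and toolchain.py (production) above.
SANDBOX = BackendSettings(
    api_lambda_reserved_concurrency=1,
    database_dynamodb_billing_mode="PAY_PER_REQUEST",
)
PRODUCTION = BackendSettings(
    api_lambda_reserved_concurrency=10,
    database_dynamodb_billing_mode="PROVISIONED",
)


def differences(
    left: BackendSettings, right: BackendSettings
) -> Dict[str, Tuple[object, object]]:
    """Return the settings whose values differ between two environments."""
    left_values, right_values = asdict(left), asdict(right)
    return {
        name: (left_values[name], right_values[name])
        for name in left_values
        if left_values[name] != right_values[name]
    }


if __name__ == "__main__":
    for name, (sandbox, production) in differences(SANDBOX, PRODUCTION).items():
        print(f"{name}: sandbox={sandbox} production={production}")
```

Because both environments instantiate the same Backend class, anything not listed in such a settings object is identical between sandbox and production by construction.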
The AWS CDK allows for infrastructure code to be located in the same repository with runtime code. This leads to additional considerations, such as how to structure the project. In this blog post, I have described the recommended AWS CDK project structure for Python applications, thereby aiming to ease the maintenance and evolution of your projects.
If you think I’ve missed something, or you have a use case that I didn’t cover, we would love to hear from you on the aws-cdk-project-structure-python GitHub repository. Happy coding!
About the author
Alex Pulver is a Sr. Partner Solutions Architect on the AWS SaaS Factory team. He works with AWS Partners at any stage of their software-as-a-service (SaaS) journey in order to help build new products, migrate existing applications, or optimize SaaS solutions on AWS. His areas of interest include builder experience (e.g., developer tools, DevOps culture, CI/CD), containers, security, IoT, and AWS multi-account strategy.