AWS Developer Tools Blog

Recommended AWS CDK project structure for Python applications

In this blog post, I describe the recommended AWS Cloud Development Kit (AWS CDK) project structure for Python applications. This is based on Best practices for developing and deploying cloud infrastructure with the AWS CDK.

The AWS CDK is an open source software development framework for modeling and provisioning your cloud application resources through AWS CloudFormation by utilizing familiar programming languages, including TypeScript, JavaScript, Python, C#, Java, and Go.

The AWS CDK application maps to a component as defined by the AWS Well-Architected Framework. A component usually includes logical units (e.g., api, database), and optionally can have a continuous deployment pipeline. The logical units should be implemented as constructs, including the infrastructure (e.g., Amazon S3 buckets, Amazon RDS databases, Amazon VPC network), runtime (e.g., AWS Lambda function code), and configuration code.

As an example, I will walk through a user management backend component that uses Amazon API Gateway, AWS Lambda, and Amazon DynamoDB to provide basic CRUD operations for managing users. The project also includes a continuous deployment pipeline. Together, the project contains everything required for managing the component as a unit of ownership, including the specific deployment environments.

Concepts

We recommend organizing the project directory structure based on the component’s logical units. Each logical unit should have a directory and include the related infrastructure, runtime, and configuration code. For example:

.
|-- api
|   |-- infrastructure.py
|   `-- runtime
|       |-- app.py
|       `-- requirements.txt
`-- database
    `-- infrastructure.py

This way, if I need to make API changes, then I can easily find the code related to that logical unit. If I need to refactor the code, or make it a separate unit of ownership, it can be changed in a single place. In other words, it is a self-contained unit.

The logical units should be implemented as constructs and not as stacks. Constructs are the basic building blocks of AWS CDK applications, while stacks are deployment units. All of the AWS resources defined within the scope of a stack, either directly or indirectly, are provisioned as a single unit. Implementing logical units as constructs provides the flexibility to support different deployment layouts and enables future reuse as construct libraries. I will further discuss the deployment layout later.

Note: When refactoring constructs, consider logical ID stability to avoid unexpected infrastructure changes.
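To make the note above concrete: the AWS CDK derives a resource's logical ID from its construct path within the stack, so moving a construct under a new parent changes the ID, and CloudFormation then treats the resource as a new one. The following is a simplified illustration of that dependency, not the AWS CDK's actual algorithm. The function name sketch_logical_id and the "Storage" wrapper construct are hypothetical.

```python
import hashlib
import re
from typing import Sequence


def sketch_logical_id(construct_path: Sequence[str]) -> str:
    """Derive an ID from a construct path: readable part plus a short path hash.

    Simplified illustration only; the AWS CDK's real algorithm differs in detail.
    """
    readable = "".join(re.sub(r"[^A-Za-z0-9]", "", part) for part in construct_path)
    digest = hashlib.sha256("/".join(construct_path).encode("utf-8")).hexdigest()
    return f"{readable}{digest[:8].upper()}"


# The same table, before and after wrapping Database in a hypothetical "Storage" construct.
id_before = sketch_logical_id(["Database", "Table"])
id_after = sketch_logical_id(["Storage", "Database", "Table"])

print(id_before)
print(id_after)
print("stable" if id_before == id_after else "changed: CloudFormation would see a new resource")
```

For a stateful resource such as a DynamoDB table, a changed logical ID means the old table is deleted and a new one is created, so it is worth comparing the synthesized templates before and after a refactoring.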

Before the AWS CDK arrived, runtime and infrastructure code remained two separate concepts. The AWS CDK abstraction lets you combine the infrastructure and runtime code of a logical unit behind a single construct interface.

Project structure

Let’s look at the recommended project structure in detail. Clone the example I use in this blog post from https://github.com/aws-samples/aws-cdk-project-structure-python. Note that the blog post leaves out some source code and files, such as linters and the full pipeline structure. The focus is on the code that illustrates the project structure recommendations, while the repository still provides a fully functional project for reference. Below is a snapshot of the project structure, excluding files not in the scope of this blog post:

# The recommended project structure example
.
|-- api
|   |-- __init__.py
|   |-- infrastructure.py
|   `-- runtime
|       |-- app.py
|       `-- requirements.txt
|-- database
|   |-- __init__.py
|   `-- infrastructure.py
|-- monitoring
|   |-- __init__.py
|   `-- infrastructure.py
|-- app.py
|-- constants.py
|-- deployment.py
|-- pipeline.py
`-- requirements.txt

Three logical units compose the user management backend: API, database, and monitoring. Each logical unit contains an infrastructure.py module. If the infrastructure implementation were more complex, then I would replace the infrastructure.py module with an infrastructure package that contains multiple modules. Some logical units also have a runtime directory. For example, the API has a runtime directory containing Lambda function code.
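The convention can also be checked mechanically. The following is a minimal sketch, assuming a logical unit is any top-level directory that holds an infrastructure.py module or an infrastructure package. The helper name find_logical_units is hypothetical and is not part of the example project; the demo builds a throwaway directory tree rather than reading a real repository.

```python
import tempfile
from pathlib import Path
from typing import Dict


def find_logical_units(project_root: Path) -> Dict[str, bool]:
    """Map each logical unit name to whether it also ships runtime code.

    A directory counts as a logical unit if it holds an infrastructure.py
    module or an infrastructure package (a directory with __init__.py).
    """
    units: Dict[str, bool] = {}
    for child in sorted(project_root.iterdir()):
        if not child.is_dir():
            continue
        has_module = (child / "infrastructure.py").is_file()
        has_package = (child / "infrastructure" / "__init__.py").is_file()
        if has_module or has_package:
            units[child.name] = (child / "runtime").is_dir()
    return units


# Demo against a throwaway copy of the layout described above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for relative in (
        "api/infrastructure.py",
        "api/runtime/app.py",
        "database/infrastructure.py",
        "monitoring/infrastructure/__init__.py",  # package variant
        "app.py",
    ):
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    units = find_logical_units(root)

print(units)
```

In the demo, only the api unit reports runtime code, and the top-level app.py file is ignored because it is not a directory.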

Next, I will cover app.py (the AWS CDK application entry point), deployment.py (the user management backend deployment layout), and pipeline.py (the continuous deployment pipeline) modules in order to show the implementation of the recommended project structure. We’ll start with app.py.

app.py

# The AWS CDK application entry point
...
from aws_cdk import core as cdk

import constants
from deployment import UserManagementBackend
from pipeline import Pipeline

app = cdk.App()

# Development
UserManagementBackend(
    app,
    f"{constants.CDK_APP_NAME}-Dev",
    env=constants.DEV_ENV,
    api_lambda_reserved_concurrency=constants.DEV_API_LAMBDA_RESERVED_CONCURRENCY,
    database_dynamodb_billing_mode=constants.DEV_DATABASE_DYNAMODB_BILLING_MODE,
)

# Production pipeline
Pipeline(app, f"{constants.CDK_APP_NAME}-Pipeline", env=constants.PIPELINE_ENV)

app.synth()

The module contains the app object, followed by the development environment definition and the continuous deployment pipeline.

Note: constants.CDK_APP_NAME is utilized as part of the construct identifier (e.g., f"{constants.CDK_APP_NAME}-Dev" above) in order to set a unique prefix for a CloudFormation stack name.

In this case, I use CDK Pipelines for continuous deployment, and instantiate the pipeline stack here. Then, the pipeline deploys the user management backend to the Prod environment.

Note: We recommend deploying the pipeline in a separate production deployment account. See Separating CI/CD management capabilities from workloads for more details.

During development, I will iterate quickly and deploy changes to my development environment. The UserManagementBackend definition above enables this. It defines the user management backend deployment layout for my development environment. As you can see, the UserManagementBackend class is imported from the deployment.py module. Let’s look into it.

deployment.py

# The user management backend deployment layout
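...
from typing import Any

from aws_cdk import aws_dynamodb as dynamodb
from aws_cdk import core as cdk

from api.infrastructure import API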
from database.infrastructure import Database
from monitoring.infrastructure import Monitoring

class UserManagementBackend(cdk.Stage):
    def __init__(
        self,
        scope: cdk.Construct,
        id_: str,
        *,
        database_dynamodb_billing_mode: dynamodb.BillingMode,
        api_lambda_reserved_concurrency: int,
        **kwargs: Any,
    ):
        super().__init__(scope, id_, **kwargs)

        stateful = cdk.Stack(self, "Stateful")
        database = Database(
            stateful, "Database", dynamodb_billing_mode=database_dynamodb_billing_mode
        )
        stateless = cdk.Stack(self, "Stateless")
        api = API(
            stateless,
            "API",
            dynamodb_table=database.table,
            lambda_reserved_concurrency=api_lambda_reserved_concurrency,
        )
        Monitoring(stateless, "Monitoring", database=database, api=api)

        self.api_endpoint_url = api.endpoint_url

The UserManagementBackend class inherits from cdk.Stage—an abstract deployment unit consisting of one or more stacks that should be deployed together. This is where the separation between constructs as logical units, and their deployment layout as stacks, reveals its flexibility. In this example, I deploy the stateful database logical unit in a separate stack from the stateless API and monitoring logical units. The UserManagementBackend class combines the stateful and stateless stacks into a single deployment stage.

Finally, let’s look at the pipeline definition.

pipeline.py

# The continuous deployment pipeline
...
from typing import Any

from aws_cdk import core as cdk
from aws_cdk import pipelines

import constants
from deployment import UserManagementBackend

class Pipeline(cdk.Stack):
    def __init__(self, scope: cdk.Construct, id_: str, **kwargs: Any):
        super().__init__(scope, id_, **kwargs)
        ...
        codepipeline = pipelines.CodePipeline(...)
        self._add_prod_stage(codepipeline)
    ...
    def _add_prod_stage(self, codepipeline: pipelines.CodePipeline) -> None:
        prod_stage = UserManagementBackend(
            self,
            f"{constants.CDK_APP_NAME}-Prod",
            env=constants.PROD_ENV,
            api_lambda_reserved_concurrency=constants.PROD_API_LAMBDA_RESERVED_CONCURRENCY,
            database_dynamodb_billing_mode=constants.PROD_DATABASE_DYNAMODB_BILLING_MODE,
        )
        ...
        codepipeline.add_stage(prod_stage, post=[smoke_test_shell_step])

This time, the UserManagementBackend stage is used to deploy to the Prod environment via the pipeline. This lets me keep my development environment similar to the Prod environment while still allowing customizations. For example, I use the database_dynamodb_billing_mode argument to set DynamoDB capacity mode to on-demand for the development environment and to provisioned mode for the Prod environment.
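The per-environment values come from constants.py, which this post does not show. The following is a minimal sketch of how such values could be grouped, written with the standard library only so that it runs without the AWS CDK installed. The BillingMode enum is a stand-in for dynamodb.BillingMode, the EnvironmentSettings class is hypothetical, and the reserved concurrency numbers are placeholders rather than values taken from the example project.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class BillingMode(Enum):
    """Stand-in for the CDK's dynamodb.BillingMode, so the sketch runs without the CDK."""

    PAY_PER_REQUEST = "PAY_PER_REQUEST"  # on-demand capacity mode
    PROVISIONED = "PROVISIONED"


@dataclass(frozen=True)
class EnvironmentSettings:
    """Hypothetical grouping of the per-environment values kept in constants.py."""

    api_lambda_reserved_concurrency: int
    database_dynamodb_billing_mode: BillingMode

    def as_stage_kwargs(self) -> Dict[str, Any]:
        """Return keyword arguments named like the UserManagementBackend parameters."""
        return {
            "api_lambda_reserved_concurrency": self.api_lambda_reserved_concurrency,
            "database_dynamodb_billing_mode": self.database_dynamodb_billing_mode,
        }


# Concurrency numbers are placeholders, not values taken from the example project.
DEV = EnvironmentSettings(1, BillingMode.PAY_PER_REQUEST)
PROD = EnvironmentSettings(10, BillingMode.PROVISIONED)

dev_kwargs = DEV.as_stage_kwargs()
prod_kwargs = PROD.as_stage_kwargs()
print(dev_kwargs["database_dynamodb_billing_mode"].name)
print(prod_kwargs["database_dynamodb_billing_mode"].name)
```

Keeping the differences between environments in one place makes it easy to see where the development environment deviates from Prod, while the stage class itself stays identical for both.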

Conclusion

The AWS CDK allows infrastructure code to live in the same repository as runtime code. This leads to additional considerations, such as how to structure the project. In this blog post, I have described the recommended AWS CDK project structure for Python applications, which aims to ease the maintenance and evolution of your projects.

If you think I’ve missed something, or you have a use case that I didn’t cover, we would love to hear from you on the aws-cdk GitHub repository. Happy coding!

About the author

Alex Pulver is a Sr. Partner Solutions Architect on the AWS SaaS Factory team. He works with AWS Partners at any stage of their software-as-a-service (SaaS) journey to help them build new products, migrate existing applications, or optimize SaaS solutions on AWS. His areas of interest include builder experience (e.g., developer tools, DevOps culture, CI/CD), containers, security, IoT, and AWS multi-account strategy.