Best practices for developing cloud applications with AWS CDK

April 20, 2022: Updates are available in the Best practices topic of the AWS CDK documentation. The documentation is the most up-to-date resource going forward.

In this post, we discuss strategies for organizing the development of complex cloud applications with large teams, using the AWS Cloud Development Kit (AWS CDK) as a central technology. AWS CDK allows developers and administrators to define their cloud applications using a familiar programming language, such as TypeScript, Python, Java, or C#. Applications are organized into stages, stacks, and constructs, which allows for modular design techniques in both runtime logic (such as AWS Lambda code or containerized services) and infrastructure components such as Amazon Simple Storage Service (Amazon S3) buckets, Amazon Relational Database Service (Amazon RDS) databases, and network infrastructure.

In this post, we go beyond simple tutorials on basic AWS CDK concepts. We discuss how developers write and test code locally, how it gets deployed to production and various staging accounts, and how to organize a team’s apps to fit into a larger company-wide structure.

If you’re new to AWS CDK, we highly recommend that you start your journey with the AWS CDK Intro Workshop. This post covers some advanced topics, and it’s good to have a grasp of the fundamentals. For more information, see AWS CDK Reference Documentation and sample code in the aws-cdk-examples GitHub repo.

The AWS CDK philosophy

In a previous post, we discussed some of the history and motivation behind the AWS CDK. When we designed AWS CDK, we took a close look at the needs of our customers and of our own internal teams, and we analyzed some of the common failure patterns that arose during the deployment and ongoing maintenance of complex applications. Often times, failures are related to what we call out-of-band changes to an application, such as an edit to a configuration file that defines how the application functions in production causing an error that was never seen in testing environments. The AWS CDK enables a model in which your entire application is defined in your code, and any change to your deployed app is always triggered by a push to your source repository.

In your organization, you may have separate teams that own different aspects of the applications, such as a team that creates all your infrastructure, a team that does all the software development, and an operations team that handles configuration and deployment. Or maybe you have embraced the two pizza team way of organizing yourselves, but those divisions still exist within your application (see the following diagram), with one repository for infrastructure, a separate one for your code, and your deployments are all configured on an independent CI/CD system.

Siloed application development

With AWS CDK, we allow you bring all these concerns together under a single roof, creating a single application, housed in a single repository. This application defines underlying components such as your VPC, S3 buckets, Amazon Elastic Compute Cloud (Amazon EC2) instances, and security groups. It also contains your runtime logic, whether that is Lambda function code written in Typescript or a container-based server written in Java. It also specifies your delivery pipeline, which enables continuous integration (CI) and continuous delivery/deployment (CD). All your target deployment environments are fully configured in your source code, rather than parameterizing a deployment artifact to be configured later.

When an AWS CDK application is synthesized, the result is a cloud assembly, which contains not only all the generated AWS CloudFormation templates for your stacks in all target accounts and Regions, but your file assets as well, which are later deployed by the AWS CDK CLI.

Organization

You may be adopting AWS CDK as a part of a wider effort within your company to adopt modern application development practices, and it’s very important to consider how you’re organized in order to achieve this. We already briefly mentioned the idea of a two pizza team, which empowers small, autonomous teams to quickly act in the company’s best interests without needing to go through complex approval chains every time they want to change something. Although this sounds great in theory, in practice it can lead to some chaotic situations if those teams don’t have well-defined guardrails within which to operate.

Making this shift to AWS CDK and a decentralized, continuous deployment model can be difficult, so it’s a best practice to have a team of experts responsible for training and guiding the rest of the company as they start using it. We highly recommend that all medium and large-sized organizations spin up a Cloud Center of Excellence (CCoE) to act as mentors, trainers, and guardians of your company’s policies for application development and deployment.

One of the first responsibilities of a CCoE is to create a landing zone to define your organizational units within AWS. A landing zone is a pre-configured, secure, scalable, multi-account AWS environment based on best practice blueprints. You can use several AWS services to implement a landing zone, and tie them all together with AWS Control Tower, a high-level service that allows you to configure and manage the entire multi-account system from a single pane of glass. Development teams should be able to freely spin up new accounts that they can use to test and deploy their applications. This enables you to adopt another one of our best practice recommendations: deploying to multiple accounts, as illustrated in the following diagram.

Deploy to multiple accounts

In this architecture, developers deploy resources to their own accounts and treat those accounts as extensions of their own development workstations. When code is pushed to a repository and passes code review, it’s picked up by a shared services account, where your delivery pipeline is configured. That account is responsible for building, testing, and deploying your application to target environments, such as beta, gamma, and prod, each of which is hosted in its own isolated account. You may go farther than this, deploying each stage of your application into a distinct account within each AWS Region, which adds up to a large number of accounts. AWS CDK can help you manage this complexity by modeling all aspects of those target environments in code.

Code organization best practices

In this section, we present best practices for organizing your code. The diagram below shows the relationship between a team, their code repositories, packages, applications, and construct libraries.

Team Repositories Apps Packages

Start simple, add complexity when you need to

The guiding principle for most of these best practices is to keep it simple, unless you have requirements that require a more complicated setup. You can always move code around later, so start simple and diverge only when you have to.

Every application starts with a single package in a single repository

A single package is the entry point of your AWS CDK app. This is where you define how and where the different components of your application are deployed, as well as the CI/CD pipeline to deploy the application. This app uses constructs that define what the actual application looks like.

Depending on how reusable these application constructs are, they live in the same AWS CDK app package, or may be split out into separate packages. If the constructs in question are very specific to this particular application, it doesn’t make much sense to generalize them and package them differently. On the other hand, if they are reused across multiple applications, they should be moved to a separate package with a separate lifecycle and testing strategy.

Dependencies between packages in the same repository are managed by your build tooling.

Though possible, we generally don’t recommend having multiple applications in the same repository, especially when using automated deployment pipelines, because this increases the blast radius of changes during deployment. If multiple applications are in a repository, the following occurs:

Changes to one application trigger deployment of the other ones, even though nothing changed
If changes to one application break the build, the other application can no longer be deployed either

Divide into repositories based on code lifecycle or team ownership

It’s time to start moving packages into separate repositories when one of the following becomes true:

Packages are starting to be used in multiple applications simultaneously; they now need to live in a place where they can be referenced by the build systems of all applications, and need to be changed on cadences independent of the lifecycles of those applications.
Multiple teams have commit permissions to different sets of packages that make up your application. Separating those out into different code repositories helps enforce access control.

To start consuming packages across repository boundaries, you now need the following:

Internal package repository – The repository hosts your packages in a place where other application teams inside your organization can use them. Many such package repositories exist for various languages. AWS CodeArtifact can act as a package repository for most popular programming languages.
Release process – This process does a build of your packages, tests them appropriately, and publishes a new version to those package repositories. The release process is usually an automated pipeline that either runs on demand or on a periodic cadence like daily or weekly. As an example, the AWS CDK team uses a construct library called delivlib to manage their release pipeline.

Dependencies on packages in the package repository are managed by your package manager. Your package manager is responsible for making sure builds are repeatable (by encoding what specific versions of every dependency package your application depends on).

Shared packages need a different testing strategy: although for a single application it might be good enough to deploy the application to a testing environment and confirm that it still works, packages that are shared between applications need to be tested independently of the consuming application.

Remember, a construct can be arbitrarily simple or complex, and arbitrarily flexible or opinionated. A Bucket is a construct, but so is a CameraShopWebsite. One team’s responsibility inside your organization could be to work on and produce the CameraShopWebsite construct.

Infrastructure code and application code lives in the same package

Remember that AWS CDK is not just about generating CloudFormation templates—it also includes a powerful asset bundler that handles deployment of things like Lambda code bundles and Docker images.

As we discussed in the section on AWS CDK philosophy, it’s completely acceptable and even encouraged for you to combine your infrastructure components and the code that implements your business logic into the same construct. They don’t need to be in separate repositories, or even in separate packages. A construct is self-contained in that way, a complete description of a piece of functionality.

By keeping infrastructure and runtime code together, it’s easy to evolve them together, test them in isolation, share and reuse across projects, and keep them in sync and version them together.

Construct library best practices

In this section, we go over some best practices to apply when you’re developing constructs, which are composable and reusable modules that encapsulate resources.

Model your app through constructs, not stacks

When you organize your app into units, we recommend that each unit be represented through a class that extends the Construct base class and not the Stack base class. Stacks are a unit of deployment, and tend to be specific to an individual application. By using constructs, you give yourself and your users, the flexibility to compose stacks in the way that makes the most sense for each deployment scenario. For example, you could compose multiple constructs into a DevStack with some configuration for development environments and then have a different composition for production.

Configure with APIs (properties, methods), not environment variables

One of the common anti-patterns that we see is environment variable lookups inside constructs and stacks. Both of these should accept a properties object in the constructor that allows for full configurability, rather than relying on an environment variable on the target machine. If you reference any environment variables, they should be limited to the very top level of your application, and even there these lookups should be limited to the configuration of local development stacks.

Unit test your infrastructure

One of the benefits of following the AWS CDK best practice of creating deterministic builds (avoiding network lookups during synthesis, and modeling all your production stages in code, which we cover later), is that you can run a full suite of unit tests at build time, consistently in all environments. If any single Git commit always results in the same generated templates, you can trust the unit tests that you write to confirm that the generated templates look how you expect them to.

Don’t change the scope and ID of stateful resources

Changing the logical ID for a resource results in a resource replacement by AWS CloudFormation, which is almost never what you want for a stateful resource like a database, or persistent infrastructure like a VPC. Be careful about any refactors of your AWS CDK code that result in the ID changing. Write tests to assert that the logical IDs of your stateful resources didn’t change.

You need more than constructs for compliance

Another common pattern we have seen, particularly among enterprise customers, is creating a collection of construct libraries based on the L2 constructs included with the AWS CDK, with a 1-1 mapping of subclasses. For example, you might create a class called MyCompanyBucket that extends s3.Bucket and MyCompanyFunction that extends lambda.Function. Inside those subclasses, you implement your company’s security best practices like encrypting data at rest, or requiring the use of certain security policies. This pattern is useful for surfacing security guidance early in the software development lifecycle, but it cannot be relied on as the sole means of enforcement.

Investigate the usage of service control polices and permissions boundaries at the organization level to fully enforce your security guardrails. Use aspects or tools like CFN Guard to make assertions about the properties of infrastructure elements before deployment.

Additionally, when developing an “L2.5” compliance-oriented library, be aware of potential drawbacks. An inflexible library may prevent your developer community from taking advantage of the growing ecosystem of AWS CDK packages, such as AWS Solutions Constructs.

AWS CDK application best practices

In the previous section, we covered best practices for construct libraries; now we discuss how to write your AWS CDK applications, which combine one to many constructs to define your specific usage of the resources, and how they are configured and deployed.

Make decisions at synth time, not deployment time

Although AWS CloudFormation allows you to make decisions at deployment time (by means of Conditions and { Fn::If } and Parameters), and AWS CDK allows some access to these mechanism, we actually recommend you don’t use them.

Try to make all decisions, like what construct to generate, in your AWS CDK application at synthesis time. You can use your programming language’s if over Fn.if, use function parameters over CfnParameters, and so on. The reason is that the types of values and operations that can be done on values in AWS CloudFormation are quite limited. For example, iterating over lists and instantiating a resource isn’t possible in AWS CloudFormation expressions, but it’s possible (and used a lot) in AWS CDK. This includes telling your app where you’re going to deploy it, so it can look up relevant context information at synthesis time.

Treat AWS CloudFormation as an implementation detail that we use for robust cloud deployments, not as a language target.

Use generated resource names, not physical names

Names are a precious resource. Every name can only be used once, so if you hardcode a table name or bucket name into your infrastructure and application, you can’t deploy that piece of infrastructure twice side by side anymore.

What’s worse, you can’t make any more changes to the resource that requires it to be replaced, which is an AWS CloudFormation description for what needs to happen if you want to change a property of a resource that can only be set in the Create call and never changed again—for example, the KeySchema of an Amazon DynamoDB table. To accommodate any changes to immutable properties, first, a new table with the new key schema is created before the old one is deleted. But if that table has a deterministic name, the new table can’t be created because the old one still exists and is still using that name.

A better approach is to specify as few names as possible. If you leave out resource names, a unique fresh name is generated for it, and so you don’t run into these kinds of problems. You then parameterize your application, for example by passing in the actual generated table name (which you can reference as table.tableName in your AWS CDK application) as an environment variable into your Lambda function, or you generate a config file on your EC2 instance on startup, or you write the actual table name to AWS Systems Manager Parameter Store and your application reads it from there to figure out what actual table it should be reading from. This is like dependency injection, but for resources.

Separate your application stage into multiple stacks when it’s dictated by deployment requirements

When deciding on how many stacks to have in your application, there is no hard and fast rule, such as putting all resources into a single stack, or putting each resource into its own stack. You usually end up somewhere in the middle, basing the decision on your deployment patterns. Keep in mind the following guidelines:

It’s typically easiest to keep as many resources in the same stack as possible, so keep them together unless you know you want them separated.
It’s a good idea to keep stateful resources (like databases) separated from the stateless resources. You can turn on termination protection on the stack with stateful resources, and can freely destroy or create multiple copies of the stack with stateless resources without risk of data loss.
Stateful resources are also more sensitive to construct renaming—renaming leads to resource replacement—so it makes sense not to nest them too much into other constructs that are likely to be moved around or renamed (unless, of course, the state is a temporary state that can be rebuilt if lost, like a cache).

Commit `cdk.context.json` to avoid nondeterministic network lookups

Your CDK app is written in a a general-purpose programming language, and could make arbitrary network calls. It might be tempting to use the AWS SDK to perform some calls against your AWS account during your app’s synthesis, but should you do this? Any network call you add will add credential setup requirements, latency, and a tiny chance of failure every time you run cdk synth. You can do it, but you need to be aware of the impact it will have on your application.

Another aspect to consider is determinism. If you are going to make network calls, they should definitely not have side effects: changes to your account happen during the CloudFormation deployment, not during the synthesis of your infrastructure. If you need to run custom code as part of your deployments, you can use custom resources. But even strictly read-only calls are not necessarily safe. Consider what happens if the value that’s returned from the network call will change. What part of your infrastructure will that impact? What will happen to deployed infrastructure if the value suddenly changes? Here are some examples of what might happen if an AWS call suddenly starts returning a different value:

If you use DescribeAvailabilityZones to provision a VPC to all available Availability Zones, which at some point in time happens to return 2 availability zones, your IP space gets split in two. If AWS launches a new Availability Zone the next day, the next deployment tries to split your IP space in 3, requiring all subnets to be recreated; this probably won’t be possible because instances are still running, and the issue will take manual work to clean up.
If you use DescribeImages to query the latest Amazon Linux AMI and deploy an instance, and the next day a new AMI is released, the deployment the day after immediately picks up the new AMI and replaces all your instances. This may not be what you expected to happen.

As you can tell, depending on how you use them, directly using responses from network calls may lead to unexpected results at inopportune times.

Fortunately, AWS CDK comes with a mechanism to record sources of nondeterministic information, which makes sure that future synthesis produces the same templates, and important values don’t change when you are not expecting them to. You can see these functions on constructs as .fromLookup() calls, and use an AWS CDK mechanism called context providers. The values get written to a file named cdk.context.json, which you should commit to version control to ensure future executions of your CDK app use the same values. The CDK CLI comes with commands to work with the context cache, so you can refresh select entries on demand. For more information, see the section Runtime Context in the AWS CDK Developer Guide.

If you need to look up some information that a context provider doesn’t exist for, rather than add a network call to your CDK app, we recommend you write a separate script to query for this information, write it to a file (perhaps a JSON file), and read and use that file in your application to generate your infrastructure.

Allow AWS CDK to manage roles and security groups

One of the great features of the AWS CDK construct library is the convenience methods that have been built in to the resources to allow quick and simple creation of AWS Identity and Access Management (IAM) roles. We have designed that functionality to allow minimally-scoped permissions for the components of your application to interact with each other. For example, consider a typical line of code like the following:

myBucket.grantRead(myLambda)

This one line results in a policy being added to the Lambda function’s role, which is also created behind the scenes for you. That role and its policies are more than a dozen lines in the CloudFormation template that you don’t have to write, and we take best practices into account when we generate the template.

If you force your developers to always use pre-created roles that were defined by a security team, coding for AWS CDK becomes much more complicated, and your teams lose a lot of flexibility in how they design their applications. A better alternative is to use service control policies and permissions boundaries to ensure that developers are staying within the guardrails.

Model all production stages in code

In an AWS CDK application, the way you design your deployment pipeline differs significantly from a traditional setup where your goal is to produce a single deployable artifact that is parameterized so that it can be deployed to various target environments after applying configuration values specific to those environments. In AWS CDK, you can build that configuration right into your source code. Create a code file for your production environment, and a separate one for each of your other stages, and put the configuration values right there in the source. Use services like AWS Secrets Manager and Systems Manager Parameter Store for any sensitive values that you don’t want to check in to source control, substituting the names or ARNs for those resources. When you synthesize your application, the cloud assembly that is created in the cdk.out folder contains a separate template for each environment. In this way, your entire build is deterministic. There are no out-of-band changes to your application, and any given Git commit always yields the exact same CloudFormation template and accompanying assets, which makes unit testing much more reliable.

Measure everything

Achieving the goal of full continuous deployment, without the need for human intervention, requires a high level of automation, and that automation isn’t possible without extensive amounts of monitoring. Create metrics, alarms, and dashboards to measure all aspects of your deployed resources. And don’t just measure simple things like CPU usage and disk space, also record your business metrics, and use those measurements to automate deployment decisions like rollbacks. Most of the L2 constructs in AWS CDK have convenience methods to help you create metrics, such as the metricUserErrors() method on the dynamodb.Table class.

Summary

In this post, we introduced you to a set of best practices that we think lead to robust, operationally excellent applications developed using AWS CDK. You should now understand the guiding philosophy that has dictated the design and evolution of AWS CDK, moving you towards a fully automated deployment pipeline that is entirely based on Git commits to a repository where all aspects of your application are maintained. In Part 2 of this series, we walk you through a complete sample application that demonstrates many of these best practices.

Happy coding!

AWS DevOps Blog