AWS Cloud Operations Blog

Accelerate application development with AWS CloudFormation by preventing stack rollback

AWS CloudFormation helps minimize downtime when you deploy application and infrastructure resources. By default, it takes a deployment-safety approach: whenever it encounters an error while deploying stack resources, it rolls the stack back to the last known good state. This works well for production systems, but it might not suit a development workflow. In this post, we explore a new CloudFormation feature that accelerates development workflows by preventing stack rollback and allowing stack operations to be retried from the point of failure.

This option lets development teams move faster, which is critical in software development, especially for customers such as Formula One. As Martynas Juras, a cloud engineer at Formula One, told us: “It’s great that the template keeps provisioning resources even though a resource has failed to create. Deploying the second time round was much quicker – a massive improvement there, as to us, speed is key.”

Consider the following example template, which creates a sample VPC that follows Multi-AZ best practices, with public and private subnets:

Figure 1: Example template. The template's description states that it deploys a VPC with a pair of public and private subnets across two Availability Zones, an internet gateway, and two NAT gateways (one in each Availability Zone).

This template consists of 22 unique CloudFormation resources, even before you add a single Amazon Elastic Compute Cloud (Amazon EC2) instance. You can see how quickly the resource count can add up as applications become more complex. What happens if there’s a problem with one of these resources during deployment?

When CloudFormation encounters an error during stack deployment, the default behavior is to roll the stack back to its last known good state. Rolling back includes canceling any pending operations, reverting any changes to existing resources, and deleting any newly created resources. Rolling back a stack create operation is particularly cumbersome during development, because the stack ends up in a terminal state (ROLLBACK_COMPLETE) in which the only option is to delete the stack and start over. In the example in Figure 1, if the final resource failed during creation, the stack would have to be deleted, and redeploying it would mean provisioning all 22 resources again.

Rolling back to a last known good state is ideal for a production deployment that is focused on minimizing downtime, but less desirable for deployments where an iterative workflow is more efficient.

Deactivate stack rollback

To provide more flexibility during stack development, CloudFormation recently introduced the ability to deactivate stack rollback. This means that if CloudFormation encounters an error during stack deployment, successfully provisioned resources are left in place, so developers can debug the stack, fix any issues, and continue the deployment.

You might be thinking this isn’t a new feature, because the disable rollback option for stack create operations has been around for a while. What’s new is twofold: a stack in the CREATE_FAILED state is no longer terminal, so you can perform additional stack operations on it, and you can now deactivate rollback for both create stack and update stack operations.

I’ll show you what this looks like when a stack encounters an issue with provisioning an EC2 instance.

Stack behavior on resource failure

In the following example, a new stack is deployed with the stack options set to prevent rollback. Under Stack failure options, choose Preserve successfully provisioned resources.

Figure 2: Stack failure options. Under Behavior on provisioning failure, the console offers Roll back all stack resources and Preserve successfully provisioned resources.
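The console setting has an API counterpart: the DisableRollback parameter of the CreateStack and UpdateStack operations (the AWS CLI exposes it as --disable-rollback). The following minimal Python sketch assumes that the AWS SDK for Python (boto3) is installed and that credentials and a Region are configured. The helper names are hypothetical, and the sketch only builds and submits the request.

```python
"""Create a CloudFormation stack with stack rollback deactivated (sketch).

Assumes boto3 is installed and AWS credentials and a Region are configured.
The helper names are hypothetical.
"""
from pathlib import Path


def create_stack_request(stack_name: str, template_body: str) -> dict:
    """Build CreateStack arguments that keep successfully provisioned resources."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # API counterpart of "Preserve successfully provisioned resources".
        # CreateStack accepts DisableRollback or OnFailure, but not both.
        "DisableRollback": True,
    }


def create_stack_without_rollback(stack_name: str, template_path: str) -> str:
    """Start stack creation and return the new stack ID."""
    import boto3  # imported lazily so create_stack_request works without boto3

    cfn = boto3.client("cloudformation")
    template_body = Path(template_path).read_text()
    response = cfn.create_stack(**create_stack_request(stack_name, template_body))
    return response["StackId"]
```

For example, create_stack_without_rollback("preventRollback", "vpc-with-instance.yaml") would start the deployment (the file name is hypothetical). A template that creates IAM resources would also need a Capabilities argument.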

The stack is deployed, but creation fails because of an error while provisioning an EC2 instance. The stack enters the CREATE_FAILED state. At this point, all previously provisioned resources are still in place, including the EC2 instance that failed.

Figure 3: Resources in the CloudFormation console. The table lists each resource's logical ID, physical ID, type, status (in this example, CREATE_FAILED), status reason, and module.
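In a larger stack, a script can pick out the failed resources from the console's resource table. The following sketch assumes boto3 and uses the ListStackResources paginator to collect every resource summary. It then keeps only the summaries whose status ends in a failure. The helper names and the sample data in the test are hypothetical.

```python
"""List resources that failed in a stack whose rollback is paused (sketch)."""
from __future__ import annotations

FAILED_STATUSES = {"CREATE_FAILED", "UPDATE_FAILED", "DELETE_FAILED"}


def failed_resources(summaries: list[dict]) -> list[tuple[str, str, str]]:
    """Return (logical ID, status, status reason) for each failed resource.

    `summaries` has the shape of the StackResourceSummaries entries returned
    by ListStackResources; ResourceStatusReason may be absent.
    """
    return [
        (s["LogicalResourceId"], s["ResourceStatus"], s.get("ResourceStatusReason", ""))
        for s in summaries
        if s["ResourceStatus"] in FAILED_STATUSES
    ]


def stack_failures(stack_name: str) -> list[tuple[str, str, str]]:
    """Fetch every resource summary for the stack and keep the failed ones."""
    import boto3  # assumes boto3 and AWS credentials are available

    paginator = boto3.client("cloudformation").get_paginator("list_stack_resources")
    summaries: list[dict] = []
    for page in paginator.paginate(StackName=stack_name):
        summaries.extend(page["StackResourceSummaries"])
    return failed_resources(summaries)
```

For the stack in this walkthrough, stack_failures("preventRollback") should report MyInstance together with the status reason that the console shows.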

Because the EC2 resource is intact, log files on the EC2 instance can be inspected for any errors that occurred during instance initialization. Here is the cfn-init.log file:

[ec2-user@ip-10-192-10-55 ~]$ cat /var/log/cfn-init.log 
2021-07-27 19:34:07,704 [DEBUG] CloudFormation client initialized with endpoint https://cloudformation.us-east-1.amazonaws.com
2021-07-27 19:34:07,710 [DEBUG] Describing resource MyInstance in stack preventRollback
2021-07-27 19:34:07,824 [INFO] -----------------------Starting build-----------------------
2021-07-27 19:34:07,825 [DEBUG] Not setting a reboot trigger as scheduling support is not available
2021-07-27 19:34:07,825 [INFO] Running configSets: default
2021-07-27 19:34:07,826 [INFO] Running configSet default
2021-07-27 19:34:07,827 [INFO] Running config config
2021-07-27 19:34:15,825 [ERROR] mysoql is not available to be installed
2021-07-27 19:34:15,826 [ERROR] Error encountered during build of config: Yum does not have mysoql available for installation

The log reveals a typo in the EC2 instance definition in the template: at creation time, the instance tried to install mysoql instead of mysql. Next, I fix that typo.
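A short script can pull out just the error entries when cfn-init.log is long. The following is a standard-library Python sketch. It assumes that every entry follows the `timestamp [LEVEL] message` layout seen in the log above. The function name is hypothetical, and multi-line entries such as tracebacks are not handled.

```python
"""Extract [ERROR] entries from a cfn-init log (sketch)."""
from __future__ import annotations

import re
import sys

# Assumed entry layout: "YYYY-MM-DD HH:MM:SS,mmm [LEVEL] message"
ENTRY = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[(\w+)\] (.*)$")


def error_messages(log_text: str) -> list[str]:
    """Return the message text of every ERROR-level entry, in log order."""
    messages = []
    for line in log_text.splitlines():
        match = ENTRY.match(line)
        if match and match.group(1) == "ERROR":
            messages.append(match.group(2))
    return messages


if __name__ == "__main__":
    # Defaults to the log path used in this post; pass another path to override.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/cfn-init.log"
    with open(path, encoding="utf-8") as log_file:
        for message in error_messages(log_file.read()):
            print(message)
```

For the log shown above, the script prints the two [ERROR] lines, which both point to the mysoql package name.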

Remediating stack issues

Stacks with stack rollback deactivated (that is, stacks set to Preserve successfully provisioned resources in the console) enter either the CREATE_FAILED or UPDATE_FAILED state when a stack operation fails. The CloudFormation console then offers three new options:

  • Retry: Fix the error outside of the template or stack and then retry the stack operation.
  • Update: Make changes to the template to remediate the issue and then retry the stack operation.
  • Rollback: Roll back to the last known good state.
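The three options also map onto API operations. As we understand the API, Retry corresponds to an UpdateStack call that reuses the previous template, and Update corresponds to an UpdateStack call with the corrected template. Roll back corresponds to the RollbackStack operation. Treat this mapping as an assumption to verify against the CloudFormation API reference. The following Python sketch assumes boto3; the helper names and option strings are hypothetical. Existing stack parameters are passed with UsePreviousValue so that they keep their current values.

```python
"""Map the console's Retry, Update, and Roll back options to API calls (sketch)."""
from __future__ import annotations

from typing import Optional

# Stack statuses in which rollback is paused and the three options apply.
PAUSED_STATUSES = {"CREATE_FAILED", "UPDATE_FAILED"}


def remediation_call(option: str, stack_name: str,
                     template_body: Optional[str] = None,
                     parameter_keys: tuple[str, ...] = ()) -> tuple[str, dict]:
    """Return (boto3 CloudFormation client method name, keyword arguments)."""
    if option == "rollback":
        # Return the stack to its last known good state.
        return "rollback_stack", {"StackName": stack_name}

    args: dict = {"StackName": stack_name, "DisableRollback": True}
    if parameter_keys:
        # Keep the stack's current parameter values.
        args["Parameters"] = [
            {"ParameterKey": key, "UsePreviousValue": True} for key in parameter_keys
        ]
    if option == "retry":
        # The error was fixed outside the template, so resubmit the same template.
        args["UsePreviousTemplate"] = True
    elif option == "update":
        if template_body is None:
            raise ValueError("the update option needs a corrected template")
        args["TemplateBody"] = template_body
    else:
        raise ValueError(f"unknown option: {option!r}")
    return "update_stack", args


def remediate(stack_name: str, option: str, **kwargs) -> dict:
    """Apply an option, but only while the stack is paused after a failure."""
    import boto3  # assumes boto3 and AWS credentials are available

    cfn = boto3.client("cloudformation")
    status = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]["StackStatus"]
    if status not in PAUSED_STATUSES:
        raise RuntimeError(f"{stack_name} is {status}; rollback is not paused")
    method, args = remediation_call(option, stack_name, **kwargs)
    return getattr(cfn, method)(**args)
```

For example, remediate("preventRollback", "update", template_body=open("fixed-template.yaml").read()) would resubmit a corrected template (the file name is hypothetical). Keeping the request builder separate from the boto3 call makes the mapping easy to check without touching a real stack.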

Figure 4: Stack rollback paused. The console reports that all successfully provisioned resources are live and displays the Retry, Update, and Roll back buttons.

In this example, I edit the stack template to correct the typo in the Metadata section of the EC2 instance:

MyInstance:
    Type: AWS::EC2::Instance
    Metadata:
      'AWS::CloudFormation::Init':
        config:
          files:
            /tmp/test.txt:
              content: Hello world!
              mode: '000755'
              owner: root
              group: root
          packages:
            yum:
              mysql: []

I choose Update, provide the corrected template, and the deployment resumes. Because the EC2 instance was the only change to the original template, and all previously deployed resources are still in place, the only stack operations performed are on the EC2 instance.

Figure 5: Events tab. The stack's events show that it was previously in the CREATE_FAILED state and is now in the UPDATE_COMPLETE state.

With the typo fixed, the stack deploys successfully and enters the UPDATE_COMPLETE state.
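Both Retry and Update run asynchronously, so a script has to wait for the stack to leave its in-progress state before it checks the result. boto3 also ships waiters such as stack_update_complete, but those stop with an error on failure states such as UPDATE_FAILED. The polling sketch below instead returns whatever settled status it observes, so the caller can handle a paused UPDATE_FAILED the same way as UPDATE_COMPLETE. The function names, the delay, and the attempt limit are hypothetical choices.

```python
"""Poll a stack until it leaves every *_IN_PROGRESS status (sketch)."""
from __future__ import annotations

import time
from typing import Callable


def wait_for_settled_status(get_status: Callable[[], str], delay: float = 15.0,
                            max_attempts: int = 240,
                            sleep: Callable[[float], None] = time.sleep) -> str:
    """Return the first status that does not end in _IN_PROGRESS."""
    for _ in range(max_attempts):
        status = get_status()
        # Covers UPDATE_IN_PROGRESS, UPDATE_COMPLETE_CLEANUP_IN_PROGRESS, and so on.
        if not status.endswith("_IN_PROGRESS"):
            return status
        sleep(delay)
    raise TimeoutError(f"stack still in progress after {max_attempts} checks")


def stack_status_getter(stack_name: str) -> Callable[[], str]:
    """Build a get_status callable backed by DescribeStacks."""
    import boto3  # assumes boto3 and AWS credentials are available

    cfn = boto3.client("cloudformation")
    return lambda: cfn.describe_stacks(StackName=stack_name)["Stacks"][0]["StackStatus"]
```

For the stack in this walkthrough, wait_for_settled_status(stack_status_getter("preventRollback")) should return UPDATE_COMPLETE. The sleep parameter is injectable so the polling logic can be tested without waiting.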

Although this example highlights the console behavior, the same functionality is available through the AWS CLI and CloudFormation API operations (the --disable-rollback option in the AWS CLI, or the DisableRollback parameter in the API). For more information about how you can use API operations to deactivate stack rollback, see the stack failure options in the AWS CloudFormation documentation.

Conclusion

Deactivating stack rollback in AWS CloudFormation allows developers and builders to be more efficient. By adopting an iterative approach to provisioning infrastructure as code (IaC), you can speed up your development cycles and reduce churn.

This new capability is available at no additional charge in 23 AWS Regions:

US East (N. Virginia, Ohio), US West (Oregon, N. California), AWS GovCloud (US) (US-East, US-West), Canada (Central), Europe (Frankfurt, Ireland, London, Milan, Paris, Stockholm), Asia Pacific (Hong Kong, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo), Middle East (Bahrain), Africa (Cape Town), and South America (São Paulo).

About the authors

Ryan Kiel

Ryan Kiel is a Solutions Architect for AWS based in Virginia. He helps large-scale enterprises with their cloud journey on AWS by leveraging best practices and the newest technology.

Jaswanthi Meganathan

Jaswanthi is a Senior Product Manager for AWS CloudFormation. As a product enthusiast with deep customer empathy, she works closely with AWS developers to design and launch next generation products and features.

Craig Lefkowitz

Craig Lefkowitz is a Senior Developer Advocate for AWS CloudFormation. When not writing blogs or coding, Craig works with customers helping them adopt modern cloud development and operations practices through automation. Prior to his current role, Craig worked as both an AWS Solutions Architect and AWS Professional Services consultant for enterprise customers, as well as state and local governments. Craig can be reached directly through his Twitter account @CraigLefkowitz.