AWS Cloud Operations Blog
Accelerate application development with AWS CloudFormation by preventing stack rollback
AWS CloudFormation helps minimize downtime when you deploy application and infrastructure resources. By default, it takes a deployment safety approach: whenever an error is encountered while deploying stack resources, it rolls back to the last known good state. This works well for production systems, but might not be a good fit for a development workflow. In this post, we explore a new feature in CloudFormation that helps accelerate development workflows by preventing stack rollback and allowing stack operations to be retried from the point of failure.
This option allows development teams to move faster, which is critical for software development, especially for customers such as Formula One. As Martynas Juras, a cloud engineer at Formula One, told us, “It’s great that the template keeps provisioning resources even though a resource has failed to create. Deploying the second time round was much quicker – a massive improvement there, as to us, Speed is key.”
Consider the following example template that creates a sample VPC that follows best practices of Multi-AZ support with public and private subnets:
Figure 1: Example template
This template consists of 22 unique CloudFormation resources, even before you add a single Amazon Elastic Compute Cloud (Amazon EC2) instance. You can see how quickly the resource count can add up as applications become more complex. What happens if there’s a problem with one of these resources during deployment?
When CloudFormation encounters an error during stack deployment, the default behavior is to roll the stack back to its last known good state. Rolling back includes canceling any pending operations, reverting any changes to existing resources, and deleting any newly created resources. Rolling back a stack create operation is particularly cumbersome for development because the stack will enter a terminal state where the only option is to delete the stack and start over. In the example in Figure 1, if the final resource failed during creation, the stack would have to be deleted. A redeployment would require deploying those 22 resources again.
Rolling back to a last known good state is ideal for a production deployment that is focused on minimizing downtime, but less desirable for deployments where an iterative workflow is more efficient.
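To make the cost difference concrete, here is a back-of-the-envelope sketch for a stack create operation. The function name `resources_to_provision` is hypothetical and the model is deliberately simplified: it only counts how many resources CloudFormation has to provision on the next attempt.

```python
def resources_to_provision(total: int, succeeded: int, rollback_enabled: bool) -> int:
    """Count the resources the next create attempt must provision.

    Simplified model (an assumption, not CloudFormation's actual logic):
    with rollback enabled, a failed create deletes everything, so the
    redeployment starts from zero; with rollback deactivated, resources
    that were already provisioned stay in place.
    """
    if not 0 <= succeeded <= total:
        raise ValueError("succeeded must be between 0 and total")
    return total if rollback_enabled else total - succeeded


# The Figure 1 scenario: 22 resources, and only the final one fails.
print(resources_to_provision(22, 21, rollback_enabled=True))   # 22
print(resources_to_provision(22, 21, rollback_enabled=False))  # 1
```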
Deactivate stack rollback
To provide more flexibility in the stack development process, CloudFormation recently introduced the ability to deactivate stack rollback. This means if CloudFormation encounters an error during stack deployment, provisioned resources are left in place; developers can debug the stack, fix any issues, and continue deployment.
You might be thinking this doesn’t seem like a new feature. The disable rollback option (for stack create operations) has been around for a while! What’s new is that a stack in a CREATE_FAILED state is no longer terminal, so you can perform additional stack operations on it, and that you can deactivate rollback in both create stack and update stack operations.
I’ll show you what this looks like when a stack encounters an issue with provisioning an EC2 instance.
Stack behavior on resource failure
In the following example, a new stack is deployed with the stack options set to prevent rollback. Under Stack failure options, choose Preserve successfully provisioned resources.
Figure 2: Stack failure options
The stack is deployed, but fails during creation due to an error with the provisioning of an EC2 resource. The stack enters a CREATE_FAILED state. At this point, all previously provisioned resources are still in place, including the failed EC2 resource.
Figure 3: Resources in the CloudFormation console
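If you prefer to find the failing resource from a script rather than the console, one option is to filter the stack's events. The following is a minimal sketch: the helper name `failed_resources` is hypothetical, and it assumes the events are dictionaries shaped like the `StackEvents` entries returned by the boto3 `describe_stack_events` call.

```python
def failed_resources(stack_events):
    """Return (logical ID, reason) pairs for resources that ended in a *_FAILED status."""
    failures = []
    for event in stack_events:
        # Skip the event for the stack itself; only individual resources are of interest.
        if event.get("ResourceType") == "AWS::CloudFormation::Stack":
            continue
        if event.get("ResourceStatus", "").endswith("_FAILED"):
            failures.append(
                (event["LogicalResourceId"], event.get("ResourceStatusReason", ""))
            )
    return failures


# Possible usage with boto3 (assumes credentials and a stack named "my-dev-stack"):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   events = cfn.describe_stack_events(StackName="my-dev-stack")["StackEvents"]
#   for logical_id, reason in failed_resources(events):
#       print(logical_id, reason)
```

The stack name in the usage comment is a placeholder. For stacks with long histories, `describe_stack_events` returns results in pages, so a real script would likely need to follow the pagination token as well.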
Because the EC2 resource is intact, log files on the EC2 instance can be inspected for any errors that occurred during instance initialization. Here is the cfn-init.log file:
Evidently, there was a typo in the EC2 instance template definition. The instance tried to install mysoql at instance creation time instead of mysql. I will fix that.
Remediating stack issues
Stacks that are set to deactivate stack rollback (that is, stacks that are set to Preserve successfully provisioned resources in the console) will enter either a CREATE_FAILED or UPDATE_FAILED state when the stack encounters an operational failure. There are now three new options in the CloudFormation console:
- Retry: Fix the error outside of the template or stack and then retry the stack operation.
- Update: Make changes to the template to remediate the issue and then retry the stack operation.
- Rollback: Roll back to the last known good state.
Figure 4: Stack rollback paused
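The three console options can be approximated with the CloudFormation API. The sketch below uses the boto3 `update_stack` and `rollback_stack` calls; the wrapper function names are hypothetical, and mapping Retry to an update that reuses the previous template is an assumption about how the console option translates to the API. It also assumes the stack needs no parameters or capabilities to be passed again.

```python
def retry_stack(cfn, stack_name):
    """Retry: re-run the operation with the existing template after fixing the cause externally."""
    return cfn.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,
        DisableRollback=True,  # keep provisioned resources if this attempt fails too
    )


def update_stack_with_fix(cfn, stack_name, template_body):
    """Update: submit a corrected template and continue from the point of failure."""
    return cfn.update_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        DisableRollback=True,
    )


def roll_back_stack(cfn, stack_name):
    """Rollback: return the stack to its last known good state."""
    return cfn.rollback_stack(StackName=stack_name)


# Possible usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   update_stack_with_fix(cfn, "my-dev-stack", open("template.yaml").read())
```

Passing the client in as an argument keeps the helpers easy to exercise with a stub before pointing them at a real account.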
In this example, I edit the stack template to correct the typo in the Metadata section of the EC2 instance:
I choose Update, provide the corrected template, and the deployment resumes. Because the EC2 instance was the only change to the original template, and all previously deployed resources are still in place, the only stack operations performed are on the EC2 instance.
Figure 5: Events tab
With the typo fixed in the template, the stack deploys successfully and enters the UPDATE_COMPLETE state.
Although this example highlights the console behavior, this functionality is also supported in the AWS CLI and the CloudFormation API (using the disable-rollback option). For more information about how you can use API operations to deactivate resource rollback, see the stack failure options in the AWS CloudFormation documentation.
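As a rough illustration of the API route, the following sketch creates a stack with rollback deactivated by using the `DisableRollback` parameter of the boto3 `create_stack` call. The helper name `create_stack_preserving_resources` and the stack and file names in the usage comment are placeholders, not part of the original walkthrough.

```python
def create_stack_preserving_resources(cfn, stack_name, template_body):
    """Create a stack that keeps successfully provisioned resources if a resource fails."""
    return cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        DisableRollback=True,  # API counterpart of the disable-rollback option
    )


# Possible usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   cfn = boto3.client("cloudformation")
#   with open("template.yaml") as f:
#       create_stack_preserving_resources(cfn, "my-dev-stack", f.read())
```

Templates that create IAM resources would also need the `Capabilities` argument, which this sketch leaves out.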
Conclusion
Deactivating stack rollback in AWS CloudFormation allows developers and builders to be more efficient. By adopting an iterative approach to provisioning infrastructure as code (IaC), you can speed up your development cycles and reduce churn.
This new capability is available for no additional charge in 23 AWS Regions:
US East (N. Virginia, Ohio), US West (Oregon, N. California), AWS GovCloud (US) (US-East, US-West), Canada (Central), Europe (Frankfurt, Ireland, London, Milan, Paris, Stockholm), Asia Pacific (Hong Kong, Mumbai, Osaka, Seoul, Singapore, Sydney, Tokyo), Middle East (Bahrain), Africa (Cape Town), and South America (São Paulo).