AWS Smart Business Blog

The Importance of IT Business Continuity Planning for Small and Medium Businesses

Over the past few years, we at Amazon Web Services have watched events we once considered rare, whether anticipated or not, become the new normal and redefine what we call “business as usual.” The COVID-19 pandemic, supply chain shortages, climate change, and economic uncertainty, to name a few, have pushed businesses to create and update business continuity plans in a volatile landscape. This can be especially difficult for small and medium businesses (SMBs), which cannot absorb these shocks as well as funded startups and large enterprises can.

What is a business continuity plan and why should SMBs care?

A business continuity plan (BCP) keeps a business operational during a disaster or crisis, accommodating elements of your business beyond your IT workload. While this sounds like a disaster recovery plan, it’s important to note that a disaster recovery plan alone focuses on restoring data access and IT infrastructure after a disaster. A disaster recovery plan is actually a subset of your business continuity plan, and should not be just a standalone document.

Given the high level of interdependencies in the world, you may be asking yourself what a BCP means in this modern age. Whether you’re just now exploring the introduction of a BCP for your business, or you want to refresh and modernize an existing plan around this new normal, it’s important to understand that this plan is a lot more than just disaster recovery.

Overall, your BCP will be based on impact analysis, defining disaster recovery plans, implementing plans based on impact analysis, as well as maintaining plans by routinely performing evaluations and exercises.

Figure 1: Five interconnected parts of an impact plan

When it comes to exploring and creating a BCP for your business, you may be faced with a number of challenging scenarios making the process feel daunting. Common challenges include:

  • Funding: Is there money to support this initiative?
  • Resource availability: Are you, your IT staff, or your third-party tech vendor spending more time on the operations side of your workload than on developing and focusing on your business goals?
  • Technical debt: Is there extra work required to accommodate previous implementations, limitations, or gaps in workloads?
  • Experience and training gaps: If you are fortunate enough to have an in-house IT team, are they trained in cloud computing best practices? Do they hold certifications related to selecting AWS software and operating workloads in the cloud?

Stakeholders and decision makers filling multiple positions within their business experience these challenges in greater depth. Performing an impact analysis will help you sort through these challenges, defining desired goals and requirements while outlining the pros and cons of services and architectures selected to stay within defined guardrails. This is where the breadth and depth of AWS can shine.

With over 200 fully featured AWS services, decision makers are able to manage their workloads to address common challenges. Since many AWS managed services use or allow for multi-AZ (multiple Availability Zones) deployments within an AWS Region, simply migrating to a managed service or enabling multi-AZ may help you meet many of your risk mitigation needs. Not sure what an AZ is? It’s one or more discrete data centers with redundant power, networking, and connectivity in an AWS Region, available for use by any AWS customer. Need vetted solutions and guidance based on your use case? Visit the AWS Solutions Library to discover solutions built by AWS experts.

Conducting a business impact analysis for your SMB

A BCP identifies the impact not only on elements of your business within your workload, but also on elements outside of your business. Performing an impact analysis quantifies the business impact on internal and external customers should there be a disruption to your workload. Here are some examples of important areas to consider when performing an impact analysis:

  • Impacted resources outside of the cloud
    • Identify third-party service impacts such as utility services
    • Physical infrastructure such as data centers, office space, production facilities and required technology equipment
    • Inventory including raw materials and supply chain disruptions
    • Employees
  • Impacts due to lost sales and/or increased expenses
  • Regulatory fines and contractual penalties
  • Customer dissatisfaction and customer churn
  • Innovation and new product or feature release delays
  • How quickly a workload needs to be made available following an event
  • How much data loss can be tolerated, and how long a disruption can last
  • The cost of the disaster recovery options should be evaluated to ensure that the disaster recovery strategy provides the correct level of business value
  • A risk assessment of the possible natural disasters and geographical impacts of proposed and current infrastructure. For example, a higher probability of hurricanes and flooding when operating along coastlines
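
The cost evaluation in the list above can be made concrete with a small calculation. The following Python sketch compares what each disaster recovery option costs per year with the loss it is expected to prevent. Everything in it is an assumption made for illustration: the option names, the dollar figures, and the deliberately simple loss model (events per year × hours of downtime × loss per hour). Substitute the numbers from your own impact analysis.

```python
from dataclasses import dataclass


@dataclass
class DrOption:
    """A hypothetical disaster recovery option being evaluated."""
    name: str
    annual_cost: float     # yearly cost of running the option (USD)
    recovery_hours: float  # expected downtime per event with this option


def expected_annual_loss(events_per_year: float,
                         downtime_hours: float,
                         loss_per_hour: float) -> float:
    """Simplified model: loss = event frequency x downtime x hourly impact."""
    return events_per_year * downtime_hours * loss_per_hour


def net_annual_value(option: DrOption,
                     events_per_year: float,
                     baseline_downtime_hours: float,
                     loss_per_hour: float) -> float:
    """Loss avoided by the option minus what the option costs per year."""
    loss_without = expected_annual_loss(
        events_per_year, baseline_downtime_hours, loss_per_hour)
    loss_with = expected_annual_loss(
        events_per_year, option.recovery_hours, loss_per_hour)
    return (loss_without - loss_with) - option.annual_cost


# Illustrative inputs only (assumptions, not AWS pricing or guidance).
EVENTS_PER_YEAR = 0.5           # one disruptive event every two years
BASELINE_DOWNTIME_HOURS = 48.0  # downtime with no disaster recovery plan
LOSS_PER_HOUR = 1_000.0         # lost sales plus extra expenses per hour

options = [
    DrOption("backup and restore", annual_cost=2_000.0, recovery_hours=24.0),
    DrOption("warm standby", annual_cost=15_000.0, recovery_hours=1.0),
]

for option in options:
    value = net_annual_value(
        option, EVENTS_PER_YEAR, BASELINE_DOWNTIME_HOURS, LOSS_PER_HOUR)
    print(f"{option.name}: net annual value {value:,.0f} USD")
```

With these made-up inputs, the cheaper option nets 10,000 USD per year and the faster one 8,500 USD, which shows why the shortest recovery time is not automatically the best business value.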

Defining key recovery elements in BCPs

After conducting a business impact analysis, the next step is to construct a BCP that ensures the disaster recovery plan aligns with the results from the analysis and meets continuity requirements. Key decision makers and individual contributors will be responsible for the different components. In general, a well-designed BCP will cover the following:

Decision makers:

  • A defined Recovery Point Objective (RPO), which is the maximum amount of data, usually expressed as a window of time before the event, that can be lost after recovery from an event. Review each component of a workload, as the RPO may vary from resource to resource, potentially saving recovery costs. For example, restoring only a production database to continue customer-facing service may be more important than restoring a developer database at that time.
  • A defined Recovery Time Objective (RTO), which is the maximum amount of time allowed for a service to be restored to avoid unacceptable loss or consequences. Identify and define which elements of business operations require more or less recovery time.

Figure 2: Diagram depicting RPO and RTO amidst a disaster

  • The cost of the disaster recovery options has been evaluated to confirm that the disaster recovery strategy provides the correct level of business value.
  • Determine the maximum total cost of ownership (TCO) to identify whether there is a disaster recovery solution with a lower TCO than the cost estimated in the risk analysis. The AWS Pricing Calculator is a service you can use to create cost estimates to suit your AWS use cases.
  • When an event occurs, how will decision makers and stakeholders know? Who owns each component during an escalation event? Are individual contributors using runbooks to follow vetted recovery procedures? AWS Systems Manager Incident Manager provides escalation paths through your defined contacts.
  • Meets compliance requirements (depending on customer configuration). Visit AWS Compliance to learn more about our compliance offerings and how you can implement them.
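
To show how RPO and RTO can be checked per component, here is a minimal Python sketch. It assumes you record, for each component, how often data is backed up and how long the last recovery drill took. The component names, the durations, and the `Component` and `objective_gaps` helpers are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class Component:
    """One part of a workload with its own recovery objectives (hypothetical)."""
    name: str
    rpo: timedelta               # maximum tolerable window of lost data
    rto: timedelta               # maximum tolerable time to restore service
    backup_interval: timedelta   # how often data is backed up or replicated
    measured_restore: timedelta  # restore time observed in the last drill


def objective_gaps(component: Component) -> List[str]:
    """Return a description of each objective the component currently misses."""
    gaps = []
    # Worst case, an event strikes just before the next backup, so up to one
    # full backup interval of data can be lost.
    if component.backup_interval > component.rpo:
        gaps.append(
            f"{component.name}: backup interval {component.backup_interval} "
            f"exceeds RPO {component.rpo}")
    if component.measured_restore > component.rto:
        gaps.append(
            f"{component.name}: measured restore {component.measured_restore} "
            f"exceeds RTO {component.rto}")
    return gaps


# Illustrative values only: a strict objective for production, a relaxed one
# for development, mirroring the database example in the RPO bullet above.
components = [
    Component("production database",
              rpo=timedelta(minutes=15), rto=timedelta(hours=1),
              backup_interval=timedelta(minutes=5),
              measured_restore=timedelta(minutes=45)),
    Component("developer database",
              rpo=timedelta(hours=24), rto=timedelta(hours=8),
              backup_interval=timedelta(hours=24),
              measured_restore=timedelta(hours=12)),
]

for component in components:
    for gap in objective_gaps(component):
        print(gap)
```

In this example the production database meets both objectives, while the developer database misses only its RTO, so the fix would be a faster restore path or a relaxed objective, not more frequent backups.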

Individual contributors, other decision makers, and service managers:

  • Automated internal tasks: Are you using automation for common infrastructure tasks? AWS Systems Manager Automation allows you to set up runbooks to automate common application and infrastructure tasks.
  • Automated internal notifications: Are you receiving automated texts, calls, or emails when an event occurs? Amazon CloudWatch can be set up to send email notifications through Amazon SNS.
  • External communications: Are you automatically notifying your customers or stakeholders that an event is occurring or has occurred, and again when service is restored?
  • Backups: Are you backing up critical production data in a method that meets continuity and cost options discovered during the business impact analysis? Does this align with RTO and RPO definitions? AWS Backup is a fully managed service that enables you to centralize and automate data protection across on-premises and AWS services.
  • Testing and optimization: Has the disaster recovery plan been tested? Choose a solution that simplifies disaster recovery drills. It’s also important to continuously monitor and check for configuration drift (see Testing disaster recovery).
  • Application and infrastructure support limitations: Ensure all critical applications can recover on supported infrastructure. For example, can a recovered application run on the selected EC2 instance? Are all needed resources installed to run the application?
  • Ease of operation: Can your users and employees still operate and use the recovered workload?
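
As an illustration of the automated notifications item above, the following Python sketch builds the parameters for an Amazon CloudWatch alarm that publishes to an Amazon SNS topic when an EC2 instance fails a status check, and again when it recovers. It assumes the boto3 library, an existing SNS topic, and an EC2-based workload, none of which this post prescribes. The instance ID and topic ARN are placeholders, and `status_check_alarm` and `create_alarm` are hypothetical helper names.

```python
from typing import Any, Dict


def status_check_alarm(instance_id: str, topic_arn: str) -> Dict[str, Any]:
    """Build put_metric_alarm parameters for an EC2 status check failure."""
    return {
        "AlarmName": f"status-check-failed-{instance_id}",
        "AlarmDescription": "Notify contacts when the instance fails a status check",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,            # evaluate the metric in one-minute periods
        "EvaluationPeriods": 2,  # alarm after two consecutive failing periods
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # publish when the alarm fires
        "OKActions": [topic_arn],     # publish again when it returns to OK
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Create the alarm. Needs boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the sketch loads without boto3 installed

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Placeholder identifiers: substitute your own instance ID and SNS topic ARN.
params = status_check_alarm(
    instance_id="i-0123456789abcdef0",
    topic_arn="arn:aws:sns:us-east-1:123456789012:bcp-alerts",
)
print(params["AlarmName"])
```

Email delivery depends on the topic having an email subscription that the recipient has confirmed. Using the same topic for `OKActions` is one way to cover restoration messages as well as the initial alert.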

Implementing a BCP

There are a number of AWS resources available to help implement business continuity for your organization. Based on the business impact analysis performed, you can choose the best set of AWS services to help you achieve BCP goals. A common path for introducing components of a BCP into your existing or new architectures is the use of AWS managed services: AWS does the heavy lifting of data center operations while removing the operational burden of managing operating systems and applications. Additionally, many AWS services allow you to adopt a consumption model, paying only for the resources that you use, which can decrease the cost of disaster recovery options.

How AWS can help your SMB with its plans

To address the common challenge of technical resource constraints, whether it’s a lack of time or a skills gap, migrating and modernizing with AWS managed services will help your organization adopt BCP best practices around monitoring, security, patching, backups, and cost optimization without the additional operational heavy lifting of managing the infrastructure. AWS managed services are also designed with reliability and high availability in mind, so you can work toward your BCP goals by enabling features or replacing components within your architecture with AWS managed services. For example, AWS Elastic Disaster Recovery recovers applications in the cloud. It minimizes downtime and data loss with fast, reliable recovery of on-premises and cloud-based applications.

Managing the BCP program

After modernizing your workloads on AWS to meet the requirements defined during the impact analysis, it’s important to implement a continuous plan to train, test, and audit your architectures. It is also recommended to take preventive measures to ensure that your workload is functional and operational in AWS. You can use AWS Resilience Hub to continuously validate and track the resilience of your AWS workloads.

Testing the BCP program

How often you should test your disaster recovery plan is usually governed by your business policies, government regulations, and compliance requirements. It is recommended to test your plan at least once or twice per year, documenting and fixing any gaps that you identify in these tests. Similarly, you should update all security and data protection strategies frequently to prevent inadvertent unauthorized access.

If you are using Elastic Disaster Recovery, you can perform non-disruptive tests to confirm that implementation is complete. The service reduces downtime and data loss with the fast, reliable recovery of on-premises and cloud-based applications. You can quickly recover operations after unexpected events, such as software issues or data center hardware failures. It is also a flexible solution, so you can add or remove replicating servers and test various applications without specialized skill sets.

Upskilling your team

Along with a routine training plan, customers can use AWS digital or classroom training to provide ongoing learning around AWS services, features, and best-practice guidance. There are also AWS Partner Network consultants who specialize in designing and implementing business continuity planning for customers.

Next steps

In this blog post, we discussed common challenges for SMB customers in planning and implementing business continuity. We also looked at considerations for conducting a business impact analysis and the role of stakeholders in defining the key elements of a BCP. You can always refer to the AWS Business Continuity Plan Whitepaper for guidance on best practices for BCP in AWS. AWS Resilience Hub provides a central place to define, validate, and track the resilience of your applications on AWS. Training and Certification is a great place to start your journey for BCP in AWS. Contact us if you’re ready to create or modernize your BCP.

Shikhar Mishra

Shikhar Mishra is a Sr. Solutions Architect who supports SMB customers at AWS. He has over 15 years of extensive hands-on experience in Solution Architecture and Implementation. Shikhar is proficient in leading architectural design sessions, developing proof of concepts/pilots, and implementing cloud projects. He is based in the Washington, D.C. area (US).

Dennis Thurmon

Dennis Thurmon is a Solutions Architect who supports SMB customers at AWS. He is an ultra-learner who enjoys diving deep into the core of a customer's business goals and understanding the unique challenges per customer across many different verticals. He holds a degree in Computer Science from the Missouri University of Science and Technology and is based in Florida (US).