AWS for Industries

How to consider Store System’s Disaster Recovery Architecture with AWS

For retailers operating on a nationwide scale, it is important to have a disaster recovery (DR) architecture in place. It can be challenging to know what kind of architecture to implement when considering various things such as disaster assumptions, business continuity, and recovery requirements. Running store systems, such as store computers and point of sale (POS), on Amazon Web Services (AWS) has advantages in cost reduction and advanced functions.

In this blog post, I introduce the advantages of running a store system on AWS, the things to consider when determining the DR architecture that best fits your use case, and the AWS services that are useful for implementing it.

Store Systems and Advantages of Running on the Cloud

Stores carry out a variety of operations, and each is supported by information systems. For example, POS systems manage information at the point of sale, while other systems handle product ordering and disposal registration. Shift and attendance management for store staff may also be included. In many cases, the main store business systems run on store computers installed in each store.

Store computers don’t operate independently. They work with systems that run on the cloud or in an on-premises data center. Examples include integration with logistics systems that respond to product orders, and with information systems used for product and customer analysis.

The conceptual configuration of the system when a store computer is used is shown in Figure 1. Since store computers are installed at every store, many store computers are deployed in retail businesses with multiple stores, such as convenience stores and supermarkets.

Figure 1 – Concept of system arrangement of store systems

With recent advances in cloud computing technology, the range of systems that customers seek to run on the cloud has expanded to include store systems. Moving store computers to the cloud offers advantages such as:

  • Cost reduction by eliminating the need for hardware
  • Increased operational efficiency through system consolidation
  • Increased functionality due to ease of data utilization

The more stores a retailer has, the more store computers are migrated to AWS, so the benefits of cost reduction and operational efficiency grow accordingly. A study conducted by The Hackett Group revealed that organizations that migrated from on-premises to AWS achieved quantifiable business value such as:

  • 69% decrease in unplanned downtime
  • 43% lower time to market for new features
  • 29% increase in staff focus on innovation
  • 20% cost savings on technology infrastructure

Demand forecasting using machine learning is one example of increased functionality. Taco Bell maximizes sales and minimizes inventory by forecasting demand using Amazon Forecast.

Concerns of Running Store Systems on the Cloud

Because systems that were distributed across stores are consolidated in the cloud, you need to take measures regarding:

  • Response performance when stores access workloads consolidated in the cloud
  • Availability and reliability, so that a disaster or failure in one system does not spread throughout the network

Regarding response performance, AWS Direct Connect can be used to achieve stable network bandwidth. Also, caching at the edge is an effective solution. AWS Internet of Things (AWS IoT) makes it simple to update data between a local data store and the cloud.

Meanwhile, with regard to availability and reliability, the more stores there are, the greater the need for DR. Even if a large-scale disaster occurs, stores in regions not affected by the disaster are expected to continue operating by using the store system on the DR site.

Figure 2 – DR site failover during a large-scale disaster

The AWS global infrastructure supports a wide range of applications and provides a secure and scalable foundation, so you can configure your system according to your requirements. You can deploy a highly available and reliable architecture using multiple Availability Zones (multi-AZ) and a DR architecture using multiple AWS Regions (multi-region).

Figure 3 – Store system using multi-AZ and multi-region

Things to Consider When Determining a Disaster Recovery Architecture

Here are the things to consider when determining a DR architecture:

  • Think about anticipated disasters and areas of impact
  • Examine the system configuration based on the business continuity
  • Consider the system configuration according to recovery time objective (RTO)/recovery point objective (RPO)
  • Adopt a simple DR site switchover mechanism
  • Conduct training on a regular basis

Think about anticipated disasters and areas of impact
You should decide what kind of disaster to anticipate as part of your business continuity plan. Depending on the anticipated disaster, the scope of impact on the AWS global infrastructure and the deployment patterns available for the DR site may change.

Figure 4 – Examples of anticipated disasters and approaches

Examine the system configuration based on the business continuity
When examining the system configuration, establish system redundancy based on what the business needs in order to keep operating. Assume that:

1) An order placed at a store is performed in the following steps:

a. Confirmation of the recommended order amount
b. Ordering
c. Cooperation with the business partner, as shown in Figure 5

2) Separate systems are prepared for each step

If a disaster occurs when only the order system has a DR site, it is still possible to place an order even though the recommended order amount cannot be confirmed. However, because cooperation with the business partner is not possible, the order cannot be completed. Orders can be completed only if a DR site is also prepared for the system that cooperates with your business partners. For products to actually arrive at stores, the logistics network must also be maintained so that it is not interrupted.

Figure 5 – Development of system redundancy based on business continuity
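The ordering example above can be expressed as a small check: an order completes during a disaster only if every step that is required for completion has a DR site. The following is a minimal sketch of that reasoning, not a tool from AWS. The `Step` class, the function names, and the step labels are hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One step of the ordering flow (hypothetical model for illustration)."""
    name: str
    has_dr_site: bool
    required_to_complete: bool  # False if the order can proceed without it


def missing_dr_sites(steps):
    """Return the required steps that would be unavailable during a disaster."""
    return [s.name for s in steps if s.required_to_complete and not s.has_dr_site]


def order_can_complete(steps):
    """An order completes only if no required step lacks a DR site."""
    return not missing_dr_sites(steps)


def order_flow(partner_has_dr):
    """Build the three-step flow from the text; only the partner step varies."""
    return [
        # The text notes an order can be placed without the recommendation.
        Step("recommended order amount", has_dr_site=False, required_to_complete=False),
        Step("ordering", has_dr_site=True, required_to_complete=True),
        Step("partner cooperation", has_dr_site=partner_has_dr, required_to_complete=True),
    ]


if __name__ == "__main__":
    print(order_can_complete(order_flow(partner_has_dr=False)))
    print(missing_dr_sites(order_flow(partner_has_dr=False)))
    print(order_can_complete(order_flow(partner_has_dr=True)))
```

Listing the steps this way makes it easier to see which systems must be made redundant together, rather than deciding on a DR site for each system in isolation.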

Consider system configuration according to RTO/RPO

When you adopt a multi-region configuration on AWS for redundancy, the choice between configurations such as active/passive and active/active is determined by your recovery requirements: RTO and RPO.

Figure 6 – Recovery scenarios according to RTO/RPO
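As a rough illustration of how RTO and RPO drive this choice, the sketch below maps a pair of requirements to a configuration. The text above names only active/passive and active/active. The sub-variants (warm standby, pilot light, backup and restore) come from general AWS DR guidance, and the time thresholds are assumptions made up for this example, not AWS recommendations. Replace them with values agreed on in your own business continuity plan.

```python
from datetime import timedelta

# Hypothetical thresholds, for illustration only; tune them to your own requirements.
ACTIVE_ACTIVE_LIMIT = timedelta(minutes=1)
WARM_STANDBY_LIMIT = timedelta(minutes=15)
PILOT_LIGHT_LIMIT = timedelta(hours=2)


def suggest_configuration(rto, rpo):
    """Suggest a multi-region configuration from the stricter of RTO and RPO."""
    tightest = min(rto, rpo)
    if tightest <= ACTIVE_ACTIVE_LIMIT:
        return "active/active"
    if tightest <= WARM_STANDBY_LIMIT:
        return "active/passive (warm standby)"
    if tightest <= PILOT_LIGHT_LIMIT:
        return "active/passive (pilot light)"
    return "active/passive (backup and restore)"


if __name__ == "__main__":
    # A POS-style workload that tolerates almost no downtime or data loss.
    print(suggest_configuration(timedelta(seconds=30), timedelta(0)))
    # A reporting workload that can be restored the next day.
    print(suggest_configuration(timedelta(hours=24), timedelta(hours=12)))
```

Using the stricter of the two values is a simplification. In practice, RTO mostly constrains how much infrastructure is pre-provisioned, and RPO mostly constrains how data is replicated, so you may want to evaluate them separately.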

Adopt a simple DR site switchover mechanism
Whatever configuration you choose, it must be possible to recover to the DR site in an emergency. Amazon Route 53 Application Recovery Controller simplifies operations through recovery readiness checks and automation, in multi-AZ as well as multi-region configurations. This makes the switchover more reliable.

DR mechanisms using Amazon Route 53, including the Amazon Route 53 Application Recovery Controller, are explained in detail in Creating Disaster Recovery Mechanisms Using Amazon Route 53. That post explains the following principles for such mechanisms:

  • Use data plane functions
  • Control failover from your standby region
  • Understand and reduce dependencies to make failover more reliable
  • Pre-provision critical components
  • Review authentication methods
  • Test regularly
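The first two principles can be sketched with Application Recovery Controller routing controls, which are switched through the cluster data plane. The example below is a minimal sketch under stated assumptions. It expects a client created elsewhere with `boto3.client("route53-recovery-cluster", ...)`, pointed at one of your cluster endpoints and called from the standby region. The routing control ARNs, the function names, and the `DryRunClient` stand-in are hypothetical. The stand-in exists only so the flow can be exercised without AWS credentials.

```python
def build_failover_entries(primary_control_arn, standby_control_arn):
    """Turn the primary routing control off and the standby routing control on."""
    return [
        {"RoutingControlArn": primary_control_arn, "RoutingControlState": "Off"},
        {"RoutingControlArn": standby_control_arn, "RoutingControlState": "On"},
    ]


def fail_over(cluster_client, primary_control_arn, standby_control_arn):
    """Switch traffic to the DR site with one batched data plane call."""
    entries = build_failover_entries(primary_control_arn, standby_control_arn)
    # Updating both controls in one request avoids a window where both are off.
    cluster_client.update_routing_control_states(
        UpdateRoutingControlStateEntries=entries
    )
    return entries


class DryRunClient:
    """Hypothetical stand-in that records requests instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def update_routing_control_states(self, UpdateRoutingControlStateEntries):
        self.calls.append(UpdateRoutingControlStateEntries)
        return {}


if __name__ == "__main__":
    client = DryRunClient()
    # Placeholder ARNs; real ones come from your ARC control panel.
    fail_over(client, "arn:example:primary-control", "arn:example:standby-control")
    print(client.calls)
```

Passing the client in as a parameter keeps the switchover logic small and easy to rehearse, which fits the principle of testing the failover path regularly.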

Conduct training on a regular basis
Regular testing is important, but preparing and conducting tests in real-world scenarios is complex and time-consuming. AWS Fault Injection Simulator (AWS FIS) is a fully managed fault injection service that makes it easier for teams to perform regular testing.

For example, by using the AWS FIS network connectivity disruption action, you can simulate an AWS Region failure. You can then test whether your system continues to provide service by failing over to the DR site correctly. AWS FIS also provides a number of other predefined templates that simulate real-world failures. This not only saves teams time and resources for testing, but also lets them improve operational procedures through test feedback.
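To make the network disruption test more concrete, the sketch below builds the body of an AWS FIS experiment template as a plain Python dictionary. It is a sketch under assumptions, not a ready-to-run experiment. The field names and the `aws:network:disrupt-connectivity` action reflect my understanding of the FIS template format and should be checked against the current AWS FIS documentation. The tag key `dr-test`, the role ARN, and the function name are hypothetical.

```python
def region_isolation_template(role_arn, subnet_tag_value, duration="PT10M"):
    """Build a template that cuts connectivity for the tagged subnets."""
    return {
        "description": "Simulate loss of the primary Region's network",
        "roleArn": role_arn,
        # No automatic stop condition in this sketch; add a CloudWatch alarm in practice.
        "stopConditions": [{"source": "none"}],
        "targets": {
            "primarySubnets": {
                "resourceType": "aws:ec2:subnet",
                # Hypothetical tag used to select the subnets under test.
                "resourceTags": {"dr-test": subnet_tag_value},
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "disruptNetwork": {
                "actionId": "aws:network:disrupt-connectivity",
                "parameters": {"duration": duration, "scope": "all"},
                "targets": {"Subnets": "primarySubnets"},
            }
        },
    }


if __name__ == "__main__":
    template = region_isolation_template(
        role_arn="arn:aws:iam::111122223333:role/example-fis-role",  # placeholder
        subnet_tag_value="primary",
    )
    print(template["actions"]["disruptNetwork"]["actionId"])
```

A dictionary like this could be passed to the FIS API when creating an experiment template. Keeping it in code lets the same failure scenario be reviewed and repeated at each training session.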

Conclusion

Having a DR architecture in place is especially important for retailers that operate on a national scale. However, implementing a DR architecture based on various considerations such as disaster assumptions and business continuity is a challenge.

In this blog post, I introduced things to consider when determining which DR architecture would be best for your company, along with AWS services that are useful for implementation. You can deploy a multi-region DR architecture using the AWS global infrastructure. AWS services such as Amazon Route 53 Application Recovery Controller and AWS FIS can help simplify and improve your operations. Also, running store computers on AWS can provide advantages in cost reduction and advanced functions.

Contact an AWS Representative to learn how we can help accelerate your business.

Kenji Hirai

Kenji Hirai is a solution architect at AWS Japan. He has 20+ years of experience in the IT industry. He began his career as a database specialist and his current focus is to help retail customers with their cloud-adoption journeys. He enjoys making spiced curry.