Disaster recovery has strengthened critical grid operations and maintains regulatory compliance
What is our primary use case?
We use AWS Elastic Disaster Recovery to recover on-premises and cloud applications onto AWS by continuously replicating servers to a staging area. In our organization, the main need is business continuity and rapid recovery of our critical IT and OT systems. This includes our data centers, our SCADA systems, our metering platforms, our customer billing systems, and the telemetry from substations and field assets. Our goal is to minimize downtime, maintain safe and reliable electricity across the state, and support our strategic objectives of operational excellence and regulatory compliance. Disaster recovery is centered around protecting and quickly restoring the critical systems that support the state's electricity transmission and distribution network.
Given that our organization manages high-voltage transmission lines and lower-voltage distribution networks, any IT or OT downtime can directly affect hundreds of thousands of customers. It can destabilize the grid and affect operational safety. AWS Elastic Disaster Recovery is essential for replicating the data center workloads, protecting our OT and SCADA systems, supporting disaster recovery planning and testing, and ensuring regulatory compliance and operational resilience.
Our main use of AWS Elastic Disaster Recovery is regular, non-disruptive disaster recovery drills. Our team routinely runs failover simulations of both the IT and OT systems without affecting production, which validates our recovery time objective of under four hours and our recovery point objective of near-zero data loss. We also use AWS Elastic Disaster Recovery to integrate with our OT telemetry and our asset data. Unlike many other organizations that focus just on IT, we also replicate our SCADA systems, our pole telemetry, and our substation data so that field crews can respond immediately during network events.
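To illustrate how drill results could be checked against those objectives, here is a minimal Python sketch. It is not part of AWS Elastic Disaster Recovery: the `DrillResult` record, the function names, the sample figures, and the 60-second tolerance standing in for "near zero" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta

# Targets from the review: RTO under four hours, RPO near zero.
# The 60-second tolerance is an assumed stand-in for "near zero".
RTO_TARGET = timedelta(hours=4)
RPO_TARGET = timedelta(seconds=60)


@dataclass
class DrillResult:
    """Outcome of one failover drill (hypothetical record shape)."""
    system: str
    recovery_time: timedelta      # drill start until service was restored
    data_loss_window: timedelta   # age of the newest data that was recovered


def meets_objectives(result: DrillResult) -> bool:
    """True when the drill satisfied both the RTO and the RPO target."""
    return (result.recovery_time < RTO_TARGET
            and result.data_loss_window <= RPO_TARGET)


def failed_drills(results):
    """Names of the systems whose drill missed either target."""
    return [r.system for r in results if not meets_objectives(r)]


# Made-up sample data, not figures from our drills.
drills = [
    DrillResult("scada", timedelta(hours=2, minutes=50), timedelta(seconds=5)),
    DrillResult("billing", timedelta(hours=4, minutes=30), timedelta(seconds=2)),
]
print(failed_drills(drills))
```

With the sample data above, only the billing drill is flagged, because its recovery time exceeds four hours.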
Our teams also utilize AWS Elastic Disaster Recovery for prioritizing data for regulatory compliance. We classify the workloads based on criticality. Customer billing, outage management, and compliance reporting get the highest priority replication, while less critical analysis or internal reporting are replicated on a lower schedule. We utilize a hybrid recovery approach. AWS Elastic Disaster Recovery allows us to combine our on-premises data centers with cloud replication, giving us flexibility and cost efficiency while keeping sensitive operational data secure and compliant with state and federal requirements.
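The classification described above can be pictured with a short Python sketch. The tier numbers, workload names, and the `replication_order` helper are hypothetical and only mirror the workloads named in this review; they are not an AWS Elastic Disaster Recovery setting.

```python
# Hypothetical tier map: 1 = highest replication priority.
PRIORITY = {
    "customer_billing": 1,
    "outage_management": 1,
    "compliance_reporting": 1,
    "analysis": 2,
    "internal_reporting": 2,
}
DEFAULT_TIER = 2  # assumption: unclassified workloads get the lower tier


def replication_order(workloads):
    """Sort workloads by tier, then by name so the order is stable."""
    return sorted(workloads, key=lambda w: (PRIORITY.get(w, DEFAULT_TIER), w))


print(replication_order(
    ["analysis", "customer_billing", "internal_reporting", "outage_management"]
))
```

Keeping the tiers in one table makes it easy to review the classification with compliance staff before changing any replication schedule.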
Several features of AWS Elastic Disaster Recovery stand out for us. The most critical is the continuous block-level replication, ensuring our critical IT and OT workloads are replicated in near real-time, which minimizes data loss during failover events. This is crucial for our SCADA systems, customer billing, and outage management because in these scenarios, even small delays or lost data can impact safety and service reliability. Another feature is the non-disruptive DR testing, which helps us perform DR drills without affecting production. This allows us to validate the recovery plans for both IT and OT systems. For example, if we want to simulate failover for substations and the field asset telemetry regularly, it does not affect our operations.
Another feature we appreciate is the automated orchestration of recovery, as AWS Elastic Disaster Recovery provides predefined recovery workflows we can use to spin up the replicated systems quickly, including networking, IP configurations, and dependencies. During last year's storm, we reduced the manual recovery effort by over 60%. The hybrid deployment flexibility of AWS Elastic Disaster Recovery also supports both on-premises data centers and cloud failover, allowing us to protect sensitive operational data while leveraging AWS scalability during high-demand recovery. Another feature we use day-to-day is their integration with monitoring and alerting, which lets us integrate AWS DRS with our monitoring tool to alert teams instantly if a replication falls behind schedule or if a recovery agent is triggered, ensuring proactive response and minimizing downtime.
We have deployed AWS Elastic Disaster Recovery in a hybrid cloud setup. We use Amazon Web Services (AWS) as the public cloud provider in our hybrid cloud setup.
What is most valuable?
I can provide a specific example of a situation where AWS Elastic Disaster Recovery helped us recover our critical systems. Last year, we had storms in the state with high winds and flooding in many areas that caused damage and threatened two of our data centers. Using AWS Elastic Disaster Recovery, we failed over our customer management, metering, and outage tracking systems to the AWS cloud in just under three hours, compared to an estimated 36 to 48 hours had we done it through manual recovery. Hundreds of thousands of customers were affected by the storms, but because the replicated systems remained operational, our customers continued to receive accurate outage notifications through SMS and email. Our field crews also had real-time access to the pole IDs, substation telemetry, and asset status, which improved restoration efficiency. This reduced the average time to restore power per affected area by 25%. Billing and regulatory reporting data were fully intact, which helped us prevent errors and ensured compliance with the Australian Energy Regulator requirements. AWS Elastic Disaster Recovery has allowed us to maintain critical operations during high-impact natural disasters, protecting both our customers and our assets while demonstrating measurable improvements in our response time and regulatory compliance.
Continuous block-level replication and automated orchestration have significantly helped our team in daily operations. I can relate this to last year's storm, during which several substations in our state experienced partial outages. Thanks to the continuous block-level replication, all the telemetry from SCADA systems, pole inspections, and customer meter readings were still up-to-date in the cloud. This allowed our control center to monitor real-time network conditions without relying on compromised on-site servers. Automated orchestration means that if one data center server in our state went offline, AWS Elastic Disaster Recovery automatically spins up the replicated systems in the cloud, including network configuration and monitoring dashboards. This reduced what would have been a manual 10 to 12-hour effort down to less than three hours. The monitoring integration also played a key role because alerts were triggered immediately when replication lag approached the thresholds, helping our teams proactively address issues even before any customer impact occurred. For our daily teams, these features provide our field crews and control center staff the confidence that our critical operational data, such as outage reports, asset condition, and customer information will always be accurate and available, helping teams prioritize restoration, maintain safety, and comply with AER reporting requirements.
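The idea of alerting when replication lag approaches a threshold can be sketched in a few lines of Python. The tier names, threshold values, and the 80% warning fraction are assumptions for illustration; they are not settings from AWS Elastic Disaster Recovery or from our monitoring tool.

```python
from datetime import timedelta

# Assumed per-tier lag thresholds (illustrative values only).
LAG_THRESHOLDS = {
    "critical": timedelta(minutes=1),
    "standard": timedelta(minutes=15),
}
WARN_FRACTION = 0.8  # assumption: warn at 80% of the threshold


def lag_status(tier: str, lag: timedelta) -> str:
    """Classify a replication lag as 'ok', 'warning', or 'breach'."""
    threshold = LAG_THRESHOLDS[tier]
    if lag >= threshold:
        return "breach"
    if lag >= threshold * WARN_FRACTION:
        return "warning"
    return "ok"


print(lag_status("critical", timedelta(seconds=50)))
print(lag_status("standard", timedelta(minutes=5)))
```

The "warning" state is what lets a team act before the threshold is actually crossed, which matches how the alerts helped us during the storm.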
One very small but handy feature is AWS Elastic Disaster Recovery point-in-time recovery snapshots. On one end, continuous replication keeps the data current, but having these point-in-time recovery snapshots allows us to quickly roll back specific systems just in case a configuration error happens or if corrupted data is accidentally pushed without affecting other replicated workloads. Another feature that doesn't always get highlighted is that AWS Elastic Disaster Recovery supports both IT and OT workloads. Many disaster recovery tools focus just on IT, but the ability to replicate operational technology data, such as SCADA systems and pole telemetry, gives our field crews real-time access during outages, which has been invaluable during natural disasters including storms and extreme weather events. AWS Elastic Disaster Recovery also has minimal bandwidth and storage overhead for replication, helping us manage costs effectively while maintaining robust disaster recovery capabilities across our thousands of kilometers of network and hundreds of thousands of customers.
AWS Elastic Disaster Recovery has positively impacted our organization. Prior to using it, recovering our critical IT and OT systems after an outage could take anywhere between 10 to 12 hours manually. However, after implementing AWS Elastic Disaster Recovery and using its automated orchestration, we can now restore systems in under three hours. This means we have reduced our downtime by more than 70%. We have improved our data reliability, ensuring that telemetry, SCADA, and customer metering data are always up-to-date, which has reduced errors in operational decision-making. In terms of costs, we utilized cloud failover instead of building a full secondary on-premises disaster recovery site, resulting in an avoided capital expenditure of approximately $1.2 million while maintaining regulatory compliance with AER. We have saved time, money, and ensured customer data is up-to-date, allowing our teams to quickly generate compliance reports and outage logs, meeting AER timelines without last-minute scrambling. Our field crews and control center staff also have instant access to all our replicated OT and IT systems during any emergency, enabling faster response times and safer operations. These are some specific outcomes thanks to AWS Elastic Disaster Recovery.
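The downtime reduction quoted above follows directly from the before and after figures. A small Python check, using three hours as the upper bound for "under three hours" (the helper name is just for illustration):

```python
def reduction_pct(before_hours: float, after_hours: float) -> float:
    """Percentage reduction from before_hours to after_hours."""
    return round((before_hours - after_hours) / before_hours * 100, 1)


# Manual recovery took 10 to 12 hours; automated recovery takes under 3.
print(reduction_pct(10, 3))  # 70.0
print(reduction_pct(12, 3))  # 75.0
```

Because actual recovery finishes in less than three hours, the real reduction is somewhat above these values, which is consistent with "more than 70%".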
Since implementing AWS Elastic Disaster Recovery, we have seen improvements in customer satisfaction and regulatory audits. During the storm last year, the replicated SCADA and metering systems allowed us to communicate outage status in near real-time, rather than relying on delayed manual reporting, which improved our customer response metrics by roughly 25% due to faster updates and restoration notifications. Our customers were able to plan their time more effectively during outages, reducing frustrations. Regarding regulatory audits for the Australian Energy Regulator, having continuous, accurate, and auditable data from AWS Elastic Disaster Recovery has simplified our submissions. Recovery logs, outage timelines, and asset status were available immediately, which has helped us reduce our time spent on manually documenting regulatory reports by about 40% and minimized our risk of non-compliance. Our field teams reported that having up-to-date cloud-accessible OT data such as pole conditions or substation status reduces guesswork and improves their safety. The accelerated restoration work has boosted their confidence in operational decisions.
What needs improvement?
One area where AWS Elastic Disaster Recovery could improve is granular testing of OT workloads. It would be helpful to have fully isolated test recoveries for our OT data, such as SCADA or pole telemetry, without impacting replication, to help validate disaster recovery readiness more frequently. Additionally, advanced reporting and analytics would be beneficial. If the tool could provide more built-in dashboards to show replication lag trends, failover readiness, or system dependencies, it would save time and improve transparency for both field teams and regulatory reporting.
In terms of integration, tighter integration with our asset management systems and GIS databases would streamline automated recovery of linked OT systems and data relationships, making failover more efficient. There should also be more fine-grained alerts for replication lag or orchestration failures, with customizable thresholds for different types of workloads to improve proactive incident response.
My advice would be to start with a clear disaster recovery strategy. Identify which IT and OT systems are critical, calculate the recovery time objective, and decide which assets need replication first. Keep latency-sensitive or legacy OT systems on-premises while replicating core IT workloads to AWS for fast, reliable failover. It is essential to test failovers regularly, as this builds confidence and uncovers gaps, which helps ensure smooth operation during real incidents. Actively monitor costs by paying attention to replication storage and compute usage, since AWS Elastic Disaster Recovery is pay-as-you-go; doing so allows us to save thousands of dollars annually. Connecting disaster recovery events with field operations, SCADA systems, and asset management dashboards streamlines operational responses. The AWS team is great, and engaging with their support and architects, along with their documentation and best practices, is very helpful.
For how long have I used the solution?
I have been using AWS Elastic Disaster Recovery for one and a half years.
What do I think about the stability of the solution?
AWS Elastic Disaster Recovery is stable and reliable.
What do I think about the scalability of the solution?
AWS Elastic Disaster Recovery is scalable and has handled growth in our organization well.
How are customer service and support?
I have not interacted with their customer support team.
Which solution did I use previously and why did I switch?
I did not previously use a different solution for disaster recovery.
How was the initial setup?
My experience with pricing, setup cost, and licensing for AWS Elastic Disaster Recovery was that there was minimal upfront capital expenditure compared to building a traditional secondary disaster recovery site. Most costs were operational in nature and based on the amount of data replicated and storage used in AWS. The licensing model uses a pay-as-you-go model, so we only pay for the replication storage, compute used during failover, and additional orchestrated testing. There is no heavy licensing fee, making it scalable and cost-efficient as our network and data grow. Overall, the flexibility and transparency of AWS pricing have been a major advantage.
What about the implementation team?
We did not purchase AWS Elastic Disaster Recovery through the AWS Marketplace.
What was our ROI?
I have seen a return on investment. We saved around $1.2 million in capital expenditure by avoiding a dedicated secondary on-premises disaster recovery site. Our downtime recovery time used to be 10 to 12 hours; now it has dropped to under three hours. We have reduced dedicated disaster recovery personnel hours by almost 30%, allowing staff to focus more on proactive network maintenance and asset management. In terms of regulatory and audit readiness, we have reduced preparation time for compliance checks by roughly 40%.
What's my experience with pricing, setup cost, and licensing?
There is no heavy licensing fee, making it scalable and cost-efficient as our network and data grow.
Which other solutions did I evaluate?
Before choosing AWS Elastic Disaster Recovery, we evaluated traditional on-premises disaster recovery sites and considered building a secondary data center locally, but rejected that option due to high capital expenditure. We also evaluated VM replication tools such as Veeam and Zerto for virtual machine replication, but those lacked seamless orchestration for hybrid IT and OT systems. Other cloud-based DR solutions, such as Microsoft Azure Site Recovery and Google Cloud's disaster recovery offerings, were also considered, but we went with AWS Elastic Disaster Recovery because it aligned best with our existing AWS workloads, offering continuous block-level replication, automated orchestration, and simple pay-as-you-go pricing.
What other advice do I have?
My overall review rating for AWS Elastic Disaster Recovery is 9.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Cross-region recovery has protected critical apps and reduces downtime with proactive alerts
What is our primary use case?
My main use case for AWS Elastic Disaster Recovery is cross-region recovery of databases and applications when they go down. For instance, when an application was running in multiple regions and we lost one, AWS Elastic Disaster Recovery helped us recover. In that situation, an event occurred in the cloud stack, and AWS Elastic Disaster Recovery helped us get things back up and running. Although this has happened only once, we want multi-region, multi-data center recovery for disaster recovery, so we are incorporating this technology.
How has it helped my organization?
AWS Elastic Disaster Recovery has positively impacted my organization. We have a priority one application that was recently deployed, and it was important for us to recover the data when the cloud stack went down. Since deploying AWS Elastic Disaster Recovery, we have mostly seen an improvement in uptime, which contributes to reducing downtime.
What is most valuable?
The best features AWS Elastic Disaster Recovery offers are the insights and alerting, which inform application developers about what is going on and how the system is running.
The insights and alerting features help my team day-to-day by letting SREs know when an event has happened and how we should carry out recovery. They alert the SREs and the subscribed groups early. I am still exploring the features, but so far I have found it very useful in the disaster that occurred.
What needs improvement?
I think insights are an area for improvement. It would be beneficial to get some insights when a disaster happens, including identification and probable solutions to ensure effective recovery. That insight and solution suggestion area is the main thing I would want to see improved.
We believe that customer support for AWS Elastic Disaster Recovery needs to be improved because although we do raise tickets, the response can take some time.
For how long have I used the solution?
I have been using AWS Elastic Disaster Recovery for two years.
What do I think about the stability of the solution?
AWS Elastic Disaster Recovery is stable. It is definitely a stable application.
What do I think about the scalability of the solution?
The scalability of AWS Elastic Disaster Recovery is good. We can expand it to multiple data centers or different areas such as EMEA and APAC.
How are customer service and support?
I would rate the customer support an eight, as it often takes a lot of time to engage and get a solution. About eighty percent of the time, I think it will be resolved quickly.
Which solution did I use previously and why did I switch?
Previously, we were using a homegrown application that tracks these systems before switching to AWS Elastic Disaster Recovery.
How was the initial setup?
We did purchase AWS Elastic Disaster Recovery through the AWS Marketplace, but it's mostly the procurement team that has handled that. The management, particularly the procurement team, looks at pricing and setup costs, so I know a little about pricing, but I'm not directly involved in it.
What about the implementation team?
We are just customers and consume a lot of AWS services, and do not have another business relationship with this vendor.
What was our ROI?
We have seen a return on investment by needing fewer employees for maintenance and related matters. We no longer have to schedule employees on weekends since the system automatically triggers alerts, allowing engineers to respond as needed.
Which other solutions did I evaluate?
We did not evaluate other options before choosing AWS Elastic Disaster Recovery.
What other advice do I have?
My advice for others looking into using AWS Elastic Disaster Recovery is to definitely consider it if you are scaling your applications significantly, especially if your applications span different regions. I would give this product an eight out of ten, which I think is a fair score. Our technology, operations, and SRE teams need more education on the product, since only a few people know it well. Improvements to customer service and the alerting system would be great.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Seamless service management and integration with good flexibility
What is our primary use case?
I use the solution to deploy a Docker image application. It is hosted on GitHub, and the servers we run on are not ECR.
What is most valuable?
What I like about AWS ECR is that it is a fully managed service, so I don't need to manage the underlying infrastructure or worry about building, maintenance, security, scalability, or high availability.
It offers seamless integration with services like ECS, EKS, and Fargate for deploying containerized applications. It works great with AWS, and it is flexible enough to use a public repository for open-source projects or a private repository for secure storage.
What needs improvement?
In its current state, ECR integrates with CloudWatch for basic logging and monitoring, but it could provide more detailed logs for specific actions, such as push or pull. Showing user activity directly in the ECR console would make debugging and auditing easier.
Additionally, the AWS pricing model could be improved. AWS charges for storage and data transfer, which can add up, especially with large images or frequent pulls. Improvements should focus on offering more storage or better volume discounts for long-term use. It would also be beneficial to allow free pulls within the same AWS account and region.
Moreover, image scanning for vulnerabilities can sometimes be slow, especially for large images. Speeding up the scanning process or providing optimized scanning for critical workflows would be welcome advancements.
For how long have I used the solution?
I have used it for about seven months now.
What do I think about the stability of the solution?
Since I started using ECR, my application on AWS has not broken down. I have not had any issues with it so far. It is working well. It is very good and very reliable.
How are customer service and support?
I never had to contact the support team.
Which solution did I use previously and why did I switch?
I didn't really use Azure. However, that was in my last organization before I joined this new one.
What other advice do I have?
I would rate AWS nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Managed services with seamless integration and good reliability
What is our primary use case?
Our human resources solution is used by higher management competency. This is critical to the organization since it is used by higher management. ITM is really essential for the organization.
What is most valuable?
For the past year, I have been using AWS, as there was previously no native replication service available. Initially, they offered services like CloudEndure, which was a third-party service. This caused problems with integrations with existing servers. However, with AWS Elastic Disaster Recovery Service being a native service, integration is seamless. Moreover, since it is a managed service, I reduce my time to manage infrastructure and applications, which adds another benefit.
What needs improvement?
I have to view everything on the console. Previous solutions like IBM Sanovi showed the RPO and RTO status directly. In AWS Elastic Disaster Recovery, these details are not available, making it difficult to check my replication status. I have to calculate whether my data has been replicated to the Hyderabad region or not. These features, if available in AWS, would be beneficial.
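The kind of manual calculation I mean can be sketched roughly in Python. This assumes you can obtain a timestamp for the last point that was replicated to the target region; the function names, the 15-minute RPO, and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def replication_lag(last_replicated_at: datetime, now: datetime) -> timedelta:
    """How far the replica is behind the source, never negative."""
    return max(now - last_replicated_at, timedelta(0))


def within_rpo(last_replicated_at: datetime, now: datetime,
               rpo: timedelta) -> bool:
    """True when the current lag still satisfies the RPO."""
    return replication_lag(last_replicated_at, now) <= rpo


# Made-up timestamps for illustration.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last = datetime(2024, 1, 1, 11, 50, tzinfo=timezone.utc)
print(replication_lag(last, now), within_rpo(last, now, timedelta(minutes=15)))
```

A built-in view of this value per host is what I would like to see in the console.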
For how long have I used the solution?
I have been using it since 2019.
What do I think about the stability of the solution?
AWS is not difficult, but the cost associated with replicating data to another region can be significant. This is due to components like the replication servers, which run continuously in AWS. I have more than 200 hosts, including email solutions and others, which contribute to the high cost. Cost is a concern. Otherwise, the service is reliable.
How are customer service and support?
Customer service is quite helpful. I have AWS enterprise-level support, which is very beneficial. In case of any issue, they are ready to provide support within the defined SLA timeline.
Which solution did I use previously and why did I switch?
Earlier, I worked with IBM Sanovi. I stopped using it when we moved from on-premises to the cloud. It's not in use right now.
How was the initial setup?
There were no issues during the initial setup.
What about the implementation team?
The implementation is actually managed by our partner. I have taken a rate per user storage. The licensing part is completely managed by the partner.
What was our ROI?
For the past year, I have been using AWS, as there was previously no native replication service available. Initially, they offered services like CloudEndure, which was a third-party service. This caused problems with integrations with existing servers. However, with AWS Elastic Disaster Recovery Service being a native service, integration is seamless, highlighting the return on investment.
What's my experience with pricing, setup cost, and licensing?
The setup is actually managed by our partner. I have taken a rate of per user. Licensing is completely managed by the partner. I am paying per user and per GB storage cost, while the infrastructure cost is separate.
What other advice do I have?
Although no financial benefit from using it has been observed, I recommend the solution. The overall product rating is eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
CloudEndure Disaster Recovery: A Reliable Cloud-Based Solution with Room for Improvement
What do you like best about the product?
The best things about CloudEndure Disaster Recovery are ease of use, automated replication and recovery, cost-effectiveness, minimal downtime, multi-platform support, replication across regions, non-disruptive testing, and scalability.
What do you dislike about the product?
Some things I dislike are costs and pricing complexity, dependency on the AWS ecosystem, initial replication time, complexity for non-AWS users, RTO and RPO limitations, lack of physical server support, and support response time.
What problems is the product solving and how is that benefiting you?
CloudEndure Disaster Recovery solves the critical challenges of data loss, downtime, and costly disaster recovery solutions. Its cloud-based approach, automation, and scalability benefit businesses by providing a cost-effective, reliable, and easy-to-manage disaster recovery solution, ensuring business continuity and minimizing potential losses during disruptive events.