Runbook automation has reduced incident response time and now improves uptime and collaboration
What is our primary use case?
Our main use case for PagerDuty Operations Cloud is alerting whenever our application experiences downtime or a downstream incident. PagerDuty Operations Cloud alerts us through calls and SMS, so we are notified and can quickly remediate the issue.
A unique aspect of our main use case with PagerDuty Operations Cloud is the Runbook flow. Whenever we experience a specific kind of incident, the Runbook triggers automation to either remediate the issue or perform root cause analysis, which extends our workflow automation.
What is most valuable?
PagerDuty Operations Cloud helps our team respond faster. Whenever there is an incident, we are notified through PagerDuty Operations Cloud, and because we receive calls 24/7, we can instantly join a call or start an investigation and remediate the issue as early as possible. This way, PagerDuty Operations Cloud helps us reduce MTTR and keeps our application more reliable and resilient.
We use the Runbook automation feature to build automated flows that add extra monitoring for specific alerts or incidents and perform remediation tasks autonomously.
One feature I particularly appreciate about PagerDuty Operations Cloud is that it offers multiple notification options. I receive alerts via call as well as SMS, which is beneficial: if I miss the call, I may still see the SMS, and vice versa.
With PagerDuty Operations Cloud, our MTTR has dropped by at least 30% over the last year thanks to its instant notifications by SMS and call, which help us jump on calls quickly to remediate issues. This reduction has cut our application downtime, giving us approximately 99% uptime throughout the year.
What needs improvement?
One suggestion for improving PagerDuty Operations Cloud is to provide more insight into incidents, such as root cause analysis or other additional information, which could help SRE teams reduce detection and remediation time before they even join a call.
From an integration point of view, everything is functioning well. However, we primarily use the desktop interface as our main tool, and showing more incident details from PagerDuty Operations Cloud's own analysis directly there would enhance the user experience.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for the last three years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is absolutely stable. We have never experienced any downtime or latency issues from PagerDuty Operations Cloud.
What do I think about the scalability of the solution?
We don't have much insight into scalability, as a separate enterprise PagerDuty Operations Cloud team handles all scaling activities.
How are customer service and support?
We have internal enterprise support within our organization, which is very interactive. They escalate issues to the external PagerDuty Operations Cloud team when necessary, and they are very supportive.
Which solution did I use previously and why did I switch?
We have not previously used a different solution. PagerDuty Operations Cloud is the first alerting tool I have used, and we have used it from the beginning.
How was the initial setup?
PagerDuty Operations Cloud onboarding is pretty straightforward in our organization: new team members simply need to be added to specific Windows AD groups to complete onboarding and gain access.
What about the implementation team?
There are automations in our organization that connect PagerDuty Operations Cloud to ticketing tools such as Jira and ServiceNow. Whenever an incident occurs, a Runbook-based automation extracts data from the PagerDuty Operations Cloud alert and uses it to create incidents and Jira tickets for the development team.
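A minimal Python sketch of the data-extraction step, assuming a simplified alert payload. The field names (`title`, `service`, `incident_id`, `severity`), the severity-to-priority mapping, and the `Ticket` type are hypothetical illustrations, not PagerDuty's actual webhook schema or the Jira/ServiceNow APIs.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    summary: str
    description: str
    priority: str
    labels: list


# Hypothetical mapping from alert severity to ticket priority.
SEVERITY_TO_PRIORITY = {"critical": "P1", "error": "P2", "warning": "P3"}


def alert_to_ticket(alert: dict) -> Ticket:
    """Build a ticket from a simplified alert payload (field names are assumptions)."""
    severity = alert.get("severity", "warning")
    return Ticket(
        summary=f"[{severity.upper()}] {alert['title']}",
        description=f"Service: {alert['service']}\nIncident: {alert['incident_id']}",
        # Unknown severities fall back to the lowest priority.
        priority=SEVERITY_TO_PRIORITY.get(severity, "P3"),
        labels=["pagerduty", alert["service"]],
    )
```

In a real pipeline, the resulting ticket would be handed to whichever ticketing client the organization uses; that call is deliberately left out here.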
What was our ROI?
In terms of return on investment, we have reduced our MTTR by 30% in the last year, which has indirectly improved our application's uptime to nearly 99%, enhancing client experience and boosting our business.
What's my experience with pricing, setup cost, and licensing?
I have no personal experience with pricing, setup costs, or licensing, as a separate enterprise PagerDuty Operations Cloud team manages those processes.
What other advice do I have?
The escalation policies within PagerDuty Operations Cloud are user-friendly and customizable, allowing us to set up multi-level escalations from SRE engineers to SRE leads and then to management.
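The multi-level escalation described here can be sketched as follows. This is a minimal Python illustration that assumes each level waits a fixed number of minutes for an acknowledgement before the next level is paged. The level names mirror the review, but the delays, `EscalationLevel`, and the `notified_levels` helper are hypothetical and not PagerDuty's configuration format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EscalationLevel:
    name: str
    delay_minutes: int  # minutes to wait for an acknowledgement before moving on


# Hypothetical policy: SRE engineer -> SRE lead -> management.
POLICY = [
    EscalationLevel("SRE engineer", 15),
    EscalationLevel("SRE lead", 15),
    EscalationLevel("Management", 30),
]


def notified_levels(policy: List[EscalationLevel], ack_after_minutes: Optional[int]) -> List[str]:
    """Return the levels paged before the incident was acknowledged.

    ack_after_minutes counts from when the incident triggered; None means never acknowledged.
    """
    notified = []
    elapsed = 0
    for level in policy:
        notified.append(level.name)  # this level is paged at time `elapsed`
        if ack_after_minutes is not None and ack_after_minutes < elapsed + level.delay_minutes:
            break  # acknowledged within this level's window, so stop escalating
        elapsed += level.delay_minutes
    return notified
```

For example, an acknowledgement 20 minutes after triggering reaches the SRE lead but never pages management.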
PagerDuty Operations Cloud helps our team collaborate during incidents by automatically updating incident status as work progresses. We have integrated alerting with Slack for this: incidents show as red when active, yellow when acknowledged, and green when resolved.
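A minimal Python sketch of the status-to-color mapping, assuming the three PagerDuty incident states (triggered, acknowledged, resolved) and a hypothetical message dictionary. The `color`/`text` structure is an illustration only, not Slack's or PagerDuty's message API.

```python
from enum import Enum


class IncidentStatus(Enum):
    TRIGGERED = "triggered"  # shown as "active" in the channel
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# Colors as described in the review.
STATUS_COLOR = {
    IncidentStatus.TRIGGERED: "red",
    IncidentStatus.ACKNOWLEDGED: "yellow",
    IncidentStatus.RESOLVED: "green",
}


def status_message(incident_id: str, status: IncidentStatus) -> dict:
    """Build a hypothetical chat update carrying the color for the current status."""
    return {"color": STATUS_COLOR[status], "text": f"{incident_id} is {status.value}"}
```

Keying the mapping on an enum rather than raw strings means a misspelled status fails loudly instead of silently producing an uncolored message.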
Regarding performance metrics, there is a dedicated enterprise PagerDuty Operations Cloud team that handles monitoring, so as an SRE, I don't need to manage these performance aspects myself.
My advice to others looking into using PagerDuty Operations Cloud is that it is one of the best tools on the market for production support and SRE engineers. It is essential to our operations; it is our bread and butter.
We have covered almost everything regarding PagerDuty Operations Cloud. It has been a great tool for SRE and production support teams, and we look forward to more features, especially with trending technologies like AI. I would rate this product an 8 out of 10.
Automated on-call scheduling has reduced manual effort and now keeps holiday coverage reliable
What is our primary use case?
My main use case for PagerDuty Operations Cloud is to set up shifts for people on-call.
A specific example is setting up shifts for holidays. Our team assigns the people who will be on-call, records them in an Excel sheet, and uploads it to PagerDuty. It works as expected: notifications go out and everything else functions properly. It is very easy to set up and manage.
I usually discuss with my team who will be on-call during the holidays and how many people are needed. We create an Excel sheet, upload it to PagerDuty, and set the order of who is contacted first and, if they miss it, whom to escalate to. The web interface is also very easy to use. I think this is the normal use case. Perhaps other teams use it differently, but this works well for us. Before, the process was very manual and quite difficult.
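A minimal Python sketch of turning such a sheet into an escalation order per date, assuming the sheet is exported to CSV with one row per date and responders listed left to right in escalation order. The column names, dates, and people are hypothetical; PagerDuty's actual import format is not shown here.

```python
import csv
import io
from typing import Dict, List

# Hypothetical holiday sheet exported to CSV.
SHEET = """date,first,second,third
2024-12-24,alice,bob,carol
2024-12-25,bob,carol,
"""


def load_escalation_order(text: str) -> Dict[str, List[str]]:
    """Map each date to its ordered list of responders, skipping empty cells."""
    schedule = {}
    for row in csv.DictReader(io.StringIO(text)):
        order = [row[col] for col in ("first", "second", "third") if row.get(col)]
        schedule[row["date"]] = order
    return schedule
```

Skipping empty cells lets a date with fewer available people still produce a valid, shorter escalation chain.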
What is most valuable?
The best features PagerDuty Operations Cloud offers are that it is simple to set up and supports Excel sheet uploads, which was very helpful. Notification setup and the Datadog integration were excellent. We can automate many things.
PagerDuty Operations Cloud has positively impacted my organization, and the support team is very happy. Before, setting everything up was very difficult. Now we don't have to think about it: we simply set it up in PagerDuty and it works. Escalation and everything else still work with the configuration we set up six months to a year ago, and we make only minor changes. A lot of manual effort has been eliminated, and the system is more reliable.
Before we implemented PagerDuty Operations Cloud, the L1 team had to stay online at night, and if someone fell asleep and missed an issue, it would quickly escalate to a manager or someone higher up, creating a lot of fuss. That is almost gone now. Deciding who would be on-call and setting it up manually was not foolproof, and someone had to invest around one or two hours every week. Now it takes less than five minutes: we discuss it each week and it's done. A lot of time and mental effort has been saved.
What needs improvement?
I think the website's chart of who is on-call at what time could be improved. The timeline could show more clearly who will be escalated to if someone misses an alert.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable; we didn't find any bugs or unintended behavior.
What do I think about the scalability of the solution?
PagerDuty Operations Cloud is scalable; we can easily create and add teams and manage tags. It is very easy to manage, and setting the priority order to decide who is contacted first was very easy.
How are customer service and support?
Customer support is adequate; they usually respond, and they helped us fix issues during integration.
Which solution did I use previously and why did I switch?
Before using PagerDuty Operations Cloud, there was no solution in place. The L1 team checked the issues and called developers to ask whether each error was theirs. This meant manually calling fifteen to twenty developers, which could take half an hour, during which the issue persisted and the site's reliability suffered. Now it is automatic and very effective.
What was our ROI?
I have seen a return on investment; a lot of time has been saved. As I mentioned earlier, it took a lot of manual effort before. Sometimes two or more people were assigned on-call by mistake, so it was not foolproof. Escalation was not possible at all: when an error came in, the L1 team had to coordinate with many people and phone them one by one, which put the team under too much stress. It was actually very bad. Now PagerDuty escalates and calls the developers, and if the issue belongs to them, they join. It is much more efficient and much less stressful.
Which other solutions did I evaluate?
We were not involved in evaluating other options; I think upper management decided to go with PagerDuty, and we are happy with it.
What other advice do I have?
I don't want to add anything else about the features; we use this much and it's great, and we don't need anything more for now. I don't think there is anything to improve; we use PagerDuty Operations Cloud to set up on-call duty and it works. I chose a rating of nine because there may be some improvements in the future. My advice to others looking into using PagerDuty Operations Cloud is that its on-call duty features and on-call assignment are excellent. You can simply proceed with it, and even with big teams, it will not feel annoying or overwhelming. Just set it up and forget it. It is very effective. You can adopt it if you don't have any special needs; it is widely accepted and effective. I gave this review a rating of nine out of ten.
Automated alerts have improved incident response in banking operations but calling notifications still need refinement
What is our primary use case?
My main use case for PagerDuty Operations Cloud is setting up alerts for failures, such as when a server is down, a particular service is down, or APIs are not responding due to technical issues. PagerDuty triggers an alert and also calls my personal mobile number, so I can acknowledge that I am looking into it and take the necessary actions.
One example of PagerDuty Operations Cloud helping us handle an incident was when our payment system was about to go down. We usually monitor the system manually, but in some incidents an automated system works more efficiently than a human. PagerDuty Operations Cloud identified the issue first by alerting us that something had gone wrong with the servers or services, which enabled us to contact the DevOps and Dev teams to pinpoint the exact issue in our banking app. This shows how helpful PagerDuty Operations Cloud has been from the beginning.
PagerDuty Operations Cloud is very helpful for monitoring, allowing us to set up multiple alerting methods such as SMS, email, and call alerting, all of which we commonly use. It has proven useful across various banking services, and teams including Dev, DevOps, and SecOps rely on it heavily.
What is most valuable?
The best features of PagerDuty Operations Cloud are the SMS and call alerting functions, which I find very beneficial compared to other tools I have used, such as Coralogix and Sumo Logic, which primarily focus on email alerting. PagerDuty Operations Cloud goes a level beyond by notifying users through SMS and calls, and it lets us tag the concerned teams so that specific issues are addressed promptly.
I also find the integration and reporting aspects valuable: PagerDuty Operations Cloud records all triggered calls and incidents, so we can analyze patterns, identify when systems tend to go down, and understand and address the underlying causes.
PagerDuty Operations Cloud has positively impacted our organization by keeping our banking applications, which operate 24/7, functional and efficient, contributing to better service availability.
What needs improvement?
I would suggest adjusting the calling service: when a system goes down, the user receives continuous calls, which can be overwhelming. Once a user acknowledges the alert, there is little need to keep calling again and again, as it distracts them from resolving the issue. That is the only improvement I can suggest regarding the calling aspect of PagerDuty Operations Cloud.
For how long have I used the solution?
I have been working in my current field for around three years or more.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable and provides accurate information.
What do I think about the scalability of the solution?
The scalability of PagerDuty Operations Cloud is very good, and it can handle increasing traffic as our organization expands without any issues.
How are customer service and support?
I have not needed to contact the customer support of PagerDuty Operations Cloud, as they provided thorough knowledge transfer once they handed over the services, and since then we have not encountered issues requiring support.
Which solution did I use previously and why did I switch?
PagerDuty Operations Cloud was our first alert-based system. It was part of a pilot project started around three years ago, and we did not use any other solution before it.
What was our ROI?
PagerDuty Operations Cloud helps us reduce downtime and identify which systems have issues since it is not feasible to manually monitor everything. The automated system efficiently monitors the infrastructure and notifies us about problems, benefiting both our customers and the bank's reputation.
What's my experience with pricing, setup cost, and licensing?
I do not have any information regarding the pricing, setup cost, or licensing, as those details are managed by the organizational leadership.
Which other solutions did I evaluate?
We did not evaluate other options and directly chose PagerDuty Operations Cloud for our needs.
What other advice do I have?
I advise companies in the FinTech and banking sectors to consider using the alert-based system of PagerDuty Operations Cloud for their projects because digital services are becoming more prevalent across various sectors, and this system can enhance business operations by reducing incidents of system downtime and failures. My relationship with PagerDuty is strictly that of a customer utilizing their product for our business needs. I have covered multiple aspects of PagerDuty Operations Cloud in our discussion. My overall rating for this product is 7 out of 10.
Which deployment model are you using for this solution?
Private Cloud
Reliable Incident Alerts and Seamless Team Coordination
What do you like best about the product?
PagerDuty makes incident alerts fast and reliable. It sends the right notifications at the right time and keeps our team organized. The on-call schedules are easy to manage, and the mobile app helps us respond quickly. It reduces downtime and makes our work smoother.
What do you dislike about the product?
PagerDuty can feel noisy at times when alerts are not filtered well. Some settings are a bit complex for new users. The pricing is also on the higher side for small teams. Overall, it works well but could be simpler and more cost-friendly.
What problems is the product solving and how is that benefiting you?
PagerDuty helps us catch issues immediately with fast and reliable alerting. It brings all alerts into one place, so we don’t miss anything important. It also makes on-call management easy, ensuring the right person gets notified at the right time. This reduces downtime, helps us respond faster, and keeps our systems running smoothly.
Empowers Incident Management with Reliable Alerts and Seamless Collaboration
What do you like best about the product?
I really appreciate how PagerDuty provides comprehensive visibility and control over our incident management process. Its real-time alerting is both reliable and quick, ensuring that the appropriate team members are notified instantly and without unnecessary distractions. The user-friendly dashboards, efficient on-call scheduling, and smooth integrations with our monitoring tools make it straightforward for teams to work together during incidents. Overall, PagerDuty enables us to respond more quickly, stay organized, and maintain high service uptime with minimal effort.
What do you dislike about the product?
One thing I dislike about PagerDuty is that the interface can feel a bit overwhelming at times, especially when navigating deeper configuration settings or managing complex on-call schedules. Some customization options also require extra steps, which adds to the learning curve for new users. Additionally, alert noise can build up quickly if not fine-tuned properly, making the initial setup a bit time-consuming. While the platform is powerful, simplifying certain workflows would make the overall experience even better.
What problems is the product solving and how is that benefiting you?
PagerDuty has significantly improved our incident response process by making sure that critical issues are quickly identified and escalated to the appropriate teams. This has led to faster response times, less confusion during outages, and smoother on-call rotations without the need for manual coordination. Thanks to its real-time insights, automated workflows, and integrations, we can take a more proactive approach, stop minor problems from escalating, and provide more reliable services to our users.
Real-Time Alerts and Seamless Integrations Boost Incident Response
What do you like best about the product?
Real-time alerting provides immediate notifications for critical incidents, which helps reduce response times. The escalation policies ensure that if one engineer does not respond, the incident is automatically escalated to the next person, preventing incidents from being overlooked. On-call scheduling is straightforward, making it easy to manage rotation schedules and distribute on-call duties among team members. The integration capability is strong, as it works seamlessly with monitoring tools, ticketing systems, Slack, Teams, and various automation platforms. Incident timeline visibility is excellent, offering a clear view of the sequence of events, including who acknowledged the incident and what actions were taken. This setup helps reduce MTTR, as teams can respond more quickly and in a coordinated manner, leading to less downtime overall. Mobile app support is also available, allowing users to acknowledge and respond to incidents directly from their phones when they are away from their desks.
What do you dislike about the product?
Alert fatigue can be an issue, as the system sometimes generates an excessive number of alerts, many of which are not critical. This can make it harder to maintain focus and respond with the necessary urgency. Additionally, duplicate or repeated alarms create a lot of noise: if thresholds or integrations are not properly configured, the same problem may trigger multiple alerts, which can be distracting. The pressure of being on-call is also significant; during particularly busy periods, the constant stream of notifications can be stressful and negatively affect work-life balance. Configuring escalation chains, routing rules, and service dependencies can be complex, especially for new users, and is not always intuitive. Finally, the alerts themselves sometimes lack sufficient context, so you often have to consult monitoring tools or logs for the full details, as the PagerDuty notification alone is frequently insufficient for diagnosis.
What problems is the product solving and how is that benefiting you?
PagerDuty addresses the issue of delayed incident response and the absence of a clear escalation process during critical network or service outages. By consolidating alerts from various monitoring tools into a single platform, it ensures that the appropriate engineer is notified right away. If the initial responder does not act, the system automatically escalates the alert to the next on-call team member, preventing incidents from being overlooked.
For me and our NOC, this results in faster response times, which helps minimize service downtime. The clear on-call structure eliminates confusion about who should take action, while real-time incident visibility makes it easier to monitor progress. With less need for manual coordination, the platform efficiently manages alerting and escalation. It also assists in distinguishing between critical and informational alerts, helping us focus on the most urgent issues.
Seamless Incident Management with Powerful Integrations
What do you like best about the product?
- PagerDuty ensures the right person is notified of critical issues via multiple channels (mobile app, SMS, email) and enables well-defined on-call rotations and escalation policies
- integration with tools like Slack, Zoom, and monitoring systems (e.g., Prometheus), plus webhooks for building automated incident workflows
- can manage multiple teams, services, and global operations
- the mobile app and remote acknowledgement/resolution functionality are strong points
What do you dislike about the product?
- could add new AI/automated workflows or automatic post-mortem functionality
- pricing is high relative to perceived value, especially for smaller orgs
What problems is the product solving and how is that benefiting you?
It helps prevent critical alerts from being missed by routing signals from monitoring and event tools into a structured on-call system, where the right person is notified and escalation continues until someone responds.
Powerful Alerting and Monitoring, Though Interface Takes Getting Used To
What do you like best about the product?
Serves as a useful application for configuring alerts and monitoring key processes. The ability to set up an on-call rota and define clear escalation policies are great features.
What do you dislike about the product?
Nothing major for me, but I felt the interface was clunky and it took me a while to get used to the navigation. For a new user, it could be a daunting experience.
What problems is the product solving and how is that benefiting you?
We use PagerDuty to monitor our data pipelines and table refreshes. We also use it to alert on failures of the BI model builds. We were able to integrate PD with the platforms we use for these processes, and it helps us stay on top of issues and address them immediately, before they cause bigger problems.
Timely Alerts with Effortless Setup
What do you like best about the product?
I find the PagerDuty mobile app very useful because it allows me to check, acknowledge, resolve, and assign alerts right from the app. The alerts are communicated promptly through phone, email, or SMS, enabling immediate investigation and resolution. The simple UI of the mobile app and the detailed web portal are standout features.
What do you dislike about the product?
Sometimes I face issues with SSO login. I can't find any other issues.
What problems is the product solving and how is that benefiting you?
PagerDuty ensures timely alerts through phone, email, or SMS, enabling prompt investigation and resolution of issues, which benefits both my organization and me.
Enhancing Incident Management with PagerDuty
What do you like best about the product?
I love that PagerDuty has several different audible alerts; some of them are hilarious. Since adopting PagerDuty, I've been able to respond to incidents and engage teams more efficiently.
What do you dislike about the product?
The only thing I don't like about PagerDuty is that once I resolve an incident, I cannot reopen it. If the issue recurs, I have to start a new incident, which can sometimes confuse stakeholders and users.
What problems is the product solving and how is that benefiting you?
PagerDuty allows us to manage schedules and communicate with teams, which makes it easier to respond to incidents in a timely manner, contact the appropriate people when needed, and collaborate across teams to resolve issues, mitigate downtime, and provide excellent customer service.