On-call teams have reduced downtime and respond faster through integrated alerting workflows
What is our primary use case?
My main use case for PagerDuty Operations Cloud is monitoring and on-call management for downtime.
Last week, one of our services went down, and PagerDuty Operations Cloud alerted us to the issue. One of our on-call engineers responded to the page and quickly resolved the problem through the PagerDuty Operations Cloud app.
What is most valuable?
The best features PagerDuty Operations Cloud offers include the ability to integrate with collaboration platforms such as Teams and with monitoring platforms such as New Relic and Dynatrace. It is easy to use: logging in and configuring your on-call rotation is simple, as is using business services and technical services to configure how you want things monitored and alerted.
The integrations and easy configuration help our team by saving time and reducing errors. We use Terraform to create various modules, including one that integrates PagerDuty Operations Cloud with our monitoring platform, New Relic. When a team creates a new application, we ask them to use our monitoring module to monitor their service using New Relic and PagerDuty Operations Cloud. By doing that, we save time and avoid errors because nobody has to set up their PagerDuty Operations Cloud configuration manually; it is all done through this module, which is easy to use.
PagerDuty Operations Cloud has positively impacted our organization by allowing us to be immediately paged when a system or service is down, enabling us to quickly respond and provide updates to the organization on issues and their resolution.
This quick response has led to measurable improvements, with reduced downtime and faster incident resolution times, as our on-call engineers are appropriately alerted when things happen. We understand based on the page what is going on and how to quickly respond to it, and if we need help, we can loop in other engineers and our managers that own the product to resolve it quicker.
What needs improvement?
PagerDuty Operations Cloud could be improved by making more use of automation or AI, so that issues can be resolved automatically or workflows can be sped up.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for six years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable.
What do I think about the scalability of the solution?
Its scalability is impressive; it scales very well, allowing us to add licenses, add services, and more very quickly and easily.
How are customer service and support?
The customer support is great; we have never had an issue when reaching out to someone in customer service when we have questions.
Which solution did I use previously and why did I switch?
Previously, we were using New Relic for monitoring, which sent us alerts when services went down, but we ended up using PagerDuty Operations Cloud alongside it because PagerDuty Operations Cloud handles on-call alerting.
What was our ROI?
The best return on investment comes from being paged appropriately for new or ongoing issues, with schedules and engineers set up to match. PagerDuty Operations Cloud alerts us when things go down and lets us configure how our engineers are alerted, and the quick response this makes possible is where the return on investment shows.
What's my experience with pricing, setup cost, and licensing?
Our experience with pricing, setup cost, and licensing has been straightforward and easy. We have been using PagerDuty Operations Cloud for several years, so our costs have definitely increased over time, especially as we have hired additional engineers. Adding users and licenses is very straightforward, and we have always had a good experience with customer service from the PagerDuty Operations Cloud side.
Which other solutions did I evaluate?
I did not evaluate other options before choosing PagerDuty Operations Cloud.
What other advice do I have?
I recommend PagerDuty Operations Cloud as a great service and application to anyone that needs to improve their on-call process at their company. I gave this product a rating of 10.
Which deployment model are you using for this solution?
Public Cloud
On-call automation has reduced downtime and has enabled faster incident response at scale
What is our primary use case?
PagerDuty Operations Cloud is a platform that helps teams manage incidents, automate operations, and ensure system reliability by bringing alerts, on-call schedules, and real-time responses into one place. When we had to push things into production, we set up PagerDuty schedules on a weekly or biweekly basis. If an issue occurred at night, a roster would pop up, and the respective engineer would have to handle that use case.
A specific incident where PagerDuty Operations Cloud helped my team was during the peak season in America, when hundreds of thousands of orders were placed in December and a major S1 severity production issue suddenly occurred. Without a monitoring tool in place, the company would have been in serious trouble, incurring hundreds of thousands of dollars in losses. PagerDuty came to our rescue at the last moment. At 3:00 a.m. my time, while I was sleeping, I received a message and then a call telling me that the issue had occurred. I logged in quickly and fixed it, and within an hour or so the issue was resolved with minimal damage. I even received appreciation for my quick response.
PagerDuty Operations Cloud helps in similar situations because whenever some issue happens and we are not aware of it, PagerDuty comes with a flag telling us that there is an issue that needs to be fixed before it becomes a major problem.
What is most valuable?
Some of the best features PagerDuty Operations Cloud offers are comprehensive incident management, automation, and AI operations, all integrated into one platform. It provides noise reduction and smarter alert grouping through global intelligent alert grouping, which uses machine learning to group and correlate alerts across services. It also provides automation to reduce toil and speed up resolutions, and artificial intelligence, including generative AI assistance, to help teams respond faster and smarter. Additionally, it has built-in workflows with standardized, repeatable processes; improved visibility, collaboration, and a unified operations view; and support for bridging customer-facing teams, engineering, and SRE teams. Finally, it provides scalability for enterprise environments.
The AI-powered alert grouping and automation have made a difference in my day-to-day work by reducing alert noise. It automatically groups multiple related alerts into a single incident, so instead of 20 separate alerts, I get one meaningful alert, which prevents on-call engineers from being spammed. It also helps in faster root cause understanding because AI looks at patterns across systems including logs, metrics, alarms, and graphs, finally providing a broad summary about that. This cuts down the response time, helps in prioritization, and reduces the burnout of on-call teams.
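The "20 alerts become one incident" behavior described above can be illustrated with a minimal sketch. This is not PagerDuty's machine-learning grouping; it is a plain time-window rule, and the names `Incident`, `group_alerts`, and the five-minute window are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Incident:
    """One incident that collects related alerts (hypothetical structure)."""
    service: str
    alerts: list = field(default_factory=list)  # (timestamp, message) tuples

    @property
    def last_seen(self):
        return self.alerts[-1][0]


def group_alerts(alerts, window=timedelta(minutes=5)):
    """Fold (timestamp, service, message) alerts into incidents.

    An alert joins the open incident for its service when it arrives
    within `window` of that incident's latest alert; otherwise it opens
    a new incident. The window length is an assumed value.
    """
    incidents = []
    open_by_service = {}
    for ts, service, message in sorted(alerts):
        current = open_by_service.get(service)
        if current is None or ts - current.last_seen > window:
            current = Incident(service=service)
            incidents.append(current)
            open_by_service[service] = current
        current.alerts.append((ts, message))
    return incidents
```

With this rule, a burst of alerts from the same service a few seconds apart produces a single incident, so the on-call engineer would be notified once rather than once per alert.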
PagerDuty Operations Cloud has positively impacted my organization through faster incident detection and resolution, with less downtime. It has reduced noise and false alerts, so on-call engineers can focus on real and important issues rather than duplicate and negligible ones. It has helped with automation and efficiency, better collaboration and communication among teams, and improved post-incident learning and prevention. It has also helped with operational cost savings and return on investment, as well as scalability and readiness for growth.
What needs improvement?
Even though PagerDuty Operations Cloud is a strong platform, many things can be improved. Analytics and reporting could go deeper. Noise suppression and alert grouping could be more robust, because the grouping is sometimes vague and unclear. The user interface and user experience could improve, because the product is quite complex for new users. Integration and ecosystem limitations could be addressed, as could cost, because the solution is quite expensive for small or mid-sized organizations. It could also be less complex for smaller teams or simpler needs.
I think we can have richer analytics, and the reporting dashboards can improve. More robust noise suppression can help us. Native support for alert attachment can help us. A simpler user interface and user experience can be implemented, and pricing tiers and models should be more favorable. Accessible documents and easier onboarding can help a lot.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud since the first year of my job, and I have worked on four projects, using it in all of them.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is one of the most stable platforms.
What do I think about the scalability of the solution?
The scalability of PagerDuty Operations Cloud is quite great. I have seen it scale in a very easy and robust manner.
PagerDuty Operations Cloud has met my needs as my team and workload have grown. The workload will certainly keep growing, because production issues can occur once we go online, but PagerDuty has helped reduce that workload.
How are customer service and support?
I never faced an issue that would make me have to reach out to PagerDuty customer support because I think it worked fantastically. However, if that happens in the future, I would be happy to share my experience.
Which solution did I use previously and why did I switch?
This is my first company, and I have been working here since the beginning of my career, so PagerDuty Operations Cloud is the only solution I have worked with.
How was the initial setup?
Before using PagerDuty Operations Cloud, my team often took longer to identify the root cause of incidents because alerts were scattered across different tools. After moving to PagerDuty Operations Cloud, AI-powered alert grouping and automated flows have helped us detect issues much faster. We now mobilize the right team within minutes, and our overall incident resolution time has dropped significantly, which has directly reduced our downtime and improved service reliability.
What about the implementation team?
It was not a team-level decision whether my organization evaluated other options before choosing PagerDuty Operations Cloud.
What was our ROI?
Cost savings came from the losses that were prevented. We also saved time and reduced response time, along with the other benefits I have already mentioned.
What's my experience with pricing, setup cost, and licensing?
Pricing, setup cost, and licensing were not my headaches, and the organization already provided me with everything set up. I just had to log in and start using it.
Which other solutions did I evaluate?
I did not purchase PagerDuty Operations Cloud through the AWS Marketplace because it is an organization-wide decision, so my company would have done that.
What other advice do I have?
I would definitely recommend trying this solution. If you are planning to go to production in the near future and want reduced service level agreements, reduced time for root cause analysis, and so on, definitely give it a try. It is a tool that you should work with, and I rate this product a 10 out of 10.
Which deployment model are you using for this solution?
Hybrid Cloud
Reliable Alerting and Seamless On-Call Management for 24x7 Teams
What do you like best about the product?
Reliable alerting
PagerDuty excels at getting the right alerts to the right people quickly through multiple channels like mobile app, phone, SMS, and email, which is crucial for reducing MTTR in production environments. Users consistently highlight that alerts are timely, granular, and dependable, which helps avoid missed incidents and supports 24x7 operations.
On‑call scheduling and escalations
The on‑call management features make it easy to build fair rotations, escalation policies, and handoffs without manual spreadsheets or ad‑hoc processes. This structure improves accountability, prevents burnout, and ensures someone is always available to respond, which is especially valuable for global or follow‑the‑sun teams.
Integrations and workflow automation
PagerDuty integrates with most major monitoring, logging, and ITSM tools, turning raw alerts into actionable incidents and routing them automatically. Automation capabilities (including AI-driven and runbook automation) can trigger diagnostics, remediation steps, and collaboration workflows, cutting noise and speeding up resolution.
Collaboration and visibility
Incident dashboards, status updates, and post‑incident features give teams and stakeholders a shared view of what is happening during outages. This improves coordination across SRE, infrastructure, app teams, and management, and makes it easier to learn from incidents and improve reliability over time.
What do you dislike about the product?
PagerDuty's most common drawbacks include high pricing, a clunky user interface, and alert overload during incidents.
UI and usability issues
The interface is often called unintuitive, overwhelming for schedule overrides, rotations, and configs, with extra steps for simple tasks like editing overrides (requiring delete/recreate). Mobile app notifications nag about settings, and setup complexity adds a steep learning curve, especially for non-experts managing on-call.
Alert noise and reliability
Multiple rapid alerts can overwhelm phones with repeated calls, preventing acknowledgment and escalating stress during outages. While upstream monitoring fixes help, PagerDuty's lack of built-in noise reduction in lower tiers contributes to fatigue and morale hits for on-call staff.
What problems is the product solving and how is that benefiting you?
Core Problems Addressed
PagerDuty tackles unreliable alerting by providing real-time, multi-channel notifications (mobile, SMS, phone) that ensure critical issues reach the right responders without delay. It fixes chaotic on-call scheduling through flexible rotations, escalations, and handoffs, eliminating spreadsheets and ad-hoc emails for 24x7 coverage. Noise reduction via AIOps and automation filters out low-value alerts, while integrations with 600+ tools (Jira, Slack, Azure, Datadog) centralize workflows and prevent tool sprawl.
Operational Benefits
Teams see faster MTTR and reduced downtime from automated triage, guided remediation, and runbook automation that standardize responses and cut manual steps. On-call burnout drops with fair rotations and stakeholder updates, improving morale and accountability during outages. Post-incident analytics and PIRs drive continuous improvement, identifying trends for proactive reliability enhancements.
Business Impact
Downtime minimization protects revenue and SLAs, with users reporting 40% fewer unnecessary alerts and quicker resolutions. Cross-team visibility boosts collaboration, bridging ops, dev, and support for scaled service ownership. In your Azure/Jira/Slack setup, it would streamline Severity A escalations and incident war rooms by automating Jira pulls and Slack posts.
Streamlines incident response and has built customer trust but needs deeper analytics insights
What is our primary use case?
My main use case for PagerDuty Operations Cloud is incident management, as we use it for alerting people who are on call.
We have set up the account, schedules, teams, and so on, and we continuously monitor our logs for anomalies with proactive alerts. Because we do not want to phone people unnecessarily, we categorize alerts by severity and business disruption, and send the information through the integrated APIs to the relevant teams, choosing Slack or phone based on the severity.
This is our main use case; the tool is the last-mile connection to the people who need to respond.
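The severity-based choice between Slack and phone described above could look something like the following sketch. The severity levels, the thresholds, and the function name `choose_channels` are assumptions for illustration, not the reviewer's actual rules or a PagerDuty API.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Assumed severity scale; higher means more urgent."""
    INFO = 1
    WARNING = 2
    ERROR = 3
    CRITICAL = 4


def choose_channels(severity, business_disrupting):
    """Pick notification channels for an alert.

    Assumed policy: phone only for critical alerts, or for errors that
    disrupt the business; Slack for warnings and above; nothing for info.
    """
    if severity >= Severity.CRITICAL or (
        severity >= Severity.ERROR and business_disrupting
    ):
        return ["phone", "slack"]
    if severity >= Severity.WARNING:
        return ["slack"]
    return []
```

Keeping the policy in one small function makes it easy to review which alerts are allowed to ring someone's phone.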
How has it helped my organization?
PagerDuty Operations Cloud positively impacts my organization by helping us win customer trust; when problems occur, the speed of our reaction and involvement with customers is crucial, and PagerDuty Operations Cloud facilitates quick responses to potential issues. PagerDuty Operations Cloud allows our team to react swiftly, which can be challenging without it, as we can't manually sift through all logs. Automation for remediation is also in place, enhancing confidence and allowing some issues to be resolved without manual intervention.
What is most valuable?
The best features of PagerDuty Operations Cloud include integration, mobile app, reporting, and analytics, which I find very useful based on the access I have.
We review the data periodically to see our performance; for example, we check for alert fatigue, how many alerts have been addressed, and our TDX metrics such as time to respond. The analytics give me good visibility with just a few clicks, which helps in discussions with the team about continuous improvement.
What needs improvement?
The analytics in PagerDuty Operations Cloud could be significantly improved; while I know there are some, they still feel basic to me. Options for user-customized charts would be really helpful, especially in this GenAI world where prompts can yield valuable data.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for more than five years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable.
How are customer service and support?
I have never personally needed customer support, perhaps because our usage has not run into corner cases or other problems, or perhaps because of how our licensing is set up.
Which solution did I use previously and why did I switch?
I previously used another solution which I won't name because it's proprietary, but the user experience was not great; when I learned about PagerDuty Operations Cloud, it had a lot of positive discussions, and I was excited to find we were already using it when I joined this company.
What was our ROI?
TDX metrics are definitely improving due to PagerDuty Operations Cloud being in place; time to respond, time to initiate, and time to mitigate are key metrics influenced positively.
What's my experience with pricing, setup cost, and licensing?
I usually am not involved in pricing, setup cost, or licensing, as that's handled by another team, so I don't have much visibility on that part.
What other advice do I have?
I advise others to use PagerDuty Operations Cloud, as it's going to help in building customer trust. There's an operations team for those aspects. My overall review rating for PagerDuty Operations Cloud is 7 out of 10.
Alert workflows have reduced missed incidents, but scheduling complex rotations needs to be simpler
What is our primary use case?
I have been in my current role for the past 18 months, and we started using PagerDuty Operations Cloud earlier this year around January or February to manage our operations.
Our primary use case for PagerDuty Operations Cloud is alerting. We switched to ensure alerts are efficient and effective so that the on-call engineer does not miss any alert. We instrument many alerts on it, including VPN downtime, transaction monitoring, success rate, and latency. We configured PagerDuty Operations Cloud so that if any of those metric thresholds are met or any of those SLOs and SLIs are breached, we can quickly take action and resolve the issue. For day-to-day use, we run a 24-hour shift where all shifts are entered into the system, and every on-call engineer uses PagerDuty Operations Cloud to receive alerts. Beyond alerting, we also use scheduling, incident management, and incident reports.
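As a rough illustration of alerting on a breached threshold, the sketch below checks a metric against a limit and builds a trigger event shaped like a PagerDuty Events API v2 payload (`routing_key`, `event_action`, `payload` with `summary`, `source`, and `severity`). The thresholds, metric names, and routing key are hypothetical, and the sketch only builds the event; it does not send anything.

```python
import json

# Assumed SLO limits: ("min", x) means the value must stay at or above x,
# ("max", x) means it must stay at or below x.
SLO_THRESHOLDS = {
    "success_rate": ("min", 0.99),
    "latency_p95_ms": ("max", 500),
}


def breached(metric, value):
    """Return True when `value` violates the assumed limit for `metric`."""
    kind, limit = SLO_THRESHOLDS[metric]
    return value < limit if kind == "min" else value > limit


def build_trigger_event(routing_key, source, metric, value):
    """Build a trigger event in the Events API v2 shape (not sent here)."""
    return {
        "routing_key": routing_key,  # integration key; placeholder in this sketch
        "event_action": "trigger",
        "dedup_key": f"{source}:{metric}",  # repeated breaches update one alert
        "payload": {
            "summary": f"{metric} breached SLO on {source}: {value}",
            "source": source,
            "severity": "critical",
        },
    }


if __name__ == "__main__":
    if breached("latency_p95_ms", 750):
        event = build_trigger_event("YOUR_ROUTING_KEY", "payments-api", "latency_p95_ms", 750)
        print(json.dumps(event, indent=2))
```

In practice the monitoring tool's own PagerDuty integration usually produces these events, so hand-built payloads are only needed for custom checks.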
What is most valuable?
The best features of PagerDuty Operations Cloud include alerting, which is very important and the main reason we retain it, and scheduling as well.
Initially, we used Excel to manage our on-call engineers' schedules, but with PagerDuty Operations Cloud, it shows when you are on duty and allows other team members from different teams to check who is on duty without needing to ask. This has significantly reduced the time spent on checking who is on duty by providing visibility at each point.
Scheduling with PagerDuty Operations Cloud has reduced confusion because we set it up with a round-robin rotation, and nobody needs to update it every day unlike with Excel, where we had to create a new schedule every two months. Now we only make changes when necessary, making the process more efficient and organized for on-call engineers to know when they are on duty. The system also alerts them in advance for their upcoming shifts.
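The round-robin rotation mentioned above is simple to express in code, which is part of why it no longer needs a hand-maintained spreadsheet. The sketch below is an illustration only; `round_robin_schedule`, the engineer names, and the one-day shift length are assumptions, not PagerDuty's scheduling engine.

```python
from datetime import date, timedelta
from itertools import cycle


def round_robin_schedule(engineers, start, shifts, shift_length=timedelta(days=1)):
    """Return a list of (shift_start, engineer) pairs in round-robin order.

    Engineers take turns in the order given, wrapping around when the
    list is exhausted.
    """
    if not engineers:
        raise ValueError("at least one engineer is required")
    rotation = cycle(engineers)
    return [(start + i * shift_length, next(rotation)) for i in range(shifts)]


if __name__ == "__main__":
    for shift_start, engineer in round_robin_schedule(
        ["ada", "ben", "chi"], date(2024, 1, 1), shifts=4
    ):
        print(shift_start.isoformat(), engineer)
```

Because the rotation is derived from the engineer list, adding or removing a person is the only change that ever needs to be made by hand.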
What needs improvement?
One way PagerDuty Operations Cloud could improve is through the scheduling feature, which can be tricky, especially with complex schedules. I have found it stressful to schedule effectively, even after going through PagerDuty University and the forums. Sometimes I need to manually interchange people because minor changes can scatter the whole schedule. A more efficient scheduling system or better guidance for complex schedules would help.
Another area for improvement is alerting. When multiple incidents occur simultaneously, it would be helpful if alerts listed the issues instead of muddling them together. This would make it easier to manage what needs urgent attention without missing anything.
Which solution did I use previously and why did I switch?
Initially, when I first joined the company, we primarily used Grafana and Slack as our means to manage incidents. The alert was on Slack, and the dashboard was on Grafana, which required us to use three different applications to do the same thing.
With PagerDuty Operations Cloud now, we don't need to go through multiple tools to manage alerts and incidents. We don't need to go through Jira to log incidents. It streamlines the process, and with incident management, it can escalate to the next person so that alerts are rarely missed. It has made our workflow easier and much more efficient.
What other advice do I have?
For incident management in my team, PagerDuty Operations Cloud has really helped with alerting in such a way that when an issue happens, it reaches out to the on-call engineer to ensure they don't miss it. There is a pop-up, probably on your browser or phone, and if you miss the pop-up or don't acknowledge it in time, it moves to your phone and starts calling; sometimes it sends texts and sometimes calls your phone. The call is very persistent, so if the incident is not acknowledged, it escalates to the next line, which can be your manager or your functional manager, and it keeps escalating until it gets acknowledged. This way, the alert is rarely missed because at some point, somebody will surely pick up.
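The escalation behavior described above (notify one person through progressively more intrusive channels, then move to the next level) can be sketched as a small simulation. The channel order, the function name `escalate`, and the `acknowledges` callback are assumptions for illustration; a real escalation policy also involves timeouts between steps, and this sketch stops at the last level rather than looping.

```python
def escalate(levels, acknowledges, channels=("push", "sms", "call")):
    """Walk escalation levels in order until someone acknowledges.

    `levels` is an ordered list of responders, for example the on-call
    engineer followed by a manager. `acknowledges(responder, channel)`
    is a callback returning True when that responder acknowledges on
    that channel. Returns (responder or None, log of attempts).
    """
    log = []
    for responder in levels:
        for channel in channels:
            log.append((responder, channel))
            if acknowledges(responder, channel):
                return responder, log
    return None, log


if __name__ == "__main__":
    # The engineer sleeps through everything; the manager picks up the call.
    who, attempts = escalate(
        ["on-call engineer", "manager"],
        lambda responder, channel: responder == "manager" and channel == "call",
    )
    print(who, len(attempts))
```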
PagerDuty Operations Cloud helps us effectively manage incidents without needing to sit down all day and watch our screens.
Alerting is key, and scheduling is also important but not as crucial as alerting. We also use incident management and incident reporting, which allow us to manage who should be escalated to during incidents and keep track of when incidents happen and when they are resolved so that everyone knows what occurred and how it was handled.
PagerDuty Operations Cloud has positively impacted my organization by providing effectiveness and efficiency in the way we work, with less alert fatigue, meaning alerts are rarely missed. For example, if four or five alerts come on Slack at the same time, you might miss them while focusing on resolving current issues. However, with PagerDuty Operations Cloud, since it calls for every issue, you will see any new alerts and resolve them, thus reducing missed alerts and increasing efficiency. This leads to better service for our end users, increased profit, and less pressure on engineers, making it a win-win for everybody.
Our MTTR has significantly reduced; however, I cannot provide specific numbers because with Slack, we were not measuring it accurately. Now with PagerDuty Operations Cloud, we can measure how long it takes to acknowledge alerts and resolve issues, giving us metrics to manage this effectively.
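Once trigger, acknowledge, and resolve times are recorded, the two metrics mentioned above reduce to simple averages. The sketch below assumes each incident is a dictionary with `triggered`, `acknowledged`, and `resolved` timestamps; those field names and the function `mean_times` are hypothetical, not an export format from PagerDuty.

```python
from datetime import datetime, timedelta
from statistics import mean


def mean_times(incidents):
    """Return (mean time to acknowledge, mean time to resolve).

    Both are measured from the moment the incident was triggered.
    """
    ack_seconds = mean(
        (i["acknowledged"] - i["triggered"]).total_seconds() for i in incidents
    )
    resolve_seconds = mean(
        (i["resolved"] - i["triggered"]).total_seconds() for i in incidents
    )
    return timedelta(seconds=ack_seconds), timedelta(seconds=resolve_seconds)
```

Tracking both numbers separately helps show whether time is being lost in noticing an incident or in fixing it.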
My advice for others looking into using PagerDuty Operations Cloud is that if their workflow requires them to be alert to incidents while continuing their work without being tethered to a screen, it is a very helpful tool to have.
One additional thought about PagerDuty Operations Cloud is that if they started issuing certificates for completing courses on PagerDuty University, it would encourage more people to engage with the training, similar to how New Relic operates. Having a certificate would demonstrate rigorous training and the capability to apply what was learned. I would rate this product a 6 out of 10.
Runbook automation has reduced incident response time and now improves uptime and collaboration
What is our primary use case?
Our main use case for PagerDuty Operations Cloud is alerting: whenever our application experiences downtime or a downstream incident, PagerDuty Operations Cloud alerts us through calls and SMS so we are notified and can quickly remediate the issue.
A unique aspect of our main use case with PagerDuty Operations Cloud is using the Runbook flow. Whenever we experience a specific kind of incident, the Runbook will trigger automation to either remediate the issues or perform root cause analysis, thus enhancing our workflow automations.
What is most valuable?
PagerDuty Operations Cloud helps our team respond faster. Whenever there is an incident, we are notified, and because PagerDuty Operations Cloud calls us 24/7, we can immediately join a call or start an investigation and remediate the issue as early as possible. This way, PagerDuty Operations Cloud helps us reduce MTTR and makes our application more reliable and resilient.
We have been using the Runbook automation feature for building automated flows that help us add extra monitoring for specific alerts or incidents and perform remediation tasks autonomously using this Runbook flow.
One feature I particularly appreciate about PagerDuty Operations Cloud is that it offers multiple notification options. I receive alerts via call as well as SMS, which is beneficial. If I miss the call, I may still receive the SMS and vice versa.
Through PagerDuty Operations Cloud, our MTTR has been reduced by at least 30% over the last year due to its instant notification features like SMS and calls, which help us jump on calls quickly to remediate issues. This reduction has impacted our application downtime, ensuring an uptime of approximately 99% throughout the year.
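To put the uptime figure above in concrete terms, an uptime percentage can be converted into hours of downtime per year. The helper below is a simple arithmetic sketch; `downtime_hours` is a hypothetical name and the calculation assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours(uptime_fraction, hours=HOURS_PER_YEAR):
    """Hours of downtime implied by an uptime fraction over a period."""
    return (1 - uptime_fraction) * hours


if __name__ == "__main__":
    # 99% uptime leaves 1% of 8760 hours, about 87.6 hours per year.
    print(round(downtime_hours(0.99), 1))
```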
What needs improvement?
One suggestion for improving PagerDuty Operations Cloud is to provide more insights about incidents, such as root cause analysis or additional context, which could help SRE teams reduce detection and remediation time before they even join a call.
From an integration point of view, everything is functioning well. However, we primarily use the desktop interface as our main tool, and adding more details on incidents directly from PagerDuty Operations Cloud's analysis would enhance the user experience.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for the last three years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is absolutely stable. We have never experienced any downtime or latency issues from PagerDuty Operations Cloud.
What do I think about the scalability of the solution?
We don't have much insight on scalability, as a separate enterprise PagerDuty Operations Cloud team is responsible for handling all scaling activities.
How are customer service and support?
We have internal enterprise support within the application, which is very interactive. They escalate issues to the external PagerDuty Operations Cloud team when necessary, and they are very supportive.
Which solution did I use previously and why did I switch?
We have not previously used a different solution. PagerDuty Operations Cloud is the first alerting tool I have been using since the beginning.
How was the initial setup?
PagerDuty Operations Cloud onboarding is pretty straightforward in our organization, as new candidates simply need to be part of specific Windows AD groups to complete the onboarding process and gain access.
What about the implementation team?
There are automations in our organization that connect PagerDuty Operations Cloud to other ticketing tools such as Jira and ServiceNow. Whenever an incident occurs, automation that uses the Runbook flow triggers to extract data from the PagerDuty Operations Cloud alert to create incidents and Jira tickets for the development team.
What was our ROI?
In terms of return on investment, we have reduced our MTTR by 30% in the last year, indirectly improving our application's uptime to nearly 99%, which enhances client experience and boosts our business.
What's my experience with pricing, setup cost, and licensing?
I have no personal experience with pricing, setup costs, or licensing, as a separate enterprise PagerDuty Operations Cloud team manages those processes.
What other advice do I have?
The escalation policies within PagerDuty Operations Cloud are user-friendly and customizable, allowing us to set up multi-level escalations from SRE engineers to SRE leads and then to management.
PagerDuty Operations Cloud helps our team collaborate during incidents by automatically updating incident status based on progress. We have alerting integrated with Slack for this, where incidents show as red when active, yellow when acknowledged, and green when resolved.
Regarding performance metrics, there is a dedicated enterprise PagerDuty Operations Cloud team that handles monitoring, so as an SRE, I don't need to manage these performance aspects myself.
My advice to others looking into using PagerDuty Operations Cloud is that it is one of the best tools in the market for production support and SRE engineers. It is essential for our operations, functioning as our bread and butter.
We have covered almost everything regarding PagerDuty Operations Cloud. It has been a great tool for SRE and production support teams, and we look forward to more features, especially with trending technologies like AI. I would rate this product an 8 out of 10.
Automated on-call scheduling has reduced manual effort and now keeps holiday coverage reliable
What is our primary use case?
My main use case for PagerDuty Operations Cloud is to set up shifts for people on-call.
A specific example is holiday coverage. In our team, we assign the people who will be on-call, create an Excel sheet, and upload it to PagerDuty. It works as expected, sends notifications, and everything else functions properly. It is very easy to set up and manage.
I usually discuss with my team who will be on-call during holidays, and we decide how many people are needed. We create an Excel sheet, upload it to PagerDuty, and set the order of who is contacted first and whom to escalate to if they miss the page. The web view and website are also very easy to use. I think this is the normal use case. Perhaps other teams are using it differently, but this works well for us. Before, it was very manual, and it was quite difficult.
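As an illustration of the roster idea, here is a small Python sketch that reads a holiday sheet (exported as CSV) and works out who gets contacted first and who comes next if the page is missed. The column names, dates, and people are invented, and this is not how PagerDuty itself processes an upload.

```python
import csv
import io

# Hypothetical holiday roster; column names and people are made up.
SHEET = """date,primary,secondary
2024-12-24,Asha,Ben
2024-12-25,Ben,Chen
2024-12-26,Chen,Asha
"""


def load_roster(text: str) -> dict:
    """Build a date -> [first contact, escalation contact] mapping."""
    roster = {}
    for row in csv.DictReader(io.StringIO(text)):
        roster[row["date"]] = [row["primary"], row["secondary"]]
    return roster


def who_to_page(roster: dict, date: str, missed: int = 0) -> str:
    """Return the person to page after `missed` unanswered pages."""
    chain = roster[date]
    # Stay on the last person once the chain is exhausted.
    return chain[min(missed, len(chain) - 1)]


roster = load_roster(SHEET)
print(who_to_page(roster, "2024-12-25"))
print(who_to_page(roster, "2024-12-25", missed=1))
```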
What is most valuable?
The best features PagerDuty Operations Cloud offers are that it is simple to set up and supports Excel sheet uploads, which was very helpful. Setting up notifications and the integration with Datadog was excellent. We can automate many things.
PagerDuty Operations Cloud has positively impacted my organization because the support team is very happy. Before, setting up everything was very difficult. Now, we don't have to think about it. We can simply set it up in PagerDuty and it works. The escalation and everything simply works with the configuration we set up six months to one year ago, and it still functions. We make only minor changes. I think a lot of manual effort has been reduced, and the system is more reliable.
Before implementing PagerDuty Operations Cloud, the L1 team had to stay online at night, and if someone fell asleep and missed an issue, it would easily escalate to a manager or someone higher up, creating a lot of fuss. That is almost gone now. Deciding who would be on-call and setting that up was not foolproof when we did it manually, and someone had to invest a lot of time, around one or two hours weekly. Now, it takes less than five minutes. Every week, we simply discuss it and it's done. A lot of time and mental effort has been saved.
What needs improvement?
I think the website view of the chart and graph showing who is on-call at what time could be improved. The escalation line could be made more expressive to show who will be paged next if someone misses an alert.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable; we didn't find any bugs or unintended behavior.
What do I think about the scalability of the solution?
PagerDuty Operations Cloud is scalable; we can easily add and create teams and manage tags. It is very easy to manage, and setting the priority order and deciding who is contacted first was very easy.
How are customer service and support?
The customer support is adequate; usually, they respond and help us fix issues during integration. It was helpful.
Which solution did I use previously and why did I switch?
Before using PagerDuty Operations Cloud, there was no solution in place. The L1 team checked the issues and called the developers, asking them whether the error was related to them. This involved manually calling fifteen to twenty developers, which would take half an hour, during which the issue persisted, reducing the reliability of the site. Now it is automatic and very effective.
What was our ROI?
I have seen a return on investment; a lot of time has been saved. As I mentioned earlier, scheduling took a lot of manual effort before. Sometimes two or more people would be assigned on-call by mistake, and the process was not foolproof. Escalation was not possible at all, which put the L1 team under too much stress: they had to coordinate with many people and call them from their own phones whenever they got an error. It was actually very bad. Now it is not that severe. PagerDuty escalates and calls the developers, and if the issue belongs to them, they join. It is much more efficient and much less stressful.
Which other solutions did I evaluate?
We were not involved in evaluating other options; I think the higher team decided to go with PagerDuty, and we are happy with it.
What other advice do I have?
I don't have anything to add about the features; we use this much and it's great. We don't want anything more for now. I don't think there is anything to improve; we are using PagerDuty Operations Cloud to set up on-call duty and it works. My advice to others looking into using PagerDuty Operations Cloud is that the on-call duty feature and setting up the on-call person are excellent. You can simply proceed with it, and even if teams are big, it will not feel annoying or overwhelming. Just set it up and forget it. You can adopt it if you don't have any special needs; it is commonly accepted and effective. I chose a rating of nine out of ten because there may be some improvements in the future.
Automated alerts have improved incident response in banking operations but calling notifications still need refinement
What is our primary use case?
My main use case for PagerDuty Operations Cloud is to set up alerts for any failures, such as when a server is down, a particular service is down, or APIs are not responding due to technical issues. PagerDuty triggers an alert and also calls my personal mobile number to notify me about the issue, allowing me to acknowledge that I am looking into it and take the necessary actions.
I can give an example of a situation where PagerDuty Operations Cloud helped us handle an incident: our payment system was about to go down. We usually monitor the system manually, but there are incidents where an automated system works more efficiently than a human. PagerDuty Operations Cloud identified the issue first by alerting us that something had gone wrong with the servers or services, which enabled us to contact the DevOps and Dev teams to identify the exact issue in our banking app. It has been helpful from the beginning.
PagerDuty Operations Cloud is very helpful for monitoring purposes, allowing us to set up multiple alerting methods such as SMS, email, and call alerting, all of which we commonly use. It is useful across various banking services, with teams including Dev, DevOps, and SecOps relying on it heavily.
What is most valuable?
The best features of PagerDuty Operations Cloud include the SMS and call alerting functions, which I find very beneficial compared to other tools I have used, such as Coralogix and Sumo Logic, which primarily focus on email alerting. PagerDuty Operations Cloud goes a level beyond by notifying users through SMS and calls, allowing us to tag the concerned teams to address specific issues promptly.
I also find the integration and reporting aspects of PagerDuty Operations Cloud valuable. It records all triggered calls and incidents, enabling us to analyze patterns and identify the times when systems go down, which helps us understand and address the underlying causes.
PagerDuty Operations Cloud has positively impacted our organization by ensuring that our banking applications, which operate 24/7, remain functional and efficient, contributing to better service availability.
What needs improvement?
I would suggest that the calling service be adjusted: if a system goes down, the user receives continuous calls, which can be overwhelming. Once a user acknowledges the alert, it may be unnecessary to keep calling, as it distracts from their work on resolving the issue. That is the only improvement I can suggest regarding the calling aspect of PagerDuty Operations Cloud.
For how long have I used the solution?
I have been working in my current field for around three or more years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable and provides accurate information.
What do I think about the scalability of the solution?
The scalability of PagerDuty Operations Cloud is very good, and it can handle increasing traffic as our organization expands without any issues.
How are customer service and support?
I have not needed to contact the customer support of PagerDuty Operations Cloud, as they provided thorough knowledge transfer once they handed over the services, and since then we have not encountered issues requiring support.
Which solution did I use previously and why did I switch?
PagerDuty Operations Cloud was our first alert-based system. It was part of a pilot project started around three years ago, and we did not use any other solutions before it.
What was our ROI?
PagerDuty Operations Cloud helps us reduce downtime and identify which systems have issues since it is not feasible to manually monitor everything. The automated system efficiently monitors the infrastructure and notifies us about problems, benefiting both our customers and the bank's reputation.
What's my experience with pricing, setup cost, and licensing?
I do not have any information regarding the pricing, setup cost, or licensing, as those details are managed by the organizational leadership.
Which other solutions did I evaluate?
We did not evaluate other options and directly chose PagerDuty Operations Cloud for our needs.
What other advice do I have?
I advise companies in the FinTech and banking sectors to consider using the alert-based system of PagerDuty Operations Cloud for their projects because digital services are becoming more prevalent across various sectors, and this system can enhance business operations by reducing incidents of system downtime and failures. My relationship with PagerDuty is strictly that of a customer utilizing their product for our business needs. I have covered multiple aspects of PagerDuty Operations Cloud in our discussion. My overall rating for this product is 7 out of 10.
Which deployment model are you using for this solution?
Private Cloud
Reliable Incident Alerts and Seamless Team Coordination
What do you like best about the product?
PagerDuty makes incident alerts fast and reliable. It sends the right notifications at the right time and keeps our team organized. The on-call schedules are easy to manage, and the mobile app helps us respond quickly. It reduces downtime and makes our work smoother.
What do you dislike about the product?
PagerDuty can feel noisy at times when alerts are not filtered well. Some settings are a bit complex for new users. The pricing is also on the higher side for small teams. Overall, it works well but could be simpler and more cost-friendly.
What problems is the product solving and how is that benefiting you?
PagerDuty helps us catch issues immediately with fast and reliable alerting. It brings all alerts into one place, so we don’t miss anything important. It also makes on-call management easy, ensuring the right person gets notified at the right time. This reduces downtime, helps us respond faster, and keeps our systems running smoothly.
Empowers Incident Management with Reliable Alerts and Seamless Collaboration
What do you like best about the product?
I really appreciate how PagerDuty provides comprehensive visibility and control over our incident management process. Its real-time alerting is both reliable and quick, ensuring that the appropriate team members are notified instantly and without unnecessary distractions. The user-friendly dashboards, efficient on-call scheduling, and smooth integrations with our monitoring tools make it straightforward for teams to work together during incidents. Overall, PagerDuty enables us to respond more quickly, stay organized, and maintain high service uptime with minimal effort.
What do you dislike about the product?
One thing I dislike about PagerDuty is that the interface can feel a bit overwhelming at times, especially when navigating deeper configuration settings or managing complex on-call schedules. Some customization options also require extra steps, which adds to the learning curve for new users. Additionally, alert noise can build up quickly if not fine-tuned properly, making the initial setup a bit time-consuming. While the platform is powerful, simplifying certain workflows would make the overall experience even better.
What problems is the product solving and how is that benefiting you?
PagerDuty has significantly improved our incident response process by making sure that critical issues are quickly identified and escalated to the appropriate teams. This has led to faster response times, less confusion during outages, and smoother on-call rotations without the need for manual coordination. Thanks to its real-time insights, automated workflows, and integrations, we can take a more proactive approach, stop minor problems from escalating, and provide more reliable services to our users.