PagerDuty Operations Cloud
PagerDuty
External reviews
External reviews are not included in the AWS star rating for the product.
Centralized Alert Management with Seamless Integration
Fast Incident Response and Clear Alerts, but Setup Takes Time to Learn
Effortless Alert Management, Needs Better Security
Real-time incident response has improved but alert grouping and setup still need refinement
What is our primary use case?
My main use case for PagerDuty Operations Cloud is to handle all technical aspects of incident management and real-time alerting. I primarily use PagerDuty Operations Cloud in a health tech environment, specifically for telemedicine platforms and EHR-related applications, where high availability is required.
What is most valuable?
PagerDuty Operations Cloud offers excellent features including the alert system and automated incident escalation capability, which is effective for routing issues to the right team through mobile notifications, SMS, or phone calls.
The real-time monitoring feature in PagerDuty Operations Cloud makes the biggest difference for my team, as it is quite helpful for on-call management and day-to-day operations.
PagerDuty Operations Cloud has positively impacted my organization by working effectively with most cloud service providers like AWS or Azure, improving visibility and reliability of incident responses.
Since implementing PagerDuty Operations Cloud, I have seen incident response time drop, with MTTR reduced by 30 to 40%, along with better SLA compliance and improved operational visibility through incident analytics and reporting dashboards.
The alert reduction feature in PagerDuty Operations Cloud has minimized downtime and improved incident response efficiency in my organization.
What needs improvement?
There are a couple of areas where PagerDuty Operations Cloud can be improved, such as enhancing Event Intelligence and alert grouping features and simplifying the initial configuration of escalation policies.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for more than two years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable.
What do I think about the scalability of the solution?
The scalability of PagerDuty Operations Cloud is quite good, as it works well with enterprise-level systems.
How are customer service and support?
Customer support for PagerDuty Operations Cloud is quite effective.
Which solution did I use previously and why did I switch?
Before implementing PagerDuty Operations Cloud, I relied on basic monitoring tools and email alerts for health tech monitoring, and I noticed a significant improvement after switching.
What was our ROI?
I have not reached ROI yet, but I am close, with a reduction in downtime and faster incident resolution.
What's my experience with pricing, setup cost, and licensing?
I would say the pricing is quite mid-range, but the setup cost and licensing can sometimes be a little challenging.
Which other solutions did I evaluate?
Before choosing PagerDuty Operations Cloud, I evaluated basic monitoring tools and email alert systems but found that PagerDuty offered stronger integration and operational visibility.
What other advice do I have?
I would advise others to clearly define their incident management strategy before implementing PagerDuty Operations Cloud. I would rate this solution a 7 out of 10.
User-Friendly with Simple Setup
Centralized incident response has reduced downtime and now needs more predictable costs
What is our primary use case?
My main use case for PagerDuty Operations Cloud is for cloud-based operations, including incident management and resolving incident responses to reduce downtime and improve reliability.
PagerDuty Operations Cloud provides a central command center that collects data signals from various IT systems, which helps us detect high-priority incidents and reduce the noise.
In addition to my main use case, we are able to perform on-call scheduling and routing with the help of PagerDuty Operations Cloud very easily.
What is most valuable?
The best features PagerDuty Operations Cloud offers include automated on-call scheduling, AI-driven alert grouping, and an impressive number of integrations, as it has more than 700 integrations, which really help us.
The user interface is user-friendly for non-technical persons, and it has been maintained well over the years, making it easier for our project managers and product managers to navigate PagerDuty Operations Cloud dashboards.
PagerDuty Operations Cloud's embedded AI has helped reduce alert fatigue and lower costs from incidents, which has contributed to retaining revenue by minimizing financial losses.
The AI-driven alert grouping, particularly for incident management, helps us significantly as it streamlines our processes.
PagerDuty Operations Cloud has positively impacted our organization by accelerating incident response and reducing MTTR by up to 27%, and we are also integrating AI and ML into PagerDuty Operations Cloud.
What needs improvement?
One area for improvement in PagerDuty Operations Cloud is cost predictability: unpredictable costs can cause issues in our organization and add complexity to projects. In addition, non-technical personnel occasionally perceive the user interface as outdated.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for around three years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable and scalable, capable of handling enterprise environments and multi-cloud setups efficiently.
How are customer service and support?
I would rate customer support a seven out of 10.
Which solution did I use previously and why did I switch?
Previously, we were using incident.io, AlertOps, and DataDog, but we switched to PagerDuty Operations Cloud due to its all-in-one solution capabilities.
What about the implementation team?
I have implemented AI and automation through PagerDuty Operations Cloud for incident response, which has significantly changed our operational efficiency, allowing us to accomplish more with less manual input.
I have experimented with PagerDuty Operations Cloud's autonomous AI agents, striving to automate repetitive tasks, which improves operational efficiency.
What was our ROI?
While I cannot provide exact return on investment metrics, I estimate that we save around 20% of costs and approximately 10 to 15 hours a week with the efficient use of PagerDuty Operations Cloud.
The 27% reduction in MTTR has had a direct business impact as it enhances business growth and supports impactful business decisions.
The alert reduction feature has greatly impacted our ability to prevent costly incidents, as we can accurately respond to alerts with the help of autonomous AI agents, which reduces erroneous notifications.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup cost, and licensing has been positive compared to other tools, as PagerDuty Operations Cloud simplifies many management tasks that would otherwise be burdensome.
Which other solutions did I evaluate?
Before choosing PagerDuty Operations Cloud, I evaluated previous tools such as incident.io and DataDog, but we selected PagerDuty Operations Cloud for its comprehensive features and strong alerting.
What other advice do I have?
My advice for others considering PagerDuty Operations Cloud is to first understand their organizational needs before selecting any tool to prevent mismatches.
I believe all aspects of my experience with PagerDuty Operations Cloud have been covered.
I would rate my overall experience with PagerDuty Operations Cloud a seven out of 10.
Streamlines incident response and has built customer trust but needs deeper analytics insights
What is our primary use case?
My main use case for PagerDuty Operations Cloud is incident management, as we use it for alerting people who are on call.
I definitely use PagerDuty Operations Cloud for incident management; we have set up the account, schedule, teams, etc., and we continuously monitor our logs for any anomalies with proactive alerts. We define priority because we don't want to alert people on the phone unnecessarily, thus we categorize alerts based on severity and business disruption, sending information via the integrated APIs to the relevant teams, specifying whether to communicate through Slack or phone based on the severity.
This is our main use case; it is the tool that provides the last-mile connection to the people who need to respond.
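The severity-based routing described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not PagerDuty's API: the severity names, the channel names, and the `ROUTES` table are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical mapping from severity to notification channels.
# Only the more disruptive severities page someone by phone.
ROUTES = {
    "critical": ["phone", "slack"],
    "high": ["phone", "slack"],
    "medium": ["slack"],
    "low": ["slack"],
}


@dataclass
class Alert:
    summary: str
    severity: str
    team: str


def channels_for(alert: Alert) -> list[str]:
    """Return the channels for an alert; unknown severities fall back to Slack only."""
    return ROUTES.get(alert.severity, ["slack"])


alert = Alert("Error rate above threshold", "critical", "platform")
print(alert.team, channels_for(alert))
```

Keeping the mapping in one table makes it easy to review which severities are allowed to call someone's phone.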
How has it helped my organization?
PagerDuty Operations Cloud positively impacts my organization by helping us win customer trust; when problems occur, the speed of our reaction and involvement with customers is crucial, and PagerDuty Operations Cloud facilitates quick responses to potential issues. PagerDuty Operations Cloud allows our team to react swiftly, which can be challenging without it, as we can't manually sift through all logs. Automation for remediation is also in place, enhancing confidence and allowing some issues to be resolved without manual intervention.
What is most valuable?
The best features of PagerDuty Operations Cloud include integration, mobile app, reporting, and analytics, which I find very useful based on the access I have.
We review the data periodically to gauge our performance; for example, we check for alert fatigue, how many alerts have been addressed, and our TDX metrics such as time to respond. The analytics in PagerDuty Operations Cloud give me good visibility with just a few clicks, which helps in discussions with the team about continuous improvement.
What needs improvement?
More analytics can be brought into PagerDuty Operations Cloud; while I know there are some, they still seem basic to me, and having options for user-customized charts would be really helpful, especially in this GenAI world where prompts can yield valuable data.
The analytics provided by PagerDuty Operations Cloud can be significantly improved, as they still feel basic to me.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for more than five years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable.
How are customer service and support?
I don't think I have ever needed customer support, perhaps because our usage has not run into corner cases or other problems, or perhaps because of how our licensing is set up. I personally have never used customer support.
Which solution did I use previously and why did I switch?
I previously used another solution which I won't name because it's proprietary, but the user experience was not great; when I learned about PagerDuty Operations Cloud, it had a lot of positive discussions, and I was excited to find we were already using it when I joined this company.
What was our ROI?
TDX metrics are definitely improving due to PagerDuty Operations Cloud being in place; time to respond, time to initiate, and time to mitigate are key metrics influenced positively.
What's my experience with pricing, setup cost, and licensing?
I usually am not involved in pricing, setup cost, or licensing, as that's handled by another team, so I don't have much visibility on that part.
What other advice do I have?
I advise others to use PagerDuty Operations Cloud, as it's going to help in building customer trust. There's an operations team for those aspects. My overall review rating for PagerDuty Operations Cloud is 7 out of 10.
Alert workflows have reduced missed incidents and now scheduling needs simpler complex rotations
What is our primary use case?
I have been in my current role for the past 18 months, and we started using PagerDuty Operations Cloud earlier this year around January or February to manage our operations.
Our primary use case for PagerDuty Operations Cloud is alerting. We switched to ensure alerts are efficient and effective so that the on-call engineer does not miss any alert. We instrument many alerts on it, including VPN downtime, transaction monitoring, success rate, and latency. We configured PagerDuty Operations Cloud so that if any of those metrics crosses its alert threshold, or any of the related SLOs is breached, we can quickly take action and resolve the issue. For day-to-day use, we run a 24-hour shift where all shifts are entered into the system, and every on-call engineer uses PagerDuty Operations Cloud to receive alerts. Beyond alerting, we also use scheduling, incident management, and incident reports.
What is most valuable?
The best features of PagerDuty Operations Cloud include alerting, which is very important and the main reason we retain it, and scheduling as well.
Initially, we used Excel to manage our on-call engineers' schedules, but with PagerDuty Operations Cloud, it shows when you are on duty and allows other team members from different teams to check who is on duty without needing to ask. This has significantly reduced the time spent on checking who is on duty by providing visibility at each point.
Scheduling with PagerDuty Operations Cloud has reduced confusion because we set it up with a round-robin rotation, and nobody needs to update it every day unlike with Excel, where we had to create a new schedule every two months. Now we only make changes when necessary, making the process more efficient and organized for on-call engineers to know when they are on duty. The system also alerts them in advance for their upcoming shifts.
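To illustrate why a round-robin rotation does not need daily updates, here is a minimal sketch in plain Python. It is not how PagerDuty implements schedules; the engineer names, the start date, and the one-week shift length are assumptions for the example.

```python
from datetime import date


def on_call(engineers: list[str], start: date, day: date, shift_days: int = 7) -> str:
    """Return who is on call on `day` for a rotation that began on `start`."""
    if day < start:
        raise ValueError("day is before the rotation start")
    # Count completed shifts, then wrap around the list of engineers.
    shifts_elapsed = (day - start).days // shift_days
    return engineers[shifts_elapsed % len(engineers)]


team = ["ada", "grace", "linus"]  # hypothetical rotation order
rotation_start = date(2024, 1, 1)
print(on_call(team, rotation_start, date(2024, 1, 10)))
```

Because the assignment is computed from the start date, the list only needs editing when the team or the shift length changes.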
What needs improvement?
One way PagerDuty Operations Cloud could improve is through the scheduling feature, which can be tricky, especially with complex schedules. I have found it stressful to schedule effectively, even after going through PagerDuty University and the forums. Sometimes I need to manually interchange people because minor changes can scatter the whole schedule. A more efficient scheduling system or better guidance for complex schedules would help.
Another area for improvement is alerting. When multiple incidents occur simultaneously, it would be helpful if alerts listed the issues instead of muddling them together. This would make it easier to manage what needs urgent attention without missing anything.
Which solution did I use previously and why did I switch?
Initially, when I first joined the company, we primarily used Grafana and Slack as our means to manage incidents. The alert was on Slack, and the dashboard was on Grafana, which required us to use three different applications to do the same thing.
With PagerDuty Operations Cloud now, we don't need to go through multiple tools to manage alerts and incidents. We don't need to go through Jira to log incidents. It streamlines the process, and with incident management, it can escalate to the next person so that alerts are rarely missed. It has made our workflow easier and much more efficient.
What other advice do I have?
For incident management in my team, PagerDuty Operations Cloud has really helped with alerting in such a way that when an issue happens, it reaches out to the on-call engineer to ensure they don't miss it. There is a pop-up, probably on your browser or phone, and if you miss the pop-up or don't acknowledge it in time, it moves to your phone and starts calling; sometimes it sends texts and sometimes calls your phone. The call is very persistent, so if the incident is not acknowledged, it escalates to the next line, which can be your manager or your functional manager, and it keeps escalating until it gets acknowledged. This way, the alert is rarely missed because at some point, somebody will surely pick up.
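The escalation behaviour described above can be sketched as a simple loop. This is an illustrative model in plain Python, not PagerDuty's escalation engine; the level names and the `max_loops` limit are assumptions.

```python
def escalate(levels: list[str], acknowledged_by: set[str], max_loops: int = 2) -> list[str]:
    """Return the order in which people are notified until someone acknowledges.

    `levels` is the escalation chain, first responder first. `acknowledged_by`
    is the set of people who would acknowledge if notified.
    """
    notified = []
    for _ in range(max_loops):
        for person in levels:
            notified.append(person)
            if person in acknowledged_by:
                return notified  # stop escalating once acknowledged
    return notified  # nobody acknowledged within the allowed loops


chain = ["on_call_engineer", "manager", "functional_manager"]  # hypothetical chain
print(escalate(chain, {"manager"}))
```

In this sketch the on-call engineer misses the alert, so the manager is notified next and the escalation stops there.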
PagerDuty Operations Cloud helps us effectively manage incidents without needing to sit down all day and watch our screens.
Alerting is key, and scheduling is also important but not as crucial as alerting. We also use incident management and incident reporting, which allow us to manage who should be escalated to during incidents and keep track of when incidents happen and when they are resolved so that everyone knows what occurred and how it was handled.
PagerDuty Operations Cloud has positively impacted my organization by providing effectiveness and efficiency in the way we work, with less alert fatigue, meaning alerts are rarely missed. For example, if four or five alerts come on Slack at the same time, you might miss them while focusing on resolving current issues. However, with PagerDuty Operations Cloud, since it calls for every issue, you will see any new alerts and resolve them, thus reducing missed alerts and increasing efficiency. This leads to better service for our end users, increased profit, and less pressure on engineers, making it a win-win for everybody.
Our MTTR has significantly reduced; however, I cannot provide specific numbers because with Slack, we were not measuring it accurately. Now with PagerDuty Operations Cloud, we can measure how long it takes to acknowledge alerts and resolve issues, giving us metrics to manage this effectively.
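Once trigger, acknowledge, and resolve timestamps are recorded, these metrics reduce to simple averages. The sketch below uses made-up incident records in plain Python; the field names and the sample timestamps are assumptions, not data from our environment.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records with the three timestamps of interest.
incidents = [
    {
        "triggered": datetime(2024, 1, 1, 10, 0),
        "acknowledged": datetime(2024, 1, 1, 10, 4),
        "resolved": datetime(2024, 1, 1, 10, 34),
    },
    {
        "triggered": datetime(2024, 1, 2, 22, 0),
        "acknowledged": datetime(2024, 1, 2, 22, 6),
        "resolved": datetime(2024, 1, 2, 22, 56),
    },
]


def mean_minutes(records, start_key, end_key):
    """Average elapsed time in minutes between two timestamps across records."""
    return mean((r[end_key] - r[start_key]).total_seconds() / 60 for r in records)


mtta = mean_minutes(incidents, "triggered", "acknowledged")  # mean time to acknowledge
mttr = mean_minutes(incidents, "triggered", "resolved")      # mean time to resolve
print(f"MTTA: {mtta} min, MTTR: {mttr} min")
```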
My advice for others looking into using PagerDuty Operations Cloud is that if their workflow requires them to be alert to incidents while continuing their work without being tethered to a screen, it is a very helpful tool to have.
One additional thought about PagerDuty Operations Cloud is that if they started issuing certificates for completing courses on PagerDuty University, it would encourage more people to engage with the training, similar to how New Relic operates. Having a certificate would demonstrate rigorous training and the capability to apply what was learned. I would rate this product a 6 out of 10.