    PagerDuty Operations Cloud

    The PagerDuty Operations Cloud is essential infrastructure for all unplanned, time-sensitive, critical work. It automatically detects and diagnoses disruptive events, mobilizes the right team members to respond, and automates infrastructure and workflows across your digital operations. This means you can resolve unplanned, unstructured, time-sensitive, and high-impact issues quickly, with fewer escalations to your technical teams, while minimizing the impact on your customers and maintaining brand trust.

    Ratings and reviews

    4.5
    1007 ratings
    36 AWS reviews | 971 external reviews
    External reviews are from G2 and PeerSpot.

    Reviews (1007)
    Trevis Macnon

    Incident tracking has become centralized and real-time alerts have supported structured responses

    Reviewed on Jun 09, 2026
    Review from a verified AWS customer

    What is our primary use case?

    We deployed PagerDuty solely to resolve incidents that were previously scattered across emails, making them difficult to track. PagerDuty has a brilliant interface that logs incidents and helps me track issues encountered with software and apps on my workstation.

    How has it helped my organization?

    PagerDuty has been instrumental in our IT department as we integrate it with monitoring platforms to help trigger alerts in real-time.

    What is most valuable?

    The most valuable features are reliable alert escalation, seamless on-call scheduling across teams, structured response, and strong integrations.

    What needs improvement?

    I would like to see the configuration simplified.

    For how long have I used the solution?

    I have been using the solution for 5 years.

    Which solution did I use previously and why did I switch?

    We have not used any other solutions before PagerDuty.

    What's my experience with pricing, setup cost, and licensing?

    It's a great tool and very reliable, with alert escalation that keeps incidents from falling through the cracks.

    Which other solutions did I evaluate?

    We considered Slack as an alternate solution.

    What other advice do I have?

    PagerDuty has a complex setup, and its advanced features feel hidden within this complexity.

    Saurab Gnagurde

    On-call automation has reduced critical incident impact and ensures faster production responses

    Reviewed on Jun 08, 2026
    Review from a verified AWS customer

    What is our primary use case?

    As part of a cloud operations team, I was a user who configured the alerts. Important incidents or anomalies that needed immediate attention were routed through the APM tools we integrated with PagerDuty Operations Cloud. Our team supported the platform in rotational shifts. My role involved assigning people to shifts according to the shift roster, so that whenever an incident triggered, the right person would get the call. The primary use was supporting production operations and cloud activities.

    Our multi-environment setup consists of AWS infrastructure, Linux servers, Kubernetes clusters, and customer-facing applications. PagerDuty Operations Cloud was mainly used for incident management and alerting. We integrated it with AppDynamics, Instana, and CloudWatch, which monitored the platform and its patterns. PagerDuty Operations Cloud then generated critical alerts, and the support team working the current shift was notified immediately. This platform really helped us manage production incidents beyond service outages: mostly high CPU utilization, application failures, pod issues in Kubernetes, and infrastructure-related alerts. We configured all kinds of alerts and ensured they were routed to the correct on-call person, which helped us reduce response time in critical situations.

    What is most valuable?

    One of the best features of PagerDuty Operations Cloud is its on-call rotation scheduling and escalation management. If an engineer did not acknowledge an alert within a defined time frame, the incident was automatically escalated to the next person, support team, or manager of that specific team. Another useful feature was its integration capability. We were able to integrate PagerDuty Operations Cloud with monitoring and observability tools so that alerts were generated automatically, almost instantly, whenever issues were detected in the environment. The mobile application was also very helpful because engineers could receive calls and notifications, acknowledge incidents, and track updates even when they were away from their laptops.
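
    The time-based escalation described above can be pictured with a small sketch. This is only an illustration of the idea, not PagerDuty's implementation or API: the `EscalationLevel` class, the `responder_to_notify` function, the responder names, and the timeouts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EscalationLevel:
    responder: str        # who is paged at this level (hypothetical names)
    timeout_minutes: int  # how long to wait for an acknowledgment


def responder_to_notify(levels, minutes_since_trigger):
    """Return who should currently be paged for an unacknowledged incident."""
    elapsed = 0
    for level in levels:
        elapsed += level.timeout_minutes
        # Still inside this level's acknowledgment window
        if minutes_since_trigger < elapsed:
            return level.responder
    # Every level timed out: stay with the last level in the policy
    return levels[-1].responder


# Assumed example policy: engineer first, then team lead, then manager
policy = [
    EscalationLevel("on-call engineer", 15),
    EscalationLevel("support team lead", 15),
    EscalationLevel("team manager", 30),
]

print(responder_to_notify(policy, 5))
print(responder_to_notify(policy, 20))
```

    With this assumed policy, an incident that stays unacknowledged for 20 minutes has already moved from the on-call engineer to the support team lead.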

    I also valued the centralized incident management dashboard that provides visibility into active incidents, response status, escalation history, and overall operational health. I used to get all the data accumulated there through the dashboard.

    PagerDuty Operations Cloud helps us manage production incidents beyond service outages, mostly high CPU utilization where we set alerts, application failures, pod issues in Kubernetes, and infrastructure-related alerts.

    What needs improvement?

    My experience with PagerDuty Operations Cloud has been positive overall. One area where I believe improvement can be made is reporting and dashboard customization to make it more user-friendly. The operations team often requires different views compared to the management team. Having more flexibility in generating custom reports would be helpful. Another improvement could be providing more advanced AI-driven collaboration capabilities to reduce unnecessary noise alerts and help the team focus on the most critical issues. Apart from these areas, the platform is very reliable and effective for managing production incidents and on-call operations.

    For how long have I used the solution?

    I have been using PagerDuty Operations Cloud for almost five to six years.

    What do I think about the stability of the solution?

    PagerDuty Operations Cloud has been stable and performing well wherever our incident management or alerting was configured for production support. Timely notifications and incident responses were critical. PagerDuty Operations Cloud delivers alerts immediately through multiple channels which we configured, including mobile on-call notifications, email, SMS, and phone calls. Since PagerDuty Operations Cloud was integrated with our monitoring and observability tools, it helped ensure that critical incidents were captured and routed to the appropriate on-call team. During my usage, I did not encounter any significant outages or stability issues that impacted our operations due to PagerDuty Operations Cloud.

    What do I think about the scalability of the solution?

    PagerDuty Operations Cloud is highly scalable and works well with small and large environments. The project I worked on was integrated with multiple application servers and cloud resources for monitoring. PagerDuty Operations Cloud handles all the alerts from different resources and routes them to the appropriate teams. As the infrastructure grows, new services get implemented, escalation policies get defined, and schedules and teams are easily available without requiring major changes in our existing setup. This makes it suitable for an organization to manage large cloud infrastructure and multiple team supports.

    Which solution did I use previously and why did I switch?

    When I joined this project, they had already implemented PagerDuty Operations Cloud. When I joined, the SOPs and testing were already in process. After a few days, when I was actually onboarded, many of the alerts were configured in PagerDuty Operations Cloud. I did not get the chance to work on different tools besides PagerDuty Operations Cloud.

    How was the initial setup?

    During the initial setup of PagerDuty Operations Cloud, when I joined the project, I got a Jira ticket listing a few of the servers where I needed to install PagerDuty agents so it could trigger any alerts or integrate with the server. I was mostly involved in the configuration part.

    The setup was straightforward. PagerDuty Operations Cloud also helped us in this process. It was not directly integrated on the individual servers; instead, we integrated our monitoring and observability tools with PagerDuty Operations Cloud. The servers and applications were monitored through application monitoring tools such as Instana, Zabbix, and Splunk. Whenever critical alerts were generated, they were automatically forwarded to PagerDuty Operations Cloud through the integrations we had configured. PagerDuty Operations Cloud would notify the on-call engineers and follow the escalation policies if the alerts were not acknowledged within a specific time. Our flow was as follows: we had EC2 instances, AWS servers, and CloudWatch alarms, and when an alarm triggered, it was sent through Amazon SNS (Simple Notification Service) to PagerDuty Operations Cloud and then to the on-call engineer.
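
    For readers who want to picture the CloudWatch to SNS to PagerDuty hand-off, here is a minimal sketch of how an alarm notification could be mapped to an event in the shape of the PagerDuty Events API v2. The review does not say how the mapping was done, and PagerDuty's own CloudWatch integration normally handles it, so treat this as an assumption-laden illustration: the function name, the sample alarm, the routing key placeholder, and the fixed `critical` severity are all made up. No network call is made.

```python
import json


def alarm_to_pagerduty_event(sns_message, routing_key):
    """Map a CloudWatch alarm notification (JSON text from SNS) to an event dict."""
    alarm = json.loads(sns_message)
    state = alarm["NewStateValue"]
    if state == "ALARM":
        action = "trigger"   # open (or update) an incident
    elif state == "OK":
        action = "resolve"   # close the incident once the alarm recovers
    else:
        return None          # e.g. INSUFFICIENT_DATA: nothing to send
    return {
        "routing_key": routing_key,
        "event_action": action,
        # Reusing the alarm name lets the later resolve match the earlier trigger
        "dedup_key": alarm["AlarmName"],
        "payload": {
            "summary": f'{alarm["AlarmName"]}: {alarm["NewStateReason"]}',
            "source": alarm.get("Region", "unknown"),
            "severity": "critical",  # assumption: every alarm is treated as critical
        },
    }


# Hypothetical alarm message, trimmed to the fields used above
sample = json.dumps({
    "AlarmName": "high-cpu-web-01",
    "NewStateValue": "ALARM",
    "NewStateReason": "Threshold crossed: CPUUtilization above 90",
    "Region": "US East (N. Virginia)",
})

event = alarm_to_pagerduty_event(sample, "YOUR_ROUTING_KEY")
print(json.dumps(event, indent=2))
# A real sender would POST this JSON to https://events.pagerduty.com/v2/enqueue
```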

    What about the implementation team?

    We followed the documentation provided by PagerDuty Operations Cloud for the configuration part.

    The documentation is full-fledged with proper details on how to configure it depending on the integration with any application monitoring tool. They specify what steps need to be followed. If integrating with servers, they mention which type of server, whether it is Windows or Linux, and accordingly, they have provided all the documents. The documentation is comprehensive and easy to understand, such that even a layperson can do the configuration part with the way they have provided the documentation.

    What other advice do I have?

    We are not mostly focused on utilizing PagerDuty's autonomous AI agents because we are working on cloud infrastructure where we do the deployments. We have not implemented AI in our cloud to that extent. Going forward, if our infrastructure is AI-based, then we will definitely explore where PagerDuty Operations Cloud can help in that.

    As of now, we do not use the generative AI capabilities of PagerDuty Operations Cloud. Our infrastructure is huge, and there is a dedicated developer team working on AI-related things. They are currently running two POCs, which are still being evaluated. Only if the results look good can we roll this out into production, because my application is customer-facing and we do not want anything to go wrong, such as an alert triggering unnecessarily or an AI-handled alert failing to notify us. That would ultimately cause us to miss our SLAs and SLOs, and all the other escalation matrices would come into the picture. That is why we are still in POCs, as it is critical.

    That part is taken care of by a different team or mostly the clients themselves. My main role is to keep the environment always up and running, and all alerts should be properly centralized and customized accordingly.

    PagerDuty Operations Cloud is basically where we get the alert, and we can integrate through Slack and on-call rotational shifts on cell phones. Prior to this, we were mostly relying on application monitoring tools only and emails and Slack notifications. If an on-call shift person is not at their desk and if any alert has been triggered and no one is there to acknowledge it or look into it and take necessary action, then ultimately there will be customer impact. That is why we implemented PagerDuty Operations Cloud. Even if the on-call person is not near their laptop, they will get the call and can immediately acknowledge and report to the team that we have received a P1 call for this specific environment or that the alert is regarding a production issue. Another team member will immediately take action, so there will not be any miss.

    I did not encounter any issues that required contacting support for PagerDuty Operations Cloud. This review represents an overall rating of 9 out of 10.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    reviewer2848911

    Centralized incident workflows have reduced outage windows and improved response coordination

    Reviewed on Jun 04, 2026
    Review from a verified AWS customer

    What is our primary use case?

    PagerDuty is predominantly used for our enterprise notifications for all of the incident management processes, especially the major incident management. We have many applications and infrastructure components. Earlier, we used a solution that only provided text-based communication. When we wanted to look for something with multi-channel notification and correlation capability, that is where we leveraged PagerDuty Operations Cloud.

    I am currently going through the governance process to get additional capabilities onboarded. GenAI is not yet enabled since I am from a regulated organization and had to secure approvals before enabling any AI-related components. Most probably in the next two or three months, we will be enabling GenAI, the SRE agent, and the AI capabilities of PagerDuty Operations Cloud.

    What is most valuable?

    The ease of use is one of the key strengths. Creating the escalation policies and notification channels per user is straightforward, and it is not a requirement that everyone has the same notification rules. Users have flexibility in getting the communication they need. Event orchestration is the other part which works well for us.

    Primarily, we were able to get the right people at the right time through our escalation mechanism, which is an automated switch from level one to level two. This helped us improve the overall MTTA, and the acknowledgment rate has drastically improved. For the major incidents, we were able to triage everything with PagerDuty Operations Cloud itself instead of switching between multiple tools such as Teams or other orchestration platforms. With one solution, we are able to do the triaging, and that definitely reduced the outage window and the average outage window.

    We do have automations in two main ways. One is the incoming automation where we have multiple monitoring tools and systems that generate events. We ingest them into PagerDuty Operations Cloud and then using event orchestration, we create all of the respective incidents, whether they are PagerDuty Operations Cloud-only incidents, ServiceNow incidents, or different methods we use. The other automation method is incident workflows where we are able to call out to respective endpoints for the remaining automations. This is growing at this point in time, but event orchestration is mainly what we use for the automation of the triaging.
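
    The incoming automation described above, where events from several monitoring tools are ingested and rules decide what kind of incident to create, is essentially first-match rule routing. The sketch below illustrates that idea only; it is not PagerDuty's Event Orchestration rule syntax, and the field names, destinations, and rules are hypothetical.

```python
def route_event(event, rules, default="pagerduty_incident"):
    """Return the destination of the first rule whose condition matches the event."""
    for condition, destination in rules:
        if condition(event):
            return destination
    # No rule matched: fall back to a plain incident
    return default


# Assumed rules: critical production events also open an ITSM ticket,
# while purely informational events are suppressed
rules = [
    (lambda e: e.get("severity") == "critical" and e.get("env") == "production",
     "servicenow_incident"),
    (lambda e: e.get("severity") == "info", "suppress"),
]

print(route_event({"severity": "critical", "env": "production"}, rules))
print(route_event({"severity": "warning", "env": "staging"}, rules))
```

    Rule order matters in a first-match design like this one, so the more specific conditions go first.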

    We used to have a two-digit figure of MTTA, and now it is reduced to less than one hour.

    Getting the right people on board whenever there is a major issue and dialing them individually took a longer time. Now with PagerDuty Operations Cloud, having all of the predefined rules and the orchestrations we can create, it is definitely bringing value. Bringing the right people at the right time and improving the restoration time so that we do not impact any of the business end-user services is where PagerDuty Operations Cloud definitely plays a key role in delivering the business value.

    What needs improvement?

    I have submitted a few enhancement requests. Dynamic scheduling is something I have been waiting on for almost three or four years, and I believe it is finally arriving in a few weeks. In operations, shift rosters are not static: people may rotate between different shifts, and setting that up in PagerDuty Operations Cloud was a challenging task. Dynamic rosters are in the early access stage, and I believe they will address this issue. From the reporting perspective, there is a wide variety of reports, but the out-of-the-box reports could be matured further. We are getting customized reports through professional services, which is beneficial, but it would help even more if they were available out of the box. There are plenty of reports, but there is still room for them to mature.

    For how long have I used the solution?

    I have been using PagerDuty Operations Cloud for almost four years.

    What do I think about the stability of the solution?

    There are not many issues except during Cloudflare or major AWS issues. Otherwise, we do not have any performance issues. The platform is performing well.

    What do I think about the scalability of the solution?

    PagerDuty Operations Cloud is scalable, but the business model you choose matters. We are on a per-user license basis, so we know how many users we can onboard to PagerDuty Operations Cloud. Everything else is definitely scalable, depending on what you agree with them at the contractual level. There is no challenge with that unless you have not calculated or forecasted your requirements.

    How are customer service and support?

    I used PagerDuty Operations Cloud support.

    I would say they are pretty good, with regular support scoring eight or nine out of ten, and professional services scoring around nine out of ten. Both are pretty good for our business requirements.

    Which solution did I use previously and why did I switch?

    We were using different HP tools for all of the alerting and also a solution from OnSolve, earlier called TelAlert. Those solutions were distributed and not one central solution for incident management and alerts. Now it is centralized with one of our ITSM tools and PagerDuty Operations Cloud for both alerting and incident management.

    How was the initial setup?

    The initial setup was comparatively easy. We had to train the people because it was a new solution altogether. We got professional services support, and they helped us move forward. We did not have many challenges on the system level. Only user experience took more time as the team needed to learn how to use and operate the solution.

    What about the implementation team?

    I used PagerDuty Operations Cloud support.

    What was our ROI?

    From the pricing perspective, we got a good deal. When we took the tool, we did a comparison of the competitors and evaluated, and we are satisfied with that pricing. From the ROI perspective specific to the tool, we have not had a chance to calculate it. But overall, with the end-to-end process where PagerDuty Operations Cloud is present, I think we are almost near to getting the ROI.

    Which other solutions did I evaluate?

    We evaluated Twilio and two other solutions at that time.

    What other advice do I have?

    I would definitely advise others to run a PoC, integrate it with their existing ITSM tools or whatever else they need, and thoroughly verify it with one round of end-to-end testing. They should also simulate a major incident and compare the metrics they track internally against what the new PagerDuty Operations Cloud solution could add. I think these two things should definitely be tested.

    PagerDuty Operations Cloud as a product, I would give an eight out of ten. The only reason I put eight instead of ten is the enhancement requests or any new features. The time to market has to be much faster than what they have at this point. Some flexibility on the customization should also be provided. My overall review rating for this product is eight out of ten.

    Which deployment model are you using for this solution?

    On-premises

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    reviewer2848710

    Cloud platform has improved efficiency and time savings but still needs stronger AI integrations

    Reviewed on Jun 03, 2026
    Review from a verified AWS customer

    What is our primary use case?

    I have been using PagerDuty Operations Cloud for the past few months. My main use case is that I use it in my day-to-day applications. A specific example of how I use PagerDuty Operations Cloud in my application is that I use it for hosting my agent. Hosting my agent on PagerDuty Operations Cloud helps me with my day-to-day work by being efficient in terms of scalability and managing infrastructure. It has been pretty helpful.

    What is most valuable?

    PagerDuty Operations Cloud's best features include scalability, managing infrastructure, and managing other services. It helps me manage other services comprehensively, and I think it is pretty good overall. PagerDuty Operations Cloud has positively impacted my organization by being effective in terms of managing the system and in terms of scalability. A specific outcome that shows how PagerDuty Operations Cloud has helped my organization is that it has improved efficiency and helped in saving a lot of time.

    What needs improvement?

    I think PagerDuty Operations Cloud can be improved in terms of services, such as integration with AI.

    For how long have I used the solution?

    I have been working in my current field for the past three years.

    What do I think about the stability of the solution?

    PagerDuty Operations Cloud has been stable based on what we have used.

    What do I think about the scalability of the solution?

    PagerDuty Operations Cloud's scalability has been pretty good because we are able to spin up different resources based on the use case and load.

    How are customer service and support?

    We have not used customer support explicitly.

    Which solution did I use previously and why did I switch?

    I did not previously use a different solution.

    What was our ROI?

    I have seen a return on investment, as I mentioned earlier; there has been a lot of improvement in terms of time and cost.

    What's my experience with pricing, setup cost, and licensing?

    My experience with pricing, setup cost, and licensing has been quite reasonable and cost-effective.

    Which other solutions did I evaluate?

    Before choosing PagerDuty Operations Cloud, I did not evaluate other options.

    What other advice do I have?

    I would rate PagerDuty Operations Cloud six out of ten because I believe you can add more features to make the platform even better. Regarding PagerDuty Operations Cloud's AI capabilities, I think its governance and security are pretty good, and the applications are quite secure. As for PagerDuty Operations Cloud's accuracy and reliability of output, I think the accuracy is pretty high and pretty good, and I believe it should be quite reliable, though I have not explored much on the recent AI capabilities.

    I would definitely suggest PagerDuty Operations Cloud as a good platform, but it depends on your use case and the amount of scalability that you are looking for. PagerDuty Operations Cloud is pretty good and quite helpful. My overall rating for PagerDuty Operations Cloud is six out of ten.

    Which deployment model are you using for this solution?

    Private Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    reviewer2848644

    Centralized alerts have improved incident response and now support flexible on-call workflows

    Reviewed on Jun 03, 2026
    Review provided by PeerSpot

    What is our primary use case?

    My main use case for PagerDuty Operations Cloud is on-call staff. For instance, when we have sites go down, we need somebody to investigate, so we require a text SMS or a phone call alert.

    What is most valuable?

    PagerDuty Operations Cloud offers several best features including cloud-based hosting, reliable performance, and flexible expandability.

    Regarding the flexibility and expandability, you can scale the number of employees up and down, add different paths for contacting people, and use monitoring capabilities, all of which has greatly helped my team.

    PagerDuty Operations Cloud has positively impacted my organization with its very good interface and centralized operation. Having a centralized interface has made things easier by providing easy access administration.

    What needs improvement?

    PagerDuty Operations Cloud could be improved with clearer instructions for beginners.

    For how long have I used the solution?

    I have been using PagerDuty Operations Cloud for a year.

    What do I think about the stability of the solution?

    PagerDuty Operations Cloud is stable.

    What do I think about the scalability of the solution?

    The scalability of PagerDuty Operations Cloud is very good; when we need to add or reduce employees, it can adjust.

    How are customer service and support?

    Customer support has been very good, and I can reach somebody anytime. I would rate customer support an eight on a scale of one to ten.

    Which solution did I use previously and why did I switch?

    Previously, we used just a custom alerting solution.

    How was the initial setup?

    We are testing AI and automation through PagerDuty Operations Cloud for incident response right now, but not too much has changed yet.

    What's my experience with pricing, setup cost, and licensing?

    My experience with pricing, setup cost, and licensing has been fairly reasonable and not expensive.

    Which other solutions did I evaluate?

    Before choosing PagerDuty Operations Cloud, I did not evaluate other options and only considered some standard custom operator solutions.

    What other advice do I have?

    I would rate PagerDuty Operations Cloud an eight out of ten because it is pretty good, but it is not perfect yet. Regarding PagerDuty Operations Cloud's AI capabilities, I think its governance and security are pretty good with no issues. Regarding PagerDuty Operations Cloud's AI capabilities, I find its accuracy and reliability of output to be pretty accurate and pretty stable. My advice to others looking into using PagerDuty Operations Cloud is to see how many users you need and use the licensing accordingly. My overall review rating for PagerDuty Operations Cloud is eight out of ten.

    Aksharma Aksharma

    Automated incident alerts have reduced response times and improved on-call efficiency

    Reviewed on Jun 02, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My main use case for PagerDuty Operations Cloud is monitoring multiple platforms. In cloud operations, whenever any application or device has an issue, it triggers PagerDuty, and the on-call shift engineer can immediately check on it.

    For a specific example of how I have used PagerDuty Operations Cloud to handle an incident, if any media connect flow breaks or applications deployed on cloud instances have issues, we start receiving alerts. This can be configured using PagerDuty schedules to inform the on-call engineer to take immediate action.

    Regarding my main use case, I would add that we do not need to be on the dashboards or monitor manually. Instead, our APIs work in the back-end. They check for things working on the platform's cloud, and if any action is required, the appropriate team can be aligned using PagerDuty.

    What is most valuable?

    The best features PagerDuty Operations Cloud offers are the policies and the insights. It provides a lot of the data required to drill down into the specific errors we have to work on. The escalation policy works well because it notifies the people required to work on those issues in a timely manner.

    I use those insights and escalation policies in my day-to-day work to review the root causes of why we are getting those issues and why in such high numbers. It helps us drill down into the specific areas where we need to improve in order to make the solution more robust. It also creates the required awareness. We can use PagerDuty Operations Cloud to notify the correct stakeholders who need to be involved when such issues occur, based on the rules we have defined.

    PagerDuty Operations Cloud has positively impacted our organization by helping us save a lot of costs, and our responsiveness and actions towards any issues on the platform have improved manyfold. If any issue occurs in real time, we get paged, and the correct engineer starts working on it. Considering we have thousands of customers, it is not possible to monitor all of them, but PagerDuty Operations Cloud helps in defining which alerts are actionable and which can be ignored.

    Regarding response times, it used to take around 10 to 15 minutes, and now that can be achieved within seconds. Similarly, for operations, around five to six engineers were required in a shift, and now that can be fulfilled using two to three engineers. That definitely saves a lot of effort and time and helps to improve operations as well.

    What needs improvement?

    PagerDuty Operations Cloud can be improved by building proper AI into it, so that several actions can be triggered from PagerDuty Operations Cloud itself instead of by writing code. That could be a good approach. Additionally, better alert grouping would be beneficial. Although alert grouping exists in PagerDuty Operations Cloud, it is not that effective.

    Regarding PagerDuty Operations Cloud's AI capabilities, I am not aware of how the security and governance aspects are being handled. An AI tool from PagerDuty Operations Cloud would be helpful if it could read the correct metrics and perform actions automatically. Based on the data we can feed it, we can establish use case scenarios.

    Concerning PagerDuty Operations Cloud's AI capabilities, I think the accuracy and reliability of its output are still maturing. I would say it is going well. Still, a lot of work needs to be done to get this working smoothly. Whatever we have achieved so far is decent to use.

    Customer support can be improved. At times, we get a delay in response. It takes time to get things back on track and to get the fulfillment done. That is something PagerDuty can work on.

    For how long have I used the solution?

    I have been using PagerDuty Operations Cloud for around seven to eight years.

    What do I think about the stability of the solution?

    PagerDuty Operations Cloud is stable.

    What do I think about the scalability of the solution?

    Regarding PagerDuty Operations Cloud's scalability, it is good. As and when you require it based on the license, you can immediately get it. That is not much of a problem for us.

    How are customer service and support?

    Customer support can be improved. At times, we get a delay in response. It takes time to get things back on track and to get the fulfillment done. That is something PagerDuty can work on. I would rate the customer support on a scale of one to ten as seven.

    Which solution did I use previously and why did I switch?

    We were using PagerDuty Operations Cloud only. We did not use any other solution; it was always PagerDuty Operations Cloud.

    How was the initial setup?

    Regarding my experience with pricing, setup cost, and licensing, pricing looks a little on the higher side, which can definitely be improved. Setup is quite easy and nice and convenient to use.

    What about the implementation team?

    PagerDuty Operations Cloud was purchased by a security team. They take on those licenses in bulk. It is being used. It is not us directly purchasing it, but there is a specific team for that.

    What was our ROI?

    I have seen a return on investment. Previously, this was handled by a team of 40 to 50 people. Now, in terms of licenses, we can operate similar functionality with fewer people using AI tools in place. Those actions get automatically performed, so not everyone needs licenses. This definitely saves cost.

    What's my experience with pricing, setup cost, and licensing?

    Regarding my experience with pricing, setup cost, and licensing, pricing looks a little on the higher side, which can definitely be improved.

    Which other solutions did I evaluate?

    Before choosing PagerDuty Operations Cloud, we evaluated Opsgenie, which looks good, but it is in the initial state. It will be interesting to see how they evolve soon. Datadog is good. Then there is Grafana On-call and GoAlert. There used to be a tool called One Time. That was quite decent as well. But trusting the legacy of PagerDuty Operations Cloud and its reliability, we went with PagerDuty Operations Cloud.

    What other advice do I have?

    My advice to others looking into using PagerDuty Operations Cloud is that it is quite a stable platform. You can use it and work on it easily. It is an effective tool. Considering what is available in the market right now, PagerDuty Operations Cloud definitely has an edge in functionality and robustness.

    Concerning PagerDuty Operations Cloud's generative AI's effectiveness in providing insights for decision-making, it is good. These are new types of models, so it is getting tuned to our requirements and business requirements. Initially it is good, but it can improve.

    For automations, we use PagerDuty Operations Cloud to trigger our own tool. As soon as the tool is triggered with the specific information, the respective actions are performed. This has become largely automated, and there is a team working on AI as well. They are looking to build this out, but it is a work in progress as of now.

    I am not aware of much regarding the solution's alert reduction feature on preventing costly incidents in my organization.

    PagerDuty Operations Cloud's AI functionality saves time and has improved my team's ability to focus on core tasks rather than routine issues. Repetitive actions, reporting, and similar tasks have become automatic. A person does not need to sit and perform them. It has saved a lot of effort and time and reduced the monotony of the work. I would rate this product overall as an eight out of ten.

    Which deployment model are you using for this solution?

    Private Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    Di Singh Negi

    Incident workflows have transformed and now reduce downtime for critical gaming services

    Reviewed on May 31, 2026
    Review from a verified AWS customer

    What is our primary use case?

    My name is Dinesh Singh Negi and I currently work as a Lead DataOps Engineer in the online gaming industry. My primary responsibility is ensuring the reliability, availability, and performance of our data platform and complete production system. I work extensively with AWS services, Prometheus, Grafana, and PagerDuty for monitoring, alerting, and incident management. My team supports critical gaming workloads and data pipelines that require high uptime and quick incident response. A significant part of my role involves setting up monitoring strategies, managing on-call operations, handling production incidents, and performing root cause analysis. We drive operational improvements, and we use PagerDuty Operations Cloud as our central incident management platform to ensure alerts are routed to the right team and escalated appropriately. I have been working in operations and reliability for nine to ten years and have hands-on experience managing large-scale customer-facing environments where minimizing downtime and reducing mean time to resolution are key priorities. We use PagerDuty Operations Cloud to track the maximum time to acknowledgment and maximum time to resolution, and to derive meaningful analysis from the incidents that have been triggered to different teams.

    I have worked for nine to ten years in operations, production support, reliability engineering, and mixed roles. During this time, I have worked extensively on monitoring, incident management, system reliability, and operational excellence, particularly supporting large-scale online platforms and data operations. For five to six years, my focus has been on ensuring high availability, managing production incidents, optimizing monitoring and alerting strategies, and improving operational processes. Throughout these years, I have gained hands-on experience with AWS Cloud, Prometheus, Grafana, and PagerDuty Operations Cloud, which are the core tools we use for monitoring, alerting, and incident response.

    What is most valuable?

    The best features are those we have been using for incident management. We have been using PagerDuty Operations Cloud for on-call scheduling, escalation policies, and integration capabilities. Incident management is extremely valuable because it ensures critical alerts are delivered to the right people immediately. On-call scheduling and escalation policies are very helpful because we can define clear ownership for the services and automatically escalate incidents if they are not acknowledged within a specific timeframe. Another key strength is the integration ecosystem. We can integrate it with our monitoring stack including Prometheus, Grafana, and AWS services, which helps us automate alerts ingestion and incident creation without manual intervention. The most valuable features are automating alerts, escalations, on-call management, integrations, and incident analytics.

    One example that stands out was a production incident where we experienced a sudden spike in database latency during peak gaming hours. This started impacting player transactions and causing delays in some backend services. Our Prometheus and Grafana monitoring detected the abnormal increase in latency and error rate, which exceeded a threshold, and the alert was automatically routed to PagerDuty Operations Cloud. PagerDuty Operations Cloud immediately notified our team's on-call engineer and triggered the escalation workflow based on the incident severity. Since the issue occurred during peak traffic, a quick response was critical, and we achieved it. PagerDuty Operations Cloud helped us coordinate multiple teams, including DataOps, application, and other infrastructure teams. The platform helped ensure everyone was engaged quickly and that no critical notifications were missed. During the investigation, we identified a resource bottleneck in the database layer caused by an unexpected traffic surge. With the help of the database team, we scaled the required AWS resources and optimized a few long-running queries. This restored normal performance.
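
    The threshold-to-incident flow described above can be sketched roughly as follows. This is a minimal illustration, not the reviewer's actual setup: the threshold value, the service name, the routing key, and the function name are all hypothetical. The event fields follow the general shape of a PagerDuty Events API v2 trigger event (routing_key, event_action, dedup_key, payload), and the sketch only builds the event rather than sending it.

```python
import json

# Hypothetical threshold; the review does not state the actual value.
LATENCY_THRESHOLD_MS = 250.0


def build_trigger_event(routing_key, service, latency_ms):
    """Return a trigger event dict if latency exceeds the threshold, else None."""
    if latency_ms <= LATENCY_THRESHOLD_MS:
        return None
    return {
        "routing_key": routing_key,   # identifies the receiving service integration
        "event_action": "trigger",    # open (or add to) an incident
        # Same key for repeated firings, so they collapse into one alert.
        "dedup_key": f"{service}-db-latency",
        "payload": {
            "summary": (
                f"Database latency {latency_ms:.0f} ms on {service} "
                f"exceeds {LATENCY_THRESHOLD_MS:.0f} ms"
            ),
            "source": service,
            "severity": "critical",
        },
    }


# Hypothetical values for illustration only.
event = build_trigger_event("EXAMPLE_ROUTING_KEY", "player-transactions", 900.0)
print(json.dumps(event, indent=2))
```

    In a setup like the one described, the monitoring integration would normally construct and send this event itself; the sketch only shows what kind of information travels from the monitoring side to the incident platform.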

    What needs improvement?

    A significant positive impact has been improved incident response efficiency and overall service reliability. Before we had a mature incident management process, coordinating responses during critical issues often required manual communication and follow-ups. PagerDuty Operations Cloud automated all of those things, including alert ownership and escalation, ensuring that incidents are routed to the right team members immediately. One of the most measurable benefits is the reduction in mean time to acknowledge and mean time to resolve. Faster detection and response help minimize service disruptions and maintain a stable experience for our users, which is especially important in the online gaming industry where availability and performance directly affect customer satisfaction. The platform has helped us mature our operational practices by analyzing incident trends, alert volumes, and escalation patterns. We have been able to refine our monitoring, reduce alert fatigue, and proactively address recurring issues before they become major bottlenecks in production.

    For how long have I used the solution?

    I have been using PagerDuty Operations Cloud for more than five years.

    What do I think about the stability of the solution?

    PagerDuty Operations Cloud is stable.

    What do I think about the scalability of the solution?

    When you are using a tool for incident response, you need to trust that notifications and escalations work when a critical event occurs. PagerDuty Operations Cloud has been very dependable in that regard. Another aspect we have found valuable is the flexibility to support different teams and services as our environment grows. We have added new applications, data pipelines, and AWS service resources. We are able to extend our PagerDuty Operations Cloud configuration without major challenges or changes to our overall operational model.

    Which solution did I use previously and why did I switch?

    I have not used any solution previously. Since the beginning of 2021, I have been using PagerDuty Operations Cloud.

    How was the initial setup?

    The setup and customization process was relatively straightforward. The integrations were one of the easiest parts. PagerDuty Operations Cloud provides well-documented integrations for monitoring tools and cloud platforms. Connecting it with our Prometheus, Grafana, and AWS monitoring stack did not require significant development efforts. The initial setup involved configuring alert routing, defining service ownership, and mapping severity levels to appropriate escalation policies. Customizing on-call schedules and escalation workflows was also quite flexible. We were able to create different schedules for various teams, define escalation paths based on incident severity, and establish notification rules that match our operational requirements. As our team and environment grew, we refined the configuration further by tuning alert thresholds and reducing noise to avoid alert fatigue. It is important to ensure engineers receive only actionable alerts rather than excessive notifications.

    What about the implementation team?

    PagerDuty Operations Cloud's AI and automation capabilities are primarily used for alert correlation, event intelligence, noise reduction, incident prioritization, and providing operational context to responders. These capabilities help engineers identify and respond to issues more quickly while keeping humans in control of critical decisions. We see value in the direction of autonomous operations. If AI agents continue to improve in areas such as incident triage, root cause analysis, and automated remediation for well-understood scenarios, they could further reduce response times and operational overhead.

    What was our ROI?

    We have seen a positive return on investment from PagerDuty Operations Cloud through improved operational efficiency, faster incident response, and reduced downtime. I cannot share financial figures, but I can speak to operational outcomes we have observed. Since implementing PagerDuty Operations Cloud and integrating it with our AWS, Prometheus, and Grafana monitoring stack, we have seen measurable improvements in incident metrics such as MTTA and MTTR, and reduced alert fatigue through event correlation and alert deduplication. These improvements have helped us a great deal.
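
    For readers unfamiliar with the metrics mentioned above, MTTA (mean time to acknowledge) and MTTR (mean time to resolve) are averages over incident timestamps. The sketch below shows one common way to compute them, assuming both are measured from the moment the incident is triggered. The incident records and field names are made up for illustration and are not data from this review.

```python
from datetime import datetime
from statistics import mean

# Made-up incident records; field names are hypothetical.
incidents = [
    {
        "triggered": "2026-01-05T10:00:00",
        "acknowledged": "2026-01-05T10:04:00",
        "resolved": "2026-01-05T10:40:00",
    },
    {
        "triggered": "2026-01-06T22:15:00",
        "acknowledged": "2026-01-06T22:17:00",
        "resolved": "2026-01-06T22:35:00",
    },
]


def minutes_between(start, end):
    """Elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def mtta_mttr(records):
    """Return (MTTA, MTTR) in minutes, both measured from the trigger time."""
    mtta = mean(minutes_between(r["triggered"], r["acknowledged"]) for r in records)
    mttr = mean(minutes_between(r["triggered"], r["resolved"]) for r in records)
    return mtta, mttr


mtta, mttr = mtta_mttr(incidents)
print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")  # MTTA: 3.0 min, MTTR: 30.0 min
```

    Tracking these two numbers before and after a tooling change is one way to make an improvement claim like the one above concrete.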

    Which other solutions did I evaluate?

    I did not get a chance to evaluate any other applications. When I was in the company, they were using PagerDuty Operations Cloud only, so I started with that.

    What other advice do I have?

    My advice would be to start with a clear incident management strategy rather than focusing only on the tool itself. PagerDuty Operations Cloud delivers the most value when you have well-defined service ownership, escalation policies, severity levels, and monitoring practices in place. The platform is very powerful, but its effectiveness depends on the quality of the alerts and operational processes behind it. I would also recommend investing time in alert tuning early on and integrating PagerDuty Operations Cloud with your monitoring stack, whether it is AWS, Prometheus, Grafana, or any other observability tool. Make sure the alerts being sent are actionable. Reducing noise from the beginning will help prevent alert fatigue and improve adoption among engineering teams. I would rate this product an eight out of ten.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    Patel Dhulva

    AI-driven incident management has reduced downtime and improves focus on strategic work

    Reviewed on May 30, 2026
    Review from a verified AWS customer

    What is our primary use case?

    PagerDuty Operations Cloud is a multifunctional digital operations platform that meets my organization's needs.

    I am impressed by this digital operations solution because it is the most appropriate tool for incident detection and alerting.

    PagerDuty Operations Cloud is a very user-friendly tool, highly accurate, and an easy-to-customize digital operations management system that suits my organization's needs.

    It has intelligent noise reduction capabilities that play a significant role in minimizing alert floods.

    What is most valuable?

    PagerDuty Operations Cloud offers top-tier features that enable real-time alerting and accelerate incident response.

    The solution is reliable and effective when it comes to automating routine diagnostic tasks.

    The real-time alerting and automation features have helped my team: problem-solving has become automatic, and incident management has become less complex.

    PagerDuty Operations Cloud has positively impacted my organization by enabling faster issue response, which helped reduce downtime, saved revenue by avoiding long outages, improved team accountability during incidents, reduced manual effort in handling alerts, and helped maintain a better customer experience.

    The solution's alert reduction feature has had a major impact on preventing costly incidents in my organization. By grouping related alerts and de-duplicating noise, my team was able to spot real issues faster instead of getting buried in alerts, helping us prevent two to three potential outages because engineers responded to the root alert instead of missing it in noise.
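
    The idea behind grouping and de-duplication can be illustrated with a small sketch. It is a simplified, key-based approach under stated assumptions, not PagerDuty's actual algorithm: alerts sharing the same service and check are treated as one group, and the field names and sample alerts are hypothetical.

```python
from collections import defaultdict


def group_alerts(alerts):
    """Group alerts that share the same (service, check) key."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert["service"], alert["check"])].append(alert)
    return dict(groups)


# Hypothetical alert stream: five alerts, but only two distinct issues.
alerts = [
    {"service": "exam-api", "check": "http_5xx", "host": "web-1"},
    {"service": "exam-api", "check": "http_5xx", "host": "web-2"},
    {"service": "exam-api", "check": "http_5xx", "host": "web-3"},
    {"service": "exam-db", "check": "disk_full", "host": "db-1"},
    {"service": "exam-api", "check": "http_5xx", "host": "web-1"},
]

grouped = group_alerts(alerts)
for (service, check), members in grouped.items():
    print(f"{service}/{check}: {len(members)} alert(s)")
```

    With grouping like this, responders see two items to investigate instead of five separate notifications, which is the effect the paragraph above describes.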

    What needs improvement?

    The user interface should be easier to customize and use.

    The pricing could be less expensive, especially for smaller organizations.

    The user interface could be made easier to customize and navigate so that users who are new to this platform find the learning curve smoother.

    PagerDuty Operations Cloud needs improvements because sometimes integrations are not very seamless and misbehave.

    For how long have I used the solution?

    I have been using PagerDuty Operations Cloud for about one year and a few months.

    What other advice do I have?

    PagerDuty Operations Cloud is a great operational efficiency tool, not just for paging.

    It is very cost-effective, especially for organizations that are not limited by budgets.

    PagerDuty Operations Cloud solves a lot of problems.

    For example, if any issue arises during our online exam with our client, then PagerDuty Operations Cloud alerts the right team and the right people, and tasks are assigned so those problems can be resolved at the correct time and our real task does not get disrupted.

    PagerDuty Operations Cloud's AI functionality has improved my team's ability to focus on core tasks rather than routine issues by removing routine alert triage.

    The AI groups and de-duplicates alerts automatically, so our engineers are not manually sorting through twenty duplicate notifications for one root issue, allowing them to save a lot of time and focus on other strategic tasks, which improves productivity in my organization.

    We are using PagerDuty Operations Cloud's autonomous AI agents for low-severity incidents, which automatically triage, correlate, and resolve known issues without human intervention, such as restarting services or acknowledging flapping alerts.

    This has contributed to efficiency by cutting manual workload by thirty-five percent and also reducing MTTR for routine incidents.

    PagerDuty Operations Cloud's generative AI is effective at providing insights for decision-making during incidents.

    The AI provides clear insights through incident summaries and what-changed analysis, helping us decide where to start troubleshooting instead of guessing, enabling us to make data-driven decisions easily, and providing actionable insights that improve response decisions.

    PagerDuty Operations Cloud's embedded AI has had a positive effect on revenue protection: it reduces alert fatigue, downtime risk, and operational cost per incident.

    I would rate this product nine out of ten.

    Which deployment model are you using for this solution?

    Public Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    reviewer2847372

    Automated incident paging has improved on-call response but reporting and pricing still need work

    Reviewed on May 30, 2026
    Review from a verified AWS customer

    What is our primary use case?

    Our main use case for PagerDuty Operations Cloud is getting paged as and when required for issues and incidents as they happen, rather than having to keep track of all of them ourselves. We are exploring the use of AI with PagerDuty, but we haven't started using it yet.

    When there is an incident, we get paged for the alert. We have escalation policies set up, and if someone does not acknowledge the page or is not available, it goes to the next level of escalation. This ensures that none of the alerts are missed.
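
    The escalation behavior described above can be sketched as a simple lookup: each level has a target and a timeout, and an unacknowledged page moves to the next level once the timeout passes. The policy contents, timeouts, and function name below are hypothetical and only illustrate the concept; they are not taken from the reviewer's configuration.

```python
def current_escalation_target(policy, minutes_unacknowledged):
    """Return who should be paged after the given number of unacknowledged minutes."""
    elapsed = 0
    for level in policy:
        elapsed += level["timeout_minutes"]
        if minutes_unacknowledged < elapsed:
            return level["target"]
    # Past the final timeout: stay on the last level.
    return policy[-1]["target"]


# Hypothetical three-level policy.
policy = [
    {"target": "primary on-call", "timeout_minutes": 5},
    {"target": "secondary on-call", "timeout_minutes": 10},
    {"target": "engineering manager", "timeout_minutes": 15},
]

for minutes in (0, 7, 20, 60):
    print(minutes, "->", current_escalation_target(policy, minutes))
```

    In this sketch a page that sits unacknowledged for 7 minutes has already moved from the primary to the secondary on-call, which is how a policy like this keeps alerts from being missed.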

    PagerDuty Operations Cloud has multiple integrations. In our case, we use the Slack integration the most. The alert triggers from our SignalFx stack, goes to PagerDuty, follows the escalation policy, and reaches the user. Along with that, it is also sent to the Slack channels so that whatever triaging happens for that alert or incident happens over Slack in that particular thread where the alert is triggered from PagerDuty.

    What is most valuable?

    PagerDuty Operations Cloud is very easy to use and user-friendly.

    Regarding the features that PagerDuty Operations Cloud offers, I have explored the automation area and it has a good amount of integrations. For example, the event intelligence and the noise reduction are areas where PagerDuty is really powerful. It reduces and cleans up alerts by doing alert de-duplication and alert grouping. It has also recently got machine learning capabilities, which would surely be helpful. We also have automations and runbooks in place which can help to do auto-remediation of issues or trigger scripts as per the runbooks. We haven't been using all of those things, but I know that these things are present. The incident response on-call management is very easy to use with PagerDuty. There are flexible on-call schedules, escalation policies, and the ability to set up overrides easily. There are multiple channels by which you can send alerts including SMS, calls, and notification pushes.

    PagerDuty Operations Cloud also has war room features. Many emerging tools provide this as well, but since PagerDuty is a pretty established company, it has a very mature model with all of these features. The analytics and reporting are also decent.

    PagerDuty Operations Cloud has improved our incident management process by ensuring that the right set of people are notified within time. The best part is that it has automated on-call schedules and escalation policies, so you don't have to set them again and again for every week or every month. Features including alert grouping, alert de-duplication, and good analytics and reporting are very helpful during incident management and also for post-incident activities.

    What needs improvement?

    The analytics and reporting have some scope for improvement. First, they should be more granular, and we should be able to query the data at a finer level. More advanced trend analysis and cross-team operational insights would also be helpful. Licensing is a bit expensive, so there should be some cost optimization for large deployments. Since we are in the AI era, I know PagerDuty has been investing in a lot of AI capabilities, but there are enhancements we are looking forward to, such as automated root cause analysis or historical pattern matching. There could also be recommendations around runbook automation.

    For how long have I used the solution?

    I have been using PagerDuty for the last one and a half years at Splunk, and before that I was also an active user of PagerDuty in my last organization.

    What do I think about the stability of the solution?

    PagerDuty Operations Cloud is stable.

    What do I think about the scalability of the solution?

    PagerDuty Operations Cloud is pretty scalable. I never had any issue where a large number of alerts impacted PagerDuty.

    How are customer service and support?

    The support is decent.

    Which solution did I use previously and why did I switch?

    Previously, we were using OpsGenie, but that was quite a long time ago. PagerDuty Operations Cloud already has all the things which OpsGenie had, as per my knowledge.

    How was the initial setup?

    The cost took two points off my overall rating. There are still a good number of areas for improvement, which took away one more point, resulting in a rating of seven out of ten.

    What about the implementation team?

    I am not in the position to select any tool. I am not the one who selected or chose PagerDuty or evaluated any tools before that. We are just end users.

    What was our ROI?

    The return on investment is nine out of ten.

    Which other solutions did I evaluate?

    We are still doing that evaluation and it is not yet completed. However, it is pretty helpful because PagerDuty itself has a good amount of data which can be used with AI to make the best use of it. I am still in the experimenting phase, but the AI functionality of PagerDuty would definitely be a good way to analyze the ongoing issues and how issues are handled right now, tracking the MTTR and MTTDI, and finding the areas where improvement is needed.

    What other advice do I have?

    PagerDuty Operations Cloud already has a lot of integrations available, which is pretty good. The user experience is swift and smooth, which is a very good thing about PagerDuty Operations Cloud that I appreciate. I am not very aware of the governance and security aspects, but it has SSO as well, which is pretty good. Many organizations would be happy to adopt it, though I am not very aware of these features. The AI capabilities are not very reliable or accurate at the moment, but it is in the development phase and should improve over time.

    I don't have the exact metrics available, but we can see a significant improvement after onboarding to PagerDuty Operations Cloud. I can compare with the organization before my previous one, because that company did not have PagerDuty Operations Cloud, and quite a lot of alerts were getting missed. With PagerDuty Operations Cloud, there is a good layer of notifications and notification policies. Even if you miss a page, you will get a push notification on your mobile. If you miss that, you will get a call on your mobile, which is pretty good.

    The overall pricing, setup cost, and licensing are pretty expensive. The PagerDuty Operations Cloud licensing is a bit confusing because it is primarily based on users, not on the number of alerts or incidents which are triggered. If it is a small organization, it is good, but if it is a large organization, it is difficult because many people would need to use PagerDuty Operations Cloud. At the same time, to make it more efficient or to get the best out of it, we need to have an end-to-end setup on PagerDuty Operations Cloud, which does take time. There should be some flexible licensing options.

    PagerDuty Operations Cloud is a pretty mature product. If you are a mid-scale organization that is trying to get the best out of PagerDuty Operations Cloud, I would recommend going for it. My overall rating for this product is seven out of ten.

    Shashank Venugopal

    Integrated incident workflows have improved on-call efficiency and automated critical alerts

    Reviewed on May 29, 2026
    Review from a verified AWS customer

    What is our primary use case?

    We are currently using PagerDuty Operations Cloud for incident management, escalations, on-call, and the status page, which represents our main product utilization.

    Currently, there is one tool called Rootly. I think they are new to the industry and we are also using that for one of our other clients. It's somewhat similar, but I think they have the potential to compete with PagerDuty Operations Cloud in the future as well.

    As of now, we are not using any generative AI features in PagerDuty Operations Cloud. We are currently using it for on-call and other things.

    What is most valuable?

    What I like the most about it is that it has so many integrations, such as Azure, AWS, Prometheus, and Grafana integrations for the alerting system, which makes it more convenient for us. We use all kinds of tools like Grafana and others, which are easy to access and integrate with PagerDuty Operations Cloud. Our infrastructure is better protected when something goes wrong, because with the help of PagerDuty Operations Cloud incidents are created automatically after alerts are triggered.

    In terms of on-call, we are getting maximum utilization out of it. Previously, we did not have any alerting system for our client, and after implementing PagerDuty Operations Cloud, we started finding root causes, and other things became easier compared to earlier. With the help of PagerDuty Operations Cloud, we are able to fix most of the issues and reduce repetitive issues in our infrastructure.

    What needs improvement?

    There is nothing I dislike about PagerDuty Operations Cloud, though one issue may be due to the network or the delivery medium. Normally, when an incident is triggered, the call arrives within five to ten seconds, but sometimes, perhaps due to latency or other factors, the call arrives after two or three minutes. That is understandable, but critical production issues need to be addressed as early as possible, so PagerDuty Operations Cloud needs to reduce that latency. I think they need to work on that. Apart from that, they are doing most things well, and we are not facing any other issues. Everything is good.

    Except for the timing of the calls, we don't see any lagging, crashing, or downtime. In rare cases, we hear some noise on the call. Apart from that, call triggering is sometimes a bit slow, but not every time.

    For how long have I used the solution?

    We have been using PagerDuty Operations Cloud for more than two years.

    What do I think about the stability of the solution?

    Except for the timing of the calls, we don't see any lagging, crashing, or downtime. In rare cases, we hear some noise on the call. Apart from that, call triggering is sometimes a bit slow, but not every time.

    What do I think about the scalability of the solution?

    Regarding scalability, I don't think there are any issues; it is going well.

    How are customer service and support?

    We have very good support with PagerDuty Operations Cloud.

    In a few cases, not frequently, we have had to contact technical support for clarification regarding an integration or for setting up escalations. Initially, we reached out to technical support, but now we are well-versed with the tool. The community is good, and I think we are able to get solutions within the community itself.

    For the support of PagerDuty Operations Cloud, I would give them a score of nine to ten.

    Which solution did I use previously and why did I switch?

    Currently, there is one tool called Rootly. I think they are new to the industry and we are also using that for one of our other clients. It is somewhat similar, but I think they have the potential to compete with PagerDuty Operations Cloud in the future as well.

    How was the initial setup?

    I don't think the deployment for PagerDuty Operations Cloud is difficult to handle. It is easy to handle, and the best thing is they have a very good support team that we can reach out to at any time.

    What's my experience with pricing, setup cost, and licensing?

    The pricing for PagerDuty Operations Cloud is a bit expensive, especially for startups like us, compared to Rootly, the other platform I mentioned. Rootly is not based on a per-user model. PagerDuty Operations Cloud costs fifty dollars per user for admins or other roles, whereas the other platform has no such charge; it is based on a pay-as-you-go model. I think that is one of the drawbacks of PagerDuty Operations Cloud regarding billing. Apart from that, the plans and the features for incident creation and call triggering are quite good.

    Which other solutions did I evaluate?

    Currently, there is one tool called Rootly. I think they are new to the industry and we are also using that for one of our other clients. It is somewhat similar, but I think they have the potential to compete with PagerDuty Operations Cloud in the future as well.