PagerDuty Operations Cloud
PagerDuty
External reviews
External reviews are not included in the AWS star rating for the product.
Incident response has become faster and coordination is now streamlined across critical teams
What is our primary use case?
I use PagerDuty Operations Cloud mainly for core tasks because my client is entirely in healthcare. Healthcare is a high-priority field where we need to help people. I primarily use it for core tasks and sometimes for primary tasks as well.
How has it helped my organization?
The main advantage of PagerDuty Operations Cloud is that it is user-friendly. Even a newcomer can understand the user interface and learn it without much technical knowledge. Everything is already available in PagerDuty Operations Cloud, such as the teams, who is currently on call, and which person belongs to which team. When I don't know which team a person belongs to, I can search for their account in PagerDuty Operations Cloud and find their team. I can then use that team name to create a new PagerDuty Operations Cloud incident and page that team to join my call to discuss it.
My role is Major Incident Management, so I only handle P1 and P2 incidents. A P1 incident usually requires multiple teams, and PagerDuty Operations Cloud is the only thing I use to call all the technical teams I need to resolve a P1 or P2 incident. For my role, the biggest advantage of PagerDuty Operations Cloud is its three levels of escalation. If the first person does not respond to my alert within 15 minutes, it automatically escalates to the next person. If the second person also ignores the alert and does not join my call, it automatically escalates to a level-three person. At intervals of 15, 5, and 5 minutes, the alert reaches three different people in the same team, so that one of them can quickly join the call and help me resolve the P1 or P2 incident.
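The three-level escalation described above can be sketched as a small simulation. This is only an illustration under assumptions: the `Level` class, the `escalate` function, and the responder names are hypothetical, and the 15, 5, and 5 minute values are read here as the time each level has to acknowledge before the alert moves on. It does not use or represent PagerDuty's own API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Level:
    responder: str    # hypothetical responder name
    timeout_min: int  # minutes to wait for an acknowledgement before escalating


# Assumed policy: 15 minutes for level one, then 5 and 5 for levels two and three.
POLICY = [Level("primary", 15), Level("secondary", 5), Level("tertiary", 5)]


def escalate(levels: List[Level], ack_after: Dict[str, int]) -> Optional[Tuple[str, int]]:
    """Return (responder, minute of acknowledgement), or None if nobody acknowledges.

    ack_after maps a responder to the minutes they need to acknowledge once
    paged; a responder missing from the mapping never acknowledges.
    """
    elapsed = 0
    for level in levels:
        delay = ack_after.get(level.responder)
        if delay is not None and delay <= level.timeout_min:
            return level.responder, elapsed + delay
        # No acknowledgement in time: wait out the timeout, then page the next level.
        elapsed += level.timeout_min
    return None


if __name__ == "__main__":
    # The primary never answers; the secondary acknowledges 3 minutes after being paged.
    print(escalate(POLICY, {"secondary": 3}))
```

In this sketch a late acknowledgement (for example, a primary who responds after 20 minutes) counts as a miss for that level, which is a simplification chosen to keep the example short.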
I also have multiple bundles available. A bundle is a predefined group that I can customize for a particular type of issue. For example, if there is an issue with login, the bundle name is P1 login. One click on the bundle automatically pages the 50 to 60 people who are included in it. It also gives me the option to customize who should be in the group and who should not. The many customization options, AI features, and automation features really make my work easy. As the Major Incident Manager of my team, I need to resolve P1 and P2 incidents quickly and on time, and PagerDuty Operations Cloud helps me a lot with that.
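The bundle idea above (one click pages a predefined group, with the option to leave people out or add others) can be illustrated with a short sketch. The bundle contents, the responder names, and the `resolve_recipients` function are hypothetical and only model the behavior described in the review; they are not PagerDuty configuration.

```python
from typing import Dict, Iterable, List

# Hypothetical bundle definitions: bundle name -> people paged by one click.
BUNDLES: Dict[str, List[str]] = {
    "P1 login": ["auth-oncall", "network-oncall", "db-oncall", "app-oncall"],
}


def resolve_recipients(
    bundle: str,
    exclude: Iterable[str] = (),
    include: Iterable[str] = (),
) -> List[str]:
    """Return the people to page: bundle members minus exclusions, plus extras."""
    excluded = set(exclude)
    recipients = [person for person in BUNDLES[bundle] if person not in excluded]
    for person in include:
        # Add extras once, and never re-add someone who was explicitly excluded.
        if person not in recipients and person not in excluded:
            recipients.append(person)
    return recipients


if __name__ == "__main__":
    print(resolve_recipients("P1 login", exclude=["db-oncall"], include=["security-oncall"]))
```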
What is most valuable?
One of the greatest features of PagerDuty Operations Cloud is that there is no need to disturb people by calling them directly from personal numbers, Teams, or anything else. I can just click in PagerDuty Operations Cloud, and it reaches the person by phone call, text message, Outlook email, and the app, so that they know someone is calling and that they have to attend the call because it is an emergency.
What needs improvement?
The features PagerDuty Operations Cloud has are completely fine. I don't think any new enhancement is required as of now.
I don't think there is much to improve in PagerDuty Operations Cloud as of now because they keep updating all the features. Artificial intelligence has also been implemented in PagerDuty Operations Cloud, which makes my work very smooth. For my needs, it is absolutely fine as of now.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for about three years.
What do I think about the stability of the solution?
In the three years since I started with PagerDuty Operations Cloud, I have never faced any performance issues or downtime. If any downtime is planned, they inform me by email at least two days in advance. I don't think I have ever faced a problem with PagerDuty Operations Cloud.
What do I think about the scalability of the solution?
I don't think there is much to improve in PagerDuty Operations Cloud as of now because they keep updating all the features. Artificial intelligence has also been implemented in PagerDuty Operations Cloud, which makes my work very smooth. For my needs, it is absolutely fine as of now.
How are customer service and support?
I have no issues at all.
I have never faced any issue with the PagerDuty Operations Cloud tool itself, so I have never needed to get in touch with support. If I did want to contact them, there is an option directly on the website to raise a ticket. The option is clearly visible in the user interface.
Which solution did I use previously and why did I switch?
I have been working in my current field for around three years. I got access to PagerDuty Operations Cloud the day I joined this company. Before that, I had two years of experience working with Amazon. My old company did not have PagerDuty Operations Cloud.
How was the initial setup?
First, a license for PagerDuty Operations Cloud had to be bought on a yearly basis. Once the license was obtained, an administrative account was given to one person, and whenever someone in the company required a PagerDuty Operations Cloud account, the administrative account holder would create credentials and share them. Once I had the credentials, I registered with PagerDuty Operations Cloud using a one-time password. After that, I was assigned a team. Everything was set up by my managers. Registering was very easy because everything was handled by the administrative account, which was in the United States.
What was our ROI?
Resolving issues on time saves a lot of revenue and prevents work stoppages.
What's my experience with pricing, setup cost, and licensing?
I am completely unaware of the cost because I work in a client environment. We have no visibility into it.
What other advice do I have?
I use the artificial intelligence functionality in PagerDuty Operations Cloud. A threshold has been set in PagerDuty Operations Cloud, and whenever it is crossed, the tool automatically pages the relevant group. The group joins the call and resolves the ongoing issue. It is integrated with other applications that I use, such as Datadog.
The GenAI capabilities for providing insights for decision-making were configured manually, and the feature itself is very useful. If a person misses something, the artificial intelligence doesn't miss it. Whenever a value crosses the threshold, it automatically alerts the team that they have to look into the issue immediately. It also tells the team what the issue is and that they need to resolve it as soon as possible. There is also an option to set the priority, from P1 to P5.
I have already implemented artificial intelligence and automation through PagerDuty Operations Cloud for incident response. With the help of artificial intelligence, it is integrated with the other tools that I have. The threshold was adjusted manually by the technical team. If a value rises above the set limit, PagerDuty Operations Cloud automatically picks up the alert and sends it to the technical team who can solve the issue. They get to know that the issue is happening within the tool, and they have to quickly jump in and resolve it.
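The threshold behavior described above can be sketched as follows. This is a minimal illustration under assumptions: the function name, metric name, and routing key are hypothetical, and the returned dictionary follows the general shape of a PagerDuty Events API v2 trigger event as commonly documented (`routing_key`, `event_action`, and a `payload` with `summary`, `source`, and `severity`). In the setup the reviewer describes, the threshold check itself may well live in the monitoring tool rather than in custom code, and nothing is sent over the network here.

```python
from typing import Optional


def build_trigger_event(
    metric: str,
    value: float,
    threshold: float,
    routing_key: str,
    source: str,
) -> Optional[dict]:
    """Return a trigger event if the value is above the threshold, else None."""
    if value <= threshold:
        return None  # within limits: nobody is paged
    return {
        "routing_key": routing_key,  # hypothetical integration key
        "event_action": "trigger",
        "payload": {
            "summary": f"{metric} is {value}, above threshold {threshold}",
            "source": source,
            "severity": "critical",
        },
    }


if __name__ == "__main__":
    print(build_trigger_event("cpu_percent", 97, 90, "EXAMPLE_KEY", "web-01"))
```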
I am unaware of how PagerDuty Operations Cloud's embedded artificial intelligence influences revenue protection through reduced alert fatigue and incident costs, because I work in a client's environment that I do not own.
I don't see this as a negative, but there is one feature that could be added to PagerDuty Operations Cloud. When I trigger an alert, it reaches the person only through the contact methods they have registered. Some people register only their email ID, some register only their phone number to receive incident notifications by text message, and some select only app alerts, where the mobile application rings or vibrates. For a person who has registered only an Outlook email, the alert may be missed when they are out of the office, because they may not have access to their email every day. There should be a rule in PagerDuty Operations Cloud that all contact methods must be registered, so that the person receives the alert by phone, email, text message, and also WhatsApp. When out of the office, they could get the alert over WhatsApp and join the call quickly to help me. As of now, there is no rule that all three must be registered; any one is enough. I feel all three should be mandatory.
My review rating for this product is 9 out of 10.
Well-Organized Ticket Logging That Keeps Responses on Track
Automated incident alerts have boosted checkout reliability and increased online sales
What is our primary use case?
Our main use case for PagerDuty Operations Cloud is creating automation workflows with its Gen AI capabilities. We run it on our e-commerce platform to detect anomalies whenever they occur.
We want to firefight different situations. With the Gen AI capabilities of PagerDuty Operations Cloud, whenever an issue arises during a customer's checkout, it immediately informs us, and we create a Teams channel or a Slack channel to determine what the issue is and fix it immediately. It sends us alerts whenever necessary to keep the customer journey smoother and more comfortable.
We particularly use this to avoid incidents that customers might face, such as payment database issues, checkout issues, or products not being added to the cart. With the Gen AI provided by PagerDuty Operations Cloud, we are able to sort everything out, get timely notifications, and make the customer's journey much smoother.
What is most valuable?
One of the best features of PagerDuty Operations Cloud is incident type categorization. We can mark an incident as major or minor, and based on what we select, a dedicated Slack channel or Teams channel is created, which helps us diagnose the issue.
With incident type categorization, we are able to prioritize which issues to sort out first and which to sort out later. This has helped us firefight the major issues on a first-come, first-served basis, so categorization helps us work more efficiently.
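The prioritization described above (major incidents first, then first-come, first-served within each type) can be sketched as a simple sort. The `Incident` class, the `SEVERITY_RANK` table, and the sample incident titles are hypothetical and are only meant to illustrate the ordering, not how PagerDuty implements it.

```python
from dataclasses import dataclass
from typing import List

# Lower rank is handled first; the two types mirror the major/minor choice.
SEVERITY_RANK = {"major": 0, "minor": 1}


@dataclass(frozen=True)
class Incident:
    title: str
    incident_type: str  # "major" or "minor"
    opened_at: int      # minutes since the start of the shift


def triage_order(incidents: List[Incident]) -> List[Incident]:
    """Order incidents by type first, then by arrival time within each type."""
    return sorted(incidents, key=lambda i: (SEVERITY_RANK[i.incident_type], i.opened_at))


if __name__ == "__main__":
    queue = [
        Incident("Product image slow to load", "minor", 5),
        Incident("Payment database errors", "major", 20),
        Incident("Checkout failing", "major", 12),
    ]
    for incident in triage_order(queue):
        print(incident.incident_type, incident.title)
```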
With the help of PagerDuty Operations Cloud, we have been able to resolve many issues much more quickly. Our overall sales have gone up 30% since the introduction of PagerDuty Operations Cloud, which is a major advantage for us.
This was mainly due to reduced downtime and faster response, which made customers trust us more and made their entire journey on our e-commerce platform much smoother and quicker. This has directly increased our sales.
What needs improvement?
Everything is fine now with PagerDuty Operations Cloud, but one thing they could improve is to bring more integrations. As of now, only Slack and Teams integrations are available for firefighting, and whenever an issue arises, a notification is provided only on these platforms. Other channels could also be considered for integration to make the work much smoother.
For how long have I used the solution?
We have been using PagerDuty Operations Cloud for about 1.5 to two years now.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is very stable. As a cloud-based platform hosted on AWS, it has had no downtime for us, and the updates are quite regular, which we appreciate very much.
What do I think about the scalability of the solution?
PagerDuty Operations Cloud is very easy to scale, given the Gen AI diagnostics and Gen AI metrics it provides. Automation is also available, which is quite helpful in scaling up the entire platform and our business as well.
How are customer service and support?
Customer support is great. We would rate it 10 out of 10 because the support staff are very knowledgeable and we enjoy engaging with them.
Which solution did I use previously and why did I switch?
We did not use any solution before this; we were doing everything manually, and PagerDuty Operations Cloud is one of the first software tools that we have used.
What was our ROI?
We have seen a positive return on investment with PagerDuty Operations Cloud. Our overall sales have increased by 30%, and our sales cycle time has also reduced a lot, with a 50% improvement in our sales cycle.
Which other solutions did I evaluate?
We did not evaluate any other options. We just saw the PagerDuty Operations Cloud demo and we were impressed with it, and we went ahead with it as it was much more affordable and it solved our issues.
What other advice do I have?
The incident alert feature of PagerDuty Operations Cloud has helped us prevent many issues before they occur and has stopped us from repeating the same mistakes. We now make new mistakes instead, which is much better than making the same ones again and again. This has helped us grow our business more efficiently and more quickly than we expected.
We would definitely recommend that others at least try the trial version of PagerDuty Operations Cloud, because every e-commerce business, or any business, should deploy an AI as powerful as PagerDuty Operations Cloud to reduce the number of errors and improve overall business efficiency. We give the product an overall rating of 10.
Reliable alerts have reduced incident response time and save hours each week
What is our primary use case?
PagerDuty Operations Cloud serves as an alerting tool for our organization. Whenever applications or any production systems are down, we get notified via email and mobile. My current role in PagerDuty Operations Cloud is an admin role. Whenever a new user joins, or higher management requires that someone receive email alerts for a particular server or program, we receive the request from the team, stating which team the user is from and their role. Based on that information, I enable the user's email ID and add their name to PagerDuty Operations Cloud for the particular program. When the application is down or any issue is triggered, the user is notified via mobile and email.
What is most valuable?
PagerDuty Operations Cloud has several valuable features. We have many monitoring tools, but previously production alerts went mostly to Outlook and Slack. When we were off shift, away on weekends, or otherwise outside, we would miss most of the issues. With PagerDuty Operations Cloud enabled, we are notified on mobile and also receive an SMS. We have to know the criticality of the request, and if critical servers or issues are involved, we have to work on them and fix them immediately. PagerDuty Operations Cloud mostly notifies us of issues based on priority.
I have seen many benefits from using PagerDuty Operations Cloud. The major use is monitoring alerts. Another benefit is on-call scheduling, which routes alerts directly to the right person. We also have an option called the escalation policy. If I am the primary and I don't acknowledge the call, it redirects to my secondary. If the secondary is unresponsive, it moves to the tertiary person. This is a major feature: until someone acknowledges the alert, it is sent to the next level. It also helps in coordinating incidents, such as identifying whether something is an issue or an outage. The notification aspect is the major use I see in PagerDuty Operations Cloud.
What needs improvement?
There is room for improvement in PagerDuty Operations Cloud. It mainly acts as a monitoring alert tool. We need to prioritize tickets according to P1, P2, and P3, and a dashboard for program-based insights would help. Additionally, features related to hardware and software agents should be included. Currently, it only notifies us about incidents. If we could install an agent or similar tool to collect data from a server at the hardware or application level, we could present multiple metrics in the same tool, which would be more beneficial.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud solution for the last couple of years.
What do I think about the scalability of the solution?
AI features within PagerDuty Operations Cloud will definitely benefit operations. However, because our project involves huge tools in a large environment, they may have planned to implement AI later. This feature would certainly reduce work, duplicates, and incidents. AI validates first-level issues, and once an issue is resolved, it won't return to us. If the issue persists and the user cannot fix it, it will come back to us. This is a good option, but our organization needs a process and approval to implement it. We may do that in the future.
How are customer service and support?
We have not directly used PagerDuty Operations Cloud support, but we are satisfied with the support we receive. I always get answers to the questions I post. I would rate them an eight out of ten because I'm not entirely sure if I reached PagerDuty Operations Cloud support or third-party support through Slack.
Which solution did I use previously and why did I switch?
If I remember correctly, our organization used Dynatrace before PagerDuty Operations Cloud. We have multiple tools for monitoring in our client's project. PagerDuty Operations Cloud is not a replacement for any tool. We also use Grafana dashboards and CloudWatch. We have not transitioned from any tools as far as I know.
How was the initial setup?
PagerDuty Operations Cloud was implemented in our organization before our time. We recently took over the team, so we did not have any opportunity to take part in the implementation. We use whatever features are available, add users, and alter settings. In the future, if I get the opportunity, I would definitely like to work on deploying the application in the cloud and managing those processes.
What was our ROI?
PagerDuty Operations Cloud is definitely an important tool. With our investment, we can save time and resolve issues quickly. It is critical from my viewpoint.
We definitely save time with PagerDuty Operations Cloud. It saves about half an hour (30 minutes) for each incident. For example, if we receive 10 incidents today, PagerDuty Operations Cloud notifies us immediately, which allows us to troubleshoot and fix each issue efficiently. For each ticket, we save around 30 minutes. That is almost 40 to 50 hours saved in a week.
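As a rough check on the figures above, the weekly saving can be worked out from the per-incident saving. The sketch below assumes 30 minutes saved per incident, as stated; the function names and the choice of a 7-day or 5-day week are assumptions for illustration. Under these assumptions, 10 incidents a day over 7 days gives 35 hours, and 40 to 50 hours a week corresponds to roughly 80 to 100 incidents a week.

```python
MINUTES_SAVED_PER_INCIDENT = 30  # figure stated in the review


def hours_saved_per_week(incidents_per_day: float, days_per_week: int = 7) -> float:
    """Weekly hours saved at a steady daily incident volume."""
    return incidents_per_day * days_per_week * MINUTES_SAVED_PER_INCIDENT / 60


def incidents_per_week_for(hours_saved: float) -> float:
    """Weekly incident count that would produce the given hours saved."""
    return hours_saved * 60 / MINUTES_SAVED_PER_INCIDENT


if __name__ == "__main__":
    print(hours_saved_per_week(10))                    # 7-day week
    print(hours_saved_per_week(10, days_per_week=5))   # 5-day week
    print(incidents_per_week_for(40), incidents_per_week_for(50))
```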
Which other solutions did I evaluate?
Dynatrace and PagerDuty Operations Cloud are both used for monitoring, but they differ in scope. Dynatrace is a full-stack monitoring and observability tool, while PagerDuty Operations Cloud is mainly an incident alerting tool. Dynatrace is known for strong detection capabilities and covers multiple technologies, including cloud, on-premises, and databases. PagerDuty Operations Cloud focuses on incident alerting and routing. Both have AI capabilities and can help with root cause analysis, yet Dynatrace is typically used by SREs and platform teams, while PagerDuty Operations Cloud is used by on-call engineers and operations teams.
What other advice do I have?
We have not used any AI agent. PagerDuty Operations Cloud is the only tool we are using.
There is an option to see insights from AI, but our role is very limited. We only cover particular scenarios, so we don't use that option.
I know how we can use AI for incident response in PagerDuty Operations Cloud, but in my current organization, I don't have the opportunity to work with that. However, I am updating myself and going through information on the internet about AI in PagerDuty Operations Cloud. We can enable it for validating duplicate issues and when we get defined issues. If we know the troubleshooting process for an issue, we can use AI to do the first level of implementation. If the issue persists, we can address it. We can also use AI to suggest responses, and if we get an alert, AI can provide suggestions or references to resolve it, as well as post root cause content.
PagerDuty Operations Cloud is definitely a useful tool, even though I only heard about it fairly recently. It helps us resolve critical alerts efficiently, which benefits my team and my managers in addressing issues. I would rate its performance well, and if it scales up with more technologies and features, that would encourage wider adoption among teams. In my previous company, it was not well known, and promoting it could increase its usage. I would rate PagerDuty Operations Cloud an eight out of ten for overall performance.
Easy to Use with Strong ServiceNow Integrations and API
Scalable Event Orchestration with AIOps and Easy Integrations
Dependable Alerts, Smart Incident Grouping, and Deep Integrations
Automated incident response has reduced manual effort and protects revenue with faster recovery
What is our primary use case?
The main use cases for PagerDuty Operations Cloud are incident management and handling high severity incidents in an automated way. For example, if we get a high severity incident at midnight, PagerDuty sends alarms or notifications to our Slack or Teams channel so that other people are notified and start working on it. Another use case is monitoring our cloud VMs using PagerDuty automation, which we set up through a service integration with our cloud provider. It identifies situations such as a cloud VM reaching high CPU usage or requests being dropped, which can significantly damage the application. When that happens, PagerDuty sends alerts to the person who is on call.
Another use case is executing installation scripts in a fully automated way through the PagerDuty webhooks integration. If a cloud resource becomes faulty, the PagerDuty webhook ensures that the respective installation or script is executed automatically so that the issue gets fixed. Nobody needs to be informed or bothered, as it is a self-healing process. We also receive notifications on our mobile phones for any P1, a high severity incident that will definitely have a business impact. For those incidents, we have also set up the tool to place a phone call.
What is most valuable?
The features of PagerDuty Operations Cloud that I find most valuable are incident management and webhooks. For any P1 incident, somebody would usually need to do a lot of manual work, such as creating the Slack channel, informing the team leads and other team members, gathering information about the service or component outage, and posting it as text in the Slack channel. All this manual effort takes a lot of time, but PagerDuty does it for you. It gathers the related services and the people who are already in that Slack channel and sends the alerts. It automates the whole process, so no human intervention is required.
Another very good functionality is webhooks, which work exceptionally well. Normally, if my cloud infrastructure has an issue, a person needs to investigate, and if it requires a server reboot or the installation of a script, that person needs to intervene manually. Using a webhook, we can instead send an alert or notification to our automated job indicating that the issue requires a particular script or file to be executed on that server, which fixes the ongoing issue without human intervention. If this happens at midnight, people can sleep easily, and PagerDuty takes care of the task.
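The self-healing flow described above can be sketched as a small dispatcher that turns an incoming webhook body into a remediation plan. Everything here is hypothetical: the `alert_type` and `host` fields, the `RUNBOOK` table, and the script paths are invented for illustration and do not reflect PagerDuty's actual webhook schema or the reviewer's scripts. The sketch only decides what would be run; it does not execute anything.

```python
import json
from typing import Optional

# Hypothetical mapping from alert type to the command an automated job would run.
RUNBOOK = {
    "high_cpu": ["systemctl", "restart", "app.service"],
    "disk_full": ["/opt/scripts/cleanup.sh"],
}


def plan_remediation(webhook_body: str) -> Optional[dict]:
    """Map a webhook body to a remediation plan, or None if a human is needed."""
    event = json.loads(webhook_body)
    alert_type = event.get("alert_type")  # assumed field name
    host = event.get("host")              # assumed field name
    command = RUNBOOK.get(alert_type)
    if command is None or host is None:
        return None  # unknown issue or missing target: fall back to paging someone
    return {"host": host, "command": command}


if __name__ == "__main__":
    body = json.dumps({"alert_type": "high_cpu", "host": "vm-17"})
    print(plan_remediation(body))
```

Keeping the decision separate from the execution makes it easy to log or dry-run the plan before any script touches a server, which is one reasonable way to keep an automated fix auditable.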
PagerDuty Operations Cloud has a very good impact on our company, especially regarding business and service level agreements, as well as reducing incident resolution time. It also helps automate manual work.
What needs improvement?
An additional feature I think would improve PagerDuty Operations Cloud is integration with development scenarios or the code building side. For example, if there is a service outage, I believe PagerDuty could have functionality to check that out and verify if there was any faulty code pushed to our GitHub repository. If there was an error in our repository structure, then having a functionality to find these errors would provide much clarity. This would help developers quickly identify and fix issues, making sure they can change code wherever necessary. That would work exceptionally well.
For how long have I used the solution?
I have been using PagerDuty Operations Cloud for more than three years.
What do I think about the stability of the solution?
PagerDuty Operations Cloud is stable, with very little chance of missed alerts. It catches almost all alerts and sends notifications in a timely manner. We have not seen our PagerDuty Operations Cloud instance go down or fail; it works smoothly.
What do I think about the scalability of the solution?
The scalability of PagerDuty Operations Cloud is totally doable. For example, I have created a group where I defined incident priorities, and if my team grows from five members to nine, I can easily onboard these new users to PagerDuty Operations Cloud and configure the alerting system to scale up or down based on requirements. The scalability is quite easy. PagerDuty Operations Cloud also offers many integrations, allowing us to scale to other cloud providers if necessary.
How are customer service and support?
Customer service from PagerDuty is great. Sometimes we need help with setups, which are usually minor, but whenever we need support, we receive it promptly. I would say their support is quite good.
Which solution did I use previously and why did I switch?
Before choosing PagerDuty Operations Cloud, we evaluated other solutions including Grafana, which is for monitoring cloud infrastructure. While we used Grafana for automated monitoring, it lacks the incident management and webhook features that PagerDuty Operations Cloud provides. Therefore, we can manage tasks effectively with PagerDuty Operations Cloud. We also used Nagios, which is another application monitoring tool that only allows monitoring application logs and lacks a solid alerting system, unlike PagerDuty Operations Cloud.
How was the initial setup?
The initial setup of PagerDuty Operations Cloud was very straightforward. It was not complex at all. The documentation provided by PagerDuty is very handy and easy to understand, making the process comfortable and non-confusing.
What about the implementation team?
We did not use an integrator, reseller, or consultant for the deployment. Since we were able to set it up ourselves, we never felt the need for any consultants or assistance from PagerDuty while setting up our environment or workspace.
What was our ROI?
I have seen a return on investment with PagerDuty Operations Cloud, particularly in incident management. Using this mechanism reduces the number of incidents I need to raise manually, decreasing the effort required. For example, I could say that in a month, we need to raise 150 incidents manually, but with PagerDuty Operations Cloud, I can cut that down to 100 incidents. This results in a reduction of about 40 or 50 incidents.
What's my experience with pricing, setup cost, and licensing?
The setup cost and pricing for PagerDuty Operations Cloud are very cost-effective, but I have a suggestion: if PagerDuty could provide a trial version for individual users, rather than just corporate users, it would help others learn more about PagerDuty Operations Cloud. They could conduct proofs of concept, do hands-on testing, and recommend PagerDuty Operations Cloud in their respective companies or projects.
Which other solutions did I evaluate?
We did evaluate other tools and even assessed the trial version of those tools by checking all functionalities and use cases. Given that our use cases were larger, we ultimately chose to proceed with PagerDuty Operations Cloud.
What other advice do I have?
We have started using PagerDuty's autonomous AI agents, but we have not leveraged them in detail yet. We hope to implement the AI agent soon to create solutions in our architectures, which will definitely help us a lot.
For decision-making with the AI agent, we need to deploy it in our workspace and feed it data covering more use cases, in spreadsheet, JSON, or XML format. Cases and scenarios will be added so that the agent can analyze all the data we feed it. In the future, if any such scenario occurs in our workspace or infrastructure, the PagerDuty Operations Cloud agent will react and make decisions based on the data it was previously trained on.
The influence of PagerDuty's embedded AI on revenue protection is significant. PagerDuty Operations Cloud helps us resolve incidents much faster, which means we consistently meet our SLAs, thereby avoiding penalties for service outages. This helps us save revenue by ensuring that clients or stakeholders do not impose penalties due to outages.
PagerDuty Operations Cloud's AI functionality helps improve my team's ability to focus on core tasks rather than routine issues, particularly with cloud service monitoring and incident response time. These aspects significantly increase our productivity. The solution's alert reduction feature has a significant impact on preventing costly incidents in our company. Creating multiple incidents manually takes a lot of time, but PagerDuty Operations Cloud does that in seconds. This reduction in incident creation time greatly influences our project delivery. Overall, I would rate this product a perfect ten.
Centralized alerts have reduced incident response time and now streamline SME on-call collaboration
What is our primary use case?
In my organization, we use PagerDuty Operations Cloud to acknowledge alerts and, most often, to page SMEs. Whenever we work on a task and hit a critical situation that we cannot troubleshoot ourselves, we page the relevant SMEs. If it is an infrastructure issue, we page the infra team; if it relates to another product, we page that product's SMEs and bring them into a PagerDuty Operations Cloud call. We brief them on the issue, they acknowledge the alert, and then they start working on the error.
PagerDuty Operations Cloud is also set up so that any critical issue creates an alert that reaches the SME as a notification and a phone call stating that a critical issue is in progress. The SME acknowledges the alert and joins the call, mentioning that they have been paged for this issue. We then work with that team to resolve the issue.
For the incident command system, we use the Freshservice tool, which is synced with PagerDuty Operations Cloud in my organization. The incident command system is a structured way of handling major incidents. Whenever there is an outage, we use the incident command system in PagerDuty Operations Cloud to manage the communication flow. Everyone jumps into a call, and multiple people work on fixing the issue to bring the instance back online or restore the environment.
What is most valuable?
The feature I like best in PagerDuty Operations Cloud is the ability to page a specific person directly. Usually, once we trigger an alert, it goes to a particular person; if that person does not acknowledge it, it goes to the person they report to, and if they do not acknowledge it either, it goes to someone else. This takes time because the alert keeps shifting between people, and some may not acknowledge it for various reasons, so it can get missed. In PagerDuty Operations Cloud, after entering the team name, we can use a subfield to page a specific person or user. This feature appeals to me a lot.
Additionally, there is another feature where we can check the SME calendar. In my organization, for a particular week, one person will be allotted as an SME. That calendar shows which person is the SME for the particular week regarding the particular product. These are the features I enjoy the most in PagerDuty Operations Cloud.
The main benefit of PagerDuty Operations Cloud is that we can page people easily. Our operations team also uses it widely for faster incident response, which reduces MTTR (mean time to resolution). Smart on-call management lets us create a call for the on-call people and involve the backup engineers as well. One special thing in PagerDuty Operations Cloud is time zone-based scheduling, which lets us schedule people according to their time zone. I have also witnessed automated escalation: a person missed acknowledging an alert in PagerDuty Operations Cloud, and it escalated automatically to their associate director or VP. This escalation policy is also very good.
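The automated escalation described above can be pictured with a small model. This is only a sketch of the general idea, not PagerDuty's implementation: the `EscalationLevel` class, the `responder_at` function, the responder labels, and the five-minute timeouts are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EscalationLevel:
    responder: str        # who is paged at this level (hypothetical labels)
    timeout_minutes: int  # assumed wait for an acknowledgement before escalating


def responder_at(levels: list[EscalationLevel], minutes_unacknowledged: int) -> str:
    """Return who holds the page after the alert has gone unacknowledged this long."""
    elapsed = 0
    for level in levels:
        elapsed += level.timeout_minutes
        if minutes_unacknowledged < elapsed:
            return level.responder
    # Every level timed out: in this sketch the page stays with the last level.
    return levels[-1].responder


# Assumed three-level policy; real timeouts are configured per escalation policy.
policy = [
    EscalationLevel("on-call SME", 5),
    EscalationLevel("backup engineer", 5),
    EscalationLevel("associate director", 5),
]

for minute in (0, 6, 12, 30):
    print(minute, responder_at(policy, minute))
```

Walking the levels in order makes the trade-off visible: longer timeouts give the first responder more time, but delay the moment a backup or manager is pulled in.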
Integrating PagerDuty Operations Cloud with Freshservice has had a very good impact. Earlier, before the integration, we had many problems paging alerts. Now, once an alert enters the Freshservice queue, a PagerDuty Operations Cloud alert is created automatically and goes to the person assigned as SME for that product.
The measurable benefit of PagerDuty Operations Cloud is that it has made our work easier: alerts are synced and routed directly to the SMEs. Having an alert triggered and routed to the SME automatically, instead of doing it manually, is a great benefit.
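To illustrate the ticket-to-page flow, here is a minimal sketch that turns a ticket into a trigger event shaped like a PagerDuty Events API v2 request body. The ticket fields (`id`, `product`, `subject`), the `ROUTING_KEYS` mapping, and the placeholder keys are hypothetical; the organization's actual Freshservice integration may work differently. Nothing is sent over the network here.

```python
import json

# Hypothetical mapping from product name to the integration (routing) key of the
# PagerDuty service owned by that product's SMEs. Real keys come from each
# service's integration settings.
ROUTING_KEYS = {
    "infra": "ROUTING_KEY_INFRA_PLACEHOLDER",
    "billing": "ROUTING_KEY_BILLING_PLACEHOLDER",
}


def build_trigger_event(ticket: dict) -> dict:
    """Turn a ticket dict (assumed shape) into an Events API v2 trigger event body."""
    product = ticket["product"]
    if product not in ROUTING_KEYS:
        # Fail loudly rather than paging another product's SME by default.
        raise KeyError(f"no routing key configured for product {product!r}")
    return {
        "routing_key": ROUTING_KEYS[product],
        "event_action": "trigger",
        # Events that share a dedup_key are treated as the same alert.
        "dedup_key": f"ticket-{ticket['id']}",
        "payload": {
            "summary": ticket["subject"],
            "source": "freshservice",
            "severity": "critical",
        },
    }


event = build_trigger_event(
    {"id": 4821, "product": "infra", "subject": "Database host unreachable"}
)
print(json.dumps(event, indent=2))
```

In a real setup the body would be sent as JSON to the Events API v2 endpoint. Keeping the product-to-key mapping explicit, and refusing unknown products, is one way to avoid pages landing with the wrong product's SME.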
What needs improvement?
To improve PagerDuty Operations Cloud, I would look at the escalation policies. Nowadays, many people miss alerts. In one case, there was an issue in a particular product, and when we paged for it, the alert went to people from another product. I do not know how that happened in PagerDuty Operations Cloud; it might have been a configuration change or something in the backend. The person who was paged unnecessarily came into the Slack channel that day asking why they had been paged for a product they were not part of. Alerts should be routed and assigned to the correct person for the product, and this is something that can be improved.
One feature I would like to see in PagerDuty Operations Cloud relates to the calendar, where each person is assigned as SME for a particular week. It would be beneficial to be able to add a note in the calendar naming the backup engineer in case that person is not available.
For how long have I used the solution?
What do I think about the stability of the solution?
We have not used the real-time digital operations management feature. The advanced analytics feature is being used by another product in my cloud operations team. In my team, we have not used it.
How are customer service and support?
We have never reached out to the customer service or technical support teams of PagerDuty Operations Cloud. In my team, technical support handles the cloud-based platforms, but my organization does not have a technical team dedicated to PagerDuty Operations Cloud.
Which solution did I use previously and why did I switch?
Prior to PagerDuty Operations Cloud, I had not seen any product of the same kind in my company. We use PagerDuty Operations Cloud and also New Relic.
How was the initial setup?
I did not find any complexity in the initial setup process of PagerDuty Operations Cloud. The solution was already deployed.
Which other solutions did I evaluate?
I have not come across any other options or solutions available in the market. I am not sure if the on-call policy in Splunk is similar to PagerDuty Operations Cloud.
What other advice do I have?
We have integrated PagerDuty Operations Cloud with the Freshservice tool. Regarding automation in PagerDuty Operations Cloud, admin access in my team has been given to the onshore employees, not to Indian employees, so I am not sure about it; I have been requesting admin access for a long time but have not been granted it yet. Given my experience with PagerDuty Operations Cloud, I recommend increasing the on-call primary escalation time to ten minutes. Additionally, if one hundred alerts could be managed as one incident, that would also be beneficial. These adjustments would help with mean time to resolution in all organizations. My overall rating for this product is ten out of ten.
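The idea of folding many alerts into one incident can be sketched as grouping by a shared key. This is an illustration under assumptions, not a description of how PagerDuty groups alerts: the alert fields (`product`, `check`, `host`) and the `group_into_incidents` function are hypothetical.

```python
from collections import defaultdict


def group_into_incidents(alerts: list[dict]) -> dict[str, list[dict]]:
    """Group alerts that share product and check so each group maps to one incident."""
    incidents = defaultdict(list)
    for alert in alerts:
        key = f"{alert['product']}:{alert['check']}"
        incidents[key].append(alert)
    return dict(incidents)


# One hundred hypothetical alerts for the same failing check, plus one unrelated alert.
alerts = [
    {"product": "infra", "check": "disk-full", "host": f"node-{i}"} for i in range(100)
]
alerts.append({"product": "billing", "check": "api-latency", "host": "api-1"})

incidents = group_into_incidents(alerts)
for key, members in incidents.items():
    print(key, len(members))
```

With a grouping key like this, the hundred `disk-full` alerts would page the SME once instead of a hundred times, while the unrelated alert still opens its own incident.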
On-call automation has transformed alert handling and now creates a faster, competitive workflow
What is our primary use case?
My use case for PagerDuty Operations Cloud comes from the SRE and DevOps team. We use it for alerting and for our pipeline process: when a pipeline we build suddenly fails because of a job or other issue, we receive an error. We have set up PagerDuty Operations Cloud with Datadog, the monitoring service we currently use. Whenever Datadog detects an alert, a spike, or anything critical, it triggers an alert in PagerDuty Operations Cloud, and we quickly get a notification. We also maintain our on-call shift rotation in it. For example, on Monday, Wednesday, and Friday, I work as shift lead, and on Tuesday, Saturday, and Sunday, someone else is shift lead. For MTTR and similar statistics, we can see how many alerts we received and how many we acknowledged this month, along with a timeline. One valuable aspect of PagerDuty Operations Cloud is that it allows a competitive environment in our team. For example, if I resolved the most alerts this month, someone else can do it next month, and whoever resolves the most critical alerts on time receives appreciation every month.
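The MTTR figure and the monthly "most alerts resolved" comparison mentioned above come down to simple arithmetic over resolved alerts. The sketch below assumes a hypothetical export format (the `responder`, `triggered`, and `resolved` fields and the sample names are invented for illustration); it is not PagerDuty's analytics.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical export of alerts resolved this month.
resolved = [
    {"responder": "asha", "triggered": "2024-05-01T10:00:00", "resolved": "2024-05-01T10:30:00"},
    {"responder": "asha", "triggered": "2024-05-03T08:00:00", "resolved": "2024-05-03T08:10:00"},
    {"responder": "ben", "triggered": "2024-05-04T22:00:00", "resolved": "2024-05-04T23:00:00"},
]


def minutes_to_resolve(alert: dict) -> float:
    """Minutes between the alert triggering and being resolved."""
    start = datetime.fromisoformat(alert["triggered"])
    end = datetime.fromisoformat(alert["resolved"])
    return (end - start).total_seconds() / 60


# MTTR: the average resolution time across all resolved alerts.
mttr_minutes = mean(minutes_to_resolve(alert) for alert in resolved)

# Leaderboard: number of alerts resolved per responder, highest first.
leaderboard = Counter(alert["responder"] for alert in resolved).most_common()

print(f"MTTR: {mttr_minutes:.1f} minutes")
print(leaderboard)
```

A count-based leaderboard rewards volume only; a team that cares about "critical alerts resolved on time" might filter by severity and a resolution deadline before counting.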
What is most valuable?
One feature of PagerDuty Operations Cloud that I find valuable is the on-call schedule. We can manage our on-call scheduling, and we have various alert and notification delivery methods available, including mobile. We can receive phone calls, emails, SMS, and push notifications. For example, if someone misses the notification, they get a phone call, which is very straightforward. We also have incident automation, which makes working with third-party monitoring services such as Datadog very straightforward. We can seamlessly automate things with PagerDuty Operations Cloud. The AI features are also beneficial; for example, noisy alerts that trigger regularly and false positive alerts get suppressed. It checks the past month's alerts and shows us, for example, that one alert accounted for 60 percent of triggers, another for 20 percent, and which alerts are rare. The escalation policy is excellent as well: if I do not pick up the call, my manager gets the call; if my manager does not pick up, his manager gets the call. These are some of the most valuable parts we use in PagerDuty Operations Cloud.
In Datadog, we have multiple dashboards and monitoring systems where we see our spikes and alerts. When we integrated with PagerDuty Operations Cloud, we got better signal and less noise. When a spike keeps recurring, the AI feature in PagerDuty Operations Cloud marks that alert as noisy and suppresses it. This significantly improves our workflow with both Datadog and PagerDuty Operations Cloud. We have faster response and faster escalation. Previously, Datadog did not send us notifications, and people would refresh it and check for spikes every hour. Now that we have integrated PagerDuty Operations Cloud, whenever an alert triggers, we quickly get a notification or a phone call, so we do not need to sit in front of a computer and refresh repeatedly. Additionally, we have a centralized incident workflow: Datadog alerts feed into the PagerDuty Operations Cloud incident timeline, so we see everything there. We do not need to open Datadog again and again, and if we need to dive deeper into an alert from Datadog, we can click the link inside PagerDuty Operations Cloud, which redirects us to the Datadog dashboard where everything is recorded and visible.
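The "60 percent, 20 percent, rare" breakdown can be approximated with a frequency count over last month's alerts. This is a deliberately simple stand-in for illustration, not the AI feature itself: the alert names, the `alert_shares` and `noisy_alerts` helpers, and the 50 percent threshold are assumptions.

```python
from collections import Counter


def alert_shares(alert_names: list[str]) -> dict[str, float]:
    """Fraction of all triggers that each alert name accounts for."""
    counts = Counter(alert_names)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def noisy_alerts(alert_names: list[str], threshold: float = 0.5) -> list[str]:
    """Alert names whose share of triggers meets the (assumed) noise threshold."""
    shares = alert_shares(alert_names)
    return sorted(name for name, share in shares.items() if share >= threshold)


# Hypothetical history of last month's triggers.
history = ["cpu-spike"] * 6 + ["disk-full"] * 2 + ["cert-expiry", "db-failover"]

for name, share in alert_shares(history).items():
    print(f"{name}: {share:.0%}")
print("candidates for suppression:", noisy_alerts(history))
```

Frequency alone cannot tell a noisy alert from a genuinely recurring problem, so a list like this is better treated as candidates for review than as an automatic suppression rule.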
In PagerDuty Operations Cloud, AI suppressing our alerts has helped streamline repetitive tasks. For example, very noisy alerts get suppressed automatically, aiding smarter routing. When we have new joiners in our team, they see alerts already suppressed, allowing them to focus on the critical ones instead of the lower ones. Additionally, alert prioritization is present; we receive critical alerts, high alerts, and then low alerts. The faster prioritization facilitated by AI enhances our alert management processes. Also, the root cause historical pattern assists us; if we get an alert similar to one from last month, it tells us how we resolved that alert previously. Historical patterns using AI greatly aid us in alert management.
What needs improvement?
I have already used PagerDuty Operations Cloud, and my previous monitoring tools were very poor for alerting. I have a good impression of PagerDuty Operations Cloud, but I believe it could improve with deeper root cause insights. I know there is automation to detect recent deployments causing incidents, but a deeper root cause analysis could provide more details. If PagerDuty Operations Cloud offered more information, we would not need to jump into the main dashboards where the alert triggered. For instance, with more insights directly in PagerDuty Operations Cloud, we would not need to check the Datadog dashboard. Additionally, I think a sandbox mode would be helpful for new team members, allowing us to guide them through simulating alerts, exercising escalation policies, and creating PagerDuty Operations Cloud channels.
For how long have I used the solution?
I have been working with PagerDuty Operations Cloud for five years. I have worked on two different projects, and both have used PagerDuty Operations Cloud.
What do I think about the stability of the solution?
In my previous project, we utilized the flexible incident command system to coordinate large-scale incidents, but in my current project with only Datadog, we have not received many alerts or incidents in the last couple of days.
How are customer service and support?
I do not have direct contact with PagerDuty Operations Cloud tech support or customer service teams, but my senior team members have connected with them when we received an alert related to our team failing to set it up properly. The customer support team promptly gave us insight and helped us within 24 hours.
Which solution did I use previously and why did I switch?
I am currently working with PagerDuty Operations Cloud. On my previous project, we were on BigPanda, but we faced multiple issues with it. At that time, BigPanda had no call schedule feature and no alert trigger feature. We then moved to PagerDuty Operations Cloud, and suddenly everything was smooth. We also got a phone app and set up PagerDuty Operations Cloud on our phones. Whenever an alert triggered for us, we could quickly check from the phone whether it was a false positive, a true P1 or P2 alert, a major alert, or a critical alert, and then quickly jump into the alert and work on it. PagerDuty Operations Cloud made the process and the flow in our team much smoother.
How was the initial setup?
I found the initial setup of PagerDuty Operations Cloud straightforward; I did not face any complexities during the setup for alerts or during the initial configuration.
What's my experience with pricing, setup cost, and licensing?
Regarding pricing for PagerDuty Operations Cloud, I am currently a software engineer and a senior software engineer, so I do not handle the pricing aspect. However, I hear from my manager that the pricing is very high for PagerDuty Operations Cloud, and only a few of us have the main business tier accounts. Many of us have low tier accounts that restrict us to acknowledging and viewing alerts, while a few have the ability to create and trigger alerts. Therefore, I do not think much about pricing, but I do believe it is somewhat high. However, I think this is valid because PagerDuty Operations Cloud provides a vast amount of benefits compared to other alerting systems.
Which other solutions did I evaluate?
Regarding the key differences, pros and cons of PagerDuty Operations Cloud compared to competitors, some pros include alert grouping, AI functionality, and the ability to easily integrate with Slack for quicker resolution. Additionally, we receive phone notifications and push notifications, which many of the other competitors do not provide. The pricing of PagerDuty Operations Cloud is also reasonable for the functionalities it offers compared to its competitors. These are some benefits I see in PagerDuty Operations Cloud, including helpful alert insights and direct links to dashboards we have integrated, such as Datadog and Grafana, which allow us to resolve issues quickly.
What other advice do I have?
My recommendation, based on my experience with PagerDuty Operations Cloud, is that it is one of the best platforms for synchronizing with your monitoring tools. It will improve your flow, and your team will definitely benefit from PagerDuty Operations Cloud compared to competitors, as it offers numerous advantages. I give this product a rating of ten out of ten.