Unified observability has improved incident response and now reduces downtime across environments
What is our primary use case?
My main use case for Datadog is unified observability: I use it to correlate metrics, traces, and logs in a single pane of glass to ensure the health and security of our cloud infrastructure and applications.
I correlate those metrics, traces, and logs using the Service Map to visualize dependencies between our microservices. For example, during a latency spike, I can instantly see whether the bottleneck is in a specific database query or a downstream API, which allows me to route the issue to the right team immediately.
What is most valuable?
Datadog is an incredibly powerful daily driver for any engineer, and the recent addition of LLM observability for AI apps and Cloud Security Management makes it feel like a platform that is truly keeping up with modern tech trends. The dashboarding and alert integrations are great features, giving us all the required information on a single screen, and the alert integration works very well.
Datadog has positively impacted our organization by eliminating what I call tool sprawl: it replaced four or five separate monitoring tools with one unified platform. This has improved our MTTR and broken down silos between Dev and Ops teams.
Since Datadog was introduced, our response to alerts has become faster: alerts are handled in less time and routed to the teams that take the required actions. This has had a very positive effect on our overall working culture.
What needs improvement?
Datadog could be improved by making its pricing more predictable, as it is sometimes difficult to forecast exactly how much a new project will cost until after we have started ingesting the data.
When it comes to documentation, we do not have much available right now, so improving it would really help the engineers who work with the platform.
Datadog is the most comprehensive observability tool on the market, and it only loses two points because the pricing for log ingestion can grow quickly if we do not carefully manage our filters.
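To illustrate why ingestion filters matter for the bill, here is a minimal sketch of the arithmetic. The function name, the per-GB price, and the volumes are placeholders made up for illustration, not Datadog's actual rates or API; real numbers would come from your own contract and usage data.

```python
def estimate_monthly_log_cost(daily_gb: float, price_per_gb: float,
                              keep_fraction: float = 1.0, days: int = 30) -> float:
    """Estimate monthly log ingestion spend.

    daily_gb      -- raw log volume produced per day, in GB
    price_per_gb  -- hypothetical ingestion price per GB (placeholder value)
    keep_fraction -- share of logs that survive your filters (0.0 to 1.0)
    days          -- number of days in the billing period
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    return daily_gb * keep_fraction * price_per_gb * days


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    unfiltered = estimate_monthly_log_cost(daily_gb=200, price_per_gb=0.10)
    filtered = estimate_monthly_log_cost(daily_gb=200, price_per_gb=0.10,
                                         keep_fraction=0.25)
    print(f"unfiltered: {unfiltered:.2f}, filtered: {filtered:.2f}")
```

Running the same estimate with and without a filter fraction gives a quick sense of how much of the spend depends on what is sent in the first place.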
For how long have I used the solution?
I have been using Datadog for about three years to monitor our cloud-native application and infrastructure across multiple environments.
What do I think about the stability of the solution?
Datadog is extremely stable, as it is built for highly scalable environments and consistently maintains high availability, which is why I trust it as our primary monitoring tool.
What do I think about the scalability of the solution?
Datadog is built for hyperscale, as it automatically scales when we add new hosts or containers, and its Monitoring as Code approach via Terraform allows us to scale our monitoring setup instantly as our infrastructure grows.
How are customer service and support?
Their technical documentation is some of the best in the industry, and their support engineers are very proactive, helping us optimize the ingestion cost.
How would you rate customer service and support?
Which solution did I use previously and why did I switch?
I previously used a mix of open-source tools like Prometheus and Grafana, and I switched because the manual upkeep was too high and I needed a platform that could handle logs and traces alongside metrics without having to manage the backend storage.
How was the initial setup?
Buying Datadog through the AWS Marketplace was seamless and helped me meet AWS spending commitments, and while Datadog's custom metric pricing can be complex, the setup cost is very low because the agent is easy to deploy.
What was our ROI?
I have seen a strong ROI through a thirty percent reduction in downtime and significant cost savings from identifying under-utilized cloud resources, for example idle EC2 instances, through their Cloud Cost Management.
Which other solutions did I evaluate?
I evaluated New Relic, Dynatrace, and Amazon CloudWatch before choosing Datadog. I chose Datadog because of its massive library of over seven hundred integrations and its superior user interface, which is easier for our developers to use daily.
What other advice do I have?
My biggest advice is to set up ingestion rules and filters early. You should not send all your logs and metrics at once; being selective about what you store can maximize your ROI from day one. I would rate this review as an eight.
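The idea of being selective before ingestion can be sketched as follows. This is a generic pre-shipping filter, assuming log records are plain dictionaries with a "level" key; the policy names and values (DROP_LEVELS, SAMPLE_RATES) are hypothetical and are not Datadog settings or part of the Datadog agent.

```python
import random
from typing import Iterable, Iterator

# Hypothetical policy for illustration; tune to your own needs.
DROP_LEVELS = {"DEBUG", "TRACE"}   # never shipped
SAMPLE_RATES = {"INFO": 0.1}       # keep roughly 10% of INFO logs


def should_keep(record: dict, rng: random.Random) -> bool:
    """Decide whether a single log record should be shipped."""
    level = str(record.get("level", "")).upper()
    if level in DROP_LEVELS:
        return False
    # Levels without an explicit rate (e.g. WARN, ERROR) are always kept,
    # because rng.random() is always strictly below 1.0.
    rate = SAMPLE_RATES.get(level, 1.0)
    return rng.random() < rate


def filter_logs(records: Iterable[dict], seed: int = 0) -> Iterator[dict]:
    """Yield only the records that pass the drop and sampling rules."""
    rng = random.Random(seed)
    for record in records:
        if should_keep(record, rng):
            yield record


if __name__ == "__main__":
    sample = [
        {"level": "DEBUG", "msg": "cache miss"},
        {"level": "INFO", "msg": "request served"},
        {"level": "ERROR", "msg": "upstream timeout"},
    ]
    for kept in filter_logs(sample):
        print(kept)
```

In practice the same rules could be expressed in whatever filtering mechanism your log pipeline provides; the point is to decide on drop and sampling rules before volume grows.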
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Comprehensive Monitoring with Easy Setup
What do you like best about the product?
I really like how detailed the log traces can be in Datadog, and how I can search for specific logs based on labels and facets. Setting up Datadog agents was also very easy.
What do you dislike about the product?
Pricing can become really expensive at scale, especially when log ingestion and custom metrics are not carefully managed. It would really be nice to be able to view a cost dashboard, as I don't think Datadog has that feature.
What problems is the product solving and how is that benefiting you?
I use Datadog to gain insights into application metrics and monitor key metrics like memory and CPU usage. It also provides visibility into services deployed across clouds like GCP and AWS.
Unified Monitoring (APM) That Accelerates Issue Diagnosis and Incident Resolution
What do you like best about the product?
Datadog brings infrastructure, applications, logs, and security signals together in one place, which makes it much easier to understand what is really happening in an environment and to move quickly from detection to action. The correlation between metrics, traces, and logs is particularly valuable when diagnosing incidents, as it reduces guesswork and speeds up root cause analysis.
What do you dislike about the product?
While Datadog is extremely powerful, it can become difficult to control and predict costs in large or rapidly changing environments, particularly when ingesting high volumes of logs, metrics, and traces. Without strong governance and regular tuning, usage can grow quickly and lead to unexpected spend.
In addition, the breadth of features can sometimes feel overwhelming. Teams need time and clear ownership to configure dashboards, alerts, and monitors properly; otherwise, there is a risk of noise, alert fatigue, or under-utilisation of the platform’s capabilities.
What problems is the product solving and how is that benefiting you?
Datadog helps us centralise logs and monitor our Java applications and APIs, and provides APM (Application Performance Monitoring) to quickly detect performance issues and troubleshoot incidents or bottlenecks.
Intuitive Interface That Makes Data Insights Effortless
What do you like best about the product?
The user interface is very intuitive, making it easy to gain insights from the data. Getting data into Datadog is quite simple thanks to the many integrations, so it is ready to use in a few clicks. Support is responsive, and my team uses it every day.
What do you dislike about the product?
In terms of cost, this platform is not inexpensive. Additionally, making bulk changes across multiple widgets is not straightforward, which can be inconvenient.
What problems is the product solving and how is that benefiting you?
We leverage Datadog to generate alerts and reports for our services, which helps us maintain higher uptime and gain better visibility into any issues that arise.
Comprehensive Tracking Capabilities That Impress
What do you like best about the product?
It offers almost every possible way to track application interactions.
What do you dislike about the product?
It's very expensive, and it's not easy for newcomers to grasp.
What problems is the product solving and how is that benefiting you?
It gives us insights into our application services and it allows us to catch issues. Also, it provides session replays that allow us to quickly spot issues with the way users interact with our application.
Empowers Confident Monitoring and Insightful System Analysis
What do you like best about the product?
Datadog is a powerful tool that gives us greater confidence in our company's systems and enhances our ability to detect outages. It offers a wide range of features, most of which we actively use. Datadog provides essential insights into our systems, which helps us investigate problems, identify issues, and monitor performance. These are just a few of the ways we rely on Datadog.
What do you dislike about the product?
The cost is one of Datadog's biggest drawbacks. There are some products that would be helpful to use, but the cost makes them impractical. The cost of Mobile App Testing comes to mind as an example.
Additionally, we have experienced some frustration due to pricing changes. Our previous SKUs were grandfathered, but we were eventually required to switch to the newer, more expensive SKU pricing.
What problems is the product solving and how is that benefiting you?
Datadog helps give our company confidence that our systems are being monitored and that issues can be detected and addressed promptly by our observability team. We also use the information collected in Datadog to assess the overall health of our systems and drive investigations into any issues detected.
User-Friendly Dashboards with Comprehensive Analytics
What do you like best about the product?
It logs all the details we need for analysis. The platform is user-friendly, allowing us to easily set up various dashboards to view all the insights. Implementation is straightforward as well. We have integrated it with our different platforms, including both web and mobile.
What do you dislike about the product?
I have never encountered any major issues. However, there should be an option to retrieve data based on our own custom filters.
What problems is the product solving and how is that benefiting you?
We use Datadog for both performance monitoring and crash analytics. Previously, our main challenge was with crash reporting, as we struggled to debug or analyze crash data in our production environments. However, Datadog has helped us address this issue through its crash reports and log monitoring features.
Unmatched Reliability and Performance—A Must-Have Monitoring Tool
What do you like best about the product?
Datadog's reliability and performance are unmatched, which makes it an absolute recommendation.
What do you dislike about the product?
Too many features and windows are spread out and could be consolidated or integrated better for an easier UX.
What problems is the product solving and how is that benefiting you?
As a developer, Datadog has been my go-to place for investigating any ongoing incidents or malfunctions in our services. The APM offers a great, detailed view of application performance and general health.
Where are the gaps? Yet to find one
What do you like best about the product?
Datadog has everything you could want for observability, alerting, and incident management.
What do you dislike about the product?
It can be quite overwhelming to figure out where to get started; you need to know exactly what you want to do before getting into Datadog.
What problems is the product solving and how is that benefiting you?
We need a simple place to create dashboards and alerts that cover multiple cloud providers.
Monitoring has improved digital experiences and speeds root cause analysis for incident tickets
What is our primary use case?
I intend to use Datadog for application performance monitoring, digital user experience, and troubleshooting to find the root cause of tickets generated in my managed environment. Digital user experience is my priority, as I am evaluating this feature across several competing products.
What is most valuable?
The best features Datadog offers are digital user experience, troubleshooting, and remediation capabilities, which help identify what is going wrong and where. I focused on the root cause analysis of incidents and tickets, as examining the RCAs makes it easier to find remediations and helps with shifting incidents left. Datadog will positively impact my organization by allowing me to resolve tickets at a much faster pace and by improving productivity through a reduction in the number of support engineers required at the monitoring level. If I integrate Datadog with my managed environment or cloud environment, the RCAs and the shift left will be automated, and with that automation I will be able to reduce the number of support engineers.
What needs improvement?
Datadog could be improved with a simpler graphical user interface that can be extended to non-technical users, such as a CXO who wants an overall view of current tickets on a ticketing dashboard. It would also be beneficial to have documentation auto-generated while examining remediations or integrating with existing systems.
For how long have I used the solution?
I have been working for more than fifteen years in data center, disaster recovery solutions, and cloud computing, which includes private, public, and hybrid environments.
What do I think about the stability of the solution?
Datadog seems to be stable, but I really want to have a complete demo before making a decision on this.
What do I think about the scalability of the solution?
I hope that Datadog will be able to scale to digital users, even when an organization has thousands of them connecting over corporate bandwidth, and that it will also scale well on the server side.
How are customer service and support?
I find the customer support impressive from what I have heard about Datadog, and I really want to onboard this solution for my customers.
How would you rate customer service and support?
Which solution did I use previously and why did I switch?
As of now, we are using cloud-native monitoring with CloudWatch and Azure Monitor for our multi-cloud environment, and we really want to extend our monitoring to greater detail and depth. We looked at ManageEngine and SolarWinds before choosing Datadog, but they were not very impressive, as they do not offer the breadth of functionality that Datadog does.
How was the initial setup?
I am looking to deploy Datadog on AWS and Azure for multi-cloud management support and really want to extend it at the server side and at the end-user side for digital user experience. I will start with AWS and extend it to Azure six months down the line. I plan to purchase Datadog through the AWS Marketplace once I have the demo.
What was our ROI?
I am looking at metrics that will help me decide whether I really need to deploy Datadog, and the metrics will primarily be centered around reducing the number of employees and cost optimization.
What's my experience with pricing, setup cost, and licensing?
I did not get the complete information regarding the licenses and commercials associated with Datadog, and I would like to have some idea about the license.
What other advice do I have?
I hope to have some literature on how I can leverage my managed support for cloud environments, plus how I can integrate this with my managed support for end-user devices. My advice to others looking into Datadog is to focus on finding root causes at greater depth, reducing the number of employees needed to manage or monitor infrastructure incidents, and increasing satisfaction with application performance monitoring. I give this review a rating of eight.
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)