Once you are on Datadog (a closed platform), they will leverage switching costs to squeeze you
What do you like best about the product?
Their marketing is very strong. When I was choosing between Grafana and Datadog, I believed Datadog could solve my problems more quickly, and I was willing to pay more for that.
What do you dislike about the product?
One month into Datadog, my decision to choose it over Grafana is already biting me. None of my expectations have materialized.
1. Datadog was not quicker to set up; there are fewer support docs/examples compared to Grafana.
2. Datadog's support team is all about being "nice" instead of making real progress. Very inefficient.
3. When you are stuck, their support will remind you of the "switching cost" instead of solving real problems for you. Very distasteful.
4. I invested in a closed platform instead of an open source stack, where I would have a much easier time extending and switching.
What problems is the product solving and how is that benefiting you?
We have many microservices. As an executive, I needed a service catalog to track, monitor, and level up all of the services across all dimensions (APM, Cloud Cost, Incident, Log, etc).
Datadog is a great monitoring and observability tool
What do you like best about the product?
It is a tool that provides many services, such as infrastructure monitoring, logging, and database monitoring. The great thing is that all the services are interconnected, so when a problem occurs in your systems you can configure Datadog to give you end-to-end visibility of your platform, helping engineers debug and find problems easily and quickly.
What do you dislike about the product?
Sometimes the UI is very overwhelming, especially at the beginning. The many buttons and features make the platform complex to use, so the learning curve is a bit steep at first. Once you learn to use it, it is really simple and intuitive.
What problems is the product solving and how is that benefiting you?
The monitoring and alerting systems we have in AWS and GCP are good at identifying problems, but when we need to investigate an end-to-end problem with no clue as to when the error is happening, Datadog provides the tools to follow it step by step with full visibility of our systems. This is increasing the engineering team's productivity a lot. We are finalizing our trial right now, and even though we are not yet getting its full potential, what we achieved in just a few weeks is really promising. We are very excited about what is to come as we discover more of the platform's features.
Very intuitive UI and powerful integrations
What do you like best about the product?
We have deployed Datadog for all of our cloud deployments in AWS. A large number of integrations allow us to monitor literally everything, from AWS cloud infra to hosted compute, whether physical, virtual, or serverless. We are using Datadog to monitor our endpoints and to run UI tests of our applications through synthetic tests.
Deployment is super easy and quick, with a highly skilled support team. Datadog is one of the most frequently used tools in our organization, and it's been great. Documentation is very detailed and has improved over time, allowing us to set up everything without major hurdles.
What do you dislike about the product?
Some features are limited when compared to competitors. We used to monitor the infra through New Relic, which offered more visibility into the running processes, but Datadog is limited when it comes to manual process monitoring.
What problems is the product solving and how is that benefiting you?
We needed visibility into the infrastructure, network, applications, and databases to optimize cost and right-size resources. We also needed to monitor the infra and applications to keep track of what's happening and troubleshoot issues when they arise. Datadog has helped us do this, and we are satisfied with it.
Amazing UX
What do you like best about the product?
The user experience of Datadog is amazing. The granularity and flexibility of filters and charts make it easy to quickly sift through large amounts of data and find what you need. This is especially useful when trying to quickly troubleshoot an incident.
What do you dislike about the product?
One annoying thing about Datadog is working with IaC (infrastructure as code) configurations that are defined in Terraform.
It's tough to make changes to your configurations because you need to deploy them in order to see if you did it correctly. It's also not possible to lock down resources defined with IaC, so sometimes people will edit something via the web UI that was configured in Terraform, and then their changes will get reverted when Terraform re-deploys and they won't know why.
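One way to catch this kind of UI edit before the next deploy reverts it is to compare the definition kept in code with what the platform currently holds. The snippet below is only a sketch of that comparison: it assumes both versions have already been loaded as plain dictionaries (fetching them is out of scope), and the function name and field names are hypothetical.

```python
def find_drift(declared, live):
    """Return {field: (declared_value, live_value)} for every field that differs.

    `declared` is the configuration as written in code, `live` is the
    configuration as it currently exists after possible manual edits.
    Both are plain dicts; missing fields are reported as None.
    """
    drift = {}
    for key in sorted(set(declared) | set(live)):
        if declared.get(key) != live.get(key):
            drift[key] = (declared.get(key), live.get(key))
    return drift


if __name__ == "__main__":
    # Hypothetical monitor definition: someone lowered the threshold in the UI.
    declared = {"name": "High CPU", "threshold": 90, "notify": "@oncall"}
    live = {"name": "High CPU", "threshold": 80, "notify": "@oncall"}
    for field, (want, have) in find_drift(declared, live).items():
        print(f"{field}: code says {want!r}, live value is {have!r}")
```

Running a check like this on a schedule would at least tell people why their change is about to disappear.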
What problems is the product solving and how is that benefiting you?
System observability and incident response.
Monitoring Tool for Cloud Infra
What do you like best about the product?
What I like best about Datadog is its comprehensive monitoring capabilities across multiple environments and services, especially in cloud-based infrastructures. The platform makes it easy to monitor everything in one place, from application performance and infrastructure health to logs and security. The real-time dashboards are highly customizable, allowing us to drill down into specific metrics and get a clear overview of our entire stack.
What do you dislike about the product?
As our usage grows and we monitor more hosts and services, processing costs also increase. There is a learning curve involved in creating custom queries within the log management interface.
What problems is the product solving and how is that benefiting you?
Datadog helps us solve a range of monitoring and observability challenges across our infrastructure, particularly for our EKS clusters, AWS services, and application performance. By centralizing all these metrics in one place, it’s much easier to get a clear, real-time view of what’s happening across our system.
One Stop Solution (360 degree Monitoring)
What do you like best about the product?
Ecosystem and integration!
Effective communication between different Datadog products, such as APM and Infrastructure, shows detailed traces that I can drill down into to pinpoint the issue.
What do you dislike about the product?
There is a need for more improvements and features in the Datadog security lineup. A centralized SCA and SAST is lacking. Additionally, easier integration with MS Teams and other third-party software is necessary.
Flutter support in Real User Monitoring (it is not fully supported yet).
What problems is the product solving and how is that benefiting you?
1. At a glance, I can see the status of my production environment.
2. If there is an issue at midnight, Watchdog gives me the exact context: where the failure is and which microservices were involved, along with CPU and RAM usage for the given period.
3. If the mobile application crashes, it reports the crash directly and creates an error in Error Tracking, which I can assign to a team.
4. If there is a vulnerability in production, it scans for and reports it.
5. If someone is trying to hijack the system, I can block those IPs.
6. Datadog detects unusual activity and reports it.
7. We have configured reports that are sent daily at 9 AM with the previous day's statistics for review, so we can be proactive if there are any issues.
Very useful custom metrics
Fair point about the OpenMetrics integration. All metrics collected by that specific "integration" are considered custom. There are many ways to collect custom metrics: through an OpenMetrics endpoint, from logs, from traces, with a custom check, or by submitting metrics directly to the Agent.
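As an illustration of the last option, submitting a metric directly to the Agent, here is a minimal sketch that builds a datagram in the DogStatsD text format and sends it over UDP. It assumes an Agent is listening for DogStatsD traffic on localhost port 8125; the metric name, tags, and helper function names are hypothetical.

```python
import socket


def format_dogstatsd(name, value, metric_type="g", tags=None):
    """Build a DogStatsD datagram: <name>:<value>|<type>[|#<tag1>,<tag2>]."""
    datagram = f"{name}:{value}|{metric_type}"
    if tags:
        datagram += "|#" + ",".join(tags)
    return datagram


def send_metric(datagram, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; the sender gets no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(datagram.encode("utf-8"), (host, port))


if __name__ == "__main__":
    # Hypothetical gauge ("g") with two tags.
    payload = format_dogstatsd(
        "checkout.queue_depth", 42, "g", ["env:staging", "team:payments"]
    )
    send_metric(payload)
```

Anything submitted this way counts as a custom metric, so every distinct combination of metric name and tag values can matter for billing.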
Very useful Network Hosts
The user interface is intuitive, making it easy to manage domains, emails, and databases. The dashboard is well-organized, which is a plus for beginners who might feel overwhelmed by technical details.
Debugs slow performance with good support and a straightforward setup
What is our primary use case?
We use Datadog for monitoring the performance of our infrastructure across multiple types of hosts in multiple environments. We also use APM to monitor our applications in production.
We have some Kubernetes clusters and multi-cloud hosts with Datadog agents installed. We have recently added RUM to monitor our application from the user side, including session replays, and are hoping to use it to replace our existing error monitoring and session replay tools for debugging issues in the application.
How has it helped my organization?
We have been using Datadog since I started working at the company ten years ago and it has been used for many reasons over the years. Datadog across our services has helped debug slow performance on specific parts of our application, which, in turn, allows us to provide a snappier and more performant application for our customers.
The monitoring and alerting system has made our team aware of issues that come up in our production system and lets us react faster, with more tools to debug and inspect problems and keep the system online for our customers.
What is most valuable?
Datadog infrastructure monitoring has helped us identify health issues with our virtual machines, such as high load, CPU, and disk usage, as well as monitoring uptime and alerting when Kubernetes containers have a bad time staying up. Our use of Datadog's application monitoring (APM) over the last six years or so has been crucial to identifying performance and bottleneck issues, as well as alerting us when services are seeing high error rates, which has made it easier to debug when specific services may be going down.
What needs improvement?
We have found that some of the different options for filtering for logs ingestion, APM traces and span ingestion, and RUM sessions vs replay settings can be hard to discover and tough to determine how to adjust and tweak for both optimal performance and monitoring as well as for billing within the console.
It can sometimes be difficult to determine which information is documented, as we have found inconsistencies with deprecated information, such as environment variables within the documentation.
For how long have I used the solution?
I've been using the solution for ten years.
What do I think about the stability of the solution?
The solution seems pretty stable, as we've been using it for more than a decade.
What do I think about the scalability of the solution?
The solution seems quite scalable, especially within Kubernetes. Costs are a factor.
How are customer service and support?
Support has been very helpful whenever we need it.
Which solution did I use previously and why did I switch?
We had tried some other APM monitoring in the past. However, it was too expensive, so we added APM through Datadog, since we were already using Datadog and it seemed like a good value add.
How was the initial setup?
The solution is straightforward to set up. Sometimes, it is complex to find the correct documentation.
What about the implementation team?
We handled the setup in-house.
What was our ROI?
Our ROI is peace of mind with alerts and monitoring, as well as the ability to review and debug issues for our customers.
What's my experience with pricing, setup cost, and licensing?
Getting settled on pricing is something you want to keep an eye on, as things seem to change regularly.
Which other solutions did I evaluate?
We used New Relic previously.
What other advice do I have?
Datadog is a great service that is continually growing its solution for monitoring and security. It is easy to set up, and to turn its features on and off, once you have instrumented agents and tailored solutions to your needs.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other
Good RUM and APM with good observability
What is our primary use case?
We use Datadog across the enterprise for observability of infrastructure, APM, RUM, SLO management, alert management and monitoring, and other features. We're also planning on using the upcoming cloud cost management features and product analytics.
For infrastructure, we integrate with our Kube systems to show all hosts and their data.
For APM, we use it with all of our API and worker services, as well as cronjobs and other Kube deployments.
We use serverless to monitor our Cloud Functions.
We use RUM for all of our user interfaces, including web and mobile.
How has it helped my organization?
It's given us the observability we need to see what's happening in our systems, end to end. We get full stack visibility from APM and RUM, through to logging and infrastructure/host visibility. It's also becoming the basis of our incident management process in conjunction with PagerDuty.
APM is probably the most prominent place where it has helped us. APM gives us detailed data on service performance, including latency and request count. This drives all of the work that we do on SLOs and SLAs.
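To make the link between latency data and SLOs concrete, here is a rough sketch of the arithmetic involved: a latency SLI as the share of requests under a threshold, and the error budget left against a target. The function names, threshold, and target are illustrative assumptions, not values taken from our setup or from any Datadog API.

```python
def latency_sli(latencies_ms, threshold_ms):
    """Fraction of requests that completed within the latency threshold."""
    if not latencies_ms:
        raise ValueError("no requests in the window")
    good = sum(1 for value in latencies_ms if value <= threshold_ms)
    return good / len(latencies_ms)


def error_budget_remaining(sli, target):
    """Share of the error budget still unspent.

    1.0 means untouched, 0.0 means fully spent, negative means overspent.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    allowed_bad = 1.0 - target   # fraction of requests allowed to be slow
    actual_bad = 1.0 - sli       # fraction of requests that were slow
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical window of request latencies in milliseconds.
    window = [100, 200, 300, 400]
    sli = latency_sli(window, threshold_ms=300)
    print(f"SLI: {sli:.2%}")
    print(f"Budget remaining vs a 50% target: {error_budget_remaining(sli, 0.5):.2%}")
```

In practice the request counts and latency distribution would come from the APM data rather than a hard-coded list.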
RUM is also prominent and is becoming the basis of our product team's vision of how our software is actually used.
What is most valuable?
APM is a fundamental part of our service management, both for viewing problems and improving latency and uptime. The latency views drive our SLOs and help us identify problems.
We also use APM and metrics to view the status of our Pub/Sub topics and queues, especially when dealing with undelivered messages.
RUM has been critical in identifying what our users are actually doing, and we'll be using the new product analytics tools to research and drive new feature development.
All of this feeds into the PagerDuty integration, which we use to drive our incident management process.
What needs improvement?
Sometimes the solution changes features so quickly that the UI keeps moving around. The cost is pretty high. Outside of that, we've been relatively happy.
The APM service catalog is evolving fast. That said, it is redundant with our other tools and doesn't allow us to manage software maturity. However, we do link it with our other tools using the APIs, so that's helpful.
Product analytics is relatively new and based on RUM, so it will be interesting to see how it evolves.
Sometimes some of the graphs take a while to load, based on the window of data.
Some stock dashboards don't allow customization. You need to clone them first, but this can lead to an abundance of dashboards. Also, there are some things that stock dashboards do that can't yet be duplicated with custom dashboards, especially around widget organization.
The "top users" widget on the product analytics page only groups by user email, which is unfortunate, since user ID is the field we use to identify our users.
For how long have I used the solution?
I've used the solution for three and a half years.
What do I think about the stability of the solution?
The solution is pretty stable.
What do I think about the scalability of the solution?
The solution is very scalable.
How are customer service and support?
Support was excellent during the sales process, with a huge dropoff after we purchased the product. Only recently (within the past year) has it begun to reach acceptable levels again.
Which solution did I use previously and why did I switch?
We did not have a global solution. Some teams were using New Relic.
How was the initial setup?
The instructions aren't always clear, especially when dealing with multiple products across multiple languages. The tracer works very differently from one language to another.
What about the implementation team?
We handled the setup in-house.
What's my experience with pricing, setup cost, and licensing?
We have built our own set of installation instructions for our teams, to ensure consistent tagging and APM setup.
Which other solutions did I evaluate?
We did look at Dynatrace.
What other advice do I have?
The service was great during the initial testing phase, but once we bought the product, the quality of service dropped significantly. In the past year or so, it has improved and is now approaching the level we'd expect based on the cost.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google