A great tool with an easy setup and helpful error logs
What is our primary use case?
We currently have an error monitor to monitor errors on our prod environment. Once we hit a certain threshold, we get an alert on Slack. This helps address issues the moment they happen before our users notice.
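For readers unfamiliar with threshold-based alerting, the idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Datadog's implementation: the class name, the sliding-window approach, and the message format are all assumptions, and nothing here calls the Datadog or Slack APIs.

```python
from collections import deque


class ErrorThresholdMonitor:
    """Hypothetical monitor: alerts once when the number of errors inside
    a sliding time window reaches a threshold."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()
        self._alerting = False

    def record_error(self, now: float) -> bool:
        """Record one error at time `now` (seconds).

        Returns True only on the transition into the alerting state,
        so a sustained burst produces a single notification.
        """
        self._timestamps.append(now)
        # Drop errors that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()
        over = len(self._timestamps) >= self.threshold
        should_alert = over and not self._alerting
        self._alerting = over
        return should_alert


def slack_message(count: int, window_seconds: float) -> dict:
    """Build a hypothetical message payload; sending it is out of scope."""
    return {"text": f"{count} errors in the last {int(window_seconds)}s on prod"}
```

Alerting only on the transition into the "over threshold" state is one way to avoid flooding a channel during a long incident; a real monitor would typically also send a recovery notice.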
We also utilize synthetic tests on many pages on our site. They're easy to set up and are great for pinpointing when a shipped bug takes down a less-visited page that we wouldn't immediately be aware of. It's a great extra check to make sure the code we ship is free of bugs.
How has it helped my organization?
The synthetic tests have been invaluable. We use them to check various pages and ensure functionality across multiple areas. Furthermore, our error monitoring alerts have been crucial in letting us know of problems the moment they pop up.
Datadog has been a great tool, and all of our teams utilize many of its features. We have regular mob sessions where we look at our Datadog error logs and see what we can address as a team. It's been great at providing more insight into our users and logging errors that can be fixed.
What is most valuable?
The error logs have been super helpful in breaking down issues affecting our users. Our monitors also let us know once we hit a certain threshold, which is good for catching momentary blips and issues with third-party providers or rollouts that we have in the works. Just last week, we had a rollout where various features were broken due to a change in our backend API. Our Datadog logs instantly notified us of the issues, and we could troubleshoot everything much more easily than just testing blind. This was crucial to a successful rollout.
What needs improvement?
I honestly can't think of anything that can be improved. We've started using more and more features from our Datadog account and are really grateful for all of the different ways we can track and monitor our site.
We did have an issue where a synthetic test was set up before the holiday break, and we were quickly charged a great amount. Our team worked with Datadog, and they were able to help us out since it was inadvertent on our end and was a user error. That was greatly appreciated and something that helped start our relationship with the Datadog team.
For how long have I used the solution?
We've been using Datadog for several months. We started with the synthetic tests and now use it for error handling and in many other ways.
What do I think about the stability of the solution?
Stability has been great. We've had no issues so far.
What do I think about the scalability of the solution?
The solution is very easy to scale. We've used it on multiple clients.
How are customer service and support?
We had a dev who had set up a synthetic test that was running every five minutes in every single region over the holiday break last year. The Datadog team was great and very understanding and we were able to work this out with them.
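To see why a setup like this adds up quickly, here is a rough back-of-the-envelope sketch of the number of test runs involved. The review does not say how many regions were selected or how long the break lasted, so the values of 10 locations and 14 days below are purely hypothetical, and no pricing is assumed.

```python
def synthetic_runs(interval_minutes: int, locations: int, days: int) -> int:
    """Total runs for one test executed every `interval_minutes`
    from each of `locations` locations over `days` days."""
    runs_per_day_per_location = (24 * 60) // interval_minutes
    return runs_per_day_per_location * locations * days


# Hypothetical inputs: 5-minute interval, 10 locations, 14-day break.
total = synthetic_runs(interval_minutes=5, locations=10, days=14)
print(total)  # 40320
```

A five-minute interval is 288 runs per day from a single location, so every extra location or day multiplies the total. Lengthening the interval or trimming the location list before a long break shrinks that number proportionally.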
Which solution did I use previously and why did I switch?
We didn't have any previous solution. At a previous company, I used Sentry. However, I find Datadog to be much easier to use, and the inclusion of synthetic tests is awesome.
How was the initial setup?
The documentation was great and our setup was easy.
What about the implementation team?
We implemented the solution in-house.
What was our ROI?
This has had a great ROI as we've been able to address critical bugs that have been found via our Datadog tools.
What's my experience with pricing, setup cost, and licensing?
The setup cost was minimal. The documentation is great and the product is very easy to set up.
Which other solutions did I evaluate?
We also looked at other providers and settled on Datadog. It's been great to use across all our clients.
Good alerting and issue detection for many valuable features
What is our primary use case?
Our company has a microservice architecture, with different teams in charge of different services. It is also a startup, which means that we have to build and move very fast. Before we were properly using Datadog (DD), things often broke without much information on where in our system the break happened. This was quite a big time sink: teams were unfamiliar with other teams' code, so they needed help from those teams to debug, which slowed our building down a lot. Implementing DD traces fixed this.
What is most valuable?
Datadog has many features, but the most valuable have become our primary uses.
Also, because we have frequent concurrent deployments, the Datadog alert monitors allow us to quickly detect issues if anything goes wrong.
What needs improvement?
The monitors can be improved. The chart in the monitors only goes back a couple of hours, which is clunky. The monitors could also provide more info, such as traces. We have many alerts connected to different notification systems, such as Slack and Opsgenie.
When the on-call engineer receives notifications fired by the alerts, they are taken to the monitors. Yet often, we have to open up many different tabs to see logs, traces, and other info that is not accessible on the monitors. I think it would make all of the on-call engineers' lives easier if the monitors had more data.
For how long have I used the solution?
We've used the solution for three years.
Unified platform with customizable dashboards and AI-driven insights
What is our primary use case?
Our primary use case for this solution is comprehensive cloud monitoring across our entire infrastructure and application stack.
We operate in a multi-cloud environment, utilizing services from AWS, Azure, and Google Cloud Platform.
Our applications are predominantly containerized and run on Kubernetes clusters. We have a microservices architecture with dozens of services communicating via REST APIs and message queues.
The solution helps us monitor the performance, availability, and resource utilization of our cloud resources, databases, application servers, and front-end applications.
It's essential for maintaining high availability, optimizing costs, and ensuring a smooth user experience for our global customer base. We particularly rely on it for real-time monitoring, alerting, and troubleshooting of production issues.
How has it helped my organization?
Datadog has significantly improved our organization by providing us with great visibility across the entire application stack. This enhanced observability has allowed us to detect and resolve issues faster, often before they impact our end-users.
The unified platform has streamlined our monitoring processes, replacing several disparate tools we previously used. This consolidation has improved team collaboration and reduced context-switching for our DevOps engineers.
The customizable dashboards have made it easier to share relevant metrics with different stakeholders, from developers to C-level executives. We've seen a marked decrease in our mean time to resolution (MTTR) for incidents, and the historical data has been invaluable for capacity planning and performance optimization.
Additionally, the AI-driven insights have helped us proactively identify potential issues and optimize our infrastructure costs.
What is most valuable?
We've found the Application Performance Monitoring (APM) feature to be the most valuable, as it provides great visibility on trace-level data. This granular insight allows us to pinpoint performance bottlenecks and optimize our code more effectively.
The distributed tracing capability has been particularly useful in our microservices environment, helping us understand the flow of requests across different services and identify latency issues.
Additionally, the log management and analytics features have greatly improved our ability to troubleshoot issues by correlating logs with metrics and traces.
The infrastructure monitoring capabilities, especially for our Kubernetes clusters, have helped us optimize resource allocation and reduce costs.
What needs improvement?
While Datadog is an excellent monitoring solution, it could be improved by building more features to replace alerting apps like OpsGenie and PagerDuty. Specifically, we'd like to see more advanced incident management capabilities integrated directly into the platform. This could include features like sophisticated on-call scheduling, escalation policies, and incident response workflows.
Additionally, we'd appreciate more customizable machine learning-driven anomaly detection to help us identify unusual patterns more accurately. Improved support for serverless architectures, particularly for monitoring and tracing AWS Lambda functions, would be beneficial.
Enhanced security monitoring and threat detection capabilities would also be valuable, potentially reducing our reliance on separate security information and event management (SIEM) tools.
For how long have I used the solution?
I've used the solution for two years.
Good dashboards, easy troubleshooting, and integrations
What is our primary use case?
We utilize Datadog mainly to monitor our API integrations and all of the inventory that comes in from our API partners. Each event has its own ID, so we can trace all activity related to each event and troubleshoot where needed.
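The value of a per-event ID can be illustrated with a small sketch: once every log record carries the ID, all activity for one event can be pulled together and events with errors can be singled out. The field names `event_id` and `status` are hypothetical and are not taken from Datadog or from our actual log schema.

```python
from collections import defaultdict


def group_by_event(records):
    """Group log records by event ID, keeping arrival order within each event."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["event_id"]].append(record)
    return dict(grouped)


def events_with_errors(grouped):
    """Return the sorted IDs of events that have at least one error record."""
    return sorted(
        event_id
        for event_id, recs in grouped.items()
        if any(r["status"] == "error" for r in recs)
    )


# Hypothetical sample records from an API integration.
sample = [
    {"event_id": "evt-1", "status": "ok", "message": "inventory received"},
    {"event_id": "evt-2", "status": "ok", "message": "inventory received"},
    {"event_id": "evt-1", "status": "error", "message": "price missing"},
]
grouped = group_by_event(sample)
flagged = events_with_errors(grouped)
```

In practice, the same grouping is done by filtering on the ID in the log explorer rather than in code; the sketch only shows why a stable ID per event makes troubleshooting tractable.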
How has it helped my organization?
Datadog gives non-dev teams insight into everything that is happening with a particular event and flags any errors so that we can troubleshoot more efficiently.
What is most valuable?
The dashboards are super convenient for getting a more zoomed-out view of what is going on with each integration that we utilize.
What needs improvement?
There could be more easily identifiable documentation on how to find different things on the platform. It can be overwhelming at first glance, and it's hard to find appropriate documentation on the site to lead you to where you need to be.
For how long have I used the solution?
I've used the solution for about 1.5 years.
Monitoring with Datadog
What do you like best about the product?
APM traces, detailed logging of pods, service-level data.
What do you dislike about the product?
Implementation of Cloud SIEM is not so easy, and it's not as good as Wazuh.
What problems is the product solving and how is that benefiting you?
Our entire monitoring is based on Datadog.
Becoming the Gold Standard
What do you like best about the product?
Datadog provides thorough insights across all of the important facets.
What do you dislike about the product?
Datadog has an excellent offering and continues to provide new services to keep clients and fulfill the needs of the industry. However, this comes at a premium, which hinders its adoption and, for those who do use it, keeps them from using it fully.
What problems is the product solving and how is that benefiting you?
Providing key actionable insights
This is a good product, but is only just starting to bubble up observability. Takes minutes
What do you like best about the product?
It is all in one place, the UI is fairly nice. Customer support is bomb! Thanks!
What do you dislike about the product?
Graphs are a bit small, and it would be nice to have more data points. Also, there are still no really usable HEATMAPS.
What problems is the product solving and how is that benefiting you?
Incident management using workflows/appbuilder/incident management
APM is growing by leaps and bounds
Using Datadog for Log Management
What do you like best about the product?
Being able to have tedious tasks like log management dealt with in an efficient way. Ease of use and ease of integration are a big plus.
What do you dislike about the product?
Cost is a major drawback. While it might be justifiable for large enterprises with significant budgets, for small to medium-sized businesses or startups, the cost can be prohibitive. The charges accumulate quickly, especially as you add more hosts and services.
What problems is the product solving and how is that benefiting you?
Issues regarding AWS Monitoring
Datadog is like candy for DevSecOps or FinOps teams!
What do you like best about the product?
The variety of telemetry (features) that can be pulled in results in being able to make more meaningful decisions and process improvements.
What do you dislike about the product?
Sometimes pricing/licensing can be tricky to understand.
What problems is the product solving and how is that benefiting you?
Every month Datadog is solving new problems for us, whether it be cloud cost optimization or continuous profiling, which enables us to optimize our microservices quickly.
Great product, could get very expensive if you're not careful
What do you like best about the product?
Easy to use, great functionality for monitoring backend apps. Scales well for applications under big loads
What do you dislike about the product?
Because of its pricing model, sometimes you pay for things you are not using. It's easy to forget you have loads of metrics on some forgotten apps generating cost.
What problems is the product solving and how is that benefiting you?
Helps me monitor the health of my apps