My main use case for Datadog is dashboards and monitoring.
We use dashboards and monitoring with Datadog to monitor the performance of our Nexus Artifactory system and make sure the services are running.
The best features Datadog offers are the dashboarding tools as well as the monitoring tools.
What I find most valuable about the dashboarding and monitoring tools in Datadog is the ease of use and simplicity of the interface.
Datadog has positively impacted our organization by allowing us to look at things such as Cloud Spend and make sure our services are running at an optimal performance level.
We have seen specific outcomes such as cost savings by utilizing the cost utilization dashboards to identify areas where we could trim our spend.
To improve Datadog, I suggest they keep doing what they're doing.
Newer features using AI to create monitors and dashboards would be helpful.
I have been using Datadog for six years.
Datadog is stable.
I am not sure about Datadog's scalability.
Customer support with Datadog has been great when we needed it.
I rate the customer support a nine on a scale of 1 to 10.
We did not previously use a different solution.
In terms of return on investment, there is a lot of time saved from using the platform.
I was not directly involved in the pricing, setup cost, and licensing details.
Before choosing Datadog, we evaluated other options such as Splunk and Grafana.
I rate Datadog an eight out of ten because the expense of using it keeps it from being a nine or ten.
My advice to others looking into using Datadog is to brush up on their API programming skills.
My overall rating for Datadog is eight out of ten.
My main use case for Datadog is performance monitoring, SLOs, and SLIs.
For performance monitoring, SLOs, and SLIs, we create objectives and indicators around user and stakeholder feedback. We have weekly meetings to create backlog items to work on if anything has slipped into the red based on our SLO definitions.
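To make that workflow concrete, here is a minimal sketch, not our actual configuration, of how an SLI monitor and a monitor-based SLO of the kind we review weekly could be defined through Datadog's HTTP API; the metric name, the service tag portal-api, the thresholds, and the notification handle are placeholders.

```python
# Hypothetical sketch: define an SLI (a latency monitor) and wrap it in a
# monitor-based SLO via Datadog's HTTP API. Values are illustrative only.
import os
import requests

API = "https://api.datadoghq.com"
HEADERS = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}

# SLI: a metric monitor that goes red when average request latency exceeds 500 ms.
monitor = requests.post(
    f"{API}/api/v1/monitor",
    headers=HEADERS,
    json={
        "name": "portal-api latency above target",
        "type": "metric alert",
        "query": "avg(last_5m):avg:trace.http.request.duration{service:portal-api} > 0.5",
        "message": "Latency SLI breached for portal-api. @slack-sre",
        "tags": ["team:portal", "slo:latency"],
    },
).json()

# SLO: 99.5% of a rolling 30 days with that monitor healthy; reviewed weekly.
slo = requests.post(
    f"{API}/api/v1/slo",
    headers=HEADERS,
    json={
        "name": "portal-api latency SLO",
        "type": "monitor",
        "monitor_ids": [monitor["id"]],  # assumes the created monitor comes back with an "id"
        "thresholds": [{"timeframe": "30d", "target": 99.5}],
        "tags": ["team:portal"],
    },
)
print(slo.status_code, slo.json())
```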
The best features Datadog offers are the analytics that are all associated with each other. RUM data associated with APM, trace data, and all of that, including information around inferred requests, has been super useful. Machine health data gives a complete picture of performance, which has been extremely useful for troubleshooting difficult problems.
Having all that associated analytics helps me in troubleshooting because I don't have to bounce around to other tools, which saves me a lot of time. The quality of the metrics Datadog gathers is high enough that I can rule things in and out. This basically goes for any web app: when asking why a web app is slow, first you look at the code. If the code looks good, then you look at the hardware or the database. Being able to rule all of those out with one tool and one set of requests is useful.
Datadog has positively impacted my organization by allowing us to gather complete data instead of looking all over the place at incomplete data and actually make pointed determinations for fixing issues. It has helped increase efficiency and saved time.
I don't know how Datadog can be improved as they are doing a pretty good job.
I have been using Datadog for three years.
Datadog is stable.
Datadog's scalability is good.
The customer support is good.
My experience with pricing, setup cost, and licensing is that it is really expensive.
I have not seen a return on investment.
My advice to others looking into using Datadog is that it is good and they should use it.
I don't know if my company has a business relationship with this vendor other than being a customer.
On a scale of 1-10, I rate Datadog a 9.
My main use case for Datadog is application and portal monitoring.
For application or portal monitoring, we have several monitors set up that give us a heads up early when we believe there's a problem with end users getting to the applications that are available to them on the portal. Just yesterday, we were able to identify an error in code that was throwing thousands of errors a day, and it was very simple for us to actually find it using Datadog analytics on the error and the Watchdog alerts.
I don't have anything else to add about my main use case, other than the ease with which we were able to identify an issue that, before we had Datadog, we might not even have been aware of, even though it was consuming resources it didn't need to.
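As a rough illustration of the kind of early-warning monitor described here, not our exact setup, the sketch below creates a log alert that fires when error logs for the portal spike; the service name, threshold, and notification handle are placeholders.

```python
# Hypothetical sketch: a log alert monitor that fires when error-status logs
# for the portal service spike, created via Datadog's monitor API.
import os
import requests

resp = requests.post(
    "https://api.datadoghq.com/api/v1/monitor",
    headers={
        "DD-API-KEY": os.environ["DD_API_KEY"],
        "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
    },
    json={
        "name": "Portal error-log spike",
        "type": "log alert",
        "query": 'logs("service:portal status:error").index("*").rollup("count").last("15m") > 500',
        "message": "Error logs for the portal are spiking. @slack-portal-ops",
        "tags": ["app:portal"],
        "options": {"thresholds": {"critical": 500}},
    },
)
print(resp.status_code, resp.json().get("id"))
```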
In my opinion, the best features Datadog offers are flexibility and extensive support. It can be a little overwhelming since there are so many features that come with Datadog, and I'm just scratching the surface of that. I also appreciate the support that our representative has provided to us, coming on-prem, providing training, being available to answer questions, and the extensive knowledge base documentation that I have been referred to, which has been extremely helpful also.
The flexibility I mentioned shows up in my day-to-day work. Traditionally, I used SolarWinds to monitor infrastructure health, but its polling period is lengthier than we would like. Datadog has real-time monitoring, and the alerts we have configured reach us much more quickly, so we're able to address an issue sooner rather than later. When it comes to reviewing .NET code or application configuration, I used to have only limited visibility, but with Datadog analyzing the IIS logs and any other application logs, it has opened up visibility so that I can assist a developer in identifying the area of concern or where code could be written more efficiently.
Datadog has positively impacted my organization by helping us make our web portals more efficient. Our portals and integrations are extremely complex, and as we get the agent installed on more devices, it's really provided us visibility that we haven't had in my entire career with Ace Hardware.
I cannot provide specific numbers for the improved performance, but Datadog has identified issues that we have in our data source area. We have implemented additional indexes and have plans for breaking out complex queries that are pulling data across multiple data sources. We're in the crawl, walk, run phase, so right now we're identifying and prioritizing the things that need to be fixed. A few of the things that we've already addressed include adding additional resources to servers, and we have noticed improved performance. I know someone has the statistics; I just don't have them available to me at the moment.
At this point, I'm not sure how Datadog can be improved, but maybe some initial intense training from the vendor before setting us loose with the application is the only thing I can think of.
I think it would be helpful to have an administrative page right in the portal with links to the application documentation. Unless I'm just not seeing them, I have to keep separate URLs for the various locations I need to reach; I cannot get to some of the documentation and various other components from my company-specific portal.
Other than being restricted by cost, the only scalability challenge with Datadog has been the initial installation of the agent. We have upgraded all of our agents so that we can handle future upgrades remotely, but the initial install is still a little time-consuming and a little clunky.
I think the customer support is great. I love the ability to send flares directly from the machine or device that's having an issue, and my tickets are always opened promptly. I usually get links to documentation about the specific feature or function that I'm trying to implement, and when I have additional questions, the ticket is updated with actual recommendations or suggestions pointing me in the correct direction.
We continue to use SolarWinds, although I can see the infrastructure monitoring component of SolarWinds being replaced with Datadog. We also used Catchpoint to run synthetic scripts from various locations throughout the country, and we use Pingdom for our e-commerce solution. We're trying to phase out Pingdom at this time with the help of Datadog engineers, and we have ceased using Catchpoint because we have created those synthetic scripts within Datadog.
At this point, I'm leaving the return on investment metrics to my manager and director. I'm just focused on getting it up and running, installed, upgraded, and helping to train other folks to use it. I know they're trying to keep metrics on all of those questions, but I'm just not focusing on that at this time.
I was not included in the pricing, setup cost, and licensing decisions, but I have needed to gain more information about licensing and individual feature cost projections. Everybody wants the agent installed, but we only have so many dollars to spread across, so it's been difficult for me to prioritize who will benefit from Datadog at this time.
I'm excited to learn more about the application and can't wait to see, as my knowledge expands, all the exciting things that we might be able to do with the tool.
I rate Datadog an 8 out of 10, only because I haven't had the ability to explore everything that I intend to explore, and some of the more complex monitors that I want to create I'm just not able to intuitively do. But that might be on me and not the product. The complexity and my lack of knowledge related to all the features and how I can use them keep it from being a 10 for me.
I would advise others looking into using Datadog to do more training and become much more familiar with the product before going live with it. There are so many wonderful things that can be done with it that it's a little overwhelming to only attempt to configure those or investigate them when the product's already live.
I'm excited to continue to learn and explore the tool. It's giving me some insight into systems that I have not had for the past 17 years, so it's exciting to be able to see that and put it to use almost immediately.
The primary purposes for which Datadog is used include infrastructure monitoring and application monitoring.
The main use case for Datadog's integration capabilities is monitoring workloads in the public cloud, and the integrations that pulled public cloud metrics natively were helpful, even critical, for us. We were not using Datadog for AI-driven data analysis tasks; at my last employer we relied on cloud-native and vendor-native tools for that and didn't use Datadog for the AI piece at all.
I find alerting and metrics to be the most effective features of Datadog for system monitoring. It was still cheaper to run Datadog than the alternatives; the running costs were lower because it is SaaS and quite easy to use.
Datadog is only available as SaaS.
The pricing nowadays is quite complex.
In future updates, I would like to see AI features included in Datadog for monitoring AI spend and usage to make the product more versatile and appealing for the customer.
I have been using Datadog since 2014.
There were no problems with the deployment of Datadog.
The deployment of Datadog just took a few hours.
The challenges I encountered while using Datadog were in the early days when the product was missing the ability to monitor Kubernetes and similar features, but they have since added those features. At the moment, I don't think there are too many challenges that I am worrying about.
One person is enough to do the installation.
I am not working with any of these solutions currently because I'm on sabbatical; I last worked with Datadog six months ago.
We were using the tools that AWS and Azure came with natively to monitor the AI workflows on their platforms.
I used to work as the CTO at Northcloud, but I no longer work there.
On a scale of one to ten, I rate Datadog an eight out of ten.
The technology itself is generally very useful and the interface is great.
There should be a clearer view of the expenses.
I have used the solution for four years.
The solution is stable.
I have not personally interacted with customer service. I am satisfied with tech support.
I am using ThousandEyes and Datadog. Datadog supports AI-driven data analysis, with some AI elements for analyzing data, such as data processing tools. In Datadog, AI helps primarily with resolving application issues.
It was not difficult to set up for me. There was no problem.
I can confirm there is a return on investment.
I find the setup cost to be too expensive; the setup cost for Datadog is more than $100. I am evaluating the usage of this solution; however, it is too expensive.
I would rate this solution eight out of ten.
We use Datadog for monitoring and observing all of our systems, which range in complexity from lightweight, user-facing serverless lambda functions with millions of daily calls to huge, monolithic internal applications that are essential to our core operations. The value we derive from Datadog stems from its ability to handle and parse a massive volume of incoming data from many different sources and tie it together into a single, informative view of reliability and performance across our architecture.
Adopting Datadog has been fantastic for our observability strategy. Where previously we were grepping through gigabytes of plaintext logs, now we're able to quickly sort, filter, and search millions of log entries with ease. When an issue arises, Datadog makes it easy to track down the malfunctioning service, diagnose the problem, and push a fix.
Consequently, our team efficiency has skyrocketed. No longer does it take hours to find the root cause of an issue across multiple services. Shortened debugging time, in turn, leads to more time for impactful, user-facing work.
Our services have many moving parts, all of which need to talk to each other. The Service Map makes visualizing this complex architecture - and locating problems - an absolute breeze. When I reflect on the ways we used to track down issues, I can't imagine how we ever managed before Datadog.
Additionally, our architecture is written in several languages, and one area where Datadog particularly shines is in providing first-class support for a multitude of programming languages. We haven't found a case yet where we needed to roll out our own solution for communicating with our instance.
A tool as powerful as Datadog is, understandably, going to have a bit of a learning curve, especially for new team members who are unfamiliar with the bevy of features it offers. Bringing new team members up to speed on its abilities can be challenging and sometimes requires too much hand-holding. The documentation is adequate, but team members coming into a project could benefit from more guided, interactive tutorials, ideally leveraging real-world data. This would give them the confidence to navigate the tool and make the most of all it offers.
The company was using it before I arrived; I'm unsure of how long before.
We use Datadog across the enterprise for observability of infrastructure, APM, RUM, SLO management, alert management and monitoring, and other features. We're also planning on using the upcoming cloud cost management features and product analytics.
For infrastructure, we integrate with our Kube systems to show all hosts and their data.
For APM, we use it with all of our API and worker services, as well as cronjobs and other Kube deployments.
We use serverless to monitor our Cloud Functions.
We use RUM for all of our user interfaces, including web and mobile.
It's given us the observability we need to see what's happening in our systems, end to end. We get full stack visibility from APM and RUM, through to logging and infrastructure/host visibility. It's also becoming the basis of our incident management process in conjunction with PagerDuty.
APM is probably the most prominent place where it has helped us. APM gives us detailed data on service performance, including latency and request count. This drives all of the work that we do on SLOs and SLAs.
RUM is also prominent and is becoming the basis of our product team's vision of how our software is actually used.
APM is a fundamental part of our service management, both for viewing problems and improving latency and uptime. The latency views drive our SLOs and help us identify problems.
We also use APM and metrics to view the status of our Pub/Sub topics and queues, especially when dealing with undelivered messages.
RUM has been critical in identifying what our users are actually doing, and we'll be using the new product analytics tools to research and drive new feature development.
All of this feeds into the PagerDuty integration, which we use to drive our incident management process.
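To tie those two pieces together, here is a hedged sketch, not our production definition, of a monitor on the Pub/Sub undelivered-messages metric that pages the on-call through the PagerDuty integration; the subscription tag, threshold, and PagerDuty service name are placeholders.

```python
# Hypothetical sketch: alert when undelivered Pub/Sub messages pile up and
# notify PagerDuty from the monitor message. Names and thresholds are examples.
import os
import requests

monitor = {
    "name": "Pub/Sub backlog on orders-events",
    "type": "metric alert",
    "query": (
        "avg(last_10m):avg:gcp.pubsub.subscription.num_undelivered_messages"
        "{subscription_id:orders-events} > 1000"
    ),
    "message": "Undelivered messages are piling up on orders-events. @pagerduty-Platform-Oncall",
    "tags": ["team:platform", "queue:orders-events"],
}

resp = requests.post(
    "https://api.datadoghq.com/api/v1/monitor",
    headers={
        "DD-API-KEY": os.environ["DD_API_KEY"],
        "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
    },
    json=monitor,
)
print(resp.status_code, resp.json().get("id"))
```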
Sometimes the solution changes features so quickly that the UI keeps moving around. The cost is pretty high. Outside of that, we've been relatively happy.
The APM service catalog is evolving fast. That said, it is redundant with our other tools and doesn't allow us to manage software maturity. However, we do link it with our other tools using the APIs, so that's helpful.
Product analytics is relatively new and based on RUM, so it will be interesting to see how it evolves.
Sometimes some of the graphs take a while to load, based on the window of data.
Some stock dashboards don't allow customization. You need to clone them first, but this can lead to an abundance of dashboards. Also, there are some things that stock dashboards do that can't yet be duplicated with custom dashboards, especially around widget organization.
The "top users" widget on the product analytics page only groups by user email, which is unfortunate, since user ID is the field we use to identify our users.
I've used the solution for three and a half years.
The solution is pretty stable.
The solution is very scalable.
Support was excellent during the sales process, with a huge drop-off after we purchased the product. Only recently (within the past year) has it begun to reach acceptable levels again.
We did not have a global solution. Some teams were using New Relic.
The instructions aren't always clear, especially when dealing with multiple products across multiple languages. The tracer works very differently from one language to another.
We handled the setup in-house.
We have built our own set of installation instructions for our teams, to ensure consistent tagging and APM setup.
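As a rough sketch of what those instructions boil down to, with placeholder values rather than our real manifests, every service sets the standard unified service tagging variables before the tracer starts so that traces, logs, and metrics line up:

```python
# Hypothetical sketch of consistent tagging and APM setup for a Python service.
# In practice the values come from the deployment manifest, not from code.
import os

os.environ.setdefault("DD_ENV", "production")
os.environ.setdefault("DD_SERVICE", "orders-api")
os.environ.setdefault("DD_VERSION", "1.4.2")
os.environ.setdefault("DD_LOGS_INJECTION", "true")  # correlate logs with traces

# Auto-instrument supported libraries (roughly equivalent to running under ddtrace-run).
from ddtrace import patch_all
patch_all()
```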
We did look at Dynatrace.
The service was great during the initial testing phase. However, once we bought the product, the quality of service dropped significantly. However, in the past year or so, it has improved and is now approaching the level we'd expect based on the cost.
We were looking for an all-in-one observability platform that could handle a number of different environments and products. At a basic level, we have a variety of on-premises servers (Windows/Mac/Linux) as well as a number of commercial, cloud-hosted products.
While it's often possible to let each team rely on its own means for monitoring, we wanted something that the entire company could rally around - a unified platform that is developed and supported by the very same people, not others just slapping their name on some open source products they have no control over.
Datadog has effortlessly dropped in to nearly every stage of observability for us. We appreciate how it has robust cross-platform support for our IT assets, and for integrating hosted products, enabling integrations often couldn't be easier, with many of them including native dashboards and even other types of content packs.
Over the last couple of years, we have onboarded a number of engineering teams, and each of them feels comfortable using Datadog. This gives us the ability to build organizational knowledge.
Datadog's learning platform is second to none. It's the gold standard of training resources in my mind; not only are these self-paced courses available at no charge, but you can spin up an actual Datadog environment to try out its various features.
I just hate when other vendors try to upsell you on training beyond their (often poorly-written) documentation. Apart from that, we appreciate the variety of content that comes from Datadog's built-in integrations - for common sources, we don't have to worry about parsing, creating dashboards, or otherwise reinventing the wheel.
Datadog's roadmap can be a bit unpredictable at times. For instance, a few years ago, our rep at the time stated that Datadog had dropped its plans to develop an incident on-call platform. However, this year, they released a platform that does exactly that.
They also decided to drop chat-based support just recently. While I understand that it's often easier to work with support tickets, I do miss the easy availability of live support.
It would be nice if Datadog continued to broaden its variety of available integrations to include even more commercial platforms because that is central to its appeal. If we're looking at a new product and there isn't a native integration, then that's more work on our part.
We primarily use the solution for a variety of purposes.
This provides a single place to find monitoring data. Prior to DD, we had some metrics living in New Relic, some in Grafana, and some in Circonus, and it was very confusing to navigate across them. Understanding different query languages is challenging. Here, there's a single UI to get used to, and everything is so sharable.
DD has led to teams making more decisions based on data that they observe about their service metrics and RUM metrics. I've seen decisions get made based on what has been observed in DD, and less based on anecdotal data.
I really enjoyed using CCM since it showed cloud cost data easily next to other metrics, and I could correlate the two.
Across CCM and the rest of Datadog, I like how sharable everything is. It's so easy to share dashboards and links with my teammates so we can quickly get up to speed on debugging/solving an issue.
I also have really enjoyed K8s view of pods and pod health. It's very visual, and as a non-K8s platform owner at my company, I can still observe the overall health of the system. Then I can drill in and have learned things about K8s by exploring that part of the product and talking with the team.
We've had some issues where we had Datadog automatically turned on in AWS regions that we weren't using, which incurred a small but steady cost that amounted to tens of thousands of dollars spent over a few weeks. I wish there was a global setting that lets an admin restrict which regions DD is turned on in as a default setup step.
Sometimes, the APM service dashboard link isn't sharable. I click something in the service catalog, and on that service's APM default view, I try to share a link to that with a teammate, and they reach a blank or error screen. 
I wish there was more organization and detail in the suggestions when I use the query editor. I'm never quite sure when the autofill dropdown shows up if I'm seeing some custom tag or some default property, so I have to know exactly what I'm looking for in order to build a chart. It's hard to navigate and explore using the query autofill suggestions without knowing exactly what tag to look for.
It's been a bit hard to understand how data gets sampled or how many data points a particular dashboard value is using. We've had questions about the RUM metrics we see, and we had to ask for help with how values are calculated, bin sizes, etc., to get confidence in our data.
I've used the solution for six months.
I've only been aware of a recent outage that affected the latency of data collection for one of our production tests. Outside of that, the solution seems stable.
The solution seems like it can scale very well and beyond our needs.
Technical support has been stellar. We love working with a team that responds fast, in great detail, and with great empathy. I trust what they say.
We used New Relic, Grafana, and Circonus. Circonus was flaky, always having downtime, and we were always on the phone with them. With New Relic and Grafana, different metrics lived in each, and it was hard for consumers of the data to find what they needed. We also had licensing issues across the three, so not everybody could easily access all of them.
I didn't do this portion of the product setup.
In our fast-paced environment, managing and analyzing log data and performance metrics is crucial. That’s where Datadog comes in. We rely on it not just for monitoring but for deeper insights into our systems, and here’s how we make the most of it.
One of the first things we appreciate about Datadog is its ability to centralize logs from various sources—think applications, servers, and cloud services. This means we can access everything from one dashboard, which saves us a lot of time and hassle. Instead of digging through multiple platforms, we have all our log data in one place, making it much easier to track events and troubleshoot issues.
Before Datadog, we faced the common challenge of fragmented data. Our logs, metrics, and traces were spread across different tools and platforms, making it difficult to get a complete picture of our system’s health.
With Datadog, we now have a centralized monitoring solution that aggregates everything in one place. This has streamlined our workflow immensely. Whether it’s logs from our servers, metrics from our applications, or traces from user transactions, we can access all this information easily. This unified view has made it simpler for our teams to identify and troubleshoot issues quickly.
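As a simplified illustration of how log data lands in that central place (in practice the Agent and cloud integrations do this for us), the sketch below ships one structured event to Datadog's HTTP logs intake; the service, source, and tags are placeholders.

```python
# Hypothetical sketch: send a single structured log event to Datadog's
# HTTP logs intake. Field values are examples only.
import os
import requests

event = [{
    "message": "order 12345 failed payment validation",
    "ddsource": "python",
    "service": "checkout",
    "ddtags": "env:production,team:payments",
    "status": "error",
}]

resp = requests.post(
    "https://http-intake.logs.datadoghq.com/api/v2/logs",
    headers={"DD-API-KEY": os.environ["DD_API_KEY"], "Content-Type": "application/json"},
    json=event,
)
print(resp.status_code)  # 202 means the batch was accepted
```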
In my experience with Datadog, one feature stands out above the rest: the Log Explorer. It has completely transformed the way I interact with our log data and has become an essential part of my daily workflow.
The user interface is incredibly intuitive. When I first started using it, I was amazed at how easy it was to navigate. The design is clean and straightforward, allowing me to focus on the data rather than getting lost in complicated menus. Whether I’m searching for specific log entries or filtering by certain criteria, everything feels seamless.
This ease of use allowed me to get up to speed with log management quickly, since this is my first time using Datadog.
Interactive tutorials could be a game changer. Instead of just reading about how to use query filters, users could engage with step-by-step guides that walk them through the process. For example, a tutorial could start with a simple query and gradually introduce more complex filtering techniques, allowing users to practice along the way. These tutorials could include pop-up tips and hints that provide additional context or best practices as users work through examples. This hands-on approach not only reinforces learning but also builds confidence in using the tool.
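As a sketch of the simple-to-complex progression such a tutorial could follow, the queries below could even be scripted against the Logs Search API; the service name is a placeholder, and facet names like @http.status_code are common defaults that may differ per environment.

```python
# Hypothetical sketch: run a progression of Log Explorer-style queries through
# the Logs Search API, from a simple service filter to facet ranges and exclusions.
import os
import requests

queries = [
    "service:checkout",                                  # everything from one service
    "service:checkout status:error",                     # only errors
    "service:checkout @http.status_code:[500 TO 599]",   # facet range filter
    'service:checkout status:error -@http.url_details.path:"/healthz"',  # exclude noise
]

for q in queries:
    resp = requests.post(
        "https://api.datadoghq.com/api/v2/logs/events/search",
        headers={
            "DD-API-KEY": os.environ["DD_API_KEY"],
            "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
        },
        json={"filter": {"query": q, "from": "now-15m", "to": "now"}, "page": {"limit": 5}},
    )
    print(q, "->", len(resp.json().get("data", [])), "events")
```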
My company has recently made Datadog available to its software engineers, and I personally have been using it for almost a year now.