AWS Marketplace: Datadog Enterprise (Container Agent) Reviews

Ilja Summala

Alerting and metrics improve monitoring efficiency while pricing presents challenges

August 07, 2025
Review from a verified AWS customer

What is our primary use case?

The primary purposes for which Datadog is used include infrastructure monitoring and application monitoring.

Our main use of Datadog's integration capabilities was monitoring workloads in the public cloud, and the integrations that pull public cloud metrics natively were helpful, even critical, for us. We did not use Datadog for AI-driven data analysis tasks; at my last employer we relied on cloud-native and vendor-native tools for the AI piece instead.

What is most valuable?

I find alerting and metrics to be the most effective features of Datadog for system monitoring. Datadog was also cheaper to run than the alternatives, because it is SaaS and quite easy to use.

Datadog is only available in SaaS.

What needs improvement?

The pricing nowadays is quite complex.

In future updates, I would like to see AI features included in Datadog for monitoring AI spend and usage to make the product more versatile and appealing for the customer.

For how long have I used the solution?

I have been using Datadog since 2014.

What was my experience with deployment of the solution?

There were no problems with the deployment of Datadog.

The deployment of Datadog just took a few hours.

What do I think about the stability of the solution?

The challenges I encountered while using Datadog were in the early days when the product was missing the ability to monitor Kubernetes and similar features, but they have since added those features. At the moment, I don't think there are too many challenges that I am worrying about.

How was the initial setup?

One person is enough to do the installation.

What other advice do I have?

I am not working with any of these solutions currently because I'm on sabbatical, but I was working with Datadog six months ago.

We were using the native tools that AWS and Azure provide to monitor the AI workflows on their platforms.

I used to work as the CTO at Northcloud, but I no longer work there.

On a scale of one to ten, I rate Datadog an eight out of ten.

reviewer1599867

Great technology with a nice interface

January 20, 2025
Review provided by PeerSpot

What is most valuable?

The technology itself is generally very useful and the interface is great.

What needs improvement?

There should be a clearer view of the expenses.

For how long have I used the solution?

I have used the solution for four years.

What do I think about the stability of the solution?

The solution is stable.

How are customer service and support?

I have not personally interacted with customer service. I am satisfied with tech support.

Which solution did I use previously and why did I switch?

I am using ThousandEyes and Datadog. Datadog supports AI-driven data analysis, with some AI elements to analyze, like data processing tools and so on. AI helps in Datadog primarily for resolving application issues.

How was the initial setup?

It was not difficult to set up for me. There was no problem.

What was our ROI?

I can confirm there is a return on investment.

What's my experience with pricing, setup cost, and licensing?

I find the setup cost to be too expensive. The setup cost for Datadog is more than $100. I am evaluating the usage of this solution; however, it is too expensive.

What other advice do I have?

I would rate this solution eight out of ten.

Timothy Spangler

Makes it easy to track down a malfunctioning service, diagnose the problem, and push a fix

January 07, 2025
Review provided by PeerSpot

What is our primary use case?

We use Datadog for monitoring and observing all of our systems, which range in complexity from lightweight, user-facing serverless lambda functions with millions of daily calls to huge, monolithic internal applications that are essential to our core operations. The value we derive from Datadog stems from its ability to handle and parse a massive volume of incoming data from many different sources and tie it together into a single, informative view of reliability and performance across our architecture.

How has it helped my organization?

Adopting Datadog has been fantastic for our observability strategy. Where previously we were grepping through gigabytes of plaintext logs, now we're able to quickly sort, filter, and search millions of log entries with ease. When an issue arises, Datadog makes it easy to track down the malfunctioning service, diagnose the problem, and push a fix.

Consequently, our team efficiency has skyrocketed. No longer does it take hours to find the root cause of an issue across multiple services. Shortened debugging time, in turn, leads to more time for impactful, user-facing work.

What is most valuable?

Our services have many moving parts, all of which need to talk to each other. The Service Map makes visualizing this complex architecture - and locating problems - an absolute breeze. When I reflect on the ways we used to track down issues, I can't imagine how we ever managed before Datadog.

Additionally, our architecture is written in several languages, and one area where Datadog particularly shines is in providing first-class support for a multitude of programming languages. We haven't found a case yet where we needed to roll out our own solution for communicating with our instance.

What needs improvement?

A tool as powerful as Datadog is, understandably, going to have a bit of a learning curve, especially for new team members who are unfamiliar with the bevy of features it offers. Bringing new team members up to speed on its abilities can be challenging and sometimes requires too much hand-holding. The documentation is adequate, but team members coming into a project could benefit from more guided, interactive tutorials, ideally leveraging real-world data. This would give them the confidence to navigate the tool and make the most of all it offers.

For how long have I used the solution?

The company was using it before I arrived; I'm unsure of how long before.

reviewer2507895

Good RUM and APM with good observability

September 30, 2024
Review provided by PeerSpot

What is our primary use case?

We use Datadog across the enterprise for observability of infrastructure, APM, RUM, SLO management, alert management and monitoring, and other features. We're also planning on using the upcoming cloud cost management features and product analytics.

For infrastructure, we integrate with our Kube systems to show all hosts and their data.

For APM, we use it with all of our API and worker services, as well as cronjobs and other Kube deployments.

We use serverless to monitor our Cloud Functions.

We use RUM for all of our user interfaces, including web and mobile.

How has it helped my organization?

It's given us the observability we need to see what's happening in our systems, end to end. We get full stack visibility from APM and RUM, through to logging and infrastructure/host visibility. It's also becoming the basis of our incident management process in conjunction with PagerDuty.

APM is probably the most prominent place where it has helped us. APM gives us detailed data on service performance, including latency and request count. This drives all of the work that we do on SLOs and SLAs.

RUM is also prominent and is becoming the basis of our product team's vision of how our software is actually used.

What is most valuable?

APM is a fundamental part of our service management, both for viewing problems and improving latency and uptime. The latency views drive our SLOs and help us identify problems.

We also use APM and metrics to view the status of our Pub/Sub topics and queues, especially when dealing with undelivered messages.

RUM has been critical in identifying what our users are actually doing, and we'll be using the new product analytics tools to research and drive new feature development.

All of this feeds into the PagerDuty integration, which we use to drive our incident management process.

What needs improvement?

Sometimes the solution changes features so quickly that the UI keeps moving around. The cost is pretty high. Outside of that, we've been relatively happy.

The APM service catalog is evolving fast. That said, it is redundant with our other tools and doesn't allow us to manage software maturity. However, we do link it with our other tools using the APIs, so that's helpful.

Product analytics is relatively new and based on RUM, so it will be interesting to see how it evolves.

Sometimes some of the graphs take a while to load, based on the window of data.

Some stock dashboards don't allow customization. You need to clone them first, but this can lead to an abundance of dashboards. Also, there are some things that stock dashboards do that can't yet be duplicated with custom dashboards, especially around widget organization.

The "top users" widget on the product analytics page only groups by user email, which is unfortunate, since user ID is the field we use to identify our users.

For how long have I used the solution?

I've used the solution for three and a half years.

What do I think about the stability of the solution?

The solution is pretty stable.

What do I think about the scalability of the solution?

The solution is very scalable.

How are customer service and support?

Support was excellent during the sales process, with a huge dropoff after we purchased the product. Only recently (within the past year) has it begun to reach acceptable levels again.

Which solution did I use previously and why did I switch?

We did not have a global solution. Some teams were using New Relic.

How was the initial setup?

The instructions aren't always clear, especially when dealing with multiple products across multiple languages. The tracer works very differently from one language to another.

What about the implementation team?

We handled the setup in-house.

What's my experience with pricing, setup cost, and licensing?

We have built our own set of installation instructions for our teams, to ensure consistent tagging and APM setup.

Which other solutions did I evaluate?

We did look at Dynatrace.

What other advice do I have?

The service was great during the initial testing phase, but once we bought the product, the quality of service dropped significantly. In the past year or so, it has improved and is now approaching the level we'd expect based on the cost.

reviewer08624379

Great documentation and learning platform with good built-in integrations

September 26, 2024
Review provided by PeerSpot

What is our primary use case?

We were looking for an all-in-one observability platform that could handle a number of different environments and products. At a basic level, we have a variety of on-premises servers (Windows/Mac/Linux) as well as a number of commercial, cloud-hosted products.

While it's often possible to let each team rely on its own means for monitoring, we wanted something that the entire company could rally around - a unified platform that is developed and supported by the very same people, not others just slapping their name on some open source products they have no control over.

How has it helped my organization?

Datadog has dropped effortlessly into nearly every stage of observability for us. We appreciate its robust cross-platform support for our IT assets. For hosted products, enabling integrations often couldn't be easier, and many of them include native dashboards and even other types of content packs.

Over the last couple of years, we have onboarded a number of engineering teams, and each of them feels comfortable using Datadog. This gives us the ability to build organizational knowledge.

What is most valuable?

Datadog's learning platform is second to none. It's the gold standard of training resources in my mind; not only are these self-paced courses available at no charge, but you can spin up an actual Datadog environment to try out its various features.

I just hate when other vendors try to upsell you on training beyond their (often poorly-written) documentation. Apart from that, we appreciate the variety of content that comes from Datadog's built-in integrations - for common sources, we don't have to worry about parsing, creating dashboards, or otherwise reinventing the wheel.

What needs improvement?

Datadog's roadmap can be a bit unpredictable at times. For instance, a few years ago, our rep at the time stated that Datadog had dropped its plans to develop an incident on-call platform. However, this year, they released a platform that does exactly that.

They also decided to drop chat-based support just recently. While I understand that it's often easier to work with support tickets, I do miss the easy availability of live support.

It would be nice if Datadog continued to broaden its variety of available integrations to include even more commercial platforms because that is central to its appeal. If we're looking at a new product and there isn't a native integration, then that's more work on our part.

reviewer820579

Single pane of glass, easy to share dashboards, and good for monitoring

September 20, 2024
Review from a verified AWS customer

What is our primary use case?

We primarily use the solution for a variety of purposes, including:

Watching RUM data for frontend site, using LCP and INP metrics to compare across the old and new architecture to inform rollout decisions.
Watching APM data for backend services, observing how the backend server reacts (CPU util, memory, requests/second) to make sure the backend can handle the load.
Using Datadog CCM during our free trial period to get visibility over our AWS spend across accounts and resources and looking at recommendations and acting on those.
Browsing the service catalog to look at the current state of services that are running and what resources they use.
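
As a rough illustration of the RUM comparison mentioned in the first item above, the following Python sketch compares a percentile of LCP samples from an old and a new architecture. It is not taken from the review or from Datadog: the function names, the choice of the 75th percentile, the nearest-rank method, and the sample values are all assumptions made for illustration.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Return the nearest-rank percentile (0 < pct <= 100) of the samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


def new_is_no_worse(old_ms: list[float], new_ms: list[float], pct: float = 75.0) -> bool:
    """True if the new architecture's percentile is at or below the old one's."""
    return percentile_nearest_rank(new_ms, pct) <= percentile_nearest_rank(old_ms, pct)


# Hypothetical LCP samples in milliseconds (made up for illustration).
old_lcp = [2800, 3000, 3200, 4000]
new_lcp = [1200, 1800, 2400, 3000]

if __name__ == "__main__":
    print("old p75:", percentile_nearest_rank(old_lcp, 75))
    print("new p75:", percentile_nearest_rank(new_lcp, 75))
    print("safe to continue rollout:", new_is_no_worse(old_lcp, new_lcp))
```

A real rollout decision would use far more samples and would likely look at INP the same way; the sketch only shows the shape of the comparison.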

How has it helped my organization?

This provides a single place to find monitoring data. Prior to DD, we had some metrics living in New Relic, some in Grafana, and some in Circonus, and it was very confusing to navigate across them. Understanding different query languages is challenging. Here, there's a single UI to get used to, and everything is so sharable.

DD has led to teams making more decisions based on data that they observe about their service metrics and RUM metrics. I've seen decisions get made based on what has been observed in DD, and less based on anecdotal data.

What is most valuable?

I really enjoyed using CCM since it showed cloud cost data easily next to other metrics, and I could correlate the two.

Across CCM and the rest of Datadog, I like how sharable everything is. It's so easy to share dashboards and links with my teammates so we can quickly get up to speed on debugging/solving an issue.

I have also really enjoyed the K8s view of pods and pod health. It's very visual, and as a non-K8s platform owner at my company, I can still observe the overall health of the system. I can then drill in, and I have learned things about K8s by exploring that part of the product and talking with the team.

What needs improvement?

We've had some issues where we had Datadog automatically turned on in AWS regions that we weren't using, which incurred a small but steady cost that amounted to tens of thousands of dollars spent over a few weeks. I wish there was a global setting that lets an admin restrict which regions DD is turned on in as a default setup step.

Sometimes, the APM service dashboard link isn't sharable. I click something in the service catalog, and on that service's APM default view, I try to share a link to that with a teammate, and they reach a blank or error screen.

I wish there was more organization and detail in the suggestions when I use the query editor. I'm never quite sure when the autofill dropdown shows up if I'm seeing some custom tag or some default property, so I have to know exactly what I'm looking for in order to build a chart. It's hard to navigate and explore using the query autofill suggestions without knowing exactly what tag to look for.

It's been a bit hard to understand how data gets sampled or how many data points a particular dashboard value is using. We've had questions over the RUM metrics that we see and we had to ask for help with how values are calculated, bin sizes, etc to get confidence in our data.

For how long have I used the solution?

I've used the solution for six months.

What do I think about the stability of the solution?

I've only been aware of a recent outage that affected the latency of data collection for one of our production tests. Outside of that, the solution seems stable.

What do I think about the scalability of the solution?

The solution seems like it can scale very well and beyond our needs.

How are customer service and support?

Technical support has been stellar. We love working with a team that responds fast, in great detail, and with great empathy. I trust what they say.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We used New Relic, Grafana, and Circonus. Circonus was flaky, always having downtime, and we were always on the phone with them. With New Relic and Grafana, different metrics lived in each, and it was hard for consumers of the data to easily find what they needed. We also had licensing issues across the three, so not everybody could easily access all of them.

What's my experience with pricing, setup cost, and licensing?

I didn't do this portion of the product setup.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

reviewer3796153

Intuitive user interface with good log management and a helpful Log Explorer feature

September 20, 2024
Review provided by PeerSpot

What is our primary use case?

In our fast-paced environment, managing and analyzing log data and performance metrics is crucial. That’s where Datadog comes in. We rely on it not just for monitoring but for deeper insights into our systems, and here’s how we make the most of it.

One of the first things we appreciate about Datadog is its ability to centralize logs from various sources—think applications, servers, and cloud services. This means we can access everything from one dashboard, which saves us a lot of time and hassle. Instead of digging through multiple platforms, we have all our log data in one place, making it much easier to track events and troubleshoot issues.

How has it helped my organization?

Before Datadog, we faced the common challenge of fragmented data. Our logs, metrics, and traces were spread across different tools and platforms, making it difficult to get a complete picture of our system’s health.

With Datadog, we now have a centralized monitoring solution that aggregates everything in one place. This has streamlined our workflow immensely. Whether it’s logs from our servers, metrics from our applications, or traces from user transactions, we can access all this information easily. This unified view has made it simpler for our teams to identify and troubleshoot issues quickly.

What is most valuable?

In my experience with Datadog, one feature stands out above the rest: the Log Explorer. It has completely transformed the way I interact with our log data and has become an essential part of my daily workflow.

The user interface is incredibly intuitive. When I first started using it, I was amazed at how easy it was to navigate. The design is clean and straightforward, allowing me to focus on the data rather than getting lost in complicated menus. Whether I’m searching for specific log entries or filtering by certain criteria, everything feels seamless.

This ease of use allowed me to get up to speed with log management since it's my first time using Datadog.

What needs improvement?

Interactive tutorials could be a game changer. Instead of just reading about how to use query filters, users could engage with step-by-step guides that walk them through the process. For example, a tutorial could start with a simple query and gradually introduce more complex filtering techniques, allowing users to practice along the way. These tutorials could include pop-up tips and hints that provide additional context or best practices as users work through examples. This hands-on approach not only reinforces learning but also builds confidence in using the tool.

For how long have I used the solution?

My company has recently made Datadog available to its software engineers, and I personally have been using it for almost a year now.

reviewer2561892

A go-to tool for analyzing, understanding, and investigating application performance

September 20, 2024
Review provided by PeerSpot

What is our primary use case?

The solution is used for full stack enterprise performance monitoring for our primarily cloud-based stack on AWS. We have implemented monitoring coverage using RUM for critical apps and websites and utilize APM (integrated with RUM) for full stack traceability.

We use Datadog as our primary log repository for all apps and platforms, and the advanced log analytics enable accurate log-based monitoring/alerting and investigations.

Additionally, we use some advanced RUM capabilities and metrics to track and optimize client-side user experience. We track SLOs for our critical apps and platforms using Datadog.

How has it helped my organization?

We now have full-stack observability, which allows us to better understand application behavior, quickly alert users about issues, and proactively manage application performance.

We've seen value by implementing observability coordinated across multiple applications, allowing us to track things like customer shopping and orders across multiple applications and services.

For critical application launches, we've built dashboards that can track user activity and confirm users are able to successfully utilize new features, tracking user activities in real-time in a war-room situation.

Datadog is our go-to tool for analyzing, understanding, and investigating application performance and behavior.

What is most valuable?

APM accurately tracks our service performance across our ecosystem. RUM gives us client-side performance and user experience visibility, and the rate of new features implemented in the Digital Experience area recently has been high. Log analytics give us a powerful mechanism for error tracking, research, and analysis.

Custom metrics that we've created allow us to track KPIs in real-time on dashboards. All of these have proven valuable in our organization. Additionally, Datadog product support teams are responsive and have provided timely support when needed.

What needs improvement?

Agent remote configuration should be provided/improved and streamlined, allowing for config changes/upgrades to be performed via the portal instead of at the host.

Cost tracking via the admin portal is a bit lacking, even though it has gotten better. I'm looking for usage trends (that drive cost) across time and better visibility or notifications about on-demand charges.

Network device and performance monitoring could be improved, as we've faced some limitations in this area.

The Datadog usage-based cost model, while giving us better transparency, is difficult to follow at times and is constantly evolving.

For how long have I used the solution?

I've used the solution for three years.

How are customer service and support?

Support has been responsive and helpful.

What's my experience with pricing, setup cost, and licensing?

Pricing is straightforward. That said, it's sometimes difficult to estimate usage volumes.

Which other solutions did I evaluate?

We evaluated Datadog and New Relic in detail and chose Datadog because of its straightforward and competitive pricing model, its full coverage of the monitoring features we wanted, and its easy-to-use UI.

Tony Martinez1

Great logging, session replays, and alerting

September 19, 2024
Review provided by PeerSpot

What is our primary use case?

Our primary use cases include:

Alert on errors customers encounter in our product. We've set up logs that go to Slack to tell us when a certain error threshold is hit.
Investigate slow page load times. We have pages in our app that are loading slowly and the logs help us figure out which queries are taking the longest time.
Metrics. We collect metrics on product usage.
Session replays. We watch session replays to see what a user was doing when a page took a long time to load or hit an error. This is helpful.
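
The error-threshold alerting in the first item above can be pictured with a small sliding-window counter. This Python sketch is only an illustration of the idea, not how Datadog monitors are implemented or configured; the class name, its parameters, and the timestamps are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ErrorThresholdAlert:
    """Signal when more than `threshold` errors fall within `window_seconds`."""

    threshold: int
    window_seconds: float
    _timestamps: deque = field(default_factory=deque, init=False, repr=False)

    def record_error(self, timestamp: float) -> bool:
        """Record one error; return True if the threshold is now crossed."""
        self._timestamps.append(timestamp)
        # Drop errors that have fallen out of the sliding window.
        while self._timestamps and timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


if __name__ == "__main__":
    # Hypothetical policy: alert on more than 2 errors within 60 seconds.
    alert = ErrorThresholdAlert(threshold=2, window_seconds=60)
    for ts in (0, 10, 20, 100):
        if alert.record_error(ts):
            print(f"t={ts}: threshold crossed, notify the on-call channel")
```

In practice the notification step would post to a chat channel; here it is reduced to a print statement so the sketch stays self-contained.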

How has it helped my organization?

It's helped us find bugs that customers are experiencing before they're reported to us. Sometimes, customers don't report errors, so being able to catch errors before they're reported helps us investigate before other users find errors.

Datadog has helped us investigate slow page loading times and even see the specific queries that are taking a long time to load.

Logging lets us see the context around an error. For example, see if a backend service had an error before it surfaced on the frontend.

Dashboards are helpful for reviewing occasionally to get a higher-level overview of what's happening.

What is most valuable?

The most valuable aspects include:

Logging. Being able to view detailed logs helps debug issues.
Session replays. They are helpful for seeing what a customer was doing before they saw an error or had a slow page load.
Alerting. This is an important part of our on-call process to send alerts to Slack when an error threshold is crossed. Alerts/monitors are easy to configure to only alert when we want them to alert.
Dashboards. It's helpful to pull up dashboards that show our most common errors or page performance. It's a good way to see how the app is performing from a bird's-eye view.

What needs improvement?

The UI has a lot going on. It should be simpler and have a better way to onboard someone new to using Datadog.

The log querying syntax can be confusing. Usually, I filter by finding a facet in a log and selecting to filter by that facet, but I'm not sure how to write the filter myself.

The monitor/alert syntax is also somewhat hard to understand.

Overall, it should be easier to learn how to use the product while you're using the product. Perhaps tooltips or a link to learn more about whatever section you're using.

For how long have I used the solution?

I've used the solution for two years.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

Which other solutions did I evaluate?

We did not evaluate other options.

Caleb Parks

Lots of features with a rapid log search and an easy setup process

September 19, 2024
Review provided by PeerSpot

What is our primary use case?

We use the solution for logs, infrastructure metrics, and APM. We have many different teams using it across both product and data engineering.

How has it helped my organization?

The solution has improved our observability by giving us rapid log search, a correlation between hosts/logs/APM, and tons of features in one website.

What is most valuable?

I enjoy the rapid log search. It's such a pleasure to quickly find what you're looking for. The ease of graph building is also nice, and MUCH easier than Prometheus.

What needs improvement?

It is far too easy to run up huge unexpected costs. The billing model is not flexible enough to handle cases where you temporarily have thousands of nodes. It is not price effective for monitoring big data jobs. We had to switch to open-source Grafana plus Prometheus for those.

It would be cool to have an OpenTelemetry agent that automatically instruments everything for APM in the next release.

For how long have I used the solution?

I've used the solution for three years.

What do I think about the stability of the solution?

I'd rate the stability ten out of ten.

What do I think about the scalability of the solution?

I'd rate the scalability ten out of ten.

Which solution did I use previously and why did I switch?

We did not previously use a different solution.

How was the initial setup?

The setup is very straightforward. Users just install the helm chart, and boom, you're done.

What about the implementation team?

We handled the setup in-house.

What's my experience with pricing, setup cost, and licensing?

Be careful about pricing. Make sure you understand the billing model and that there are multiple billing models available. Set up alarms to alert you of cost overruns before they get too bad.
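
One simple way to think about a cost-overrun alarm is to project month-end spend from the spend so far and compare it with a budget. The Python sketch below is an assumption-laden illustration, not a Datadog feature or API: the function names, the linear projection, and the dollar figures are all hypothetical.

```python
def projected_month_end_cost(spend_to_date: float, day_of_month: int, days_in_month: int) -> float:
    """Linearly extrapolate the month-to-date spend to the end of the month."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be between 1 and days_in_month")
    return spend_to_date / day_of_month * days_in_month


def should_alert(spend_to_date: float, day_of_month: int, days_in_month: int, budget: float) -> bool:
    """True if the projected month-end cost exceeds the budget."""
    return projected_month_end_cost(spend_to_date, day_of_month, days_in_month) > budget


if __name__ == "__main__":
    # Hypothetical numbers: $1,000 spent by day 10 of a 30-day month, $2,500 budget.
    print(projected_month_end_cost(1000, 10, 30))
    print(should_alert(1000, 10, 30, budget=2500))
```

A linear projection will understate bursty usage, such as temporarily running thousands of nodes, so a real alarm would probably also watch the daily rate of change.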

Which other solutions did I evaluate?

We've never evaluated other solutions.

What other advice do I have?

It's a great product. However, you have to pay for quality.
