
Reviews from AWS customer

20 AWS reviews

External reviews

58 reviews

External reviews are not included in the AWS star rating for the product.


5-star reviews

    Corey Peoples

Has improved our ability to identify cloud application issues quickly using trace data and detailed log filtering

  • October 16, 2025
  • Review from a verified AWS customer

What is our primary use case?

My team and I primarily rely on Datadog logs from our application to identify issues in our cloud-based solution. We take the requests and error information that our customers report and use them to identify the underlying errors in our back-end systems, which allows us to submit code fixes or configuration changes.

This morning, an API request I was trying to submit failed with only "unspecified error" shown in the web interface. I took the request ID and filtered our logs on that request ID facet, and it gave me the specific entries, allowing me to look at the stack trace that we had logged and identify exactly what it was failing to convert in order to upload that data.

My team doesn't utilize Datadog logs very often, but we do have quite a few collections of dashboards and widgets that tell us the health of the various API requests that come through our application to identify any known issues with some of our product integrations. It's useful information, but it's not necessarily stuff that our team monitors directly as we're more of a reactionary team.

What is most valuable?

The best feature Datadog offers, in my experience, is the ability to filter down by facets very quickly to identify the problems our individual customers are experiencing with our cloud application. I also really enjoy the trace option, which lets me see all of the various components and how they communicate with each other, so I can see where the failures are occurring.

The trace option helps us spot issues by giving access to see if the problem is occurring within our Java components or if it's a result of the SQL queries, allowing us to look at the SQL queries themselves to identify what information it's trying to pull. We can also look at other integrations, whether that's serverless Lambda functions or different components from our outreach.

Datadog has impacted our organization positively because the general feeling is that it's superior to the ELK stack that we used to use, being significantly faster in searching and filtering the information down, as well as providing links to our search criteria that our development teams and cloud operations teams can use to look at the same problems without having to set up their own search and filter criteria.

What needs improvement?

For the most part, the issues that we come across with Datadog are related to training for our organization. Our development and operations teams have done a really good job of getting our software components into Datadog, allowing us to identify them. However, we do have reduced logging in our Datadog environment due to the amount of information that's going through.

The hardest thing we experience is just training people on what to search for when identifying a problem in Datadog, and having some additional training that might be easily accessible would probably be a benefit.

At this point, I do not know what I don't know, so while there may be room for improvement, Datadog works very well for the things that we currently use it for. More easily accessible training would be extremely helpful, perhaps something within the user interface itself that could guide us toward useful information, show how to tie different components together, or explain how to build a good dashboard.

For how long have I used the solution?

I have worked for Calabrio for 13 years.

What do I think about the stability of the solution?

Datadog is very stable.

What do I think about the scalability of the solution?

Datadog's scalability is strong; we've continued to significantly grow our software, and there are processes in place to ensure that as new servers, realms, and environments are introduced, we're able to include them all in Datadog without noticing any performance issues. The reporting and search functionality remain just as good as when we had a much smaller implementation.

Which solution did I use previously and why did I switch?

Previously, we used the ELK stack—Elasticsearch, Logstash, and Kibana—to capture data. Our cloud operations team set that up because they were familiar with it from previous experiences. We stopped using it because as our environment continued to grow, the response times and the amount of data being kept reached a point where we couldn't effectively utilize it, and it lacked the capability to help us proactively identify issues.

What other advice do I have?

A general impression is that Datadog saves time because the ability to search, even over the vast amount of AWS realms and time spans that we have, is significantly faster compared to other solutions that I've used that have served similar purposes.

I would advise others looking into Datadog to identify which components within their organization could benefit from sending data to it, and to work out how to parse and process that data effectively before they need it for a task, so they know what to look for. Specifically, when searching for data, if a metric can be pulled out into an individual facet, the amount of filtering that can be done is significantly better than with a general text search.
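To illustrate the point about facets versus general text search, here is a minimal sketch in plain Python. It does not use Datadog or its API; the log records, field names, and helper functions are all hypothetical and only show why an extracted field filters more precisely than a substring match.

```python
from collections import defaultdict

# Hypothetical log records; the field names are assumptions for illustration.
LOGS = [
    {"request_id": "req-1", "status": 500, "message": "conversion failed for req-1 payload"},
    {"request_id": "req-2", "status": 200, "message": "upload ok"},
    {"request_id": "req-1", "status": 500, "message": "stack trace follows"},
    {"request_id": "req-3", "status": 200, "message": "retry of req-1 succeeded"},
]


def build_facet_index(logs, field):
    """Map each value of `field` to the positions of the records carrying it."""
    index = defaultdict(list)
    for pos, record in enumerate(logs):
        if field in record:
            index[record[field]].append(pos)
    return index


def facet_search(logs, index, value):
    """Exact lookup on an extracted field: returns only records with that value."""
    return [logs[pos] for pos in index.get(value, [])]


def text_search(logs, needle):
    """Free-text scan: reads every message and can match unrelated records."""
    return [record for record in logs if needle in record["message"]]


request_index = build_facet_index(LOGS, "request_id")
by_facet = facet_search(LOGS, request_index, "req-1")
by_text = text_search(LOGS, "req-1")
```

In this toy data the facet lookup returns both records that belong to `req-1`, while the text search misses the stack trace entry (its message never mentions the ID) and picks up a record that belongs to a different request.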

I would love to figure out how to use Datadog more effectively in the organization work that I do, but that is a discussion I need to have with our operations and research and development teams to determine if it can benefit the customer or the specific implementation software that I work with.

On a scale of one to ten, I rate Datadog a ten out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?


    reviewer2767254

Has created intuitive dashboards and streamlined monitoring across teams

  • October 16, 2025
  • Review from a verified AWS customer

What is our primary use case?

Our main use case for Datadog is collecting metrics, specifically things such as latency metrics and error metrics for our services at Procore.

To give a specific example of how I use Datadog for those metrics in my daily work, I had to create a new service to solve a particular problem, which was an API. I used Datadog to get metrics around successful requests, failed requests, and 400 responses. I then created dashboards that showed those metrics along with some latency metrics from the API, and I also built a monitor that triggers and sends an alert whenever the failure count goes over a certain threshold.
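The workflow described above (count request outcomes, then alert past a threshold) can be sketched in plain Python. This is not the Datadog Python library or a Datadog monitor definition; the function names, the status buckets, and the threshold are assumptions for illustration, and "400 requests" is read here as any 4xx response.

```python
from collections import Counter


def classify(status_code):
    """Bucket an HTTP status code (assumed grouping: <400, 4xx, 5xx and above)."""
    if status_code < 400:
        return "success"
    if status_code < 500:
        return "client_error"
    return "failure"


def count_outcomes(status_codes):
    """Count how many responses in a time window fall into each bucket."""
    return Counter(classify(code) for code in status_codes)


def monitor_triggered(counts, failure_threshold):
    """Return True when the failure count in the window exceeds the threshold."""
    return counts["failure"] > failure_threshold


# Hypothetical status codes observed during one evaluation window.
window = [200, 201, 400, 500, 503, 200, 404, 500]
counts = count_outcomes(window)
alert = monitor_triggered(counts, failure_threshold=2)
```

With this sample window there are three failures against a threshold of two, so `alert` is `True`; the same counts are what a dashboard widget would plot per bucket.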

How has it helped my organization?

The single biggest improvement has been breaking down the silos between our teams. Before we adopted it, our developers, operations, and SRE teams all lived in separate tools. Ops had their infrastructure graphs, Devs had their log files, and no one had a complete picture.

Here’s where we’ve seen the most significant impact:

  1. We Find and Fix Problems Drastically Faster: The "single pane of glass" is a real thing for us. When an alert fires, our on-call engineer can see the infrastructure metric spike (like CPU), pivot directly to the application traces (APM) running on that host, and see the exact, correlated logs from the services causing the problem—all in one place. We've cut our Mean Time to Resolution (MTTR) significantly because we're no longer "swivel-chairing" between three different tools trying to manually line up timestamps.
  2. We Are More Proactive and Less Reactive: Features like Watchdog (its anomaly detection) have been crucial. We've been alerted to a slow-building memory leak and an abnormal spike in error rates on a specific API endpoint before they breached our static thresholds and caused a user-facing outage. It's helped us move from a "firefighting" culture to one where we can catch problems before they escalate.
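The difference between a static threshold and anomaly detection, as described in the second point, can be shown with a small sketch. This is not how Watchdog works internally (the review does not say); it is a simple z-score check over recent history, chosen only to illustrate the idea, and every name and number in it is an assumption.

```python
from statistics import mean, stdev


def static_breach(value, threshold):
    """Classic monitor: fires only once the value crosses a fixed limit."""
    return value > threshold


def is_anomaly(history, value, z_limit=3.0):
    """Flag `value` when it sits more than `z_limit` sample standard
    deviations above the mean of the recent history."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value > mu
    return (value - mu) / sigma > z_limit


# Hypothetical error rates (percent) for an endpoint, then a new reading.
recent_error_rates = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
new_reading = 2.5

fires_static = static_breach(new_reading, threshold=5.0)
fires_anomaly = is_anomaly(recent_error_rates, new_reading)
```

Here the new reading is well below the static limit of 5.0, so the fixed-threshold check stays quiet, but it is far outside the endpoint's usual range, so the anomaly check fires first.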

What is most valuable?

The best features of Datadog include great dashboards, a super simple and easy-to-use Python library, and easy-to-build monitors, which together provide a really great UI experience.

What makes the dashboard and Python library stand out for me is that they save a lot of time, getting right to the point and being super intuitive.

Datadog has positively impacted my organization by allowing us to have a link to a dashboard for most services.

We have dashboards across the company, which can easily be passed around, making it super easy for everyone to understand the metrics they are looking at.

What needs improvement?

Oh, that's a great question. We actually have a running list of things we'd love to see. Even though we get a ton of value from it, no tool is perfect. Our feedback generally falls into two categories: making the current experience less painful and adding new capabilities we think are the logical next step.

Honestly, our biggest frustrations aren't about a lack of features, but about the management of the platform itself.

  1. Cost Predictability and Governance: This is, without a doubt, our number one issue. It's not just that Datadog is expensive—it's that the cost is incredibly complex and hard to predict. Our bill can fluctuate wildly based on custom metrics, log ingestion, and traces from a new service. We've had to dedicate engineering time just to managing our Datadog costs, creating exclusion filters, and sampling aggressively, which feels like we're being punished for using the product more.

    • How to improve it: We need a "cost calculator" inside the platform. Before I enable monitoring on a new cluster or turn on a new integration, I want Datadog to give me a concrete estimate of what it will cost. We also need better built-in tools for attributing costs back to specific teams or services before the bill arrives.
  2. The Steep Learning Curve and UI Density: The UI is incredibly powerful, but it's dense. For a senior SRE who lives in the tool all day, it's fine. For a new engineer or a developer who only jumps in during an incident, it's overwhelming. We've seen people "click in circles" trying to find a simple stack trace that's buried three layers deep. Building a "perfect" dashboard is still too much of an art form.
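The "sampling aggressively" mentioned under cost governance can be sketched as follows. This is a generic technique written in plain Python, not a Datadog exclusion filter or any Datadog configuration; the record fields (`level`, `request_id`) and the sample rate are hypothetical.

```python
import zlib


def keep_log(record, sample_rate):
    """Decide whether to forward a log record.

    Errors are always kept. Other records are kept for a deterministic
    fraction of request IDs, so every line of a sampled request survives
    together instead of being dropped at random.
    """
    if record["level"] == "error":
        return True
    bucket = zlib.crc32(record["request_id"].encode("utf-8")) % 100
    return bucket < sample_rate * 100


# Hypothetical records passing through a forwarding step.
records = [
    {"level": "error", "request_id": "req-1", "message": "conversion failed"},
    {"level": "info", "request_id": "req-2", "message": "upload ok"},
    {"level": "info", "request_id": "req-2", "message": "upload finished"},
]

kept_none_sampled = [r for r in records if keep_log(r, sample_rate=0.0)]
kept_all_sampled = [r for r in records if keep_log(r, sample_rate=1.0)]
```

Hashing on the request ID rather than drawing a random number is a design choice: it keeps the decision stable across services, so a trace and its logs are either all present or all absent.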

For how long have I used the solution?

I have been using Datadog for about five years.

What do I think about the stability of the solution?

Datadog is stable.

Which solution did I use previously and why did I switch?

I did not previously use a different solution.

How was the initial setup?

I did not deal with any of the pricing, setup cost, or licensing.

What about the implementation team?

I do not know if we purchased Datadog through the AWS Marketplace.

What other advice do I have?

My advice to others looking into using Datadog is to just try using it and see how easy it is to use. I found this interview great. On a scale of 1-10, I rate Datadog a 10.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?


    reviewer907251

Good logging, easy to find issues, and saves time

  • September 18, 2024
  • Review from a verified AWS customer

What is our primary use case?

We use the solution for APM, AWS, Lambda, logging, and infrastructure. We have many different things all over AWS, and having one place to look is great.

We have all sorts of different AWS things out there that are in C# and Node. Having a single place to log and APM into is very important to us.

Keeping track of the cloud infrastructure is also important. We have Lambda, containers, EC2, etc.

Having a super simple interface for filtering APM and log searches is great. It is super easy to show people how to use it. This is super important to us.

How has it helped my organization?

Finding issues quickly is super important, as is being able to create dashboards and alert on issues.

Having the ability to create dashboards has really taught us how to utilize the searching part of the system. We are able to share them and build upon them so easily. Many iterations later, people are putting some solid information out there.

Alerting is also important to us. We have set up many alerts that help us spot issues in the platform before they become bigger issues. This has enabled my teams to use incidents and address the issues so they are no longer problems.

What is most valuable?

Alerting on running systems is very helpful. Finding issues is quick. We have one place for logging and searching. Being able to save searches, reference them in the future, and build upon them is valuable.

The logging in general is one of my favorite features. The search is so straightforward and easy to use. Just being able to click on a field and add it to the search has taught me so much about the interface. It might not be as useful without a shortcut like that to teach me the system. We have Cloudflare logs in there, and sometimes I have no idea how to filter on such a buried piece of JSON. That is where the interface helps me: by clicking "add to search," I get what I need.
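For readers wondering what "filtering on a buried piece of JSON" involves, here is a minimal sketch that flattens a nested record into dotted paths, which is roughly the information a click-to-filter shortcut saves you from typing. The JSON shape below is invented for illustration and is not the real Cloudflare log schema, and the helpers are hypothetical, not part of any Datadog tooling.

```python
import json


def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf in nested dicts and lists."""
    if isinstance(obj, dict):
        for key, val in obj.items():
            yield from flatten(val, f"{prefix}.{key}" if prefix else key)
    elif isinstance(obj, list):
        for i, val in enumerate(obj):
            yield from flatten(val, f"{prefix}[{i}]")
    else:
        yield prefix, obj


def matches(record, path, expected):
    """Check whether the leaf at `path` in `record` equals `expected`."""
    return dict(flatten(record)).get(path) == expected


# Hypothetical nested log line (not an actual Cloudflare payload).
raw = '{"edge": {"client": {"country": "US", "tls": {"version": "1.3"}}}, "status": 403}'
record = json.loads(raw)
paths = dict(flatten(record))
is_tls13 = matches(record, "edge.client.tls.version", "1.3")
```

Listing `paths` shows every filterable leaf by its full path, so a deeply nested value such as `edge.client.tls.version` can be targeted directly instead of being searched for as free text.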

What needs improvement?

The PagerDuty replacement is something we are very interested in. We only really use PagerDuty to call the team when things are down.

I would love to have some DD guru come in and do a department training directly on our setup. We would love to have someone come in and show us the things we could do better within our current setup.

Saving a bit of cash would also help, if there are things we are doing that are costing us. It's a big enough tool that it is tough to have someone dedicated to managing it.

For how long have I used the solution?

I've used the solution for a bit over a year at this point.

What do I think about the stability of the solution?

The stability seems good here too.

What do I think about the scalability of the solution?

Scalability seems good to me. I have no complaints.

How are customer service and support?

I get answers from our contact, and one team member did reach out. It went well.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We used Loggly. 

We switched because we wanted an all-in-one tool.

How was the initial setup?

Some parts of our setup were tough. Some Windows container setups cost us a lot of time.

The AWS infrastructure was tough to fully turn on due to the large cost of everything being run.

What about the implementation team?

We handled the setup ourselves in-house.

What was our ROI?

This cost us more overall. ROI is hard to sell. That said, I can find issues way faster and see what is going on in my entire platform. I pay back the cost every month with productivity. 

What's my experience with pricing, setup cost, and licensing?

It is going to cost you more than you think to keep everything running. We saw value in the one-for-all solution; however, it came at a premium over what we were paying.

Which other solutions did I evaluate?

We did evaluate Dynatrace.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

