AWS Marketplace: Datadog Pro - Pay-As-You-Go (Container Agent) Reviews

Ajay Thomas

Great features and synthetic testing but pricing can get expensive

September 19, 2024
Review provided by PeerSpot

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting. We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications.

Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge. Datadog agents on each web host, and native integrations with GitHub, AWS, and Azure get all of our instrumentation and error data in one place for easy analysis and monitoring.

How has it helped my organization?

Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards.

Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Both legacy .NET Framework and Windows Event Viewer and cutting-edge .NET Core with streaming logs all work.

The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.

What is most valuable?

When it comes to Datadog, several features have proven particularly valuable. The centralized pipeline tracking and error logging provides a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly.

Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most. And the ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders.

Together, these features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.

What needs improvement?

I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view. I like the idea of monitoring on the go, yet it seems the options are still a bit limited out of the box. While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas - specifically .NET Profiling and Tracing of IIS-hosted apps - that need a lot of focus to pick up on the key details needed. In some cases the screenshots don't match the text as updates are made. I spent longer than I should have figuring out how to correlate logs to traces, mostly related to environment variables.
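As a minimal sketch of the kind of pre-flight check that can save time here, the snippet below reports which tagging variables are missing from a process environment. It assumes the variables in question are the ones Datadog documents for unified service tagging and log injection (`DD_ENV`, `DD_SERVICE`, `DD_VERSION`, `DD_LOGS_INJECTION`); the review does not say which variables caused the trouble, and the helper `missing_correlation_vars` is hypothetical, not part of any Datadog library.

```python
import os

# Assumed set of variables: Datadog's unified service tagging plus log injection.
# The review does not name the exact variables involved.
REQUIRED_VARS = ("DD_ENV", "DD_SERVICE", "DD_VERSION", "DD_LOGS_INJECTION")


def missing_correlation_vars(env=None):
    """Return the names in REQUIRED_VARS that are unset or empty in env.

    Hypothetical helper for illustration; pass a dict to check a specific
    environment, or omit it to check the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_correlation_vars()` inside the application pool or container that hosts the app shows what that process actually sees, which can differ from what is set at the machine level.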

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime and clean and light resource usage of the agents.

What do I think about the scalability of the solution?

The solution is very scalable, very customizable.

How are customer service and support?

Service is always helpful in tuning our committed costs and alerting us when we start spending outside the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility regardless of whether it is Linux or Windows or Container, cloud or on-prem hosted.

How was the initial setup?

The setup was generally simple. However, .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

We implemented the solution in-house.

What was our ROI?

I'd count our ROI as significant time saved by the development team assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

Set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

I'm excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog.

reviewer254673

Good monitoring capabilities, centralizing of logs, and making data easily searchable

September 19, 2024
Review provided by PeerSpot

What is our primary use case?

Our primary use of Datadog involves monitoring over 50 microservices deployed across three distinct environments. These services vary widely in their functions and resource requirements.

We rely on Datadog to track usage metrics, gather logs, and provide insight into service performance and health. Its flexibility allows us to efficiently monitor both production and development environments, ensuring quick detection and response to any anomalies.

We also have better insight into metrics like latency and memory usage.

How has it helped my organization?

Datadog has significantly improved our organization’s monitoring capabilities by centralizing all of our logs and making them easily searchable. This has streamlined our troubleshooting process, allowing for quicker root cause analysis.

Additionally, its ease of implementation meant that we could cover all of our services comprehensively, ensuring that logs and metrics were thoroughly captured across our entire ecosystem. This has enhanced our ability to maintain system reliability and performance.

What is most valuable?

The intuitive user interface has been one of the most valuable features for us. Unlike other platforms like Grafana, as an example, where learning how to query either involves a lot of trial and error or memorization almost like learning a new language, Datadog’s UI makes finding logs, metrics, and performance data straightforward and efficient. This ease of use has saved us time and reduced the learning curve for new team members, allowing us to focus more on analysis and troubleshooting rather than on learning the tool itself.

What needs improvement?

While the UI and search functionality are excellent, further improvement could be made in the querying of logs by offering more advanced templates or suggestions based on common use cases. This would help users discover powerful queries they might not think to create themselves.

Additionally, enhancing alerting capabilities with more customizable thresholds or automated recommendations could provide better insights, especially when dealing with complex environments like ours with numerous microservices.

For how long have I used the solution?

I've used the solution for five years.

What do I think about the stability of the solution?

We have never experienced any downtime.

Which solution did I use previously and why did I switch?

We previously used Sumo Logic.

reviewer1974104

Centralized pipeline with synthetic testing and a customized dashboard

September 19, 2024
Review provided by PeerSpot

What is our primary use case?

Our primary use case is custom and vendor-supplied web application log aggregation, performance tracing and alerting.

We run a mix of AWS EC2, Azure serverless, and colocated VMware servers to support higher education web applications. Managing a hybrid multi-cloud solution across hundreds of applications is always a challenge.

Datadog agents on each web host, and native integrations with GitHub, AWS, and Azure get all of our instrumentation and error data in one place for easy analysis and monitoring.

How has it helped my organization?

Through the use of Datadog across all of our apps, we were able to consolidate a number of alerting and error-tracking apps, and Datadog ties them all together in cohesive dashboards.

Whether the app is vendor-supplied or we built it ourselves, the depth of tracing, profiling, and hooking into logs is all obtainable and tunable. Both legacy .NET Framework and Windows Event Viewer and cutting-edge .NET Core with streaming logs all work. The breadth of coverage for any app type or situation is really incredible. It feels like there's nothing we can't monitor.

What is most valuable?

Centralized pipeline tracking and error logging provide a comprehensive view of our development and deployment processes, making it much easier to identify and resolve issues quickly.

Synthetic testing has been a game-changer, allowing us to catch potential problems before they impact real users. Real user monitoring gives us invaluable insights into actual user experiences, helping us prioritize improvements where they matter most.

The ability to create custom dashboards has been incredibly useful, allowing us to visualize key metrics and KPIs in a way that makes sense for different teams and stakeholders.

These features form a powerful toolkit that helps us maintain high performance and reliability across our applications and infrastructure, ultimately leading to better user satisfaction and more efficient operations.

What needs improvement?

I'd like to see an expansion of the Android and iOS apps to have a simplified CI/CD pipeline history view.

I like the idea of monitoring on the go, yet it seems the options are still a bit limited out of the box. While the documentation is very good considering all the frameworks and technology Datadog covers, there are areas - specifically .NET Profiling and Tracing of IIS-hosted apps - that need a lot of focus to pick up on the key details needed.

In some cases the screenshots don't match the text as updates are made. I spent longer than I should have figuring out how to correlate logs to traces, mostly related to environment variables.

For how long have I used the solution?

I've used the solution for about three years.

What do I think about the stability of the solution?

We have been impressed with the uptime and clean and light resource usage of the agents.

What do I think about the scalability of the solution?

The solution has been very scalable and customizable.

How are customer service and support?

Sales service is always helpful in tuning our committed costs and alerting us when we start spending outside the on-demand budget.

Which solution did I use previously and why did I switch?

We used a mix of a custom error email system, SolarWinds, UptimeRobot, and GitHub Actions. We switched to find one platform that could give deep app visibility regardless of whether it is Linux or Windows or Container, cloud or on-prem hosted.

How was the initial setup?

Generally simple, but .NET Profiling of IIS and aligning logs to traces and profiles was a challenge.

What about the implementation team?

We implemented the solution in-house.

What was our ROI?

I'd count our ROI as significant time saved by the development team assessing bugs and performance issues.

What's my experience with pricing, setup cost, and licensing?

Set up live trials to assess cost scaling. Small decisions around how monitors are used can have big impacts on cost scaling.

Which other solutions did I evaluate?

New Relic was considered. LogicMonitor was chosen over Datadog for our network and campus server management use cases.

What other advice do I have?

Excited to dig further into the new offerings around LLM and continue to grow our footprint in Datadog.

reviewer2561139

Consistent, centralized service for varied cloud-based applications

September 19, 2024
Review provided by PeerSpot

What is our primary use case?

The current use case for Datadog in our environment is observability. We use Datadog as the primary log ingestion and analysis point, along with consolidation of application/infrastructure metrics across cloud environments and real-time alerting on issues that arise in production.

Datadog integrates within all aspects of our infrastructure and applications to provide valuable insights into Containers, Serverless functions, Deep Logging Analysis, Virtualized Hardware and Cost Optimizations.

How has it helped my organization?

Datadog improved our observability layer by creating a consistent, centralized service for all of our varied cloud-based applications. All of our production and non-production environment applications and infrastructure send metrics directly to Datadog for analysis and determination of any issues that would need to be looked at by the Infrastructure, Platform and Development teams for quick remediation. Using Datadog as this centralized Observability platform has enabled us to become leaner without sacrificing project timelines when issues arise and require triage for efficient resolution.

What is most valuable?

All of Datadog's features have become valuable tools in our cloud environments.

Our primary alerts, based on metrics and synthetic transactions, are the most used and relied upon for decreased MTTA/MTTR across all of our platforms. This is followed by deep log analysis that enables us to quickly and easily get to a preliminary root cause that someone on the infrastructure, platform or development teams can take and focus their attention on the precise target that Datadog revealed as the issue to be remediated.
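For readers unfamiliar with the two metrics, here is a small illustrative sketch of how MTTA and MTTR can be computed from incident timestamps. It assumes both are measured from the moment the alert triggered (definitions vary between teams), and the incident records and field names are made up for illustration rather than taken from any Datadog export.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(incidents, start_key, end_key):
    """Average elapsed minutes between two timestamps across all incidents."""
    return mean(
        (inc[end_key] - inc[start_key]).total_seconds() / 60
        for inc in incidents
    )


# Hypothetical incidents; the timestamps are invented for this example.
incidents = [
    {
        "triggered": datetime(2024, 1, 1, 10, 0),
        "acknowledged": datetime(2024, 1, 1, 10, 4),
        "resolved": datetime(2024, 1, 1, 10, 30),
    },
    {
        "triggered": datetime(2024, 1, 2, 14, 0),
        "acknowledged": datetime(2024, 1, 2, 14, 6),
        "resolved": datetime(2024, 1, 2, 14, 50),
    },
]

mtta = mean_minutes(incidents, "triggered", "acknowledged")  # 5.0 minutes
mttr = mean_minutes(incidents, "triggered", "resolved")      # 40.0 minutes
```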

What needs improvement?

I see two areas that need improvement or a feature to add value. The first is building a more robust SIM that would include container scanning to rival other such products on the market, so we do not need to extend functionality to another third-party provider. The second is expanding the alerting functions with features such as direct SMS notifications and on-call rotation scheduling, which could replace the need for an external third-party integration.

For how long have I used the solution?

I've been a Datadog user for almost ten years.

What do I think about the stability of the solution?

Datadog is very stable, and we've only come across a few items that needed to be addressed quickly when there were issues.

What do I think about the scalability of the solution?

Scalability is very favorable, aside from cost/budget, which limits the scalability of this platform.

How are customer service and support?

Both customer service and support need a little work, as we have had a number of requests/issues that were not addressed as we needed them to be.

Which solution did I use previously and why did I switch?

Being an Observability SME, I have used many native and third-party solutions, including Dynatrace, New Relic, CloudWatch and Zabbix. As previously mentioned, Datadog provides a superior platform for centralizing and consolidating our Observability layer. Switching to Datadog was a no-brainer when most other solutions either didn't provide the same maturity of functions or didn't have them available at all.

How was the initial setup?

The initial setup was very straightforward, and the integrations were easily configured.

What about the implementation team?

We implemented Datadog in-house.

What was our ROI?

For the most part, Datadog's ROI is quite impressive when you consider all of the features and functions that are centralized on the platform. It doesn't require us to purchase additional third-party solutions to fill in the gaps.

What's my experience with pricing, setup cost, and licensing?

The setup was dead simple once the cloud integrations and agent components were identified and executed. Licensing falls into our normal third-party processes, so it was a familiar feeling when we started with Datadog. Cost is the only outlier when it comes to a perfect solution. Datadog is expensive, and each add-on drives that cost further into the realm of requiring justifications to finance expanding the core suite of features we would like to enable.

Which other solutions did I evaluate?

Yes, we evaluated several competing platforms that included Dynatrace, New Relic and Zabbix.

What other advice do I have?

They should provide more inclusive pricing, or an "all you can eat" tier that would include all relevant features, as opposed to individual cost increases, to let Datadog become more valuable and replace even more third-party solutions that have a lower cost of entry.

Reviewer 76

Enhances efficiency with robust alerting and visualization tools

September 19, 2024
Review provided by PeerSpot

What is our primary use case?

Our primary use case for Datadog is to monitor and manage our fully cloud-native infrastructure. We utilize Datadog to gain real-time visibility into our cloud environments, ensuring that all our services are running smoothly and efficiently.

The platform’s extensive integration capabilities allow us to seamlessly track performance metrics across various cloud services, containers, and microservices.

With Datadog’s robust alerting and visualization tools, we can proactively identify and resolve issues, minimizing downtime and optimizing our system’s performance. This has been crucial in maintaining the reliability and scalability of our cloud-native applications.

How has it helped my organization?

Datadog has significantly enhanced our organization’s operational efficiency and reliability. By providing real-time visibility into our cloud-native infrastructure, Datadog enables us to monitor performance metrics, detect anomalies, and resolve issues swiftly.

The platform’s robust alerting system ensures that potential problems are addressed before they impact our services, reducing downtime and improving overall system stability. Additionally, Datadog’s comprehensive dashboards and reporting tools have streamlined our troubleshooting processes and facilitated better decision-making.

What is most valuable?

The most valuable feature of Datadog for our organization has been its real-time monitoring capabilities. This feature provides us with instant visibility into our cloud-native infrastructure, allowing us to track performance metrics and detect anomalies as they occur. The ability to monitor our systems in real-time means we can quickly identify and address issues before they escalate, minimizing downtime and ensuring the reliability of our services.

Additionally, the real-time data helps us make informed decisions and optimize our operations, ultimately enhancing our overall efficiency and performance.

What needs improvement?

While Datadog has been instrumental in enhancing our operational efficiency, there are areas where it could be improved.

One area is the user interface, which could be more intuitive and user-friendly, especially for new users.

Additionally, the pricing model can be quite complex and might benefit from more flexible options tailored to different organizational needs.

For future releases, it would be beneficial to include more advanced machine learning capabilities for predictive analytics, helping us anticipate issues before they occur.

More third-party tools would also be valuable additions.

For how long have I used the solution?

I've used the solution for six years.

What do I think about the stability of the solution?

Datadog has proven to be a highly stable solution for our monitoring needs. Throughout our usage, we have experienced minimal downtime and consistent performance, even during peak traffic periods. The platform’s reliability ensures that we can continuously monitor our cloud-native infrastructure without interruptions, which is crucial for maintaining the health and performance of our services.

What do I think about the scalability of the solution?

Datadog’s scalability has been impressive and instrumental in supporting our growing cloud-native infrastructure. The platform effortlessly handles increased workloads and scales alongside our expanding services without compromising performance. Its ability to integrate with a wide range of cloud services and technologies ensures that as we grow, Datadog continues to provide comprehensive monitoring and insights.

How are customer service and support?

Our experience with Datadog’s customer service and support has been exceptional. The support team is highly responsive and knowledgeable, providing timely assistance whenever we’ve encountered issues or had questions.

Their proactive approach to offering solutions and guidance has been invaluable in helping us maximize the platform’s capabilities.

How was the initial setup?

The setup is straightforward.

What about the implementation team?

We handled the setup in-house.

What's my experience with pricing, setup cost, and licensing?

The pricing model can be quite complex and might benefit from more flexible options tailored to different organizational needs.

What other advice do I have?

One area is the user interface, which could be more intuitive and user-friendly, especially for new users.

Kenneth Dozier

Easy to use with good speed and helpful dashboards

September 19, 2024
Review provided by PeerSpot

What is our primary use case?

We are using Datadog to improve our cloud monitoring and observability across our enterprise apps. We have integrated a lot of different resources into Datadog, like Kubernetes, App Gateways, App Service Environments, App Service Plans, and other Web App resources.

I will be using the monitoring and observability features of Datadog. Dashboards are used very heavily by teams and SREs. We really have seen that Datadog has already improved both our monitoring and our observability.

How has it helped my organization?

The ease and speed with which you can create a dashboard has been a huge improvement.

The different types of monitors we can create have been huge, too. We can do so many different things with monitors that we couldn't do before with our alerts.

Being able to click on a trace or log and drill down on it to see what happened has been great.

Some have found the learning curve a bit steep. That said, they are coming around slowly. There is just a lot of information to learn how to navigate.

What is most valuable?

The different types of monitors have been very valuable. We have been able to make our alerts (monitors) more actionable than we were able to previously.

Watchdog is a favorite feature among a lot of the devs. It catches things they didn't even know were an issue.

RUM is another feature a lot of us are looking forward to; we want to see how it can help us improve our customer experience during tax season.

We hope to enable the code review feature at some point so we can see what code caused the issue.

What needs improvement?

I would like to see the integration between PagerDuty and Datadog improved. The tags in Datadog don't match those in PagerDuty, and we have to make it work. Also, I would like it to be easier to replicate a KQL query in Datadog.

I would like to see the alert communications to email or phones made better so we could hopefully move off PagerDuty and just use Datadog for that.

There are also a lot of features that we haven't budgeted for yet and I would like for us to be able to use them in the future.

For how long have I used the solution?

I've used the solution for about two years.

Lin Qui

Excellent APM, RUM and dashboards

September 19, 2024
Review provided by PeerSpot

What is our primary use case?

We use the solution for APM, anomaly detection, resource metrics, RUM, and synthetics.

We use it to build baseline metrics for our apps before we start focusing in on performance improvements. A lot of times that’s looking at methods that take too long to run and diving into db queries and parsing.

I’ve used it in multiple configurations in AWS and Azure. I’ve built it using Terraform and hand rolled.

I’ve used it predominantly with Ruby and Node and a little bit of Python.

How has it helped my organization?

The solution provides deep insights into our stack. It gives us the ability to measure and monitor before making decisions.

We're using it to make informed decisions about performance. Being able to show how across a timeline we increased performance from a release via a visual indication of p50+ metrics is almost magical.
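To make the p50+ comparison concrete, here is a minimal sketch that computes nearest-rank percentiles for latency samples taken before and after a release. The sample values are invented for illustration, and nearest-rank is just one common percentile definition, not necessarily the one Datadog uses for its charts.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Multiply before dividing so integer percentiles stay exact.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds, before and after a release.
before = [120, 135, 150, 160, 180, 210, 250, 300, 420, 900]
after = [100, 110, 115, 125, 130, 150, 170, 200, 260, 480]

comparison = {p: (percentile(before, p), percentile(after, p)) for p in (50, 95)}
```

Looking at p95 alongside p50 matters because a release can improve the median while leaving the slow tail untouched.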

Another way we use it is for leading indicators of issues that might be happening. For example, anomaly detection on gauge metrics across the app and synthetics built in with alerting configurations are both ways we can get alerted, sometimes even before a big issue is about to happen.
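As a rough illustration of the idea behind anomaly detection on a gauge metric, the sketch below flags points that deviate sharply from a trailing window. This is a simple rolling z-score, not Datadog's anomaly detection algorithm; the function name, window size, threshold, and sample data are all assumptions made for the example.

```python
from statistics import mean, pstdev


def zscore_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as anomalous.
            if series[i] != mu:
                anomalies.append(i)
        elif abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical gauge readings (e.g. queue depth) with one spike at index 6.
gauge = [50, 52, 51, 49, 50, 51, 95, 50, 52]
spikes = zscore_anomalies(gauge)
```

Note that the spike inflates the standard deviation of the windows that follow it, so the points right after an anomaly are less likely to be flagged.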

What is most valuable?

The most valuable aspects include APM, RUM and dashboards.

I think of Datadog as an analytics company first, and of the integrations around notifications and alerts as a part of insight discoverability.

For me, everything Datadog offers is about knowledge building and how much I know about the deep details of my stack.

The pricing model makes more sense than what we paid for competing products. I was at one job where we used two competing services because DD didn’t have BAA for APM. And then when it offered it, we immediately dumped the other solution for Datadog.

What needs improvement?

Logging is not a great experience. Searching for specific logs and then navigating around the context of the results is slow and cumbersome. Honestly that is my only gripe for Datadog. It’s a wonderful product outside of log searching. I have had better experience using other services that aggregate logs for search.

My use case for it is around discoverability. Log search is fine if I’m just looking for something specific. That said, if it’s something less targeted and I am wandering around looking for possible issues, it’s really unintuitive.

For how long have I used the solution?

I've used the solution for more than eight years.

What do I think about the stability of the solution?

Very stable.

What about the implementation team?

We always implement the solution in-house.

Gediminas Anza

Increases efficiency, helps with customer satisfaction, and enhances collaboration

September 19, 2024
Review provided by PeerSpot

What is our primary use case?

The primary use case of Datadog within our organization encompasses providing a comprehensive and sophisticated solution that caters to the diverse needs of our internal customers. We have strategically implemented Datadog to serve as a centralized platform for monitoring, analyzing, and optimizing various aspects of our operations. With a robust suite of functionalities, Datadog empowers us to meet the dynamic requirements of over 40 internal customers efficiently.

Through Datadog, we offer a wide array of services to our internal stakeholders, allowing them to access and leverage its capabilities to enhance performance, troubleshoot issues, and make data-driven decisions. The tool's versatility enables different teams within our organization to monitor and track distinct metrics, such as application performance, infrastructure health, and logs, tailored to their specific requirements.

Moreover, Datadog serves as a pivotal component in our organizational ecosystem by streamlining processes, enhancing collaboration, and fostering a culture of data-driven decision-making. By harnessing the power of Datadog, our internal customers can proactively address issues, optimize resources, and ultimately improve operational efficiency across the board.

In essence, the primary use case of Datadog in our organization revolves around empowering our internal customers with a comprehensive and feature-rich solution that enables them to monitor, analyze, and optimize various aspects of our operations seamlessly and effectively. This strategic implementation of Datadog plays a vital role in enhancing our overall performance, fostering transparency, and driving continuous improvement within our organization.

How has it helped my organization?

Datadog has significantly contributed to enhancing the overall effectiveness and efficiency of our organization through various key improvements. One of the standout benefits has been the accelerated resolution of issues. By leveraging Datadog's monitoring and alerting capabilities, we have been able to swiftly detect, diagnose, and address issues before they escalate, resulting in minimized downtime and enhanced operational continuity.

Moreover, the implementation of Datadog has had a tangible positive impact on customer satisfaction. With improved visibility into our systems and applications, coupled with proactive monitoring and performance optimization, we have been able to deliver a more reliable and seamless experience to our customers. This has translated into higher customer satisfaction scores and strengthened relationships with our stakeholders.

Another notable improvement brought about by Datadog is the streamlining of our toolset. By identifying and removing multiple unused or redundant features and tools, Datadog has helped optimize our workflows and resources. This decluttering of unnecessary functionalities has not only increased operational efficiency but also streamlined our processes, allowing us to focus on the tools and features that truly add value to our operations.

In summary, Datadog's impact on our organization has been profound, enhancing our ability to resolve issues rapidly, improving customer satisfaction levels, and streamlining our toolset for increased efficiency and focus. These improvements have led to a more robust and resilient operational environment, enabling us to better meet the needs of our internal and external stakeholders.

What is most valuable?

Within our organization, we have found the Agents feature in Datadog to be exceptionally valuable due to its rich set of functionalities and capabilities. The Agents play a crucial role in our monitoring and data collection processes, providing a comprehensive and reliable means to gather crucial performance metrics and insights across our systems and applications.

One of the key reasons why the agents feature stands out as particularly valuable is its versatility. The Agents offer a wide range of monitoring and data collection options, allowing us to capture diverse metrics and performance data with precision. This flexibility enables us to tailor our monitoring strategy to meet the specific needs of different teams and use cases within our organization.

Moreover, the agents feature in Datadog enhances the overall observability of our infrastructure and applications. By deploying Agents strategically across our environment, we can gather real-time metrics, logs, and traces, enabling us to monitor the health, performance, and behavior of our systems comprehensively. This deep level of observability empowers us to proactively identify issues, optimize performance, and make informed decisions based on accurate and timely data.

Furthermore, the agents feature in Datadog plays a pivotal role in driving actionable insights and facilitating efficient troubleshooting. With the detailed data collected by the Agents, we can perform in-depth analysis, detect anomalies, and troubleshoot issues quickly and effectively. This proactive approach to monitoring and analysis ultimately enhances our operational efficiency and resilience.

In essence, the agents feature in Datadog stands out as a valuable asset within our organization due to its robust functionality, versatility, and role in providing comprehensive monitoring and observability capabilities. By leveraging the power of the Agents feature, we can effectively monitor, analyze, and optimize our systems and applications to ensure seamless operations and performance excellence.

What needs improvement?

In assessing areas for potential improvement, one key aspect where Datadog could enhance its service is in the realm of billing CSV reports. Presently, the billing CSV reports provide insights into billing-related information but are somewhat limited in functionality, typically offering reports with only three columns. Expanding the capabilities of the billing CSV reports to include more detailed and customizable information would greatly benefit users by allowing them to gain a deeper understanding of their usage, costs, and billing trends within Datadog.

Additionally, in considering features for inclusion in the next release of Datadog, the development of more robust and customizable billing CSV reports could be a significant enhancement. By allowing users to tailor their billing reports to specific metrics, timeframes, and parameters of interest, Datadog could provide greater transparency and control over billing data, enabling users to make informed decisions regarding resource allocation, cost optimization, and budget planning.

Moreover, the inclusion of features such as cost forecasting, budget tracking, and customizable alerts related to billing thresholds could further empower users to manage their expenses effectively and proactively monitor and control costs within Datadog. These additions would not only enhance user experience and satisfaction but also contribute to a more holistic and actionable approach to financial management within the Datadog platform.

By refining the functionality of billing CSV reports and incorporating advanced features for cost analysis, forecasting, and monitoring, Datadog can elevate its service offering and provide users with enhanced tools for optimizing their usage, expenses, and financial oversight within the platform.

For how long have I used the solution?

I've used the solution for over three years.

What do I think about the scalability of the solution?

Datadog is easy to scale. However, the price scales with usage, so be sure to measure what you need and avoid pushing all logs to the solution, or your costs will skyrocket quickly.

Which solution did I use previously and why did I switch?

We use multiple APM tools to have both price and value correlations relevant to the teams using them.

What's my experience with pricing, setup cost, and licensing?

Request a test account during the POC phase to determine if the tool is the right fit; all providers do that for free.

Which other solutions did I evaluate?

We did a POC with more than five products. I can't name them due to the related NDA.

reviewer9816413

Easy, more reliable, and transparent monitoring

September 19, 2024
Review provided by PeerSpot

What is our primary use case?

We use the solution to monitor and investigate issues with production services at work. We periodically review the service catalog view for our various applications, and I use it to identify anomalies in service metrics, changes in user behavior evident via API calls, and spikes in errors.

We use monitors to trigger alerts for on-call engineers to act upon. The monitors have set thresholds for request latency, error rates, and throughput.
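As a rough illustration of threshold-based monitors like the ones described above, the sketch below checks one window of service metrics against fixed limits. It is plain Python, not Datadog's API; the metric names, the limit values, and the `evaluate` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str
    limit: float
    alert_when_above: bool = True  # throughput alerts when it drops below


# Hypothetical limits; real values depend on the service being monitored.
THRESHOLDS = [
    Threshold("p95_latency_ms", 500.0),
    Threshold("error_rate", 0.05),
    Threshold("requests_per_s", 10.0, alert_when_above=False),
]


def evaluate(metrics: dict[str, float]) -> list[str]:
    """Return the names of metrics that breach their threshold."""
    breached = []
    for t in THRESHOLDS:
        value = metrics.get(t.metric)
        if value is None:
            continue  # no data for this metric in the window
        is_breach = value > t.limit if t.alert_when_above else value < t.limit
        if is_breach:
            breached.append(t.metric)
    return breached
```

In a setup like this, a non-empty result would be what pages the on-call engineer.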

We also use automated rules to block bad actors based on request volume or patterns.

How has it helped my organization?

Datadog has made setting up monitors easier, more reliable, and more transparent. This has helped standardize our on-call process and set all of our on-call engineers up for success.

It has also standardized the way we evaluate issues with our applications by encouraging all teams to use the service catalog.

It makes it easier for our platforms and QA teams to get other engineering teams up to speed with managing their own applications' performance.

Overall, Datadog has been very helpful for us.

What is most valuable?

The service catalog view is very helpful for periodic reviews of our application. It has also standardized the way we evaluate issues with our applications. Having one page with an easy-to-scan view of app metrics, error patterns, package vulnerabilities, etc., is very helpful and reduces friction for our full-stack engineers.

Monitors have also been very valuable when setting up our on-call processes. They make it easy to set up and adjust alerting to keep our teams aware of anything going wrong.

What needs improvement?

Datadog is great overall. One thing to improve would be making it easier to see common patterns across traces. I sometimes end up in a trace but have a hard time finding other errors or requests that share features with it. This could be easier to get to, although it may actually be an education issue.

Another thing that could be improved is that the service list page sometimes refreshes slowly, and I accidentally click the wrong environment because the sort order changes late.

For how long have I used the solution?

I've used the solution for about a year.

What do I think about the stability of the solution?

It is very stable. I have not seen any issues with Datadog.

What do I think about the scalability of the solution?

It seems very scalable.

How are customer service and support?

I've had no specific experience with technical support.

Which solution did I use previously and why did I switch?

We used Honeycomb before. We switched since Datadog offered more tooling.

How was the initial setup?

Each application has been easy to instrument.

What about the implementation team?

We implemented the solution in-house.

What was our ROI?

Engineers save an unquantifiable amount of time by having one standard view for all applications and monitors.

What's my experience with pricing, setup cost, and licensing?

I am not exposed to this aspect of Datadog.

Which other solutions did I evaluate?

We did not evaluate other options.

reviewer902462

Capable of pinpointing warnings and errors in logs and providing detailed context

September 18, 2024
Review provided by PeerSpot

What is our primary use case?

Our primary use case for Datadog is to monitor, analyze, and optimize the performance and health of our applications and infrastructure.

We leverage its logging, metrics, and tracing capabilities to pinpoint issues, track system performance, and improve overall reliability.

Datadog’s ability to provide real-time insights and alerting on key metrics helps us quickly address issues, ensuring smooth operations. It’s integral for visibility across our microservices architecture and cloud environments.

How has it helped my organization?

Datadog has been incredibly valuable to our organization. Its ability to pinpoint warnings and errors in logs and provide detailed context is essential for troubleshooting.

The platform's request tracing feature offers comprehensive insights into user flows, allowing us to quickly identify issues and optimize performance.

Additionally, Datadog's real-time monitoring and alerting capabilities help us proactively manage system health, ensuring operational efficiency across our applications and infrastructure.

What is most valuable?

Being able to filter requests by latency is invaluable, as it provides immediate insight into which endpoints require further analysis and optimization. This feature helps us quickly identify performance bottlenecks and prioritize improvements.

Additionally, the ability to filter requests by user email is extremely useful for tracking down user-specific issues faster. It streamlines the troubleshooting process and enables us to provide more targeted support to individual users, improving overall customer satisfaction.
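The two filters described above can be pictured with a small sketch over in-memory request records. This is not Datadog's query syntax; the record fields (`endpoint`, `latency_ms`, `user_email`) and both helper functions are hypothetical.

```python
from collections import Counter


def slow_endpoints(requests, min_latency_ms):
    """Count slow requests per endpoint, most frequent first."""
    counts = Counter(
        r["endpoint"] for r in requests if r["latency_ms"] >= min_latency_ms
    )
    return counts.most_common()


def requests_for_user(requests, email):
    """Return only the requests made by one user, matched by email."""
    return [r for r in requests if r.get("user_email") == email]


# Hypothetical sample data for illustration only.
sample = [
    {"endpoint": "/search", "latency_ms": 850, "user_email": "a@example.com"},
    {"endpoint": "/login", "latency_ms": 320, "user_email": "b@example.com"},
    {"endpoint": "/search", "latency_ms": 410, "user_email": "b@example.com"},
    {"endpoint": "/home", "latency_ms": 40, "user_email": "a@example.com"},
]
```

Calling `slow_endpoints(sample, 300)` ranks the endpoints worth investigating first, while `requests_for_user(sample, "a@example.com")` narrows the view to a single user's activity.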

What needs improvement?

The query performance could be improved, particularly when handling large datasets, as slower response times can hinder efficiency.

Additionally, the interface can sometimes feel overwhelming, with so much happening at once, which may discourage users from exploring new features.

Simplifying the layout or providing clearer guidance could enhance the user experience. Any improvements related to query optimization would be highly beneficial, as they would further streamline workflows and boost productivity.

For how long have I used the solution?

I've used the solution for five years.
