
Overview
Datadog is a SaaS-based unified observability and security platform that provides full visibility into the health and performance of every layer of your environment at a glance. Datadog lets you tailor this insight to your stack by collecting and correlating data from more than 600 vendor-backed technologies and APM libraries in a single pane of glass. Monitor your underlying infrastructure, supporting services, and applications alongside security data in one observability platform.
Prices are based on committed use per month over the total term of the agreement (the Total Expected Use).
Highlights
- Get started in minutes from AWS Marketplace with our enhanced integration for account creation and setup. Turn-key integrations and an easy-to-install agent let you start monitoring all of your servers and resources in minutes.
- Quickly deploy modern monitoring and security in one powerful observability platform.
- Create actionable context to speed up troubleshooting, reduce costs, mitigate security threats, and avoid downtime at any scale.
Details
Introducing multi-product solutions
You can now purchase comprehensive solutions tailored to use cases and industries.
Pricing
| Dimension | Description | Cost/month | Overage cost |
|---|---|---|---|
| Infra Enterprise Hosts | Centralize your monitoring of systems and services (per host) | $27.00 | |
| APM Hosts | Optimize end-to-end application performance (per APM host) | $36.00 | |
| App Analytics | Analyze performance metrics (per 1M analyzed spans / 15-day retention) | $2.04 | |
| Custom Metrics | Monitor your own custom business metrics (per 100 custom metrics) | $5.00 | |
| Indexed Logs | Analyze and explore log data (per 1M log events / 15-day retention) | $2.04 | |
| Ingested Logs | Ingest all your logs (per 1 GB ingested logs) | $0.10 | |
| Synthetics API Tests | Proactively monitor site availability (per 10K test runs) | $6.00 | |
| Synthetics Browser Tests | Easily monitor critical user journeys (per 1K test runs) | $15.00 | |
| Serverless Functions | Deprecated. Not available for new customers | $6.00 | |
| Fargate Tasks | Monitor your Fargate environment (per Fargate task) | $1.20 | |
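To make the per-dimension rates concrete, here is a minimal sketch of a monthly commit estimate. The rates come from the pricing table on this listing; the usage figures are hypothetical and purely illustrative.

```python
# Rates taken from the pricing table above (per month).
RATES = {
    "infra_enterprise_host": 27.00,  # per host
    "apm_host": 36.00,               # per APM host
    "custom_metrics_100": 5.00,      # per 100 custom metrics
    "ingested_logs_gb": 0.10,        # per GB of ingested logs
    "indexed_logs_1m": 2.04,         # per 1M log events, 15-day retention
}

# Hypothetical committed usage, for illustration only.
usage = {
    "infra_enterprise_host": 20,
    "apm_host": 10,
    "custom_metrics_100": 5,   # i.e. 500 custom metrics
    "ingested_logs_gb": 300,
    "indexed_logs_1m": 12,
}

total = sum(RATES[k] * usage[k] for k in usage)
print(f"Estimated monthly commit: ${total:,.2f}")  # $979.48 for these figures
```

Actual pricing depends on your negotiated contract and committed use; this only shows how the listed dimensions compose into a bill.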
The following dimensions are not included in the contract terms and are charged based on your usage.
| Dimension | Description | Cost/unit |
|---|---|---|
| Custom dimension used for select private offers | Custom dimension used for select private offers | $1.00 |
| consumption_unit | Additional Datadog Consumption Units | $0.01 |
Delivery details
Software as a Service (SaaS)
SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.
Resources
Support
Vendor support
Contact our knowledgeable Support Engineers via email, live chat, or in-app messaging.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Customer reviews
Unified monitoring has improved incident response and now reduces root cause analysis time
What is our primary use case?
Datadog serves as my primary tool for infrastructure monitoring and log analysis in a cloud environment. From a network and security perspective, I use it to monitor server health, track network metrics like latencies and traffic patterns, and analyze logs for troubleshooting issues such as VPN instability and unexpected spikes. The ability to correlate metrics and logs in one place makes it much faster to identify the root cause instead of checking multiple tools.
One example where Datadog proved invaluable was during a sudden spike in application response time. We received alerts on increased latencies, and instead of checking multiple tools, I used Datadog's dashboard to quickly correlate metrics. I noticed that while the application CPU was normal, there was a spike in database response times. Using the logs and metrics together, I was able to confirm that the issue was coming from the database, not the application. This helped us quickly involve the right team and resolve the issue faster.
What is most valuable?
The best features of Datadog are the correlation capabilities and unified visibility. The most useful aspect is that I can see metrics, logs, and service-level data in one place. During troubleshooting, I do not have to switch tools; I can directly correlate spikes in latencies with log error patterns, which saves considerable time. Another feature I find very useful is the dashboards, which are flexible, and I can create views based on what I actually need to monitor daily instead of relying on default setups. The integration with cloud services makes onboarding very easy, and once integrated, most of the data starts flowing automatically without much manual effort.
Datadog has had a positive impact, mainly by improving how quickly we detect and understand issues. Earlier, when something went wrong, considerable time went into figuring out where the problem actually was. Now, with better visibility across services and logs, we can quickly narrow down the source, whether it is application, infrastructure, or dependency-related. It has also helped in reducing the back and forth between teams because we can validate issues with the data before escalating, which has made incident handling smoother and more efficient overall.
What needs improvement?
One area where Datadog can be improved is around alert quality. In the beginning, it tends to generate many alerts, and without proper tuning, many of them are not actionable. It would help if there were more built-in guidance or smarter defaults to reduce noise. Another improvement area is cost visibility and control. As log and metric ingestion increases, it has not always been straightforward to track which data is driving the cost. More granular and real-time cost insights would make it easier to manage. Additionally, while the dashboards are flexible, navigating and organizing them at scale can become slightly difficult. Better structuring or management options would help in larger environments.
For how long have I used the solution?
I have been using Datadog for nearly two years.
What do I think about the stability of the solution?
Datadog has been stable overall in my experience. We have not seen any major platform outages. Metrics collection and alerting have been consistent in day-to-day use. Most issues we have faced were related to configurations or alert tuning rather than the platform itself. The platform is stable with no major platform issues, only configuration-related challenges.
What do I think about the scalability of the solution?
Datadog scales well as environments grow in my experience. As we add more servers and services, onboarding is straightforward with agents and integrations. We have not faced any major performance issues from the platform side; it handles increased metrics and monitoring loads smoothly. The primary consideration is managing log volume carefully because as the scale increases, data ingestion and costs also go up. Datadog is scalable technically, but the ingestion costs need to be managed as the environment grows.
How are customer service and support?
We do not rely on Datadog support for day-to-day issues. Most of the time, we are able to resolve things using the dashboards, logs, and their documentation. We have only reached out in a few cases, mainly for configuration-related queries, and in those situations, support was helpful, though sometimes it required a few back and forth interactions to get to the exact solution. Overall, support is decent, but we mostly depend on self-troubleshooting.
Which solution did I use previously and why did I switch?
Before Datadog, we were mainly using native cloud monitoring such as Azure Monitor, along with a few basic tools. The main issue was that monitoring was fragmented: metrics, logs, and alerts were spread across different places, so during an incident we had to switch between multiple tools to understand what was happening. We moved to Datadog to have everything in one place. The ability to correlate metrics and logs in a single platform made troubleshooting much faster and more efficient.
How was the initial setup?
Setting up dashboards and integrations in Datadog is relatively straightforward in my experience, especially for standard cloud services. For integrations, once we connect our cloud account, most of the metrics start coming in automatically, so the initial setup is not very complex. The documentation also helps considerably during this phase. For dashboards, basic ones are easy to create using existing templates, but to make them truly useful, we have to spend time customizing them based on our actual use cases, like adding specific metrics and refining the layout. Overall, the initial setup is easy, but making it truly effective takes practical tuning.
What was our ROI?
We have seen a clear return on investment with Datadog, mainly in terms of time saved and faster incident handling. For example, earlier when an issue occurred, it would take around thirty-five to forty-five minutes just to identify the root cause because we had to check multiple tools. With Datadog, we are usually able to narrow it down within ten to fifteen minutes using the centralized dashboard and logs. We have also reduced repeated troubleshooting efforts because we can identify patterns and fix the root cause instead of dealing with the same issues repeatedly. It has not reduced headcount, but it has definitely improved team efficiency and allowed us to handle more incidents with the same team.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing for Datadog has been mixed. The initial setup cost is relatively low since it is a SaaS model and does not require a heavy upfront investment. Getting started is quite quick with agent-based deployments. However, the ongoing cost is something that needs to be managed. Pricing is mainly based on data ingestion, such as logs, metrics, and traces, and it can increase quickly if everything is enabled by default. Licensing is flexible, but it requires continuous monitoring and optimization to keep costs under control.
What other advice do I have?
One additional point I can add is that with Datadog, I focused considerably on making alerts actionable and reducing noise. In the initial phases, we had too many alerts that were not very useful, so we spent time tuning thresholds, adding conditions, and correlating alerts with real impact. After that, alerts became much more meaningful and helpful in faster response. I also use it regularly for trend analysis, checking for recurring spikes or patterns over time, which helps in identifying potential issues before they become incidents.
The features of Datadog become truly useful when you start combining them, not just using them separately. For example, just looking at the metrics alone does not always give the full picture, but when you combine metrics with logs and service-level data, it becomes much easier to understand what is actually happening during an incident. Features like tagging help considerably in filtering data across environments and services, especially when the setup grows. Without proper tagging, it can get difficult to navigate. Overall, the strength of Datadog is not just the individual features, but how well they work together in real scenarios.
We have seen noticeable improvements after using Datadog, mainly in terms of time saved and faster incident handling. Earlier when an issue occurred, it could take around twenty to forty minutes just to understand where the problem was. Now, with the centralized visibility and correlation of metrics and logs, we are often able to narrow it down within fifteen to twenty-five minutes. We have also seen fewer repeated incidents because we can identify patterns and fix the root cause instead of just resolving symptoms. Incidents are getting resolved faster, and the time spent on troubleshooting has reduced significantly.
My advice for anyone considering Datadog is to be selective about what you monitor from day one. It is tempting to enable everything, but that usually leads to too much data and noisy alerts. Instead, start with critical services and key metrics, and then expand gradually. Invest time in tagging and structuring your data properly because it makes a considerable difference later when you need to filter, troubleshoot, or build dashboards. Finally, review your setup regularly because what works in the beginning may not stay relevant as the environment grows. Start small, avoid collecting all data, use proper tagging, and keep refining your setup over time. This review reflects an overall rating of eight.
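The tagging advice above can be illustrated with a small sketch. This is plain Python, not Datadog's API: Datadog applies this kind of filtering server-side when you scope queries or dashboards by tags such as `env:prod`, and the hosts and values below are hypothetical.

```python
# Minimal illustration of why consistent tags make filtering easy.
# Hosts, tags, and CPU values are invented for this example.
metrics = [
    {"host": "web-01", "tags": {"env:prod", "service:api"}, "cpu": 72},
    {"host": "web-02", "tags": {"env:staging", "service:api"}, "cpu": 35},
    {"host": "db-01",  "tags": {"env:prod", "service:db"}, "cpu": 90},
]

def filter_by_tags(metrics, required):
    """Keep only metrics carrying every required tag."""
    return [m for m in metrics if required <= m["tags"]]

prod_hosts = [m["host"] for m in filter_by_tags(metrics, {"env:prod"})]
print(prod_hosts)  # ['web-01', 'db-01']
```

The point of the reviewer's advice is that this only works if `env` and `service` tags are applied consistently from day one; retrofitting tags across a grown environment is much harder.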
Centralized monitoring has reduced troubleshooting time and improves proactive incident response
What is our primary use case?
My main use case for Datadog is infrastructure and log monitoring in a cloud-based environment. From a network and security perspective, I mainly use it to monitor server health, track network-level metrics, and analyze logs for troubleshooting issues such as VPN instabilities, traffic spiking, or unexpected behavior.
One recent example where I used Datadog was during a VPN-related issue where users were reporting intermittent disconnections. I checked our Datadog dashboard and noticed spiking in network latencies and a sudden increase in connections dropped around the same time users reported the issues. I then correlated this with the logs and found that one of the back-end servers handling the connection was hitting high CPU utilization. Because everything was centralized, I did not have to jump between multiple tools. I was able to quickly identify the impacted servers and escalate it to the infrastructure team. Once the load was balanced, the issue got resolved.
With Datadog, I mainly focus on creating meaningful dashboards and tuning alerts properly. In the beginning, we saw a lot of alert noise, so we had to refine thresholds and conditions to make sure alerts are actually actionable. Once that was done, it became much more effective for proactive monitoring instead of just reactive troubleshooting.
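One common way to make alerts "actionable" in the sense described above is to fire only after a sustained breach rather than on every spike. This is an illustrative sketch of the idea in plain Python, not Datadog's monitor engine; the threshold and sample values are hypothetical.

```python
def should_alert(samples, threshold, consecutive=3):
    """Fire only if the last `consecutive` samples all breach the
    threshold, suppressing one-off spikes that create alert noise."""
    if len(samples) < consecutive:
        return False
    return all(s > threshold for s in samples[-consecutive:])

# Hypothetical latency samples (ms): one transient spike, then a
# sustained breach of a 400 ms threshold.
latencies_ms = [120, 480, 130, 510, 530, 520]

print(should_alert(latencies_ms[:3], threshold=400))  # False: lone spike
print(should_alert(latencies_ms, threshold=400))      # True: sustained
```

In Datadog terms this maps to choosing an evaluation window and recovery conditions for a monitor instead of alerting on instantaneous values.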
What is most valuable?
One of the best features of Datadog, in my opinion, is its unified visibility across metrics, logs, and traces in a single platform. The dashboards are very flexible and customizable, which helps a lot in creating meaningful monitoring views based on different use cases. I also find the log management quite useful because it allows quick correlation with metrics during troubleshooting. Another strong feature is its integration, especially with cloud platforms such as AWS or Azure, which makes onboarding and monitoring much easier without heavy manual work.
Integration with cloud platforms such as Amazon Web Services or Microsoft Azure has really made daily monitoring much easier. Once the integration is set up, Datadog automatically pulls metrics from services such as virtual machines, load balancers, and databases without needing manual configuration on each resource. In one case, I was monitoring a cloud-based application where we started seeing performance issues through Datadog's Azure integrations. I could quickly view metrics from the application server and the back-end database in the same dashboard. It helped me identify that the issue was not network-related but due to the increased load on the backend services. Instead of checking multiple portals, everything was available in one place, which saved time and made troubleshooting faster.
Datadog has had a positive impact mainly by improving visibility and reducing troubleshooting times. Earlier, we had to rely on multiple tools to check metrics and logs, which delayed root cause analysis. With Datadog, everything is centralized, so it is much faster to identify issues and take actions. It has also helped in proactive monitoring with properly tuned alerts. We are able to detect unusual behaviors such as spiking in traffic or resource usage before it turns into a major incident. Overall, it has improved operational efficiency and reduced downtime by enabling quicker responses during incidents.
What needs improvement?
If you are asking for improvements, I feel some small areas where Datadog can improve. One area is alert management. In a dynamic environment, it can generate a lot of alert noise if not tuned properly. More intelligent alerting or built-in recommendations would help. Another aspect is cost visibility. As log ingestion increases, pricing can scale quickly. Having more transparent and granular cost control features would make it easier to manage usage. Also, the initial setup and configuration can feel a bit complex for new users.
For how long have I used the solution?
I have been using Datadog for ten months.
What do I think about the stability of the solution?
In my experience, it has been quite stable; we have not faced any major outages or reliability issues from the platform side. Data collection and dashboards have been consistent, and alerts are delivered on time as long as they are properly configured. Most of the issues we have seen were related to configuration or alert tuning rather than the platform itself.
What do I think about the scalability of the solution?
It has scaled well for our needs. As we added more servers and services, Datadog was able to handle the increased load without any major issues. Since it is a SaaS platform, we did not have to worry about backend scaling. New hosts and services get onboarded easily with the agents, and metric collection continues smoothly even as the environment grows. The only thing we monitor closely is log volume because as scale increases, ingestion and costs also go up, but from a performance and handling perspective, it has been quite good.
How are customer service and support?
In my experience, the customer support from Datadog has been quite reliable. For standard issues and queries, the response time is generally good, and the documentation is also very helpful for resolving common problems. For more complex cases, support may take some time for investigations, but they usually provide proper guidance and follow-up. Overall, I would say support is responsive and helpful, especially when combined with their strong documentation.
Which solution did I use previously and why did I switch?
This is the first time I am using Datadog. Before that, there was not any solution in place.
How was the initial setup?
The initial setup cost is relatively low since it is a SaaS model and getting started is straightforward with agent-based deployments. However, the main challenge is the ongoing cost, which depends on data ingestion such as logs, metrics, and traces. As usage grows, especially with log collection, the costs can increase quickly, which requires proper planning around what data to collect, retention policies, and filtering to keep control. Overall, I think it is flexible, but cost optimization needs continuous monitoring.
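The point above about controlling cost through filtering and retention can be made concrete with a small sketch using the $0.10/GB ingestion rate from this listing's pricing table. The daily volume and the share of low-value debug logs are hypothetical.

```python
INGEST_RATE_PER_GB = 0.10  # per GB ingested, from the pricing table

daily_gb = 50            # hypothetical total log volume per day
debug_fraction = 0.40    # hypothetical share of low-value debug logs

monthly_all = daily_gb * 30 * INGEST_RATE_PER_GB
monthly_filtered = daily_gb * (1 - debug_fraction) * 30 * INGEST_RATE_PER_GB

print(f"Unfiltered:  ${monthly_all:.2f}/month")     # $150.00
print(f"Filtered:    ${monthly_filtered:.2f}/month")  # $90.00
```

The exact mechanism (exclusion filters, sampling, shorter retention) matters less than the arithmetic: ingestion-based pricing rewards deciding up front which logs are worth collecting.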
What was our ROI?
We have seen a return on investment with Datadog, mainly in terms of time savings and operational efficiency. For example, earlier our troubleshooting process involved checking multiple tools, which used to take around forty to forty-five minutes just to identify the root cause. With Datadog, since metrics and logs are centralized, we are usually able to reduce that time to around ten to twenty minutes in many cases. This has improved our response time and reduced the duration of incidents. While it may not directly reduce headcount, it definitely improves team productivity and helps the same team handle more issues efficiently.
While we do not track exact numbers in all cases, with Datadog we have definitely seen a noticeable improvement in incident response time. For example, earlier it could take around thirty to forty-five minutes to identify the root cause because we had to check multiple tools. With Datadog's centralized dashboards and logs, we are usually able to narrow it down within ten to fifteen minutes in most cases. We have also seen fewer escalations for minor issues because alerts help us catch problems earlier, which indirectly reduces downtime and improves overall efficiency.
Which other solutions did I evaluate?
We did consider a few alternatives, but each has its own trade-offs. We looked at solutions such as Splunk, New Relic, and Prometheus. The alternatives were more costly, and I prefer Datadog. I had heard about Datadog and other monitoring tools from colleagues, and based on their comparisons, I feel Datadog is much better.
What other advice do I have?
If anyone is looking to use Datadog, I would advise planning your monitoring strategy from the beginning. Focus on what metrics and logs are actually important because collecting everything can increase noise and costs. It is also important to spend some time on proper alert tuning; otherwise, you may end up with too many non-actionable alerts. I would also recommend starting with key integrations, especially with cloud platforms, and then gradually expanding use instead of enabling everything at once. I would rate this product an eight out of ten.