A go-to tool for analyzing, understanding, and investigating application performance
                        
                        
What is our primary use case?
The solution is used for full-stack enterprise performance monitoring of our primarily cloud-based stack on AWS. We have implemented monitoring coverage using RUM for critical apps and websites, and we use APM (integrated with RUM) for full-stack traceability.
 We use Datadog as our primary log repository for all apps and platforms, and the advanced log analytics enable accurate log-based monitoring, alerting, and investigations.
 Additionally, we use some advanced RUM capabilities and metrics to track and optimize client-side user experience. We track SLOs for our critical apps and platforms using Datadog.
 
How has it helped my organization?
We now have full-stack observability, which allows us to better understand application behavior, quickly alert users about issues, and proactively manage application performance.  
 We've seen value by implementing observability coordinated across multiple applications, allowing us to track things like customer shopping and orders across multiple applications and services.  
 For critical application launches, we've built dashboards that can track user activity and confirm users are able to successfully utilize new features, tracking user activities in real-time in a war-room situation.  
 Datadog is our go-to tool for analyzing, understanding, and investigating application performance and behavior.
 
What is most valuable?
APM accurately tracks our service performance across our ecosystem. RUM gives us client-side performance and user experience visibility, and the rate of new features implemented in the Digital Experience area recently has been high. Log analytics give us a powerful mechanism for error tracking, research, and analysis.  
 Custom metrics that we've created allow us to track KPIs in real-time on dashboards. All of these have proven valuable in our organization.  Additionally, Datadog product support teams are responsive and have provided timely support when needed.
 
What needs improvement?
Agent remote configuration should be provided/improved and streamlined, allowing for config changes/upgrades to be performed via the portal instead of at the host.   
 Cost tracking via the admin portal is a bit lacking, even though it has gotten better.  I'm looking for usage trends (that drive cost) across time and better visibility or notifications about on-demand charges.  
 Network device and performance monitoring could be improved, as we've faced some limitations in this area.  
 The Datadog usage-based cost model, while giving us better transparency, is difficult to follow at times and is constantly evolving.  
 
For how long have I used the solution?
I've used the solution for three years.
 
How are customer service and support?
Support has been responsive and helpful.  
 
What's my experience with pricing, setup cost, and licensing?
Pricing is straightforward. That said, it's sometimes difficult to estimate usage volumes.
 
Which other solutions did I evaluate?
We evaluated Datadog and New Relic in detail and chose Datadog for its straightforward and competitive pricing model, its full coverage of the monitoring features we wanted, and its easy-to-use UI.
 
Which deployment model are you using for this solution?
Public Cloud
                        
                            
                        
                        
                     
                    
                        
                        Great logging, session replays, and alerting
                        
                        
What is our primary use case?
Our primary use cases include:
- Alert on errors customers encounter in our product. We've set up logs that go to Slack to tell us when a certain error threshold is hit.
- Investigate slow page load times. We have pages in our app that are loading slowly and the logs help us figure out which queries are taking the longest time.
- Metrics. We collect metrics on product usage.
- Session replays. We watch session replays to see what a user was doing when a page took a long time to load or hit an error. This is helpful.
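The error-threshold alerting described in the first bullet can be sketched in a few lines. This is only an illustration of the idea, not how Datadog monitors are implemented; the class name, the threshold, and the window length are all hypothetical.

```python
from collections import deque


class ErrorThresholdAlert:
    """Signal when more than `threshold` errors occur within `window_seconds`."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        # Timestamps (in seconds) of the errors still inside the window.
        self._timestamps = deque()

    def record_error(self, timestamp: float) -> bool:
        """Record one error and return True if the threshold is now exceeded."""
        self._timestamps.append(timestamp)
        # Drop errors that have fallen out of the sliding window.
        cutoff = timestamp - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold
```

In a real setup, a True result is where a notification (for example, a Slack message) would be sent.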
 
 
How has it helped my organization?
It's helped us find bugs that customers are experiencing before they're reported to us. Sometimes, customers don't report errors, so being able to catch errors before they're reported helps us investigate before other users find them.
 Datadog has helped us investigate slow page loading times and even see the specific queries that are taking a long time to load.
 Logging lets us see the context around an error. For example, see if a backend service had an error before it surfaced on the frontend.
 Dashboards are helpful for reviewing occasionally to get a higher-level overview of what's happening.
 
What is most valuable?
The most valuable aspects include: 
- Logging. Being able to view detailed logs helps debug issues.
- Session replays. They are helpful for seeing what a customer was doing before they saw an error or had a slow page load.
- Alerting. This is an important part of our on-call process to send alerts to Slack when an error threshold is crossed. Alerts/monitors are easy to configure to only alert when we want them to alert.
- Dashboards. It's helpful to pull up dashboards that show our most common errors or page performance. It's a good way to see how the app is performing from a bird's-eye view.
 
What needs improvement?
The UI has a lot going on. It should be simpler and have a better way to onboard someone new to using Datadog.
 The log querying syntax can be confusing. Usually, I filter by finding a facet in a log and selecting to filter by that facet, but I'm not sure how to write the filter myself.
 The monitor/alert syntax is also somewhat hard to understand.
 Overall, it should be easier to learn how to use the product while you're using the product. Perhaps tooltips or a link to learn more about whatever section you're using.
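For readers in the same position with the log query syntax, here is a minimal sketch of how a filter can be written by hand. It assumes the commonly documented form of Datadog log search, where reserved attributes such as `service` and `status` are written as `key:value`, facets on log attributes take an `@` prefix, and space-separated terms are combined with AND. The helper and the facet names are hypothetical; check the official search syntax documentation before relying on it.

```python
def _quote(value: str) -> str:
    """Wrap values that contain whitespace in double quotes."""
    return f'"{value}"' if any(ch.isspace() for ch in value) else value


def build_log_query(reserved: dict, attributes: dict) -> str:
    """Build a space-separated (implicitly AND-ed) log search string.

    `reserved` holds keys written without a prefix (e.g. service, status);
    `attributes` holds facet names that get the "@" prefix.
    """
    terms = [f"{key}:{_quote(value)}" for key, value in reserved.items()]
    terms += [f"@{key}:{_quote(value)}" for key, value in attributes.items()]
    return " ".join(terms)


# Example with made-up service and facet names:
print(build_log_query({"service": "checkout", "status": "error"},
                      {"http.status_code": "500"}))
```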
 
For how long have I used the solution?
I've used the solution for two years.
 
Which solution did I use previously and why did I switch?
We did not previously use a different solution.
 
Which other solutions did I evaluate?
We did not evaluate other options. 
 
                        
                            
                        
                        
                     
                    
                        
                        Lots of features with a rapid log search and an easy setup process
                        
                        
What is our primary use case?
We use the solution for logs, infrastructure metrics, and APM. We have many different teams using it across both product and data engineering.
 
How has it helped my organization?
The solution has improved our observability by giving us rapid log search, correlation across hosts, logs, and APM, and tons of features in one website.
 
What is most valuable?
I enjoy the rapid log search. It's such a pleasure to quickly find what you're looking for. The ease of graph building is also nice, and MUCH easier than Prometheus.
 
What needs improvement?
It is far too easy to run up huge unexpected costs. The billing model is not flexible enough to handle cases where you temporarily have thousands of nodes. It is not cost-effective for monitoring big data jobs. We had to switch to open-source Grafana plus Prometheus for those.
 It would be cool to have an OpenTelemetry agent that automatically instruments everything for APM in the next release.
 
For how long have I used the solution?
I've used the solution for three years.
 
What do I think about the stability of the solution?
I'd rate the stability ten out of ten.
 
What do I think about the scalability of the solution?
I'd rate the scalability ten out of ten.
 
Which solution did I use previously and why did I switch?
We did not previously use a different solution.
 
How was the initial setup?
The setup is very straightforward. Users just install the Helm chart, and boom, you're done.
 
What about the implementation team?
We handled the setup in-house.
 
What's my experience with pricing, setup cost, and licensing?
Be careful about pricing. Make sure you understand the billing model and that there are multiple billing models available. Set up alarms to alert you of cost overruns before they get too bad.
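The kind of cost-overrun alarm suggested here can be approximated with simple arithmetic: extrapolate month-to-date spend linearly and compare it with a budget. This is a rough sketch under that linear assumption; the function names and figures are hypothetical and it does not call any Datadog billing API.

```python
def projected_month_end_cost(cost_to_date: float, day_of_month: int,
                             days_in_month: int) -> float:
    """Linearly extrapolate month-to-date spend to the end of the month."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be between 1 and days_in_month")
    return cost_to_date / day_of_month * days_in_month


def should_alert(cost_to_date: float, day_of_month: int,
                 days_in_month: int, budget: float) -> bool:
    """Return True when the projected month-end spend exceeds the budget."""
    projection = projected_month_end_cost(cost_to_date, day_of_month, days_in_month)
    return projection > budget
```

A linear projection overreacts to short usage spikes early in the month, so a real alarm would probably smooth over several days.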
 
Which other solutions did I evaluate?
We've never evaluated other solutions.
 
What other advice do I have?
It's a great product. However, you have to pay for quality.
 
Which deployment model are you using for this solution?
Public Cloud
                        
                            
                        
                        
                     
                    
                        
                        Great dashboards, lots of integrations, and helps trace data between components
                        
                        
What is our primary use case?
We use the product for instrumentation, observability, monitoring, and alerting of our system. 
 We have multiple environments and a variety of pieces of infrastructure including servers, databases, load balancers, cache, etc. and we need to be able to monitor all of these pieces, while also retaining visibility into how the various pieces interact with each other. 
 Tracing data between components and user interactions that trigger these data flows is particularly important for understanding where problems arise and how to resolve them quickly.
 
How has it helped my organization?
It provides a lot of options for integrations and tooling to observe what is happening within the system, making diagnosis and triage easier/faster. 
 Each user can set up their own dashboards and share them with other users on the team. We can set up monitors based on various patterns that we care about, and they notify us through platforms such as Slack or PagerDuty when an event triggers an alert.
 Being able to become aware of problems quickly from the symptoms being observed, with entry points into the tool that show where to investigate further, is important for our team and our users.
 
What is most valuable?
The most valuable aspects of the solution include log search to help triage specific problems that we get notified about (whether by alerts we have configured or users that have contacted us), APM traces (to view how user interactions trace through the various layers of our infrastructure and services to be able to reproduce and identify the source of problems), general performance/system dashboards (to regularly monitor for stability or deviation), and alerting (to be automatically informed when a problem occurs). We also use the incident tools for tracking production incidents.
 
What needs improvement?
In some ways, the tool has a pretty steep learning curve. Discovering the various capabilities available, then learning how to utilize them for particular use cases can be challenging. Thankfully, there is a good amount of documentation with some good examples (more are always welcome), and support is very helpful. 
 While Datadog has started adding more correlation mapping between services and parts of our system, it is still tricky to understand the ultimate root cause when multiple views/components spike. Additionally, there are lots of views and insights that are available but hard to find or discover. One of the best ways to discover them is to just click around a lot and get familiar with the views that are useful, but that takes time and isn't ideal when in the middle of fighting a fire.
 
For how long have I used the solution?
I've used the solution for about four years.
 
What do I think about the scalability of the solution?
It seems to scale well. Performance for aggregating or searching is usually very fast.
 
How are customer service and support?
Technical support is helpful and pretty responsive.
 
Which solution did I use previously and why did I switch?
We did not use a different solution. 
 
What was our ROI?
It's hard to say what ROI would be as I have not managed our system without it to compare to.
 
What's my experience with pricing, setup cost, and licensing?
I don't manage licensing.
 
Which other solutions did I evaluate?
We did not evaluate other options. 
 
What other advice do I have?
It's a great tool with new features and improvements continuously being added. It is not simple to use or set up. However, if you have the right personnel, you can get a lot of value from what Datadog has to offer.
 
Which deployment model are you using for this solution?
Public Cloud
                        
                            
                        
                        
                     
                    
                        
                        Prompt support with good logging and helps with standardization
                        
                        
What is our primary use case?
Internally, our primary usage of Datadog centers on APM/tracing, logging, RUM (real user monitoring), synthetic testing of service/application health and state, general monitoring and observability, and custom dashboards for aggregate observability. We are also increasingly leveraging the more recent service catalog feature.
We have several microservices, several databases, and a few web applications (both external and internal facing), and all of these within our systems are contained within several environments ranging from dev, sit, eat, and production.
 
How has it helped my organization?
Datadog has had a massive impact on our department. Before, we had loose logging dumped into a sea of GCP logs with haphazard custom solutions for traceability between logs and network calls. Datadog has helped standardize and normalize our processes around observability while providing fantastic tools for aggregating insight around what is monitored regularly, all wrapped in an easy-to-use UI.
 Additionally, a range of types of users exist within our department, each benefiting from Datadog in its own way. DevOps leverages it to easily manage infra, developers leverage it to easily monitor/debug services and applications, and business leverages it for statistics.
 
What is most valuable?
Personally I've found the RUM (real user monitoring) to be above and beyond what I've worked with before. Client-side monitoring has always been on the short end of the stick but the information collected and ease of instrumentation provided by Datadog is second to none.
 Having a live dynamic service map is also one of my favourite features; it provides real-time insights into which services/applications are connected to which.
 We are also investigating the new API catalog feature set, which I believe will provide a high-value impact for real-time documentation and information about all of our shared microservices that other dev teams can use.
 
What needs improvement?
In production, we intend to attach trace IDs generated by RUM to support tickets when a user experiences a traceable network error. We also want to display this trace ID to the user so that, if they contact us about a specific issue, they can give us the exact ID that was shown to them. Currently, this is not possible out-of-the-box client-side without inventing our own solution for capturing these trace IDs, such as shimming the native fetch or returning the ID from the service response.
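The second workaround mentioned above, returning the ID from the service response, can be sketched independently of any browser API. The snippet below assumes the service echoes the trace ID back in a response header; the header name `x-trace-id`, the function name, and the message wording are all hypothetical, and the same logic would live in client-side code in a real application.

```python
from typing import Mapping, Optional

TRACE_HEADER = "x-trace-id"  # hypothetical header name set by the service


def support_message(status_code: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return a user-facing message carrying the trace ID for failed responses."""
    if status_code < 400:
        return None  # Successful responses need no support reference.
    # Header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    trace_id = normalised.get(TRACE_HEADER)
    if trace_id is None:
        return "Something went wrong. Please contact support."
    return f"Something went wrong. Please quote reference {trace_id} when contacting support."
```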
 
For how long have I used the solution?
I've used the solution for approximately two years across our department and around a year or so of it being used practically and fully integrated into our systems.
 
What do I think about the stability of the solution?
Aside from one very brief bad RUM update from the Datadog team that broke the native 'fetch' for Node (which was resolved quickly), Datadog as a whole solution has been highly stable. The breakage happened because RUM used to modify the global 'fetch', and it may still do so.
 
What do I think about the scalability of the solution?
It's easy to implement and scale, provided there's a solid IaC solution in place to integrate across your system.
 
How are customer service and support?
The Datadog support team is prompt and helpful when we have submitted tickets. When their support team has been unsure, they've reached out internally to the relevant SME to help answer our questions.
 
Which solution did I use previously and why did I switch?
I've personally dabbled with some other open-source observability and monitoring solutions; however, prior to Datadog, our department did not have any solutions other than log dumps to GCP.
 
How was the initial setup?
The initial setup was straightforward from my own experience, helping integrate within the application and service levels; however, our DevOps team handled most of the infra process with minimal complaints.
 
What about the implementation team?
We handled the solution in-house.
 
What's my experience with pricing, setup cost, and licensing?
I personally am not involved in decisions around costing; however, I am aware that when we first set up Datadog, we explicitly configured our services/applications to have a master switch to enable Datadog integration. This lets us dynamically enable/disable targeted environments as needed, because costs are associated on a per-service basis for APM/logging/etc.
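A master switch of this kind might look roughly like the sketch below. It is an assumption about the general shape, not the reviewer's actual configuration: the variable names `OBSERVABILITY_ENABLED` and `OBSERVABILITY_ENVIRONMENTS` are hypothetical and are not settings read by any Datadog library.

```python
import os
from typing import Mapping


def monitoring_enabled(environment: str,
                       settings: Mapping[str, str] = os.environ) -> bool:
    """Decide whether to initialise monitoring for the given environment."""
    # Master switch: anything other than "true" disables monitoring everywhere.
    if settings.get("OBSERVABILITY_ENABLED", "false").strip().lower() != "true":
        return False
    # Per-environment allow list, e.g. "production,uat".
    raw = settings.get("OBSERVABILITY_ENVIRONMENTS", "")
    allowed = {name.strip() for name in raw.split(",") if name.strip()}
    return environment in allowed
```

Application startup code would call this once and skip tracer and logger initialisation when it returns False.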
 
Which other solutions did I evaluate?
I was not involved in the decision-making regarding the evaluation of other options.
 
What other advice do I have?
I highly recommend Datadog, and I would explore it for my own individual projects in the future, provided the cost is within reason. Otherwise, I would highly recommend it for any medium-to-large-sized org.
 
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google
                        
                            
                        
                        
                     
                    
                        
                        Good query filtering and dashboards to make finding data easier
                        
                        
What is our primary use case?
We use the solution for monitoring microservices in a complex AWS-based cloud service.  
 The system is comprised of about a dozen services. This involves processing real-time data from tens of thousands of internet-connected devices that are providing telemetry. Thousands of user interactions are processed along with real-time reporting of device data over transaction intervals that can last for hours or even days. The need to view and filter data over periods of several months is not uncommon.  
 Datadog is used for daily monitoring and R&D research as well as during incident response.
 
How has it helped my organization?
The query filtering and improved search abilities offered by Datadog are by far superior to other solutions we were using, such as AWS CloudWatch. We find that we can simply get at the data we need quicker and easier than before. This has made responding to incidents or investigating issues a much more productive endeavour. We simply have fewer roadblocks in the way when we need to "get at the data". It is also used occasionally to extract data while researching requirements for new features.
 
What is most valuable?
Datadog dashboards are used to provide a holistic view of the system across many services. Customizable views, as well as the ability to "dive in" when we see something anomalous, have improved the workflow for handling incidents.
 Log filtering, pattern detection and grouping, and extracting values from logs for plotting on graphs all help to improve our ability to visualize what is going on in the system. The custom facets allow us to tailor the solution to fit our specific needs.
 
What needs improvement?
There are some areas on log filtering screens where the user interface can take some getting used to. Perhaps having the option for a simple vs advanced user interface would be helpful in making new or less experienced users comfortable with making their own custom queries.
Maybe it is just how our system is configured, yet finding the valid values for a key/value pair is not always intuitively obvious to me. While there is a pop-up window with historical or previously used values and saved views from previous query runs, I don't see a simple list or enumeration of the set of valid values for keys that have such a restriction.
 
For how long have I used the solution?
I've used the solution for one year.
 
What do I think about the stability of the solution?
The solution is very stable.
 
What do I think about the scalability of the solution?
The product is reasonably scalable, although costs can get out of hand if you aren't careful.
 
How are customer service and support?
I have not had the need to contact support.
 
Which solution did I use previously and why did I switch?
We did use AWS CloudWatch. It was too awkward to use effectively and simply didn't have the features.
 
How was the initial setup?
We had someone experienced do the initial setup.  However, with a little training, it wasn't too bad for the rest of us.
 
What about the implementation team?
We handled the setup in-house.
 
What's my experience with pricing, setup cost, and licensing?
Take care with how you extract custom values from logs. You can do things without much thought to make your life easier and not realize how expensive it has become compared with where you started.
 
Which other solutions did I evaluate?
I'm not aware of evaluating other solutions.
 
What other advice do I have?
Overall I recommend the solution. Just be mindful of costs.
 
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
                        
                            
                        
                        
                     
                    
                        
                        Good centralization with helpful monitoring and streamlined investigation capabilities
                        
                        
What is our primary use case?
We utilize Datadog to monitor both some legacy products and a new PaaS solution, based on a microservice architecture, that we are building out here at Icario.
 All of our infrastructure is in AWS, with a very few legacy systems in Rackspace. For the PaaS, we mainly utilize the K8s Orchestrator, which implements the APM libraries into services deployed there and gives us infra info regarding the cluster.
 For legacy products, we mainly utilize the Agent or the AWS integration, with APM in specific places. For now, we monitor mainly prod in legacy and the full scope in the PaaS.
 
How has it helped my organization?
Datadog has greatly reduced the time needed to investigate issues by putting everything into a single pane of glass. This allows us to get ahead of infra/app-based issues before they affect customer experience with our products.
 Outside of that, the ease of management, deployment of agents, integrations etc. has greatly helped the teams. There isn't much leg work needed by the devs to manage or deploy Datadog into their stacks. This is with the use of Terraform, pipelines and the orchestrator. All in all, it has been an improvement.
 
What is most valuable?
The two most valuable aspects are the Terraform provider for Datadog and the K8s Orchestrator. People don't take that into account when buying into a tooling product like Datadog in this age where scalability, management, and ease of implementation are key. Other tools not having good IaC products or options is a ball drop. Orchestration for the tool's agent is good. Not having to use another tool to manage the agents and config files in multiple places/instances is a huge win!
 
What needs improvement?
A big problem with Datadog is the billing. They need to make the billing more user-friendly. I know it like the back of my hand at this point, yet explaining to the C-suite why costs went up or are what they are is many times more complicated than it needs to be. I can't even say "why" due to the lack of metadata tied to billing. For instance, with the AWS Integration Host ingestion, I can't say, "This month THESE hosts got added, and that's what caused the cost to go up." The billing visibility really needs to be resolved!
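The question of which hosts were added in a given month amounts to a set difference between two host inventories. The sketch below assumes you can export a list of monitored host names for each month by some means; the function and the host names are hypothetical, and it does not use any Datadog billing metadata.

```python
from typing import Iterable, List


def hosts_added(previous_month: Iterable[str], current_month: Iterable[str]) -> List[str]:
    """Return the hosts present this month that were absent last month."""
    return sorted(set(current_month) - set(previous_month))


# Example with made-up host names:
print(hosts_added(["web-1", "web-2"], ["web-1", "web-2", "batch-7", "batch-3"]))
```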
 
For how long have I used the solution?
I've used the solution for more than four years.
 
What do I think about the stability of the solution?
Datadog has always been extremely stable, with outages really only ever creating delays, never actual downtime of the service, which is amazing and impressive.
 
What do I think about the scalability of the solution?
The solution is very scalable if implemented right and not on top of complicated architecture.
 
How are customer service and support?
Support is excellent. They are always looking for a resolution, and a ticket is never left unresolved unless the feature just can't exist or isn't currently possible.
 
Which solution did I use previously and why did I switch?
We did have New Relic, Datadog, Sumo Logic, Pingdom, and some other custom or third-party tooling. We switched because we wanted everything to be in a single pane and because Datadog is a better solution than the competitors.
 
How was the initial setup?
For us, setup is a mixed bag, as we support legacy apps and architectures as well as a new microservice architecture. Legacy is somewhat complex, just due to the nature of how those apps are stacked and the underlying infra, configuration, and setup. Microservice is a breeze and straightforward for most of the out-of-the-box stuff.
 
What about the implementation team?
Our Team of SRE Engineers, Platform Engineers and Cloud Engineers implemented the solution.
 
What was our ROI?
I can't really speak to ROI; however, from my perspective, we definitely get our money's worth from the product.
 
What's my experience with pricing, setup cost, and licensing?
Users just really need to make sure they stay on top of costs and don't let all of the engineers do as they please. Billing with Datadog can get out of hand if you let them. Not everything needs to be monitored.
 
Which other solutions did I evaluate?
We didn't really need to evaluate other options. 
 
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
                        
                            
                        
                        
                     
                    
                        
                        Good for log ingestion and analyzing logs with easy searchability of data
                        
                        
What is our primary use case?
We use Datadog as our main log ingestion source, and Datadog is one of the first places we go to for analyzing logs. 
 This is especially true for debugging, monitoring, and alerting on errors and incidents, as we send traffic logs from K8s, Amazon Web Services, and many other services at our company to Datadog. In addition, many products and teams at our company have dashboards for monitoring statistics (sometimes based on these logs directly, other times we set queries for these metrics) to alert us if there are any errors or health issues.
 
How has it helped my organization?
Overall, at my company, Datadog has made it easy to search for and look up logs at an impressively quick search rate over a large amount of logs. 
 It seamlessly allows you to set up monitoring and alerting directly from log queries which is convenient and helps for a good user experience, and while there is a bit of a learning curve, given enough time a majority of my company now uses Datadog as the first place to check when there are errors or bugs. 
 However, the cost aspect of Datadog is tricky to gauge because it's related to usage, and thus, it is hard to tell the relative value of Datadog year to year.
 
What is most valuable?
The feature I've found most valuable is the log search feature. It's set up with our ingestion to be a quick one-stop shop, is reliable and quick, and seamlessly integrates into building custom monitors and alerts based on log volume and timeframes. 
 As a result, it's easy to leverage this to triage bugs and errors, since we can pinpoint the logs around the time that they occur and get metadata/context around the issue. This is the main feature that I use the most in my workflow with Datadog to help debug and triage issues.
 
What needs improvement?
More helpful log search keywords/tips would improve Datadog's log dashboard. I recently struggled a lot to parse text from raw log lines that didn't seem to match directly with facets. There are presumably smart searching capabilities; however, it's not intuitive to learn how to leverage them, and I instead had to resort to a Python script to do some simple regex parsing. (I was trying to parse "file:folder/*/*" from the logs and didn't seem to be able to do this in Datadog. Maybe I'm just not familiar enough with the logs, but I couldn't easily find resources on how to do this either.)
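A fallback script of the kind described might look like the sketch below. It is not the reviewer's script: it assumes each "*" in "file:folder/*/*" stands for a single path segment, and the sample log lines in the usage example are made up.

```python
import re
from typing import Iterable, List

# Matches "file:folder/<segment>/<segment>". Reading each "*" as one path
# segment (no slashes or whitespace) is an assumption about the intent.
FILE_PATTERN = re.compile(r"file:folder/([^/\s]+)/([^/\s]+)")


def extract_file_paths(lines: Iterable[str]) -> List[str]:
    """Return every "file:folder/x/y" occurrence found in the log lines."""
    matches = []
    for line in lines:
        for match in FILE_PATTERN.finditer(line):
            matches.append(match.group(0))
    return matches


# Example with made-up log lines:
sample = [
    "2024-01-01 INFO wrote file:folder/a/b.txt ok",
    "2024-01-01 WARN nothing to parse here",
]
print(extract_file_paths(sample))
```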
 
For how long have I used the solution?
I've used the solution for 10 months.
 
What's my experience with pricing, setup cost, and licensing?
Beware that the cost will fluctuate (and it often only gets more expensive very quickly).
 
                        
                            
                        
                        
                     
                    
                        
                        Good visibility into application performance, understanding of end-user behavior, and a single pane of glass view
                        
                        
What is our primary use case?
The primary use case for this solution is to enhance our monitoring visibility, determine the root cause of incidents, understand end-user behaviour from their point of view (RUM), and understand application performance.
 Our technical environment consists of a local dev environment where Datadog is not enabled and deployed environments that range from UAT testing with our product org to ephemeral stacks that our developers use to test their code somewhere other than their own computers. We also have a mobile app where testing is also performed.
 
How has it helped my organization?
Datadog has greatly improved our organization in many ways. Some of those ways include greater visibility into application performance, understanding of end-user behavior, and a single pane of glass view into our entire infrastructure.  
 Regarding visibility, our organization previously used New Relic, and when incidents or regressions happened, New Relic's query language was very hard to use. End-user behavior in RUM has improved our ability to know what to focus on. Lastly, the single pane of glass view with maneuvering between products has helped us truly understand root causes after incidents.
 
What is most valuable?
APM has been a top feature for us. I can speak for all developers here: they use it more often than other products. Due to a standard in tracing (even though it is customizable), engineers find it easier to walk a trace than to understand what went wrong when looking at logging.  
 Another feature that I find valuable, though it isn't the first one that comes to mind, is Watchdog. I have found that it has been a good source for understanding anomalies and where we, as an organization, may need more monitoring coverage.
 
What needs improvement?
I am not 100% sure how this could be done, or if it can be, but I've had to do a lot of education to ramp developers up on the platform. This feels like a natural result of the sheer growth in the number of products Datadog now offers.  
 When I first started using the Datadog platform, I thought a big pro of the company was that the ramp-up time was much quicker, since there was no query language to learn. I still believe that to be true when comparing the product to something like New Relic, though with the wide range of products Datadog now offers, it can be a bit intimidating for developers to know where to go to find what they want.
 
For how long have I used the solution?
I have been using the solution at my current company for almost four years, and have used it at my previous company as well.
 
Which solution did I use previously and why did I switch?
A while ago, we used New Relic, and we switched due to Datadog being a better product.
 
What about the implementation team?
We did the implementation in-house.
 
What's my experience with pricing, setup cost, and licensing?
The value compared to pricing is reasonable, though it can be a bit of a sticker shock to some.
 
Which other solutions did I evaluate?
We did not evaluate other options. 
 
Which deployment model are you using for this solution?
Public Cloud
                        
                            
                        
                        
                     
                    
                        
                        Enhances efficiency with robust alerting and visualization tools
                        
                        
What is our primary use case?
Our primary use case for Datadog is to monitor and manage our fully cloud-native infrastructure. We utilize Datadog to gain real-time visibility into our cloud environments, ensuring that all our services are running smoothly and efficiently. 
 The platform’s extensive integration capabilities allow us to seamlessly track performance metrics across various cloud services, containers, and microservices. 
 With Datadog’s robust alerting and visualization tools, we can proactively identify and resolve issues, minimizing downtime and optimizing our system’s performance. This has been crucial in maintaining the reliability and scalability of our cloud-native applications.
 
How has it helped my organization?
Datadog has significantly enhanced our organization’s operational efficiency and reliability. By providing real-time visibility into our cloud-native infrastructure, Datadog enables us to monitor performance metrics, detect anomalies, and resolve issues swiftly. 
 The platform’s robust alerting system ensures that potential problems are addressed before they impact our services, reducing downtime and improving overall system stability. Additionally, Datadog’s comprehensive dashboards and reporting tools have streamlined our troubleshooting processes and facilitated better decision-making.
 
What is most valuable?
The most valuable feature of Datadog for our organization has been its real-time monitoring capabilities. This feature provides us with instant visibility into our cloud-native infrastructure, allowing us to track performance metrics and detect anomalies as they occur. The ability to monitor our systems in real-time means we can quickly identify and address issues before they escalate, minimizing downtime and ensuring the reliability of our services. 
 Additionally, the real-time data helps us make informed decisions and optimize our operations, ultimately enhancing our overall efficiency and performance.
 
What needs improvement?
While Datadog has been instrumental in enhancing our operational efficiency, there are areas where it could be improved. 
 One area is the user interface, which could be more intuitive and user-friendly, especially for new users. 
 Additionally, the pricing model can be quite complex and might benefit from more flexible options tailored to different organizational needs. 
 For future releases, it would be beneficial to include more advanced machine learning capabilities for predictive analytics, helping us anticipate issues before they occur. 
 Support for more third-party tools would also be a valuable addition.
 
For how long have I used the solution?
I've used the solution for six years.
 
What do I think about the stability of the solution?
Datadog has proven to be a highly stable solution for our monitoring needs. Throughout our usage, we have experienced minimal downtime and consistent performance, even during peak traffic periods. The platform’s reliability ensures that we can continuously monitor our cloud-native infrastructure without interruptions, which is crucial for maintaining the health and performance of our services.
 
What do I think about the scalability of the solution?
Datadog’s scalability has been impressive and instrumental in supporting our growing cloud-native infrastructure. The platform effortlessly handles increased workloads and scales alongside our expanding services without compromising performance. Its ability to integrate with a wide range of cloud services and technologies ensures that as we grow, Datadog continues to provide comprehensive monitoring and insights.
 
How are customer service and support?
Our experience with Datadog’s customer service and support has been exceptional. The support team is highly responsive and knowledgeable, providing timely assistance whenever we’ve encountered issues or had questions. 
 Their proactive approach to offering solutions and guidance has been invaluable in helping us maximize the platform’s capabilities.
 
How was the initial setup?
The setup is straightforward.
 
What about the implementation team?
We handled the setup in-house.
 
What's my experience with pricing, setup cost, and licensing?
The pricing model can be quite complex and might benefit from more flexible options tailored to different organizational needs.
 
What other advice do I have?
One area for improvement is the user interface, which could be more intuitive and user-friendly, especially for new users.
 
Which deployment model are you using for this solution?
Public Cloud