I have been using Netdata for the past one and a half years, mainly for real-time monitoring and quick visibility into system performance. During this time, I have used it to monitor CPU, memory, disk, and network metrics. It has been especially useful for instant troubleshooting and understanding system behavior without needing complex setup.
Netdata Cloud Annual Contract
Netdata reviews from AWS customers
External reviews
External reviews are not included in the AWS star rating for the product.
Real-time monitoring has improved incident detection and simplifies per-second troubleshooting
What is our primary use case?
Netdata serves as a first-level monitoring and troubleshooting tool for instant visibility and quick checks before diving into more complex monitoring systems such as Prometheus or logs. It acts as a real-time layer where I can immediately see what is happening on the server without any delay. The zero-configuration setup is a big advantage: we can deploy it quickly on servers and start getting detailed metrics right away.
During a production issue where one of the services started responding slowly, I used Netdata to check real-time metrics instantly instead of digging through multiple tools. I observed a sudden spike in CPU usage and disk I/O along with increased network traffic. Since Netdata provides second-by-second visibility, I was able to quickly correlate the spike with the background process that was consuming resources. This helped us identify the root cause within minutes and take action immediately. Instead of spending time collecting logs or setting up queries, I had real-time visibility.
What is most valuable?
The best feature Netdata offers in my experience is per-second monitoring. Unlike traditional tools that collect data every ten to sixty seconds, Netdata gives instant visibility, which helps catch short-lived issues that other tools miss. It comes with Machine Learning on every metric, automatically detecting anomalies without manual setup, reducing false positives by up to ninety-nine percent and identifying the root cause quickly, which speeds up troubleshooting. One of the most powerful features is Auto Discovery and zero configuration, allowing us to install it easily without the need for manual dashboards or configurations.
Machine Learning-based anomaly detection is really helpful in identifying issues early without manual rule setup. Instead of defining static thresholds for every metric, Netdata automatically learns normal behavior and flags unusual patterns. For example, I have seen a case where CPU or memory usage did not cross typical alert thresholds, but the pattern was abnormal compared to historical behavior. Netdata detected this and alerted us early, allowing us to investigate before it turned into a bigger issue.
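For context, the static thresholds that anomaly detection replaces are defined in Netdata's health configuration files. The sketch below is illustrative only (the alarm name and threshold values are my own assumptions; check the exact field syntax against the Netdata health reference):

```
 alarm: cpu_usage_high
    on: system.cpu
lookup: average -1m unaligned of user,system
 units: %
 every: 10s
  warn: $this > 80
  crit: $this > 95
  info: CPU utilization averaged over the last minute
```

With ML-based anomaly detection, rules like this do not need to be hand-written for every metric; Netdata learns the normal range on its own.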
Another valuable feature is its granular drill-down capability, enabling me to go from a high-level system view down to very specific metrics instantly, which makes root cause analysis much faster without switching tools. The ease of deployment across environments is also valuable: whether it is a single server, a VM, or a container, Netdata can be set up quickly and starts collecting metrics immediately. This is very beneficial during scaling or instant responses, and the live streaming dashboards are a nice addition.
The real-time visibility and faster incident response have led to significant improvements such as quicker troubleshooting. Since I get second-by-second metrics, I can identify issues almost instantly instead of waiting for delayed monitoring data. This has significantly reduced the time to detect and resolve problems.
What needs improvement?
The key area for improvement is data retention and historical analysis. Netdata is excellent for real-time monitoring, but for long-term storage and deep historical insights, it usually needs to be integrated with other tools. Improving built-in long-term retention would make it more complete. Another area is data aggregation and correlation across multiple nodes: while it provides great per-node visibility, more advanced centralized views and cross-system correlation would help in larger-scale environments. Customization of alerts and anomaly detection could also be enhanced.
One area needing improvement is access control and security features. Stronger RBAC and user management would help control who can view data. Other areas would be dashboard customization, alerting integrations, and enterprise-level scalability features.
For how long have I used the solution?
I have been using Netdata for the past one and a half years.
What do I think about the stability of the solution?
Netdata is very stable.
What do I think about the scalability of the solution?
Its scalability is very good. I have deployed it across the infrastructure, and it is working fine across the organization. Adoption varies from team to team, but it is working very well.
How are customer service and support?
Customer support is very good. They follow a ticket-based system, and all tickets have SLAs, so they are responded to on time. The customer support team has also been nice, so things are going well.
How was the initial setup?
The initial setup is very easy since I am running the server inside GCP, and the subscription-based SaaS model keeps the setup cost low. The licensing is also very good, so I get good visibility into cloud bills as well.
What was our ROI?
The biggest ROI I have seen is in incident detection, which improved by forty to fifty percent, while troubleshooting time dropped by thirty to forty percent and productivity rose by around twenty-five to thirty percent. Another key factor is reduced tooling overhead; since Netdata provides instant visibility out of the box, I rely less on multiple tools for initial debugging, which simplifies the workflow.
What's my experience with pricing, setup cost, and licensing?
As I am going with the subscription-based SaaS model, the pricing and setup cost are very reasonable, and the setup is very easy since I am running the server inside GCP.
What other advice do I have?
I would rate Netdata nine out of ten. It is an excellent tool for real-time monitoring with instant visibility, easy setup, and powerful anomaly detection. The limitations in long-term retention, advanced customization, and some enterprise features are why it is not a full ten. I switched to Netdata mainly for instant per-second monitoring: it gave me immediate insights without needing to build anything complex, and it complements our existing stack by acting as a real-time troubleshooting layer. Overall, it is highly effective for fast troubleshooting and compares well with Prometheus and Grafana, so I would recommend it.
Easy to deploy. Easy to get onboard. Responsive Support. Amazing customer experience!
I was amazed how easy it was to get up and running with Netdata, a perfect solution for us.
• Beautiful web-based dashboards for visualizing metrics across Ceph, Proxmox, and GPU usage
• Zero-config auto-discovery for services like Ceph daemons, Proxmox nodes, and Docker containers
• Built-in alerting system with support for email, Telegram, Slack, and more — easy to customize
• Low system overhead with in-memory time-series engine, no need for external databases
• GPU monitoring support, including NVIDIA Tesla stats via nvml.plugin
• Proxmox hypervisor insights, including VM resource usage and host metrics
• Ceph integration for cluster health, OSD performance, and IOPS visibility
• Netdata Cloud integration for centralized, multi-node monitoring and alert correlation
• Extensible plugin system allows custom metrics or third-party integrations if needed
• Support through the knowledge base and forum is enough for us, and we find it helpful
• Proactive monitoring — helps us react before issues escalate or impact performance
• High-resolution visibility — allows us to quickly identify where and when resource consumption increases
• Confidence in infrastructure health — even if we don’t have ongoing problems, it ensures we’re not missing anything critical
• Reduces firefighting — by catching potential issues early, we avoid downtime and stressful late fixes
• Supports capacity planning — by showing usage trends over time, it helps us scale or adjust resources proactively
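The extensible plugin system mentioned above is based on a simple text protocol: an external program defines charts and then emits collected values on stdout, and Netdata picks them up. A minimal sketch follows; the chart and dimension names (`example.random`) are my own illustrative choices, a real plugin would loop forever at its update interval, and the exact field order should be verified against the Netdata external plugins documentation:

```python
import random
import time

def chart_definition():
    # Define one chart with one dimension using Netdata's external-plugin
    # text protocol. CHART fields (roughly): type.id, name, title, units,
    # family, context, chart type, priority, update_every.
    return [
        "CHART example.random '' 'Random Value' 'value' example example.random line 1000 1",
        "DIMENSION random '' absolute 1 1",
    ]

def collection_cycle(value):
    # Each collection cycle is a BEGIN/SET/END block for the chart.
    return [
        "BEGIN example.random",
        f"SET random = {value}",
        "END",
    ]

if __name__ == "__main__":
    for line in chart_definition():
        print(line)
    for _ in range(3):  # a real plugin loops forever, sleeping update_every seconds
        for line in collection_cycle(random.randint(0, 100)):
            print(line)
        time.sleep(0.1)
```

Because the protocol is plain text on stdout, custom collectors can be written in any language, which is what makes the plugin system easy to extend.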