Centralized monitoring has reduced incidents and now improves alerting and troubleshooting speed
What is our primary use case?
My main use case for Grafana is to create and design dashboards based on the metrics provided by different exporters via Prometheus.
We have different exporters, and we are creating different dashboards based on them. We have a set of dashboards related to Kafka, virtual machines, and instances. Inside Kafka, we have a broker dashboard, consumer dashboard, partition dashboard, and other ingestion and consumption rate dashboards. Apart from that, we have a dashboard for consumer lag and consumption by partition.
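As a rough illustration of what a consumer lag dashboard plots, lag per partition is the newest offset written to the partition minus the offset the consumer group has committed. The sketch below is a minimal Python example under stated assumptions: the topic names and offsets are hypothetical sample data, and a partition with no committed offset is treated as fully unread, which is a simplification.

```python
from collections import defaultdict


def consumer_lag(end_offsets, committed_offsets):
    """Return per-partition lag: log end offset minus committed offset.

    Both arguments map (topic, partition) -> offset. A partition with no
    committed offset is treated as fully unread (simplifying assumption).
    """
    lag = {}
    for topic_partition, end in end_offsets.items():
        committed = committed_offsets.get(topic_partition, 0)
        # Clamp at zero so a stale end-offset reading never yields negative lag.
        lag[topic_partition] = max(end - committed, 0)
    return lag


def lag_by_topic(lag):
    """Sum per-partition lag into one total per topic."""
    totals = defaultdict(int)
    for (topic, _partition), value in lag.items():
        totals[topic] += value
    return dict(totals)


# Hypothetical sample data, not taken from the review.
end_offsets = {("orders", 0): 1500, ("orders", 1): 1480, ("payments", 0): 300}
committed = {("orders", 0): 1500, ("orders", 1): 1400}

per_partition = consumer_lag(end_offsets, committed)
per_topic = lag_by_topic(per_partition)
print(per_partition)
print(per_topic)
```

In practice an exporter publishes these values as metrics and the dashboard queries them; the arithmetic is the same.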
We are collecting metrics from Prometheus and creating dashboards inside Grafana. Inside Grafana, we have different data sources including Thanos and Prometheus. We are also using Grafana for alert setup. We have set up alerts based on the exceptions we are collecting from Loki, and if any such exception occurs, it creates an incident alert in Squadcast.
What is most valuable?
Grafana offers many features, including the ability to create dashboards, add variables, and set up alerts. Notifications can be delivered through an integration with an incident management tool or to a configured email address.
You can configure alerts directly in Grafana, either from a dashboard or from the Explore view, where you can select Loki as the data source and set alerts based on your exceptions.
Dashboard creation is straightforward, and you can configure multiple data sources such as Prometheus and Thanos. You can also connect AWS CloudWatch and other tools directly to Grafana. For alerting, you can create alerts based on thresholds and exceptions, and many plugins are available for data sources and dashboards. Grafana also supports role-based access, so you can grant viewer, editor, or admin permissions.
We have had very positive outcomes from Grafana because you can directly visualize the metrics based on past and current inputs and take timely actions based on the responses you are getting from the visualization dashboards. Apart from that, the alerts notify you through your incident management tool.
You can check those metrics in the incident management tool by filtering the alert source as Grafana, and it helps in reducing production incidents because you can acknowledge and visualize the metrics from Grafana on time.
What needs improvement?
Currently, I do not think that any improvement is required, but there are multiple use cases.
For how long have I used the solution?
I have been using Grafana for the last four years.
What do I think about the stability of the solution?
What do I think about the scalability of the solution?
Grafana has excellent scalability.
How are customer service and support?
The customer support for Grafana is excellent.
How would you rate customer service and support?
Which solution did I use previously and why did I switch?
This is the only solution we are currently using.
Before choosing Grafana, we evaluated other options including Datadog, but it was considerably more expensive, so we chose Grafana.
How was the initial setup?
I have seen a return on investment because we need fewer employees and can act on alerts promptly. Grafana also reduces MTTR: notifications arrive through the incident management tool, and visualizing metrics and logs inside Grafana makes troubleshooting faster and lets us catch issues earlier.
What's my experience with pricing, setup cost, and licensing?
My experience with pricing, setup cost, and licensing is that it is very reasonable and has excellent community support.
What other advice do I have?
You can detect issues faster because you can configure threshold-based alerts in Grafana and receive notifications through a tool such as Squadcast, which reduces MTTR. Grafana also provides system visibility: you can visualize CPU, memory, disk usage, API latencies, and other ports inside a Grafana dashboard. Based on these metrics, you can troubleshoot issues very easily.
If you want a scalable solution, better visualization, optimization, centralized monitoring, and improved troubleshooting, then you can choose Grafana without any doubts in your mind. I would rate this product a ten out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Powerful, Flexible Observability with Grafana Labs
What do you like best about the product?
What I like most about Grafana Labs is its flexibility and strong visualization capabilities. Grafana makes it straightforward to create interactive, real-time dashboards by pulling data from multiple sources such as Prometheus, Elasticsearch, and various cloud platforms. The interface feels intuitive and highly customizable, which makes it especially well-suited for monitoring complex microservices environments.
What do you dislike about the product?
One drawback of Grafana Labs is that the more advanced features and enterprise capabilities can get expensive as you scale, particularly in larger environments with high data volumes.
For beginners, the initial dashboard setup and alert configuration can also feel a bit complex. Once you start integrating multiple data sources across distributed systems, it can take a while to fully understand how to structure queries, manage permissions, and set up alerting logic correctly.
What problems is the product solving and how is that benefiting you?
Grafana Labs addresses the challenge of fragmented observability by bringing metrics, logs, and traces together in a single, unified platform. Rather than jumping between multiple tools, I can visualize, compare, and correlate everything in one place.
This has noticeably improved our troubleshooting speed, provides clearer visibility across microservices, and helps our team make faster, data-driven decisions with greater confidence.
Flexible, Clear Dashboards with Powerful Integrations and Alerting
What do you like best about the product?
What I like most about Grafana Labs is the flexibility and clarity it offers for visualizing metrics. In our projects, we handle a lot of monitoring data, and Grafana makes it easy to turn raw metrics into meaningful, easy-to-read dashboards. The UI is clean, and we can customize panels to fit our requirements.
I also really appreciate its integration capabilities. It connects smoothly with Prometheus, Kubernetes, and other data sources without much complexity. Being able to pull metrics from different systems and view everything in one place gives us better visibility into overall system performance.
Alerting is another strong point. We can set up alerts based on thresholds, which helps us respond quickly when something goes wrong. Overall, Grafana makes monitoring more organized, visually clear, and easier to manage in day-to-day operations.
What do you dislike about the product?
One thing I dislike about Grafana Labs is that the initial setup and dashboard configuration can feel confusing, especially for someone new to monitoring tools. There are so many options and settings that it takes time to understand how to properly configure data sources and write the right queries.
Writing queries—particularly with Prometheus (PromQL)—can also be fairly complex. If a query isn’t written correctly, the dashboard may not show accurate data, and tracking down what went wrong can take extra time and effort.
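One common source of misleading panels is graphing a raw counter instead of its rate of change. The sketch below is a simplified Python illustration of what a rate calculation does with counter samples, including a counter reset. It is not Prometheus's implementation: unlike PromQL's `rate()`, it does not extrapolate to the window boundaries, and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Approximate per-second increase of a counter over a window.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    A drop in value is treated as a counter reset, so the value seen after
    the reset is counted as new increase.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed to compute a rate
    increase = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            increase += current  # counter restarted from zero
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical counter samples; the value drops at t=45 because of a restart.
samples = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
rate = simple_rate(samples)
print(rate)
```

Plotting the raw values here would show a sudden drop at the restart, while the rate stays steady, which is the kind of discrepancy that is hard to spot when a query is slightly wrong.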
Another minor concern is that when there are too many dashboards and panels, managing everything can become difficult without good organization. That said, once I get used to the structure and establish a system, it becomes much more manageable.
What problems is the product solving and how is that benefiting you?
Grafana Labs mainly addresses the challenge of monitoring visibility. In complex projects with many services, servers, and containers running at the same time, it can be hard to understand system performance by relying only on logs or raw metrics. Grafana helps us turn that monitoring data into clear, meaningful dashboards.
In our experience, it has made it easier to track application performance, CPU and memory usage, API response times, and overall system health in one place. Rather than jumping between multiple tools, we can view everything on a single dashboard, which saves a lot of time.
It also supports early issue detection. When a metric crosses a threshold, we receive alerts and can respond quickly before it turns into a bigger problem. Overall, this improves system stability and helps reduce downtime.
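To make the threshold behaviour concrete, here is a minimal Python sketch of a threshold rule that only fires once the breach has lasted for a set duration, which avoids alerting on brief spikes. The state names, the 90% threshold, the 120-second duration, and the CPU samples are all assumptions chosen for illustration, not values from this review.

```python
def evaluate_alert(samples, threshold, for_seconds):
    """Return (timestamp, state) for each sample.

    The state is 'normal' while the value is at or below the threshold,
    'pending' once it is above, and 'firing' after it has stayed above
    for at least for_seconds.
    """
    states = []
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # first sample of this breach
            sustained = timestamp - breach_start >= for_seconds
            state = "firing" if sustained else "pending"
        else:
            breach_start = None  # breach ended, reset the timer
            state = "normal"
        states.append((timestamp, state))
    return states


# Hypothetical CPU usage samples (seconds, percent).
cpu = [(0, 62), (60, 91), (120, 93), (180, 95), (240, 70)]
states = evaluate_alert(cpu, threshold=90, for_seconds=120)
for timestamp, state in states:
    print(timestamp, state)
```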
Grafana Labs Makes Observability Unified, Flexible, and Scalable
What do you like best about the product?
What I like best about Grafana Labs is its ability to unify metrics, logs, and traces into a single, intuitive observability platform. The dashboards are highly flexible and easy to customize, making complex system data instantly understandable. Its strong ecosystem, wide data-source support, and active open-source community make it both powerful and future-ready for scaling teams.
What do you dislike about the product?
What I dislike about Grafana Labs is that the learning curve can be steep for new users, especially when building advanced dashboards or setting up alerting. Performance can degrade with very large datasets if dashboards aren’t well optimized.
What problems is the product solving and how is that benefiting you?
Grafana Labs solves the problem of fragmented monitoring by bringing metrics, logs, and traces into a single observability layer. This helps quickly detect, diagnose, and resolve issues without jumping across multiple tools. For me, it improves system visibility, reduces MTTR, and enables more data-driven decisions around performance, reliability, and capacity planning.
Exceptional Global Network Visibility with Easy Data Export
What do you like best about the product?
This is one of the best tools available for keeping track of network infrastructure globally. It allows all global offices to be monitored with exceptionally detailed, live visibility into critical network points, and the data can be exported for use across various platforms, such as Power BI.
What do you dislike about the product?
Honestly, I don’t have any major complaints. The only thing I can think of is that it would be helpful if the data timeline could be extended to cover a few months or even years, although this option might already exist—I’m just not sure.
What problems is the product solving and how is that benefiting you?
It monitors all network equipment live, with very detailed specifications, and it will send you a notification—like a ticket in ServiceNow—if any network element goes down or isn’t functioning well.
Beautiful Dashboards, Simple Setup, Needs Better Automation
What do you like best about the product?
I use Grafana Labs for monitoring our infrastructure, particularly for CyberArk devices. I find it to be a simple SQL-based tool that's very efficient for its dashboards and offers a single pane of glass to identify exact issues quickly. I like how Grafana has very beautiful dashboards, and we can easily create custom SQL queries to capture any needed information and display it on the dashboard. The level of monitoring it provides is quite accurate. Also, I appreciate that the initial setup is quite simple due to its SQL-based design.
What do you dislike about the product?
Grafana Labs doesn't have native functionality to create an incident automatically through an integration with ticketing tools.
What problems is the product solving and how is that benefiting you?
I use Grafana Labs for efficient infrastructure monitoring, especially CyberArk devices. Its SQL-based design simplifies creating custom queries and dashboards, providing accurate monitoring and a single pane of glass to pinpoint issues.
Powerful Monitoring and Observability Platform
What do you like best about the product?
I use Grafana Labs to monitor application health and infrastructure. I mainly use it for dashboards, metrics visualization, and analyzing system performance.
What do you dislike about the product?
The initial setup and learning curve can be challenging for beginners. Some advanced features are only available in paid plans.
What problems is the product solving and how is that benefiting you?
Grafana Labs helps solve the problem of monitoring and observability in distributed systems. It gives clear visibility into metrics and performance issues, which helps in faster debugging and better system reliability.
Powerful Grafana Dashboards with Flexible Integrations for Observability
What do you like best about the product?
Grafana provides powerful dashboards and visualizations that make it easy to monitor metrics and logs from multiple data sources. The flexibility and integrations are very helpful for observability.
What do you dislike about the product?
Initial setup and dashboard configuration can take time, and tuning alerts or queries requires familiarity with the underlying data sources.
What problems is the product solving and how is that benefiting you?
Grafana helps centralize monitoring by visualizing metrics and logs in a single place. This improves visibility into system health, helps detect issues faster, and supports better operational decisions.
Reliable Monitoring Tool for DevOps, but Takes Time to Master Advanced Setup
What do you like best about the product?
I used Grafana during my DevOps internship mainly for monitoring servers and application services. We connected Grafana with Prometheus as the main data source and also used custom metrics from OTel.
What I like most is how easy it becomes to monitor everything from one place once the dashboards are set up. I created dashboards to check service uptime, system metrics and application health and also configured alerts to get notified when something went wrong. The integration with Prometheus queries works well, and real-time metrics help a lot during incidents. It saves time because you don’t have to manually check logs or servers again and again.
What do you dislike about the product?
The main difficulty I faced was during metric integration and dashboard setup, especially when working with custom metrics. For new users, it takes time to understand which metrics to use and how to structure panels properly.
Alert and dashboard configuration can feel complex at first and advanced features have a steep learning curve. Better and more practical documentation for real production use cases would make onboarding easier, especially for interns or beginners.
What problems is the product solving and how is that benefiting you?
Grafana helps in monitoring servers and services continuously in a production environment. Instead of manually checking systems or logs, we can quickly identify issues using dashboards and alerts.
If a service goes down, it is visible immediately on Grafana, which helps the team respond faster. This improves incident handling and reduces downtime. Overall, it made monitoring more organized and efficient during my internship.
Unified dashboards have empowered teams and have democratized real-time operational insights
What is our primary use case?
My main use case for Grafana involves operational dashboarding and data visualization, where I use it as a central pane of glass to pull in metrics from multiple sources like Prometheus, Elasticsearch, and SQL databases to visualize the overall health of our systems in one unified view.
For example, I have built a NOC dashboard that tracks CPU usage, memory usage, and network traffic across all the pods. If a specific service starts failing, the Grafana dashboard highlights the issue in red, allowing my on-call engineers to identify the failing cluster at a glance.
What is most valuable?
Grafana's snapshot and dashboard sharing features are critical for our remote incident response. During production issues, I generate a public snapshot of a dashboard at a specific point and share the URL in our Slack war room so every engineer can see exactly what the metrics looked like when the error occurred. This helps significantly during the process of finding the root cause in those scenarios.
The best features Grafana offers go beyond just pretty charts; it is an integration engine. The fact that I can join data from my SQL database with metrics from Prometheus in the same table is a feature I have not found performed as well elsewhere.
My team uses this feature by comparing two different tables from the databases to show one single view, which Grafana is really helping with. In a visualized way, the charts can be displayed on one dashboard, allowing end users who are not familiar with these technical aspects to extract valuable data from it.
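Conceptually, combining SQL rows with Prometheus metrics in one table is a join on a shared field, which Grafana exposes as a "Join by field" transformation. The Python sketch below only illustrates the idea of an inner join on a key; it is not Grafana's implementation, and the field names (`service`, `owner`, `error_rate`) and rows are hypothetical.

```python
def join_by_field(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on a shared key field.

    Rows without a match on the other side are dropped. If both sides
    define the same non-key field, the right-hand value wins.
    """
    right_index = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_index.get(row[key])
        if match is not None:
            joined.append({**row, **match})
    return joined


# Hypothetical rows: ownership data from SQL, error rates from Prometheus.
sql_rows = [
    {"service": "checkout", "owner": "payments-team"},
    {"service": "search", "owner": "discovery-team"},
]
prometheus_rows = [
    {"service": "checkout", "error_rate": 0.021},
    {"service": "search", "error_rate": 0.003},
    {"service": "cart", "error_rate": 0.0},
]

table = join_by_field(sql_rows, prometheus_rows, "service")
for row in table:
    print(row)
```

The `cart` row is dropped because it has no matching SQL row, which mirrors what an inner join does.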
Grafana has positively impacted our organization by democratizing data within our company. Before using Grafana, only developers could see the system health, but now our product managers and executives have their own high-level dashboards, which has improved cross-departmental transparency and alignment.
What needs improvement?
I find that the alerting UI in Grafana can be complex for new users. While it is very powerful, it takes time to learn the differences between contact points, notification policies, and silences.
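A rough mental model may help here: a contact point is a destination for notifications, a notification policy decides which contact point an alert goes to based on its labels, and a silence suppresses notifications for matching alerts during a time window. The Python sketch below models that under heavy simplification: it uses a flat, first-match list of policies with exact label matching, whereas Grafana's policies form a tree with richer matchers. All class names, labels, and contact point names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    matchers: dict       # label name -> required value
    contact_point: str   # name of the destination to notify


@dataclass
class Silence:
    matchers: dict
    starts_at: int       # window start, seconds
    ends_at: int         # window end (exclusive), seconds


def matches(labels, matchers):
    """True when every matcher label is present with the required value."""
    return all(labels.get(name) == value for name, value in matchers.items())


def route(labels, now, policies, silences, default_contact_point):
    """Return the contact point for an alert, or None if it is silenced."""
    for silence in silences:
        active = silence.starts_at <= now < silence.ends_at
        if active and matches(labels, silence.matchers):
            return None
    for policy in policies:
        if matches(labels, policy.matchers):
            return policy.contact_point  # first matching policy wins
    return default_contact_point


# Hypothetical configuration.
policies = [
    Policy({"team": "kafka"}, "squadcast-kafka"),
    Policy({"severity": "critical"}, "oncall-email"),
]
silences = [Silence({"env": "staging"}, starts_at=1000, ends_at=2000)]

prod_alert = route({"team": "kafka", "env": "prod"}, 1500, policies, silences, "default-email")
silenced_alert = route({"team": "kafka", "env": "staging"}, 1500, policies, silences, "default-email")
fallback_alert = route({"team": "web", "env": "prod"}, 1500, policies, silences, "default-email")
print(prod_alert, silenced_alert, fallback_alert)
```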
The documentation can be improved to provide more detailed descriptions, allowing new users to understand more concepts before they come to knowledge transfer sessions with senior team members.
For how long have I used the solution?
I have been using Grafana for over four years to build real-time observability dashboards and monitor our complex infrastructure and application performance.
What do I think about the stability of the solution?
In my experience, Grafana is extremely stable. Even when handling millions of data points, the visualization layer remains responsive. Since it is decoupled from the actual data storage, the dashboard stays up even if one of our underlying data sources is temporarily slow.
What do I think about the scalability of the solution?
Grafana's scalability is impressive. It is highly scalable and built on a big data architecture capable of ingesting trillions of data points. For our on-premise instance, I use a high availability configuration with a shared database to manage growth.
How are customer service and support?
Customer support for Grafana is solid. The community support is massive, and the technical support team is very helpful with complex PromQL troubleshooting.
Which solution did I use previously and why did I switch?
Before Grafana, I relied solely on the native monitoring console of our cloud providers, like AWS CloudWatch. I switched to Grafana because I needed a way to see all my clouds in a single dashboard rather than switching between multiple tabs.
How was the initial setup?
Grafana's forever free tier for the cloud version allowed the initial setup cost to be zero. As I scaled, I moved to a paid tier based on my number of active series and users, which I found to be very fair compared to other observability vendors.
What was our ROI?
I identified over-provisioned servers and reduced my monthly AWS bill by 15%, which is a significant cost saving. Additionally, I saw a 25% improvement in MTTD after shifting from text-based logs to visualized dashboards.
What's my experience with pricing, setup cost, and licensing?
I purchased my Grafana Cloud subscription through the AWS Marketplace, which simplified my procurement process and allowed me to apply the cost towards my AWS committed spend.
Which other solutions did I evaluate?
I looked at Kibana and Tableau before deciding on Grafana. I chose Grafana because Kibana is mostly limited to Elasticsearch, whereas Grafana can connect to almost any data source. Unlike Tableau, Grafana is specifically optimized for time series data and real-time monitoring.
What other advice do I have?
When Grafana highlights an issue, it triggers email alerts that engineers can rely on. As soon as they receive these alerts, they involve other support teams, and a bridge call is started to begin troubleshooting.
For those looking into using Grafana, I advise starting with the Grafana play site to see what is possible and then using the pre-built dashboards from the Grafana dashboard gallery. There is likely already a perfect dashboard available for free tailored to your tech stack.
Grafana is unique in that I can join data from my SQL database with metrics from Prometheus in the same table, a feature I have not found performed as well elsewhere. My overall rating for this product is 10 out of 10.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)