Unified dashboards have empowered teams and have democratized real-time operational insights
What is our primary use case?
My main use case for Grafana is operational dashboarding and data visualization. I use it as a single pane of glass that pulls in metrics from multiple sources, such as Prometheus, Elasticsearch, and SQL databases, so I can see the overall health of our systems in one unified view.
For example, I have built a NOC dashboard that tracks CPU usage, memory usage, and network traffic across all the pods. If a specific service starts failing, the Grafana dashboard highlights the issue in red, allowing my on-call engineers to identify the failing cluster at a glance.
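To illustrate the "highlight in red" behavior, here is a minimal Python sketch of threshold-based coloring. It is only a conceptual model, not Grafana's implementation: the `Threshold` class, the `color_for` and `failing_pods` helpers, the threshold values, and the pod names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    value: float  # lower bound at which this color starts to apply
    color: str


# Hypothetical thresholds for a CPU-usage panel, in percent.
CPU_THRESHOLDS = [
    Threshold(0.0, "green"),
    Threshold(70.0, "yellow"),
    Threshold(90.0, "red"),
]


def color_for(value, thresholds):
    """Return the color of the highest threshold that `value` meets or exceeds."""
    ordered = sorted(thresholds, key=lambda t: t.value)
    chosen = ordered[0].color  # fall back to the lowest band
    for t in ordered:
        if value >= t.value:
            chosen = t.color
    return chosen


def failing_pods(cpu_by_pod, thresholds=CPU_THRESHOLDS):
    """List the pods whose current reading falls in the red band."""
    return sorted(
        pod for pod, cpu in cpu_by_pod.items()
        if color_for(cpu, thresholds) == "red"
    )


# Hypothetical readings for three pods.
readings = {"api-1": 95.0, "api-2": 40.0, "api-3": 72.5}
red_pods = failing_pods(readings)
```

In a real dashboard the thresholds are set per panel in the UI; the sketch only shows why a single glance at the colors is enough to spot the failing component.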
What is most valuable?
Grafana's snapshot and dashboard sharing features are critical for our remote incident response. During production issues, I generate a public snapshot of a dashboard at a specific point in time and share the URL in our Slack war room so every engineer can see exactly what the metrics looked like when the error occurred. This helps significantly when we are trying to find the root cause.
Grafana's best feature goes beyond pretty charts: it works as an integration engine. I can join data from my SQL database with metrics from Prometheus in the same table, and I have not found another tool that does this as well.
My team uses this feature to combine two different database tables into one single view. Because the resulting charts sit on one dashboard, end users who are not familiar with the technical details can still extract valuable data from it.
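As a rough picture of what joining two sources in one table means, here is a small Python sketch of an inner join on a shared field. In Grafana itself this is done with a transformation in the panel editor rather than in code; the `join_by_field` helper, the field names, and the sample rows below are hypothetical and exist only for illustration.

```python
def join_by_field(left_rows, right_rows, key):
    """Inner-join two lists of dicts on a shared key field."""
    # Index the right-hand rows by the join key for constant-time lookup.
    right_index = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_index.get(row[key])
        if match is not None:
            # Merge the two rows; the key field is identical in both.
            joined.append({**row, **match})
    return joined


# Hypothetical rows as they might come back from a SQL query.
sql_rows = [
    {"service": "checkout", "owner": "payments-team"},
    {"service": "search", "owner": "discovery-team"},
]

# Hypothetical rows as they might come back from a Prometheus query.
prom_rows = [
    {"service": "checkout", "error_rate": 0.02},
    {"service": "search", "error_rate": 0.15},
]

table = join_by_field(sql_rows, prom_rows, "service")
for row in table:
    print(row["service"], row["owner"], row["error_rate"])
```

The point of the sketch is that both sources only need one field in common (here `service`) for the combined table to make sense to a non-technical reader.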
Grafana has positively impacted our organization by democratizing data within our company. Before using Grafana, only developers could see the system health, but now our product managers and executives have their own high-level dashboards, which has improved cross-departmental transparency and alignment.
What needs improvement?
I find that the alerting UI in Grafana can be complex for new users. While it is very powerful, it takes time to learn the differences between contact points, notification policies, and silences.
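A simplified mental model that helped me keep the three concepts apart: contact points are where a notification goes, notification policies decide which contact point an alert is routed to based on its labels, and silences suppress notifications for matching alerts during a time window. The Python sketch below is a deliberately flattened version of that idea (first matching policy wins, with a default fallback). It is not how Grafana implements routing, and every class, function, label, and contact point name in it is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Policy:
    matchers: dict      # label name -> required value
    contact_point: str  # name of the contact point to notify


@dataclass
class Silence:
    matchers: dict
    starts_at: datetime
    ends_at: datetime


def matches(labels, matchers):
    """True if every matcher is satisfied by the alert's labels."""
    return all(labels.get(k) == v for k, v in matchers.items())


def route(labels, policies, silences, now, default="default-email"):
    """Return the contact point for an alert, or None if it is silenced."""
    for s in silences:
        if s.starts_at <= now < s.ends_at and matches(labels, s.matchers):
            return None  # an active silence suppresses the notification
    for p in policies:
        if matches(labels, p.matchers):
            return p.contact_point  # first matching policy wins
    return default


# Hypothetical setup: one policy and one active silence.
now = datetime(2024, 1, 1, 12, 0)
policies = [Policy({"team": "payments"}, "payments-slack")]
silences = [
    Silence({"service": "search"}, now - timedelta(hours=1), now + timedelta(hours=1))
]

routed = route({"team": "payments", "service": "checkout"}, policies, silences, now)
silenced = route({"team": "discovery", "service": "search"}, policies, silences, now)
fallback = route({"team": "infra"}, policies, silences, now)
```

Here `routed` goes to the payments contact point, `silenced` produces no notification, and `fallback` lands on the default contact point.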
The documentation could provide more detailed descriptions so that new users understand more of the concepts before they attend knowledge transfer sessions with senior team members.
For how long have I used the solution?
I have been using Grafana for over four years to build real-time observability dashboards and monitor our complex infrastructure and application performance.
What do I think about the stability of the solution?
In my experience, Grafana is extremely stable. Even when handling millions of data points, the visualization layer remains responsive. Since it is decoupled from the actual data storage, the dashboard stays up even if one of our underlying data sources is temporarily slow.
What do I think about the scalability of the solution?
Grafana's scalability is impressive: it is built on a big data architecture capable of ingesting trillions of data points. For our on-premises instance, I use a high availability configuration with a shared database to manage growth.
How are customer service and support?
Customer support for Grafana is solid. The community support is massive, and the technical support team is very helpful with complex PromQL troubleshooting.
Which solution did I use previously and why did I switch?
Before Grafana, I relied solely on the native monitoring consoles of our cloud providers, such as AWS CloudWatch. I switched to Grafana because I needed a way to see all my clouds in a single dashboard rather than switching between multiple tabs.
How was the initial setup?
Grafana's forever free tier for the cloud version meant the initial setup cost was zero. As I scaled, I moved to a paid tier based on my number of active series and users, which I found to be very fair compared to other observability vendors.
What was our ROI?
I identified over-provisioned servers and reduced my monthly AWS bill by 15%, which is a significant cost saving. Additionally, I saw a 25% improvement in mean time to detect (MTTD) after shifting from text-based logs to visualized dashboards.
What's my experience with pricing, setup cost, and licensing?
I purchased my Grafana Cloud subscription through the AWS Marketplace, which simplified my procurement process and allowed me to apply the cost towards my AWS committed spend.
Which other solutions did I evaluate?
I looked at Kibana and Tableau before deciding on Grafana. I chose Grafana because Kibana is mostly limited to Elasticsearch, whereas Grafana can connect to almost any data source. Unlike Tableau, Grafana is specifically optimized for time series data and real-time monitoring.
What other advice do I have?
When Grafana highlights an issue, it triggers email alerts that engineers can rely on. As soon as they receive these alerts, they involve other support teams and set up a bridge call to start troubleshooting.
For those looking into using Grafana, I advise starting with the Grafana play site to see what is possible and then using the pre-built dashboards from the Grafana dashboard gallery. There is likely already a free dashboard tailored to your tech stack.
What makes Grafana unique for me is the ability to join SQL data with Prometheus metrics in the same table. My overall rating for this product is 10 out of 10.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)