Real-time monitoring has improved performance tracking and has simplified analyzing complex metrics
What is our primary use case?
I work in data analytics with experience in monitoring systems and working with large-scale data. I have used Splunk Observability Cloud in the context of real-time monitoring and performance tracking.
Splunk Observability Cloud works well alongside Splunk Enterprise for logs and integrates with cloud platforms and monitoring tools. It is often used together with other observability solutions. Key metrics such as latency, error rate, and throughput are easy to see. I can also build dashboards for real-time visibility.
We use Splunk Observability Cloud to track latency metrics and identify where slowdowns are happening. We have visualized response time trends and quickly detected performance degradation. We have also used it for infrastructure monitoring. Over the past six months, we have been monitoring metrics such as CPU usage and memory. If there is unusual usage, we identify it quickly using this tool and take action before it impacts our performance.
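To illustrate the kind of "unusual usage" check described above, here is a minimal sketch of a trailing-window baseline comparison. It is not how Splunk Observability Cloud detects anomalies internally; the function name, window size, threshold, and sample values are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples that deviate strongly from the trailing window.

    A sample is flagged when it lies more than `threshold` standard
    deviations away from the mean of the previous `window` samples.
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Flat baseline: any change at all counts as a deviation.
            if samples[i] != mu:
                anomalies.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical CPU utilization readings (percent), one per interval.
cpu_percent = [41, 43, 42, 44, 42, 43, 95, 44, 42]
print(find_anomalies(cpu_percent))
```

With these made-up readings, only the spike to 95 percent is flagged. In practice the window and threshold would need tuning per metric.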
What is most valuable?
Splunk Observability Cloud has optimized our solutions and helped us understand the metrics. The AI-powered guidance in Splunk Observability Cloud helps us identify patterns and anomalies in system performance data. Instead of manually going through a large volume of metrics, it highlights unusual behavior and potential issues automatically. This makes it easier to detect problems early and understand where to focus, especially in complex systems.
Log analysis and dashboards are also valuable. Log monitoring and dashboards have improved since we started using Splunk, and I consider Splunk Observability Cloud the best tool for both. It also feels more focused on real-time metrics and performance tracking than some traditional log-based tools.
What needs improvement?
The learning curve for understanding all the features is steep and should be made easier. Cost is another drawback: Splunk Observability Cloud is very costly, and the cost can keep increasing.
If alerts are not configured properly, the tool can generate too many of them. This is a major drawback that could be improved.
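One common way to cut alert noise is to require a condition to hold for several consecutive samples before firing. The sketch below shows that idea in plain Python under assumed values; `should_alert`, the threshold, and the sample data are hypothetical and do not represent the product's detector configuration.

```python
def should_alert(samples, threshold, min_consecutive=3):
    """Return True only if `threshold` is exceeded for a sustained run.

    A single spike resets nothing by itself; the streak counter goes back
    to zero as soon as one sample falls at or below the threshold.
    """
    streak = 0
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak >= min_consecutive:
            return True
    return False


# Isolated spikes do not fire; a sustained breach does.
print(should_alert([50, 95, 60, 96, 55], threshold=90))
print(should_alert([50, 91, 92, 93], threshold=90))
```

The trade-off is detection delay: a larger `min_consecutive` suppresses more noise but reports real incidents later.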
The prices are quite high. Because we are Splunk partners, pricing is handled by another team in my organization. However, for other companies and small startups, the prices are too high for them to use Splunk Observability Cloud. Price is a concern.
For how long have I used the solution?
I have been working with Splunk Observability Cloud for the past six to eight months.
What do I think about the scalability of the solution?
We have expanded our team and usage. We are scaling up right now from ten people to twenty-five or thirty. I started with basic monitoring and expanded my own usage over time by exploring things like custom dashboards. As a team, we have gradually expanded beyond setting up dashboards and alerts.
How are customer service and support?
For customer service, I would rate them eight out of ten because whenever we raise a support case, they are always available for us.
Implementing Splunk real user monitoring took time, even though our engineers worked very hard on it. Support should have more engineers dedicated specifically to this area.
Which solution did I use previously and why did I switch?
We used different products such as Palo Alto and Cribl before moving to Splunk Observability Cloud. After we became Splunk partners, we shifted to Splunk Observability Cloud.
What was our ROI?
The information is confidential and I cannot share specific details. However, I can say that for roughly fifty to sixty percent of our work, identifying performance metrics and performance issues has become easier with Splunk Observability Cloud.
It has also reduced our costs by thirty to forty percent, because the tools we used before were more expensive. We obtained Splunk Observability Cloud as Splunk partners.
What other advice do I have?
My overall impression of using Splunk Observability Cloud is that it is a strong tool for real-time monitoring. It does take some time to get fully comfortable with all the features. We have not explored everything right now, but in the future, we are looking forward to using more features.
A part of the implementation has been handled by my other team. I have explored using custom metrics to enrich observability data, mainly by adding application layer or business-related metrics alongside system metrics. I have used custom metrics in a limited way to add more context to monitoring, such as tracking application-specific metrics alongside system data.
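As a rough illustration of adding an application-level metric alongside system metrics, the sketch below builds a gauge datapoint payload. The payload shape follows my understanding of the JSON datapoint format accepted by the Splunk Observability ingest API, but treat the field names as assumptions and verify them against the official documentation. The metric name, dimensions, and helper function are hypothetical, and nothing is sent over the network.

```python
import json
import time


def build_gauge_datapoint(metric, value, dimensions, timestamp_ms=None):
    """Build a JSON-serializable gauge payload for one custom metric.

    Assumed shape: {"gauge": [{"metric", "value", "dimensions", "timestamp"}]}.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {
        "gauge": [
            {
                "metric": metric,
                "value": value,
                "dimensions": dict(dimensions),
                "timestamp": timestamp_ms,
            }
        ]
    }


# Hypothetical business metric tagged with the service and environment.
payload = build_gauge_datapoint(
    "checkout.orders_in_flight",
    17,
    {"service": "checkout", "env": "prod"},
    timestamp_ms=1700000000000,
)
print(json.dumps(payload, indent=2))
```

Dimensions such as `service` and `env` are what let a custom metric be charted next to CPU or memory for the same service.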
Dashboard customization in Splunk Observability Cloud is quite flexible. We can view metrics in different types of visualization and organize them in a way that makes sense for monitoring. It allows us to build dashboards tailored to specific use cases. This makes it easier to monitor system performance and quickly identify issues without going through unnecessary data.
The integration in real user monitoring from Splunk Observability Cloud is actually better than from some other tools. If you are looking for the best SIM tool, then Splunk Observability Cloud is for you. If you have funds and capability for the cost, then Splunk Observability Cloud is definitely the best tool you can use.
I have given this review an overall rating of nine out of ten.
Which deployment model are you using for this solution?
Public Cloud
Powerful Real-Time Insights, But Pricing Can Spiral Without Log Filtering
What do you like best about the product?
Real-time visibility and powerful SPL queries for rapid root cause analysis.
What do you dislike about the product?
High and unpredictable costs: the pricing (whether based on data ingestion volume or "Workload" compute units) scales rapidly. If you don't aggressively filter logs before they hit the cloud, your bill can spiral quickly.
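Filtering logs before ingestion can be as simple as dropping low-value levels at the source. The sketch below shows the idea in plain Python; the record layout, the `level` field, and the set of dropped levels are assumptions, and a real deployment would more likely do this in a collector or forwarder pipeline.

```python
# Levels assumed to be low value for this illustration.
DROP_LEVELS = {"DEBUG", "TRACE"}


def filter_logs(records, drop_levels=DROP_LEVELS):
    """Yield only the records worth forwarding.

    Records without a `level` field are kept, so nothing is lost silently.
    """
    for record in records:
        if str(record.get("level", "")).upper() not in drop_levels:
            yield record


# Hypothetical log records.
records = [
    {"level": "debug", "msg": "cache lookup"},
    {"level": "ERROR", "msg": "payment failed"},
    {"level": "INFO", "msg": "request served"},
    {"msg": "no level set"},
]
kept = list(filter_logs(records))
print(f"forwarding {len(kept)} of {len(records)} records")
```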
What problems is the product solving and how is that benefiting you?
Splunk IT Cloud (comprising Splunk Cloud Platform and the Observability suite) is designed to solve the problem of "Data Sprawl"—the overwhelming amount of fragmented information generated by modern, multi-cloud environments.
Observability has exposed tracing gaps and inconsistent metrics while still mapping complex services
What is our primary use case?
My organization handles 150 to 160 applications per year, built on different frameworks including .NET, Java, and Python. They are hosted on different platforms such as Windows and Linux servers, ECS, and EKS. We integrated Splunk Observability Cloud into our deployments. Previously, we used Prometheus and Grafana. My organization considered Splunk Observability Cloud the premium option for observability, so it switched from our previous solution.
We use the tracing feature in Splunk Observability Cloud.
What is most valuable?
The service map and APM are the features I value most in Splunk Observability Cloud. The interface is entirely UI based, so I can see the complete service map, observe latency, and view complete metadata for a particular service or any database-related service. The service map enables a 3D view of the complete application architecture.
In terms of improving digital resilience within the organization, Splunk Observability Cloud was quite similar to other third-party tools. The main distinction is somewhat improved security. We use SignalFlow queries for alerts and dashboards. I can say it provides efficiency with improved security compared to other third-party tools, but in terms of usage, it is quite similar to Prometheus and Grafana.
What needs improvement?
One disadvantage is that the service map shows incorrect latency information, which raises concerns about the reliability of the data pulled from AWS or on-premises servers. We noticed this because the Splunk APM app shows different latency data than Prometheus and Grafana. We used premium support and on-call support from Splunk, and they were helpful in troubleshooting, but they did not arrive at a solution.
Performance with Splunk Observability Cloud is acceptable to me, but the amount of rework required from users is problematic. I had to rebuild the complete alerting and monitoring system. The way this is designed is not optimal. With Prometheus and Grafana, we can import and export dashboards, but here we get error dialog boxes. We raised this on technical support calls, but they were unable to solve it, so I do not understand why export and import are not functioning.
My overall impression of the no-sample tracing feature in Splunk Observability Cloud, specifically for eliminating blind spots in data collection, is that it needs improvement because the data is not adequate compared to other third-party tools. We see inconsistencies in the dashboards and charts when trying to correlate data. A chart configured manually behaves differently from the same chart defined with a SignalFlow query, and the two should match. We are unable to replicate manual configurations with the automated method, which is the issue.
The SignalFlow query feature in Splunk Observability Cloud needs improvement because it should produce the same results as manual configuration. When we configure a query manually and then configure the same query via SignalFlow, the outputs differ. We raised this with on-call support, but they were unable to address it, which suggests there is a bug in the queries that needs to be fixed.
For enhancements, I would like to see improvements in the OTEL agents, OTEL collectors, and other features in Splunk Observability Cloud. The guidelines in the official documentation often do not work, and we have to deploy in our own way. The documentation works in only about 60 percent of cases, leaving the remaining 40 percent as problematic and needing improvement.
For how long have I used the solution?
I have used Splunk Observability Cloud for nearly one to one and a half years.
What do I think about the stability of the solution?
I experienced downtime with Splunk Observability Cloud once. We were unable to access it for nearly one day, which is a long time to resolve an outage. Other tools do not normally take as long, and I do not understand why Splunk did. The vendor should address such issues in a much shorter timeframe. When downtime occurs, it raises concerns about how we measure and receive alerts, as everything needs to be in place.
What do I think about the scalability of the solution?
In terms of lowering the cost of unplanned digital downtime using Splunk Observability Cloud, I found that many users report it is expensive, especially at a large scale, which can be a concern for organizations with tight budgets. At a large scale it is good, but for start-ups and some medium-range companies, it is expensive and they cannot afford it, especially as the cost increases with respect to data volume and retention needs.
How are customer service and support?
There are three kinds of support for Splunk Observability Cloud: bi-weekly calls, on-call support, and premium support. They need to decrease the price of premium on-call support because, as employees, we require credits to get premium support, and our organization does not have many credits. That is where it fell short, but the bi-weekly calls and on-call support were acceptable. Out of five, I would give three for normal support and four for premium call support.
Which solution did I use previously and why did I switch?
Previously, we used Prometheus and Grafana.
Which other solutions did I evaluate?
Comparing Splunk Observability Cloud to other observability platforms I have worked with, I find no key differences in either pros or cons. The integration process is the same across the board, and I do not see a real differentiator, as everything is similar in terms of custom dashboards and APM features.
What other advice do I have?
We have not used the synthetic monitoring, which I understand to be front-end monitoring, or the AI-related features in Splunk Observability Cloud. We only use the main AWS monitoring, the service map, and APM.
Regarding the ability to enrich data with custom metrics in Splunk Observability Cloud, we configured our breach conditions based on application performance only. Every application has different SLAs and SLOs, so for each application we configured alerts that trigger against baselines. We correlate this with multiple factors, such as Java memory leaks or garbage collection, and we generate custom metrics with alerts whose notifications go out through a Microsoft Teams webhook URL and Outlook.
The out-of-the-box customizable dashboards provided by Splunk Observability Cloud are effective in showcasing IT performance to business leaders. A nice point is that when we correlate different charts, we get many x-axis and y-axis options and can correlate with other metrics. There are formulas for calculating ratios and averages, and having so many options was a good experience. We use the f(x) functions for maximum, minimum, and average, which are quite good.
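The per-application alerting described above can be sketched as a lookup of each application's own limit followed by a notification payload. Everything here is hypothetical: the application names, the SLO values, and the helper functions. The `{"text": ...}` body is the simple message format I understand Teams incoming webhooks to accept; confirm it against the Microsoft documentation before relying on it. No request is actually sent.

```python
def find_breaches(observed_p95_ms, slo_p95_ms):
    """Return (app, observed, limit) for each app whose p95 latency exceeds its SLO."""
    breaches = []
    for app, limit in sorted(slo_p95_ms.items()):
        observed = observed_p95_ms.get(app)
        if observed is not None and observed > limit:
            breaches.append((app, observed, limit))
    return breaches


def build_webhook_message(breaches):
    """Build a plain-text webhook body, one line per breached application."""
    lines = [
        f"{app}: p95 {observed} ms exceeds SLO of {limit} ms"
        for app, observed, limit in breaches
    ]
    return {"text": "\n".join(lines)}


# Hypothetical per-application SLOs and current measurements.
slos = {"orders-api": 300, "billing-api": 500}
observed = {"orders-api": 420, "billing-api": 480}

breaches = find_breaches(observed, slos)
print(build_webhook_message(breaches))
```

Keeping the limits in one mapping makes it easy to give every application its own SLO without duplicating alert logic.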
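The ratio and min/max/average formulas mentioned above amount to simple arithmetic across aligned time series. The sketch below reproduces that arithmetic in plain Python on made-up data; it is not SignalFlow and does not use any Splunk API, and the function names are hypothetical.

```python
def error_ratio(errors, requests):
    """Per-interval error ratio; intervals with zero requests yield None."""
    return [e / r if r else None for e, r in zip(errors, requests)]


def summarize(values):
    """Return min, max, and mean of the non-missing values, or None if empty."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return {
        "min": min(present),
        "max": max(present),
        "mean": sum(present) / len(present),
    }


# Hypothetical counts per interval.
errors = [1, 0, 5]
requests = [100, 0, 50]

ratios = error_ratio(errors, requests)
print(ratios)
print(summarize(ratios))
```

Treating empty intervals as missing, rather than as zero, keeps idle periods from dragging the average down.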
On a scale of one to ten, where ten is the best, I would rate Splunk Observability Cloud differently depending on the aspect. For the UI, I would rate it an eight, but for configuration, I would rate it three to four, as the configuration and integration aspects are not good at all. Overall, I would rate Splunk Observability Cloud a three out of ten.
Which deployment model are you using for this solution?
On-premises
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)