Observability has exposed tracing gaps and inconsistent metrics while still mapping complex services
What is our primary use case?
In my organization, we have 150 to 160 applications per year built on different frameworks, including .NET, Java, and Python. They are hosted on different types of infrastructure, such as Windows and Linux servers, ECS, and EKS. For deployments, we integrated Splunk Observability Cloud; previously, we used Prometheus and Grafana. My organization considered Splunk Observability Cloud a premium observability offering, so we switched from our previous solution.
We use the tracing feature in Splunk Observability Cloud.
What is most valuable?
The service map and APM are the features I value most in Splunk Observability Cloud. The interface is fully UI-based, so I can see the complete service map, observe latency, and view the complete metadata for a particular service, including database services. The service map gives a 3D view of the complete application architecture.
In terms of improving digital resilience within the organization, Splunk Observability Cloud was quite similar to other third-party tools. The main distinction is improved security. We use SignalFlow queries for our alerts and dashboards. It provides efficiency with better security than other third-party tools, but in day-to-day usage, it is quite similar to Prometheus and Grafana.
What needs improvement?
One disadvantage is that the service map shows inaccurate latency, which relates to the reliability of the data pulled from the AWS cloud or on-premises servers. We saw latency discrepancies because the Splunk APM app shows different data than Prometheus and Grafana. We used premium support and on-call support with Splunk, and they were helpful in troubleshooting, but they could not provide a solution.
Performance with Splunk Observability Cloud is acceptable to me, but the amount of customization required from users is problematic. I had to build the complete alerting and monitoring systems, which required a lot of changes, and the way this is designed is not optimal. With Prometheus, we can import and export dashboards, but here we get error dialog boxes. We raised this on technical support calls, but they were unable to solve it, so I do not understand why export and import are not working.
My overall impression of the no-sample tracing feature in Splunk Observability Cloud, specifically in terms of eliminating blind spots in data collection, is that it needs improvement, because the data is not as complete as with other third-party tools. We see disruptions in dashboards and charts when trying to correlate data. The mechanism behaves differently when configured manually than when configured with a SignalFlow query, and the two should be equivalent. We are unable to replicate our manual configuration with the automated method, which is the issue.
The SignalFlow query feature in Splunk Observability Cloud needs improvement because it should behave the same as the manual configuration. When we configure queries manually and then configure the same queries via SignalFlow, they give different outputs. We raised this with on-call support, but they were unable to address it, which suggests there is a bug in the queries.
For enhancements, I would like to see improvements in the OpenTelemetry (OTel) agents, OTel collectors, and other components of Splunk Observability Cloud. The guidelines in the official documentation frequently do not work. We have to deploy things our own way; the documentation works in only about 60 percent of cases, and the remaining 40 percent are problematic.
For how long have I used the solution?
I have used Splunk Observability Cloud for about one to one and a half years.
What do I think about the stability of the solution?
I experienced downtime with Splunk Observability Cloud once. We were unable to access it for nearly a day, and it took a long time to resolve. Other tools usually do not take as long, and I do not understand why Splunk took so long. The vendor should resolve such issues in a much shorter timeframe. Downtime raises concerns about how we measure and receive alerts, since everything needs to be in place.
What do I think about the scalability of the solution?
In terms of lowering the cost of unplanned digital downtime, I found that many users report Splunk Observability Cloud is expensive, especially at large scale, which can be a concern for organizations with tight budgets. At large scale it works well, but for start-ups and some mid-sized companies, it is expensive and they cannot afford it, especially as the cost grows with data volume and retention needs.
How are customer service and support?
For support, Splunk Observability Cloud offers bi-weekly calls and on-call support, plus premium support. They need to lower the price of premium on-call support, because as employees we need credits to get premium support, and our organization does not have many credits. That is where it fell short, but the bi-weekly calls and on-call support were acceptable. Out of five, I would give three for normal support and four for premium call support.
Which solution did I use previously and why did I switch?
Previously, we used Prometheus and Grafana.
Which other solutions did I evaluate?
Compared to other observability platforms I have worked with, I find no key differences in either pros or cons. The integration process is the same across the board, and there is no real differentiator; everything is similar in terms of custom dashboards and APM features.
What other advice do I have?
We do not use the synthetic monitoring and AI-related features in Splunk Observability Cloud, which I think cover front-end monitoring. We use only the core AWS monitoring, the service map, and APM.
Regarding the ability to enrich data with custom metrics in Splunk Observability Cloud, we configured our breach conditions based on application performance only. Every application has different SLAs and SLOs, and for each application we configured baseline-based alerts, which have been triggered. We correlate these alerts with multiple factors, such as Java memory leaks or garbage collection, and we generate custom metrics with alerts for notification purposes, using Microsoft Teams and Outlook webhook URLs.
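The baseline alerts described above are configured in Splunk's own detector UI or in SignalFlow, not in Python. Purely as an illustration of the general idea, the following sketch flags points that drift more than `k` standard deviations from a trailing-window mean. The function name `baseline_breaches` and the `window` and `k` parameters are hypothetical and do not reflect Splunk's actual baseline algorithm; the notification step (for example, posting to a Microsoft Teams webhook) is left out.

```python
from statistics import mean, pstdev


def baseline_breaches(values, window=5, k=3.0):
    """Return the indices of points that deviate from the mean of the
    preceding `window` points by more than `k` population standard deviations.

    Hypothetical helper: a generic baseline check, not Splunk's algorithm.
    """
    breaches = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # trailing window, excluding point i
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # A perfectly flat history: treat any change at all as a breach.
            if values[i] != mu:
                breaches.append(i)
        elif abs(values[i] - mu) > k * sigma:
            breaches.append(i)
    return breaches
```

For example, a heap-usage series of `[50, 52, 51, 53, 52, 51, 90]` flags only the final spike at index 6. A wider window smooths out noise at the cost of reacting more slowly to genuine shifts.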
The out-of-the-box customizable dashboards provided by Splunk Observability Cloud are effective for showcasing IT performance to business leaders. When we correlate different charts, there are many x-axis and y-axis options, and we can correlate them with other metrics. There are formulas for calculating ratios and averages, which was a nice experience with many options. We use the f(x) functions for maximum, minimum, and average, which are quite good.
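To make the chart formulas concrete, here is a minimal Python sketch of the arithmetic behind a ratio chart with maximum, minimum, and average summaries. It assumes two time-aligned series of equal length (for example, error counts and request counts); `ratio_series` and `summarize` are hypothetical helpers, not Splunk's f(x) implementation.

```python
def ratio_series(numerator, denominator):
    """Point-by-point ratio of two time-aligned series.

    Returns None for intervals where the denominator is zero, so that an
    empty interval is not mistaken for a ratio of zero.
    """
    if len(numerator) != len(denominator):
        raise ValueError("series must be time-aligned and of equal length")
    return [n / d if d else None for n, d in zip(numerator, denominator)]


def summarize(series):
    """Maximum, minimum, and average over the defined (non-None) points."""
    points = [p for p in series if p is not None]
    if not points:
        return None
    return {
        "max": max(points),
        "min": min(points),
        "avg": sum(points) / len(points),
    }
```

With errors `[1, 2, 0, 3]` and requests `[100, 100, 0, 150]`, the ratio series is `[0.01, 0.02, None, 0.02]`, and the empty interval is skipped when computing the summary rather than pulling the average down.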
On a scale of one to ten, where ten is the best, I would rate different aspects of Splunk Observability Cloud differently. I would rate the UI an eight, but the configuration a three to four, because the configuration and integration aspects are not good at all. Overall, I would rate Splunk Observability Cloud a three out of ten.
Which deployment model are you using for this solution?
On-premises
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Monitoring has improved operational visibility and supports fast, customizable alert dashboards
What is our primary use case?
I work for a managed service provider, so I have different clients that require help in assessing various tools. I work with Splunk, ScienceLogic, and Nagios most frequently because I have small clients as well.
We use Splunk Observability Cloud for some customers. The dashboards are good, and everything is nice, but unfortunately, it does not offer long-term log storage, so you need a data lake to store the logs.
What is most valuable?
Splunk Observability Cloud has helped improve my operational performance and my customers' operational performance because we use alerting, so we find out when things are not working.
I think it is very effective at improving digital resilience within my customers' environments.
The dashboards in Splunk Observability Cloud are amazing if you configure them correctly, and they are quite fast as well.
That is a very good feature of Splunk Observability Cloud because it helps us and gives us more trust in the alerts.
What needs improvement?
Configuring it requires in-depth knowledge of the product. It should be more plug-and-play, like ScienceLogic, which works with whatever it finds: you can use PowerShell or scripts that you write yourself. Splunk is more old-style; it uses agents, and you have to deploy them.
The out-of-the-box customizable dashboards provided by Splunk are okay, but I usually have to create new dashboards because every user wants to see something different. The out-of-the-box dashboards help you get started faster, but in the end, I have to redo them.
I would like to see agentless deployment and better integration with ticketing systems such as ServiceNow, the latter being the biggest priority.
We utilize the ability to enrich data with custom metrics in Splunk Observability Cloud to create tickets in ServiceNow. It is integrated with ServiceNow, but we enrich the tickets by putting the logs in the tickets and things of that nature, so it helps us. However, even that is a mixed approach. From Splunk Observability Cloud, you cannot put the logs directly in the tickets. Instead, it will create a ticket and send you an email with the logs. That integration could be improved.
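One possible workaround, offered only as a hedged sketch, is a small relay that receives the alert and creates the ServiceNow incident itself, with a log excerpt in the description, through the ServiceNow Table API (`POST /api/now/table/incident`). The function name `build_incident_request`, the use of basic authentication, the choice of the `short_description` and `description` fields, and the source of the log lines are all assumptions; many instances use OAuth instead, and fetching the logs is out of scope. The function only builds the request and does not send it.

```python
import base64
import json
import urllib.request


def build_incident_request(instance, user, password, alert_name, log_lines, max_lines=50):
    """Build (but do not send) a ServiceNow Table API request that creates an
    incident whose description carries the most recent log lines.

    Hypothetical helper: field choices and basic auth are assumptions.
    """
    excerpt = "\n".join(log_lines[-max_lines:])  # keep only the newest lines
    body = {
        "short_description": f"Splunk Observability alert: {alert_name}",
        "description": f"Recent log lines:\n{excerpt}",
    }
    url = f"https://{instance}.service-now.com/api/now/table/incident"
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )
```

Sending it would be a matter of calling `urllib.request.urlopen(req)`; keeping construction separate makes the payload easy to inspect before it reaches a live instance.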
For how long have I used the solution?
I have been working with Splunk Observability Cloud for about two years.
What do I think about the stability of the solution?
I cannot speak to lowering the cost of unplanned digital downtime using Splunk Observability Cloud because the client receives the bills. However, it reduces system downtime. It improves visibility when you make changes, apply patches, or make emergency changes, so you can see whether they were applied correctly and whether the servers are still down.
What do I think about the scalability of the solution?
For a new deployment at a medium-sized client with about 2,000 users, computers, or servers, it will take about six months just to install and configure.
How are customer service and support?
The technical support is very good with Splunk.
Which solution did I use previously and why did I switch?
I worked with ScienceLogic before actually working with Splunk.
How was the initial setup?
There are not complexities with the installation of Splunk Observability Cloud, but with the configuration of alerts and everything because Splunk has its own language in the background. You need to know Splunk in order to configure everything that you want.
What about the implementation team?
I do not spend any time on it personally because I have a team that does it; there are 27 people on my team.
What was our ROI?
It does provide some return on investment. It is beneficial in terms of finance to use it.
What's my experience with pricing, setup cost, and licensing?
I think Splunk Observability Cloud is still reasonably priced. Dynatrace, by comparison, is much more expensive.
Which other solutions did I evaluate?
I am familiar with the Dynatrace Operator, but I do not actually work with it. I am just looking into the differences in tooling and what would benefit my clients more.
What other advice do I have?
You need to know Splunk in order to configure everything that you want.
I would rate this product an 8 overall.
Which deployment model are you using for this solution?
Public Cloud
Splunk Observability Cloud - A good product in the making
What do you like best about the product?
Splunk Observability Cloud, often referred to as O11y, is a good product for metrics observability. The part I really like is the integration with Splunk Cloud for logging needs, so all the key performance indicators for application metrics and logs are visible in a single pane.
What do you dislike about the product?
Splunk O11y Cloud is seriously lacking in graph and query customization. The visual customization options are very limited, which makes it hard to build a dashboard with exactly what a user or consumer needs.
What problems is the product solving and how is that benefiting you?
It is well ahead of its competitors in handling huge amounts of data and delivering it to the end user. The main issue any observability tool faces is storing and fetching huge volumes of data in a reasonable amount of time. Given Splunk's expertise, fetching metrics and logs over long time intervals is, at the very least, faster than on any other platform currently on the market.
Synthetic monitoring increases availability and reduces downtime
What is our primary use case?
My main use cases for Splunk Observability Cloud include retail analytics.
What is most valuable?
The feature I appreciate most about Splunk Observability Cloud is Synthetic Monitoring. It has benefited my organization by increasing availability and reducing downtime, providing assurance that makes you feel good and ultimately improving well-being.
The out-of-the-box customizable dashboards are very effective. At the same time, we also use Splunk Cloud to enhance them, because Splunk Cloud offers a better dashboarding experience.
Our teams have used the ability to enrich data with custom metrics in Splunk Observability Cloud. We've done a lot of that with event management, linking it into IT as well, to tie systems together. We do this through the integration between Observability Cloud and ITSI for event management.
What needs improvement?
Splunk Observability Cloud could be improved with tighter integration with Splunk Cloud, because at the moment they're two separate products. They're making great progress with what they call unified access; tighter integration is always a good thing.
For how long have I used the solution?
I have been using Splunk Observability Cloud for three years.
What do I think about the stability of the solution?
I would assess the stability and reliability of Splunk Observability Cloud as generally good. We have experienced the odd bug, but nothing too serious, and Splunk has been quite good at resolving issues; it's just routine stuff.
What do I think about the scalability of the solution?
Splunk Observability Cloud scales incredibly well with the growing needs of my organization. The only consequence is that the more we use it, the more expensive it gets; there have been no issues.
How are customer service and support?
I would evaluate customer service and technical support as fantastic; nobody is better.
How was the initial setup?
During the deployment, we only had some challenges when we switched on unified access. However, they were just teething problems.
What was our ROI?
I have seen a return on investment with Splunk Observability Cloud, as it has helped us avert issues that might otherwise have resulted in downtime. The first time it averts a problem, that is a clear return on investment. The second time, nobody notices, which makes measuring business value a challenge.
What other advice do I have?
I would advise other organizations considering this solution to give careful attention to the use cases they have and how they plan to proceed in terms of their roadmap over the next two to three years, as there are alternatives. Having an idea of where you want to go will help you make a better-informed decision.
Additionally, it's a good idea to have a customer reference call to learn from someone else's experience and avoid pitfalls.
On a scale of one to ten, I would rate Splunk Observability Cloud overall a good eight; once it's all integrated neatly together, it will be up in the high numbers.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)
Improves incident detection and performance monitoring but UI updates are needed
What is our primary use case?
My primary use cases for Splunk Observability Cloud include creating dashboards for metrics, detecting incidents, and ensuring overall observability of applications, service connections, and integrations, along with reporting and Slack integrations.
What is most valuable?
Visualizing the service integrations lets me understand how data flows, which is one of the features I appreciate most about Splunk Observability Cloud.
With metrics collection, I can proactively find incidents, work on major issues as they happen, and predict these issues.
With alerting and detectors, we can notify the on-call engineers so they can take over responsibility for the service.
With metrics and dashboards, we have a clear view of how the system is performing. Splunk Observability Cloud has helped improve my operational performance through detection, trace analysis, and alerting.
50% of our metrics on Splunk Observability Cloud are custom metrics, so we rely heavily on them. The out-of-the-box customizable dashboards provided by Splunk Observability Cloud are excellent, especially the AWS ones, such as memory cache, and the Kubernetes dashboards, which fully cover our Kubernetes needs.
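For teams relying this heavily on custom metrics, a minimal sketch of reporting one custom gauge datapoint from Python might look like the following. It assumes the Splunk Observability Cloud (formerly SignalFx) datapoint ingest endpoint `https://ingest.<realm>.signalfx.com/v2/datapoint`, with an access token in the `X-SF-Token` header; the realm, token, metric name, and dimensions are placeholders, and `build_datapoint_request` is a hypothetical helper. In many deployments, custom metrics flow through the OpenTelemetry Collector instead, so verify the endpoint against Splunk's documentation for your realm.

```python
import json
import urllib.request


def build_datapoint_request(realm, token, metric, value, dimensions=None):
    """Build (but do not send) a request reporting one custom gauge datapoint.

    Assumes the SignalFx-style /v2/datapoint ingest endpoint; realm, token,
    metric, and dimensions are caller-supplied placeholders.
    """
    payload = {
        "gauge": [
            {"metric": metric, "value": value, "dimensions": dimensions or {}}
        ]
    }
    return urllib.request.Request(
        f"https://ingest.{realm}.signalfx.com/v2/datapoint",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-SF-Token": token},
        method="POST",
    )
```

Dimensions such as `{"service": "checkout"}` are what make a custom metric filterable and groupable on dashboards; sending the request would be done with `urllib.request.urlopen(req)`.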
What needs improvement?
The UI of Splunk Observability Cloud is one of the major issues. It's old, has been around for more than 10 years, and was inherited from applications acquired from other companies. It's time to rethink how the UI works with the AI modules and integrations, making it softer and cleaner.
Splunk Observability Cloud is comprehensive in functionality and features, so user education needs to be more practical. Users need guidance on the specific views or pages they're working in.
For how long have I used the solution?
I have been using Splunk Observability Cloud for five years.
What do I think about the stability of the solution?
I assess Splunk Observability Cloud as very stable and reliable, since it is built on Cisco's networking and infrastructure. That's not a concern for me; I rely on it completely. I've experienced downtime, crashes, and performance issues with Splunk Observability Cloud, as with any other solution, but compared with other monitoring solutions, Splunk has been excellent in terms of availability. When issues occurred, they were communicated through maintenance windows, and I am 100% satisfied with how they handle this.
What do I think about the scalability of the solution?
Splunk Observability Cloud scales very well with the growing needs of my organization. We didn't have scaling issues as the application evolved. I expanded our usage of Splunk Observability Cloud when the company opened new coverage areas in different countries, and adding those metrics or new indexes to Splunk was not much of a scaling issue.
How are customer service and support?
I've had only great experiences working with people at Splunk on customer service and technical support for Splunk Observability Cloud.
Which solution did I use previously and why did I switch?
Prior to adopting Splunk Observability Cloud, I was using Datadog, which would accomplish 70% of what Splunk does currently.
How was the initial setup?
There have been so many challenges that I can't single one out. There is always a challenge in deploying open-source components, like the OpenTelemetry modules, that don't depend on Splunk. Integration is the challenge we face most; deploying Splunk itself wasn't a big deal.
What was our ROI?
I see ROI with Splunk Observability Cloud. My company is heavily dedicated to analytics, so the Splunk deal is significant. I cannot imagine how the business would run without it currently.
What's my experience with pricing, setup cost, and licensing?
I had low pricing and setup costs for Splunk Observability Cloud, and overall, my company has received a good deal on all the features we have. We just have to learn how to explore them further.
Which other solutions did I evaluate?
Not directly. However, because of the visibility Splunk gives us into the main aspects of scaling, we created custom dashboards that proactively detect changes in scale, so we can prepare for those changes. We don't have to spend time testing new capacity when it has already been defined and anticipated in Splunk.
What other advice do I have?
My advice to other organizations considering Splunk Observability Cloud is to watch your budget. As for the impact of not having Splunk Observability Cloud, there would be a monetary impact with other solutions, and the business would lose system resiliency. The impact would be catastrophic.
Splunk has to think about how to redesign Observability Cloud. It came from SignalFx and AppDynamics into Splunk Cloud. It's a merger of different platforms into one, and the merger is progressing more slowly than I expected.
On a scale of one to ten, I rate Splunk Observability Cloud overall as a seven.
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)