Splunk Observability Cloud
Splunk
External reviews
External reviews are not included in the AWS star rating for the product.
Enables me to supervise the flow and simulate the conditions of the repository across several dashboards
What is our primary use case?
We use Splunk to monitor some devices in the company. We have several cloud groups for monitoring the energy companies in the state. The stack has several devices to monitor if you have a problem. There is a mixture of solutions.
How has it helped my organization?
The solution monitors the system in real-time. We can find the resources and investigate security incidents. Splunk and another solution, AppDynamics, monitor several devices.
We integrate Splunk with a data collection solution, and it plugs in the users to collect data at several points in the network and infrastructure. The data is indexed in Splunk, which can be visualized in different dashboards. Monitoring for fraud is critical for the company because you have to resolve many problems in the infrastructure with federal information in the dashboard.
What is most valuable?
The company has many systems that the customer pays to access. Splunk APM issued via AppDynamics helps find problems in the feed. It reduces the risk of supervising all the devices. I can supervise the flow and simulate the conditions of the repository across several dashboards to show what's happening at the moment.
What needs improvement?
The dashboards are used mainly to visualize information about the infrastructure, but it isn't easy to construct or use the dashboards. While we tried to resolve the issue by calling support, it would be easier if they had an AI co-pilot to identify the problem and help you solve it.
For how long have I used the solution?
I have been using Splunk APM.
What do I think about the scalability of the solution?
Splunk APM isn't easy to scale because you have to follow the steps and implement best practices, which can be a little awkward.
How are customer service and support?
I rate Splunk support 10 out of 10. We had good documentation, and the support team at Splunk has a lot of experience with code and the tool.
How would you rate customer service and support?
Positive
How was the initial setup?
I haven't had any problems deploying Splunk. When I installed Splunk for the first time, I thought the product line was complex because I had to build the solution myself. After working with it for a while, each subsequent deployment has become easier.
What was our ROI?
Splunk APM is a crucial tool because it controls all the systems and solves a lot of problems.
What other advice do I have?
I rate Splunk APM 8.5 out of 10. It's an excellent solution.
Useful to find statistical similarities between different traces
What is our primary use case?
I use the solution in my company primarily for distributed tracing and metrics troubleshooting. I use the tool to troubleshoot incidents and find the root cause of errors when something goes wrong. I also personally use it to get a developer's understanding of what is going on in my application. Sometimes you add a library to your application, or upgrade to a new version of one, and that library also makes calls somewhere. Splunk APM's monitoring can show you that there is a call you are making now that you never used to make in the prior version of the library. You may not notice such cases just by looking at the application code from the outside, but tracing captures everything, down to the lowest-level calls.
How has it helped my organization?
The main benefit I have noticed is reduced time to resolve incidents. Tag Spotlight helps reduce the mean time to resolve because you can pinpoint the root causes of issues by seeing the connections on the graph. It is easier to pinpoint who is responsible for the incident, especially when you have a larger organization. You have teams that run services that need to talk to services from other teams, and rather than having to hand off incident resolution from one team to another, you can often find the cause much more quickly, from the first instance of the problem occurring, with the product in place. The tool helps your sites stay up more of the time, and when a site does go down, the solution helps find the root cause and get it back up as fast as possible.
What is most valuable?
The most valuable feature of the solution, and my favorite, is always Tag Spotlight, especially considering the way they slice and dice all of Splunk APM's traces by span attributes.
I like the tool because it looks at a whole set of traces in aggregate, which means that it can find statistical similarities between different traces. Often, you will find that the traces that show an error also share some other common attribute, which is much more apparent when you look at Tag Spotlight than when you just look at an overall metric. I like Tag Spotlight as it is one of the simplest features to use.
Seeing the connections on the graph in Tag Spotlight can help pinpoint the root causes of issues and so reduce the mean time to resolve, or MTTR. I don't personally have metrics associated with MTTR. I am more of the implementer, making certain that all the data is going in and looking at the debugging part. I am not a part of the set of people who keep track of MTTR.
In our company's case, we have reasonably good metrics related to the mean time to detect. I can't give even a rough number for the mean time to detect, so I don't know for sure. My guess is that we often detect problems reasonably well. Our company figures out that there is some problem, but we just don't know where it is, so I feel that if there is an improvement, it is mostly in the area of mean time to resolve. When it comes to the mean time to detect, I think our existing metrics are probably sufficient, and adding Splunk APM makes it much easier to shorten the resolution time.
The tool has improved our organization's business resilience. In terms of resilience, the tool makes it possible to avoid downtime and to keep things up and running. The faster you get web pages working again, the more people can actually do the things they want to do, such as trade players on their NFL Fantasy teams. In general, it gives a better business result.
What needs improvement?
In our company's case, we have some very high throughput services, so they might be getting 10,000 requests per second. Currently, Splunk APM and Splunk Observability want you to send every single span for every single one of those 10,000 requests per second. That may give you all the data in the back end, but a lot of cost, including CPU, memory, and network, is involved in sending that data to Splunk. My feeling is that it would be nice if there were an easier way to send only a sample of my traces, say 10 percent or 5 percent, and then have Splunk extrapolate on the back end. With 10 percent of traces, the real metrics are roughly ten times the sampled ones, with a plus or minus margin of error. I am okay with the margin of error because I think when you have a high enough request rate, you will see such problems appear even in a smaller sample. The process is like political polling. You don't call all 150,000,000 people in the US and ask them who they are going to vote for. You take a sample of maybe 10,000 and then extrapolate your findings to the rest. I feel the same should be applicable to tracing in Splunk APM.
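The extrapolation the reviewer describes can be sketched roughly as follows. This is only an illustration of the arithmetic, assuming traces are sampled uniformly at random; the function and field names are hypothetical and are not part of any Splunk API.

```python
import math


def extrapolate_from_sample(sampled_requests, sampled_errors, sample_rate, z=1.96):
    """Scale counts from a uniform random sample of traces up to the full traffic.

    All names here are hypothetical; this is not a Splunk API.
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    if sampled_requests <= 0:
        raise ValueError("need at least one sampled request")

    scale = 1 / sample_rate  # with a 10% sample, each kept trace stands for 10
    error_rate = sampled_errors / sampled_requests
    # Normal-approximation margin of error on the error rate (about 95% for z=1.96).
    margin = z * math.sqrt(error_rate * (1 - error_rate) / sampled_requests)

    return {
        "estimated_requests": sampled_requests * scale,
        "estimated_errors": sampled_errors * scale,
        "error_rate": error_rate,
        "margin_of_error": margin,
    }


# One second of traffic at 10,000 requests per second, sampled at 10%.
estimate = extrapolate_from_sample(sampled_requests=1000, sampled_errors=50, sample_rate=0.10)
print(estimate)
```

The margin of error shrinks as the number of sampled requests grows, which is why a high request rate tolerates a low sampling percentage.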
For how long have I used the solution?
I have been using Splunk APM for two years.
What do I think about the stability of the solution?
I really haven't noticed anything going wrong with the tool's stability, and I haven't seen any downtime. I don't know if my company is necessarily measuring the stability part by ourselves, but at least for me, it is a pretty growing and solid solution.
What do I think about the scalability of the solution?
There is one issue with the tool's scalability. In our company, we are fairly big in terms of the number of containers we have, especially since we run very large clusters. When you look at some of the charts, they will say that the limit of 30,000 time series has been reached and no more can be shown, or that the data may not be complete. For me, it is a problem that I would like to see fixed. I have spoken to Splunk's team about it, and they have told me that they do recognize the issue and that other people have also mentioned the same problem. When you hit this limit, the chart shows a warning triangle. After seeing the warning triangle, you need to realize that you cannot trust any of the numbers you see in the chart because it is not a complete data set. I want the tool either to tell me that it can't show me the numbers or to give me some way to show all the numbers in a more summarized view. The tool asks you to filter things down more, but it would be nice if it offered specific suggestions as to what you could filter on to get down to a more reasonable number. In some cases, my company just has to have a number, considering that we have 100,000 containers. If I want to know how many containers are running, the backend currently needs to know how many different time series there are, and then it just says that the 30,000 limit has been reached. When that happens, I don't know whether I have 100,000 containers, 120,000 containers, or 80,000 containers.
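The "more summarized view" the reviewer asks for can be pictured as aggregating before charting, so that the result has one series per group instead of one per container. The sketch below is a generic illustration under that assumption; the record layout and function name are hypothetical and do not reflect how the Splunk backend works.

```python
from collections import Counter


def summarize_containers(containers, group_key="cluster"):
    """Return the total container count and a per-group breakdown.

    Hypothetical helper: collapses one record per container into one
    count per group, so the output size depends on the number of groups.
    """
    per_group = Counter(c[group_key] for c in containers)
    return sum(per_group.values()), dict(per_group)


# Illustrative records only; real inventories would be far larger.
containers = [
    {"id": "a1", "cluster": "east"},
    {"id": "a2", "cluster": "east"},
    {"id": "b1", "cluster": "west"},
]
total, per_cluster = summarize_containers(containers)
print(total, per_cluster)
```

With 100,000 containers spread over a few dozen clusters, a summary like this would produce a few dozen values, well under a 30,000-series display limit.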
How are customer service and support?
The technical support team for the solution is good for our company. My company has a weekly meeting with Splunk's sales support team, and if there are any issues, we bring them up for discussion. I have seen that the technical support team is super responsive.
Which solution did I use previously and why did I switch?
My company has its own internal solution, which was built ten to fifteen years ago, and it has progressed over time, but it is only ever used to support metrics and events, not for tracing. In short, it is not used for Splunk APM-related stuff, which is a big change that makes a difference for us.
How was the initial setup?
The product's deployment phase is good and very easy because it is done with OpenTelemetry for the most part. The product's deployment is not some custom thing where you have to deploy a particular agent that belongs to a particular company and put it on every single host. It is very easy to follow OpenTelemetry's models for the most part. Splunk is a very big contributor to OpenTelemetry, and I value that. It is one of the reasons I recommend using Splunk as a backend provider. In my company, we prefer to be an OpenTelemetry-compliant organization rather than being tied to other vendors.
What was our ROI?
I can't speak about the tool's ROI since I get paid, but I don't have to spend money on the product.
What's my experience with pricing, setup cost, and licensing?
I don't have much insight into the costs and licensing area attached to the tool. I am the engineer and developer, not the person who writes the checks in the company. I know that my company has a Splunk Enterprise Security license which is used for logging and even for Splunk Observability.
What other advice do I have?
I think the tool has the best trace aggregation features compared to what I have seen in different products, and I feel Tag Spotlight is a good example of it. A lot of the other products support tracing, but when you look at them, you see that they show one trace at a time. I can deep dive into one trace at a time, but what I want to find is commonality across the traces. I would give the tool a high grade for all its features. I also rate the tool highly since it offers a very good Kubernetes integration. With all the data in place, you can see which Kubernetes host something is running on, switch between hosts, and see the application metrics and the actual infrastructure metrics. Seeing it all together can be very useful.
I rate the tool a nine out of ten.
Makes troubleshooting easier and helps consolidate all the information in one place
What is our primary use case?
My customers used the solution for application performance, uptime, and networking.
How has it helped my organization?
Splunk Infrastructure Monitoring has helped our customer's organization by making troubleshooting easier. The solution helped them have a centralized place where they could dig in across multiple other tools and consolidate all the information in one place.
What is most valuable?
Splunk Infrastructure Monitoring provided our customers with visibility into their overall infrastructure. They could quickly start identifying where the problems were coming from. If something was going sideways, they could more easily target the specific pathways.
One of our customers was on-premises. The other was a hybrid with on-premises and private cloud.
I was on a team helping them build a brand new tool, which was instantaneous. Another team got it a while ago, and they weren't sure what to do with it. So, we came in and helped them over a six-week engagement. We pivoted them from not feeling like they were getting all that much value to getting good value. It was more of a learning curve situation.
Splunk's unified platform has helped our customers consolidate networking, security, and IT observability tools. I was on the team of a company that was helping build a brand-new monitoring solution. They had probably a dozen separate stand-alone silo tools that could not talk to each other.
Instead of logging on to 12 different places to check each tool individually, Splunk Infrastructure Monitoring helped consolidate everything into a single location for viewing. We didn't get them to the point where they were ready to fully decommission the other systems.
They were going to decommission 12 systems on the six-month game plan. By now, they would have realized the cost savings. It would have been a multimillion-dollar savings for them.
Our customer, with 12 separate systems, was all on-premises. Part of our other customer's footprint was in AWS. It was incredibly easy for our customers to monitor multiple cloud environments using Splunk Infrastructure Monitoring. It was a combination of cloud and on-premises for our customer.
The solution provided them with a single pane of glass where they didn't have to log into multiple places and see everything in a single location. You can develop dashboards that give you cross-platform visibility, which is a huge win.
What needs improvement?
A wide variety of logging makes log onboarding difficult. Over the years, Splunk has done various things to make it easier, so I want to give them props for that. However, the reality is that every vendor has its own logging format. Some vendors have multiple log formats because they change their own products over time.
They have different log formats for different products in their own suites, and no industry standard makes it chaotic. Splunk is probably the best product out there in terms of how they handle it, but it's not perfect yet. They need to keep pushing that cutting edge and trying to improve it. I have no idea how they could do that because they're trying to wrangle chaos, and it's hard.
For how long have I used the solution?
I have been using Splunk Infrastructure Monitoring for two years.
What do I think about the stability of the solution?
I think Splunk Infrastructure Monitoring is a solid product from an infrastructure perspective. I haven't seen any bugs in the tool. Like many things with Splunk, everybody knows there will be patches when there's a core upgrade. However, that's more with Splunk Core and not specifically the Splunk Infrastructure Monitoring part.
What do I think about the scalability of the solution?
The solution's scalability is wonderful. I've worked with customers as small as 25 gigs a day, which is tiny, all the way up to close to a petabyte a day. You have to make sure you scale the tool intelligently, but it's more of a budgetary constraint than a technical one. The solution handles the big ones beautifully if you have the budget to have the needed hardware.
How are customer service and support?
Splunk's technical support has significantly improved in the last year. The support went through a rough patch about a year and a half ago. I had to coerce customers to use it because it was really bad there for a while. Splunk's support has vastly improved recently, and I hope it continues to improve.
Those people who changed the attitude, mindset, and processes need all the accolades because it's so much better than it was. Unfortunately, that does mean that it was really bad at one point.
Splunk's technical support still has some room for improvement in certain areas. Mostly, you can tell the more junior people who just read off of a script and really don't know where to go. I always introduce myself as a consultant to let the support person know that I have already done the basic introductory troubleshooting, and they can skip the first ten pages in their script.
Some frontline people in Splunk's support team are wonderful and clearly have more experience. However, it is still obvious that they occasionally bring in somebody brand new who's a little lost.
I rate the technical support seven and a half to eight out of ten.
How would you rate customer service and support?
Positive
How was the initial setup?
I've worked with Core Splunk as a consultant for seven years and was a customer for seven years before that. So I've seen it all: the good, the bad, the ugly, and everything in between. Usually, the actual building of Splunk is super easy because I've done it so many times. Every customer's environment is unique in terms of how to get the data.
It's more about navigating the local customer's politics and archaic technical debts. Somebody thought that a certain architecture was a good idea ten years ago, but today, that doesn't make any sense whatsoever. Wrangling customer chaos is hard, but the Splunk piece is usually easy.
What other advice do I have?
There's always room for improvement, but Splunk Infrastructure Monitoring is a solid product overall. It definitely helps customers who have a lot of legacy systems that don't work well together.
Overall, I rate the solution an eight out of ten.
Monitors attacks or unauthorized access to the information we want to protect
What is our primary use case?
We use the solution to do a lot of email checking. We also use the tool to monitor different embassies, server IPs and some of the teams.
How has it helped my organization?
Splunk Infrastructure Monitoring has helped our organization tremendously. We have onboarded Splunk for the last four years, and we have 30 to 40 contractors who use Splunk daily. The solution has helped not just a small organization like ours but the whole DOS (Department of State).
What is most valuable?
The solution monitors attacks or unauthorized access to the information we want to protect. There is a dashboard called ISSO that monitors pretty much everything worldwide. We also monitor almost 300 embassies and consulates.
What needs improvement?
The solution's machine learning deployment is hard and should be made user-friendly. Even if a team doesn't have a data scientist, they should be able to use the machine learning toolkit for monitoring purposes. The solution should include more algorithms and SPL commands that people can use.
For how long have I used the solution?
I have been using Splunk Infrastructure Monitoring for four months.
What do I think about the stability of the solution?
We haven’t faced any issues with the solution’s stability.
What do I think about the scalability of the solution?
Splunk Infrastructure Monitoring is highly scalable. We were able to do monitoring and some of the advanced analytics.
How are customer service and support?
I have not contacted Splunk's technical support. We have contacted our account manager for issues, and she's been awesome.
What about the implementation team?
We have different vendors who do deployments, which is different for the government than regular businesses.
What was our ROI?
We have seen a return on investment with Splunk Infrastructure Monitoring regarding the kind of threats we can identify.
What's my experience with pricing, setup cost, and licensing?
Splunk Infrastructure Monitoring is an expensive solution.
What other advice do I have?
Our organization monitors multiple cloud environments using Splunk Infrastructure Monitoring, which works well. This is the only tool we use, and we aren't considering moving or having additional tools.
It is important for our organization that Splunk Infrastructure Monitoring has end-to-end visibility into our cloud-native environments. Our job is critical and very sensitive, so having end-to-end visibility is really helpful.
Splunk Infrastructure Monitoring has helped reduce our mean time to resolve. Looking at the solution's dashboards has helped tremendously because we don't have to look at the individual index or events.
Our business is different from that of a private organization, and Splunk Infrastructure Monitoring has helped improve our organization's business resilience. The machine learning toolkit allows us to do clustering, and we have a couple of deployments on the clusters. That has helped cluster different events based on their critical or security threats.
We have seen time to value using Splunk Infrastructure Monitoring.
Splunk's unified platform has helped consolidate networking, security, and IT observability tools. We don't have to integrate Splunk with a different tool and worry whether those two will integrate. Having everything in one platform helps us create dashboards, alerts, and monitoring tools in one place.
Overall, I rate the solution an eight or nine out of ten.
Helps organizations achieve compliance control and provides all the data to users in a single place
What is our primary use case?
I use the solution in my company for our customers who use the tool for auditing and compliance in the area of DoD/AC. My company's customers have compliance controls, and STIG controls that they have to satisfy for their ETL processes.
How has it helped my organization?
The tool has helped our customer's organization in achieving compliance control. When our customer's organization has an inspection or when the DoD inspects their infrastructure, they can show their auditors that they are compliant. They can show the auditors the dashboards and verify that they are ingesting data from the sources and how all their hosts are being monitored. They can show everything to auditors, check the box, make sure that everything looks green, and then they continue to have authorization to operate.
What is most valuable?
The most valuable piece of Splunk Infrastructure Monitoring for our company's customers revolves around the data for everything. Everything produces data, and all the data can get ingested, whether it is Windows, RHEL, VMware products, Pure Storage products, or a custom product. Configuring data ingestion and performing everything in Splunk Infrastructure Monitoring is possible. At the same time, a lot of the other SIEM tools focus on a specific type of data. The benefit of Splunk Infrastructure Monitoring is that one can see all their data in one place.
What needs improvement?
There is not a lot of support for the tool's on-premises version, especially since everything is on the cloud. In my company, we had a really good demo this morning on Keynote, which touches on the APM part, and it was super cool. There was also a demo on AI assistant, which was super cool. It is hard to increase the options for a particular customer when so much of the stuff is limited to the cloud, and there is so much focus on the cloud part.
For how long have I used the solution?
I have been using Splunk Infrastructure Monitoring for three years for my customer, who has been using it for longer than when I started to use it.
What do I think about the stability of the solution?
The tool's stability is great.
What do I think about the scalability of the solution?
The tool's scalability is great. My company just moved Splunk from VMs to containers for our customers, so I would say that we have put it on Kubernetes on Tanzu, which has been great for them.
How are customer service and support?
Support is an area I have not really reached out to on behalf of our customers. I usually just go to Splunk Answers or rely on my colleagues to get what I need. My company has never opened a support ticket with Splunk for our customers.
Which solution did I use previously and why did I switch?
I don't know what one of my company's customers had used before Splunk Infrastructure Monitoring. They may have used some other solutions, but I have been on contract with them for three years.
What was our ROI?
In terms of ROI, I can say that I have seen a decreased amount of time spent on our company's end validating data ingested from an auditing perspective, especially when we are talking about their authorization to operate. With the tool, it is much quicker to view all your data in one place than it is to go show an auditor 15 different data sources. You can show it all together to the auditor.
What's my experience with pricing, setup cost, and licensing?
Licensing cost is the biggest argument I get from those divesting from Splunk. There are those within our organization who say we are going to go to other tools since Splunk is too expensive. Till now, I have been able to ask others to look at the value Splunk adds to the company, and I have been able to convince them that it is worth it, but that might not always be the case if licensing continues to be an issue, especially if costs continue the way they are and if other solutions offer more competitive pricing for similar results.
What other advice do I have?
The tool is not used to monitor multiple cloud environments.
It is not important for our company that Splunk Infrastructure Monitoring provides end-to-end visibility into your cloud-native environment.
The tool has helped improve our organization's business resilience.
The tool does the job very well. It is easy for me to use, especially as a person trained in Splunk products. With the tool in place, I can get data from Windows or RHEL. I can do things like scripted input on a forwarder. Splunk Universal Forwarders can do so much more than if I just used Syslog, for example, to get data. I can do a lot more with Splunk than just ingesting data via something like Syslog.
I rate the tool an eight out of ten.
Provides good optimization, performance, and visibility
What is our primary use case?
We use the solution to monitor and calculate the number of systems, applications, and DR sites we have. Then, if there is any problem, we can detect the information on which server belongs to which application. This really helps us.
How has it helped my organization?
We have seen 28% to 29% optimization and performance with Splunk Infrastructure Monitoring. You will know the moment you see any anomaly in the system, the server, or the infrastructure. The solution has given us more visibility not only from the infrastructure or server point of view but also from the network perspective.
What is most valuable?
Splunk's GUI and dashboard capacity are the most valuable features of Splunk Infrastructure Monitoring.
Compared to Microsoft Azure, Splunk Infrastructure Monitoring can ingest all the log sources. You can ingest all the data in one single source. Then, it accumulates the data, calculates internally, and gives you the right information you're looking for. Splunk Infrastructure Monitoring is the optimal solution, where you can see everything on one screen.
Our organization monitors multiple cloud environments, including GCP (Google Cloud Platform) and AWS (Amazon Web Services).
We're all completely dependent on Splunk's end-to-end visibility into our cloud-native environment to see everything, including any incident that comes.
Splunk Infrastructure Monitoring has helped drastically improve our mean time to resolve, detect, and investigate.
The solution has helped reduce our mean time to resolve by 28%, which is a huge number. We aim to reduce it by 30% to 37%, but that would definitely require some AI concept and new enterprise security. That's our plan for next year.
Splunk Infrastructure Monitoring has helped improve our organization's business resilience. The moment you receive an incident, you have full visibility. You can go deep into the investigation, do threat hunting, and find the root cause analysis. That's the visibility and performance we look for in enterprise security solutions like Splunk.
Splunk's unified platform helps consolidate networking, security, and IT observability tools. When you have multidimensional solutions and a multi-cloud environment, you have specific applications for finance and patient care. You can see everything consolidated in one solution.
DevOps and GRC compliance solutions come into one solution, and visibility extends. That gives you confidence, and we build trust with the business. Businesses are confident when they're going outside. Because we have full visibility, we provide that trust to the patient and my health care entities that we are safe.
What needs improvement?
Reporting on the utilization of use cases is not available. You have to build it yourself with custom work, because there's no standard, out-of-the-box view where you can see how many of your use cases are actually being utilized. For example, if you have 200 use cases, do you know if you are utilizing all 200 and if they are actually triggering at the right time?
If only 20 use cases out of 200 are actually working, that is 10% utilization. So, I'll focus on that 10% and try to optimize them based on my business requirements rather than focusing on all 200.
For how long have I used the solution?
I have been using Splunk Infrastructure Monitoring for six years.
What do I think about the scalability of the solution?
The solution's scalability is marvelous because we can just add on. We are currently using two TB, and the solution gives us the flexibility to add an extra 500 GB next month.
How are customer service and support?
Sometimes, we face technical difficulties because of the limitations of the connectors. Integrating Splunk with post-relational databases like InterSystems is challenging because such applications or databases are not very much publicly exposed. The technical team faces a lot of challenges when integrating because they need to write some custom connectors to integrate the data.
We have some clinical applications specific to a particular specialty, and you have different applications and databases for that. For that, you need to write custom connectors. Sometimes, the technical team lingers on and passes the time because they're also exploring.
I rate the solution's technical support seven and a half out of ten.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
We previously used a different solution called RSA. We switched to Splunk because RSA was not providing the latest changes and many of the upgrades we were expecting. Also, a lot of functionality we were expecting, like XDR, optimization processes, and connectors, was not available. We used RSA for four and a half years. RSA had performance issues, and a lot of use cases were not met because it was an old solution.
What about the implementation team?
We had a system integrator who initially helped us integrate and deploy the solution, and we still rely on them to develop any new use cases.
What was our ROI?
We have seen a return on investment with the solution. Our KPIs have become smooth. When we have more visibility, our KPIs definitely improve. We can easily measure mean time to detect and mean time to resolve. You will definitely be up to the mark when your incident response capability increases. Our performance has increased. Our IT environment and DevOps team have more visibility and are more transparent now.
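For readers who want to track the same KPIs, here is a minimal sketch of how mean time to detect (MTTD) and mean time to resolve (MTTR) can be derived from incident timestamps. The incident records are hypothetical, and the sketch assumes both metrics are measured from the moment the incident started; some teams measure MTTR from detection instead.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(incidents, start_key, end_key):
    """Average elapsed minutes between two timestamps across incidents."""
    return mean(
        (incident[end_key] - incident[start_key]).total_seconds() / 60
        for incident in incidents
    )


# Hypothetical incident records.
incidents = [
    {
        "started": datetime(2024, 1, 1, 10, 0),
        "detected": datetime(2024, 1, 1, 10, 10),
        "resolved": datetime(2024, 1, 1, 11, 0),
    },
    {
        "started": datetime(2024, 1, 2, 9, 0),
        "detected": datetime(2024, 1, 2, 9, 20),
        "resolved": datetime(2024, 1, 2, 9, 50),
    },
]

mttd = mean_minutes(incidents, "started", "detected")
mttr = mean_minutes(incidents, "started", "resolved")
print(f"MTTD: {mttd:.0f} min, MTTR: {mttr:.0f} min")
```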
What's my experience with pricing, setup cost, and licensing?
The solution's pricing is costly. We're now looking for a cloud version that would have a completely different pricing calculation.
What other advice do I have?
Splunk Infrastructure Monitoring has use case capability, visibility capability, and performance. It also has a vast dashboard capability that no other solution currently provides. There are many solutions in the market, but Splunk stands out. With Splunk Infrastructure Monitoring, you can correlate data and ingest any kind of data with your connectors. Flexibility is another important strength of Splunk.
Overall, I rate the solution an eight out of ten.
Used for troubleshooting purposes and to understand the bottlenecks of applications
What is our primary use case?
We use Splunk APM to understand and know the inner workings of our cloud-based and on-premises applications. We use the solution mainly for troubleshooting purposes and to understand where the bottlenecks and limits are. It's not used for monitoring purposes or sending an alert when the number of calls goes above or below some threshold.
The solution is used more for understanding and knowing where your bottlenecks are. So, it's used more for observability rather than for pure monitoring.
What is most valuable?
The solution's service map feature allows us to have a holistic overview and to see quickly where the issues are. It also allows us to look at every session without considering the sampling policy and see if a transaction contains any errors. We also use it when we instrument real user monitoring on the front end and then follow the sessions into the back-end systems.
What needs improvement?
Splunk APM should include a better correlation between resources and infrastructure monitoring. The solution should define better service level indicators and service level objectives. The solution should also define workloads where you can say an environment is divided up by this area of back end and this area of integration. The solution should define workloads more to be able to see what is the service impact of a problem.
For how long have I used the solution?
I've been using Splunk APM in my current organization for the last 2 years, and I've used it for 4-5 years in total.
What do I think about the stability of the solution?
Splunk APM is a remarkably stable solution. We have only once encountered an outage of the ingestion, which was very nicely explained and taken care of by the Splunk team.
I rate the solution a 9 out of 10 for stability.
What do I think about the scalability of the solution?
Around 50 to 80 users use the solution in our organization. The solution's scalability fits what we are paying for. On the level of what we pay for, we have discovered both the soft limit and the hard limit of our environment. I would say we are abusing the system in terms of how scalable it is. Considering what we are paying for, we are able to use the landscape very well.
We have plans to increase the usage of Splunk APM.
How are customer service and support?
Splunk support itself leaves room for improvement. We have excellent support from the sales team, the sales engineers, the sales contact person, and our customer success manager. They are our contact when we need to escalate any support tickets. Since Splunk support is bound not to touch the consumer's environment, they cannot fix issues for us. It's pretty straightforward to place a support ticket.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
We have previously used AppDynamics, Dynatrace, and New Relic. We see more and more that Splunk APM is the platform for collaboration. New Relic is more isolated, and each account or team has its own part of New Relic. It's very easy to correlate and find the data within an account. Collaborating across teams, their data, and their different accounts is very troublesome.
With Splunk APM, there is no sensitivity in the data. We can share the data and find a way to agree on how to collaborate. If two environments are named differently, we can still work together without affecting each other's operations.
How was the initial setup?
If you're using the more common languages, the initial deployment of Splunk APM is pretty straightforward.
What about the implementation team?
The solution's deployment time depends on the environment. If the team uses cloud-native techniques such as Terraform and Ansible, it's pretty straightforward. The normal engagement is within a couple of weeks. When you assess the tool they need and look at the architecture and so on, the deployment time is very, very minimal. Most of the time spent internally is caused by our own overhead.
What's my experience with pricing, setup cost, and licensing?
We have a very good conversation with our vendor for Splunk APM. We have full transparency regarding the different license and cost models. We have found a way to handle both the normal average load and the high peak that some of our tests can cause. Splunk APM is a very cost-efficient solution. We have also changed the license model from a host-based license model to a more granular way to measure it, such as the number of metric time series or the traces analyzed per minute.
We have quite a firm statement that for every cost caused within Splunk, you need to be able to correlate it to an IT project or a team to see who the biggest cost driver is. As per our current model, we are buying a capacity, and we eventually want to have a pay-as-you-go model. We cannot use that currently because we have renewed our license for only one year.
What other advice do I have?
We are using Splunk Observability Cloud as a SaaS solution, but we have implemented Splunk APM on-premises, hybrid, and in the cloud. We are using it for Azure, AWS, and Google. Initially, the solution's implementation took a couple of months. Now, we are engaging more and more internal consumers on a weekly basis.
We instrument the code and services and send the data into the Splunk Observability Cloud. This helps us understand who is talking to whom, where there are latencies, and where most of the transaction errors occur between the services.
Most of the time, we do verification tests in production to see if we can scale up the number of transactions to a system and handle the number of transactions a business wants us to handle at a certain service level. It's both for verification and to understand where the slowness occurs and how it is replicated throughout the different services.
We can have full fidelity and totality of the information in the tool, and we don't need to think about the big variations of values. We can assess and see all the data. Without the solution's trace search and analytics feature, you will be completely blind. It's critical as it is about visibility and understanding your service.
Splunk APM offers end-to-end visibility across our environment because we use it to coexist with both synthetic monitoring and real user monitoring. What we miss today is the correlation to logs. We can connect to Splunk Cloud, but we are missing the role-based access control to the logs so that each user can see their related logs.
Visualizing and troubleshooting our cloud-native environment with Splunk APM is easy. A lot of out-of-the-box knowledge is available that is preset for looking at certain standard data sets. That's not only for APM but also for the available pre-built dashboards.
We are able to use distributed tracing with Splunk APM, and it is for the totality of our landscape. A lot of different teams can coexist and work with the same type of data and easily correlate with other systems' data. So, it's a platform for us to collaborate and explore together.
We use Splunk APM Trace Analyzer to better understand where the errors originate and the root cause of the errors. We use it to understand whether we are looking at the symptom or the real root cause. We identify which services have the problem and understand what is caused by code errors.
The Splunk Observability Cloud as a platform has improved over time. It allows us to use profiling together with Splunk Distribution of OpenTelemetry Collector, which provides a lot of insights into our applications and metadata. The tool is now a part of our natural workbench of different tools, and it's being used within the organization as part of the process. It is the tool that we use to troubleshoot and understand.
Our organization's telemetry data is interesting, not only from an IT operational perspective but also to understand how the tools are being used and how they have been providing value for the business. It is a multifaceted view of the data we have, and it is being generated and collected by the solution.
Splunk APM has helped reduce our mean time to resolve. Something that used to take 2-3 weeks to troubleshoot is now done within hours. Splunk APM has also freed up resources when we troubleshoot. If we spend a lot of time troubleshooting something and can't find the problem, we cannot close the ticket saying there's no resolution. With Splunk APM, we can now know for sure where the problem is rather than just ignoring it.
Splunk APM has saved our organization around 25% to 30% time. It's a little bit about moving away from firefighting toward being preventive and planning more for the future. That's why we are using it for performance. The solution allows us to help and support the organization during peak hours and be preventative with the bottlenecks rather than identify them afterward.
Around 5-10 people were involved in the solution's initial deployment. The solution is not integrated into the developer's IDE environment, and it's not tightly connected to our existing DevOps tools. We have both subdomains and teams structured. Normally, they also compartmentalize the environment, and we use the solution in different environments.
Splunk APM requires some life cycle management, which is natural. In general, once you have set it up, you don't need to put much effort into it. I would recommend Splunk APM to other users. That is mainly due to how you collaborate with the data and do not isolate it. There is a huge advantage with Splunk. We are currently using Splunk, Sentry, and New Relic, and part of our tool strategy is to move to Splunk.
As a consumer, you need to consider whether you are going to rely on OpenTelemetry as part of your standard observability framework. If that is the case, you should go for Splunk because Splunk is built on OpenTelemetry principles.
Compared to other tools using proprietary agents and proprietary techniques, you may have more insights into some implementations. However, you will have a tighter vendor lock-in, and you won't have the portability of the back end. If you rely on OpenTelemetry, then Splunk is the tool for you.
Overall, I rate the solution a 9 out of 10.
Provides end-to-end visibility, simplifies application performance monitoring, and makes monitoring logs easy
What is our primary use case?
We use Splunk APM for performance testing.
How has it helped my organization?
Splunk offers end-to-end visibility across our environment.
Splunk APM simplifies application performance monitoring. It also provides insights into data quality, including data security, integration, ingestion, and versioning of trace logs. We can directly inject data for monitoring purposes, trace the data flow, and monitor metric values.
Splunk can ingest data in any format, allowing us to easily monitor logs and identify blockages through timestamps, which saves us time.
What is most valuable?
The most valuable feature is dashboard creation. This allows us to easily monitor everything by setting the data we want to see. For example, imagine we're working on a project within the application. There might be different environments, such as development, testing, and production environments. In the production environment, we can use dashboards to monitor customer activity, like account creation or other user data. This gives us a clear view of how transactions are performing and user response times. This dashboard creation feature is one of the most beneficial aspects of Splunk that I've used in a long time. While Splunk offers many features, including integration with various DevOps tools, its core strength lies in data monitoring and collection.
What needs improvement?
Splunk's functionality could be improved by adding database connectors for other platforms like AWS and Azure.
For how long have I used the solution?
I have been using Splunk APM for one year.
Which solution did I use previously and why did I switch?
We previously used a legacy application for monitoring, and when it was decommissioned, we adopted Splunk APM.
What's my experience with pricing, setup cost, and licensing?
Splunk offers a 14-day free trial. After that, we have to pay, but the cost is reasonable.
What other advice do I have?
I would rate Splunk APM eight out of ten.
Splunk APM requires minimal maintenance and can be monitored by a team of three.
Provides great visibility, analysis, and data telemetry
What is our primary use case?
We use Splunk APM to monitor the performance of our applications.
How has it helped my organization?
Splunk APM offers end-to-end visibility across our entire environment. We need to control how many types of metrics are ingested by Splunk APM from all incoming requests. While we allow some metrics to be collected, Splunk APM provides the ability to track each request from its starting point to its endpoint at every stage.
Splunk APM trace analyzer allows us to analyze a request by providing its trace ID. This trace ID gives us a detailed breakdown of how the request entered the system, how many services it interacted with along the way, and its overall path within the system. We can also identify any errors that occurred during the request's processing and track any slowness or latency issues. This information is very helpful for troubleshooting performance problems in our application.
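To make the trace breakdown described above concrete, here is a minimal sketch of what can be read out of the spans that share one trace ID: the path the request took, the services it touched, the spans that errored, and the span with the most time of its own. The span fields and values are hypothetical, not Splunk's data model, and the "own time" calculation assumes child spans run one after another without overlapping.

```python
from collections import defaultdict


def summarize_trace(spans):
    """Summarize one trace given as a flat list of span dictionaries."""
    children = defaultdict(list)
    root = None
    for span in spans:
        if span["parent_id"] is None:
            root = span
        else:
            children[span["parent_id"]].append(span)
    if root is None:
        raise ValueError("trace has no root span")

    # Depth-first walk from the root, visiting children in start order.
    path = []

    def walk(span, depth):
        path.append((depth, span["service"], span["operation"]))
        for child in sorted(children[span["span_id"]], key=lambda s: s["start_ms"]):
            walk(child, depth + 1)

    walk(root, 0)

    def own_time(span):
        # Time spent in the span itself, excluding its direct children.
        return span["duration_ms"] - sum(
            child["duration_ms"] for child in children[span["span_id"]]
        )

    return {
        "path": path,
        "services": sorted({span["service"] for span in spans}),
        "error_spans": [span["span_id"] for span in spans if span["error"]],
        "slowest_span": max(spans, key=own_time)["span_id"],
    }


# Hypothetical spans for a single trace ID.
spans = [
    {"span_id": "a", "parent_id": None, "service": "frontend",
     "operation": "GET /checkout", "start_ms": 0, "duration_ms": 500, "error": False},
    {"span_id": "b", "parent_id": "a", "service": "cart",
     "operation": "get_cart", "start_ms": 10, "duration_ms": 100, "error": False},
    {"span_id": "c", "parent_id": "a", "service": "payment",
     "operation": "charge", "start_ms": 120, "duration_ms": 350, "error": True},
    {"span_id": "d", "parent_id": "c", "service": "payment-db",
     "operation": "INSERT", "start_ms": 130, "duration_ms": 50, "error": False},
]

summary = summarize_trace(spans)
for depth, service, operation in summary["path"]:
    print("  " * depth + f"{service}: {operation}")
```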
Splunk APM telemetry data has been incredibly valuable. While we faced challenges with Splunk Enterprise, such as the lack of a trace analyzer, Splunk APM's user interface is modern and highly flexible. The wide range of data it provides has significantly improved our incident response times, allowing us to quickly create alerts and adhere to the infrastructure as code principle. Splunk APM also proves beneficial during load testing, contributing to a positive impact on our overall infrastructure performance analysis.
Splunk APM helps us reduce our mean time to resolution. With its fast and accurate alerting system, we can quickly identify the exact location of issues. This pinpoint accuracy streamlines the investigation process, leading to faster root-cause analysis.
Splunk APM has helped us save significant time. We're now spending less time resolving production incidents and analyzing performance data. This focus on Splunk APM allows us to dedicate more time to other areas.
What is most valuable?
Detectors are a powerful feature. They generate SignalFlow code, the analytics language used by Splunk Observability Cloud. For example, if we select five conditions, the detector can automatically generate the SignalFlow code for them. This code can then be directly integrated into our Terraform modules, streamlining the creation of detectors using Terraform. This is particularly helpful because our infrastructure adheres to a well-defined practice, and detectors help automate this process.
APM dashboards are another valuable tool. They provide more comprehensive information than traditional spotlights. One particularly useful feature is the breakdown of a trace ID. This breakdown allows us to see the entire journey of a request, including where it originated, any slowdowns it encountered, and any issues it faced. This level of detail enables us to track down the root cause of performance problems for every request.
What needs improvement?
We currently lack log analysis capabilities in Splunk APM. Implementing this functionality would be very beneficial. With log analysis, we could eliminate our dependence on Splunk Enterprise and rely solely on APM. The user interface design of APM seems intuitive, which would likely simplify setting up log-level alerts. Currently, all log-level alerting is done through Splunk Enterprise, while infrastructure-level alerting has already transitioned to Splunk APM.
The Splunk APM documentation on the official Splunk website could benefit from additional resources. Specifically, including more examples of adapter creation and management using real-world use cases would be helpful. During our setup process, we found the documentation lacked specific implementation details. While some general information was available on public platforms like Google and YouTube, it wasn't comprehensive. This suggests that others using Splunk APM in the future might face similar challenges due to the limited information available on social media. It's important to remember that many users rely on social media for setup guidance these days.
For how long have I used the solution?
I have been using Splunk APM for 1.5 years.
What do I think about the stability of the solution?
While Splunk APM occasionally experiences slowdowns, it recovers on its own. Fortunately, these haven't resulted in major incidents because most maintenance is scheduled for weekends, with ample notice provided in advance. We have not experienced any data loss during these slowdowns.
How are customer service and support?
Splunk APM customer support is helpful. They promptly acknowledge requests and provide regular updates. They've been able to fulfill all our information requests so far. However, Splunk APM is a constantly evolving product. This means there are some limitations due to ongoing industry advancements. They are actively working on incorporating customer feedback, such as the CV request. Overall, the customer support is excellent, but the desired features may not all be available yet.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
Previously, we used Grafana, but we faced challenges that led us to switch to Splunk APM. Since then, Splunk has become our primary tool for data analysis. In our experience, Splunk offers several advantages over Grafana. Setting up and using Splunk is significantly easier than Grafana. Splunk provides a user-friendly interface that allows anyone to start working immediately, while Grafana's setup can be more complex. Splunk also boasts superior reliability. Its architecture utilizes a master-slave node structure, with the ability to cluster for redundancy. This ensures that if a node goes down, another available node automatically takes over, minimizing downtime. Ultimately, our decision to switch to Splunk was driven by several factors: user-friendliness, a wider range of features, cost-effectiveness, and its established reputation. Splunk is a globally recognized and widely used tool, which suggests a higher level of trust and support from the industry.
We use Splunk Enterprise and Splunk APM. Splunk APM offers a comprehensive view of various application elements. We primarily migrated to APM to gain application-level metrics. This includes latency issues, which are delays in processing user requests. Splunk APM generates a unique trace ID for each user request. This allows us to track the request from the user to our servers and identify any delays or errors that occur along the way.
Additionally, Splunk APM utilizes detectors to create alerts based on specific metrics. We've implemented alerts for CPU and memory usage, common issues in our Kubernetes infrastructure. We can also track container restarts within the cluster and pinpoint the causes. Another crucial area for us is subscription latency. Splunk APM allows us to monitor this metric and identify any performance bottlenecks. This capability was absent in Splunk Enterprise, necessitating the switch to APM. Furthermore, Splunk APM enables us to track application status codes, such as 404 errors.
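The kind of rule behind a CPU or memory alert can be illustrated with a short sketch. This is plain Python, not SignalFlow and not how Splunk implements detectors. It assumes a simple rule where an alert fires once a metric stays above a threshold for a given number of consecutive samples, and the sample values are hypothetical.

```python
def detect_sustained(values, threshold, min_consecutive):
    """Return the sample indexes at which a sustained-breach alert fires.

    An alert fires once per breach, at the sample where the value has been
    above the threshold for `min_consecutive` samples in a row.
    """
    alerts = []
    run = 0
    for index, value in enumerate(values):
        run = run + 1 if value > threshold else 0
        if run == min_consecutive:
            alerts.append(index)
    return alerts


# Hypothetical CPU utilization samples (percent), one per interval.
cpu_percent = [40, 95, 97, 60, 91, 92, 93, 94, 50]

alerts = detect_sustained(cpu_percent, threshold=90, min_consecutive=3)
print(f"Alert fired at sample indexes: {alerts}")
```

Requiring several consecutive samples trades a slightly later alert for fewer false alarms from short spikes, such as the two-sample spike at the start of the series.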
Splunk APM facilitates the creation of informative dashboards using collected metrics. Additionally, the Metrics Explorer tool allows us to investigate specific metrics of interest and generate alerts or customized spotlights.
Spotlights are tailored visualizations that track metrics for critical application areas. They can trigger alerts based on unexpected changes, such as a sudden increase in error codes over a set timeframe. This provides a more proactive approach to identifying potential issues compared to traditional detector-based alerts.
Splunk APM empowers us to effectively monitor various metrics during load testing. This includes analyzing memory usage across ten to eleven metrics, tracking container restarts during flow testing, and verifying the functionality of auto scaling mechanisms. The comprehensive visualization capabilities of Splunk APM surpass those of Splunk Enterprise, making it ideal for analyzing large sets of metrics and graphs.
We're currently exploring the integration of an OpenTelemetry agent with Splunk APM. This will enable us to collect and transmit a wider range of data, including application metrics, latency metrics, and basic infrastructure metrics such as CPU, memory, etc.
How was the initial setup?
During the initial Splunk deployment, I found that most information available on social media platforms catered to enterprise deployments. Fortunately, many of our new hires had prior Splunk experience, which eased the initial learning curve. Splunk's widespread adoption across industries also meant there was a general familiarity with the tool among the team. Additionally, the comprehensive documentation proved helpful. Overall, the initial rollout went smoothly, though there were some challenges that we were able to resolve.
The Splunk deployment was done on multiple environments. We started with development and then deployed to a staging environment, which sits between development and production. As expected, the development deployment took the longest. The total time for the entire deployment, including my cloud setup, was 2 to 3 weeks. It's important to note that this timeframe isn't solely dependent on Splunk implementation. Other factors can influence the timeline, such as network requests, firewall changes, and coordination with IT teams for license purchases. While the development deployment took longer, promoting Splunk to the staging and production environments was significantly faster. It only took 1 week for each environment.
What about the implementation team?
Our cloud deployment didn't require a consultant, but we used one for our on-premises enterprise deployment, which was a bit more complex.
What other advice do I have?
I would rate Splunk APM 9 out of 10.
The maintenance required is minimal because the cluster deployment helps ensure there is always 1 node working.
Improves operational efficiency and integrates very well
What is our primary use case?
We mostly work with developers. They run some pipelines, and they use Splunk as a platform to identify the errors instead of debugging the logs themselves to understand what the issue is. This is one side of the business. On the other side of the business, we use the Splunk database for frozen buckets where we archive the data.
We can easily integrate it with other tools for monitoring our entire IT data infrastructure. I also handle AppDynamics. We have integrated Splunk and AppDynamics. With one click, we can understand what the actual issue is. It brings down the time to resolve. We have had some good experiences.
How has it helped my organization?
It improves our operational efficiency every day. In my previous company, we had integrated it with ServiceNow. For defined alerting conditions, it could directly open up a ticket for the right team. We did not have to look into a thousand cases to understand a problem.
In terms of integrations, most of the plugins are already available. If a plugin is not available, even then it is pretty easy to integrate. There are multiple ways to integrate. You can use the REST API and just forward the data. It can be easily integrated.
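As an illustration of forwarding data over HTTP, here is a minimal sketch that builds a request for Splunk's HTTP Event Collector (HEC). The review does not say which interface was used, so HEC is an assumption here. The host name, token, and event fields are placeholders, and the request is only constructed, not sent, so the sketch runs without a Splunk instance.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype="_json"):
    """Build (but do not send) an HTTP Event Collector request."""
    payload = json.dumps({"event": event, "sourcetype": sourcetype}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/services/collector/event",
        data=payload,
        headers={
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder host, token, and event for illustration only.
request = build_hec_request(
    "https://splunk.example.com:8088",
    "00000000-0000-0000-0000-000000000000",
    {"pipeline": "build-42", "status": "failed"},
)
print(request.get_method(), request.full_url)
# Sending would be: urllib.request.urlopen(request)
```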
It makes it easy to have end-to-end visibility in the cloud environment. There are multiple types of devices in an environment. You might have AWS, Microsoft Azure, or something else. It operates beautifully. It is easy to integrate. This is the best part.
I am in the banking industry. It helps to keep track of how well our application is performing when somebody tries to do a transaction. There are multiple pieces to it, and we keep track of everything. We have our own business dashboard that the top-tier leaders can look into. All the visibility is there because of it.
What is most valuable?
I find the monitoring console very helpful. With one click, I can see how we are performing, and at the same time, I can see what data is flowing.
What needs improvement?
The clustering part of indexes can be more refined.
They could reduce the price a bit for long-time customers. We recently had a scenario where we were in discussions to see if there was any flexibility from Splunk's side.
For how long have I used the solution?
I have been using this solution for the past two years. I have also used it in my previous company.
What do I think about the scalability of the solution?
It is pretty scalable. I would rate it a nine out of ten for scalability.
Which solution did I use previously and why did I switch?
I have worked with Kibana and Logstash, but they are not comparable to this solution.
What's my experience with pricing, setup cost, and licensing?
It is expensive.
What other advice do I have?
Overall, I would rate it an eight out of ten.