We didn't have a very good story for helping our product engineers visualize their applications post-deployment after migrating from Singularity to Kubernetes. Singularity has a nice user interface where they could see their deployments and scale, balance, restart, et cetera. When we moved to Kubernetes, things were more hidden. All we had was the Kubernetes Dashboard, which is verbose and clunky. We were looking for something to help our engineers visualize their deployments and troubleshoot them.
Komodor
External reviews
External reviews are not included in the AWS star rating for the product.
k9s replaced and surpassed by Komodor
Using Komodor in day-to-day work
It can be integrated without much deep learning if you know k8s fundamentals
It is a powerful tool for analyzing problems in real time
I use it very frequently in my work for doing operations, seeing the cluster state, and analyzing specific issues
And Komodor has excellent and quick customer support, from my experience.
It's the best k8s tool I have encountered
Sometimes when analyzing issues in pods I get logs that are not related to the real problem
Komodor
Using Komodor to access multiple k8s production clusters on a daily basis
The visualization is very clear and points directly to current issues.
It provides all the functionality needed when managing multiple clusters.
A must-have platform for K8s environments!
Komodor has become the main platform our team uses on a daily basis whenever a deployment fails or errors arise, and for overall cluster health monitoring.
Quick troubleshooting
Developer onboarding and day-to-day operations
Visualizing things across deployments and seeing the events—what changed—really helps our engineers with troubleshooting
What is our primary use case?
How has it helped my organization?
Our engineers currently deploy through our CI/CD system, and it's fire-and-forget. We report back the status of their deployment to them, "success" or "fail." So visualizing things across deployments, and seeing the events—what changed—really helps our engineers with troubleshooting, especially when we have an incident.
The biggest benefit is that our triage time drops dramatically when people choose to use Komodor. If there is a failed deployment, hopping into it to see the logs from the pods is very quick and easy. When a deployment is crashing and people don't understand why, Komodor is the quickest way to comprehend all the different pieces of the puzzle in one place and figure things out.
But it's not just that. Komodor does its best to give you what it thinks is the root issue. I'm not sure when this was added, but when there is a failed deployment, it even inspects your log stream and narrows it down to the line in your logs where it thinks the root cause of the failure is. I was pretty impressed with that. For example, where there were a lot of Java exceptions being spat out, it zeroed in on the right one.
The visibility into nodes has helped save time when troubleshooting in cases such as disk pressure, memory pressure, and seeing why pods have been evicted. The way that node events are overlaid onto the event screen really helps correlate what is happening and why it's happening.
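As a rough illustration of the node signals mentioned above, here is a minimal Python sketch that reads the pressure conditions from a Kubernetes Node object (the shape returned by the Kubernetes API as JSON). The condition types `DiskPressure`, `MemoryPressure`, and `PIDPressure` are standard Kubernetes node conditions; the function name and the sample node are hypothetical, and this is not how Komodor itself gathers the data.

```python
from typing import Any

# Node condition types that signal resource pressure in the Kubernetes Node API.
PRESSURE_TYPES = {"DiskPressure", "MemoryPressure", "PIDPressure"}


def active_pressure_conditions(node: dict[str, Any]) -> list[str]:
    """Return the pressure condition types whose status is "True" for a node."""
    conditions = node.get("status", {}).get("conditions", [])
    return sorted(
        c["type"]
        for c in conditions
        if c.get("type") in PRESSURE_TYPES and c.get("status") == "True"
    )


# Hypothetical node object, trimmed to the fields used here.
example_node = {
    "metadata": {"name": "node-a"},
    "status": {
        "conditions": [
            {"type": "Ready", "status": "True"},
            {"type": "DiskPressure", "status": "True"},
            {"type": "MemoryPressure", "status": "False"},
        ]
    },
}

print(active_pressure_conditions(example_node))
```

A node reporting any of these conditions is a likely place to look when pods have been evicted.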
In terms of freeing up staff time, I was one of the main people involved in setting Komodor up. And now that it is up and running, I've been able to work on other things. The questions we get in our support channel have decreased quite a bit when it comes to Komodor.
For our product engineers who are doing a deployment, and everything is falling along the "happy path," meaning the deployment is successful and there are no issues with it running, Komodor is not part of that engineer's life. Where it really shines and helps is when there is a problem. It eases the burden of troubleshooting.
What is most valuable?
The event timeline has been super helpful, enabling us to overlay node events in the same timeline as deployment events. For example, if your deployment goes down for some unknown reason, the event timeline overlays if there was a node that may have had disk pressure or some other issue. That helps an engineer very quickly troubleshoot without having to do too much digging.
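The idea of overlaying node events on deployment events can be sketched in a few lines of Python. This is only an illustration under assumed data: the `Event` type, the `overlay` and `node_events_near` helpers, and the sample events are all hypothetical and say nothing about how Komodor builds its timeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    time: datetime
    source: str   # "deployment" or "node" (hypothetical labels)
    summary: str


def overlay(*streams: list[Event]) -> list[Event]:
    """Merge several event streams into one timeline ordered by time."""
    return sorted((e for s in streams for e in s), key=lambda e: e.time)


def node_events_near(timeline, failure, window=timedelta(minutes=10)):
    """Node events that happened within `window` before a failure event."""
    return [
        e for e in timeline
        if e.source == "node" and timedelta(0) <= failure.time - e.time <= window
    ]


# Hypothetical sample data: a rollout, a node problem, then evictions.
t0 = datetime(2024, 1, 1, 12, 0)
deploy_events = [
    Event(t0, "deployment", "rollout started"),
    Event(t0 + timedelta(minutes=7), "deployment", "pods evicted"),
]
node_events = [
    Event(t0 + timedelta(minutes=5), "node", "DiskPressure on node-a"),
]

timeline = overlay(deploy_events, node_events)
suspects = node_events_near(timeline, failure=deploy_events[1])
for e in suspects:
    print(e.time.isoformat(), e.summary)
```

Putting both streams on one time axis is what lets an engineer see that the node problem came shortly before the deployment failure.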
Komodor also recently put in an investigation or triaging window. When we first started using it, you really had to dig to find out why things failed, or you had to set up availability monitors to give you some of that information. But now, straight from the event timeline, you can click on a little red icon that indicates that something failed and it gives you a best-effort summary of what it thinks failed, just by looking at the different statuses. They try to trickle the most important things up to the top, while still allowing you to scroll down and dig deeper. That's a very nice feature.
There is also the ability to show differences between deployments. We annotate all of our deployments with some of the Komodor labeling schemes: specifically, the Git repository and Git hash. That way, you can click on any deployment in Komodor and it shows a quick summary of what changed. Sometimes there is only a change in the application code because the base image didn't change. Sometimes, neither one has changed and it's just the deployment descriptor. But we had one problem not too long ago where the engineers and our support team spent a couple of hours troubleshooting. Komodor highlighted a change in the base image, which ended up being a breaking change.
Being able to quickly see those changes is very helpful. That's a valuable feature and shows that having historical information like that is super important.
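To make the "what changed" summary concrete, here is a small Python sketch that compares the metadata recorded for two consecutive deployments. The field names (`git_repo`, `git_hash`, `base_image`) and the values are hypothetical placeholders, not Komodor's actual annotation keys, and the comparison is a simplification of whatever Komodor does internally.

```python
def summarize_changes(previous: dict[str, str], current: dict[str, str]) -> dict[str, tuple]:
    """Return {field: (old, new)} for every field that differs between two deployments."""
    keys = sorted(previous.keys() | current.keys())
    return {
        k: (previous.get(k), current.get(k))
        for k in keys
        if previous.get(k) != current.get(k)
    }


# Hypothetical metadata captured at deploy time for two consecutive deployments.
previous = {
    "git_repo": "example/app",
    "git_hash": "a1b2c3d",
    "base_image": "example-base:1.4",
}
current = {
    "git_repo": "example/app",
    "git_hash": "e4f5a6b",
    "base_image": "example-base:2.0",
}

changes = summarize_changes(previous, current)
for field, (old, new) in changes.items():
    print(f"{field}: {old} -> {new}")
```

In this made-up case the summary would flag the base image alongside the application code, which is the kind of change that is easy to miss when only the application commit is in view.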
Another useful feature is the logging. You lose pod logs in the Kubernetes dashboard or if you are using kubectl on the command line. You can't see logs of pods that have been deleted, but Komodor retains them for a small amount of time. If you do a deployment and it gets rolled back and the pods are gone, Komodor still grabs some of that, and you can use that for troubleshooting. That is also a nice feature.
What needs improvement?
Having an agent that's almost God-like makes us a little nervous. I would love to see the actions not be performed on behalf of the user, but rather as the user, perhaps by aligning SSO groups, as one approach to that. That is one reason that we're not using its actions, because we're a little nervous about granting all that access to Komodor. In terms of just read-only and presenting this data, you have to give it view access to practically everything to get value from it.
Also, we get a steady stream of new users coming into Komodor, so I know people are using it. But one thing we don't have visibility into, which I would love to have, is metrics, such as user logins and usage. It's really hard to know what people are doing when I don't have any metrics on that directly.
If somebody higher up wants to know how Komodor is being used over time, I can't query that. I know that Komodor has those metrics, but from my understanding, they're in another tool outside of the Komodor application. It would be nice if they found a way to funnel that back in. Currently, Komodor doesn't really focus on the managers and decision-makers when it comes to the engineering of the tool. They rely on quarterly meetings where they present usage but it would be so much nicer if that was built into the tool.
They're also working on the audit log area and there is still a long way to go to make that look nicer and more feature-rich.
For how long have I used the solution?
We went live with Komodor about a year ago.
What do I think about the stability of the solution?
It definitely has lots of moments where it's lagging, even for what seems like a very small window of time and a small number of events. It can be a head-scratcher as to why it is taking so long.
Ever since we've been using it, there have been little issues here and there, and to their credit, they've been fixing them. It used to be that if we left a tab open, the whole tab would die after an hour or so. They may have been trying to load too much data in there, and that may have led them to a design decision where it now takes a long time to query all the data because they don't want to overload the tabs. I'm not sure about that, but it does lag intermittently and not very predictably. Sometimes it's screaming fast, and sometimes you wonder why it's taking so long. We've had multiple people comment on that.
What do I think about the scalability of the solution?
The source of the lagging we see could be because it's not scaling very well. We have a whole bunch of clusters and a ton of nodes, and it might seem that on the surface it's handling it, but maybe it's not. But I don't have the correlation to say whether it scales well or not.
How are customer service and support?
There is a shared Slack channel, so I would call that support. And there is a Help feature in the app itself. Their support is very responsive and they get back to us right away. I've never had anybody as responsive as they are.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
Better visibility was the main reason we switched to Komodor. Being able to visualize changes to deployments on a timeline is something that our previous solutions did not provide at all. That historical aspect of Komodor is helpful. We never had a tool that could help correlate a node event with a deployment going down.
How was the initial setup?
The initial setup was very straightforward and painless. It was all done in-house. It was just a Kubernetes deployment, a Helm chart, and it was very quick. On our side it was just me and our SSO support, and there was somebody involved on the Komodor side. There were other people involved in the proof of concept, giving input and feedback.
There is no maintenance, other than that I have to redeploy the Helm chart whenever there's an update. Because there are updates so frequently, I do that about every month or so.
During the proof of concept, I did a lot of things, such as annotating from our deployment tool and correlating the deployment to changes in Git, which was a super big deal. During that time, the solution wasn't live for any of the users, so we couldn't have seen any value at that point.
But pretty much right when we went live and presented it to the users, they were happy immediately. And every time we've shown it for troubleshooting in our support channels, and we show a screenshot, people always say, "Wow. What tool is that and how do I get into it?" It has been nothing but a delight for all of our engineers from day one.
Which other solutions did I evaluate?
We scoured the internet for open-source solutions across the whole landscape of Kubernetes tooling. But we never really found anything else that was compelling.
What other advice do I have?
We have a whole bunch of steps just to onboard somebody into Kubernetes, and Komodor has nothing to do with that process. Even after an engineer has deployed to Kubernetes, they still haven't really learned anything about Kubernetes, but Komodor allows them some visibility into things without having to know too much about their deployment and their logging.
However, the learning curve for Komodor is super low. We just tell people, "Go here," and their reaction is "Wow." And if somebody asks a question on one of our support channels, our support engineers will use Komodor to find something, send a screenshot and a link, and people will say, "Wow. What tool is that? That's really neat." We generally don't have to teach anyone at all to use Komodor. We did a presentation or two when we first went live with it, but we have had a ton of new users since then, and we never get questions on how to use the tool. Everybody is very happy with it.
Regarding the Komodor Helm Dashboard feature, that used to be a whole separate service and we were not interested in investing our time and energy into deploying it. They have since integrated it into Komodor and it is very helpful, but I don't know how much our users use it. Our product engineers aren't necessarily aware of the Helm layer. They don't really know that their deployment is actually six or seven different Kubernetes resources. They don't really see things that way or know all the different pieces of the puzzle. Some of them do, but not all of them, and they're not required to know that.
Helm is important from the support side, for myself and other support engineers. But the only time, even on the support side, that we need to know anything about Helm is when we need to uninstall something. If we don't want to delete the deployment resource, we need to do Helm uninstall.
The Helm dashboard is interesting, but it doesn't really solve anything for us. And that's by design. We want things to be as simple as possible for our product engineers. We don't want them to have to know what all these things are.
My advice would be don't overthink it. It was so easy to onboard. It's definitely worth it in the long run.
I was heavily involved at the beginning, getting us onboarded with Komodor, but the great thing about it, and this speaks volumes about the organization and the product they have made, is that it does a great job running itself, and the users are very happy with it.