Many of you already know the benefits of the ELK stack and why so many teams are using it. Not only is it critical for diagnosing and resolving bugs and production issues, but it’s increasingly valuable for customer insights. Plus, gaining additional metrics about the health and usage of systems gives your team a strong competitive advantage. While other companies struggle to pinpoint where they’re falling short, your team can examine data, adapt, and deliver just what your system needs.
As a manager, providing a good logging solution for your engineers and DevOps teams is often on your mind. But you also know that there’s a cost to be paid for providing it. In addition to the cost of infrastructure, there are upgrades, patches, and deployments to deal with. And these all require time, effort and expertise.
So how can you provide an elegant logging solution that’s also easy to maintain and support? Is there a way we can get all the benefits of the ELK stack without the headaches?
Why is logging so important?
With the growth of machine data, logging is increasingly important. It’s critical for diagnosing and troubleshooting issues for optimal application performance. Additionally, there are many tools that let you get critical business metrics and data from your logs. Logging is no longer just for finding issues. It’s also for monitoring your systems.
No doubt about it; logging is critical. So, let’s talk about how we can implement it. One popular method today—one that has a lot going for it—is the ELK stack.
Why do I want the ELK stack?
Elasticsearch: A powerful open-source search and analytics engine used for full-text search and for analyzing logs and metrics.
Logstash: An open-source tool that ingests and transforms logs and events.
Kibana: An open-source visualization and exploration tool for reviewing logs and events.
When used together, the components of the ELK stack give you the ability to aggregate logs from all your systems. You can not only analyze them for problems but also monitor system use and find opportunities for improvement. The data analysis and visualization that ELK provides are hard to beat.
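To make the division of labor concrete, here is a minimal sketch in plain Python. It does not use Logstash or Elasticsearch; it only imitates the two ideas of parsing raw lines into structured events and then aggregating over a field. The log format, the field names, and the helper names parse_line and count_by are assumptions made up for this illustration.

```python
import re
from collections import Counter

# Hypothetical log format: "<timestamp> <LEVEL> <service> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<service>\S+) (?P<message>.*)$"
)

def parse_line(line):
    """Ingest-and-transform step: raw text in, structured event (or None) out."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None

def count_by(events, field):
    """Analytics step: count events per distinct value of one field."""
    return Counter(event[field] for event in events)

raw_lines = [
    "2021-03-01T10:00:00Z INFO checkout order placed",
    "2021-03-01T10:00:01Z ERROR checkout payment gateway timeout",
    "2021-03-01T10:00:02Z ERROR search index unavailable",
    "not a log line",
]

# Lines that do not match the assumed format are skipped.
events = [e for e in (parse_line(line) for line in raw_lines) if e is not None]
errors = [e for e in events if e["level"] == "ERROR"]
errors_per_service = count_by(errors, "service")
print(dict(errors_per_service))
```

In a real deployment, a dashboard in Kibana would take the place of the final print, and the parsing rules would live in your ingestion configuration rather than in application code.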
But why would you use a managed service for the ELK stack? You’ve got a good team. You don’t want to get locked into a single hosting provider. And you want the flexibility of configuring everything for your particular use case. Why not just manage the ELK stack yourself?
Long story short: it’s hard.
What’s so hard about managing my own ELK stack?
Let’s review some of the reasons why managing the ELK stack might not be something you want to do on your own.
Installation is rarely easy
When looking at integrating systems, don’t forget that installation is not always trivial. What’s the best configuration? Where do you manage your secrets? How much hardware do you need?
A quick internet search turns up plenty of articles on how to install the ELK stack. If you take a look, one thing you may notice is that it’s far from a one-click deploy when you’re getting started. In fact, there are a lot of prerequisites. And either you use the basic installation with its out-of-the-box configuration for everything, or you spend more time researching all the configuration options. What are your data ingestion limits? What’s your retention plan? And how do those choices affect the hardware you need?
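Ingestion and retention matter because together they largely determine how much disk you have to provision. The back-of-the-envelope sketch below shows the arithmetic. The function name, the 20% indexing overhead, and the sample numbers are illustrative assumptions, not measured values or vendor guidance.

```python
def estimate_storage_gb(daily_ingest_gb, retention_days, replicas=1, overhead=0.2):
    """Rough disk estimate: raw data x number of copies x indexing overhead.

    All inputs are assumptions you would replace with your own measurements.
    """
    copies = 1 + replicas  # the primary copy plus each replica
    return daily_ingest_gb * retention_days * copies * (1 + overhead)

# Hypothetical workload: 50 GB of logs per day, kept for 30 days, one replica.
estimate = estimate_storage_gb(daily_ingest_gb=50, retention_days=30)
print(f"Estimated storage: {estimate:.0f} GB")
```

Doubling the retention period or adding a second replica changes the answer substantially, which is why these questions are worth settling before you size hardware.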
Now imagine doing that several times for each environment you have.
Even if you use automated scripting for everything, it will take a lot of precious time to get everything set up and running smoothly. That’s time that won’t be spent delivering value to your customers. With managed services like Amazon Elasticsearch Service, deploying an ELK stack is simple and repeatable.
And it’s rarely quick
Most companies attempt to drive their product to market quickly. But you won’t be able to do that if you’re forced to spend weeks or even months getting your stack and infrastructure to a production-ready state.
What’s your current process for getting applications to production? What testing, verification, and polishing do you do to make sure you’re ready? And if your hardware needs to be set up optimally for write-intensive operations, do you get that done before it goes into production? What’s your plan for upgrades?
If you’re managing your own ELK stack, you’re doing all this with infrastructure and applications that you’re not familiar with. And services like Amazon Elasticsearch Service can help you significantly reduce infrastructure, implementation, and ongoing maintenance costs.
More time is spent on resiliency
When managing your ELK stack, you’ll soon find yourself worried about resiliency. What should you do when one of your Elasticsearch nodes goes down? Or when Kibana becomes too slow to use?
Furthermore, you might be in trouble if you didn’t set up your infrastructure correctly in the steps above. Logstash and Elasticsearch are memory intensive. If you tried to save time and money by installing them on the same tiny piece of hardware, they’ll step all over each other. And we haven’t even talked yet about a plan for updating versions or monitoring and patching security issues!
AWS can help with these needs. For example, let’s consider your Elasticsearch nodes. If a node goes down, Amazon Elasticsearch Service detects and replaces it for you. It’s yet another thing you don’t want to spend time worrying about.
Basically, it’s good to have a big team, because self-management will demand an even bigger one. If running an Elasticsearch cluster isn’t critical to your line of business, then have AWS manage it for you. It’s always a good idea to pay your engineers to create business value, not to manage the ELK stack.
So, what tools help prevent ELK stack headaches?
If you build, run, and support the ELK stack, it’s important to remember that you’re not supporting just one tool. You now have to worry about three of them. And running in production will bring up other concerns. What else might you end up supporting in your quest for self-management?
For example, to bring in more resilience, you may want to use Kafka for queuing the logs. In times of high traffic, you don’t want to lose logs because your ELK stack couldn’t keep up. Are you ready to bring on Kafka management as well? Alternatively, you could use Redis to help manage the load during peak times. But wait—isn’t that just something else you’ll need to manage? And how are you going to pull out the analytics you need such that your team can monitor the system? Will you hand roll something here, too? Or will you install and configure another system to give you the monitoring you need?
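To see why a queue in front of the stack helps during peaks, consider the toy simulation below. It is not Kafka or Redis. It models only the idea of a bounded buffer absorbing a burst while the indexer drains it at a fixed rate. The traffic numbers, capacities, and the function name simulate are assumptions chosen for illustration.

```python
def simulate(arrivals_per_tick, capacity_per_tick, buffer_size):
    """Return (indexed, dropped) for a stream of per-tick log arrivals.

    buffer_size is how many log lines can wait at once;
    capacity_per_tick is how many the indexer can process per tick.
    """
    queued = indexed = dropped = 0
    for arrivals in arrivals_per_tick:
        # Accept what fits in the buffer; anything beyond that is lost.
        accepted = min(arrivals, buffer_size - queued)
        dropped += arrivals - accepted
        queued += accepted
        # The indexer drains the buffer at its fixed rate.
        processed = min(queued, capacity_per_tick)
        queued -= processed
        indexed += processed
    # Once traffic stops, whatever is still queued gets indexed eventually.
    return indexed + queued, dropped

# Hypothetical traffic with one spike in the middle.
traffic = [100, 100, 500, 100, 100]

small_buffer = simulate(traffic, capacity_per_tick=200, buffer_size=200)
large_buffer = simulate(traffic, capacity_per_tick=200, buffer_size=1000)
print("small buffer (indexed, dropped):", small_buffer)
print("large buffer (indexed, dropped):", large_buffer)
```

With the small buffer, part of the spike is dropped. With the large one, the same indexer catches up over the following ticks. The catch is the one raised above: the buffer is one more system that somebody has to run.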
There’s an alternative. You can use something like Amazon CloudWatch to help. This works in conjunction with Kibana, creating an easy-to-use method of analyzing logs.
Last but certainly not least, managed Elasticsearch services like Amazon Elasticsearch Service can help with security integration too, including VPC support, built-in encryption for data at rest and in transit, and user authentication.
Are managed solutions right for you?
Many companies decide that self-hosting is an option. But not all of them consider the amount of work, technical resources, and expertise required to keep it running smoothly. And no one wants to spend time keeping a system up and running when they could be spending it delivering great products to customers.
Even if you’ve been thinking about self-hosting, you should consider starting with a managed service like Amazon Elasticsearch Service to get up and running quickly. It’s likely that once you try it, you’ll find that it lets you spend time on the things that matter.