The Case for Investing in Cloud Automation
This is a guest post from Jason Deck, SVP of Strategy at Premier AWS Consulting Partner and AWS Managed Service Partner Logicworks.
The team at Logicworks believes that great cloud infrastructure begins with two key components: a robust, dynamic cloud platform like Amazon Web Services (AWS), and a custom-built automation framework that controls, scales, secures, and deploys infrastructure resources. AWS provides the reliable, secure, and innovative services that power your applications, and your automation framework, architected using tools offered by AWS and AWS Technology Partners, supports the agile deployment processes, security controls, and cost efficiencies that create value for your business.
What is a Cloud Automation Framework?
Cloud automation is a broad term that can refer to any piece of software that reduces manual infrastructure engineering effort and simplifies cloud operations. In essence, it is code that controls infrastructure.
In the old world, the only way to control infrastructure and make a system more reliable or secure was to throw more people and dollars at the problem: more people to monitor things like CPU utilization and manually secure sensitive data, and more dollars to buy better hardware, live backups, intrusion detection systems, and so on. Upgrading a large, enterprise-grade system meant a lot of people and a lot of dollars.
All of this changes when you take advantage of AWS. If you want to improve the reliability of your workloads, you can program your environment to detect and respond to failure, relying on automated failover to improve SLAs. We believe the smartest way to automate your cloud is to develop a single set of tools within a management "hub", defined by a single, central framework, that is used to design, build, deploy, and change a system. This hub is most often composed of three central components:
- Infrastructure automation: Infrastructure is structured and built into templates, where it can be versioned and easily replicated for future environments. Tools: AWS CloudFormation, GitHub
- Deployment automation: Code deployment processes are integrated with cloud-native tools, improving deployment velocity and reducing manual effort (and error). Tools: AWS CodeDeploy, AWS Lambda, Puppet, Chef, Jenkins
- Self-healing/auto-correcting/self-monitoring: Configuration management scripts and monitoring tools catch anomalies and proactively correct failed/misconfigured resources. Tools: AWS Lambda, AWS Config, Amazon Inspector, AWS CloudTrail, Puppet, Chef, Jenkins
These three core elements are, in our opinion, the foundation of a robust infrastructure automation framework.
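As a rough illustration of the self-healing component, here is a minimal Python sketch of the underlying idea: compare observed resource state against a desired baseline and report drift. The baseline keys and values are hypothetical, not an actual Logicworks specification; in practice, logic like this would run in a scheduled AWS Lambda function fed by AWS Config and AWS CloudTrail data.

```python
# Minimal sketch of the "self-healing" component: compare observed
# resource state against a desired baseline and report drift.
# The baseline below is illustrative, not a real Logicworks spec.

DESIRED_BASELINE = {
    "cloudtrail_enabled": True,
    "mfa_on_root": True,
    "public_s3_buckets": 0,
}

def find_drift(observed: dict) -> list:
    """Return the settings that differ from the desired baseline."""
    return [key for key, want in DESIRED_BASELINE.items()
            if observed.get(key) != want]

# In production, each drifted key would trigger a corrective action
# (re-enable logging, page the security team, etc.) rather than a print.
if __name__ == "__main__":
    snapshot = {"cloudtrail_enabled": True, "mfa_on_root": False,
                "public_s3_buckets": 2}
    print(find_drift(snapshot))  # -> ['mfa_on_root', 'public_s3_buckets']
```

The point is that the "ideal" state lives in code, so detecting and correcting deviations becomes a routine, automated task rather than a manual one.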
Together, this framework forms the bedrock of cloud infrastructure that is resilient, costs less to maintain, and allows engineering resources to be devoted to revenue-generating products, not infrastructure maintenance. Let’s dive into each of these benefits with some customer examples.
Horizontal Scalability / High Availability Architecture
AWS provides infrastructure resources that can scale up and down to meet demand or respond to failure; your automation framework must provide the rules that define when and how this occurs. Here is what this looks like in practice.
A global enterprise that provides backup, business continuity, and data analytics software and services to a large number of the Fortune 500 currently works with Logicworks to automate and manage its AWS infrastructure. The company's development team wanted to be able to terminate instances with outdated code rather than update them, so that every code push is a rebuild of the entire stack. This means that no instance in the project's environment lasts longer than 24 hours, and over the course of a single week, hundreds of instances must be terminated and rebuilt without any human intervention.
Disposable infrastructure requires full automation of the environment, from AWS resource provisioning to bootstrapping, package installation, and code deployment. Logicworks' senior engineering team first built a custom CloudFormation template that performs standard tasks like building and configuring a VPC and access controls. The Puppet agent is then installed and connects to the Puppet master, which configures the instance's operating system. The final step in the Puppet process kicks off their custom deploy script, which pulls down the most recent version of code from an on-premises box.
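The disposable-instance rule described above can be sketched in a few lines of Python. The data shape mirrors what boto3's `ec2.describe_instances()` returns, but the helper below is only an illustration of the decision logic; in the actual pipeline, termination and rebuild are handled by Auto Scaling together with the CloudFormation/Puppet process.

```python
# Sketch of the disposable-instance rule: any instance older than
# 24 hours is a candidate for termination and automated rebuild.
# Instance records mirror boto3's describe_instances() fields;
# the terminate/rebuild step itself is intentionally omitted.
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)

def expired_instances(instances, now=None):
    """Return IDs of instances whose LaunchTime exceeds the 24h limit."""
    now = now or datetime.now(timezone.utc)
    return [i["InstanceId"] for i in instances
            if now - i["LaunchTime"] > MAX_AGE]

if __name__ == "__main__":
    now = datetime(2017, 1, 2, tzinfo=timezone.utc)
    fleet = [
        {"InstanceId": "i-old", "LaunchTime": datetime(2016, 12, 30, tzinfo=timezone.utc)},
        {"InstanceId": "i-new", "LaunchTime": datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)},
    ]
    print(expired_instances(fleet, now))  # -> ['i-old']
```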
By automating (and unifying) infrastructure build-out and code deployment, the global software company was able to create an immutable architecture with a 0.001% instance failure rate and 100% uptime for their production application. Rather than sending a failure alert to a human who would then manually correct the problem, the Logicworks team put in the work upfront to automate self-correction. “Failure” is such a routine part of their deployment process that it occurs with zero impact or human intervention.
Secure, Self-Correcting Resources
AWS provides world-class physical datacenter security, along with a number of tools that make it easier and cheaper to encrypt data everywhere, implement firewalls, and so on. But it is your responsibility to configure, deploy, and maintain these resources in a way that protects your data; this division of duties is known as the Shared Responsibility Model. The same cloud automation framework we have been describing has a number of security benefits, and represents a system that is not just "protected" but secure by design.
Logicworks recently worked with an organization that provides security training to engineering professionals around the world. As a security-focused company, they were looking for a Managed Service Partner with advanced security expertise to protect their new AWS environment. Logicworks built AWS infrastructure that can be controlled programmatically, so that their security and compliance parameters are just pieces of code: changeable with far more flexibility, versioned in Git like any other software, and automated to self-correct errors. This resulted in the following security benefits:
- Central control: We built infrastructure in templates using AWS CloudFormation, so the company’s “ideal” state is defined and can be repeated for multiple projects. This creates uniformity across projects, and ensures that new environments have standard elements, like VPC design, AWS CloudTrail enabled, MFA on root, etc.
- Improved transparency: They know exactly how every system is configured for security at any point in time, due to configuration management.
- Efficiency: They have reduced the time and cost of deploying future systems; they do not have to rebuild security configurations or get them approved by security teams when they are “built in” to templates and bootstrapping scripts.
- Reduced manual security work: By centrally managing configuration, they discourage ad hoc work; any change made directly to an instance, rather than to the script, is overwritten the next time the configuration management tool runs.
- Simplified patching: Patches can be distributed across every system rapidly and with a complete audit trail of what was patched where.
- Happy auditors: The company can show auditors exactly how every system is configured, and critical compliance features like log monitoring and archival are included by default.
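One concrete example of a self-correcting security control, sketched under assumptions (the allowed-port policy below is invented for illustration): scan security-group rules for ports open to the entire internet. The input shape mirrors boto3's `ec2.describe_security_groups()`; the corrective call (`ec2.revoke_security_group_ingress`) is left out and would be triggered from AWS Lambda or an AWS Config rule.

```python
# Sketch of a self-correcting security check: flag security-group
# rules that expose non-approved ports to the whole internet.
# Rule structure mirrors boto3's describe_security_groups() output;
# the policy (only web ports may be public) is an assumed example.

OPEN_CIDR = "0.0.0.0/0"
ALLOWED_OPEN_PORTS = {80, 443}  # illustrative policy, not an AWS default

def world_open_violations(security_groups):
    """Return (group_id, port) pairs exposing non-approved ports publicly."""
    violations = []
    for sg in security_groups:
        for perm in sg.get("IpPermissions", []):
            port = perm.get("FromPort")
            public = any(r.get("CidrIp") == OPEN_CIDR
                         for r in perm.get("IpRanges", []))
            if public and port not in ALLOWED_OPEN_PORTS:
                violations.append((sg["GroupId"], port))
    return violations
```

Each violation found this way can be logged for the audit trail and revoked automatically, which is what "secure by design" means in practice: the policy is code, and deviations are corrected rather than merely reported.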
When systems are complex, there must be an equally powerful set of management tools and processes to enforce and maintain configurations. The set of practices described above, which are sometimes called DevSecOps or security automation or security by design, ensure that your ideal state is defined and maintained.
Greater Efficiency at Scale
Perhaps the most significant benefit of AWS plus an automation framework is efficiency. The model described above does not just cut resource costs; it takes advantage of AWS resources to deliver greater productivity and efficiency at scale.
Our team helped perform an audit of a large internet news organization's 3-year-old AWS environment. They had cloud engineers on staff, but those engineers were busy supporting code releases and putting out fires, which left little time to update the environment with new AWS features or best-practice configurations. As a side note, we find that this is often what happens when a "DevOps" engineering team is asked to do everything quickly but is not given the time or resources to develop (or outsource) an automation framework that reduces manual maintenance and deployment work.
Our preliminary audit of the news organization's AWS environment revealed that about 20% of their compute resources were being wasted due to poor tracking. Another 15% of the instances could not be linked back to an active project. They had over-engineered VPCs and were still manually launching and updating instances, which meant each instance's configuration depended on which engineer had launched it.
Automation could solve this team’s problems in a number of ways:
- Cost Tracking: Infrastructure automation with AWS CloudFormation can include a tagging policy that ties resources to the engineers, projects, and teams responsible for them, replacing manual (or nonexistent) attribution of expenses to projects.
- Productivity: The automation framework as a whole would significantly reduce the amount of time the team spends configuring instances and deploying new infrastructure, an enormous potential gain in engineering efficiency. A recently released report by Puppet found that high-performing IT teams, which prioritize automation, high deployment velocity, and the "work not done" principle, spend 22% less time on unplanned work and rework than low-performing IT teams.
- Other Instance Types: Fully automated infrastructure may also allow the team to take advantage of AWS Spot Instances more reliably. Spot Instances are instances you can "bid" on at a significant cost reduction; however, they can be terminated any time the instance cost exceeds your bid. If you use Auto Scaling and automate your infrastructure to self-heal in case of failure, a terminated Spot Instance simply fails over to a Reserved Instance or On-Demand Instance without human intervention.
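The cost-tracking point can be made concrete with a short sketch. The required tag keys below are an assumed policy, not an AWS default; instance records mirror the `Tags` structure returned by boto3's `describe_instances()`, and a scheduled job could flag (or stop) anything it cannot attribute to a project.

```python
# Sketch of tagging-based cost tracking: find instances that cannot
# be attributed to a project or owner. The required tag keys are an
# assumed policy; instance records mirror boto3's Tags format.

REQUIRED_TAGS = {"Project", "Owner"}  # illustrative tagging policy

def untracked_instances(instances):
    """Return IDs of instances missing any required cost-allocation tag."""
    untracked = []
    for inst in instances:
        keys = {t["Key"] for t in inst.get("Tags", [])}
        if not REQUIRED_TAGS <= keys:  # required keys must be a subset
            untracked.append(inst["InstanceId"])
    return untracked
```

Enforced in a CloudFormation template and checked on a schedule, a policy like this would have caught the 15% of instances in the audit above that could not be linked to an active project.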
It is true that automation itself takes time and money. It also takes expertise — the kind that can be hard to find. Yet infrastructure automation is the inflection point that jumpstarts an organization’s cost efficiency efforts. These factors make it an ideal service to outsource; it is a skills gap that enterprises are struggling to fill, and a non-disruptive place where Managed Service Providers can provide value without replacing internal DevOps efforts. A Next-Generation Managed Service Provider like Logicworks that has already developed proprietary infrastructure automation software is an ideal fit.
The Future of AWS + Automation, From Our Perspective
AWS has spent the last decade developing a platform that significantly reduces the manual, time-consuming maintenance of hardware, network, and storage, and enables infrastructure to be controlled via software. However, it is up to each company to develop the automation framework that orchestrates AWS services and the company’s unique applications and processes.
Cloud automation is what drives better availability, better cost management, better governance, and better time-to-delivery. Rather than growing bigger teams or adding a hodgepodge of tools, building an automation framework delivers far greater benefit for less effort. So whether a company chooses to build an automation team in-house or outsource to a next-generation service provider, automation should be at the top of every enterprise's IT priority list.
The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.