AWS Partner Network (APN) Blog
Automation in the Cloud
Continuing our MSP Partner Spotlight series from last week’s post, Unlocking Hybrid Architecture’s Potential with DevOps, automation is another critical area of capability for next-generation Managed Service Providers (MSPs). Automation that incorporates elements such as configuration templates, code deployment automation, and self-healing infrastructure reduces the need for manual intervention, the potential for errors, and operating costs for MSPs. This week we hear from Cloudreach (an APN Premier and MSP Partner with numerous AWS Competencies) on the value of automation in the cloud.
Automation in the Cloud
By: Neil Stewart, Cloud Systems Engineer, Cloudreach
Before my life at Cloudreach, my understanding of many of the relevant technologies and terminologies was non-existent. I was inspired by a recent Cloudreach blog post about our placement as a Leader in the Gartner Managed Services Magic Quadrant, as well as the blog post about the flexibility of working here, and it got me thinking about my experience so far and how things have progressed.
I joined Cloudreach fresh out of university in May 2014. From there, I was given the opportunity to show what I could do, with a little time and bright people around me to learn from. I quickly began to learn the tricks of the trade of working in the cloud and, more importantly, of working in a managed services environment such as a Cloud Operations team. I learned how to do a variety of things that were totally new to me, such as navigating and using Linux, diagnosing a Microsoft SQL Server mirroring setup, and writing my first Ruby script to delete old AMIs in AWS. I came to appreciate the command line over the GUI, and just how much you can do with code and scripting, which leads me to the point of this post.
Automating all the things
I love automation. I have smart lights, smart speakers, and a smart kettle that all have automation involved at home. It can be as simple as turning a light on when I walk into a room or boiling a kettle in the morning when I wake up. Automation is fantastic.
While automation in my personal life is fun, efficient, useful and awesome, automation in the cloud, especially from a cloud operations perspective, is essential. For example, rebooting an instance after an update is fine the first time, for a single instance, but doing it more than 30 times is painful!
I love to approach these tasks with a “Let’s automate that” frame of mind. One example of automation we often use at Cloudreach is running a script on a fresh AWS account that identifies the default VPC in every region and deletes its associated resources as well as the VPC itself. Sounds simple? It is. However, as AWS adds more regions, the task takes longer. Repeat it across lots of new customer accounts and… you get where this is going.
Writing some code to perform a task like this is not difficult; when you approach other tasks in this way, it only becomes easier. Consider the example below:
import boto3

# Walk every region, find the default VPC, and (after prompting) delete its
# dependent resources and then the VPC itself.
client = boto3.client('ec2', region_name='eu-west-1')
regions = [region['RegionName'] for region in client.describe_regions()['Regions']]

for region in regions:
    print("Finding VPCs in {}".format(region))
    client = boto3.client('ec2', region_name=region)
    vpcs = client.describe_vpcs()['Vpcs']
    default_vpcs = [x for x in vpcs if x['IsDefault']]
    if default_vpcs:
        default_vpc = default_vpcs[0]
        print("Found default VPC {}".format(default_vpc['VpcId']))
        delete = input("Would you like to delete {}? (Y/N) ".format(default_vpc['VpcId'])).lower()
        if delete == 'y':
            print("Deleting {}".format(default_vpc['VpcId']))
            subnets = [x['SubnetId'] for x in client.describe_subnets(
                Filters=[{
                    'Name': 'vpc-id',
                    'Values': [default_vpc['VpcId']]
                }]
            )['Subnets']]
            internet_gateways = [x['InternetGatewayId'] for x in client.describe_internet_gateways(
                Filters=[{
                    'Name': 'attachment.vpc-id',
                    'Values': [default_vpc['VpcId']]
                }]
            )['InternetGateways']]
            # Internet gateways must be detached from the VPC before deletion.
            for internet_gateway in internet_gateways:
                client.detach_internet_gateway(
                    VpcId=default_vpc['VpcId'],
                    InternetGatewayId=internet_gateway
                )
                client.delete_internet_gateway(
                    InternetGatewayId=internet_gateway
                )
            for subnet in subnets:
                client.delete_subnet(SubnetId=subnet)
            # With its dependencies gone, the VPC itself can be removed.
            client.delete_vpc(VpcId=default_vpc['VpcId'])
        else:
            print("Not deleting {}".format(default_vpc['VpcId']))
    else:
        print("No default VPC found in {}".format(region))
Ok, this could go on for pages, but you get the idea. Easy, right? There are a lot of improvements you could make to this, but in its simplest form, this is a great example of automating a small and simple task that you don’t need to do manually. Lovely.
Automation at an MSP Level
Simple scripts are great. The power of automation in an MSP environment really shines through when you have lots of these simple scripts that all trigger and run when they need to. This is the difference between working on simple and small environments versus the management and monitoring of multiple large-scale, growing, and sophisticated environments. As our customers shift towards highly scalable and serverless applications and away from more monolithic architecture, automation is less “nice to have” and more “you had better get on the wagon before the wagon runs you over.”
Looking at this in a more real-world sense, let’s imagine we have some applications running in the cloud that we want to apply automation to.
Backups and retention
Taking backups and retaining them is automation 101. We need to be able to back up servers that are not stateless, such as database servers. This can be as simple or as sophisticated as you like. Implementing something like an AWS Lambda function with an Amazon CloudWatch Events rule to trigger a backup as often as needed is simple. A function that generates a list of the instances that need to be backed up, and then fires a process to back each of them up in parallel, is more effective.
As part of this solution, retention of backups is important too. This can be another AWS Lambda function, configured to run daily, that checks each backup that has already been taken to determine whether it has passed its use-by date. If it has, delete it.
Without much effort, you can have a quick and simple backup solution in place: no manual work is required once it is running, and it scales. You could tie all of this together with Amazon API Gateway and a describe function, and you have a new backup and reporting API.
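As a sketch of how this pair of functions might look (the Backup tag, the seven-day retention window, and the handler names are illustrative assumptions, not a prescribed design):

```python
from datetime import datetime, timedelta, timezone

# Illustrative assumptions: instances to back up carry the tag Backup=true,
# and backups older than RETENTION_DAYS have passed their use-by date.
RETENTION_DAYS = 7

def is_expired(start_time, now, retention_days=RETENTION_DAYS):
    """True when a backup has passed its use-by date."""
    return start_time < now - timedelta(days=retention_days)

def backup_handler(event, context):
    """Lambda entry point, fired on a schedule by a CloudWatch Events rule:
    snapshot every EBS volume attached to instances tagged Backup=true."""
    import boto3  # imported here so is_expired stays testable without AWS
    ec2 = boto3.client('ec2')
    reservations = ec2.describe_instances(
        Filters=[{'Name': 'tag:Backup', 'Values': ['true']}])['Reservations']
    for reservation in reservations:
        for instance in reservation['Instances']:
            for mapping in instance.get('BlockDeviceMappings', []):
                ec2.create_snapshot(
                    VolumeId=mapping['Ebs']['VolumeId'],
                    Description='Automated backup of {}'.format(instance['InstanceId']))

def retention_handler(event, context):
    """Lambda entry point, run daily: delete snapshots past their use-by date."""
    import boto3
    ec2 = boto3.client('ec2')
    now = datetime.now(timezone.utc)
    for snapshot in ec2.describe_snapshots(OwnerIds=['self'])['Snapshots']:
        if is_expired(snapshot['StartTime'], now):
            ec2.delete_snapshot(SnapshotId=snapshot['SnapshotId'])
```

In a real deployment you would scope the retention pass to only the snapshots this function created (for example, by tagging them at creation time) rather than sweeping every snapshot the account owns.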
At Cloudreach, we work with customers to implement backup solutions that work within their requirements. This might take shape as AWS Lambda functions as explained above, as third-party products, or custom solutions developed for the customer. Within the Cloud Operations team, we also use in-house tools that allow us to easily automate backups and deal with retention too.
Security Compliance
Automation and security are a perfect match. Where you enable automation within security can vary greatly. A great example of this in place would be security group auditing.
Keeping your resources secure in AWS is important, and there are plenty of ways to do it. Security groups and their rules are one of the simplest but most powerful security features in AWS, and an important layer to control. Whether it’s accidentally leaving remote access open to any IP address, or a developer opening access from a coffee shop IP address so they can work more easily, these situations are not just bad; they can also violate security policies and compliance standards.
These are both examples of where we can automate to mitigate.
Cloudreach has helped implement functionality for customers that alerts and reports on security group changes. We can restrict users from an IAM perspective so that security group creation has to go through an approval process. This works well but can be time consuming to implement. More simply, we can implement an AWS Lambda function that is triggered each time a security group is created or changed, using AWS Config or Amazon CloudWatch Events. Once triggered, the function checks whether the ports and sources in that security group’s rules are valid, perhaps against a configuration file in Amazon S3 or a table of allowed IPs/sources in Amazon RDS. If a rule is not allowed, the function removes the rule if it was an addition to an existing group, or deletes it from the group if it was added as part of the group’s creation.
Either way, we can report on the “breach” through something like Amazon SNS or a logging tool such as Splunk. Most importantly, the time spent in violation of security policy is reduced to seconds, rather than waiting for an alert to be triggered and investigated by an engineer.
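A minimal sketch of such an audit function, with a hard-coded allowlist standing in for the S3 configuration file or RDS table (the allowlist contents, event shape, and function names are all illustrative assumptions):

```python
# Illustrative allowlist standing in for the configuration file in Amazon S3
# or the RDS table of allowed IPs/sources described above.
ALLOWED_SOURCES = {'10.0.0.0/8', '192.168.0.0/16'}
ALLOWED_PORTS = {80, 443}

def rule_is_allowed(permission):
    """Check one IpPermission entry: its port and every CIDR source must be
    on the allowlist. (Real rules can span port ranges; kept simple here.)"""
    port_ok = permission.get('FromPort') in ALLOWED_PORTS
    sources_ok = all(ip_range['CidrIp'] in ALLOWED_SOURCES
                     for ip_range in permission.get('IpRanges', []))
    return port_ok and sources_ok

def handler(event, context):
    """Lambda entry point, triggered via CloudWatch Events when an ingress
    rule is authorized (assumes an 'AWS API Call via CloudTrail' event for
    AuthorizeSecurityGroupIngress)."""
    import boto3  # imported here so rule_is_allowed stays testable without AWS
    group_id = event['detail']['requestParameters']['groupId']
    ec2 = boto3.client('ec2')
    group = ec2.describe_security_groups(GroupIds=[group_id])['SecurityGroups'][0]
    offending = [p for p in group['IpPermissions'] if not rule_is_allowed(p)]
    if offending:
        # Revoke only the disallowed rules; the rest of the group is untouched.
        ec2.revoke_security_group_ingress(GroupId=group_id,
                                          IpPermissions=offending)
    return {'revoked': len(offending)}
```

This handler is also the natural place to publish the notification to Amazon SNS before returning.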
Code deployments
Introducing a CI/CD platform that integrates with your source control system is an awesome way to bring automation into your development cycle, and it’s an exciting area to get involved in. An effective, deep pipeline integration can enable your team to push minor code changes to dev/pre-prod environments, and it can be expanded to full deployments to production.
Cloudreach helps customers manage their CI/CD pipelines by working with them to ensure the infrastructure behind the scenes is running as it should and, if issues arise, they are resolved. We also work with customers from very early on in a cloud enablement or agile ops project to figure out where we can incorporate CI/CD automation as well as how they can manage the risk of moving to automated deployments. We encourage our customers to keep this in mind from day one and push the subject as a must-have rather than a nice-to-have.
AWS Infrastructure changes
Similar to code deployments, infrastructure changes paired with Jenkins and a source control system are powerful and fast.
Here you want to look at using AWS CloudFormation as much as possible; we recommend adopting Sceptre, Cloudreach’s open-source tool for AWS CloudFormation template development and deployment. It has commands that can be used in the testing, approval, and deployment of new and updated infrastructure in AWS.
This setup is useful for changes to sensitive resources such as IAM, security groups, or VPC components. With a CD pipeline in place, you can restrict changes to these resources to authorized people, and to changes that pass a set of standards and approvals.
Moving on
I hope it has been helpful to see how easily you can automate some key areas of working in the cloud. Focusing on automation helps deliver financial, security, and innovation benefits to a business and its teams. Pipelines let you control how changes are implemented and in which environments, keeping things secure and keeping costs down when changes have to be rolled back. Imagine the revenue that could be lost if a manually deployed production change caused your application to fail. From an innovation perspective, automating tasks lets your teams focus on more challenging and exciting work, such as improving application features or fixing bugs that may have been overlooked while the team was stretched thin by tedious, repetitive tasks.
Hopefully, this post encourages you to implement automation in your cloud environments, or at least to look into how automation can help your business work more effectively in the cloud. At Cloudreach, automation is fundamental to working successfully and keeping up with the pace of change in the cloud. We’d love to hear examples of how others have implemented automation, and your thoughts on where automation could go next.
Neil Stewart
Cloud Systems Engineer
Cloudreach