Category: AWS Partner Solutions Architect (SA) Guest Post


How to Migrate Amazon EC2 Instances from EC2-Classic to Amazon VPC with CloudEndure

This is a guest post from David Shurtliff, Enterprise Solutions Architect, AWS, and Gonen Stein, VP Business Development, CloudEndure

Amazon Web Services (AWS) customers who have been using AWS services for a long period may still be using Amazon EC2 instances in the EC2-Classic platform, as well as using instances in Amazon’s newer Virtual Private Cloud (Amazon VPC) service. EC2-VPC is your private, isolated portion of the AWS cloud, and became the default network environment on December 4, 2013. Any accounts created after this date support EC2-VPC only, and cannot use EC2-Classic. There are a number of advantages of using EC2-VPC:

  • Security—You can control outbound (egress) and inbound (ingress) connectivity to EC2 resources, and you can create network access control lists (network ACLs) on VPC subnets
  • Flexibility—You can define IP address ranges (CIDR blocks) and subnets
  • Network isolation—You can control internal and external connectivity to EC2 resources
  • Features—Certain AWS features and newer instance types, such as C4, M4, and T2 instances, are available only in EC2-VPC. For more information, see the Benefits of Using a VPC

If you want to move your existing workloads from EC2-Classic to EC2-VPC using a manual approach, you would launch new AMIs within your EC2-VPC, install and configure your applications and databases, export the data from your old servers, and import it to the new servers. You would also need to assess the EC2-Classic application stack in advance, and configure your target VPC and servers accordingly, including your networking, instance types, and volume types, to mirror the EC2-Classic environment.
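
As a minimal sketch of the launch portion of that manual path using the AWS CLI (all identifiers below are placeholders), you would start a fresh instance in the target VPC and then install and load your application and data onto it by hand:

# Launch a fresh instance into the target VPC subnet (AMI, subnet, security group,
# and key pair IDs are placeholders); application install and data import still follow
aws ec2 run-instances --image-id ami-1a2b3c4d --instance-type m4.large \
    --subnet-id subnet-1a2b3c4d --security-group-ids sg-1a2b3c4d \
    --key-name my-key-pair --count 1

You could also create an image of the existing EC2-Classic instance with aws ec2 create-image and launch that AMI into the VPC, but either way the surrounding network, security, and data migration work remains manual.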

To simplify the journey from EC2-Classic to EC2-VPC, you may want to use an APN Technology Partner such as CloudEndure for an automated, 1-click migration solution.

CloudEndure is an APN Advanced Technology Partner and AWS Storage Competency Partner who provides customers with live workload mobility between data centers, clouds, regions, and networks within a region. You can use CloudEndure’s technology to migrate your live workloads from your old EC2-Classic network into EC2-VPC, while maintaining your existing configuration, including instance types, private IP addresses, and load balancers. In addition to creating the new EC2-VPC configuration automatically and moving the entire workload, CloudEndure lets you select target VPCs that may already exist, as well as define specific servers to be migrated.

An automated solution such as CloudEndure significantly reduces the time to complete the migration, without affecting the operation and performance of the current workload while the data migration is in progress. The use of continuous data replication means that no data will be lost during the cutover from EC2-Classic to EC2-VPC.

In addition to EC2-Classic to EC2-VPC migration, you can also use CloudEndure to:

  • Migrate physical or virtual servers to AWS
  • Use AWS as a dramatically lower-cost disaster recovery site for your on-premises workloads
  • Provide cross-region disaster recovery for your cloud-based workloads
  • Clone your workloads within or across regions and Availability Zones for dev/test and staging purposes

This blog post walks you through the steps to migrate an EC2 workload from EC2-Classic to EC2-VPC using CloudEndure.

Getting Started

At a high level, the migration process will create instance replicas in a region, VPC, and subnet of your choosing. EC2 security groups are created in the target VPC, and rules from the source security group are copied to the target security group at the time of instance replica creation. The following diagram represents a high-level view of the CloudEndure replication process.

Step 1. Configure Your Account

The first step is to sign up for an account at cloudendure.com. Go to the signup page to create an account, and then log in to the CloudEndure dashboard to connect your CloudEndure account with your AWS account. You will need to enter your account credentials associated with the appropriate IAM policy, and set your source/target regions for the instance migration. You may migrate your EC2-Classic instances either to a VPC in the same region or to a VPC in a different region. You should then select a subnet that will be dedicated as a staging ground to replicate your source instances’ data. This subnet is used to maintain continuous replication of the data until you decide to cut over into the EC2-VPC and stop replication.

Step 2. Install the CloudEndure Agents

Download and install the CloudEndure agents on Microsoft Windows and Linux instances running in your EC2-Classic network. In this example, we will install the agent on these two EC2-Classic Windows and Linux instances:

As shown here in the Amazon EC2 console, the instances are outside a VPC:

The agent installation takes about 1 minute. The agent installation does not require a reboot, nor does it impact the source machine’s performance in any way. After connecting to the source machine, download and execute the appropriate Linux or Windows operating system CloudEndure agent. The following command line sequence shows a successful agent installation.

Step 3. Start Continuous Replication

Once the agent installation completes, the instance name will appear in the CloudEndure dashboard, and replication of the data will begin. During replication, you will see the percentage completion of each replicated instance. When replication reaches 100% for an instance, its status will change to a green checkmark.

Note: While CloudEndure agents are replicating data, either during the initial sync phase or during continuous sync, you should see CloudEndure replicator instance(s) with their attached volumes located within the replication server subnet as defined earlier in step 1.

Step 4. Create the Replicas in the Target VPC

When all servers show a green checkmark, select the instances that you want to migrate into the target VPC and click Create Replica.

Note: Before you create a replica, ensure that the status field for all instances shows a green checkmark, and pay attention to the last update time. Your replica(s) created in the new VPC will be as up-to-date as the time shown. The screenshot below shows the instance selection check boxes, replication status, last update time, and replica creation button.

The replica creation process takes several minutes. Once it is complete, the replica instances will appear on the right side of the dashboard:

Note: The replica instances in the new VPC will carry over any security group configuration, Elastic Load Balancing configuration, etc. In your AWS console, you will now be able to see both your old EC2-Classic instances and the new instances within the target VPC. In this example, the instances outlined in red are the newly created instances within the target VPC.

In the Amazon EC2 console, you can confirm that the instances are now in a VPC.

That’s it! Once you have confirmed that your application is behaving as expected in the target VPC, you may redirect your users to the new EC2-VPC based instances via public DNS redirection.

Note: This replication methodology will not impact your source server, and you can test your target replica servers in the EC2-VPC without any system disruption, so go ahead and test away.

Should you wish to make corrections to the application and spin up a newer version of your replica instances, you may delete the current replica by using the Delete Replica button, make the appropriate changes to the source instances, and repeat step 4.

When the cutover is complete and replication from your old EC2-Classic environment is no longer needed, you may uninstall the agents by right-clicking the instances in the CloudEndure dashboard and selecting Stop Replication. This will stop all replication and remove the agent.

To find out more about CloudEndure, visit AWS Marketplace, or email info@cloudendure.com.

 

Running SQL Server Linked Servers on AWS

Scott Zimmerman is a Partner Solutions Architect with AWS.

In this post, I'll demonstrate how to set up SQL Server linked servers on Microsoft Windows Server in Amazon EC2. Linked servers allow you to join tables between database servers and distribute queries through stored procedures and views across servers, without even needing to change your application source code or manage multiple connection strings in your web tier. Please see technet.com for details about SQL Server linked servers. Note: Linked servers are not currently supported in Amazon RDS for SQL Server, which is why this article focuses on Amazon EC2.

Step-by-Step: Set up Linked Servers

To keep this brief, let’s deploy two Amazon EC2 instances with SQL Server and skip the details of setting up web or application tiers. Also, although we’d typically use Amazon Route 53 or Active Directory for DNS, or Windows Authentication in SQL Server, today we’re going to focus only on linked servers. For a smoke test, we’ll just run a simple query from one server against the other.

Note: Amazon EC2 offers a free tier for t2.micro instances, but running SQL Server requires a minimum of the m3.medium instance type. You can purchase these instances on an hourly basis from AWS, and even get them with SQL Server Standard Edition pre-installed (license cost included). If you choose to follow along with this article, you are responsible for any charges your account may incur.

Now let’s build a couple of linked servers in AWS:

    1. Log in to your AWS Management Console and click EC2. Click Launch Instance and select the AMI for Windows Server 2012 R2 with SQL Server Standard. If you plan to bring your own license for SQL Server Enterprise and install it yourself, you could instead pick the AMI for Windows Server 2012 R2 Base.
    2. In Step 2 of the wizard, Choose an Instance Type, select m3.medium.
    3. In Step 3 of the wizard, Configure Instance Details, change the number of instances from 1 to 2.
    4. Accept the defaults in Steps 4 and 5 of the wizard.
    5. In Step 6 of the wizard, Configure Security Group, leave the Create a new security group option selected, and you will see that an RDP rule has already been added. If you chose the SQL Server Standard AMI in Step 1, a rule is also added for SQL Server TCP port 1433, but if you chose the AMI without SQL Server Standard, then you need to click Add Rule here and choose MS SQL in the dropdown. You could also add a rule for All ICMP if you would like to test connectivity between servers with ping.
    6. Click Review and Launch, then click Launch. When you’re prompted for a key pair, either create one and download it to your workstation, or use a key pair you already have in AWS and on your workstation.
    7. After you launch the instances, it’s a very good idea to edit the Name column in the EC2 Dashboard to tag the instances as SQL1 and SQL2. The instructions below refer to the servers by those names.
    8. After a couple of minutes, the Instance State will change to running. Select only SQL1, and click Connect. Save the Remote Desktop file to your desktop as SQL1.rdp. Click Get Password (you may need to wait another minute for Windows to finish booting up). Browse to the key pair (.pem) file you saved earlier (this is probably in your downloads folder). Click Decrypt Password. Select the text of the password, copy it, paste it into a scratch text file, and save it on your desktop. If the text file includes a spurious space character after the password, delete that character.
    9. Repeat the above step for SQL2. You can save both administrator passwords in the same scratch text file.
    10. Open SQL1.rdp. (In Windows, it launches Remote Desktop Connection, hereafter called RDC. Alternative RDP client programs are available for Mac systems.) Log in as administrator using the SQL1 password you saved in the scratch file. Click Yes to connect without a remote certificate. Minimize that RDC window and launch another RDC window for SQL2, and log in as administrator using the second password you saved.
    11. If you plan to use your own SQL Server license and chose the AMI for Windows Server 2012 R2 Base in step 1, install SQL Server Enterprise now.
    12. SQL1 needs to be able to get the IP address of SQL2, but in this example, to keep our focus on SQL Server, we aren’t using Amazon Route 53 or Active Directory. On SQL1, open the file c:\windows\system32\drivers\etc\hosts in Notepad. Add the IP address and NetBIOS name of the SQL2 instance. To get the IP address, you can copy/paste the Public IP from the EC2 Dashboard (or run the ipconfig command in a Windows Command Prompt window on SQL2). Usually, you would want to list the actual NetBIOS name in the hosts file, found by running the hostname command on SQL2. But for this exercise you can simply list it as sql2, which will be a handy alias to use on sql1 when referring to the sql2 instance.
    13. On SQL2, start SQL Server Management Studio. You can find it by typing “sql server 2014 man” on the Start screen. In SSMS Object Explorer, right-click the server name and choose Properties. On the Security tab, change Server authentication to SQL Server and Windows Authentication mode (aka “mixed mode”). Right-click the server name again and restart the MSSQLSERVER service.
    14. In SQL2 SSMS Object Explorer, click Security. Right-click Logins | New Login. Since we’re not using Active Directory, let’s change the login type from Windows authentication to SQL Server authentication. Type a login name and password for the linked server to use. Clear the User must change password at next login check box. Don’t click OK yet. See Figure 1.

      Figure 1: SQL Server Login Properties Dialog Box

    15. There are many ways you should lock this down for tighter security, but for our quick experiment, let’s give this user permission to access the master database. Click the User Mapping page in the left navigation pane. Check the box for master. Click OK to save the user.
    16. Switch over to SQL1. In SSMS Object Explorer, click Server Objects. Right-click Linked Servers | New Linked Server. On the General tab, in the Linked server text box at the top, type the NetBIOS name of the SQL2 server. Remember, in the hosts file we simply used an alias name of “sql2” rather than the actual NetBIOS name. For Server type, select SQL Server. On the Security tab, select the Be made using this security context option. Enter the user name/password that you created on SQL2 (see figure below). We could be more granular about impersonating local accounts as remote users, but this suffices without adding any rows in the upper grid. Note: If you plan to call stored procedures on SQL2, change RPC Out to true on the Server Options page.

      Figure 2: SQL Server Linked Server Properties Dialog Box

    17. Click OK to create the linked server. If you get an error here, on the Security tab, ensure the Security Context name/password on sql1 match with the SQL user you created on sql2. On the General tab, ensure you checked the radio button for SQL Server for the Server type. Ensure that both instances are in the same EC2 security group with a rule that opens TCP 1433. Also, ensure that you can ping the NetBIOS name of SQL2 from a command prompt on SQL1. To verify that your new user login works on SQL2, you could disconnect your login in SSMS on SQL2 via your administrator account (using Windows Authentication) and then try to connect again in SSMS using your linked user name/password with SQL Server Authentication.
    18. On SQL1, open a query window and execute the query below. Note that the FROM clause uses a four-part syntax: computer.database.schema.table. Remember, in this exercise, we simply created an alias for SQL2 in the hosts file, so you don’t need to enter the actual NetBIOS name between the square brackets. If you do use the actual NetBIOS names, note that AWS defaults to NetBIOS names like Win-xxxx, and SQL Server requires square brackets for names with dashes.

SELECT name "SQL2 databases" FROM [sql2].master.sys.databases

You should see the list of databases on SQL2. Now that you've set up linked servers and seen how the query syntax works, you should have an idea of how to set up linked servers in your real applications. One advantage of doing this is that you can offload long-running queries to back-end databases without impacting the CPU on the primary server.
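
If you would rather verify the setup from a command prompt on SQL1 instead of SSMS, a quick check along the following lines should work (the linkeduser name and password are the hypothetical credentials created on SQL2 in step 14). First, confirm that the SQL login works directly against SQL2:

sqlcmd -S sql2 -U linkeduser -P YourPasswordHere -Q "SELECT @@SERVERNAME"

Then run the same distributed query through the linked server, authenticating to the local instance with Windows Authentication:

sqlcmd -S localhost -E -Q "SELECT name AS [SQL2 databases] FROM [sql2].master.sys.databases"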

You can find many more resources for running Windows workloads in AWS, including whitepapers and Quick Starts, at this page: http://aws.amazon.com/windows/resources/.

Rancher Labs Support for Amazon EC2 Container Service – RancherOS 0.4

APN Technology Partner Rancher Labs has announced support for Amazon EC2 Container Service by releasing RancherOS 0.4, the first ECS-enabled version of their streamlined, container-oriented operating system Rancher OS.

Amazon EC2 Container Service is built to help you run containers at scale across clusters of Amazon EC2 instances. These instances can run any OS that you’d like, so long as they are ECS-enabled.

ECS-enabled AMIs have two things in common: the Docker daemon and the Amazon ECS Agent. The ECS Agent is responsible for translating ECS API calls into Docker commands, so that when you tell ECS to run a task, it can start the desired containers on your EC2 instances. The ECS Agent is open source, written in Go, and available on both GitHub and Docker Hub.

Rancher OS is built to run Docker containers with minimal operating system overhead. All system services are run inside of containers, allowing the entire OS to be extremely minimal (the binary download of the OS is around 20 MB). Rancher OS runs two Docker daemons: System Docker, which runs system utilities, and User Docker, which supports your application containers. System Docker is launched as PID 1, replacing traditional init systems, and it then starts User Docker, the instance of the Docker daemon that runs user application containers.

Image credit: https://github.com/rancher/os

Below is an AWS CloudFormation template intended to make evaluating Rancher OS on ECS very straightforward. The template runs in the us-west-1 region, creates a new VPC with a public subnet, and configures an Auto Scaling group of Rancher OS instances. We’ll also create an ECS cluster called “Rancher Cluster”, and we’ll tell the EC2 instances to join this cluster by providing the cluster name in the Auto Scaling launch configuration user data script. Be aware that this template will create resources in your account and launch two t2.micro instances (Free Tier eligible!). The template is here.

Once the cluster is up and running, you can use the ECS APIs to start containers on the cluster. If you’re new to running tasks on ECS, take a look at our getting started guide here:

http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_GetStarted.html
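
As a rough sketch of that first interaction (not part of the CloudFormation template itself), the AWS CLI commands below register a minimal task definition and run it on the cluster; the cluster name, task family, and container image are placeholder values:

# Confirm the Rancher OS instances have registered with the cluster (cluster name is a placeholder)
aws ecs list-container-instances --cluster rancher-cluster

# Register a minimal task definition that runs a stock nginx container
aws ecs register-task-definition --family hello-nginx \
    --container-definitions '[{"name":"nginx","image":"nginx","memory":128,"cpu":128,"essential":true}]'

# Run a single copy of the task on the cluster
aws ecs run-task --cluster rancher-cluster --task-definition hello-nginx --count 1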

We also recommend that you check out the video below from our re:Invent 2014 session, “Amazon EC2 Container Service in Action”:

https://www.youtube.com/watch?v=2vJLS8qfhI0

Here at AWS, we’re excited to continue to support customers with diverse requirements and workloads, especially in the rapidly changing EC2 Container Service ecosystem, and we’re happy to have a network of APN Partners with software that helps our customers take full advantage of the cloud. We see Rancher OS as a great option to help our customers run containers on Amazon ECS with minimal overhead. If you’re looking for a minimal and container-focused OS distribution to run containers at scale, check out Rancher OS with Amazon ECS today.

Learn more about Rancher Labs here.

 

Machine Learning on the AWS Cloud

The following is a guest post from one of our APN SAs. It is an introductory, high-level post intended to help APN Partners familiarize themselves with the concept of machine learning and to learn more about the use cases that can be supported using Amazon Machine Learning.

Introduction

There can be tremendous amounts of information buried within gigabytes of your data, including web site visitor metrics, sales information, and email campaign responses, to name a few. How do you tap into that information to make informed business decisions? Is there a way an organization can take advantage of its existing repositories of data to predict the choices customers may make in the future?

Machine learning (ML) can help you use historical data to make better business decisions. ML algorithms discover patterns in data, and construct mathematical models using these discoveries. With machine learning, you can use these models to make predictions on future data. For example, one possible application of a machine learning model would be to predict how likely a customer is to purchase a particular product based on their past behavior.

Smart Applications

Machine learning is the technology that can find patterns in data and use them to make predictions for new data points as they become available. A simplistic definition of a smart application:

Your data + machine learning = smart applications

Smart applications can predict future user actions based on past actions. For example, based on what it knows about the user, a smart application can predict whether the user will make a purchase. Many banks use this concept to warn a user if their login pattern changes; it’s not uncommon on retail banking websites to see a warning whenever a user tries to log in from a different location or computer. Another example is the specific recommendations made to users by a website; a number of e-commerce and news aggregation websites recommend products or news stories that might be of interest to the user.

The science of machine learning provides the mathematical underpinnings needed to run the analysis and to make sense of the results.  It can help you turn your data into high-quality predictions by finding and codifying patterns and relationships within the data.

What is Amazon Machine Learning?

Amazon Machine Learning is a service that makes it easy for developers of all skill levels to use machine learning technology, based on the same proven, highly scalable ML technology used for years by Amazon’s internal data scientist community. Amazon Machine Learning allows you to easily build predictive applications, including fraud detection, demand forecasting, and click prediction. It uses powerful algorithms that can help you create machine learning models by finding patterns in existing data and using these patterns to make predictions from new data as it becomes available.

You can use Amazon Machine Learning through the AWS Management Console and access the data and model visualization tools, as well as wizards, to guide you through the process of creating machine learning models, measuring their quality, and fine-tuning the predictions to match your application requirements. Once the models are created, you can get predictions for your application by using the simple Amazon Machine Learning API, without having to implement custom prediction generation code or manage any infrastructure.
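
As a small, hedged sketch of that API from the AWS CLI, the commands below create a real-time endpoint for a model and then request a single prediction; the model ID, endpoint URL, and record attributes are placeholders that depend entirely on your own datasource and model:

# One-time setup: create a real-time prediction endpoint for the model (model ID is a placeholder)
aws machinelearning create-realtime-endpoint --ml-model-id ml-exampleModelId

# Request a single real-time prediction (endpoint URL and record attributes are placeholders)
aws machinelearning predict --ml-model-id ml-exampleModelId \
    --predict-endpoint https://realtime.machinelearning.us-east-1.amazonaws.com \
    --record '{"customerAge":"34","lastPurchaseAmount":"79.99"}'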

Amazon Machine Learning is highly scalable, and can generate billions of predictions, and serve those predictions in real-time and at high throughput. With Amazon Machine Learning there is no setup cost and you pay as you go, so you can start small and scale as your application grows.

Popular Amazon ML Use Cases

There are a number of use cases for which Machine Learning is a good fit. For APN Partners, I recommend that you consider how smart applications may enhance the value you’re able to provide for your customers on AWS in the following areas, which are outlined on our main Amazon Machine Learning page in more detail: Fraud Detection, Content Personalization, Propensity Modeling for Marketing Campaigns, Document Classification, Customer Churn Prediction, and Automated Support Recommendation for Customer Support.

 

To find out more about Amazon Machine Learning, visit the service web pages and get started building your first predictive model today.

New AWS Support for Commercially-Supported Docker Applications: Docker Trusted Registry and Docker Engine

The AWS cloud is a natural complement to the flexibility that Docker containers offer organizations, and today Amazon EC2 and Amazon ECS are very popular places to launch and run Docker containers. Customers continue to expand their container footprint and move their applications from dev to test to production, and look for enhanced support and additional product offerings as they embrace the AWS cloud as a place to run Docker containers. At DockerCon 2015 in San Francisco, we discussed work done by both teams to better support Docker on AWS for our customers, and today we take another step toward supporting those who wish to run Docker exclusively on AWS by announcing support for Docker Trusted Registry in AWS Marketplace. Customers can go from building a Docker application locally on a developer’s laptop to shipping it to their production Amazon Virtual Private Cloud (Amazon VPC) with just a few commands.

Like Docker Hub, Docker Trusted Registry (DTR) is a solution that allows organizations to store and manage Docker containers. However, DTR can be run as an EC2 instance, allowing complete control over how and where the registry is available and accessed from within your environment.

Configuring Your AWS Environment for Docker Trusted Registry

By running Docker Trusted Registry, organizations are able to create custom levels of access control to their Docker images. Certain components of this access control model include support for customer SSL certificates, LDAP integration to limit access to specific users, and leveraging the network access control capabilities of Amazon VPC.

Amazon VPC allows you to configure network settings and isolate cloud resources as much as necessary to meet security or compliance standards. In the case of DTR, we recommend first deciding if your registry instance should be accessible from the Internet or only from within your VPC. If the instance should be available from the Internet, you can launch the DTR instance into a public subnet. However, take care to configure the security group to allow access only from specific trusted IP ranges over ports 22 (SSH), 80 (HTTP), and 443 (HTTPS).

Please note that when DTR is launched from AWS Marketplace, the default security group is open to the world, so it’s up to you to restrict access to the IP ranges appropriate for your environment.
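
A hedged sketch of tightening that default with the AWS CLI follows; the security group ID and CIDR range are placeholders for your own values:

# Allow SSH and HTTPS only from a trusted corporate range (group ID and CIDR are placeholders)
aws ec2 authorize-security-group-ingress --group-id sg-1a2b3c4d --protocol tcp --port 22 --cidr 203.0.113.0/24
aws ec2 authorize-security-group-ingress --group-id sg-1a2b3c4d --protocol tcp --port 443 --cidr 203.0.113.0/24

# Remove any wide-open rule left in place by the default security group
aws ec2 revoke-security-group-ingress --group-id sg-1a2b3c4d --protocol tcp --port 443 --cidr 0.0.0.0/0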

The other option is to place your DTR instance into a private subnet so that only resources within your network are able to access the registry. In this case, you’ll need to ensure that you have either a bastion host set up or a VPN into your VPC so you can manage the DTR instance via the web GUI.

We recommend using an Amazon Route 53 private hosted zone with Docker Trusted Registry. A private hosted zone is always queried first by instances in your VPC and is only accessible from within your VPC, so it gives you the convenience of choosing the endpoint you will use to interact with your registry. This DNS name is what you’ll reference when pushing and pulling images from your registry, so choose something that makes sense. Here we’ll use dtr.mydomain.com as an A record that points to our DTR instance’s IP address.
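
A minimal sketch of creating that zone and record with the AWS CLI is shown below; the domain, VPC ID, hosted zone ID, and private IP address are all placeholders:

# Create a private hosted zone associated with the VPC (associating a VPC makes the zone private)
aws route53 create-hosted-zone --name mydomain.com \
    --vpc VPCRegion=us-east-1,VPCId=vpc-1a2b3c4d \
    --caller-reference dtr-zone-$(date +%s)

# Point dtr.mydomain.com at the DTR instance's private IP (zone ID and address are placeholders)
aws route53 change-resource-record-sets --hosted-zone-id Z1EXAMPLE12345 \
    --change-batch '{"Changes":[{"Action":"CREATE","ResourceRecordSet":{"Name":"dtr.mydomain.com","Type":"A","TTL":300,"ResourceRecords":[{"Value":"10.0.1.50"}]}}]}'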

Because DTR leverages Amazon S3 for back-end storage of your Docker images, we recommend creating an IAM role that will allow your instance to communicate securely with S3. IAM roles are assigned to EC2 instances at instance launch. Here we assume that there is a dedicated bucket for our Docker images, and we can scope our IAM policy accordingly:


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::my_DTR_bucket",
                "arn:aws:s3:::my_DTR_bucket/*"
            ]
        }
    ]
}
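
To wire that policy up to the instance, a sketch along the following lines should work with the AWS CLI; the role, policy, and instance profile names are arbitrary choices for this example, and the policy above is assumed to be saved locally as dtr-s3-policy.json:

# Trust policy that lets EC2 assume the role
cat > ec2-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Principal": { "Service": "ec2.amazonaws.com" }, "Action": "sts:AssumeRole" }
  ]
}
EOF

# Create the role, attach the S3 policy shown above, and expose it to EC2 via an instance profile
aws iam create-role --role-name dtr-s3-role --assume-role-policy-document file://ec2-trust.json
aws iam put-role-policy --role-name dtr-s3-role --policy-name dtr-s3-access --policy-document file://dtr-s3-policy.json
aws iam create-instance-profile --instance-profile-name dtr-s3-profile
aws iam add-role-to-instance-profile --instance-profile-name dtr-s3-profile --role-name dtr-s3-role

When you launch the DTR instance, attach dtr-s3-profile as the instance's IAM role (for example, with the --iam-instance-profile Name=dtr-s3-profile option of aws ec2 run-instances).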

Once you’ve decided on a VPC, how to scope access to your registry, created an IAM role, and decided on a DNS record to use, you’re ready to move forward with setting up the DTR instance itself.

Setting Up Supported Docker Environments on AWS

To begin, we’ll be using the Docker Trusted Registry “pay as you go” AMI from AWS Marketplace. This licensing model is intended to simplify the deployment experience. To further enhance the experience, Docker has included a 30-day free trial of its software. The details are provided on the product page in AWS Marketplace, which is listed here: https://aws.amazon.com/marketplace/pp/B014VG1SIG

Once you’ve launched the AMI, you can follow the AWS and Docker Trusted Registry guide to configure the DTR instance: https://docs.docker.com/docker-trusted-registry/install/dtr-ami-byol-launch/ 

When launching your instance, you’ll need to choose an appropriate instance size. Docker recommends an m3.large instance for initial test deployments. As your environment grows, you can use the monitoring features built into the Docker Trusted Registry web GUI to keep an eye on resource utilization and scale your instance size as needed.

Once the DTR instance is up and running, you’ll also need to launch Docker Engine instances (instances running the commercially-supported version of Docker). You can find AMIs to launch Docker Engine instances here: https://aws.amazon.com/marketplace/pp/B014VG1R4Q

One thing to note about Docker Engine instance configuration: if you’re using a self-signed certificate, you’ll also have to configure your clients to pull the certificate from the DTR instance. This can be done using the following commands passed as a user data script:


#!/bin/bash
export DOMAIN_NAME=dtr.mydomain.com
openssl s_client -connect $DOMAIN_NAME:443 -showcerts </dev/null | openssl x509 -outform PEM | sudo tee /etc/pki/ca-trust/source/anchors/$DOMAIN_NAME.crt
sudo update-ca-trust
sudo service docker restart

 

This process depends on your OS, so check here for more comprehensive detail: https://docs.docker.com/docker-trusted-registry/configuration/#installing-registry-certificates-on-client-docker-daemons

Once your Docker Engine client(s) are launched, you can begin interacting with the DTR instance, pushing and pulling images to and from your own private registry from another EC2 instance within your AWS VPC, a peer VPC, or remote location connected via VPN.

From Developer Desktops Direct to the Cloud

Continuous Integration and Delivery is a critical workflow for many teams. Docker supports a number of CI/CD tools, like AWS CodePipeline and AWS CodeDeploy, and a number of deployment endpoints, like Amazon EC2 or Amazon ECS. Docker Trusted Registry can serve as the foundation of these automated workflows that can take code from a developer’s desktop, through integration and unit testing, to a staging or QA environment, and finally to production deployment.

In order to understand how to interact with DTR at the most foundational level, we’ll examine a basic Docker image workflow that can provide the baseline understanding necessary to build more complex CI/CD workflows later.

We’ll first need a client machine that is configured to interact with the DTR instance.

First, pull the public Jenkins image: docker pull jenkins

Next, tag the image for your registry: docker tag jenkins dtr.mydomain.com/my-jenkins

Finally, push the image to your local DTR instance: docker push dtr.mydomain.com/my-jenkins

A robust and scalable CI pipeline can be built with Docker and Jenkins on AWS to take code from your developers’ laptops directly into an integration testing cluster on AWS. Code pushed to a repository like GitHub can trigger automatic container builds using Jenkins, and the resulting container image can be pushed to your Docker Trusted Registry instance. This image can then be tested in QA, and ultimately rolled out to production using AWS services like AWS CodeDeploy or AWS Elastic Beanstalk.
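
As a hedged illustration of the build-and-push stage of such a pipeline, a Jenkins shell build step might look roughly like the following; the image name, registry hostname, and use of Jenkins' BUILD_NUMBER variable are assumptions made for this example:

#!/bin/bash
# Build the application image from the checked-out workspace and tag it for the private registry
# (dtr.mydomain.com and myapp are placeholder names)
docker build -t dtr.mydomain.com/myapp:${BUILD_NUMBER} .

# Push the image to Docker Trusted Registry so downstream QA and production stages can pull it
docker push dtr.mydomain.com/myapp:${BUILD_NUMBER}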

We encourage you to take a look at the new Docker commercially-supported software in AWS Marketplace today. We hope the above information gets you started, and we’d love to hear your feedback.

For additional video tutorials, resources, and more, visit the Docker and AWS Resource Center: https://docker.com/aws

The Road to Modern Ops, a New Curriculum from APN Partner HashiCorp

AWS is excited to announce the availability of HashiCorp’s Road to Modern Ops, an interactive curriculum dedicated to guiding organizations from manual processes to modern, automated operations. Through these labs, you’ll have the opportunity to provision AWS infrastructure using the HashiCorp products Terraform, Packer, and Atlas.

AWS and HashiCorp

AWS provides a flexible and elastic cloud computing platform that facilitates API-driven infrastructure as code, allowing development and operations teams to work more closely together. HashiCorp, a member of the AWS Partner Network (APN), has built a variety of tooling around the AWS APIs (among others) that allows customers to provision cloud infrastructure in a repeatable fashion.

What is the Road to Modern Ops Curriculum All About?

This educational series covers the full spectrum of HashiCorp automation tools, but two labs in particular are focused on highlighting AWS functionality.  The first of these labs, “Automate provisioning with Terraform”, teaches students how to use Terraform by HashiCorp to build AWS resources like Amazon Virtual Private Clouds (VPCs), Amazon EC2 instances, and Elastic Load Balancers. The declarative syntax used by Terraform configuration files represents infrastructure as code, which enables repeatable deployments of your production environment, better visibility into the relationships between different components and systems, and rapid recovery from failures. This lab is available now here.
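
The lab has its own configurations, but a minimal sketch of what Terraform's declarative syntax looks like for a single EC2 instance is shown below; the region and AMI ID are placeholders, not values from the lab:

# Write a minimal Terraform configuration and apply it (AMI ID and region are placeholders)
cat > main.tf <<'EOF'
provider "aws" {
  region = "us-west-2"
}

resource "aws_instance" "web" {
  ami           = "ami-1a2b3c4d"
  instance_type = "t2.micro"
}
EOF

terraform plan    # preview the changes Terraform would make
terraform apply   # create the instance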

The second lab uses Packer by HashiCorp to automate the provisioning and configuration of Amazon EC2 AMIs. The AMIs built by Packer can be used to stand up fully configured EC2 instances into the supporting infrastructure provisioned during the Terraform lesson. By using Packer to move the configuration of systems to before the deploy stage, rather than after, organizations can take a major step toward enabling immutable infrastructure and ultimately ensuring a consistent configuration of resources in their production environments. This lab is available now here.
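
Again, as a sketch rather than the lab's actual template, a bare-bones Packer template for baking an AMI might look like this; the source AMI, region, and provisioning commands are placeholders:

# Write a minimal Packer template and build the AMI (source AMI and region are placeholders)
cat > web-ami.json <<'EOF'
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-west-2",
    "source_ami": "ami-1a2b3c4d",
    "instance_type": "t2.micro",
    "ssh_username": "ec2-user",
    "ami_name": "web-baked-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": ["sudo yum -y install httpd", "sudo chkconfig httpd on"]
  }]
}
EOF

packer validate web-ami.json
packer build web-ami.json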

The labs use HashiCorp’s Atlas to run Terraform and Packer. By running Terraform and Packer within Atlas, development and operations teams can automate, audit, and collaborate on infrastructure changes.

Why You Should Consider Signing up for the HashiCorp Curriculum

As you work through this curriculum, you’ll be introduced to the concepts of immutable infrastructure and automated provisioning, two practices that leverage the flexibility and elasticity of the AWS cloud, and you’ll benefit from hands-on experience using what our customers tell us are very effective tools for interacting with AWS Services.

Want to learn more about HashiCorp? Visit the company’s website here.

Getting the Most out of the Amazon S3 CLI

Editor’s note: this is a co-authored guest post from Scott Ward and Michael Ruiz, Solutions Architects with the APN. 

Amazon Simple Storage Service (Amazon S3) makes it possible to store unlimited numbers of objects, each up to 5 TB in size. Managing resources at this scale requires quality tooling. When it comes time to upload many objects, a few large objects or a mix of both, you’ll want to find the right tool for the job. Today we will take a look at one option that is sometimes overlooked: the AWS Command Line Interface (AWS CLI) for Amazon S3.

Note: Some of the examples in this post take advantage of more advanced features of the Linux/UNIX command line environment and the bash shell. We included all of these steps for completeness, but won’t spend much time detailing the mechanics of the examples in order to keep the post at reasonable length.

What is Amazon S3?

Amazon S3 is a global online object store and has been a core AWS service offering since 2006. Amazon S3 was designed for scale: it currently stores trillions of objects with peak loads measured in millions of requests per second. The service is designed to be cost-effective (you pay only for what you use), durable, and highly available. See the Amazon S3 product page for more information about these and other features.

Data uploaded to Amazon S3 is stored as objects in containers called buckets and identified by keys. Buckets are associated with an AWS region and each bucket is identified with a globally unique name. See the S3 Getting Started guide for a typical Amazon S3 workflow.

Amazon S3 supports workloads as diverse as static website hosting, online backup, online content repositories, and big data processing, but integrating Amazon S3 into an existing on-premises or cloud environment can be challenging. While there is a rich landscape of tooling available from AWS partners and open-source communities, a great place to start your search is the AWS CLI for Amazon S3.

The AWS Command Line Interface (AWS CLI)

The AWS CLI is an open source, fully supported, unified tool that provides a consistent interface for interacting with all parts of AWS, including Amazon S3, Amazon Elastic Compute Cloud (Amazon EC2), Amazon Virtual Private Cloud (Amazon VPC), and other services.  General information about the AWS CLI can be found in the AWS CLI User Guide.

In this post we focus on the aws s3 command set in the AWS CLI. This command set is similar to standard network copy tools you might already be familiar with, like scp or rsync, and is used to copy, list, and delete Amazon S3 buckets and objects. This tool supports the key features required for scaled operations with Amazon S3, including multipart parallelized uploads, automatic pagination for queries that return large lists of objects, and tight integration with AWS Identity and Access Management (IAM) and Amazon S3 metadata.

The AWS CLI also provides the aws s3api command set, which exposes more of the unique features of Amazon S3 and provides access to bucket metadata, like lifecycle policies designed to migrate or delete data automatically.

There are two pieces of functionality built into the AWS CLI for Amazon S3 tool that help make large transfers (many files and large files) into Amazon S3 go as quickly as possible:

First, if the files are over a certain size, the AWS CLI automatically breaks the files into smaller parts and uploads them in parallel. This is done to improve performance and to minimize impact due to network errors.  Once all the parts are uploaded, Amazon S3 assembles them into a single object. See the Multipart Upload Overview for much more data on this process, including information on managing incomplete or unfinished multipart uploads.

Second, the AWS CLI automatically uses up to 10 threads to upload files or parts to Amazon S3, which can dramatically speed up the upload.

These two pieces of functionality can support the majority of your data transfer requirements, eliminating the need to explore other tools or solutions.
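
Depending on your AWS CLI version, the thresholds and concurrency behind this behavior can also be tuned through the CLI's Amazon S3 configuration settings; the values below are illustrative only, not recommendations:

# Raise the number of parallel requests and adjust when and how multipart uploads kick in
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_threshold 64MB
aws configure set default.s3.multipart_chunksize 16MB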

For more information on installation, configuration, and usage of the AWS CLI and the s3 commands, see the AWS CLI documentation.

AWS S3 Data Transfer Scenarios

Let’s take a look at using the AWS CLI for Amazon S3 in the following scenarios and dive into some details of the Amazon S3 mechanisms in play, including parallel copies and multipart uploads.

  • Example 1: Uploading a large number of very small files to Amazon S3
  • Example 2: Uploading a small number of very large files to Amazon S3
  • Example 3: Periodically synchronizing a directory that contains a large number of small and large files that change over time
  • Example 4: Improving data transfer performance with the AWS CLI

Environment Setup

The source server for these examples is an Amazon EC2 m3.xlarge instance located in the US West (Oregon) region. This server is well equipped with 4 vCPUs and 15 GB RAM, and we can expect a sustained throughput of about 1 Gb/sec over the network interface to Amazon S3. This instance will be running the latest Amazon Linux AMI (Amazon Linux AMI 2015.03, HVM).

The example data will reside in an Amazon EBS 100 GB General Purpose (SSD) volume, which is an SSD-based, network-attached block storage device attached to the instance as the root volume.

The target bucket is located in the US East (N. Virginia) region. This is the region you will specify for buckets created using default settings or when specifying us-standard as the bucket location. Buckets have no maximum size and no object-count limit.

All commands that are represented in this document are run from the bash command line.  All command-line instructions will be represented by a $ as the starting point for the command.

We will be using the aws s3 command set throughout the examples. Here is an explanation for several common commands and options used in these examples:

  • The cp command initiates a copy operation to or from Amazon S3.
  • The --recursive option instructs the AWS CLI for Amazon S3 to descend into subdirectories on the source.
  • The --quiet option instructs the AWS CLI for Amazon S3 to print only errors rather than a line for each file copied.
  • The sync command instructs the AWS CLI for Amazon S3 to synchronize a local directory with an Amazon S3 location, copying only files that are new or have changed.
  • The Linux time command is used with each AWS CLI call in order to get statistics on how long the command took.
  • The Linux xargs command is used to invoke other commands based on standard output or output piped to it from other commands.

Example 1 – Uploading a large number of small files

In this example we are going to simulate a fairly difficult use case: moving thousands of little files distributed across many directories to Amazon S3 for backup or redistribution. The AWS CLI can perform this task with a single command, aws s3 cp --recursive, but we will show the entire example protocol for clarity. This example will utilize the multithread upload functionality of the aws s3 commands.

  1. Create the 26 directories named for each letter of the alphabet, then create 2048 files containing 32K of pseudo-random content in each
$ for i in {a..z}; do
    mkdir $i
    seq -w 1 2048 | xargs -n1 -P 256 -I % dd if=/dev/urandom of=$i/% bs=32k count=1
done
  2. Confirm the number of files we created for later verification:
$ find . -type f | wc -l
53248
  3. Copy the files to Amazon S3 by using aws s3 cp, and time the result with the time command:
$ time aws s3 cp --recursive --quiet . s3://test_bucket/test_smallfiles/

real    19m59.551s
user    7m6.772s
sys     1m31.336s

 

The time command returns the ‘real’ or ‘wall clock’ time the aws s3 cp took to complete. Based on the real output value from the time command, the example took 20 minutes to complete the copy of all directories and the files in those directories.

Notes:

  • Our source is the current working directory (.) and the destination is s3://test_bucket/test_smallfiles.
  • The destination bucket is s3://test_bucket.
  • The destination prefix is test_smallfiles/. Note that this is not a directory in the usual sense, but rather a key prefix that will be prepended to the file name of each object to build the final key name.

TIP:

In many real-world scenarios, the naming convention you use for your Amazon S3 objects will have performance implications.  See this blog post and this document for details about object key naming strategies that will ensure high performance as you scale to hundreds or thousands of requests per second.

  4. We used the Linux lsof command to capture the number of open connections on port 443 while the above copy (cp) command was running:
$ lsof -i tcp:443
COMMAND   PID     USER   FD   TYPE DEVICE SIZE/OFF NODE NAME

aws     22223 ec2-user    5u  IPv4 119954      0t0  TCP ip-10-0-0-37.us-west-2.compute.internal:48036->s3-1-w.amazonaws.com:https (ESTABLISHED)

aws     22223 ec2-user    7u  IPv4 119955      0t0  TCP ip-10-0-0-37.us-west-2.compute.internal:48038->s3-1-w.amazonaws.com:https (ESTABLISHED)

<SNIP>

aws     22223 ec2-user   23u  IPv4 118926      0t0  TCP ip-10-0-0-37.us-west-2.compute.internal:46508->s3-1-w.amazonaws.com:https (ESTABLISHED)

...10 open connections

 

You may be surprised to see there are 10 open connections to Amazon S3 even though we are only running a single instance of the copy command (we truncated the output for clarity, but there were ten connections established to the Amazon S3 endpoint  ‘s3-1-w.amazonaws.com’). This demonstrates the native parallelism built into the AWS CLI.

Here is an example of a similar command that gives us the count of open threads directly:

$ lsof -i tcp:443 | tail -n +2 | wc -l

10

 

  5. Let's also peek at the CPU load during the copy operation:
$ mpstat -P ALL 10
Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-37)     05/04/2015     _x86_64_    (4 CPU)

<SNIP>
09:43:18 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
09:43:19 PM  all    6.33    0.00    1.27    0.00    0.00    0.00    0.51    0.00   91.90
09:43:19 PM    0   14.14    0.00    3.03    0.00    0.00    0.00    0.00    0.00   82.83
09:43:19 PM    1    6.06    0.00    2.02    0.00    0.00    0.00    0.00    0.00   91.92
09:43:19 PM    2    2.04    0.00    0.00    0.00    0.00    0.00    1.02    0.00   96.94
09:43:19 PM    3    2.02    0.00    0.00    0.00    0.00    0.00    1.01    0.00   96.97

 

The system is not seriously stressed given the small file sizes involved. Overall, the CPU is 91.90% idle. There is essentially no %iowait and only a small amount of %sys and %usr activity, so the little CPU time that is being used is going to running the AWS CLI and handling file metadata rather than waiting on disk I/O.

6. Finally, let’s use the aws s3 ls command to list the files we moved to Amazon S3 and get a count to confirm that the copy was successful:

$ aws s3 ls --recursive s3://test_bucket/test_smallfiles/ | wc -l
53248

This is the expected result: 53,248 files were uploaded, which matches the local count in step 2.

Summary:

Example 1 took 20 minutes to move 53,248 files at a rate of 44 files/sec (53,248 files / 1,200 seconds to upload) using 10 parallel streams.

Example 2 – Uploading a small number of large files

In this example we will create five 2-GB files and upload them to Amazon S3. While the previous example stressed operations per second (both on the local system and in operating the aws s3 upload API), this example will stress throughput. Note that while Amazon S3 could store each of these files in a single part, the AWS CLI for Amazon S3 will automatically take advantage of the S3 multipart upload feature.  This feature breaks each file into a set of multiple parts and parallelizes the upload of the parts to improve performance.

  1. Create five files filled with 2 GB of pseudo-random content:
$ seq -w 1 5 | xargs -n1 -P 5 -I % dd if=/dev/urandom of=bigfile.% bs=1024k count=2048

Since we are writing 10 GB to disk, this command will take some time to run.

  2. List the files to verify size and number:
$ du -sk .
10485804

$ find . -type f | wc -l
5

This is showing that we have 10 GB (10,485,804 KB) of data in 5 files, which matches our goal of creating five files of 2 GB each.

  3. Copy the files to Amazon S3:
$ time aws s3 cp --recursive --quiet . s3://test_bucket/test_bigfiles/

real    1m48.286s
user    1m7.692s
sys     0m26.860s

Notes:

  • Our source prefix is the current working directory (.) and the destination is s3://test_bucket/test_bigfiles.
  • The destination bucket is s3://test_bucket.
  • The destination prefix is test_bigfiles/. Note that this is not a directory in the usual sense, but rather a key prefix that will be prepended to the file name of each object to build the final key name.
  4. We again capture the number of open connections on port 443 while the copy command is running to demonstrate the parallelism built into the AWS CLI for Amazon S3:
$ lsof -i tcp:443 | tail -n +2 | wc -l
10

Looks like we still have 10 connections open. Even though we only have 5 files, we are breaking each file into multiple parts and uploading them in 10 individual streams.

  5. Capture the CPU load:
$ mpstat -P ALL 10
Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-37)     05/04/2015  _x86_64_    (4 CPU)

<SNIP>
10:35:47 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
10:35:57 PM  all    6.30    0.00    3.57   76.51    0.00    0.17    0.75    0.00   12.69
10:35:57 PM    0    8.15    0.00    4.37   75.21    0.00    0.71    1.65    0.00    9.92
10:35:57 PM    1    5.14    0.00    3.20   75.89    0.00    0.00    0.46    0.00   15.31
10:35:57 PM    2    4.56    0.00    2.85   75.17    0.00    0.00    0.46    0.00   16.97
10:35:57 PM    3    7.53    0.00    3.99   79.36    0.00    0.00    0.57    0.00    8.55
 

This is a much more serious piece of work for our instance: We see around 70-80% iowait (where the CPU is sitting idle, waiting for disk I/O) on every core. This hints that we are reaching the limits of our I/O subsystem, but also demonstrates a point to consider: The AWS CLI for Amazon S3, by default and working with large files, is a powerful tool that can really stress a moderately powered system.

6. Check our count of the number of files moved to Amazon S3 to confirm that the copy was successful:

$ aws s3 ls --recursive s3://test_bucket/test_bigfiles/ | wc -l
5

7. Finally, let’s use the aws s3api command to examine the object head metadata on one of the files we uploaded.

$ aws s3api head-object --bucket test_bucket --key test_bigfiles/bigfile.1
bytes   2147483648      binary/octet-stream     "9d071264694b3a028a22f20ecb1ec851-256"    Thu, 07 May 2015 01:54:19 GMT
 

  • The 4th field in the command output is the ETag (opaque identifier), which contains an optional ‘-’ if the object was uploaded with multiple parts. In this case we see that the ETag ends with ‘-256’ indicating that the s3 cp command split the upload into 256 parts. Since all the parts but the last are of the same size, a little math tells us that each part is 8 MB in size.
  • The AWS CLI for Amazon S3 is built to optimize upload and download operations while respecting Amazon S3 part sizing rules. The Amazon S3 minimum part size (5 MB, except for the last part, which can be smaller), the maximum part size (5 GB), and the maximum number of parts (10,000) are described in the Amazon S3 Quick Facts documentation.

Summary:

In example 2, we moved five 2-GB files to Amazon S3 in 10 parallel streams. The operation took 1 minute and 48 seconds. This represents an aggregate data rate of ~758 Mb/s (85,899,706,368 bits in 108 seconds), about 80% of the maximum bandwidth available on our host.

Example 3 – Periodically synchronizing a directory that contains a large number of small and large files that change over time

In this example, we will keep the contents of a local directory synchronized with an Amazon S3 bucket using the aws s3 sync command. The rules aws s3 sync will follow when deciding when to copy a file are as follows: “A local file will require uploading if the size of the local file is different than the size of the s3 object, the last modified time of the local file is newer than the last modified time of the s3 object, or the local file does not exist under the specified bucket and prefix.” See the command reference for more information about these rules and additional arguments available to modify these behaviors.

This example will use multipart upload and parallel upload threads.

  1. Let’s make our example files a bit more complicated and use a mix of file sizes (warning: inelegant hackery imminent):
$ i=1;
while [[ $i -le 132000 ]]; do
    num=$((8192*4/$i))
    [[ $num -ge 1 ]] || num=1
    mkdir -p randfiles/$i
    seq -w 1 $num | xargs -n1 -P 256 -I % dd if=/dev/urandom of=randfiles/$i/file_$i.% bs=16k count=$i;
    i=$(($i*2))
done
 

 

  2. Check our work by getting file sizes and file counts:
$ du -sh randfiles/
12G     randfiles/
$ find ./randfiles/ -type f | wc -l
65537

So we have 65,537 files totaling 12 GB in size to sync.

  3. Upload to Amazon S3 using the aws s3 sync command:
$ time aws s3 sync --quiet . s3://test_bucket/test_randfiles/
real    26m41.194s
user    10m7.688s
sys     2m17.592s
 

Notes:

  • Our source prefix is the current working directory (.) and the destination is s3://test_bucket/test_randfiles/.
  • The destination bucket is s3://test_bucket.
  • The destination prefix is test_randfiles/. Note that this is not a directory in the usual sense, but rather a key prefix that will be prepended to the file name of each object to build the final key name.
  4. We again capture the number of open connections while the sync command is running to demonstrate the parallelism built into the AWS CLI for Amazon S3:
$ lsof -i tcp:443 | tail -n +2 | wc -l
10

  5. Let's check the CPU load. We are only showing one sample interval, but the load will vary more than in the other runs as the AWS CLI for Amazon S3 deals with files of varying sizes:
$ mpstat -P ALL 10
Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-37)     05/07/2015  _x86_64_    (4 CPU)

03:08:50 AM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
03:09:00 AM  all    6.23    0.00    1.70    1.93    0.00    0.08    0.31    0.00   89.75
03:09:00 AM    0   14.62    0.00    3.12    2.62    0.00    0.30    0.30    0.00   79.03
03:09:00 AM    1    3.15    0.00    1.22    0.41    0.00    0.00    0.31    0.00   94.91
03:09:00 AM    2    3.06    0.00    1.02    0.31    0.00    0.00    0.20    0.00   95.41
03:09:00 AM    3    4.00    0.00    1.54    4.41    0.00    0.00    0.31    0.00   89.74
 

  6. Let's run a quick count to verify that the synchronization is complete:
$ aws s3 ls --recursive s3://test_bucket/test_randfiles/  | wc -l
65537
 

Looks like all the files have been copied!

  7. Now we'll make some changes to our source directory:

With this command we are touching eight existing files to update the modification time (mtime) and creating a directory containing five new files.

$ touch 4096/*
$ mkdir 5_more
$ seq -w 1 5 | xargs -n1 -P 5 -I % dd if=/dev/urandom of=5_more/5_more% bs=1024k count=5

$ find . -type f -mmin -10
.
./4096/file_4096.8
./4096/file_4096.5
./4096/file_4096.3
./4096/file_4096.6
./4096/file_4096.4
./4096/file_4096.1
./4096/file_4096.7
./4096/file_4096.2
./5_more/5_more1
./5_more/5_more4
./5_more/5_more2
./5_more/5_more3
./5_more/5_more5
 

  8. Rerun the sync command. This will compare the source and destination files and upload any changed files to Amazon S3:
$ time aws s3 sync . s3://test_bucket/test_randfiles/
upload: 4096/file_4096.1 to s3://test_bucket/test_randfiles/4096/file_4096.1
upload: 4096/file_4096.2 to s3://test_bucket/test_randfiles/4096/file_4096.2
upload: 4096/file_4096.3 to s3://test_bucket/test_randfiles/4096/file_4096.3
upload: 4096/file_4096.4 to s3://test_bucket/test_randfiles/4096/file_4096.4
upload: 4096/file_4096.5 to s3://test_bucket/test_randfiles/4096/file_4096.5
upload: 4096/file_4096.6 to s3://test_bucket/test_randfiles/4096/file_4096.6
upload: 4096/file_4096.7 to s3://test_bucket/test_randfiles/4096/file_4096.7
upload: 5_more/5_more3 to s3://test_bucket/test_randfiles/5_more/5_more3
upload: 5_more/5_more5 to s3://test_bucket/test_randfiles/5_more/5_more5
upload: 5_more/5_more4 to s3://test_bucket/test_randfiles/5_more/5_more4
upload: 5_more/5_more2 to s3://test_bucket/test_randfiles/5_more/5_more2
upload: 5_more/5_more1 to s3://test_bucket/test_randfiles/5_more/5_more1
upload: 4096/file_4096.8 to s3://test_bucket/test_randfiles/4096/file_4096.8

real    1m3.449s
user    0m31.156s
sys     0m3.620s
 

Notice that only the touched and new files were transferred to Amazon S3.

Summary:

This example shows the result of running the sync command to keep local and remote Amazon S3 locations synchronized over time. Synchronizing can be much faster than creating a new copy of the data in many cases.

Example 4 – Maximizing throughput

When you’re transferring data to Amazon S3, you might want to do more or go faster than we’ve shown in the three previous examples.  However, there’s no need to look for another tool—there is a lot more you can do with the AWS CLI to achieve maximum data transfer rates.  In our final example, we will demonstrate running multiple commands in parallel to maximize throughput.

In the first example we uploaded a large number of small files and achieved a rate of 44 files/sec.  Let’s see if we can do better by stringing together a few standard Linux commands to control how many copies of the aws s3 cp command run in parallel.

  1. Launch 26 copies of the aws s3 cp command, one per directory:
$ time ( find smallfiles -mindepth 1 -maxdepth 1 -type d -print0 | xargs -n1 -0 -P30 -I {} aws s3 cp --recursive --quiet {}/ s3://test_bucket/{}/ )
real    2m27.878s
user    8m58.352s
sys     0m44.572s
 

    Note how much faster this completed compared with our original example, which took 20 minutes to run.

Notes:

  • The find part of the above command passes a null-delimited list of the subdirectories of the ‘smallfiles’ directory to xargs.
  • xargs launches up to 30 parallel (‘-P30’) invocations of aws s3 cp. Only 26 are actually launched based on the output of the find.
  • xargs replaces the ‘{}’ argument in the aws s3 cp command with the directory name passed from the output of the find command.
  • The destination here is s3://test_bucket/smallfiles/, which is slightly different from example 1.
  2. Note the number of open connections:
$ lsof -i tcp:443 | tail -n +2 | wc -l
260
 

We see 10 connections for each of the 26 invocations of the s3 cp command.

  3. Let’s check system load:
$ mpstat -P ALL 10
Linux 3.14.35-28.38.amzn1.x86_64 (ip-10-0-0-37)     05/07/2015  _x86_64_    (4 CPU)

07:02:49 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
07:02:59 PM  all   91.18    0.00    5.67    0.00    0.00    1.85    0.00    0.00    1.30
07:02:59 PM    0   85.30    0.00    6.50    0.00    0.00    7.30    0.00    0.00    0.90
07:02:59 PM    1   92.61    0.00    5.79    0.00    0.00    0.00    0.00    0.00    1.60
07:02:59 PM    2   93.60    0.00    5.10    0.00    0.00    0.00    0.00    0.00    1.30
07:02:59 PM    3   93.49    0.00    5.21    0.00    0.00    0.00    0.00    0.00    1.30
  

The server is finally doing some useful work! Almost all the time is spent in %usr with very little %idle or %iowait, so we know the CPU is working hard on application logic without much constraint from the storage or network subsystems. Moving to a larger host with more CPU power would likely speed this process up even more.

  4. Verify the file count:
$ aws s3 ls --recursive s3://test_bucket/smallfiles | wc -l
53248
 

Summary:

Using 26 invocations of the command improved the execution time by a factor of 8: 2 minutes 27 seconds for 53,248 files vs. the original run time of 20 minutes. The file upload rate improved from 44 files/sec to 362 files/sec.

Applying similar logic to further parallelize our large-file scenario from example 2 would easily saturate the network bandwidth on the host, as the sketch below illustrates. Be careful when executing these examples: a well-connected host can easily overwhelm the Internet links at your source site!
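As a rough illustration only (the directory name ‘bigfiles’ is a hypothetical stand-in for your own large-file directory, not one used earlier in this post), you could launch one aws s3 cp per large file and let each copy perform its own multipart upload:

# Launch up to 4 single-file copies in parallel; each copy still uses
# the AWS CLI's built-in multipart upload with its own parallel streams.
$ time ( find bigfiles -type f -print0 | \
     xargs -0 -n1 -P4 -I {} aws s3 cp --quiet {} s3://test_bucket/{} )

Tune the -P value carefully; a handful of parallel large-file uploads is usually enough to fill the available bandwidth.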

Conclusion

In this post we demonstrated the use of the AWS CLI for common Amazon S3 workflows. We saw that the AWS CLI for Amazon S3 scaled to 10 parallel streams and enabled multipart uploads automatically. We also demonstrated how to accelerate the tasks with further parallelization by using common Linux CLI tools and techniques.

When using the AWS CLI to upload files to Amazon S3 from a single instance, your limiting factors are generally end-to-end bandwidth to the Amazon S3 endpoint for large file transfers, and host CPU when sending many small files. Depending on your particular environment, your results might differ from our example results. As demonstrated in example 4, there may be an opportunity to go faster if you have the resources to support it. AWS also provides a variety of Amazon EC2 instance types, some of which might provide better results than the m3.xlarge instance type we used in our examples. Finally, networking bandwidth to the public Amazon S3 endpoint is a key consideration for overall performance.

We hope that this post helps illustrate how powerful the AWS CLI can be when working with Amazon S3, but this is just a small part of the story: the AWS CLI can launch Amazon EC2 instances, create new Amazon VPCs, and enable many of the other features of the AWS platform with just as much power and flexibility as it offers for Amazon S3. Have fun exploring!

Securing Web Applications in AWS with Soha

Security is the top priority for Amazon Web Services (AWS) and our customers. AWS handles security with the shared responsibility model. When you are running workloads on AWS, we handle everything from the physical security of our data centers all the way up to the hypervisor. Customers are responsible for building secure applications on AWS, as well as configuring AWS features like security groups or AWS Identity and Access Management (IAM) policies.

Soha Systems is an APN Technology Partner who provides enterprise-grade application security solutions on the customer side of the shared responsibility model. Customers can use Soha to restrict access to applications running in AWS by wrapping them with a secure login in a fully managed package. Soha’s solution complements AWS security features by allowing customers to deploy a line of defense between their applications and the Internet. You can create an Amazon Virtual Private Cloud (VPC) for applications, lock it down so that it allows no inbound access, and use Soha to allow your users to access applications within the Amazon VPC securely.

In this blog post, I will walk you through the steps for setting up this easy-to-use security solution.

Getting Started

Your first step will be to sign up for an account at soha.io. Go to their sign up page and create an account.

Next, stand up a secure VPC with an application that you’d like to protect. For this example, I have built an AWS CloudFormation template that creates a VPC with a public and private subnet. The VPC contains a Network Address Translation (NAT) instance in the public subnet and a web instance in the private subnet. Traffic is allowed out through the NAT instance—there is no inbound access allowed. Download the template, and create a CloudFormation stack from it in your AWS account. Make a note of the VPC ID and private subnet ID that will be in the outputs for this stack. You will need these IDs later to configure Soha to allow access to the web application.
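If you prefer the command line, you can also launch the downloaded template with the AWS CLI and read the outputs afterward. The stack name and template file name below are placeholders for whatever you downloaded; this is a minimal sketch, not the exact procedure from the post:

# Launch the example VPC stack from the downloaded template file
# (add --capabilities CAPABILITY_IAM only if the template creates IAM resources).
$ aws cloudformation create-stack --stack-name soha-demo-vpc \
     --template-body file://soha-demo-template.json

# Once the stack is complete, read the VPC ID and private subnet ID from the outputs
$ aws cloudformation describe-stacks --stack-name soha-demo-vpc \
     --query Stacks[0].Outputs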

Soha uses an Amazon Elastic Compute Cloud (Amazon EC2) instance called a Cloudlet running in your VPC to allow access to your applications. The Cloudlet is actually brokering the connectivity in and out of your VPC. When installed, the Soha Cloudlet ensures that all inbound ports to your applications are locked down—in essence, moving the attack surface from your application to the Soha cloud.
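You can spot-check this lockdown yourself with the AWS CLI by confirming that the web instance’s security group exposes no inbound rules to the Internet. The security group ID below is a placeholder for the one created by your stack:

# Expect no 0.0.0.0/0 ingress entries for the web instance's security group
$ aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0 \
     --query 'SecurityGroups[0].IpPermissions'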

Here is a diagram of what the completed example will look like. The CloudFormation template will create the VPC, subnets, security groups, and the web and NAT instances. Soha will create the Cloudlet and the Soha security group.

Configure the Application in Soha

Now that you’ve launched a CloudFormation stack from the example template, log in to Soha and follow these steps to configure the application.

First, create an application name and specify its address. The CloudFormation template creates a web instance with the private IP address of 10.0.1.100. Use this as the internal address for the application and set the protocol to HTTP. For this example, use the Soha domain for the application and pick a unique name. When you have this filled out, click Next.

Now configure and launch a Cloudlet into your VPC. Give your Cloudlet a name and pick Amazon AWS EC2/VPC as the Cloudlet package type. Click Next.

Deploy the Cloudlet by clicking the Download and deploy now button. This will take you to the CloudFormation console and will start launching the CloudFormation template. You will need to provide the VPC ID and subnet ID from the example application CloudFormation stack.

Once the template is deployed, it will take a few minutes for Soha to find the Cloudlet and configure it for your application. The Soha console will be updated when the application is ready, and you will also get an email. Click the link in the Soha console or in the email that says the application is ready. You will be prompted to sign in with your Soha credentials, and you will then be redirected to the application. Congratulations, you have secured access to the demo application with Soha!

To learn more about Soha, visit their AWS Partner Directory listing here.

Leveraging CircleCI and AWS CodeDeploy for Continuous Integration Workflows

As more organizations move toward a DevOps mindset and prioritize rapid iterations in application development and deployment, choosing an effective combination of tools has become paramount. Continuous integration platforms often function as the cornerstone of application lifecycle management processes, so getting hands-on with a product to understand how it works for your organization is a critical part of choosing the right tool.

This blog post introduces an offering from CircleCI, a member of the AWS Partner Network (APN). CircleCI is a continuous integration tool that works with AWS CodeDeploy and other deployment services to help development teams build, test, and deliver their software iteratively and efficiently. We’ve written this blog post as newcomers ourselves to the CircleCI platform, using the CircleCI tools to follow our standard development methodology. This post also features AWS CodeDeploy, which will deploy the code we build and test with CircleCI to our Amazon Elastic Compute Cloud (Amazon EC2) instances.
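As a rough sketch of the handoff between the two services (not necessarily the configuration used later in the post), a CI deployment step could bundle the build output and hand it to CodeDeploy with the AWS CLI. The application name, deployment group, and bucket below are hypothetical:

# Package the working directory as a revision and register it with CodeDeploy
$ aws deploy push --application-name MyApp \
     --s3-location s3://my-codedeploy-bucket/myapp.zip --source .

# Deploy that revision to a deployment group of EC2 instances
$ aws deploy create-deployment --application-name MyApp \
     --deployment-group-name MyApp-Fleet \
     --s3-location bucket=my-codedeploy-bucket,key=myapp.zip,bundleType=zip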

(more…)

Announcing Mesosphere DCOS on AWS

Mesosphere is announcing the general availability of its Datacenter Operating System (DCOS) cluster management product on the Amazon Web Services (AWS) cloud. Let’s take a look at Mesosphere DCOS and launch a simple DCOS cluster on AWS.

What is Mesosphere DCOS?

The Mesosphere Datacenter Operating System (DCOS) is a cluster management platform designed to abstract physical and cloud-based resources behind a single API. This platform enables customers to improve efficiency by running existing applications and complex services like Apache Hadoop, Apache Spark, Apache Kafka, and Apache Hadoop YARN side by side on the same set of servers.

DCOS includes Apache Mesos, the Chronos and Marathon schedulers, an integrated command line interface (CLI) and GUI for administrative tasks, and a simple installer. Once deployed, DCOS enables single-command installation of services like Hadoop or Spark from the DCOS public repository. You can also install custom applications that use language-specific packaging (e.g., a zip archive) or Docker-based container packaging. DCOS is particularly easy to try on AWS: you can launch an entire cloud-based datacenter cluster in minutes by following the steps in the Mesosphere installation guide.

To read about the DCOS architecture, see the overview on the Mesosphere website.

DCOS on AWS

Mesosphere DCOS and AWS are a natural fit: DCOS eases the administrative tasks inherent in managing fleets of containers or complex cluster-enabled services, but it still relies on properly managed compute and networking building blocks like those provided by AWS. By extending on-premises clusters or building entirely new installations on Amazon Elastic Compute Cloud (Amazon EC2), you can take advantage of regional diversity for high availability (HA), robust networking described as code, Auto Scaling, elastic compute capacity, and other core features of the AWS platform. Applications deployed on DCOS in the AWS cloud also have direct access to the wealth of AWS value-added features, including Amazon Simple Storage Service (Amazon S3) for scaled object storage and distribution, Amazon Route 53 for global DNS services, and Amazon CloudFront for edge caching.

Installation Example: Creating an AWS DCOS Cluster Using the AWS CLI

DCOS can be installed by launching a Mesosphere-provided AWS CloudFormation template into your AWS account. AWS CloudFormation provides an easy way to create and manage a collection of AWS resources by declaratively describing the complete environment as code, and maintaining the state of the resources over the complete lifecycle.

In this demonstration, we will follow the procedure from the DCOS installation guide to install a simple, single-master DCOS cluster in the AWS US West (Oregon) region. However, instead of using the Amazon EC2 console demonstrated in the guide, we will use the AWS command line interface (AWS CLI) to launch the DCOS stack. We recommend that you review the installation guide to understand the overall installation flow and for up-to-date guidance, tips, and hints.

While the Amazon EC2 console is fully featured and easy to use, automating the installation with the AWS CLI lets you rapidly iterate and test various deployment parameters, and ultimately improves the quality, repeatability, and maintainability of the installation process because it is captured as code.

Prerequisites

To follow this example, you need an AWS account and a working installation of the AWS command line interface (AWS CLI). Once you’ve installed and configured these components, you can create an AWS DCOS cluster by using AWS CLI commands, as described in the following sections.

Create an Amazon EC2 Key Pair 

Mesosphere DCOS requires an AWS key pair to be specified as part of the cluster installation. From the command line, type:


$ aws --region us-west-2 ec2 create-key-pair --key-name dcos-demo-key \
     --output text --query KeyMaterial > dcos-demo-key.pem
$ chmod 600 dcos-demo-key.pem

Note: Here, we used the aws ec2 create-key-pair command to build a new Amazon EC2 key pair in the US West (Oregon) region. We provided the key name dcos-demo-key and redirected the resulting secret key material into the local file dcos-demo-key.pem. We then protected the key by setting more restrictive permissions on the key file with the chmod command.

Create a Mesosphere DCOS Cluster by Using AWS CloudFormation

Get started with Mesosphere DCOS on AWS by launching an AWS CloudFormation template in the same region as the key pair you created in the previous step. Review the documentation for the location of an appropriate template to use for the value of the template URL. Then, from the command line, type:

 

$ aws --region us-west-2 cloudformation create-stack --stack-name dcos-demo \
     --template-url ${TEMPLATE_URL} \
     --parameters ParameterKey=AcceptEULA,ParameterValue="Yes" \
     ParameterKey=KeyName,ParameterValue="dcos-demo-key" \
     --capabilities CAPABILITY_IAM

 

Note: Here, we used the aws cloudformation create-stack command to launch an AWS CloudFormation stack in the US West (Oregon) region. We provided a unique stack name, dcos-demo, a parameter indicating that we accept the EULA, a parameter specifying dcos-demo-key as the name of the key pair we wish to use, and a capability indicating that we are allowing AWS Identity and Access Management (IAM) modifications as a product of the stack launch. Some of the parameters may change depending on the template being used.

Note: In this example, the AWS CloudFormation template we specified with the template-url parameter was replaced with the placeholder ${TEMPLATE_URL}, because the actual templates are region-specific and may be different for your location. The template may also need to be updated as the DCOS stack evolves. Follow the links in the DCOS installation guide for up-to-date template locations and individualized templates for each region.

Tip: The AWS CLI is completely programmable. You can set parameters on the command line (as shown), via a separate JSON file, or via a user-provided URL for full programmatic control.
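For example, the parameters from the previous command could be kept in a JSON file and referenced with the file:// prefix (a minimal sketch; dcos-parameters.json is just an illustrative file name):

# Write the stack parameters to a local JSON file
$ cat > dcos-parameters.json <<'EOF'
[
  {"ParameterKey": "AcceptEULA", "ParameterValue": "Yes"},
  {"ParameterKey": "KeyName", "ParameterValue": "dcos-demo-key"}
]
EOF

# Launch the stack, reading the parameters from the file
$ aws --region us-west-2 cloudformation create-stack --stack-name dcos-demo \
     --template-url ${TEMPLATE_URL} \
     --parameters file://dcos-parameters.json \
     --capabilities CAPABILITY_IAM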

Monitor the Stack Launch

After 10 to 15 minutes, the stack status should change from CREATE_IN_PROGRESS to CREATE_COMPLETE as in the following example:

During launch:


$ aws --region us-west-2 cloudformation describe-stacks --stack-name dcos-demo \
     --query Stacks[0].StackStatus

CREATE_IN_PROGRESS

After launch:


$ aws --region us-west-2 cloudformation describe-stacks --stack-name dcos-demo \
     --query Stacks[0].StackStatus

CREATE_COMPLETE

Note: Here, we used the aws cloudformation describe-stacks command to monitor the state of the stack launch. We provided the stack name dcos-demo as before, and used the special query option Stacks[0].StackStatus to limit the output to only the information we are interested in.
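If you would rather not poll manually, the AWS CLI also provides a waiter that blocks until stack creation finishes. This is a minimal alternative to the describe-stacks loop above:

# Returns when the stack reaches CREATE_COMPLETE, or exits non-zero on failure
$ aws --region us-west-2 cloudformation wait stack-create-complete \
     --stack-name dcos-demo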

Troubleshooting: A ROLLBACK_COMPLETE status means that the deployment has failed. See the AWS CloudFormation console Events tab for useful information about failures. You can also retry the cluster creation. Otherwise, contact Mesosphere support via the feedback mechanisms.

Access the Mesosphere DCOS Dashboard

Once the stack has launched and shows the status CREATE_COMPLETE, run the following command to determine the URL for the DCOS dashboard:

 

$ aws --region us-west-2 cloudformation describe-stacks --stack-name dcos-demo \
     --output text --query Stacks[0].Outputs
Mesos Master DnsAddress dcos-demo-ElasticL-128WE6TIQZZE4-1645934438.us-west-2.elb.amazonaws.com
Public slaves PublicSlaveDnsAddress dcos-demo-PublicSl-1L2RTMDHJ3BXH-653008986.us-west-2.elb.amazonaws.com

 

Note: We used the aws cloudformation describe-stacks command as in the previous step, but modified the query option to show the output parameters populated after a successful launch.

Enter the value shown for DnsAddress in the previous output into a web browser to open the DCOS dashboard. Save this hostname for later use in the DCOS CLI setup process.

If all goes well, you should see a screen similar to the following:

On first launch, the DCOS dashboard will prompt for an email address to use in the Mesosphere DCOS support system.

Install the Mesosphere DCOS CLI

Full DCOS CLI installation instructions are on the Mesosphere DCOS website. To see an abbreviated version of the instructions, choose the Install the Command Line icon in the lower left corner of the DCOS dashboard:

Follow the DCOS CLI installation instructions before proceeding to the next step.

Launch a Sample Application

After you follow the CLI setup instructions, you can start launching applications; for example:

# dcos package install cassandra

Follow the Apache Cassandra installation process on the DCOS dashboard. The service will appear in response to your CLI command, show as Unhealthy while the nodes register with one another, and, after 5 to 6 minutes, complete installation and show as Healthy:

Conclusion

In about 15 minutes, we installed a five-node Marathon-powered Mesos cluster using AWS CLI commands, and then installed Cassandra with a single DCOS CLI command. The cluster is ready for use: you can scale compute capacity by taking advantage of Amazon EC2 Auto Scaling, extend an on-premises DCOS installation, deploy a fully functional web application, and add your own container-based services. In summary, Mesosphere DCOS on AWS represents a very easy, scalable, and cost-effective path for customers and APN Partners who are interested in exploring the wider Mesos ecosystem.