AWS Official Blog

  • AWS Week in Review – September 28, 2015

    by Jeff Barr | in Week in Review

    Let’s take a quick look at what happened in AWS-land last week:


    New & Notable Open Source

    • Otto simplifies development and deployment.
    • SOPS uses KMS and PGP to manage encrypted files for distribution of secrets.
    • Interferon signals you when infrastructure or application issues arise.
    • reinvent-sessions-api is an API to the re:Invent session list.
    • eureka is an AWS Service registry for mid-tier load balancing and failover.
    • acli is an alternative CLI for AWS.
    • BasicConsumer is an example consumer for Kinesis.
    • wt-aws-spotter manages EC2 Spot instances using webtasks.
    • ec2-management is a CLI for controlling and scaling DCE Matterhorn clusters on AWS.
    • TrendingTopics discovers what is trending anywhere in the world using big data tools on AWS.

    Upcoming Events at the AWS Loft (New York)

    • October 6 – AWS Pop-up Loft Trivia Night (6 – 8 PM).
    • October 7 – AWS re:Invent at the Loft – Keynote Live Stream (11:30 AM – 1:30 PM).
    • October 8 – AWS re:Invent at the Loft – Keynote Live Stream (12:00 PM – 1:30 PM).
    • October 8 – AWS re:Invent at the Loft – re:Play Happy Hour! (7 – 9 PM).

    Upcoming Events at the AWS Loft (Berlin) – Register Now

    • October 15 – An overview of Hadoop & Spark, using Amazon Elastic MapReduce (9 AM).
    • October 15 – Processing streams of data with Amazon Kinesis (and other tools) (10 AM).
    • October 15 – STUPS – A Cloud Infrastructure for Autonomous Teams (5 PM).
    • October 16 – Transparency and Audit on AWS (9 AM).
    • October 16 – Encryption Options on AWS (10 AM).
    • October 16 – Simple Security for Startups (6 PM).
    • October 19 – Introduction to AWS Directory Service, Amazon WorkSpaces, Amazon WorkDocs and Amazon WorkMail (9 AM).
    • October 19 – Amazon WorkSpaces: Advanced Topics and Deep Dive (10 AM).
    • October 19 – Building a global real-time discovery platform on AWS (6 PM).
    • October 20 – Scaling Your Web Applications with AWS Elastic Beanstalk (10 AM).

    Upcoming Events at the AWS Loft (London) – Register Now

    • October 7 – Amazon DynamoDB (10 AM).
    • October 7 – Amazon Machine Learning (1 PM).
    • October 7 – Innovation & Amazon: Building New Customer Experiences for Mobile and Home with Amazon Technology (3 PM).
    • October 7 – IoT Lab Session (4 PM).
    • October 8 – AWS Lambda (10 AM).
    • October 8 – Amazon API Gateway (1 PM).
    • October 8 – A DevOps Way to Security (3 PM).
    • October 9 – AWS Bootcamp: Architecting Highly Available Applications on AWS (10 AM).
    • October 12 – Hands-on Labs Drop In (1 PM).
    • October 14 – Masterclass Live: Amazon EMR (10 AM).
    • October 14 – IoT on AWS (Noon).
    • October 14 – FinTech in the Cloud: How to build scalable, compliant and secure architecture with AWS (2 PM).
    • October 14 – AWS for Startups (6 PM).
    • October 15 – AWS Container Day (10 AM).
    • October 16 – HPC in the Cloud Workshop (2 – 4 PM).
    • October 19 – Hands-on Labs Drop In (1 PM).
    • October 20 – An Introduction to Using Amazon Web Services and the Alexa Skills Kit to Build Voice Driven Experiences + Open Hackathon (10 AM).
    • October 21 – Startup Showcase – B2C (10 AM).
    • October 21 – Chef Cookbook Workflow (6 PM).
    • October 22 – AWS Security Day (10 AM).
    • October 22 – Working with Planetary-Scale Open Data Sets on AWS (4 PM).
    • October 23 – AWS Bootcamp: Taking AWS Operations to the Next Level (10 AM).
    • October 26 – Hands-on Labs Drop In (1 PM).
    • October 27 – IoT Hack Day: AWS Pop-up Loft Hack Series – Sponsored by Intel (10 AM).

    Stay tuned for next week! In the meantime, follow me on Twitter and subscribe to the RSS feed.


  • Spot Fleet Update – Console Support, Fleet Scaling, CloudFormation

    by Jeff Barr | in Amazon EC2

    There’s a lot of buzz about Spot instances these days. Customers are really starting to understand the power that comes with the ability to name their own price for compute power!

    After launching the Spot fleet API in May to allow you to manage thousands of Spot instances with a single request, we followed up with resource-oriented bidding in August and the option to distribute your fleet across multiple instance pools in September.

    One quick note before I dig in: While the word “fleet” might make you think that this model is best-suited to running hundreds or thousands of instances at a time, everything that I have to say here applies regardless of the size of your fleet, whether it is comprised of one, two, three, or three thousand instances! As you will see in a moment, you get a console that’s flexible and easy to use, along with the ability to draw resources from multiple pools of Spot capacity, when you create and run a Spot fleet.

    Today we are adding three more features to the roster: a new Spot console, the ability to change the size of a running fleet, and CloudFormation support.

    New Spot Console (With Fleet Support)
    In addition to CLI and API support, you can now design and launch Spot fleets using the new Spot Instance Launch Wizard. The new wizard allows you to create resource-oriented bids that are denominated in instances, vCPUs, or arbitrary units that you can specify when you design your fleet. It also helps you to choose a bid price that is high enough (given the current state of the Spot market) to allow you to launch instances of the desired types.

    I start by choosing the desired AMI (stock or custom), the capacity unit (I’ll start with instances), and the amount of capacity that I need. I can specify a fixed bid price across all of the instance types that I select, or I can set it to be a percentage of the On-Demand price for the type. Either way, the wizard will indicate (with the “caution” icon) any bid prices that are too low to succeed:

    When I find a set of prices and instance types that satisfies my requirements, I can select them and click on Next to move forward.
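
    To make the percentage-based bid concrete, here is a minimal Python sketch. The prices and the helper name are hypothetical values made up for illustration; the wizard's actual logic is not described in this post, so treat this as one plausible reading of "too low to succeed" (a bid below the current Spot price).

```python
# All prices are hypothetical examples, not real AWS prices.
ON_DEMAND_PRICE = {"m3.large": 0.133, "c4.xlarge": 0.209}     # USD per hour
CURRENT_SPOT_PRICE = {"m3.large": 0.021, "c4.xlarge": 0.120}  # USD per hour


def bids_from_percentage(percent_of_on_demand):
    """Return {instance_type: (bid, too_low)} for a percentage-based bid."""
    results = {}
    for instance_type, on_demand in ON_DEMAND_PRICE.items():
        bid = round(on_demand * percent_of_on_demand / 100.0, 4)
        # Assumption: a bid below the current Spot price cannot succeed right now.
        too_low = bid < CURRENT_SPOT_PRICE[instance_type]
        results[instance_type] = (bid, too_low)
    return results


if __name__ == "__main__":
    for name, (bid, too_low) in sorted(bids_from_percentage(50).items()):
        flag = "caution: too low" if too_low else "ok"
        print(f"{name}: bid ${bid:.4f}/hour ({flag})")
```

    With these made-up numbers, a 50% bid clears the Spot price for one type and falls short for the other, which is the situation the caution icon is meant to flag.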

    I can also make resource-oriented bids using a custom capacity unit. When I do this I have even more control over the bid. First, I can specify the minimum requirements (vCPUs, memory, instance storage, and generation) for the instances that I want in my fleet:

    The display will update to indicate the instance types that meet my requirements.

    The second element that I can control is the amount of capacity per instance type (as I explained in an earlier post, this might be driven by the amount of throughput that a particular instance type can deliver for my application). I can control this by clicking in the Weighted Capacity column and entering the designated amount of capacity for each instance type:

    As you can see from the screen shot above, I have chosen all of the instance types that offer weighted capacity at less than $0.35 / unit.
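
    The per-unit arithmetic behind that choice can be sketched in a few lines of Python. The instance types, bid prices, and weights below are hypothetical; only the $0.35 / unit threshold comes from the example above.

```python
# Hypothetical candidates: (instance type, bid in USD per instance-hour, weighted capacity).
CANDIDATES = [
    ("r3.large", 0.60, 2),
    ("r3.xlarge", 1.20, 4),
    ("r3.2xlarge", 3.20, 8),
]
MAX_PRICE_PER_UNIT = 0.35  # threshold taken from the example in the text


def price_per_unit(bid_price, weighted_capacity):
    """Cost of one capacity unit when an instance supplies several units."""
    return bid_price / weighted_capacity


def affordable_types(candidates, limit):
    """Keep the instance types whose per-unit price is below the limit."""
    return [
        name
        for name, bid, weight in candidates
        if price_per_unit(bid, weight) < limit
    ]


if __name__ == "__main__":
    for name, bid, weight in CANDIDATES:
        print(f"{name}: ${price_per_unit(bid, weight):.2f} per unit")
    print("selected:", affordable_types(CANDIDATES, MAX_PRICE_PER_UNIT))
```

    A larger instance is only a better deal if its weight grows at least as fast as its price, which is why the comparison is made per unit rather than per instance.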

    Now that I have designed my fleet, I can configure it by choosing the allocation strategy (diversified or lowest price), the VPC, security groups, availability zones / subnets, and a key pair for SSH access:

    I can also click on Advanced to create requests that are valid only between certain dates and times, and to set other options:

    After that I review my settings and click on Launch to move ahead:

    My Spot fleet is visible in the Console. I can select it and see which instances were used to satisfy my request:

    If I plan to make requests for similar fleets from time to time, I can download a JSON version of my settings:

    Fleet Size Modification
    We are also giving you the ability to modify the size of an existing fleet. The new ModifySpotFleetRequest allows you to make an existing fleet larger or smaller by specifying a new target capacity.

    When you increase the capacity of one of your existing fleets, new bids will be placed in accordance with the fleet’s allocation strategy (lowest price or diversified).

    When you decrease the capacity of one of your existing fleets, you can request that excess instances be terminated based on the allocation strategy. Alternatively, you can leave the instances running, and manually terminate them using a strategy of your own.
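
    Here is a minimal Python sketch of the parameters for a ModifySpotFleetRequest call. The fleet ID is a made-up placeholder and the helper name is my own. The sketch only builds the parameter dictionary; I am assuming you would pass it to an SDK call such as boto3's modify_spot_fleet_request, which is not shown or exercised here.

```python
def build_modify_request(fleet_id, target_capacity, terminate_excess=True):
    """Build the parameters for a ModifySpotFleetRequest API call."""
    if target_capacity < 0:
        raise ValueError("target capacity cannot be negative")
    return {
        "SpotFleetRequestId": fleet_id,
        "TargetCapacity": target_capacity,
        # "default" lets the fleet terminate excess instances when it shrinks;
        # "noTermination" leaves them running so you can remove them yourself.
        "ExcessCapacityTerminationPolicy": (
            "default" if terminate_excess else "noTermination"
        ),
    }


if __name__ == "__main__":
    # Hypothetical fleet ID, for illustration only.
    params = build_modify_request("sfr-EXAMPLE", 20, terminate_excess=False)
    print(params)
    # Assumed usage with boto3 (requires credentials, not run here):
    #   boto3.client("ec2").modify_spot_fleet_request(**params)
```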

    You can also modify the size of your fleet using the Console:

    CloudFormation Support
    We are also adding support for the creation of Spot fleets via a CloudFormation template. Here’s a sample:

    "SpotFleet": {
      "Type": "AWS::EC2::SpotFleet",
      "Properties": {
        "SpotFleetRequestConfigData": {
          "IamFleetRole": { "Ref": "IAMFleetRole" },
          "SpotPrice": "1000",
          "TargetCapacity": { "Ref": "TargetCapacity" },
          "LaunchSpecifications": [
            {
              "EbsOptimized": "false",
              "InstanceType": { "Ref": "InstanceType" },
              "ImageId": { "Fn::FindInMap": [ "AWSRegionArch2AMI", { "Ref": "AWS::Region" },
                           { "Fn::FindInMap": [ "AWSInstanceType2Arch", { "Ref": "InstanceType" }, "Arch" ] } ] },
              "WeightedCapacity": "8"
            },
            {
              "EbsOptimized": "true",
              "InstanceType": { "Ref": "InstanceType" },
              "ImageId": { "Fn::FindInMap": [ "AWSRegionArch2AMI", { "Ref": "AWS::Region" },
                           { "Fn::FindInMap": [ "AWSInstanceType2Arch", { "Ref": "InstanceType" }, "Arch" ] } ] },
              "Monitoring": { "Enabled": "true" },
              "SecurityGroups": [ { "GroupId": { "Fn::GetAtt": [ "SG0", "GroupId" ] } } ],
              "SubnetId": { "Ref": "Subnet0" },
              "IamInstanceProfile": { "Arn": { "Fn::GetAtt": [ "RootInstanceProfile", "Arn" ] } },
              "WeightedCapacity": "8"
            }
          ]
        }
      }
    }

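    If you generate templates programmatically, the structure of the sample above can be sketched in Python as follows. This is a simplified illustration, not a complete template: the ImageId parameter is a hypothetical stand-in for the Fn::FindInMap lookups, and the helper names are my own. The point is that each launch specification is its own object inside the LaunchSpecifications list.

```python
import json


def launch_spec(weighted_capacity, ebs_optimized):
    """One entry of the LaunchSpecifications list."""
    return {
        "EbsOptimized": "true" if ebs_optimized else "false",
        "InstanceType": {"Ref": "InstanceType"},
        # Hypothetical "ImageId" parameter; the sample uses Fn::FindInMap instead.
        "ImageId": {"Ref": "ImageId"},
        "WeightedCapacity": str(weighted_capacity),
    }


def spot_fleet_resource(specs):
    """Wrap the launch specifications in an AWS::EC2::SpotFleet resource."""
    return {
        "SpotFleet": {
            "Type": "AWS::EC2::SpotFleet",
            "Properties": {
                "SpotFleetRequestConfigData": {
                    "IamFleetRole": {"Ref": "IAMFleetRole"},
                    "SpotPrice": "1000",
                    "TargetCapacity": {"Ref": "TargetCapacity"},
                    "LaunchSpecifications": specs,
                }
            },
        }
    }


if __name__ == "__main__":
    resource = spot_fleet_resource([launch_spec(8, False), launch_spec(8, True)])
    print(json.dumps(resource, indent=2))
```
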
    Available Now
    The new Spot Fleet Console, the new ModifySpotFleetRequest function, and the CloudFormation support are available now and you can start using them today!


  • Are You Well-Architected?

    by Jeff Barr | in Training and Certification

    Seattle-born musical legend Jimi Hendrix started out his career with a landmark album titled Are You Experienced?

    I’ve got a similar question for you: Are You Well-Architected? In other words, have you chosen a cloud architecture that is in alignment with the best practices for the use of AWS?

    We want to make sure that your applications are well-architected. After working with thousands of customers, the AWS Solutions Architects have identified a set of core strategies and best practices for architecting systems in the cloud and have codified them in our new AWS Well-Architected Framework. This document contains a set of foundational questions that will allow you to measure your architecture against these best practices and to learn how to address any shortcomings.

    The AWS Well-Architected Framework is based around four pillars:

    • Security – The ability to protect information systems and assets while delivering business value through risk assessments and mitigation strategies.
    • Reliability – The ability to recover from infrastructure or service failures, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.
    • Performance Efficiency – The efficient use of computing resources to meet system requirements, and the ability to maintain that efficiency as demand changes and technologies evolve.
    • Cost Optimization – The ability to avoid or eliminate unneeded cost or suboptimal resources.

    For each pillar, the guide puts forth a series of design principles, and then defines the pillar in detail. Then it outlines a set of best practices for the pillar and proffers a set of questions that will help you to understand where you are with respect to the best practices. The questions are open-ended. For example, there’s no simple answer to the question “How does your system withstand component failures?” or “How are you planning for recovery?”

    As you work your way through the Framework, I would suggest that you capture and save the answers to each of the questions. This will give you a point-in-time reference and will allow you to look back later in order to measure your progress toward well-architected.

    The AWS Well-Architected Framework is available at no charge. If you find yourself in need of additional help along your journey to the cloud, be sure to tap into the accumulated knowledge and expertise of our team of Solutions Architects.


    PS – If you are coming to AWS re:Invent, be sure to attend the Well-Architected Workshop at 1 PM on Wednesday, October 7th.

  • Amazon WorkSpaces Update – BYOL, Chromebooks, Encryption

    by Jeff Barr | in Amazon WorkSpaces

    As I have noted in the past, I am a huge fan and devoted user of Amazon WorkSpaces. In fact, every blog post that I have written and illustrated over the last 6 or 7 months has been written on my WorkSpace. The most recent set of AWS podcasts were edited on the same WorkSpace.

    Several months ago the hard drive in my laptop crashed and was replaced. In the past, I would have spent several hours installing and customizing my apps and my environment. All of my work in progress is stored in Amazon WorkDocs, so that aspect of the recovery would have been painless. At this point, the only truly personal items on my laptop are the 12-character registration code for my WorkSpace and my hard-won set of stickers. My laptop has become little more than a generic display and I/O device (with some awesome stickers).

    I have three pieces of good news for Amazon WorkSpaces users:

    1. You can now bring your Windows 7 Desktop license to Amazon WorkSpaces.
    2. There’s a new Amazon WorkSpaces Client App for Chromebook.
    3. The storage volumes used by WorkSpaces (both root and user) can now be encrypted.

    Bring Your Windows 7 Desktop License to Amazon WorkSpaces (BYOL)
    You can now bring your existing Windows 7 Desktop license to Amazon WorkSpaces and run the Windows 7 Desktop OS on hardware that is physically dedicated to you. This new option entitles you to a discount of $4.00 per month per WorkSpace (a savings of up to 16%) and also allows you to use the same Windows 7 Desktop golden image on-premises and in the AWS cloud. The newly launched images can be activated using new or existing Microsoft activation servers running in your VPC, or that can be reached from your VPC.

    To take advantage of this option, at a minimum your organization must have an active Enterprise Agreement (EA) with Microsoft and you must commit to running at least 200 WorkSpaces in a given AWS region each month. To learn more, take a look at the WorkSpaces FAQ.

    In order to ensure that you have adequate dedicated capacity allocated to your account and to get started with BYOL, please reach out to your AWS account manager or sales representative or create a Technical Support case with Amazon WorkSpaces.

    New Amazon WorkSpaces Client App for Chromebook
    Today we are making Amazon WorkSpaces even more flexible and accessible by adding support for the Google Chromebook. These low-cost “thin client” laptops are simple and easy to manage. They run Chrome OS and were designed specifically for internet users. This makes them a great match for Amazon WorkSpaces because you can access your cloud desktops, your productivity apps, and your corporate network from devices that are simple to manage, secure, and available at a low cost.

    The newest Amazon WorkSpaces client app runs on Chromebooks (version 45 of Chrome OS and newer) with ARM and Intel chipsets, and supports both touch and non-touch devices. You can download the WorkSpaces client for Chromebook now and install it on your Chromebook today.

    The Amazon WorkSpaces client app is also available for Mac OS X, iPad, Windows, Android Tablet, and Fire Tablet environments.

    Encrypted Storage Volumes Using KMS
    Amazon WorkSpaces enables you to deliver a high quality desktop experience to your end-users and can also help you to address regulatory requirements or to conform to organizational security policies.

    Today we are announcing an additional security option: encryption for WorkSpaces data in motion and at rest (this includes the disk volume and the snapshots associated with it). The WorkSpaces administrator now has the option to encrypt the C: and D: drives as part of the launch and configuration process for each newly created WorkSpace. This encryption is performed using a customer master key (CMK) stored in AWS Key Management Service (KMS).

    Encryption is supported for all types of Amazon WorkSpace bundles including custom bundles created within your organization, but must be set up when the WorkSpace is created (encrypting an existing WorkSpace is not supported). Each customer master key from KMS can be used to encrypt up to 30 WorkSpaces.
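
    The 30-WorkSpaces-per-key limit has a simple planning consequence, sketched below in Python. The helper name is my own, and the fleet sizes in the usage lines are just examples.

```python
WORKSPACES_PER_KEY = 30  # limit stated above


def keys_needed(workspace_count):
    """Smallest number of customer master keys that covers the given WorkSpaces."""
    if workspace_count < 0:
        raise ValueError("workspace count cannot be negative")
    # Integer ceiling division: round up to a whole key.
    return (workspace_count + WORKSPACES_PER_KEY - 1) // WORKSPACES_PER_KEY


if __name__ == "__main__":
    for count in (30, 31, 200):
        print(f"{count} encrypted WorkSpaces -> {keys_needed(count)} key(s)")
```

    Since you pay the standard KMS charges for each key you create, this count is also what drives the cost of the feature.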

    Launching a WorkSpace with an encrypted root volume can take additional time. Once launched, you can expect to see a minimal impact on latency or IOPS. Here is how you (or your WorkSpaces administrator) choose the volumes to be encrypted along with the KMS key at launch time:

    The encryption status of each WorkSpace is also visible from within the WorkSpaces Console:

    There’s no charge for the encryption feature, but you will pay the standard KMS charges for any keys that you create.


    PS – Before you ask, I am planning to ditch my laptop in favor of a Chromebook immediately after AWS re:Invent!

  • AWS CloudTrail Update – SSE-KMS Encryption & Log File Integrity Verification

    by Jeff Barr | in AWS CloudTrail, Key Management Service

    My colleague Sivakanth Mundru sent the guest post below to introduce a pair of features for AWS CloudTrail.


    As you know, AWS CloudTrail records API calls made on your account and delivers log files containing API activity to an S3 bucket you specify. Today, we are announcing two new features for CloudTrail:

    • Support for Encryption using SSE-KMS – You can add an additional layer of security for the CloudTrail log files stored in your S3 bucket by encrypting them with your AWS Key Management Service (KMS) key. CloudTrail will encrypt the log files using the KMS key you specify.
    • Log File Integrity Validation – You can validate the integrity of the CloudTrail log files stored in your S3 bucket and detect whether they were deleted or modified after CloudTrail delivered them to your S3 bucket. You can use the log file integrity (LFI) validation as a part of your security and auditing discipline.

    These features are available today in the US East (Northern Virginia), US West (Oregon), US West (Northern California), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Tokyo), Asia Pacific (Sydney), Asia Pacific (Singapore), and South America (Brazil) regions.

    Support for Encryption Using KMS
    CloudTrail generates log files and sends them to an S3 bucket. By default, the files are encrypted using S3’s Server Side Encryption (SSE), and then transparently decrypted when you read them. With today’s launch you can now provide a KMS key to CloudTrail and it will be used to encrypt your log files. As is the case with SSE, decryption is transparent and automatic if you have permission to read the object. Therefore, applications that read and process log files do not require any changes. You simply need to give S3 permission to decrypt the files. Here’s how it all fits together:

    Here’s how you can set this up yourself:

    1. Create a KMS key or use an existing KMS key in the same region as the S3 bucket where you receive your CloudTrail log files and apply the KMS-CloudTrail policy.
    2. Apply decrypt permissions to the principal (IAM users, roles, groups, and so forth) that will be accessing the CloudTrail log files.
    3. Update an existing trail with the KMS key from step 1 (you can enable encryption at the time you create a trail if you use the CLI).
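
    Steps 2 and 3 can be sketched in Python as plain data structures. The key ARN and trail name are hypothetical placeholders, and the helper names are my own. Nothing here calls AWS; I am assuming you would attach the policy document to your IAM principals and pass the parameters to an SDK call such as boto3's update_trail. The key policy from step 1 is not shown.

```python
import json

# Hypothetical key ARN and trail name, for illustration only.
KEY_ARN = "arn:aws:kms:us-west-2:111111111111:key/EXAMPLE-KEY-ID"
TRAIL_NAME = "Trailname"


def decrypt_policy(key_arn):
    """Step 2: an IAM policy document that allows decryption with one key."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "kms:Decrypt", "Resource": key_arn}
        ],
    }


def update_trail_params(trail_name, key_arn):
    """Step 3: parameters that point an existing trail at the KMS key."""
    return {"Name": trail_name, "KmsKeyId": key_arn}


if __name__ == "__main__":
    print(json.dumps(decrypt_policy(KEY_ARN), indent=2))
    print(update_trail_params(TRAIL_NAME, KEY_ARN))
```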

    Log File Integrity Validation
    If you are doing a security audit or investigation, you may want to validate the integrity of the CloudTrail log files stored in your S3 bucket and detect whether they have been deleted or modified since CloudTrail delivered them to your S3 bucket (the expectation is that they will be unchanged). The new CloudTrail log file integrity validation feature enables you to do that.

    In order to validate the integrity of log files, you need to enable log file validation for your trail. You can do this by setting Enable log file validation to Yes in the advanced section of your trail configuration:

    Once you enable log file integrity validation, CloudTrail will start delivering digest files, on an hourly basis, to the same S3 bucket where you receive your CloudTrail log files, but with a different prefix:

    • CloudTrail log files are delivered to /optional_prefix/AWSLogs/AccountID/CloudTrail/*.
    • CloudTrail digest files are delivered to /optional_prefix/AWSLogs/AccountID/CloudTrail-Digest/*.

    This layout allows applications that integrate with CloudTrail to process the log files without making any changes. You can also apply different and granular access control permission to the log files and digest files.

    The digest files contain information about the log files that were delivered to your S3 bucket, hash values for those log files, and the digital signature of the previous digest file. The digital signature of the current digest file is stored in its S3 metadata section. For more information about digest files, digital signatures and hash values, read about the CloudTrail Digest File Structure.
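
    The hash-comparison idea can be sketched in Python. This is a deliberately simplified model, not the real digest file format: I am assuming SHA-256 hashes, representing the digest as a plain mapping from object key to expected hash, and skipping signature verification entirely.

```python
import hashlib


def sha256_hex(data):
    """Hex-encoded SHA-256 hash of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def check_log_files(digest_entries, stored_objects):
    """Compare expected hashes against the objects that are actually stored.

    digest_entries: {object_key: expected_hex_hash}  (simplified digest)
    stored_objects: {object_key: file_content_bytes} (simplified bucket)
    """
    results = {}
    for key, expected in digest_entries.items():
        if key not in stored_objects:
            results[key] = "INVALID: not found"
        elif sha256_hex(stored_objects[key]) != expected:
            results[key] = "INVALID: hash value doesn't match"
        else:
            results[key] = "valid"
    return results


if __name__ == "__main__":
    # Hypothetical object keys and contents.
    digest = {
        "a.json.gz": sha256_hex(b"original a"),
        "b.json.gz": sha256_hex(b"original b"),
        "c.json.gz": sha256_hex(b"original c"),
    }
    bucket = {"a.json.gz": b"original a", "b.json.gz": b"tampered b"}
    for key, status in sorted(check_log_files(digest, bucket).items()):
        print(key, "->", status)
```

    Because each digest file is signed and also carries the signature of its predecessor, an attacker cannot simply rewrite the hashes to match modified logs; that chain is the part this sketch leaves out.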

    To validate the CloudTrail log files, run the following AWS Command Line Interface (CLI) command:

    $ aws cloudtrail validate-logs \
      --trail-arn arn:aws:cloudtrail:us-west-2:111111111111:trail/Trailname \
      --start-time 2015-09-24T00:00:00Z --region=us-west-2

    If the log files have not been modified or deleted you will see output that looks like this:

    Validating log files for trail arn:aws:cloudtrail:us-west-2:111111111111:trail/Trailname between \
      2015-09-24T00:00:00Z and 2015-09-25T18:56:41Z
    Results requested for 2015-09-24T00:00:00Z to 2015-09-25T18:56:41Z
    Results found for 2015-09-24T00:30:26Z to 2015-09-25T18:56:41Z:
    43/43 digest files valid
    31/31 log files valid

    If one or more log files have been deleted you will see output that looks like this:

    Log file s3://mybucket-CTlogs/AWSLogs/111111111111/CloudTrail/us-west-2/2015/09/22/111111111111_CloudTrail_us-west-2_20150922T1720Z_Jy4SwZotr3eTI2FM.json.gz \
           INVALID: not found
    Results requested for 2015-09-22T00:00:00Z to 2015-09-25T18:42:03Z
    Results found for 2015-09-22T00:30:26Z to 2015-09-25T18:42:03Z:
    43/43 digest files valid
    30/31 log files valid, 1/31 log files INVALID

    If one or more log files have been modified you will see output that looks like this:

    Log file s3://mybucket-CTlogs/AWSLogs/111111111111/CloudTrail/us-west-2/2015/09/25/111111111111_CloudTrail_us-west-2_20150925T1845Z_lU58MiCsXyI1U3R1.json.gz \
           INVALID: hash value doesn't match
    Results requested for 2015-09-24T00:00:00Z to 2015-09-25T21:44:50Z
    Results found for 2015-09-24T00:30:26Z to 2015-09-25T21:44:50Z:
    45/45 digest files valid
    35/36 log files valid, 1/36 log files INVALID

    You can run the validate-logs command in verbose mode to perform a deeper analysis.

    To learn more about this feature, read about Validating CloudTrail Log File Integrity.

    If you have any questions or feedback on these new features, you can post them in the CloudTrail forums.

    Sivakanth Mundru, Senior Product Manager, AWS CloudTrail

  • New – Amazon Elasticsearch Service

    by Jeff Barr | in Amazon Elasticsearch Service

    Elasticsearch is a real-time, distributed search and analytics engine that fits nicely into a cloud environment. It is document-oriented and does not require a schema to be defined up-front. It supports structured, unstructured, and time-series queries and serves as a substrate for other applications and visualization tools including Kibana.

    Today we are launching the new Amazon Elasticsearch Service (Amazon ES for short). You can launch a scalable Elasticsearch cluster from the AWS Management Console in minutes, point your client at the cluster’s endpoint, and start to load, process, analyze, and visualize data shortly thereafter.

    Creating a Domain
    Let’s go ahead and create an Amazon ES domain (as usual, you also can do this using the AWS Command Line Interface (CLI), AWS Tools for Windows PowerShell, or the Amazon Elasticsearch Service API). Simply click on the Get Started button on the splash page and enter a name for your domain (I chose my-es-cluster):

    Select an instance type and an instance count (both can be changed later if necessary):

    Here are some guidelines to help you to choose appropriate instance types:

    • T2 – Dev and test (also good for dedicated master nodes).
    • R3 – Processing loads that are read-heavy or that have complex queries (e.g. nested aggregations).
    • I2 – High-write, large-scale data storage.
    • M3 – Balanced read/write loads.

    If you check Enable dedicated master, Amazon ES will create a separate master node for the cluster. This node will not hold data or respond to upload requests. We recommend that you enable this option and use at least three master nodes to ensure maximum cluster stability. Also, clusters should always have an odd number of master nodes in order to protect against split-brain scenarios.
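
    The advice about odd numbers follows from majority voting, which can be sketched in Python. I am assuming a simple majority quorum here; the function names are my own, and the sketch says nothing about how Amazon ES configures the quorum internally.

```python
def quorum(master_nodes):
    """Smallest group of master-eligible nodes that forms a strict majority."""
    return master_nodes // 2 + 1


def tolerated_failures(master_nodes):
    """How many master nodes can be lost while a majority still exists."""
    return master_nodes - quorum(master_nodes)


if __name__ == "__main__":
    for n in range(1, 6):
        print(
            f"{n} master node(s): quorum {quorum(n)}, "
            f"tolerates {tolerated_failures(n)} failure(s)"
        )
```

    Going from three to four master nodes raises the quorum without raising the number of failures you can survive, so the extra even node buys nothing. A strict majority also means that two halves of a partitioned cluster can never both elect a master.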

    If you check Enable zone awareness, Amazon ES will distribute the nodes across multiple Availability Zones in the region to increase availability. If you choose to do this, you will also need to set up replicas using the Elasticsearch Index API; you can also use the same API to do this when you create new indexes (learn more).
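
    Setting the replica count on an existing index is a small HTTP request. The Python sketch below builds that request with the standard library but does not send it. The endpoint and index name are hypothetical, and I am assuming the standard Elasticsearch update-index-settings call (PUT to the index's _settings path).

```python
import json
import urllib.request

# Hypothetical endpoint and index name, for illustration only.
ENDPOINT = "https://search-my-es-cluster-example.us-east-1.es.amazonaws.com"
INDEX_NAME = "posts"


def replica_settings_request(endpoint, index_name, replicas):
    """Build (but do not send) a request that sets the replica count of an index."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{endpoint}/{index_name}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    request = replica_settings_request(ENDPOINT, INDEX_NAME, 1)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```

    With zone awareness enabled, at least one replica per shard is what gives each Availability Zone a full copy of the data.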

    I chose to use EBS General Purpose (SSD) storage for my data nodes. I could have chosen to store the data on the instance, or to use another type of EBS volume. Using EBS allows me to store more data and to run on less costly instances; however on-instance storage will offer better write performance. Large data sets can run on I2 instances (they have up to 1.6 terabytes of SSD storage per node).

    Next, set the access policy. I chose to make mine wide-open in order to simplify testing (don’t do this for your cluster); I could have used one of the IP-based or user-based templates and a wizard to create a more restrictive policy.

    Finally, review the settings and click on Confirm and create:

    The cluster will be created in a couple of minutes, and will be listed on the Elasticsearch Service dashboard (I added some documents before I took this screenshot):

    And that’s it!

    Loading Documents
    I knew next to nothing about Elasticsearch before I started to write this blog post, but that didn’t stop me from trying it out. Following the steps in Having Fun: Python and Elasticsearch, Part 1, I installed the Python library for Elasticsearch, and returned to the AWS Management Console to locate the endpoint for my cluster.

    I performed the status check outlined in the blog post, and everything worked as described therein. Then I pasted the Python code from the post into a file, and ran it to create some sample data. I was able to see the new index in the Console:

    That was easy!

    Querying Documents
    With the data successfully loaded, I clicked on the Kibana link for my cluster to see what else I could do:

    Kibana (v4) opened in another browser tab and I configured it to index my posts:

    Kibana confirmed the fields in the domain:

    From there (if I had more time and actually knew what I was doing) I could visualize my data using Kibana.

    Version 3 of Kibana is also available. To access it, simply append _plugin/kibana3/ to the endpoint of your cluster.

    Other Goodies
    You can scale your cluster using the CLI (aws es update-elasticsearch-domain-config), API (UpdateElasticsearchDomainConfig), or the console. You simply set the new configuration and Amazon ES will create the new cluster and copy your data to it with no downtime.

    As part of today’s launch of Amazon ES, we are launching integration with CloudWatch Logs. You can arrange to route your CloudWatch Logs to Amazon ES by creating an Amazon ES domain, navigating to the CloudWatch Logs Console and clicking on Subscribe to Lambda / Amazon ES, then stepping through the wizard:

    The wizard will help you to set up a subscription filter pattern for the incoming logs (the pattern is optional, but having one allows you to define a schema for the logs). Here are some sample Kibana dashboards that you can use to view several different types of logs, along with the filter patterns that you’ll need to use when you route the logs to Amazon ES:

    • VPC Flow Dashboard – use this filter pattern to map the log entries:
      [version, account_id, interface_id, srcaddr, dstaddr, srcport, dstport,
      protocol, packets, bytes, start, end, action, log_status]
    • Lambda Dashboard – use this filter pattern to map the log entries:
      [timestamp=*Z, request_id="*-*", event]
    • CloudTrail Dashboard – no filter pattern is needed; the log entries are in self-identifying JSON form.
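
    To show what a space-delimited filter pattern does conceptually, here is a rough Python sketch that maps one VPC flow log entry onto the field names from the pattern above. The sample entry is made up, and this is only an illustration of the naming step, not how CloudWatch Logs implements it.

```python
# Field names taken from the VPC Flow filter pattern above.
VPC_FLOW_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

# Made-up sample entry, for illustration only.
SAMPLE_ENTRY = (
    "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
)


def map_flow_log_entry(line):
    """Split a space-delimited entry and label each value with its field name."""
    values = line.split()
    if len(values) != len(VPC_FLOW_FIELDS):
        raise ValueError(
            f"expected {len(VPC_FLOW_FIELDS)} fields, found {len(values)}"
        )
    return dict(zip(VPC_FLOW_FIELDS, values))


if __name__ == "__main__":
    for field, value in map_flow_log_entry(SAMPLE_ENTRY).items():
        print(f"{field}: {value}")
```

    Named fields are what make the entries useful in Kibana, since each one becomes something you can filter or aggregate on.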

    Amazon ES supports the ICU Analysis Plugin and the Kuromoji plugin. You can configure these normally through the Elasticsearch Mapping API. Amazon ES does not currently support commercial plugins like Shield or Marvel. The AWS equivalents for these plugins are AWS Identity and Access Management (IAM) and CloudWatch.

    Amazon ES automatically takes a snapshot of your cluster every day and stores it durably for 14 days. You can contact us to restore your cluster from a stored backup. You can set the hour of the day during which that backup occurs via the “automated snapshot hour.” You can also use the Elasticsearch Snapshot API to take a snapshot of your cluster and store it in your S3 bucket or restore an Elasticsearch snapshot (Amazon ES or self-managed) to an Amazon ES cluster from your S3 bucket.

    Each Amazon ES domain also forwards 17 separate metrics to CloudWatch. You can view these metrics on the Amazon ES console’s monitoring tab or in the CloudWatch console. The cluster Status metrics (green, yellow, and red) expose the underlying cluster’s status: green means all shards are assigned to a node; yellow means that at least one replica shard is not assigned to any node; red means that at least one primary shard is not assigned to a node. One common occurrence is for a cluster to go yellow when it has a single data node and replication is set to 1 (Logstash does that by default). The simple fix is to add another node to the cluster.
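
    The status rules, and the single-node case in particular, can be sketched in Python. The placement rule below is a simplification that I am assuming for illustration (copies of the same shard never share a node, and nothing else limits placement); the shard counts in the usage lines are hypothetical.

```python
def unassigned_replicas(data_nodes, primary_shards, replicas_per_primary):
    """Replica shards that cannot be placed, under a simplified placement rule.

    Assumption: copies of the same shard never share a node, so each primary
    can place at most (data_nodes - 1) replicas.
    """
    placeable = min(replicas_per_primary, max(data_nodes - 1, 0))
    return primary_shards * (replicas_per_primary - placeable)


def cluster_status(unassigned_primary_count, unassigned_replica_count):
    """Apply the green / yellow / red rules described above."""
    if unassigned_primary_count > 0:
        return "red"
    if unassigned_replica_count > 0:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # Hypothetical index with 5 primary shards and replication set to 1.
    for nodes in (1, 2):
        missing = unassigned_replicas(nodes, 5, 1)
        print(f"{nodes} data node(s): {missing} unassigned replica(s), "
              f"status {cluster_status(0, missing)}")
```

    With one data node every replica is stranded, so the cluster reports yellow; adding a second node gives each replica somewhere to go.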

    CPU Utilization is most directly affected by request processing (reads or writes). When this metric is high, you should increase replication and add instances to the cluster to allow for additional parallel processing. Similarly, if JVM Memory Pressure is high, increase the instance count or change to R3 instances. You should set CloudWatch alarms on these metrics so that you keep 10-20% of your storage free and have spare CPU capacity at all times.

    Available Now
    You can create your own Amazon ES clusters today in the US East (Northern Virginia), US West (Northern California), US West (Oregon), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), South America (Brazil), Europe (Ireland), and Europe (Frankfurt) regions.

    If you qualify for the AWS Free Tier, you can use a t2.micro.elasticsearch node for up to 750 hours per month, along with up to 10 gigabytes of Magnetic or SSD-Backed EBS storage at no charge.


  • New – AWS CloudFormation Designer + Support for More Services

    by Jeff Barr | on | in AWS CloudFormation |

    AWS CloudFormation makes it easy for you to create and manage a collection of related AWS resources (which we call a stack). Starting from a template, CloudFormation creates the resources in an orderly and predictable fashion, taking into account dependencies between them, and connecting them together as defined in the template.

    A CloudFormation template is nothing more than a text file. Inside the file, data in JSON format defines the AWS resources: their names, properties, relationships to other resources, and so forth. While the text-based model is very powerful, it is effectively linear and, as such, does not make relationships between the resources very obvious.

    Today we are launching the new AWS CloudFormation Designer. This visual tool allows you to create and modify CloudFormation templates using a drag-and-drop interface. You can easily add, modify, or remove resources and the underlying JSON will be altered accordingly. If you modify a template that is associated with a running stack, you can update the stack so that it conforms to the template.

    We are also launching CloudFormation support for four additional AWS services.

    Free Tour!
    Let’s take a quick tour of the CloudFormation Designer. Here’s the big picture:

    The design surface is center stage, with the resource menu on the left and a JSON editor at the bottom. I simply select the desired AWS resources on the left, drag them to the design surface, and create relationships between them. Here’s an EC2 instance and 3 EBS volumes:

    I created the relationships between the instance and the volumes by dragging the “dot” on the lower right corner of the instance (labeled AWS::EC2::Volume) to each volume in turn.
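
    For readers who want to see roughly what a template like this looks like as text, here is a hedged sketch that builds one instance and three volumes and prints the JSON. The logical IDs (Instance, Volume1 and so on), the two parameters, the volume size, and the device names are assumptions made for the example. The JSON that the Designer actually generates may differ:

```python
import json


def instance_with_volumes(volume_count: int = 3, size_gib: int = 10) -> dict:
    """Build a template with one EC2 instance and attached EBS volumes."""
    resources = {}
    attachments = []
    for i in range(volume_count):
        name = f"Volume{i + 1}"                      # hypothetical logical ID
        resources[name] = {
            "Type": "AWS::EC2::Volume",
            "Properties": {
                "Size": size_gib,
                "AvailabilityZone": {"Ref": "Zone"},
            },
        }
        # /dev/sdf, /dev/sdg, /dev/sdh: one device name per volume
        attachments.append(
            {"Device": f"/dev/sd{chr(ord('f') + i)}", "VolumeId": {"Ref": name}}
        )
    resources["Instance"] = {
        "Type": "AWS::EC2::Instance",
        "Properties": {
            "ImageId": {"Ref": "ImageId"},
            "AvailabilityZone": {"Ref": "Zone"},     # must match the volumes
            "Volumes": attachments,
        },
    }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "ImageId": {"Type": "String"},
            "Zone": {"Type": "String"},
        },
        "Resources": resources,
    }


print(json.dumps(instance_with_volumes(), indent=2))
```

    Each entry in the instance's Volumes list points at a volume by logical ID, which is the text form of a relationship drawn on the design surface.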

    I can select an object and then edit its properties in the JSON editor:

    Here’s a slightly more complex example:

    The dotted blue lines denote resource-to-resource references (this is the visual equivalent of CloudFormation’s Ref function). For example, the DBSecurityGroup (middle row, left) refers to the EC2 SecurityGroup (top row, left); here’s the JSON:

    I should point out that this tool does not perform any magic! You will still need to have a solid understanding of the AWS resources that you include in your template, including a sense of how to put them together to form a complete system. You can right-click on any resource to display a menu; from there you can click on the ? to open the CloudFormation documentation for the resource:

    Clicking on the eye icon will allow you to edit the resource’s properties. Once you have completed your design you can launch a stack from within the Designer.

    You can also open up the sample CloudFormation templates and examine them in the Designer:

    The layout data (positions and sizes) for the AWS resources is stored within the template.
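
    Because the layout lives in the template, you may occasionally want a copy without it, for example to keep diffs small in version control. The sketch below assumes that the Designer stores its layout under a Metadata key named AWS::CloudFormation::Designer, both at the top level and on each resource. Treat that key name as an assumption and confirm it against a template that you have saved from the Designer:

```python
import copy

DESIGNER_KEY = "AWS::CloudFormation::Designer"  # assumed metadata key


def strip_layout(template: dict) -> dict:
    """Return a copy of the template without Designer layout metadata."""
    result = copy.deepcopy(template)
    sections = [result] + list(result.get("Resources", {}).values())
    for section in sections:
        metadata = section.get("Metadata", {})
        metadata.pop(DESIGNER_KEY, None)
        if not metadata:
            section.pop("Metadata", None)   # drop Metadata if now empty
    return result


template = {
    "Metadata": {DESIGNER_KEY: {"abc": {"position": {"x": 60, "y": 90}}}},
    "Resources": {
        "Volume1": {
            "Type": "AWS::EC2::Volume",
            "Metadata": {DESIGNER_KEY: {"id": "abc"}},
        }
    },
}
print(strip_layout(template))
```

    Other Metadata entries are left alone, and the original template is not modified.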

    Support for Additional Services
    We are also adding support for the following services today:

    Visit the complete list of supported services and resources to learn more.

    Available Now
    The new CloudFormation Designer is available now and you can start using it today by opening the CloudFormation Console. Like CloudFormation itself, there is no charge to use the Designer; you pay only for the AWS resources that you use when you launch a stack.


  • Amazon EMR Release 4.1.0 – Spark 1.5.0, Hue 3.7.1, HDFS Encryption, Presto, Oozie, Zeppelin, Improved Resizing

    by Jeff Barr | on | in Amazon EMR |

    My colleagues Jon Fritz and Abhishek Sinha are both Senior Product Managers on the EMR team. They wrote the guest post below to introduce you to the newest release of EMR and to tell you about new EMR cluster resizing functionality.


    Amazon EMR is a managed service that simplifies running and managing distributed data processing frameworks, such as Apache Hadoop and Apache Spark.

    Today we are announcing Amazon EMR release 4.1.0, which includes support for Spark 1.5.0, Hue 3.7.1 and HDFS transparent encryption with Hadoop KMS. We are also introducing an intelligent resize feature that allows you to reduce the number of nodes in your cluster with minimal impact to running jobs. Finally, we are also announcing the availability of Presto 0.119, Zeppelin 0.6 (Snapshot) and Oozie 4.0.1 as Sandbox Applications. The EMR Sandbox gives you early access to applications which are still in development for a full General Availability (GA) release.

    EMR release 4.1.0 is our first follow-up release to 4.0.0, which brought many new platform improvements around configuration of applications, a new packaging system, standard ports and paths for Hadoop ecosystem applications, and a Quick Create option for clusters in the AWS Management Console.

    New Applications and Components in the 4.x Release Series
    Amazon EMR provides an easy way to install and configure distributed big data applications in the Hadoop and Spark ecosystems on your cluster when creating clusters from the EMR console, AWS CLI, or using an SDK with the EMR API. In release 4.1.0, we have added support for several new applications:

    • Spark 1.5.0 – We included Spark 1.4.1 on EMR release 4.0.0, and we have upgraded the version of Spark to 1.5.0 in this EMR release. Spark 1.5.0 includes a variety of new features and bug fixes, including additional functions for Spark SQL/Dataframes, new algorithms in MLlib, improvements in the Python API for Spark Streaming, support for Parquet 1.7, and preferred locations for dynamically allocated executors. To learn more about Spark in Amazon EMR, click here.
    • HUE 3.7.1 – Hadoop User Experience (HUE) is an open source user interface which allows users to more easily develop and run queries and workflows for Hadoop ecosystem applications, view tables in the Hive Metastore, and browse files in Amazon S3 and on-cluster HDFS. Multiple users can log in to HUE on an Amazon EMR cluster to query data in Amazon S3 or HDFS using Apache Hive and Pig, create workflows using Oozie, develop and save queries for later use, and visualize query results in the UI. For more information about how to connect to the HUE UI on your cluster, click here.
    • Hadoop KMS for HDFS Transparent Encryption – The Hadoop Key Management Server (KMS) can supply keys for HDFS Transparent Encryption, and it is installed on the master node of your EMR cluster with HDFS. You can also use a key vendor external to your EMR cluster which utilizes the Hadoop KeyProvider API. Encryption in HDFS is transparent to applications reading from and writing to HDFS, and data is encrypted in-transit in HDFS because encryption and decryption activities are carried out in the client. Amazon EMR has also included an easy configuration option to programmatically create encrypted HDFS directories when launching clusters. To learn more about using Hadoop KMS with HDFS Transparent Encryption, click here.

    Introducing the EMR Sandbox
    With the EMR Sandbox, you now have early access to new software for your EMR cluster while those applications are still in development for a full General Availability (GA) release. Previously, bootstrap actions were the only mechanism to install applications not fully supported on EMR. However, you would need to specify a bootstrap action script, the installation was not tightly coupled to an EMR release, and configuration settings were harder to maintain. Instead, applications in the EMR Sandbox are certified to install correctly, configured using a configuration object, and specified directly from the EMR console, CLI, or EMR API using the application name (ApplicationName-Sandbox). Release 4.1.0 has three EMR Sandbox applications:

    • Presto 0.119 – Presto is an open-source, distributed SQL query engine designed to query large data sets in one or more heterogeneous data sources, including Amazon S3. Presto is optimized for ad-hoc analysis at interactive speed and supports standard ANSI SQL, including complex queries, aggregations, joins, and window functions. Presto does not use Hadoop MapReduce; instead, it uses a query execution mechanism that processes data in memory and pipelines it across the network between stages. You can interact with Presto using the on-cluster Presto CLI or connect with a supported UI like Airpal, a web-based query execution tool which was open sourced by Airbnb. Airpal has several interesting features such as syntax highlighting, results exported to a CSV for download, query history, saved queries, table finder to search for appropriate tables, and a table explorer to visualize schema of a table and sample the first 1000 rows. To learn more about using Airpal with Presto on Amazon EMR, read the new post, Analyze Data with Presto and Airpal on Amazon EMR on the AWS Big Data Blog. To learn more about Presto on EMR, click here.
    • Zeppelin 0.6 (Snapshot) – Zeppelin is an open source GUI which creates interactive and collaborative notebooks for data exploration using Spark. You can use Scala, Python, SQL (using Spark SQL), or HiveQL to manipulate data and quickly visualize results. Zeppelin notebooks can be shared among several users, and visualizations can be published to external dashboards. When executing code or queries in a notebook, you can enable dynamic allocation of Spark executors to programmatically assign resources or change Spark configuration settings (and restart the interpreter) in the Interpreter menu.
    • Oozie 4.0.1 – Oozie is a workflow scheduler for Hadoop, where you can create Directed Acyclic Graphs (DAGs) of actions. Also, you can easily trigger your Hadoop workflows by actions or time.

    Example Customer Use Cases for Presto on Amazon EMR
    Even before Presto was supported as a Sandbox Application, many AWS customers were already using Presto on Amazon EMR, especially for interactive ad hoc queries on large scale data sets in Amazon S3. Here are a few examples:

    • Cogo Labs, a startup incubator, operates a platform for marketing analytics and business intelligence. Presto running on Amazon EMR allows any of their 100+ developers and analysts to run SQL queries on over 500 TB of data stored in Amazon S3 for data-exploration, ad-hoc analysis, and reporting.
    • Netflix has chosen Presto as their interactive, ANSI-SQL compliant query engine for big data, as Presto scales well, is open source, and integrates with the Hive Metastore and Amazon S3 (the backbone of Netflix’s Big Data Warehouse environment). Netflix runs Presto on persistent EMR clusters to quickly and flexibly query across their ~25PB S3 data store. Netflix is an active contributor to Presto, and Amazon EMR provides Netflix with the flexibility to run their own build of Presto on Amazon EMR clusters. On average, Netflix runs ~3500 queries per day on their Presto clusters. Learn more about Netflix’s Presto deployment.
    • Jampp is a mobile application marketing platform, and they use advertising retargeting techniques to drive engaged users to new applications. Jampp currently uses Presto on EMR to process 40 TB of data each day.
    • Kanmu is a Japanese startup in the financial services industry and provides offers based on consumers’ credit card usage. Kanmu migrated from Hive to using Presto on Amazon EMR because of Presto’s ability to run exploratory and iterative analytics at interactive speeds, good performance with Amazon S3, and scalability to query large data sets.
    • OpenSpan provides automation and intelligence solutions that help bridge people, processes and technology to gain insight into employee productivity, simplify transactions, and engage employees and customers. OpenSpan migrated from HBase to Presto on Amazon EMR with Amazon S3 as a data layer. OpenSpan chose Presto because of its ANSI SQL interface and ability to query data in real-time directly from Amazon S3, which allows them to quickly explore vast amounts of data and rapidly iterate on upcoming data products.

    Intelligent Resize Feature Set
    In release 4.1.0, we have added an Intelligent Resize feature set so you can now shrink your EMR cluster with minimal impact to running jobs. Additionally, when adding instances to your cluster, EMR can now start utilizing provisioned capacity as soon as it becomes available. Previously, EMR would need the entire requested capacity to become available before allowing YARN to send tasks to those nodes. Also, you can now issue a resize request to EMR while a current resize request is being executed (to change the target size of your cluster), or stop a resize operation.

    When decreasing the size of your cluster, EMR will programmatically select instances which are not running tasks, or if all instances in the cluster are being utilized, EMR will wait for tasks to complete on a given instance before removing it from the cluster. The default wait time is 1 hour, and this value can be changed. You can also specify a timeout value in seconds by changing the yarn.resourcemanager.decommissioning parameter in the /home/hadoop/conf/yarn-site.xml or /etc/hadoop/conf/yarn-site.xml file. EMR will dynamically update the new setting and a resource manager restart is not required. You can set this to an arbitrarily large number to ensure that no tasks are killed while shrinking the cluster.
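
    To make the selection rule concrete, here is a small illustrative sketch. It is not EMR's actual algorithm: the function, the instance names, and the choice to prefer the least busy instances once the idle ones run out are all assumptions made for the example:

```python
def plan_shrink(running_tasks: dict, remove_count: int):
    """Split the instances to remove into 'remove now' and 'wait for tasks'."""
    # Idle instances first, then the least busy ones (ordering is an assumption).
    ordered = sorted(running_tasks, key=lambda name: (running_tasks[name], name))
    chosen = ordered[:remove_count]
    remove_now = [n for n in chosen if running_tasks[n] == 0]
    wait_for = [n for n in chosen if running_tasks[n] > 0]
    return remove_now, wait_for


# Hypothetical instance names mapped to the number of tasks each is running.
tasks = {"i-a": 4, "i-b": 0, "i-c": 2, "i-d": 0}
print(plan_shrink(tasks, 3))  # (['i-b', 'i-d'], ['i-c'])
```

    In this sketch the two idle instances can go immediately, while the third waits for its tasks to finish or for the timeout to expire.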

    Additionally, Amazon EMR now has support for removing instances in the core group, which store data as a part of HDFS along with running YARN components. When shrinking the number of instances in a cluster’s core group, EMR will gracefully decommission HDFS daemons on the instances. During the decommissioning process, HDFS replicates the blocks on that instance to other active instances to reach the desired replication factor in the cluster (EMR sets the default replication factor to 1 for 1-3 core nodes, the value to 2 for 4-9 core nodes, and the value to 3 for 10+ core nodes). To avoid data loss, EMR will not allow shrinking your core group below the storage required by HDFS to store data on the cluster, and will ensure that the cluster has enough free capacity to successfully replicate blocks from the decommissioned instance to the remaining instances. If the requested instance count is too low to fit existing HDFS data, only a partial number of instances will be decommissioned.
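
    The default replication factors and the partial-decommission behavior can be sketched in a few lines. The first function restates the defaults given above. The second is a simplified assumption about the capacity check, with hypothetical names and inputs; it ignores the free-space headroom and other details that EMR takes into account:

```python
import math


def default_replication_factor(core_nodes: int) -> int:
    """Default HDFS replication factor for a given core node count."""
    if core_nodes < 1:
        raise ValueError("a cluster with HDFS needs at least one core node")
    if core_nodes <= 3:
        return 1
    if core_nodes <= 9:
        return 2
    return 3


def reachable_core_count(used_gb: float, node_capacity_gb: float,
                         requested: int, current: int) -> int:
    """Core node count a shrink request can reach without dropping HDFS data."""
    # used_gb is raw HDFS usage, replicas included (simplifying assumption).
    needed = max(1, math.ceil(used_gb / node_capacity_gb))
    return min(current, max(requested, needed))


print(default_replication_factor(6))            # 2
print(reachable_core_count(900, 200, 3, 10))    # 5
```

    In the second call the request asks for 3 core nodes, but 900 GB of data needs at least 5 nodes of 200 GB each, so only 5 of the 10 instances would be decommissioned.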

    We recommend minimizing heavy HDFS writes before removing nodes from your core group. HDFS replication can slow down due to under-construction blocks and inconsistent replica blocks, which will decrease the performance of the overall resize operation. To learn more about resizing your EMR clusters, click here.

    Launch an Amazon EMR Cluster With 4.1.0 Today
    To create an EMR cluster with 4.1.0, select release 4.1.0 on the Create Cluster page in the AWS Management Console, or use the release label “emr-4.1.0” when creating your cluster from the AWS CLI or using an SDK with the EMR API.
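
    As a rough sketch of the CLI route, the helper below assembles an aws emr create-cluster command that uses the release label and appends the -Sandbox suffix to the Sandbox applications. The cluster name, instance type, and instance count are assumptions chosen for the example, and the helper only prints the command rather than running it:

```python
import shlex

SANDBOX_APPS = {"Presto", "Zeppelin", "Oozie"}  # Sandbox applications in 4.1.0


def create_cluster_command(apps, instance_type="m3.xlarge", instance_count=3):
    """Build an `aws emr create-cluster` command line for release 4.1.0."""
    names = [f"{a}-Sandbox" if a in SANDBOX_APPS else a for a in apps]
    args = [
        "aws", "emr", "create-cluster",
        "--name", "emr-4.1.0 sketch",            # hypothetical cluster name
        "--release-label", "emr-4.1.0",
        "--applications", *[f"Name={n}" for n in names],
        "--instance-type", instance_type,        # assumed instance type
        "--instance-count", str(instance_count),
        "--use-default-roles",
    ]
    return " ".join(shlex.quote(a) for a in args)


print(create_cluster_command(["Spark", "Presto"]))
```

    Review the printed command, and add options such as an EC2 key pair, before running it in your own account.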

    Jon Fritz and Abhishek Sinha

  • New AWS Security Courses (Fundamentals & Operations)

    by Jeff Barr | on | in Training and Certification |

    It’s probably no surprise that information security is one of today’s most sought after IT specialties. It’s also deeply important to our customers and any company considering moving to the cloud.

    So, today we’re launching a new AWS Training curriculum focused on security. The curriculum’s two new classes are designed to help you meet your cloud security objectives under the AWS Shared Responsibility Model, by showing you how to create more secure AWS architectures and solutions and address key compliance requirements.

    Here’s a closer look at what’s new:

    • AWS Security Fundamentals – This free 3-hour online class is designed to introduce fundamental cloud computing and AWS security concepts, including AWS access control and management, governance, logging, and encryption methods. The class, aimed primarily at security professionals with little or no working knowledge of AWS, also addresses security-related compliance protocols, risk management strategies, and procedures for auditing AWS security infrastructure.
    • Security Operations on AWS – A 3-day technical deep dive on how to stay secure and compliant in the AWS cloud. This classroom-based course covers security features of key AWS services and AWS best practices for securing data and systems. You’ll learn about regulatory compliance standards and use cases for running regulated workloads on AWS. Hands-on practice with AWS security products and features will help you take your security operations to the next level.

    Visit AWS Training to learn more about the new security courses and find an instructor-led class near you.

    — Jeff;

  • New AWS Digital Library for Big Data Solutions

    by Jeff Barr | on | in Big Data, Case Studies |

    My colleague Luis Daniel Soto has been working with AWS Community Hero Lynn Langit to create a comprehensive collection of resources for customers who are ready to run Big Data applications on AWS!

    Here’s what they have to say….

    — Jeff;

    Today the AWS Marketplace is launching a new on-line video library designed to help our customers find AWS Marketplace vendor solutions, as well as accelerate and manage short and long-term data integration, business intelligence and advanced analytics projects for their AWS cloud and on-premises data.

    The AWS Marketplace Digital Library for Big Data provides business and technical content from AWS Marketplace technology vendors and case studies from customers who have built end-to-end Big Data solutions. The segments are hosted by cloud and Big Data architect Lynn Langit and organized around a common set of functionality to help organizations and individuals find the AWS Marketplace vendor solutions to address their particular needs.

    The library is hosted on a video webcasting platform which allows our customers to interact with AWS Marketplace partners, by asking questions as they watch the demos and interviews in split-screen mode. Here’s a sample:

    If you are an APN Partner and want to learn more or want to be part of the AWS Digital Library, visit the new Big Data Partner Solutions page.

    — Luis and Lynn