AWS Official Blog
I would like to extend a warm welcome to the newest AWS Community Heroes:
- Adam Smolnik
- Kai Hendry
- Onur Salk
- Paolo Latella
- Raphael Francis
- Rob Linton
The Heroes share their enthusiasm for AWS via social media, blogs, events, user groups, and workshops. Let’s take a look at their bios to learn more.
Adam is a Principal Software Engineer at Pitney Bowes, a global technology company offering products and solutions that enable commerce in the areas of customer information management, location intelligence, customer engagement, shipping and mailing, and global ecommerce. Prior to Pitney Bowes, Adam worked as an application developer, consultant, and designer for companies like Kroll Ontrack, IBM and EDS. He supports and publishes articles on Chmurowisko.pl, the most recognized Polish website covering cloud technology and a premier source of information about AWS and cloud computing in general.
Adam is also a co-founder of AWS User Group Poland (established in 2014), an active speaker and trainer at Cloud conferences, instructor at Cloud and Software workshops as well as co-organizer of the Cloudyna conference. Be sure to take a look at his LinkedIn profile.
Kai Hendry is the founder of Webconverger, a company and open source project of the same name, supplier of Web kiosk and signage software since 2007. After graduating from the University of Helsinki with a Master’s degree in Computer Science in 2005, he travelled and worked around the world to discover insecure Web kiosks in Internet cafes and public spaces. On his return to England, he engineered a secure Web kiosk operating system based on Debian and maintained it on weekends while working full-time on Web technologies.
Over time, Webconverger’s popularity grew, and by the end of his tenure in the telecommunications industry he decided to move to Singapore, get married, and focus on his company. Now a successful small business, Webconverger provides a reliable management service for Web kiosks using AWS services such as S3 with Route 53 failover.
Kai is an active member of the maker community in Singapore (usually found working from Hackerspace.SG), helps with the local AWS User Group Singapore Meetup group, and organizes the Singapore Hack and Tell chapter.
For 8 years Onur Salk has been leading the infrastructure and technical operations of Yemeksepeti.com, which has since been acquired by Delivery Hero. He is also responsible for Foodonclick.com, Ifood.jo, Yemek.com and Irmik.com.
He helped build Yemek.com, a fully automated and self-healing website, which runs entirely on Amazon Web Services. In a first for Turkey, he migrated Foodonclick.com to AWS, implementing MS SQL Always On in a production environment.
Onur regularly publishes AWS articles on his blog Wekanban.com and is the founder and organizer of the AWS User Group Turkey Meetup group in Istanbul. He is passionate about cloud computing, automation, configuration management and DevOps. He also enjoys programming in Python and developing open source AWS tools.
Paolo Latella is a Cloud Solutions Architect and AWS Technical Trainer at XPeppers, an enterprise focused on Cloud technologies and DevOps methodologies and a member of the AWS Partner Network (APN). Paolo has more than 15 years of experience in IT and has worked on AWS technologies since 2008. Before joining XPeppers he was a Solution Architect Team Leader at Interact, an enterprise leader in Digital Media for the Cloud. There he led the first Hybrid Cloud project for the Italian Public Sector.
He graduated from the University of Rome “La Sapienza” in Computer Science, with a thesis on “Auto configuration and monitoring of Wireless Sensors Network”. After graduating, he received a research grant for the study of advanced network systems and mission-critical services at CASPUR (Consorzio Applicazioni Supercalcolo per Università e Ricerca), now CINECA.
Raphael Francis is a proud Cebuano technopreneur. He is the Chief Technology Officer of Upteam Corporation, a worldwide supplier of authentic, curated and pre-owned high-end brands. He serves as a consultant to the management services company Penbrothers and the business SaaS company Yewusoftware, and was a founding member of AVA.ph, the Philippines’ first curated marketplace for premium brands. He also served as the CTO of Techforge Solutions, an IT firm that launched various brands, enterprises and online ventures.
“Sir Rafi” has genuine enthusiasm for effective mentoring. He comes from a family of teachers and educators, and was a professor himself at the Sacred Heart – Ateneo de Cebu and La Salle College of St. Benilde.
Rob Linton is the founder of Podzy, an encrypted on-premises replacement for Dropbox, which won the 2013 Australian iAwards Toolsets category. Over the past 20 years he has worked as a spatial information systems and data professional. His first company, Logicaltech Systalk, received numerous awards and commendations for product excellence, and won the Australian 2010 iAwards.
In July 2011 he founded the first AWS User Group in Australia. He is a certified Security Systems ISO 27001 auditor, and one of the few people to receive a perfect score for his SQL Server certification. His last book was Amazon Web Services: Migrate your .NET Enterprise Application to the Amazon Cloud.
In his spare time he enjoys coding in C++ on his Macbook Pro and chasing his kids away from things that break relatively easily.
Please join me in welcoming our newest heroes!
AWS OpsWorks makes it easy for you to deploy applications of all shapes and sizes. It provides you with an integrated management experience that spans the entire application lifecycle including resource provisioning, EBS volume setup, configuration management, application deployment, monitoring, and access control (read my introductory post, AWS OpsWorks – Flexible Application Management in the Cloud Using Chef for more information).
Amazon EC2 Container Service is a highly scalable container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon Elastic Compute Cloud (EC2) instances (again, I have an introductory post if you’d like to learn more: Amazon EC2 Container Service (ECS) – Container Management for the AWS Cloud).
ECS and RHEL Support
Today, in the finest “peanut butter and chocolate” tradition, we are adding support for ECS Container Instances to OpsWorks. You can now provision and manage ECS Container Instances that are running Ubuntu 14.04 LTS or the Amazon Linux 2015.03 AMI.
We are also adding support for Red Hat Enterprise Linux (RHEL) 7.1.
Let’s take a closer look at both features!
Support for ECS Container Instances
The new ECS Cluster layer type makes it easy for you to provision and configure ECS Container Instances. You simply create the layer, specify the name and instance type for the cluster (which must already exist), define and attach EBS volumes as desired, and you are good to go. The instances will be provisioned with Docker, the ECS agent, and the OpsWorks agent, and will be registered with the ECS cluster associated with the ECS Cluster layer.
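If you prefer to script this step, the layer can also be described at the API level. Here is a minimal sketch of the request parameters (the stack ID and cluster ARN are hypothetical placeholders, and the actual call is commented out so the sketch stands alone):

```python
# Sketch of creating an ECS Cluster layer via the OpsWorks API.
# The stack ID and cluster ARN below are hypothetical placeholders.
layer_params = {
    "StackId": "2f18b4cb-4de5-4c33-example",  # hypothetical stack ID
    "Type": "ecs-cluster",                    # the new ECS Cluster layer type
    "Name": "ECS Cluster",
    "Shortname": "ecs-cluster",
    "Attributes": {
        # ARN of an ECS cluster that must already exist
        "EcsClusterArn": "arn:aws:ecs:us-east-1:123456789012:cluster/example"
    },
}

# import boto3
# opsworks = boto3.client("opsworks", region_name="us-east-1")
# response = opsworks.create_layer(**layer_params)
```

The Console steps below accomplish the same thing interactively.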
It is really easy to get started. Simply add a new layer and select the ECS Cluster Layer type:
Then choose a cluster and a profile:
The next step is to add instances to the cluster. This takes just a couple of clicks per instance:
As is always the case with OpsWorks, the instances are initially in the Stopped state, and can be started with a click on Start All Instances (individual instances can also be started):
Once the instances are up and running you can run Chef recipes on them. You can also install operating system (Linux only) and package updates (read Run AWS OpsWorks Stack Commands to learn more) on the instances in the cluster. Finally, take a look at Using OpsWorks to Perform Operational Tasks to learn how to envelop shell commands in a simple JSON wrapper and run them.
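As a sketch of what running a recipe on the cluster looks like at the API level, a stack command can be expressed as a JSON command object passed to CreateDeployment (the stack ID and recipe name here are hypothetical placeholders):

```python
# Sketch of an OpsWorks deployment that runs Chef recipes on the
# instances in a stack (used with the CreateDeployment API).
# The stack ID and recipe name are hypothetical placeholders.
deployment = {
    "StackId": "2f18b4cb-4de5-4c33-example",
    "Command": {
        "Name": "execute_recipes",
        "Args": {
            # One or more recipes, in cookbook::recipe form
            "recipes": ["mycookbook::mytask"],
        },
    },
}

# import boto3
# boto3.client("opsworks").create_deployment(**deployment)
```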
For more information on this and other features, take a look at the OpsWorks User Guide. To learn more about how to run ECS tasks on Container Instances that have been provisioned by OpsWorks, read the ECS Getting Started Guide.
RHEL 7.1 Support
OpsWorks now supports version 7.1 of Red Hat Enterprise Linux (RHEL). Many AWS customers have asked us to support this OS and we are happy to oblige, as we did earlier this year when we announced OpsWorks support for Windows. You can launch and manage EC2 instances running RHEL 7. You can also manage existing, on-premises instances that are running RHEL 7.
You have several launch options. You can choose RHEL 7 as the default when you launch a new stack, and you can set it as the default for an existing stack. You can also leave the default as-is and choose to run RHEL 7 when you launch new instances. Here’s how you select RHEL 7 as the default when you launch a new stack:
As you probably know already, OpsWorks can also manage instances that it did not launch, for a modest hourly fee. You can take advantage of the monitoring and management tools provided by OpsWorks while managing all of your instances through a single user interface. To do this, you add compute power to a layer by registering an existing instance instead of launching a new one:
Step through the wizard; the final step will show you how to install the OpsWorks agent on your instance and register with OpsWorks:
When you run the command it will download the agent, install any necessary packages, and start the agent. The agent will register itself with OpsWorks and the instance will become part of the stack specified on the command line. At that point the instance is registered as part of the stack but not yet assigned to a layer or configured in any particular way. You can use the OpsWorks user-management feature to create users, manage permissions, and provide them with SSH access if necessary.
Installing the agent also sets up one-minute CloudWatch metrics:
After you have configured the instances and verified that they are being monitored, you can assign them to a layer:
These features are available now and you can start using them today.
We launched AWS Device Farm earlier this month with support for testing apps on Android and Fire OS devices.
I am happy to give you a heads-up that you will soon be able to test your apps on Apple phones and tablets! We plan to launch support for iOS on August 4, 2015, with support for several popular test automation frameworks.
Here are some preliminary screen shots of the new iOS support in action. After you upload your binary to Device Farm, you will have the opportunity to select the app to test:
After you start the test (step 5 in the screen shot above), the test results and the associated screen shots will be displayed as they arrive:
With the new iOS support, you will be able to test your cross-platform titles and get reports (including high-level test results, problem patterns, logs, screenshots, and performance data) that are consistent, regardless of the platform and test framework that you use. If you use a cross-platform test framework such as Appium or Calabash, you can use the same code for your Android, Fire OS, and iOS tests.
As I said earlier, iOS support will be available in less than a week. You can get ready now by reading the Device Farm documentation and by creating test suites and scripts using one or more of the frameworks that I mentioned above.
Today we are adding a developer preview of support for Xamarin to the existing AWS SDK for .NET. Xamarin allows you to build cross-platform C# applications that run on iOS, Android, and Windows devices. The new AWS Mobile SDK for Xamarin gives your Xamarin app access to multiple AWS services including:
- Amazon Cognito – Identity management.
- Amazon Simple Storage Service (S3) – Object storage.
- Amazon DynamoDB – NoSQL database.
- Amazon Simple Notification Service (SNS) – Mobile push notifications.
- Amazon Mobile Analytics – Track app usage and other metrics.
You can use the Xamarin Studio IDE to write, debug, and test your code:
Read Getting Started with the AWS Mobile SDK for .Net / Xamarin to learn how to install the SDK and to start using AWS services from a Xamarin application. Read more on the Xamarin blog.
We launched Amazon Simple Storage Service (S3) in the spring of 2006 with a simple blog post. Over the years we have kept the model simple and powerful while reducing prices, and adding features such as the reduced redundancy storage model, VPC endpoints, cross-region replication, and event notifications.
We launched the event notification model last year, with support for notification when objects are created via PUT, POST, Copy, or Multipart Upload. At that time the notifications applied to all of the objects in the bucket, with the promise of more control over time.
Today we are adding notification when objects are deleted, along with prefix and suffix filtering for all types of notifications. We are also adding support for bucket-level Amazon CloudWatch metrics.
You can now arrange to be notified when an object has been deleted from an S3 bucket. Like the other types of notifications, delete notifications can be delivered to an SQS queue or an SNS topic or used to invoke an AWS Lambda function. The notification indicates that a DELETE operation has been performed on an object, and can be used to update any indexing or tracking data that you maintain for your S3 objects.
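For example, a Lambda function that keeps an external index in sync might handle the delete event along these lines (a minimal sketch; the in-memory index here is a stand-in for whatever tracking data you actually maintain):

```python
# Minimal sketch of a Lambda handler for S3 delete notifications.
# The "index" dict stands in for whatever tracking data you maintain.
index = {"images/logo.png": {"size": 1234}}

def handler(event, context=None):
    """Remove deleted S3 objects from the local index."""
    removed = []
    for record in event.get("Records", []):
        # Delete notifications carry an eventName of ObjectRemoved:*
        if record["eventName"].startswith("ObjectRemoved"):
            key = record["s3"]["object"]["key"]
            index.pop(key, None)
            removed.append(key)
    return removed

# A sample event in the shape S3 delivers for a DELETE operation:
sample_event = {
    "Records": [
        {
            "eventName": "ObjectRemoved:Delete",
            "s3": {"object": {"key": "images/logo.png"}},
        }
    ]
}
```

Calling handler(sample_event) drops the deleted key from the index and returns the list of removed keys.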
Also, you can now use prefix and suffix filters to opt in to event notifications based on object name. For example, you can choose to receive DELETE notifications for the images/ prefix and the .png suffix in a particular bucket like this:
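At the API level, that example corresponds to a notification configuration roughly like the following (a sketch; the SNS topic ARN is a hypothetical placeholder, and the call is commented out):

```python
# Sketch of an S3 notification configuration with prefix/suffix filters,
# matching the images/ prefix and .png suffix example above.
# The SNS topic ARN is a hypothetical placeholder.
notification_config = {
    "TopicConfigurations": [
        {
            "TopicArn": "arn:aws:sns:us-east-1:123456789012:deleted-images",
            "Events": ["s3:ObjectRemoved:*"],
            "Filter": {
                "Key": {
                    "FilterRules": [
                        {"Name": "prefix", "Value": "images/"},
                        {"Name": "suffix", "Value": ".png"},
                    ]
                }
            },
        }
    ]
}

# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="my-bucket", NotificationConfiguration=notification_config)
```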
You can create and edit multiple notifications from within the Console:
CloudWatch Storage Metrics
Amazon CloudWatch tracks metrics for AWS services and for your applications and allows you to set alarms that will be triggered when a metric goes past a limit that you specify. You can now monitor and set alarms on your S3 storage usage. Available metrics include total bytes (Standard and Reduced Redundancy Storage) and total number of objects, all on a per-bucket basis. You can find the metrics in the AWS Management Console:
The metrics are updated daily, and will align with those in your AWS bill. These metrics do not include or apply to S3 objects that have been migrated (via a lifecycle rule) to Glacier.
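To retrieve these metrics programmatically, a CloudWatch request takes roughly this shape (a sketch; the bucket name is a hypothetical placeholder, and the call itself is commented out):

```python
from datetime import datetime, timedelta

# Sketch of a CloudWatch request for the daily S3 storage metrics.
# The bucket name is a hypothetical placeholder.
params = {
    "Namespace": "AWS/S3",
    "MetricName": "BucketSizeBytes",
    "Dimensions": [
        {"Name": "BucketName", "Value": "my-example-bucket"},
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    "StartTime": datetime.utcnow() - timedelta(days=7),
    "EndTime": datetime.utcnow(),
    "Period": 86400,  # one day, since the metrics are updated daily
    "Statistics": ["Average"],
}

# import boto3
# data = boto3.client("cloudwatch").get_metric_statistics(**params)
```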
These features are available now and you can start using them today.
We announced Amazon Aurora last year at AWS re:Invent (see Amazon Aurora – New Cost-Effective MySQL-Compatible Database Engine for Amazon for more info). With storage replicated both within and across three Availability Zones, along with an update model driven by quorum writes, Amazon Aurora is designed to deliver high performance and 99.99% availability while easily and efficiently scaling to up to 64 TB of storage.
In the nine months since that announcement, a host of AWS customers have been putting Amazon Aurora through its paces. As they tested a wide variety of table configurations, access patterns, and queries on Amazon Aurora, they provided us with the feedback that we needed to have in order to fine-tune the service. Along the way, they verified that each Amazon Aurora instance is able to deliver on our performance target of up to 100,000 writes and 500,000 reads per second, along with a price to performance ratio that is 5 times better than previously available.
Today I am happy to announce that Amazon Aurora is now available for use by all AWS customers, in three AWS regions. During the testing period we added some important features that will simplify your migration to Amazon Aurora. Since my original blog post provided a good introduction to many of the features and benefits of the core product, I’ll focus on the new features today.
If you are already using Amazon RDS for MySQL and want to migrate to Amazon Aurora, you can do a zero-downtime migration by taking advantage of Amazon Aurora’s new features. I will summarize the process here, but I do advise you to read the reference material below and to do a practice run first! Immediately after you migrate, you will begin to benefit from Amazon Aurora’s high throughput, security, and low cost. You will be in a position to spend less time thinking about the ins and outs of database scaling and administration, and more time working on your application code.
If the database is active, start by enabling binary logging in the instance’s DB parameter group (see MySQL Database Log Files to learn how to do this). In certain cases, you may want to consider creating an RDS Read Replica and using it as the data source for the migration and replication (check out Replication with Amazon Aurora to learn more).
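Enabling binary logging comes down to setting the binlog_format parameter in the instance's DB parameter group. As a sketch (the parameter group name is a hypothetical placeholder, and the call is commented out):

```python
# Sketch of the RDS ModifyDBParameterGroup request that enables
# binary logging. The parameter group name is a hypothetical placeholder.
binlog_params = {
    "DBParameterGroupName": "my-mysql-params",
    "Parameters": [
        {
            "ParameterName": "binlog_format",
            "ParameterValue": "MIXED",   # ROW also works for replication
            "ApplyMethod": "immediate",
        }
    ],
}

# import boto3
# boto3.client("rds").modify_db_parameter_group(**binlog_params)
```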
Open up the RDS Console, select your existing database instance, and choose Migrate Database from the Instance Actions menu:
Fill in the form (in most cases you need do nothing more than choose the DB Instance Class) and click on the Migrate button:
Aurora will create a new DB instance and proceed with the migration:
A little while later (a coffee break might be appropriate, depending on the size of your database), the Amazon Aurora instance will be available:
Now, assuming that the source database was actively changing while you were creating the Amazon Aurora instance, replicate the changes to the new instance using the mysql.rds_set_external_master command, and then update your application to use the new Aurora endpoint!
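The replication step boils down to two stored-procedure calls, run on the new Aurora instance. Here is a sketch; the host, credentials, and binlog coordinates are all hypothetical placeholders for the values from your source instance:

```python
# Sketch of the stored-procedure calls that start replication from the
# source RDS MySQL instance. Host, credentials, and binlog coordinates
# are hypothetical placeholders; run these on the new Aurora instance.
set_master = (
    "CALL mysql.rds_set_external_master ("
    "'source-instance.abcdefg.us-east-1.rds.amazonaws.com', 3306, "
    "'repl_user', 'repl_password', "
    "'mysql-bin-changelog.000002', 120, 0);"
)
start_replication = "CALL mysql.rds_start_replication;"
```

Once the replica has caught up, stop replication and repoint your application at the Aurora endpoint.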
Each Amazon Aurora instance reports a plethora of metrics to Amazon CloudWatch. You can view these from the Console and you can, as usual, set alarms and take actions as needed:
Easy and Fast Replication
Each Amazon Aurora instance can have up to 15 replicas, each of which adds additional read capacity. You can create a replica with a couple of clicks:
Due to Amazon Aurora’s unique storage architecture, replication lag is extremely low, typically between 10 ms and 20 ms.
When we first announced Amazon Aurora we expected to deliver a service that offered at least 4 times the price-performance of existing solutions. Now that we are ready to ship, I am happy to report that we’ve exceeded this goal, and that Amazon Aurora can deliver 5x the price-performance of a traditional relational database when run on the same class of hardware.
In general, this does not mean that individual queries will run 5x as fast as before (although Amazon Aurora’s fast, SSD-based storage certainly speeds things up). Instead, it means that Amazon Aurora is able to handle far more concurrent queries (both read and write) than other products. Amazon Aurora’s unique, highly parallelized access to storage reduces contention for stored data and allows it to process queries in a highly efficient fashion.
From our Partners
Members of the AWS Partner Network (APN) have been working to test their offerings and to gain operational and architectural experience with Amazon Aurora. Here’s what I know about so far:
- Business Intelligence – Tableau, Zoomdata, and Looker.
- Data Integration – Talend, Attunity, and Informatica.
- Query and Monitoring – Webyog, Toad, and Navicat.
- SI and Consulting – 8K Miles, 2nd Watch, and Nordcloud.
- Content Management – Alfresco.
Ready to Roll
Our customers and partners have put Amazon Aurora to the test and it is now ready for your production workloads. We are launching in the US East (Northern Virginia), US West (Oregon), and Europe (Ireland) regions, and will expand to others over time.
Pricing works like this:
- Database Instances – You pay by the hour for the primary instance and any replicas. Instances are available in 5 sizes, with 2 to 32 vCPUs and 15.25 to 244 GiB of memory. You can also use Reserved Instances to save money on your steady-state database workloads.
- Storage – You pay $0.10 per GB per month for storage, based on the actual number of bytes of storage consumed by your database, sampled hourly. For this price you get a total of six copies of your data, two copies in each of three Availability Zones.
- I/O – You pay $0.20 for every million I/O requests that your database makes.
See the Amazon Aurora Pricing page for more information.
Let’s take a quick look at what happened in AWS-land last week. If you find these summaries useful, or if you have ideas for additional types of content, please feel free to leave a comment.
New & Notable Open Source
- aws-google-login allows you to log into the AWS Console with Google Apps.
- aws-api-gateway-swagger-importer lets you create or update Amazon API Gateway APIs from a Swagger representation.
- libqtaws is a library for consuming AWS services from Qt applications.
- aws-terraform is a practical implementation of CoreOS cluster provisioning on AWS.
- aws-ec2-ssh lets you retrieve all active EC2 instances and SSH to them based on tag name.
- aws-cleanup cleans up unused EBS volumes, AMIs, and snapshots.
- stacker is an opinionated CloudFormation stack builder.
- aws-sdk-perl is an attempt to build an AWS SDK in Perl.
- opsworks_cookbooks is a cookbook collection for OpsWorks.
- log4net.awsKinesis is a log4net appender that writes to an AWS Kinesis stream.
- node-teslakinesis streams Tesla telemetry data to AWS Kinesis.
New Customer Stories
- Alert Logic.
- Apeejay Stya & Svrán Group.
- Federal Home Loan Bank of Chicago.
- NDTV Convergence.
- New York City Department of Transportation.
- Oscar Insurance.
- Pinoy Travel.
- Thermo Fisher Scientific.
New SlideShare Content
- Batch Processing with Amazon EC2 Container Service.
- Empire – Building a PaaS with Docker and AWS.
- Convert Your Code into a Microservice Using AWS Lambda.
- AWS Technical Day – Amazon Cognito.
- AWS Technical Day – Building Your Data Warehouse with Redshift.
- AWS Technical Day – Introducing Amazon API Gateway.
New YouTube Videos
- Women in Tech Panel: Women Executives from AWS, Georgetown University, Navy, SAP, and Congress.
- Michael Wagers, COO, Seattle Police Department shares how they’re transforming policing on the cloud.
- David Ohana shares how UNICEF is reaching a global audience in a more agile and cost effective way.
- Chan Cheow How shares how the Singapore Government is digitizing citizen services on the cloud.
- Mark Schwartz, DHS, CIS CIO shares how agencies are modernizing and accelerating the pace of IT.
- Jon Booth, CMS shares Healthcare.gov’s turnaround using cloud.
- Congressman Gerry Connolly talks about progress in public services enabled by the cloud.
- AWS Public Sector Symposium Keynote Featuring Teresa Carlson and Customers.
- AWS NY Summit – Amazon ECS Customer Use Cases.
- AWS Summit London – Peterborough City Council: Infrastructure Innovation in the Cloud.
- AWS Summit London – Cork Institute of Technology: Cloud Transformation in Education.
- AWS Summit London – Transport for London Infrastructure Powered by AWS.
- AWS Summit London – UCAS’ Journey to the Cloud.
- AWS Summit London – Makewaves Transforming Education in the Cloud.
- AWS Symposium – Washington, DC.
New Marketplace Applications
- DivvyCloud Enterprise.
- Couchbase Server Enterprise Edition (Silver).
- Bitfusion Accelerated Media Processing.
Upcoming Events
- July 28 – Webinar – Best Practices: OpsWorks for Windows on AWS.
- July 28 – Webinar – Deploying line of business desktop apps using Amazon WorkSpaces Application Manager.
- July 29 – Webinar – Overview: Build and Manage your APIs with Amazon API Gateway.
- July 29 – Webinar – Deploying and Scaling Web Application with AWS Elastic Beanstalk.
- July 29 – Webinar – Deep Dive: Troubleshooting Operational and Security incidents in your AWS Account using CloudTrail.
- July 30 – Webinar – Getting Started with AWS Device Farm.
- July 30 – Webinar – Best Practices: Real-time Data Processing with Amazon DynamoDB Streams and AWS Lambda.
- July 30 – Webinar – Getting Started with Amazon DynamoDB.
- August 19 – Meetup (Sacramento, CA) – High Availability by Design.
- August 27 – Meetup (Novato, CA) – Autodesk and Cyan Discuss All Things Docker.
- September 1 – Meetup (Dublin, Ireland) – AWS Usergroup Ireland.
- AWS Summits – Latin America.
- AWS re:Invent.
Upcoming Events at the AWS Loft (San Francisco)
- July 27 – Lowering Total Cost of Ownership with AWS (3 – 4 PM).
- July 30 – Shinola Brings Back “Made in America” to Detroit with DevOps and the Cloud (6 – 8 PM).
- July 31 – IoT Hack Day Sponsored by MediaTek Labs (10 AM – 6 PM).
- August 10 – AWS Bootcamp: Taking Operations to the Next Level (10 AM – 6 PM).
- August 11 – Behind the Scenes with AdRoll – Petabyte-Scale Data Workflows with Docker, Luigi, and Spot Instances (6 PM – 7:30 PM).
- August 12 – Loft Rocks Concert Series with Bohemian Guitars (7 PM – 10 PM).
- August 13 – Meet the Amazon Aurora Subject Matter Experts (2 PM – 4 PM).
- August 13 – Programmatic Security on AWS (6 PM – 8 PM).
- August 17 – AWS Bootcamp: Getting Started with AWS – Technical (10 AM – 6 PM).
- August 20 – Behind the Scenes with ILM Lucasfilm: “Powering the World’s Leading Visual Effects Studio” (6 PM – 7:30 PM).
- August 26 – Continuous Compliance and Management in the Cloud (6 PM – 9 PM).
- September 14 – AWS Bootcamp: Architecting Highly Available Applications (10 AM – 6 PM).
- September 28 – AWS Bootcamp: Taking Operations to the Next Level (10 AM – 6 PM).
Upcoming Events at the AWS Loft (New York)
- July 28 – Convert Your Code into a Microservice Using AWS Lambda (4 PM – 6 PM).
- July 28 – Behind the Scenes with Timehop: Optimizations for Time Series Data in DynamoDB (6:30 PM – 8 PM).
- July 29 – Security Threats, the Cloud, and Your Responsibilities presented by Evident.io (6:30 PM – 8 PM).
- July 30 – Build Your Own Website: Making Web Development Fun & Easy (1 PM – 6 PM).
- August 3 – Amazon EMR Deep Dive (12 PM – 1:30 PM).
- August 5 – Startup Pitch Event and Summer Social (6:30 PM).
- August 10 – AWS Bootcamp (10 AM – 6 PM).
- August 11 – AWS Bootcamp (10 AM – 6 PM).
- August 12 – DynamoDB Deep Dive (12 PM – 1:30 PM).
- August 12 – Behind the Scenes with Threat Stack: Building a Production Analytics System in Amazon Redshift (6:30 PM – 8:30 PM).
- August 13 – Loft Rocks Concert Series with Controller (7 PM – 10 PM).
- August 17 – IAM Overview (12 PM – 1 PM).
- August 17 – IAM Best Practices (1 PM).
- August 17 – Understanding Options for Encrypting Your Data (2 – 3 PM).
- August 17 – Behind the Scenes with Rapid7 – AWS Infrastructure as Code (6:30 PM – 8 PM).
- August 18 – Mastering Access Management Policies (12 PM – 1:30 PM).
- August 18 – Bring Your Own Identities – Federating Access to Your AWS Environment (1:30 PM – 3 PM).
- August 18 – Entrepreneurs Roundtable (6:30 PM – 9:30 PM).
- August 20 – AdTech Loft Talks (6:30 PM – 8 PM).
- August 21 – Intro to Using AWS and the Alexa Skills Kit to Build Voice Driven Experiences + Open Hackathon (10 AM – 3 PM).
- August 24 – AWS Bootcamp – Getting Started with AWS (10 AM – 6 PM).
- August 25 – AWS Bootcamp – Architecting Highly Available Apps (10 AM – 6 PM).
- August 25 – Eliot Horowitz, CTO and Co-Founder of MongoDB (6:30 PM).
- September 2 – Behind the Scenes with Twilio – SMS for Humans: Using NLP for Better Text Experiences (6:30 PM – 8 PM).
Amazon EMR is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. By using these frameworks and related open-source projects, such as Apache Hive and Apache Pig, you can process data for analytics purposes and business intelligence workloads. Since EMR first launched in 2009 (Announcing Amazon Elastic MapReduce), we have added comprehensive console support and many, many features. Some of the most recent features include:
- Support for S3 encryption (both server-side and client-side).
- Consistent view for the EMR Filesystem (EMRFS).
- Data import, export, and query via the Hive / DynamoDB Connector.
- Enhanced CloudWatch metrics.
Today we are announcing Amazon EMR release 4.0.0, which brings many changes to the platform. This release includes updated versions of Hadoop ecosystem applications and Spark, all available to install on your cluster, and improves the application configuration experience. As part of this release we also adjusted some ports and paths to align better with Hadoop and Spark standards and conventions. Unlike most AWS services, which are updated frequently behind the scenes rather than in discrete releases, EMR has versioned releases so that you can write programs and scripts that rely on features found only in a particular EMR release, or on a particular version of an application found in that release.
If you are currently using AMI version 2.x or 3.x, read the EMR Release Guide to learn how to migrate to 4.0.0.
EMR users have access to a number of applications from the Hadoop ecosystem. This version of EMR features the following updates:
- Hadoop 2.6.0 – This version of Hadoop includes a variety of general functionality and usability improvements.
- Hive 1.0 – This version of Hive includes performance enhancements, additional SQL support, and some new security features.
- Pig 0.14 – This version of Pig features a new ORCStorage class, predicate pushdown for better performance, bug fixes, and more.
- Spark 1.4.1 – This release of Spark includes a binding for SparkR and the new DataFrame API, plus many smaller features and bug fixes.
Quick Cluster Creation in Console
You can now create an EMR cluster from the Console using the Quick cluster configuration experience:
Improved Application Configuration Editing
In Amazon EMR AMI versions 2.x and 3.x, bootstrap actions were primarily used to configure applications on your cluster. With Amazon EMR release 4.0.0, we have improved the configuration experience by providing a direct method to edit the default configurations for applications when creating your cluster. We have added the ability to pass a configuration object which contains a list of the configuration files to edit and the settings in those files to be changed. You can create a configuration object and reference it from the CLI, the EMR API, or from the Console. You can store the configuration information locally or in Amazon Simple Storage Service (S3) and supply a reference to it (if you are using the Console, click on Go to advanced options when you create your cluster in order to specify configuration values or to use a configuration file):
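A configuration object is simply a JSON list of classifications, each naming a configuration file and the properties to change in it. As a sketch (the property shown is illustrative):

```python
import json

# Sketch of an EMR 4.0.0 configuration object: a list of classifications,
# each naming a configuration file and the properties to change in it.
configurations = [
    {
        "Classification": "core-site",
        "Properties": {
            "hadoop.security.groups.cache.secs": "250"
        },
    }
]

# Passed to the CLI as, e.g.:
#   aws emr create-cluster --release-label emr-4.0.0 \
#       --configurations file://./configurations.json ...
config_json = json.dumps(configurations)
```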
To learn more, read about Configuring Applications.
New Packaging System / Standard Ports & Paths
Our release packaging system is now based on Apache Bigtop. This will allow us to add new applications and new application versions to EMR even more quickly.
Also, we have moved most ports and paths on EMR release 4.0.0 to open source standards. For more information about these changes read Differences Introduced in 4.x.
Additional EMR Configuration Options for Spark
The EMR team asked me to share a couple of tech tips with you:
Spark on YARN has the ability to dynamically scale the number of executors used for a Spark application. You still need to set the memory (
spark.executor.memory) and cores (
spark.executor.cores) used for an executor in spark-defaults, but YARN will automatically allocate the number of executors to the Spark application as needed. To enable dynamic allocation of executors, set
truein the spark-defaults configuration file. Additionally, the Spark shuffle service is enabled by default in Amazon EMR, so you do not need to enable it yourself.
You can configure your executors to utilize the maximum resources possible on each node in your cluster by setting the maximizeResourceAllocation option to true when creating your cluster. You can set this by adding the property to the "spark" classification in your configuration object when creating your cluster. This option calculates the maximum compute and memory resources available for an executor on a node in the core node group and sets the corresponding spark-defaults settings with this information. It also sets the number of executors by setting spark.executor.instances to the initial number of core nodes specified when creating your cluster. Note, however, that you cannot use this setting and also enable dynamic allocation of executors.
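A minimal configuration object for this option might look like the following sketch (note that the property goes in the "spark" classification, not spark-defaults):

```python
import json

# Sketch: the maximizeResourceAllocation option lives in the "spark"
# classification (not spark-defaults). EMR then computes executor memory,
# cores, and spark.executor.instances from the core node group for you.
spark_config = [
    {
        "Classification": "spark",
        "Properties": {"maximizeResourceAllocation": "true"}
    }
]

# Reminder: do not combine this with spark.dynamicAllocation.enabled;
# the two settings are mutually exclusive.
print(json.dumps(spark_config, indent=2))
```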
To learn more about these options, read Configure Spark.
All of the features listed above are available now and you can start using them today.
If you are new to large-scale data processing and EMR, take a look at our Getting Started with Amazon EMR page. You’ll find a new tutorial video, along with information about training and professional services, all aimed at getting you up and running quickly and efficiently.
If you are tasked with providing and managing user logins to a fleet of Amazon Elastic Compute Cloud (EC2) instances running Linux, I have some good news for you!
You can now join these instances to an AWS Directory Service Simple AD directory and manage credentials for your user logins using standard Active Directory tools and techniques. Your users will be able to log in to all of the instances in the domain using the same set of credentials. You can exercise additional control by creating directory groups.
We have published complete, step-by-step instructions to help you get started. You’ll need to be running a recent version of the Amazon Linux AMI, Red Hat Enterprise Linux, Ubuntu Server, or CentOS on EC2 instances that reside within an Amazon Virtual Private Cloud, and you’ll need to have an AWS Directory Service Simple AD therein.
You simply create a DHCP Options Set for the VPC and point it at the directory, install and configure a Kerberos client, join the instance to the domain, and reboot it. After you have done this, you can SSH to it and log in using an identity from the directory. The documentation also shows you how to log in using domain credentials, add domain administrators to the sudoers list, and limit access to members of specific groups.
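The first step above, pointing the VPC at the directory, comes down to a DHCP Options Set like the following sketch. The domain name and DNS server addresses here are hypothetical placeholders; use the values shown for your directory in the AWS Directory Service console:

```python
# Sketch of the DHCP Options Set that points a VPC at a Simple AD
# directory. The domain name and the directory's DNS server addresses
# below are hypothetical placeholders. With boto3, this list could be
# passed as DhcpConfigurations to ec2.create_dhcp_options(), then the
# result associated with the VPC via ec2.associate_dhcp_options().
dhcp_configurations = [
    {"Key": "domain-name", "Values": ["corp.example.com"]},
    {"Key": "domain-name-servers", "Values": ["10.0.0.10", "10.0.1.10"]},
]

for entry in dhcp_configurations:
    print(entry["Key"], "=", ", ".join(entry["Values"]))
```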
Amazon CloudWatch monitors your cloud resources and applications, including Amazon Elastic Compute Cloud (EC2) instances. You can track cloud, system, and application metrics, see them in graphical form, and arrange to be notified (via a CloudWatch alarm) if they cross a threshold value that you specify. You can also stop, terminate, or recover an EC2 instance when an alarm is triggered (see my blog post, Amazon CloudWatch – Alarm Actions for more information on alarm actions).
New Action – Reboot Instance
Today we are giving you a fourth action. You can now arrange to reboot an EC2 instance when a CloudWatch alarm is triggered. Because you can track and alarm on cloud, system, and application metrics, this new action gives you a lot of flexibility.
You could reboot an instance if an instance status check fails repeatedly. Perhaps the instance has run out of memory due to a runaway application or service that is leaking memory. Rebooting the instance is a quick and easy way to remedy this situation; you can easily set this up using the new alarm action. In contrast to the existing recovery action, which is specific to a handful of EBS-backed instance types and is applicable only when the instance state is considered impaired, this action is available on all instance types and is effective regardless of the instance state.
If you are using the CloudWatch API or the AWS Command Line Interface (CLI) to track application metrics, you can reboot an instance if the application repeatedly fails to respond as expected. Perhaps a process has gotten stuck or an application server has lost its way. In many cases, hitting the (virtual) reset switch is a clean and simple way to get things back on track.
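For those using the API route, the same alarm described in the walkthrough below can be defined programmatically. Here is a sketch of the put_metric_alarm parameters (the instance ID and region are hypothetical placeholders, and the reboot action ARN follows the arn:aws:automate:region:ec2:reboot form used for EC2 alarm actions):

```python
# Sketch: parameters for CloudWatch put_metric_alarm that reboot an EC2
# instance when CPUUtilization stays at or above 90% for 15 minutes.
# The instance ID and region are hypothetical placeholders.
alarm_params = {
    "AlarmName": "reboot-on-high-cpu",
    "Namespace": "AWS/EC2",
    "MetricName": "CPUUtilization",
    "Dimensions": [{"Name": "InstanceId", "Value": "i-1234567890abcdef0"}],
    "Statistic": "Average",
    "Period": 300,            # five-minute periods...
    "EvaluationPeriods": 3,   # ...three in a row = 15 minutes
    "Threshold": 90.0,
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    # Reboot alarm action; substitute your own region.
    "AlarmActions": ["arn:aws:automate:us-east-1:ec2:reboot"],
}

# With boto3: boto3.client("cloudwatch").put_metric_alarm(**alarm_params)
print(alarm_params["AlarmName"], "->", alarm_params["AlarmActions"][0])
```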
Creating an Alarm
Let’s walk through the process of creating an alarm that will reboot one of my instances if the CPU Utilization remains above 90% for an extended period of time. I simply locate the instance in the AWS Management Console, focus my attention on the Alarm Status column, and click on the icon:
Then I click on Take the action, choose Reboot this instance, and set the parameters (90% or more CPU Utilization for 15 minutes in this example):
If necessary, the console will ask me to confirm the creation of an IAM role as part of this step (this is a new feature):
The role will have permission to call the “Describe” functions in the CloudWatch and EC2 APIs. It will also have permission to reboot, stop, and terminate instances.
I click on Create Alarm and I am all set!
This feature is available now and you can start using it today in all public AWS regions.