AWS Blog

Now Open – AWS Asia Pacific (Mumbai) Region

by Jeff Barr | in Announcements

We are expanding the AWS footprint again, this time with a new region in Mumbai, India. AWS customers in the area can use the new Asia Pacific (Mumbai) Region to better serve end users in India.

New Region
The new Mumbai region has two Availability Zones, raising the global total to 35. It supports Amazon Elastic Compute Cloud (EC2) (C4, M4, T2, D2, I2, and R3 instances are available) and related services including Amazon Elastic Block Store (EBS), Amazon Virtual Private Cloud, Auto Scaling, and Elastic Load Balancing.

It also supports the following services:

There are now three edge locations (Mumbai, Chennai, and New Delhi) in India. The locations support Amazon Route 53, Amazon CloudFront, and S3 Transfer Acceleration. AWS Direct Connect support is available via our Direct Connect Partners (listed below).

This is our thirteenth region (see the AWS Global Infrastructure map for more information). As usual, you can see the list of regions in the region menu of the Console:

Customers
There are over 75,000 active AWS customers in India, representing a diverse base of industries. In the time leading up to today’s launch, we have provided some of these customers with access to the new region in preview form. Two of them (Ola Cabs and NDTV) were kind enough to share some of their experience and observations with us:

Ola Cabs’ mobile app leverages AWS to redefine point-to-point transportation in more than 100 cities across India. AWS allows Ola to innovate quickly with new features and services for its customers, without compromising availability or the customer experience. Ankit Bhati (CTO and Co-Founder) told us:

We are using technology to create mobility for a billion Indians, by giving them convenience and access to transportation of their choice. Technology is a key enabler, where we use AWS to drive supreme customer experience, and innovate faster on new features & services for our customers. This has helped us reach 100+ cities & 550K driver partners across India. We do petabyte scale analytics using various AWS big data services and deep learning techniques, allowing us to bring our driver-partners close to our customers when they need them. AWS allows us to make 30+ changes a day to our highly scalable micro-services based platform consisting of 100s of low latency APIs, serving millions of requests a day. We have tried the AWS India region. It is great and should help us further enhance the experience for our customers.


NDTV, India’s leading media house, is watched by millions of people across the world. NDTV has been using AWS since 2009 to run its video platform and all of its web properties. During the Indian general elections in May 2014, NDTV fielded an unprecedented amount of web traffic that scaled 26X, from 500 million hits per day to 13 billion hits on Election Day (regularly peaking at 400K hits per second), all running on AWS. According to Kawaljit Singh Bedi (CTO of NDTV Convergence):

NDTV is pleased to report very promising results in terms of reliability and stability of AWS’ infrastructure in India in our preview tests. Based on tests that our technical teams have run in India, we have determined that the network latency from the AWS India infrastructure Region is far superior to the alternatives. Our web and mobile traffic has jumped by over 30% in the last year, and as we expand into new territories like eCommerce and platform integration we are very excited about the new AWS India region launch. With the portfolio of services AWS will offer at launch, low latency, great reliability, and the ability to meet regulatory requirements within India, NDTV has decided to move these critical applications and IT infrastructure all-in to the AWS India region from our current set-up.

 


Here are some of our other customers in the region:

Tata Motors Limited, a leading Indian multinational automotive manufacturing company, runs its telematics systems on AWS. Fleet owners use this solution to monitor all vehicles in their fleet in real time. AWS has helped Tata Motors become more agile and has increased its speed of experimentation and innovation.

redBus is India’s leading bus ticketing platform, selling tickets via web, mobile, and bus agents. It now covers over 67K routes in India with over 1,800 bus operators. redBus has scaled to sell more than 40 million bus tickets annually, up from just 2 million in 2010. At peak season, there are over 100 bus ticketing transactions every minute. The company also recently developed a new SaaS app on AWS that gives bus operators the option of handling their own ticketing and managing seat inventories. redBus has also gone global on AWS, expanding to new geographies such as Singapore and Peru.

Hotstar is India’s largest premium streaming platform, with more than 85K hours of drama and movies and coverage of every major global sporting event. Launched in February 2015, Hotstar quickly became one of the fastest-adopted new apps anywhere in the world. It has now been downloaded by more than 68M users and has attracted followers on the back of highly evolved video streaming technology and close attention to quality of experience across devices and platforms.

Macmillan India has provided publishing services to the education market in India for more than 120 years. Macmillan India has moved its core enterprise applications (Business Intelligence, Sales and Distribution, Materials Management, Financial Accounting and Controlling, Human Resources, and a customer relationship management system) from its existing data center in Chennai to AWS. By moving to AWS, Macmillan India has boosted SAP system availability to almost 100 percent and reduced the time it takes to provision infrastructure from 6 weeks to 30 minutes.

Partners
We are pleased to be working with a broad selection of partners in India. Here’s a sampling:

  • AWS Premier Consulting Partners – BlazeClan Technologies Pvt. Limited, Minjar Cloud Solutions Pvt Ltd, and Wipro.
  • AWS Consulting Partners – Accenture, BluePi, Cloudcover, Frontier, HCL, Powerupcloud, TCS, and Wipro.
  • AWS Technology Partners – Freshdesk, Druva, Indusface, Leadsquared, Manthan, Mithi, Nucleus Software, Newgen, Ramco Systems, Sanovi, and Vinculum.
  • AWS Managed Service Providers – Progressive Infotech and Spruha Technologies.
  • AWS Direct Connect Partners – AirTel, Colt Technology Services, Global Cloud Xchange, GPX, Hutchison Global Communications, Sify, and Tata Communications.

Amazon Offices in India
We have opened six offices in India since 2011 – Delhi, Mumbai, Hyderabad, Bengaluru, Pune, and Chennai. These offices support our diverse customer base in India including enterprises, government agencies, academic institutions, small-to-mid-size companies, startups, and developers.

Support
The full range of AWS Support options (Basic, Developer, Business, and Enterprise) is also available for the Mumbai Region. All AWS support plans include an unlimited number of account and billing support cases, with no long-term contracts.

Compliance
Every AWS region is designed and built to meet rigorous compliance standards including ISO 27001, ISO 9001, ISO 27017, ISO 27018, SOC 1, SOC 2, and PCI DSS Level 1 (to name a few). AWS implements an Information Security Management System (ISMS) that is independently assessed by qualified third parties. These assessments address a wide variety of requirements, which are communicated to customers by making certifications and audit reports available, either on our public-facing website or upon request.

To learn more, take a look at the AWS Cloud Compliance page and our Data Privacy FAQ.

Use it Now
This new region is now open for business and you can start using it today! You can find additional information about the new region, documentation on how to migrate, customer use cases, information on training and other events, and a list of AWS Partners in India on the AWS site.
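
If you want to try the new region programmatically, here is a minimal boto3 sketch that points an SDK client at ap-south-1 (the region code for Mumbai). The AMI ID and instance type below are placeholders, not recommendations:

```python
import boto3

# Point an SDK client at the new Asia Pacific (Mumbai) Region (ap-south-1).
ec2 = boto3.client("ec2", region_name="ap-south-1")

# Illustrative only: launch a single t2.micro in the new region.
# The AMI ID is a placeholder; look up a current one for ap-south-1.
response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",   # placeholder AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```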

We have set up a seller of record in India (known as AISPL); please see the AISPL customer agreement for details.

Jeff;

 

New – Cross-Account Copying of Encrypted EBS Snapshots

by Jeff Barr | in Amazon Elastic Block Store, Security

AWS already supports the use of encrypted Amazon Elastic Block Store (EBS) volumes and snapshots, with keys stored in and managed by AWS Key Management Service (KMS). It also supports sharing EBS snapshots with other AWS accounts so that they can be used to create new volumes. Today we are bringing these features together to give you the ability to copy encrypted EBS snapshots between accounts, with the flexibility to move between AWS regions as you do so.

This announcement builds on three important AWS best practices:

  1. Take regular backups of your EBS volumes.
  2. Use multiple AWS accounts, one per environment (dev, test, staging, and prod).
  3. Encrypt stored data (data at rest), including backups.

Encrypted EBS Volumes & Snapshots
As a review, you can create an encryption key using the IAM Console:

And you can create an encrypted EBS volume by specifying an encryption key (you must use a custom key if you want to copy a snapshot to another account):

Then you can create an encrypted snapshot from the volume:

As you can see, I have already enabled the longer volume and snapshot IDs for my AWS account (read They’re Here – Longer EBS and Storage Gateway Resource IDs Now Available for more information).
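
If you prefer to script this review step, here is a minimal boto3 sketch that creates a custom key, an encrypted volume, and an encrypted snapshot. The region, volume size, and descriptions are illustrative placeholders:

```python
import boto3

kms = boto3.client("kms", region_name="us-east-1")   # region is illustrative
ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a custom KMS key (required if you want to share the snapshot later).
key_id = kms.create_key(
    Description="Custom key for shared EBS snapshots"
)["KeyMetadata"]["KeyId"]

# Create an encrypted volume with that key, then snapshot it.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=100,                 # GiB, illustrative
    Encrypted=True,
    KmsKeyId=key_id,
)
snapshot = ec2.create_snapshot(
    VolumeId=volume["VolumeId"],
    Description="Encrypted snapshot to share with another account",
)
print(snapshot["SnapshotId"])
```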

Cross-Account Copying
None of what I have shown you so far is new. Let’s move on to the new part! To create a copy of the encrypted EBS snapshot in another account you need to complete four simple steps:

  1. Share the custom key associated with the snapshot with the target account.
  2. Share the encrypted EBS snapshot with the target account.
  3. In the context of the target account, locate the shared snapshot and make a copy of it.
  4. Use the newly created copy to create a new volume.

You will need the target account number in order to perform the first two steps. Here’s how you share the custom key with the target account from within the IAM Console:

Then you share the encrypted EBS snapshot. Select it and click on Modify Permissions:

Enter the target account number again and click on Save:

Note that you cannot share the encrypted snapshots publicly.

Before going any further I should say a bit about permissions! Here’s what you need to know in order to set up your policies and/or roles:

Source Account – The IAM user or role in the source account needs to be able to call the ModifySnapshotAttribute function and to perform the DescribeKey and ReEncrypt operations on the key associated with the original snapshot.

Target Account – The IAM user or role in the target account needs to be able to perform the DescribeKey, CreateGrant, and Decrypt operations on the key associated with the original snapshot. The user or role must also be able to perform the CreateGrant, Encrypt, Decrypt, DescribeKey, and GenerateDataKeyWithoutPlaintext operations on the key associated with the call to CopySnapshot.

With that out of the way, let’s copy the snapshot…

Switch to the target account, visit the Snapshots tab, and click on Private Snapshots. Locate the shared snapshot via its Snapshot ID (the name is stored as a tag and is not copied), select it, and choose the Copy action:

Select an encryption key for the copy of the snapshot and create the copy (here I am copying my snapshot to the Asia Pacific (Tokyo) Region):

Using a new key for the copy provides an additional level of isolation between the two accounts. As part of the copy operation, the data will be re-encrypted using the new key.
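
If you would rather script the whole flow, here is a hedged boto3 sketch of the same four steps. The account IDs, key identifiers, snapshot ID, and regions are placeholders, and the grant in step 1 is just one way to share the custom key (editing the key policy in the console, as shown above, works too):

```python
import boto3

SOURCE_REGION = "us-east-1"                    # placeholder source region
TARGET_ACCOUNT = "111122223333"                # placeholder target account ID
SOURCE_KEY_ID = "arn:aws:kms:us-east-1:999988887777:key/example"  # placeholder
SNAPSHOT_ID = "snap-0123456789abcdef0"                            # placeholder

# Step 1 (source account): let the target account use the custom key.
kms = boto3.client("kms", region_name=SOURCE_REGION)
kms.create_grant(
    KeyId=SOURCE_KEY_ID,
    GranteePrincipal=f"arn:aws:iam::{TARGET_ACCOUNT}:root",
    Operations=["DescribeKey", "Decrypt", "CreateGrant"],
)

# Step 2 (source account): share the encrypted snapshot with the target account.
ec2_src = boto3.client("ec2", region_name=SOURCE_REGION)
ec2_src.modify_snapshot_attribute(
    SnapshotId=SNAPSHOT_ID,
    Attribute="createVolumePermission",
    OperationType="add",
    UserIds=[TARGET_ACCOUNT],
)

# Step 3 (run with target account credentials): copy the shared snapshot,
# re-encrypting it with a key owned by the target account.
ec2_tgt = boto3.client("ec2", region_name="ap-northeast-1")  # e.g. copy to Tokyo
copy = ec2_tgt.copy_snapshot(
    SourceRegion=SOURCE_REGION,
    SourceSnapshotId=SNAPSHOT_ID,
    Encrypted=True,
    KmsKeyId="alias/target-account-key",   # placeholder key in the target account
    Description="Cross-account copy of shared encrypted snapshot",
)

# Step 4 (target account): create a volume from the newly created copy.
ec2_tgt.create_volume(
    SnapshotId=copy["SnapshotId"],
    AvailabilityZone="ap-northeast-1a",
)
```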

Available Now
This feature is available in all AWS Regions where AWS Key Management Service (KMS) is available. It is designed for use with data & root volumes and works with all volume types, but cannot be used to share encrypted AMIs at this time. You can use the snapshot to create an encrypted boot volume by copying the snapshot and then registering it as a new image.

Jeff;

 

Guest Post – Zynga Gets in the Game with Amazon Aurora

by Jeff Barr | in Amazon Aurora, Guest Post

Long-time AWS customer Zynga is making great use of Amazon Aurora and other AWS database services. In today’s guest post you can learn about how they use Amazon Aurora to accommodate spikes in their workload. This post was written by Chris Broglie of Zynga.

Jeff;

Zynga has long operated various database technologies, ranging from simple in-memory caches like Memcached, to persistent NoSQL stores like Redis and Membase, to traditional relational databases like MySQL. We loved the features these technologies offered, but running them at scale required lots of manual time to recover from instance failure and to script and monitor mundane but critical jobs like backup and recovery. As we migrated from our own private cloud to AWS in 2015, one of the main objectives was to reduce the operational burden on our engineers by embracing the many managed services AWS offered.

We’re now using Amazon DynamoDB and Amazon ElastiCache (Memcached and Redis) widely in place of their self-managed equivalents. Now, engineers are able to focus on application code instead of managing database tiers, and we’ve improved our recovery times from instance failure (spoiler alert: machines are better at this than humans). But the one component missing here was MySQL. We loved the automation Amazon RDS for MySQL offers, but it relies on general-purpose Amazon Elastic Block Store (EBS) volumes for storage. Being able to dynamically allocate durable storage is great, but you trade off having to send traffic over the network, and traditional databases suffer from this additional latency. Our testing showed that the performance of RDS for MySQL just couldn’t compare to what we could obtain with i2 instances and their local (though ephemeral) SSDs. Provisioned IOPS narrow the gap, but they cost more. For these reasons, we used self-managed i2 instances wherever we had really strict performance requirements.

However, for one new service we were developing during our migration, we decided to take a measured bet on Amazon Aurora. Aurora is a MySQL-compatible relational database offered through Amazon RDS. Aurora was only in preview when we started writing the service, but it was expected to become generally available before production, and we knew we could always fall back to running MySQL on our own i2 instances. We were naturally apprehensive about any new technology, but we had to see for ourselves if Aurora could deliver on its claims of exceeding the performance of MySQL on local SSDs, while still using network storage and providing all the automation of a managed service like RDS. And after 8 months in production, Aurora has been nothing short of impressive. While our workload is fairly modest – the busiest instance is an r3.2xl handling ~9k selects/second during peak for a 150 GB data set – we love that so far Aurora has delivered the necessary performance without any of the operational overhead of running MySQL.

An example of what this kind of automation has enabled for us was an ops incident where a change in traffic patterns resulted in a huge load spike on one of our Aurora instances. Thankfully, the instance was able to keep serving traffic despite 100% CPU usage, but we needed even more throughput. With Aurora we were able to scale up the reader to an instance that was 4x larger, fail over to it, and then watch it handle 4x the traffic, all with just a few clicks in the RDS console. Days later, after we released a patch to prevent the incident from recurring, we were able to scale back down to smaller instances using the same procedure. Before Aurora we would have had to either get a DBA online to manually provision, replicate, and fail over to a larger instance, or try to ship a code hotfix to reduce the load on the database. Manual changes are always slower and riskier, so Aurora’s automation is a great addition to our ops toolbox, and in this case it led to a resolution measured in minutes rather than hours.
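
The same scale-up-and-fail-over sequence can also be scripted against the RDS API. Here is a hedged boto3 sketch (cluster and instance identifiers are hypothetical, and Zynga's actual tooling may differ) that adds a larger Aurora reader and then promotes it:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")   # region is illustrative

CLUSTER = "my-aurora-cluster"         # hypothetical cluster identifier
BIG_READER = "my-aurora-big-reader"   # hypothetical new instance identifier

# Add a larger read replica to the existing Aurora cluster.
rds.create_db_instance(
    DBInstanceIdentifier=BIG_READER,
    DBInstanceClass="db.r3.8xlarge",  # roughly 4x an r3.2xlarge
    Engine="aurora",
    DBClusterIdentifier=CLUSTER,
)

# Wait until the new reader is available before failing over.
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=BIG_READER)

# Fail over so that the larger instance becomes the writer.
rds.failover_db_cluster(
    DBClusterIdentifier=CLUSTER,
    TargetDBInstanceIdentifier=BIG_READER,
)
```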

Most of the automation we’re enjoying has long been standard for RDS, but using Aurora has delivered the automation of RDS along with the performance of self-managed i2 instances. Aurora is now our first choice for new services using relational databases.

Chris Broglie, Architect (Zynga)

 

New – AWS Marketplace for the U.S. Intelligence Community

by Jeff Barr | in AWS Marketplace

AWS built and now runs a private cloud for the United States Intelligence Community.

In order to better meet the needs of this unique community, we have set up an AWS Marketplace designed specifically for them. Much like the existing AWS Marketplace, this new marketplace makes it easy to discover, buy, and deploy software packages and applications, with a focus on products in the Big Data, Analytics, Cloud Transition Support, DevOps, Geospatial, Information Assurance, and Security categories.

Selling directly to the Intelligence Community can be a burdensome process that limits the Intelligence Community’s options when purchasing software. Our goal is to give the Intelligence Community as broad a selection of software as possible, so we are working to help our AWS Marketplace sellers through the onboarding process so that the Intelligence Community can benefit from use of their software.

If you are an AWS Marketplace seller and have products in one of the categories above, listing your product in the AWS Marketplace for the Intelligence Community offers some important benefits to your ISV or Authorized Reseller business:

Access – You get to reach a new market that may not have been visible or accessible to you.

Efficiency – You get to bypass the contract negotiation that is otherwise a prerequisite to selling to the US government. With no contract negotiation to contend with, you’ll have less business overhead.

To the greatest extent possible, we hope to make the products in the AWS Marketplace also available in the AWS Marketplace for the U.S. Intelligence Community. In order to get there, we are going to need your help!

Come on Board
Completing the steps necessary to make products available in the AWS Marketplace for the U.S. Intelligence Community can be challenging due to security and implementation requirements. Fortunately, the AWS team is here to help; here are the important steps:

  1. Have your company and your products listed commercially in AWS Marketplace if they are not already there.
  2. File for FOCI (Foreign Ownership, Control and Influence) approval and sign the AWS Marketplace IC Marketplace Publisher Addendum.
  3. Ensure your product will work in the Commercial Cloud Services (C2S) environment. This includes ensuring that your software does not make any calls outside to the public internet.
  4. Work with AWS to publish your software on the AWS Marketplace for the U.S. Intelligence Community. You will be able to take advantage of your existing knowledge of AWS and your existing packaging tools and processes that you use to prepare each release of your product for use in AWS Marketplace.

Again, we are here to help! After completing step 1, email us (icmp@amazon.com). We’ll help with the paperwork and the security and do our best to get you going as quickly as possible. To learn more about this process, read my colleague Kate Miller’s new post, AWS Marketplace for the Intelligence Community, on the AWS Partner Network Blog.

Jeff;

New AWS Competency – AWS Migration

by Jeff Barr | in AWS Partner Network

More and more, I hear from customers who want to migrate large-scale workloads to AWS and seek advice regarding their cloud migration strategy. We provide customers with a number of cloud migration tools and services (including the AWS Database Migration Service) and resources (such as the AWS Professional Services Cloud Adoption Framework). Further, we have a strong and mature ecosystem of AWS Partner Network (APN) Consulting and Technology Partners who’ve demonstrated expertise in helping customers like you successfully migrate to AWS.

In an effort to make it as easy as we can for you to identify APN Partners who’ve demonstrated technical proficiency and proven customer success in migration, I’m pleased to announce the launch of the AWS Migration Competency.

New Migration Competency – Migration Partner Solutions
Migration Competency Partners provide solutions or have deep experience helping businesses move successfully to AWS, through all phases of complex migration projects: discovery, planning, migration, and operations.

The AWS Partner Competency Program has validated that the partners below have demonstrated that they can help enterprise customers migrate applications and legacy infrastructure to AWS.

Categories and Launch Partners

Migration Delivery Partners – Help customers through every stage of migration, accelerating results by providing personnel, tools, and education in the form of professional services. These partners either are, or have a relationship with, an AWS-audited Managed Service Provider to help customers with ongoing support of AWS workloads. Here are the launch partners:

Migration Consulting Partners – Provide expertise and training to help enterprises quickly develop specific capabilities or achieve specific outcomes. They provide consulting services to enable adoption of DevOps practices, modernize applications, and implement solutions. Here are our launch partners:

Migration Technology for Discovery & Planning – Discover IT assets across your application portfolio, identify dependencies and requirements, and build your comprehensive migration plan with these technology solutions. Here are our launch partners:

Migration Technology for Workload Mobility – Execute migrations to AWS by capturing your host server, configuration, storage, and network states, then provision and configure your AWS target resources. Here are our launch partners:

Migration Technology for Application Profiling – Gain valuable insights into your applications by capturing and analyzing performance data, usage, and monitoring dependencies before and after migration. Here are our launch partners:

Launch Partners in Action
Do you want to hear from a few of our launch partners? Watch the videos below to hear Cloud Technology Partners (CTP), REAN Cloud, and Slalom discuss the evolution of enterprise cloud migrations and the value of the AWS Migration Competency for customers:

  • Cloud Technology Partners – The Evolution of Cloud Migration
  • REAN Cloud – The Role of DevOps in Cloud Migrations
  • Slalom – The Value of the AWS Migration Competency for Customers

Jeff;

 

Semi-Autonomous Driving Using EC2 Spot Instances at Mapbox

by Jeff Barr | in Customer Success, EC2 Spot Instances, Guest Post

Will White of Mapbox shared the following guest post with me. In the post, Will describes how they use EC2 Spot Instances to economically process the billions of data points that they collect each day.

I do have one note to add to Will’s excellent post. We know that many AWS customers would like to create Spot Fleets that automatically scale up and down in response to changes in demand. This is on our near-term roadmap and I’ll have more to say about it before too long.

Jeff;

The largest automotive tech conference, TU-Automotive, kicked off in Detroit this morning with almost every conversation focused on strategies for processing the firehose of data coming off connected cars. The volume of data is staggering – last week alone we collected and processed over 100 million miles of sensor data into our maps.

Collecting Street Data
Rather than driving a fleet of cars down every street to make a map, we turn phones, cars, and other devices into a network of real-time sensors. EC2 Spot Instances process the billions of points we collect each day and let us see every street, analyze the speed of traffic, and connect the entire road network. This anonymized and aggregated data protects user privacy while allowing us to quickly detect road changes. The result is Mapbox Drive, the map built specifically for semi-autonomous driving, ride sharing, and connected cars.

Bidding for Spot Capacity
We use the Spot market to bid on spare EC2 instances, letting us scale our data collection and processing at 1/10th the cost. When you launch an EC2 Spot instance you set a bid price for how much you are willing to pay for the instance. The market price (the price you actually pay) constantly changes based on supply and demand in the market. If the market price ever exceeds your bid price, your EC2 Spot instance is terminated. Since spot instances can spontaneously terminate, they have become a popular cost-saving tool for non-critical environments like staging, QA, and R&D – services that don’t require high availability. However, if you can architect your application to handle this kind of sudden termination, it becomes possible to run extremely resource-intensive services on spot and save a massive amount of money while maintaining high availability.

The infrastructure that processes the 100 million miles of sensor data we collect each week is critical and must always be online, but it uses EC2 Spot Instances. We do it by running two Auto Scaling groups, a Spot group and an On-Demand group, that share a single Elastic Load Balancer. When Spot prices spike and instances get terminated, we simply fall back by automatically launching On-Demand instances to pick up the slack.
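
Here is a hedged boto3 sketch of that dual-group setup (AMI IDs, names, subnets, bid price, and capacities are all placeholders; Mapbox's actual configuration is not shown in this post):

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")  # illustrative

# Two launch configurations: identical except that one sets a Spot bid price.
autoscaling.create_launch_configuration(
    LaunchConfigurationName="worker-spot",
    ImageId="ami-xxxxxxxx",          # placeholder AMI ID
    InstanceType="c4.xlarge",
    SpotPrice="0.20",                # hypothetical bid price
)
autoscaling.create_launch_configuration(
    LaunchConfigurationName="worker-ondemand",
    ImageId="ami-xxxxxxxx",          # placeholder AMI ID
    InstanceType="c4.xlarge",
)

# Two Auto Scaling groups registered with the same load balancer; the
# On-Demand group sits at zero capacity until Spot capacity is interrupted.
for name, desired in [("worker-spot", 10), ("worker-ondemand", 0)]:
    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName=name,
        LaunchConfigurationName=name,
        MinSize=0,
        MaxSize=20,
        DesiredCapacity=desired,
        LoadBalancerNames=["worker-elb"],             # hypothetical ELB name
        VPCZoneIdentifier="subnet-aaaa,subnet-bbbb",  # placeholder subnets
    )
```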

Handling Termination Notices
We use termination notices, which give us a two-minute warning before any EC2 Spot instance is terminated. When an instance receives a termination notice it immediately makes a call to the Auto Scaling API to scale up the On-Demand Auto Scaling group, seamlessly adding stable capacity to the Elastic Load Balancer. We have to pay On-Demand prices for the replacement EC2s, but only for as long as the Spot interruption lasts. When the Spot market price falls back below our bid price, our Spot Auto Scaling group will automatically launch new Spot instances. As the Spot capacity scales back up, an aggressive Auto Scaling policy scales down the On-Demand group, terminating the more expensive instances.
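
A minimal sketch of such a handler, assuming it runs on each Spot instance and that the On-Demand group is named as in the earlier sketch, might poll the instance metadata endpoint and bump the On-Demand group when the two-minute warning appears:

```python
import time

import boto3
import requests

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"
ONDEMAND_GROUP = "worker-ondemand"   # hypothetical On-Demand Auto Scaling group

autoscaling = boto3.client("autoscaling", region_name="us-east-1")  # illustrative

# The metadata endpoint returns 200 with a timestamp once this Spot instance
# has been marked for termination; otherwise it returns 404.
while True:
    resp = requests.get(TERMINATION_URL, timeout=1)
    if resp.status_code == 200:
        # Add stable capacity behind the shared load balancer before we go away.
        group = autoscaling.describe_auto_scaling_groups(
            AutoScalingGroupNames=[ONDEMAND_GROUP]
        )["AutoScalingGroups"][0]
        autoscaling.set_desired_capacity(
            AutoScalingGroupName=ONDEMAND_GROUP,
            DesiredCapacity=group["DesiredCapacity"] + 1,
            HonorCooldown=False,
        )
        break
    time.sleep(5)
```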

Building our data processing pipeline on Spot worked so well that we have now moved nearly every Mapbox service over to Spot too. As traffic from over 170 million unique users of apps like Foursquare, MapQuest, and Weather.com grows each month, our cost of goods sold (COGS) continues to fall. Spot interruptions are relatively rare for the instance types we use, so the fallback is only triggered 1-2 times per month. This means we are running on discounted Spot instances more than 98% of the time. On our maps service alone, this has resulted in a 90% savings on our EC2 costs each month.

Going Further with Spot
To further optimize our COGS we’re working on a “waterfall” approach to fallback, pushing traffic to other configurations of Spot Instances first and only using On-Demand as an absolute last resort. For example, an application that normally runs on c4.xlarge instances is often compatible with other instance sizes in the same family (c4.2xlarge, c4.4xlarge, etc.) and instance types in other families (m4.2xlarge, m4.4xlarge, etc.). When our Spot instances get terminated, we’ll bid on the next cheapest option on the Spot market. This will result in more Spot interruptions, but our COGS decrease further because we’ll fall back on Spot instances instead of paying full price for On-Demand EC2 instances. This maximizes our COGS savings while maintaining high availability for our enterprise customers.
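
One way to sketch the price lookup behind such a waterfall (instance types and product description are illustrative, and a real implementation would also weigh capacity per instance, not just raw price):

```python
from datetime import datetime, timezone

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # illustrative

# Compatible instance types to consider, in rough order of preference.
CANDIDATES = ["c4.xlarge", "c4.2xlarge", "m4.2xlarge", "c4.4xlarge", "m4.4xlarge"]

history = ec2.describe_spot_price_history(
    InstanceTypes=CANDIDATES,
    ProductDescriptions=["Linux/UNIX (Amazon VPC)"],
    StartTime=datetime.now(timezone.utc),   # current prices only
)["SpotPriceHistory"]

# Pick the cheapest current offer across types and Availability Zones.
cheapest = min(history, key=lambda offer: float(offer["SpotPrice"]))
print(cheapest["InstanceType"], cheapest["AvailabilityZone"], cheapest["SpotPrice"])
```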

It’s worth noting that similar fallback functionality is built into EC2 Spot Fleet, but we prefer Auto Scaling groups due to a few limitations with Spot Fleet (for example, there’s no support for Auto Scaling without implementing it yourself) and because Auto Scaling groups give us the most flexibility.

Over the last 12 months, data collection and processing has increased our consumption of EC2 compute hours by 1044%, but our COGS actually decreased. We used to see our costs increase linearly with consumption, but now see these hockey stick increases in consumption while costs stay basically flat for the same period.

If you’re building a resource-hungry application that requires high availability and the costs for On-Demand EC2 instances make it unsustainable to run, take a close look at EC2 Spot Instances. Combined with the right architecture and some creative orchestration, EC2 Spot Instances will allow you to run your application with extremely low COGS.

Will White, Development Engineering, Mapbox

Amazon EMR 4.7.0 – Apache Tez & Phoenix, Updates to Existing Apps

by Jeff Barr | in Amazon EMR

Amazon EMR allows you to quickly and cost-effectively process vast amounts of data. Since the 2009 launch, we have added many new features and support for an ever-increasing roster of applications from the Hadoop ecosystem. Here are a few of the additions that we have made this year:

Today we are pushing forward once again, with new support for Apache Tez (dataflow-driven data processing task orchestration) and Apache Phoenix (fast SQL for OLTP and operational analytics), along with updates to several of the existing apps. In order to make use of these new and/or updated applications, you will need to launch a cluster that runs release 4.7.0 of Amazon EMR.

New – Apache Tez (0.8.3)
Tez runs on top of Apache Hadoop YARN and provides you with a set of dataflow definition APIs that allow you to define a DAG (Directed Acyclic Graph) of data processing tasks. Tez can be faster than Hadoop MapReduce, and can be used with both Hive and Pig. To learn more, read the EMR Release Guide. The Tez UI includes a graphical view of the DAG:

The UI also displays detailed information about each DAG:

New – Apache Phoenix (4.7.0)
Phoenix uses HBase (another member of the Hadoop ecosystem) as its datastore. You can connect to Phoenix using a JDBC driver included on the cluster or from other applications that are running on or off of the cluster. Either way, you get access to fast, low-latency SQL with full ACID transaction capabilities. Your SQL queries are compiled into a series of HBase scans, the scans are run in parallel, and the results are aggregated to produce the result set.  To learn more, read the Phoenix Quick Start Guide or review the Apache Phoenix Overview Presentation.

Updated Applications
We have also updated the following applications:

  • HBase 1.2.1 – HBase provides low-latency, random access to massive datasets. The new version includes some bug fixes.
  • Mahout 0.12.0 – Mahout provides scalable machine learning and data mining.  The new version includes a large set of math and statistics features.
  • Presto 0.147 – Presto is a distributed SQL query engine designed for large data sets. The new version adds features and fixes bugs.

Amazon Redshift JDBC Driver
You can use the new Redshift JDBC driver to allow applications running on your EMR clusters to access and update data stored in your Redshift clusters. Two versions of the driver are included on your cluster:

  • JDBC 4.0-compatible – /usr/share/aws/redshift/jdbc/RedshiftJDBC4.jar.
  • JDBC 4.1-compatible – /usr/share/aws/redshift/jdbc/RedshiftJDBC41.jar.

To start using the new and updated applications, simply launch a new EMR cluster and select release 4.7.0 along with the desired set of applications.
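
For example, here is a hedged boto3 sketch that launches such a cluster (the key pair, instance types, and counts are placeholders, and the default EMR roles are assumed to already exist in your account):

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")   # region is illustrative

# Launch an EMR 4.7.0 cluster with Tez and Phoenix (plus Hive and HBase,
# which they work with). Placeholders: key pair, instance types, and counts.
response = emr.run_job_flow(
    Name="emr-4.7.0-tez-phoenix",
    ReleaseLabel="emr-4.7.0",
    Applications=[
        {"Name": "Hadoop"},
        {"Name": "Hive"},
        {"Name": "Tez"},
        {"Name": "HBase"},
        {"Name": "Phoenix"},
    ],
    Instances={
        "MasterInstanceType": "m3.xlarge",
        "SlaveInstanceType": "m3.xlarge",
        "InstanceCount": 3,
        "Ec2KeyName": "my-key-pair",           # placeholder key pair
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```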

Jeff;

Learn about Amazon Redshift in our new Data Warehousing on AWS Class

by Jeff Barr | in Amazon Redshift, Big Data, Training and Certification

As our customers continue to look to use their data to help drive their missions forward, finding a way to simply and cost-effectively make use of analytics is becoming increasingly important. That is why I am happy to announce the upcoming availability of Data Warehousing on AWS, a new course that helps customers leverage the AWS Cloud as a platform for data warehousing solutions.

New Course
Data Warehousing on AWS is a new three-day course that is designed for database architects, database administrators, database developers, and data analysts/scientists. It introduces you to concepts, strategies, and best practices for designing a cloud-based data warehousing solution using Amazon Redshift. This course demonstrates how to collect, store, and prepare data for the data warehouse by using other AWS services such as Amazon DynamoDB, Amazon EMR, Amazon Kinesis, and Amazon S3. Additionally, this course demonstrates how you can use business intelligence tools to perform analysis on your data. Organizations that are looking to get more out of their data by implementing a data warehousing solution or expanding their current data warehousing practice are encouraged to sign up.

These classes (and many more) are available through AWS and our Training Partners. Find upcoming classes in our global training schedule or learn more at AWS Training.

Jeff;

 

New – Cross-Region Read Replicas for Amazon Aurora

by Jeff Barr | in Amazon Aurora, Amazon RDS

You already have the power to scale the read capacity of your Amazon Aurora instances by adding additional read replicas to an existing cluster. Today we are giving you the power to create a read replica in another region. This new feature will allow you to support cross-region disaster recovery and to scale out reads. You can also use it to migrate from one region to another or to create a new database environment in a different region.

Creating a read replica in another region also creates an Aurora cluster in that region. This cluster can contain up to 15 more read replicas, with very low replication lag (typically less than 20 ms) within the region (between regions, latency will vary based on the distance between the source and target). You can use this model to duplicate your cluster and read replica setup across regions for disaster recovery. In the event of a regional disruption, you can promote the cross-region replica to be the master. This will allow you to minimize downtime for your cross-region application. This feature applies to unencrypted Aurora clusters.

Before you actually create the read replica, you need to take care of a pair of prerequisites. You need to make sure that a VPC and the Database Subnet Groups exist in the target region, and you need to enable binary logging on the existing cluster.

Setting up the VPC
Because Aurora always runs within a VPC, ensure that the VPC and the desired Database Subnet Groups exist in the target region. Here are mine:

Enabling Binary Logging
Before you can create a cross-region read replica, you need to enable binary logging for your existing cluster. Create a new DB Cluster Parameter Group (if you are not already using a non-default one):

Enable binary logging (choose MIXED) and then click on Save Changes:

Next, Modify the DB Instance, select the new DB Cluster Parameter Group, check Apply Immediately, and click on Continue. Confirm your modifications, and then click on Modify DB Instance to proceed:

Select the instance and reboot it, then wait until it is ready.
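
If you prefer to script these prerequisites, here is a hedged boto3 sketch. Note that the API path differs slightly from the console flow above: the cluster parameter group is attached with ModifyDBCluster rather than through the instance. Identifiers are hypothetical:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")   # source region, illustrative

# Create a custom cluster parameter group and turn on binary logging.
rds.create_db_cluster_parameter_group(
    DBClusterParameterGroupName="aurora-binlog",
    DBParameterGroupFamily="aurora5.6",
    Description="Aurora cluster parameters with binary logging enabled",
)
rds.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName="aurora-binlog",
    Parameters=[{
        "ParameterName": "binlog_format",
        "ParameterValue": "MIXED",
        "ApplyMethod": "pending-reboot",
    }],
)

# Attach the parameter group to the source cluster, then reboot the instance
# so that the static binlog_format parameter takes effect.
rds.modify_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",            # hypothetical identifier
    DBClusterParameterGroupName="aurora-binlog",
    ApplyImmediately=True,
)
rds.reboot_db_instance(DBInstanceIdentifier="my-aurora-instance")  # hypothetical
```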

Create Read Replica
With the prerequisites out of the way it is time to create the read replica! From within the AWS Management Console, select the source cluster and choose Create Cross Region Read Replica from the Instance Actions menu:

Name the new cluster and the new instance, and then pick the target region. Choose the DB Subnet Group and set the other options as desired, then click Create:

Aurora will create the cluster and the instance. The state of both items will remain at creating until the items have been created and the data has been replicated (this could take some time, depending on the amount of data stored in the existing cluster).
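
If you would rather create the replica from code, here is a hedged boto3 sketch: you call CreateDBCluster in the target region with ReplicationSourceIdentifier set to the ARN of the source cluster, then add an instance to the new cluster. All identifiers below are hypothetical:

```python
import boto3

TARGET_REGION = "ap-northeast-1"                      # illustrative target region
SOURCE_CLUSTER_ARN = (
    "arn:aws:rds:us-east-1:123456789012:cluster:my-aurora-cluster"  # hypothetical
)

rds = boto3.client("rds", region_name=TARGET_REGION)

# Create the replica cluster in the target region, pointing at the source ARN.
rds.create_db_cluster(
    DBClusterIdentifier="my-aurora-replica-cluster",
    Engine="aurora",
    ReplicationSourceIdentifier=SOURCE_CLUSTER_ARN,
    DBSubnetGroupName="my-target-subnet-group",   # must already exist (see above)
)

# Add an instance to the new cluster so that it can serve read traffic.
rds.create_db_instance(
    DBInstanceIdentifier="my-aurora-replica-instance",
    DBInstanceClass="db.r3.large",
    Engine="aurora",
    DBClusterIdentifier="my-aurora-replica-cluster",
)
```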

This feature is available now and you can start using it today!

Jeff;

 

New in AWS Marketplace: Alces Flight – Effortless HPC on Demand

by Jeff Barr | in AWS Marketplace, HPC

In the past couple of years, academic and corporate researchers have begun to see the value of the cloud. Faced with a need to run demanding jobs and to deliver meaningful results as quickly as possible while keeping costs under control, they are now using AWS to run a wide variety of compute-intensive, highly parallel workloads.

Instead of fighting for time on a cluster that must be shared with other researchers, they accelerate their work by launching clusters on demand, running their jobs, and then shutting the cluster down shortly thereafter, paying only for the resources that they consume. They replace tedious RFPs, procurement, hardware builds and acceptance testing with cloud resources that they can launch in minutes. As their needs grow, they can scale the existing cluster or launch a new one.

This self-serve, cloud-based approach favors science over servers and accelerates the pace of research and innovation. Access to shared, cloud-based resources can be granted to colleagues located on the same campus or halfway around the world, without having to worry about potential issues at organizational or network boundaries.

Alces Flight in AWS Marketplace
Today we are making Alces Flight available in AWS Marketplace. This is a fully-featured HPC environment that you can launch in a matter of minutes. It can make use of On-Demand or Spot Instances and comes complete with a job scheduler and hundreds of HPC applications that are all set up and ready to run. Some of the applications include built-in collaborative features such as shared graphical views. For example, here’s the Integrative Genomics Viewer (IGV):

Each cluster is launched into a Virtual Private Cloud (VPC) with SSH and graphical desktop connectivity. Clusters can be of fixed size, or can be Auto Scaled in order to meet changes in demand.  Once launched, the cluster looks and behaves just like a traditional Linux-powered HPC cluster, with shared NFS storage and passwordless SSH access to the compute nodes. It includes access to HPC applications, libraries, tools, and MPI suites.

Alces Flight is available in AWS Marketplace today; you can launch a small cluster (up to 8 nodes) for evaluation and testing, or a larger cluster for research.

If you subscribe to the product, you can download the AWS CloudFormation template from the Alces site. This template powers all of the products, and is used to quickly launch all of the AWS resources needed to create the cluster.

EC2 Spot Instances give you access to spare AWS capacity at up to a 90% discount from On-Demand pricing and can significantly reduce your cost per core. You simply enter the maximum bid price that you are willing to pay for a single compute node; AWS will manage your bid, running the nodes when capacity is available at the desired price point.

Running Alces Flight
In order to get some first-hand experience with Alces Flight, I launched a cluster of my own. Here are the settings that I used:

I set a tag for all of the resources in the stack as follows:

I confirmed my choices and gave CloudFormation the go-ahead to create my cluster. As expected, the cluster was all set up and ready to go within 5 minutes. Here are some of the events that were logged along the way:

Then I SSH’ed in to the login node and saw the greeting, all as expected:

After I launched my cluster I realized that this post would be more interesting if I had more compute nodes in my cluster. Instead of starting over, I simply modified my CloudFormation stack to have 4 nodes instead of 1, applied the change, and watched as the new nodes came online. Since I specified the use of Spot Instances when I launched the cluster, Auto Scaling placed bids automatically. Once the nodes were online I was able to locate them from within my PuTTY session:
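
If you script your stacks, the same change can be made with a hedged boto3 call like the one below. The stack name and parameter key are hypothetical; use the compute-node count parameter actually defined by the Alces Flight template you downloaded, and carry any other parameters over with UsePreviousValue:

```python
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")   # illustrative

# Grow the cluster in place by updating the existing stack.
cfn.update_stack(
    StackName="alces-flight-demo",                  # hypothetical stack name
    UsePreviousTemplate=True,
    Parameters=[
        # Hypothetical parameter key; check the template for the real name.
        {"ParameterKey": "ComputeInitialNodes", "ParameterValue": "4"},
        # Other template parameters would be listed here with
        # {"ParameterKey": "...", "UsePreviousValue": True}.
    ],
    Capabilities=["CAPABILITY_IAM"],   # needed if the template creates IAM resources
)
```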

Then I used pdsh (the Parallel Distributed Shell command) to check the uptime of each compute node:

Learn More
This barely counts as scratching the surface; read Getting Started as Quickly as Possible to learn a lot more about what you can do! You should also watch one or more of the Alces videos to see this cool new product in action.

If you are building and running data-intensive HPC applications on AWS, you may also be interested in another Marketplace offering. The BeeGFS (self-supported or support included) parallel file system runs across multiple EC2 instances, aggregating the processing power into a single namespace, with all data stored on EBS volumes. The self-supported product is also available on a 14-day free trial. You can create a cluster file system using BeeGFS and then use it as part of your Alces cluster.

Jeff;