AWS Blog

They’re Here – Longer EBS and Storage Gateway Resource IDs Now Available

by Jeff Barr | in Amazon EC2, Amazon Elastic Block Store, AWS Storage Gateway

Last November I let you know that we were planning to increase the length of the resource IDs for EC2 instances, reservations, EBS volumes, and snapshots in 2016. Early this year I showed you how to opt in to the new format for EC2 instances and EC2 reservations.

Effective today, you can opt in to the new format for EBS and Storage Gateway volumes and snapshots.

As I said earlier:

If you build libraries, tools, or applications that make direct calls to the AWS API, now is the time to opt in and to start your testing process! If you store the IDs in memory or in a database, take a close look at fixed-length fields, data structures, schema elements, string operations, and regular expressions. Resources that were created before you opt in will retain their existing short identifiers; be sure that your revised code can still handle them!
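For example, a regular expression or fixed-length field that assumes the original 8-character hexadecimal suffix will reject the new 17-character IDs. Here is a minimal Python sketch of the kind of check worth revisiting; the exact lengths are documented in the Longer EC2 and EBS Resource IDs FAQ, and the patterns below are illustrative rather than an official specification:

import re

# Short IDs use an 8-character hex suffix (e.g. vol-1a2b3c4d); the longer
# format uses a 17-character suffix (e.g. vol-1234567890abcdef0).
VOLUME_ID = re.compile(r"^vol-[0-9a-f]{8}([0-9a-f]{9})?$")
SNAPSHOT_ID = re.compile(r"^snap-[0-9a-f]{8}([0-9a-f]{9})?$")

assert VOLUME_ID.match("vol-1a2b3c4d")           # existing short ID
assert VOLUME_ID.match("vol-1234567890abcdef0")  # new longer ID
assert not SNAPSHOT_ID.match("snap-XYZ")         # not a valid ID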

You can opt in to the new format using the AWS Management Console, the AWS Command Line Interface (CLI), the AWS Tools for Windows PowerShell, or by calling the ModifyIdFormat API function.
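If you prefer to script the opt-in, here is a short sketch using the AWS SDK for Python (Boto3). It assumes your credentials and Region are already configured and that you want to opt in the current user or role for both resource types; resources created before you opt in keep their short IDs:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Opt in to longer IDs for EBS and Storage Gateway volumes and snapshots.
for resource in ("volume", "snapshot"):
    ec2.modify_id_format(Resource=resource, UseLongIds=True)

# Confirm the current settings.
for status in ec2.describe_id_format()["Statuses"]:
    print(status["Resource"], status["UseLongIds"])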

Opting In – Console
To opt in via the Console, simply log in, choose EC2, and click on Resource ID length management:

Then click on Use Longer IDs for the desired resource types:

Note that volume applies to EBS volumes and to Storage Gateway volumes and that snapshot applies to EBS snapshots (both direct and through Storage Gateway).

For information on using the AWS Command Line Interface (CLI) or the AWS Tools for Windows PowerShell, take a look at They’re Here – Longer EC2 Resource IDs Now Available.

Things to Know
Here are a couple of things to keep in mind as you transition to the new resource IDs:

  1. Some of the older versions of the AWS SDKs and CLIs are not compatible with the new format. Visit the Longer EC2 and EBS Resource IDs FAQ for more information on compatibility.
  2. New AWS Regions get longer instance, reservation, volume, and snapshot IDs by default. You can opt out for Regions that launch between now and December 2016.
  3. Starting on April 28, 2016, new accounts in all commercial regions except Beijing (China) and AWS GovCloud (US) will get longer instance and reservation IDs by default, again with the ability to opt out.
Jeff;

 

 

Autheos – At the Nexus of Marketing and E-Commerce

by Jeff Barr | in Customer Success, Guest Post

In today’s guest post, Leon Mergen, CTO of Autheos, reviews their company history and their move to AWS.

Jeff;

Adding video to a product page on an e-commerce site is perhaps the single most effective way to drive increased sales — studies have shown sales conversion rates can go up by more than two thirds. In addition, product video viewing data fills a gaping hole in a brand’s / supplier’s ability to assess the effectiveness of their online and offline marketing efforts at driving e-commerce sales. We had built an OK product video distribution platform… but we knew we couldn’t scale globally with the technology we were using. So, in September last year, we decided to transition to AWS, and, while doing so, built an e-commerce marketing support tool for Brands that, judging by customer response, is a game changer. This is our story.

The Perils of Good Fortune
Autheos was founded in 2012 when the biggest Webshop in Holland and Belgium asked us to turn an existing piece of technology into a video hosting solution that would automatically find and insert product videos into their product sales pages.  A startup rarely finds itself in a better position to start, so we jumped right in and started coding.  Which was, in retrospect, a mistake for two reasons.

For one thing, we grew too fast.  When you have a great client that really wants your product, the natural reaction is to build it as fast as you can.  So, since there wasn’t a team in place, we (too) quickly on-boarded engineers and outsourced several components to remote development shops, which resulted in classic communication problems and technical incompatibilities.

More importantly, however, since we already had an existing piece of technology, we didn’t take the time to think about how we would build it if we were starting from scratch.  It seemed like it would be quicker to adapt it to the new requirements.  And kind of like a home-owner who opts for renovation instead of tear-down and rebuild, we had to make all sorts of compromises as a result.

However, thanks to many all-nighters we managed to meet the deadline and launch a platform that allowed brands such as Philips, LEGO, L’Oreal, and Bethesda to upload product videos (commercials, guides, reviews, and so forth) for free and tag them with a product code and language.

The webshops integrated a small piece of JavaScript code that enabled them to query our video database in real time with a product code and language, display a custom button if a video was found, and pop up the right video(s) for the product in the desired language.

Click here to see an example video on Bol.com (the biggest webshop in Benelux); our video is behind the button.

The results: less work for the webshop (no more manual gathering of videos, decoding/encoding, hosting and matching them with the right products) and more sales. Our client convinced its Brands to start uploading their videos, and kickstarted our exponential growth. Soon we had so many Brands using our platform, and so many videos in our database, that nearly all major webshops in Benelux wanted to work with us as well (often pushed to do so by Brands, who didn’t want the hassle of interfacing / integrating with many different webshops).

This might sound great, but remember how we built the product in a rush with legacy code?  After three years of fire-fighting, interspersed with frequent moments of disbelief when we found out that certain features we wanted to offer were impossible due to limitations in our backend, we decided enough was enough… it was time to start over.

A New Beginning with AWS
Our key requirements were that we needed to seamlessly scale globally, log and process all of our data, and provide high performance access to our ever growing database of product videos. Besides this, we needed to make sure we could ship new features and products quickly without impacting wider operations. Oh, and we wanted to be up and running with the new platform in 6 months. As the de-facto standard for web applications, the choice of AWS was an easy one. However, we soon realized that it wasn’t just an easy decision, it was a really smart one too.

Elastic Transcoder was the main reason we decided to go with AWS. Before working with ET, we used a custom transcoding service that had been built by an outsourced company in Eastern Europe. Because the service was hosted there on antiquated servers, it suffered from lots of downtime and caused many headaches. Elastic Transcoder lets us forget about all of these problems and gives us a stable transcoding service that we can scale on demand.

When we moved our application servers to AWS, we also activated Amazon CloudFront. This was a no-brainer for us even though there are many other CDNs available, as CloudFront integrates unbelievably well within AWS. Essentially it just worked. With a few clicks we were able to build a transcoding pipeline that directly uploads its result to CloudFront. We make a single API call, and AWS takes care of the rest, including CDN hosting. It’s really that easy.
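To make this concrete, here is a rough Python (Boto3) sketch of that single API call; the pipeline ID, object keys, and preset ID below are placeholders, and the pipeline itself (which ties the input and output S3 buckets together) is created ahead of time:

import boto3

transcoder = boto3.client("elastictranscoder", region_name="eu-west-1")

# One call submits the job; the pipeline's output bucket is what CloudFront serves.
response = transcoder.create_job(
    PipelineId="1111111111111-abcde1",           # placeholder pipeline ID
    Input={"Key": "uploads/product-video.mov"},  # source object in the input bucket
    Outputs=[{
        "Key": "videos/product-video-720p.mp4",  # written to the output bucket
        "PresetId": "1351620000001-000010",      # placeholder preset ID
    }],
)
print(response["Job"]["Id"], response["Job"]["Status"])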

As we generate a huge number of log records every day, we had to make sure these were stored in a flexible and scalable environment. A regular PostgreSQL server would have worked; however, it would never have been cost-efficient at our scale. So we started running some prototypes with Amazon Redshift, the PostgreSQL-compatible data warehousing solution from AWS. We set up Kinesis Firehose to stream data from our application servers to Amazon Redshift, writing it out in batches (in essence creating a full ETL process as a service), something that would have taken a major effort with a traditional webhost. Doing this outside of AWS would have taken months; with AWS we managed to set all of this up in three days.
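A minimal sketch of the producer side, assuming a Firehose delivery stream (the name here is hypothetical) that has already been configured with Amazon Redshift as its destination:

import json
import boto3

firehose = boto3.client("firehose", region_name="eu-west-1")
STREAM = "video-view-events"  # hypothetical delivery stream, destination: Redshift

def log_view_event(video_id, product_code, country):
    # Firehose buffers these records and loads them into Redshift in batches.
    record = {"video_id": video_id, "product_code": product_code, "country": country}
    firehose.put_record(
        DeliveryStreamName=STREAM,
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )

log_view_event("vid-123", "8710103812345", "NL")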

Managing this data through data mining frameworks was the next big challenge, for which many solutions exist in the market. However, Amazon provides great solutions in an integrated platform that enabled us to test and implement rapidly. For batch processing we use Spark, provided by Amazon EMR. For temporarily hooking into data streams – e.g. our monitoring systems – we use AWS Data Pipeline, which gives us access to the stream of data as it is generated by our application servers, comparable to what Apache Kafka would give you.

Everything we use is accessible through an SDK, which allows us to run integration tests effectively in an isolated environment. Instead of having to mock services or set up temporary services locally and in our CI environment, we use the AWS SDK to easily create and clean up AWS services. The flexibility and operational effectiveness this brings is incredible, as our whole production environment can be replicated in a programmable setup in which we can simulate specific experiments. Furthermore, we catch many more problems by actually integrating all services in all automated tests, something you would otherwise only catch during manual testing / staging.
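As an illustration of that approach, here is a small pytest-style sketch that creates a real (temporary) SQS queue for a test and cleans it up afterwards; the queue name and Region are arbitrary:

import uuid
import boto3
import pytest

@pytest.fixture
def temp_queue():
    # Create a throwaway queue instead of mocking SQS, then delete it afterwards.
    sqs = boto3.resource("sqs", region_name="eu-west-1")
    queue = sqs.create_queue(QueueName="test-" + uuid.uuid4().hex)
    try:
        yield queue
    finally:
        queue.delete()

def test_round_trip(temp_queue):
    temp_queue.send_message(MessageBody="hello")
    messages = temp_queue.receive_messages(WaitTimeSeconds=2)
    assert messages and messages[0].body == "hello"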

Through AWS CloudFormation and AWS CodeDeploy we seamlessly built our cloud using templates, and integrated this with our testing systems in order to support our Continuous Deployment setup. We could, of course, have used Chef or Puppet with traditional webhosts, but the key benefit in using the AWS services for this is that we have instant access to a comprehensive ecosystem of tools and features with which we can integrate (and de-integrate) as we go.

Unexpected Bounty
One month in, things were going so smoothly that we did something that we had never done before in the history of the company:  we expanded our goals during a project without pushing out the delivery date.  We always knew that we had data that could be really valuable for Brands, but since our previous infrastructure made it really difficult to access or work with this data, we had basically ignored it.  However, when we had just finished our migration to Redshift, one of our developers read an article about the powerful combination of Redshift and Periscope.  So we decided to prototype an e-commerce data analysis tool.

A smooth connection with our Redshift tables was made almost instantly, and we saw our 500+ million records visualized in a few graphs that the Periscope team prepared for us.  Jaws dropped and our product manager went ahead and built an MVP. A few weeks of SQL courses, IRC spamming and nagging the Periscope support team later, and we had an alpha product.

We have shown this to a dozen major Brands and the response has been all we could hope for… a classic case of the fabled product / market fit. And it would not have happened without AWS.

An example of the dashboard for one of our Founding Partners (a global game development company).

Jackpot
With a state of the art platform, promising new products, and the backend infrastructure to support global viral growth we finally had a company that could attract the attention of professional investors… and within a few weeks of making our new pitch we had closed our first outside investment round.

We’ve come a long way from working with a bare bones transcoding server, to building a scalable infrastructure and best-in-class products that are ready to take over the world!

Our very first transcoding server.

What’s Next?
Driving viral spread globally to increase network effects, we are signing up new Webshops and Brands at a tremendous pace.  We are putting the finishing touches on the first version of our ecommerce data analysis product for Brand marketers, and speccing out additional products and features for Brands and Webshops working with the Autheos Network.  And of course we are looking for amazing team members to help make this happen. If you would like to join us on the next stage of our journey, please look at our website for current openings — and yes, we are looking for DevOps engineers!

And lastly, since this is the Amazon Web Services blog, we can’t resist being cheeky and thus herewith take the opportunity to invite Mr. Bezos to sit down with us to see if we can become the global product video partner for Amazon.  One thing’s for sure: our infrastructure is the best!

— Leon Mergen, CTO – lmergen@autheos.com

Machine Learning, Recommendation Systems, and Data Analysis at Cloud Academy

by Jeff Barr | in Amazon RDS, Amazon S3, AWS Lambda, Guest Post

In today’s guest post, Alex Casalboni and Giacomo Marinangeli of Cloud Academy discuss the design and development of their new Inspire system.

Jeff;

Our Challenge
Mixing technology and content has been our mission at Cloud Academy since the very early days. We are builders and we love technology, but we also know content is king. Serving our members with the best content and creating smart technology to automate it is what kept us up at night for a long time.

Companies are always fighting for people’s time and attention, and at Cloud Academy we face those same challenges. Our goal is to empower people and help them learn new Cloud skills every month, but we kept asking ourselves: “How much content is enough? How can we understand our customers’ goals and help them select the best learning paths?”

With this vision in mind, about six months ago we created a project called Inspire, which focuses on machine learning, recommendation systems, and data analysis. Inspire solves our problem on two fronts. First, we see an incredible opportunity to improve the way we serve our content to our customers. It will allow us to provide better suggestions and create dedicated learning paths based on an individual’s skills, objectives, and industries. Second, Inspire represented an incredible opportunity to improve our operations. We manage content that requires constant updates across multiple platforms with a continuously growing library of new technologies.

For instance, getting a notification to train on a new EC2 scenario that you’re using in your project can really make a difference in the way you learn new skills. By collecting data across our entire product, such as when you watch a video or when you’re completing an AWS quiz, we can gather that information to feed Inspire. Day by day, it keeps personalising your experience through different channels inside our product. The end result is a unique learning experience that will follow you throughout your entire journey and enable a customized continuous training approach based on your skills, job and goals.

Inspire: Powered by AWS
Inspire is heavily based on machine learning and AI technologies, enabled by our internal team of data scientists and engineers. Technically, this involves several machine learning models, which are trained on the huge amount of collected data. Once the Inspire models are fully trained, they need to be deployed in order to serve new predictions, at scale.

Here the challenge has been designing, deploying and managing a multi-model architecture, capable of storing our datasets, automatically training, updating and A/B testing our machine learning models, and ultimately offering a user-friendly and uniform interface to our website and mobile apps (available for iPhone and Android).

From the very beginning, we decided to focus on high availability and scalability. With this in mind, we designed an (almost) serverless architecture based on AWS Lambda. Every machine learning model we build is trained offline and then deployed as an independent Lambda function.

Given the current maximum execution time of 5 minutes, we still run the training phase on a separate EC2 Spot instance, which reads the dataset from our data warehouse (hosted on Amazon RDS), but we are looking forward to migrating this step to a Lambda function as well.

We are using Amazon API Gateway to manage RESTful resources and API credentials, by mapping each resource to a specific Lambda function.

The overall architecture is logically represented in the diagram below:

Both our website and mobile app can invoke Inspire with simple HTTPS calls through API Gateway. Each Lambda function logically represents a single model and aims at solving a specific problem. In more detail, each Lambda function loads its configuration by downloading the corresponding machine learning model (i.e. a serialized representation of it) from Amazon S3.
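A simplified sketch of what one of these functions might look like (the bucket, key, model format, and request shape are assumptions, not the actual Inspire code):

import json
import pickle
import boto3

s3 = boto3.client("s3")
_model = None  # cached across warm invocations of this function

MODEL_BUCKET = "inspire-models"             # hypothetical bucket
MODEL_KEY = "recommender/model-latest.pkl"  # hypothetical key

def handler(event, context):
    # Load the serialized model from S3 once, then serve predictions.
    global _model
    if _model is None:
        obj = s3.get_object(Bucket=MODEL_BUCKET, Key=MODEL_KEY)
        _model = pickle.loads(obj["Body"].read())

    user_id = json.loads(event["body"])["user_id"]
    return {
        "statusCode": 200,
        "body": json.dumps({"recommendations": _model.predict(user_id)}),
    }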

Behind the scenes, and without any impact on scalability or availability, an EC2 instance takes care of periodically updating these S3 objects as the outcome of the offline training phase.

Moreover, we want to A/B test and optimize our machine learning models: this is transparently handled in the Lambda function itself by means of SixPack, an open-source A/B testing framework which uses Redis.

Data Collection Pipeline
As far as data collection is concerned, we use Segment.com as a data hub: with a single API call, it allows us to log events into multiple external integrations, such as Google Analytics, Mixpanel, etc. We also developed our own custom integration (via webhook) in order to persistently store the same data in our AWS-powered data warehouse, based on Amazon RDS.

Every event we send to Segment.com is forwarded to a Lambda function – passing through API Gateway – which takes care of storing real-time data into an SQS queue. We use this queue as a temporary buffer in order to avoid scalability and persistence problems, even during downtime or scheduled maintenance. The Lambda function also verifies the authenticity of the received data by checking a signature that is uniquely provided by Segment.com.
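A condensed sketch of that function (the environment variable names are placeholders, and the exact header and digest used for the Segment.com signature should be checked against their webhook documentation):

import hashlib
import hmac
import json
import os
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["EVENT_QUEUE_URL"]      # placeholder
SHARED_SECRET = os.environ["SEGMENT_SECRET"]   # shared with Segment.com

def handler(event, context):
    # Verify the webhook signature, then buffer the raw event in SQS.
    body = event["body"]
    expected = hmac.new(SHARED_SECRET.encode(), body.encode(), hashlib.sha1).hexdigest()
    provided = event["headers"].get("x-signature", "")
    if not hmac.compare_digest(expected, provided):
        return {"statusCode": 403, "body": "invalid signature"}

    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=body)
    return {"statusCode": 200, "body": json.dumps({"queued": True})}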

Once raw data has been written onto the SQS queue, an elastic fleet of EC2 instances reads each individual event – hence removing it from the queue without conflicts – and writes it into our RDS data warehouse, after performing the required data transformations.
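The consumer side of that queue can be as simple as the following sketch (the queue URL, table layout, and the choice of psycopg2 as the database driver are all assumptions):

import json
import os
import boto3
import psycopg2  # assumed driver for the RDS-hosted warehouse

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = os.environ["EVENT_QUEUE_URL"]  # placeholder

conn = psycopg2.connect(host=os.environ["DB_HOST"], dbname="analytics",
                        user="loader", password=os.environ["DB_PASSWORD"])

def drain_queue():
    # Long-poll, transform, write to the warehouse, then delete the message,
    # so nothing leaves the queue until it has been stored.
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                                   WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            event = json.loads(msg["Body"])
            with conn, conn.cursor() as cur:
                cur.execute("INSERT INTO events (user_id, event_type, payload) "
                            "VALUES (%s, %s, %s)",
                            (event.get("userId"), event.get("type"), msg["Body"]))
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])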

The serverless architecture we have chosen drastically reduces the costs and problems of our internal operations, besides providing high availability and scalability by default.

Our Lambda functions have a fairly constant average response time – even during load peaks – and the temporary SQS buffer gives us an effectively unlimited time and storage margin before any data gets lost.

At the same time, our machine learning models won’t need to scale up in a vertical or distributed fashion since Lambda takes care of horizontal scaling. Currently, they have an incredibly low average response time of 1ms (or less):

We consider Inspire an enabler for everything we do from a product and content perspective, both for our customers and our operations. We’ve worked to make this the core of our technology, so that its contributions can quickly be adapted and integrated by everyone internally. In the near future, it will be able to independently make decisions for our content team while focusing on our customers’ needs.  At the end of the day, Inspire really answers our team’s doubts about which content we should prioritize, what works better, and exactly how much of it we need. Our ultimate goal is to improve our customers’ learning experience by building real intelligence into Cloud Academy and making it smarter.

Join our Webinar
If you would like to learn more about Inspire, please join our April 27th webinar – How we Use AWS for Machine Learning and Data Collection.

Alex Casalboni, Senior Software Engineer, Cloud Academy
Giacomo Marinangeli, CTO, Cloud Academy

PS – Cloud Academy is hiring – check out our open positions!

AWS Week in Review – April 18, 2016

by Jeff Barr | in Week in Review

Let’s take a quick look at what happened in AWS-land last week:

Monday

April 18

Tuesday

April 19

Wednesday

April 20

Thursday

April 21

Friday

April 22

Saturday

April 23

Sunday

April 24

New & Notable Open Source

New SlideShare Presentations

New Customer Success Stories

New YouTube Videos

Upcoming Events

Help Wanted

Stay tuned for next week! In the meantime, follow me on Twitter and subscribe to the RSS feed.

Jeff;

AWS Webinars – April 2016

by Jeff Barr | in Webinars

Our 2016 series of webinars continues with a strong set of 200-level topics in April. The webinars are free, but space is limited and you should sign up ahead of time if you would like to attend. Here’s what we have on the calendar for the last week of April (all times are Pacific):

Tuesday, April 26
Are you ready to launch and connect to your first EC2 instance? Do you want to learn how to use Amazon Simple Storage Service (S3) to store and share files? Our getting started webinar will show you how to do both.

Webinar: Getting Started with AWS (9 – 10 AM).

Do you want to learn how to use Apache Spark to analyze real-time streams of data on an Amazon EMR cluster? Do you want to know how to use Spark as part of a system that includes Amazon DynamoDB, Amazon Redshift, Amazon Kinesis, and other big data tools? This webinar will show you how to use Spark to address common big data use cases.

Webinar: Best Practices for Apache Spark on AWS (10:30 – 11:30 AM).

Are you interested in running a commercial relational database in the cloud? Do you want to know more about best practices for running single and multiple database instances, or do you have questions about costs and licensing? Attend this webinar to learn more about Amazon RDS running Oracle.

Webinar: RDS for Oracle: Quick Provision, Easy to Manage, Reduced Cost (Noon – 1 PM).

Wednesday, April 27
In today’s real-time world, going from raw data to insights as quickly as possible has become a must. Fortunately, a number of AWS tools can help you to capture, store, and analyze real-time streaming data. Attend this webinar to learn about Amazon Kinesis Streams, Lambda, and Spark Streaming on Amazon EMR.

Webinar: Getting Started with Real-Time Data Analytics on AWS (9 – 10 AM).

As you move your business and your applications to the cloud, you should also look at modernizing your development and deployment practices. For example, many AWS customers use tools like AWS CodePipeline and AWS CodeDeploy to implement continuous delivery. Attend this webinar to learn more about what this means and how to put it into practice in your own organization.

Webinar: Getting Started with Continuous Delivery on AWS (10:30 – 11:30 AM).

Now that Amazon S3 is a decade old, we have a wealth of experience to share about the best ways to use it for backup, compliance, archiving, and many other purposes. This webinar will share best practices for keeping your data safe, and will also provide an overview of several different transfer services.

Webinar: S3 Best Practices: A Decade of Field Experience (Noon – 1 PM).

Thursday, April 28
AWS Lambda brings some new flexibility to the development and deployment process. When used in conjunction with AWS Storage Gateway, it can be used as the basis for an automated development workflow that easily supports distinct development, staging, and production environments. Attend this webinar to learn more.

Webinar: Continuous Delivery to AWS Lambda (9 AM – 10 AM).

Amazon Aurora is a MySQL-compatible database engine that can boost performance, reliability, and availability while reducing the total cost of ownership. Join this webinar to learn more about Aurora and to better understand how to migrate your existing on-premises or cloud-based databases to it.

Webinar: Migrating Your Databases to Amazon Aurora (10:30 – 11:30 AM).

Containers and microservices are both a natural fit for the cloud. Attend this webinar to learn more about the challenges that might arise and the best practices to address them.

Webinar: Running Microservices on Amazon ECS (Noon – 1 PM).

Jeff;

 

Amazon EMR Update – Apache HBase 1.2 Is Now Available

by Jeff Barr | in Amazon EMR

Apache HBase is a distributed, scalable big data store designed to support tables with billions of rows and millions of columns. HBase runs on top of Hadoop and HDFS and can also be queried using MapReduce, Hive, and Pig jobs.

AWS customers use HBase for their ad tech, web analytics, and financial services workloads. They appreciate its scalability and the ease with which it handles time-series data.

HBase 1.2 on Amazon EMR
Today we are making version 1.2 of HBase available for use with Amazon EMR.  Here are some of the most important and powerful features and benefits that you get when you run HBase:

Strongly Consistent Reads and Writes – When a write returns, all readers will see the same value.

Scalability – Individual HBase tables can comprise billions of rows and millions of columns. HBase stores data in a sparse form in order to conserve space. You can use column families and column prefixes to organize your schemas and to indicate to HBase that the members of a family have a similar access pattern. You can also use timestamps and versioning to retain old versions of cells.

Backup to S3 – You can use the HBase Export Snapshot tool to back up your tables to Amazon S3. The backup operation is actually a MapReduce job and uses parallel processing to adeptly handle large tables.

Graphs And Timeseries – You can use HBase as the foundation for a more specialized data store. For example, you can use Titan for graph databases and OpenTSDB for time series.

Coprocessors – You can write custom business logic (similar to a trigger or a stored procedure) that runs within HBase and participates in query and update processing (read The How To of HBase Coprocessors to learn more).

You also get easy provisioning and scaling, access to a pre-configured installation of HDFS, and automatic node replacement for increased durability.

Getting Started with HBase
HBase 1.2 is available as part of Amazon EMR release 4.6. You can, as usual, launch it from the Amazon EMR Console, the Amazon EMR CLI, or through the Amazon EMR API. Here’s the command that I used:

$ aws --region us-east-1 emr create-cluster \
  --name "MyCluster" --release-label "emr-4.6.0" \
  --instance-type m3.xlarge --instance-count 3 --use-default-roles \
  --ec2-attributes KeyName=keys-jbarr-us-east \
  --applications Name=Hadoop Name=Hue Name=HBase Name=Hive

This command assumes that the EMR_DefaultRole and EMR_EC2_DefaultRole IAM roles already exist. They are created automatically when you launch an EMR cluster from the Console (read about Create and Use Roles for Amazon EMR and Create and Use Roles with the AWS CLI to learn more).

I found the master node’s DNS name on the Cluster Details page and SSH’ed in as user hadoop. Then I ran a couple of HBase shell commands:

Following the directions in our new HBase Tutorial, I created a table called customer, restored a multi-million record snapshot from S3 into the table, and ran some simple queries:
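The shell output itself isn’t reproduced here, but if you prefer to drive HBase from Python, a rough equivalent looks like the sketch below. It uses the third-party happybase client and assumes you have started the HBase Thrift server on the master node; the host name and table schema are hypothetical, not the ones from the tutorial:

import happybase

# Requires the HBase Thrift server to be running on the master node.
conn = happybase.Connection("ec2-xx-xx-xx-xx.compute-1.amazonaws.com", port=9090)

# Roughly `create 'customer', 'address', 'order'` in the HBase shell.
conn.create_table("customer", {"address": dict(), "order": dict()})

table = conn.table("customer")
table.put(b"armstrong.zula", {b"address:city": b"Seattle",
                              b"order:numberOfItems": b"3"})

print(table.row(b"armstrong.zula"))    # simple GET
for key, data in table.scan(limit=5):  # simple SCAN
    print(key, data)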

Available Now
You can start using HBase 1.2 on Amazon EMR today. To learn more, read the Amazon EMR Documentation.

Jeff;

AWS Week in Review – April 11, 2016

by Jeff Barr | in Week in Review

Let’s take a quick look at what happened in AWS-land last week:

Monday

April 11

Tuesday

April 12

Wednesday

April 13

Thursday

April 14

Friday

April 15

Saturday

April 16

Sunday

April 17

New & Notable Open Source

  • cfn-include implements a Fn::Include for CloudFormation templates.
  • TumblessTemplates is a set of CloudFormation templates for quick setup of the Tumbless blogging platform.
  • s3git is Git for cloud storage.
  • s3_uploader is an S3 file uploader GUI written in Python.
  • SSH2EC2 lets you connect to EC2 instances via tags and metadata.
  • lambada is AWS Lambda for silly people.
  • aws-iam-proxy is a proxy that signs requests with IAM credentials.
  • hyperion is a Scala library and a set of abstractions for AWS Data Pipeline.
  • dynq is a DynamoDB query library.
  • cloud-custodian is a policy rules engine for AWS management.

New SlideShare Presentations

New Customer Success Stories

New YouTube Videos

Upcoming Events

Help Wanted

Stay tuned for next week! In the meantime, follow me on Twitter and subscribe to the RSS feed.

Jeff;

AWS Device Farm Update – Remote Access to Devices for Interactive Testing

by Jeff Barr | in AWS Device Farm

Last year I wrote about AWS Device Farm and told you how you can use it to Test Mobile Apps on Real Devices. As I described at the time, AWS Device Farm allows you to create a project, identify an application, configure a test, and then run the test against a variety of iOS and Android devices.

Remote Access to Devices
Today we are launching a new feature that provides you with remote access to devices (phones and tablets) for interactive testing. You simply open a new session on the desired device, wait (generally a minute or two) until the device is available, and then interact with the device via the AWS Management Console.

You can gesture, swipe, and interact with devices in real time directly through your web browser as if the device was on your desk or in your hand. This includes installing and running applications!

Here’s a quick demo. I click on Start a new session to begin:

Then I search for a device of the desired type, including the desired OS version, select it, and name my session. I click on Confirm and start session to proceed:

Then I wait for the device to become available (about 30 seconds in this case):

 

Once the device is available I can see the screen and access it through the Console:

I can interact with the Kindle Fire using my mouse. Perhaps my app is not behaving as expected when the language is set to Latin American Spanish. I can change the Kindle Fire’s settings with a couple of clicks:

I can install my app on the Kindle Fire by clicking on Upload and choosing my APK.

My session can run for up to 60 minutes. After that time it will stop automatically.
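If you want to drive sessions programmatically rather than through the Console, the Device Farm API also exposes remote access sessions (assuming your SDK version includes them); here is a hedged Boto3 sketch with placeholder ARNs:

import time
import boto3

# Device Farm lives in us-west-2; both ARNs below are placeholders.
df = boto3.client("devicefarm", region_name="us-west-2")
PROJECT_ARN = "arn:aws:devicefarm:us-west-2:123456789012:project:EXAMPLE"
DEVICE_ARN = "arn:aws:devicefarm:us-west-2::device:EXAMPLE"

session = df.create_remote_access_session(
    projectArn=PROJECT_ARN, deviceArn=DEVICE_ARN, name="settings-check"
)["remoteAccessSession"]

# Wait for the device to become available (simplified; a real script would
# also handle failure states and time out eventually).
while session["status"] != "RUNNING":
    time.sleep(15)
    session = df.get_remote_access_session(arn=session["arn"])["remoteAccessSession"]

# ...interact with the device in the Console, then end the session:
df.stop_remote_access_session(arn=session["arn"])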

Available Now
This new feature is available in beta form now, with a wide selection of Android phones and tablets. We will be adding iOS devices later this year, along with additional control over the device configuration and (virtual) location.

AWS Device Farm comes with a one-time free trial of 250 device minutes. After that you are charged $0.17 per device minute. Or you can pay $250 per slot per month for unmetered access (slots are the units of concurrent execution).

Jeff;

 

New – Your User Pools for Amazon Cognito

by Jeff Barr | in Amazon Cognito

Amazon Cognito makes it easy for mobile and web apps to add authentication, user management, and data synchronization without having to write backend code or manage any infrastructure. In the last year we have added some powerful new features to Cognito, including AWS CloudTrail support, the use of Twitter and Digits as login providers, the ability to run AWS Lambda functions in response to events in Cognito, and the streaming of user identity data into Amazon Kinesis.

Your User Pools
You can now use Amazon Cognito to easily add user sign-up and sign-in to your mobile and web apps. With the user pools feature, you can create your own user directory that can scale to hundreds of millions of users, and is fully managed so you don’t have to worry about the heavy lifting associated with building, securing, and scaling authentication to your apps. This feature also provides enhanced security functionality such as email verification, phone number verification, and multi-factor authentication. As an app developer, you already had the option to use an external identity provider such as Amazon, Facebook, Google, Twitter or Digits for this purpose using the Cognito feature that we now call Federated Identity Pools.

Using a user pool gives you detailed control over the sign-up and sign-in aspects of your web and mobile SaaS apps, games, and so forth. Building and running a directory service at scale (potentially tens or even hundreds of millions of users) is not easy, but is definitely undifferentiated heavy lifting, with the added security burden that comes when you are managing user names, passwords, email addresses, and other sensitive pieces of information. You don’t need to build or run your own directory service when you use Cognito Identity.

Multiple User Pools per Account
You can create multiple user pools within your AWS account (the identities in one pool are separate and distinct from those in any other pools). Each user pool is named, and you have full control over the attributes (address, email, phone number(s), locale, and so forth) that you want to store for each user. After you create a user pool, you can use the AWS Mobile SDKs (available for iOS, Android, and JavaScript) to authenticate users, get access credentials, and so forth.

Your users will benefit from a number of security features including SMS-based Multi-Factor Authentication (MFA) and account verification via phone or email. The password features use the Secure Remote Password (SRP) protocol to avoid sending cleartext passwords over the wire.

Creating a User Pool
Let’s walk through the process of creating a user pool from the AWS Management Console (API and CLI support is also available). My (hypothetical) app is called PaymentApp so I’ll create a pool named PaymentAppUsers:

I can either review and accept the defaults, or I can step through all of the settings. I’ll choose the latter course of action. Now I choose the set of attributes that must be collected when a new user signs up for my service:

I can also set up custom attributes. For my app, I would like to track the user’s preferred payment currency:

Then I set up the desired password strength. Since this is a payment app, I’ll check all of the options:

Next, I enable Multi-Factor Authentication, and indicate that email addresses and phone numbers must be verified. I also customize the messages that are associated with each method of verification:

My app will have a mobile client so I’ll arrange to create a unique ID and a secret key for it:

Now I can arrange to have Cognito invoke some Lambda functions during the sign-up, verification, authentication, and confirmation steps (this is optional, but very useful if you want to customize the sign-up workflow by validating custom attributes):

Finally, I review my choices and create my pool:

At this point I am ready to build my web or mobile app.
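The same setup can be scripted; here is a hedged Boto3 sketch that creates roughly the pool described above (the password policy, attribute names, and client name are illustrative, and SMS-based MFA would additionally require an SNS role, so it is omitted here):

import boto3

cognito = boto3.client("cognito-idp", region_name="us-east-1")

pool = cognito.create_user_pool(
    PoolName="PaymentAppUsers",
    Policies={"PasswordPolicy": {
        "MinimumLength": 8,
        "RequireUppercase": True, "RequireLowercase": True,
        "RequireNumbers": True, "RequireSymbols": True,
    }},
    AutoVerifiedAttributes=["email"],  # phone verification / SMS MFA need an SNS role
    Schema=[{                          # custom attribute, surfaced as "custom:currency"
        "Name": "currency",
        "AttributeDataType": "String",
        "Mutable": True,
    }],
)["UserPool"]

# App client with a generated secret for the mobile app.
client = cognito.create_user_pool_client(
    UserPoolId=pool["Id"], ClientName="PaymentAppMobile", GenerateSecret=True
)["UserPoolClient"]
print(pool["Id"], client["ClientId"])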

Cognito at Asurion
Asurion provides over 200 million customers with insurance policies for high-value devices such as smartphones. Asurion plans to use Cognito to manage the user directory for their new device protection app. This app will collect device-related data and make recommendations that are aimed at optimizing usage.

Asurion chose Cognito due to its support for a wide variety of identity models. They can have their own fully managed directory, or they can have users authenticate through third-party social identity providers, without having to deal with the heavy lifting involved in scaling and securing an identity system.

Ravi Tiyyagura (Asurion’s Director of Enterprise Architecture) told us:

It is critical for us to provide a secure and simple sign-up and sign-in experience for our tens of millions of end users. With Amazon Cognito, we can enable that without having to worry about building and managing any backend infrastructure.

Public Beta
We are launching user pools today as a public beta. All of the primary functionality is in place and you can use it to start building and testing your app. We expect to make it available for production use within a couple of months.

Jeff;

 

AWS Storage Update – Amazon S3 Transfer Acceleration + Larger Snowballs in More Regions

by Jeff Barr | in Amazon S3, AWS Import/Export

Several AWS teams are focused on simplifying and accelerating the process of moving on-premises data to the cloud. We started out with the very basic PUT operation and multipart upload in the early days. Along the way we gave you the ability to send us a disk, and made that process even easier by launching the AWS Import/Export Snowball at last year’s AWS re:Invent (read AWS Import/Export Snowball – Transfer 1 Petabyte Per Week Using Amazon-Owned Storage Appliances to learn more).

Today we are launching some important improvements to Amazon S3 and to Snowball, both designed to further simplify and accelerate your data migration process. Here’s what’s new:

Amazon S3 Transfer Acceleration – This new feature accelerates Amazon S3 data transfers by making use of optimized network protocols and the AWS edge infrastructure. Improvements are typically in the range of 50% to 500% for cross-country transfers of larger objects, but can go even higher under certain conditions.

Larger Snowballs in More Regions – A new, larger-capacity (80 terabyte) Snowball appliance is now available. In addition, the appliances can now be used with two additional US Regions and two additional international Regions.

Amazon S3 Transfer Acceleration
The AWS edge network has points of presence in more than 50 locations. Today, it is used to distribute content via Amazon CloudFront and to provide rapid responses to DNS queries made to Amazon Route 53. With today’s announcement, the edge network also helps to accelerate data transfers in to and out of Amazon S3. It will be of particular benefit to you if you are transferring data across or between continents, have a fast Internet connection, use large objects, or have a lot of content to upload.

You can think of the edge network as a bridge between your upload point (your desktop or your on-premises data center) and the target bucket. After you enable this feature for a bucket (by checking a checkbox in the AWS Management Console), you simply change the bucket’s endpoint to the form BUCKET_NAME.s3-accelerate.amazonaws.com. No other configuration changes are necessary! After you do this, your TCP connections will be routed to the best AWS edge location based on latency.  Transfer Acceleration will then send your uploads back to S3 over the AWS-managed backbone network using optimized network protocols, persistent connections from edge to origin, fully-open send and receive windows, and so forth.

Here’s how I enable Transfer Acceleration for my bucket (jbarr-public):

My bucket endpoint is https://jbarr-public.s3.amazonaws.com/. I simply change it to https://jbarr-public.s3-accelerate.amazonaws.com/ in my upload tool and/or code. Then I initiate the upload and take advantage of S3 Transfer Acceleration with no further effort. I can use the same rule to construct a URL that can be used for accelerated downloads.
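If you use the AWS SDK for Python (Boto3), the same two steps look roughly like this; the bucket name matches the example above and the file name is a placeholder:

import boto3
from botocore.config import Config

# Enable Transfer Acceleration on the bucket (equivalent to the Console checkbox).
boto3.client("s3").put_bucket_accelerate_configuration(
    Bucket="jbarr-public",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Route transfers through the *.s3-accelerate.amazonaws.com endpoint.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-video.mp4", "jbarr-public", "uploads/big-video.mp4")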

Because network configurations and conditions vary from time to time and from location to location, you pay only for transfers where Transfer Acceleration can potentially improve upload performance. Pricing for accelerated transfers begins at $0.04 per gigabyte uploaded. As is always the case with S3, there’s no up-front fee or long-term commitment.

You can use our new Amazon S3 Transfer Acceleration Speed Comparison to get a better understanding of the value of Transfer Acceleration in your environment. I ran it from my Amazon WorkSpace in the US West (Oregon) Region and this is what I saw:

As you can see, the opportunity for improvement grew in rough proportion to the distance between my home Region and the target.

To learn more about this feature, read Getting Started with Amazon S3 Transfer Acceleration in the Amazon S3 Developer Guide. It is available today in all Regions except Beijing (China) and AWS GovCloud (US).

Larger Snowballs in More Regions
Many AWS customers are now using AWS Import/Export Snowball to move large amounts of data in and out of the AWS Cloud. For example, here’s a Snowball on-site at GE Oil & Gas:

Ben Wilson (CTO of GE Oil & Gas) posted the picture to LinkedIn with the following caption:

“PIGs and Snowballs – a match made in heaven!! AWS Snowball 25 TB of Pipeline PIG Data to be managed at AWS. That is our GE PIG we pulled some of the data from. It’s always fun to try new AWS features and try to break them!!”

Today we are making Snowball available in four new Regions: AWS GovCloud (US), US West (Northern California), Europe (Ireland), and Asia Pacific (Sydney). We expect to make Snowball available in the remaining AWS Regions in the coming year.

The original Snowball appliances had a capacity of 50 terabytes. Today we are launching a newer appliance with 80 terabytes of capacity. If you are transferring data in or out of the US East (Northern Virginia), US West (Oregon), US West (Northern California), or AWS GovCloud (US) Regions using Snowball you can choose the desired capacity. If you are transferring data in or out of the Europe (Ireland) or Asia Pacific (Sydney) Regions, you will use the 80 terabyte appliance. Here’s how you choose the desired capacity when you are creating your import job:

To learn more about Snowball, read the Snowball FAQ and the Snowball Developer Guide.

Jeff;