Category: Amazon S3


AWS Storage Update – S3 & Glacier Price Reductions + Additional Retrieval Options for Glacier

by Jeff Barr | in Amazon Glacier, Amazon S3, Launch, Price Reduction

Back in 2006, we launched S3 with a revolutionary pay-as-you-go pricing model, with an initial price of 15 cents per GB per month. Over the intervening decade, we reduced the price per GB by 80%, launched S3 in every AWS Region, and enhanced the original one-size-fits-all model with user-driven features such as web site hosting, VPC integration, and IPv6 support, while adding new storage options including S3 Infrequent Access.

Because many AWS customers archive important data for legal, compliance, or other reasons and reference it only infrequently, we launched Glacier in 2012, and then gave you the ability to transition data between S3, S3 Infrequent Access, and Glacier by using lifecycle rules.

Today I have two big pieces of news for you: we are reducing the prices for S3 Standard Storage and for Glacier storage. We are also introducing additional retrieval options for Glacier.

S3 & Glacier Price Reduction
As long-time AWS customers already know, we work relentlessly to reduce our own costs, and to pass the resulting savings along in the form of a steady stream of AWS Price Reductions.

We are reducing the per-GB price for S3 Standard Storage in most AWS regions, effective December 1, 2016. The bill for your December usage will automatically reflect the new, lower prices. Here are the new prices for Standard Storage:

Prices are shown as $ / GB / Month for the 0-50 TB, 51-500 TB, and 500+ TB tiers:

  • US East (Northern Virginia), US East (Ohio), US West (Oregon), EU (Ireland): $0.0230 / $0.0220 / $0.0210 (reductions range from 23.33% to 23.64%)
  • US West (Northern California): $0.0260 / $0.0250 / $0.0240 (reductions range from 20.53% to 21.21%)
  • EU (Frankfurt): $0.0245 / $0.0235 / $0.0225 (reductions range from 24.24% to 24.38%)
  • Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Sydney), Asia Pacific (Seoul), Asia Pacific (Mumbai): $0.0250 / $0.0240 / $0.0230 (reductions range from 16.36% to 28.13%)

As you can see from the prices above, we are also simplifying the pricing model by consolidating six pricing tiers into three new tiers.

We are also reducing the price of Glacier storage in most AWS Regions. For example, you can now store 1 GB for 1 month in the US East (Northern Virginia), US West (Oregon), or EU (Ireland) Regions for just $0.004 (less than half a cent) per month, a 43% decrease. For reference purposes, this amount of storage cost $0.010 when we launched Glacier in 2012, and $0.007 after our last Glacier price reduction (a 30% decrease).

The lower pricing is a direct result of the scale that comes about when our customers trust us with trillions of objects, but it is just one of the benefits. Based on the feedback that I get when we add new features, the real value of a cloud storage platform is the rapid, steady evolution. Our customers often tell me that they love the fact that we anticipate their needs and respond with new features accordingly.

New Glacier Retrieval Options
Many AWS customers use Amazon Glacier as the archival component of their tiered storage architecture. Glacier allows them to meet compliance requirements (either organizational or regulatory) while allowing them to use any desired amount of cloud-based compute power to process and extract value from the data.

Today we are enhancing Glacier with two new retrieval options for your Glacier data. You can now pay a little bit more to expedite your data retrieval. Alternatively, you can indicate that speed is not of the essence and pay a lower price for retrieval.

We launched Glacier with a pricing model for data retrieval that was based on the amount of data that you had stored in Glacier and the rate at which you retrieved it. While this was an accurate reflection of our own costs to provide the service, it was somewhat difficult to explain. Today we are replacing the rate-based retrieval fees with simpler per-GB pricing.

Our customers in the Media and Entertainment industry archive their TV footage to Glacier. When an emergent situation calls for them to retrieve a specific piece of footage, minutes count and they want fast, cost-effective access to the footage. Healthcare customers are looking for rapid, “while you wait” access to archived medical imagery and genome data; photo archives and companies selling satellite data turn out to have similar requirements. On the other hand, some customers have the ability to plan their retrievals ahead of time, and are perfectly happy to get their data in 5 to 12 hours.

Taking all of this into account, you can now select one of the following options for retrieving your data from Glacier (the original rate-based retrieval model is no longer applicable):

Standard retrieval is the new name for what Glacier already provides, and is the default for all API-driven retrieval requests. You get your data back in a matter of hours (typically 3 to 5), and pay $0.01 per GB along with $0.05 for every 1,000 requests.

Expedited retrieval addresses the need for “while you wait” access. You can get your data back quickly, with retrieval typically taking 1 to 5 minutes. If you store (or plan to store) more than 100 TB of data in Glacier and need to make infrequent, yet urgent requests for subsets of your data, this is a great model for you (if you have less data, S3’s Infrequent Access storage class can be a better value). Retrievals cost $0.03 per GB and $0.01 per request.

Expedited retrieval times can vary with overall demand. If you need to get your data back in this time frame even in rare situations where demand is exceptionally high, you can provision retrieval capacity. Once you have done this, all Expedited retrievals will automatically be served via your provisioned capacity. Each unit of provisioned capacity costs $100 per month and ensures that you can perform at least 3 Expedited retrievals every 5 minutes, with up to 150 MB/second of retrieval throughput.

Bulk retrieval is a great fit for planned or non-urgent use cases, with retrieval typically taking 5 to 12 hours at a cost of $0.0025 per GB (75% less than for Standard Retrieval) along with $0.025 for every 1,000 requests. Bulk retrievals are perfect when you need to retrieve large amounts of data within a day, and are willing to wait a few extra hours in exchange for a very significant discount.

If you do not specify a retrieval option when you call InitiateJob to retrieve an archive, a Standard Retrieval will be initiated. Your existing jobs will continue to work as expected, and will be charged at the new rate.
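
For API-driven retrievals, the tier can be specified when the job is initiated. Here is a minimal sketch using the AWS SDK for Java; the vault name and archive ID are placeholders, so treat this as an illustration rather than a complete recipe:

// Hedged example: initiate a Glacier archive retrieval with an explicit tier.
// The vault name and archive ID are placeholders.
import com.amazonaws.services.glacier.AmazonGlacier;
import com.amazonaws.services.glacier.AmazonGlacierClientBuilder;
import com.amazonaws.services.glacier.model.InitiateJobRequest;
import com.amazonaws.services.glacier.model.InitiateJobResult;
import com.amazonaws.services.glacier.model.JobParameters;

public class RetrieveArchive {
    public static void main(String[] args) {
        AmazonGlacier glacier = AmazonGlacierClientBuilder.defaultClient();

        JobParameters params = new JobParameters()
            .withType("archive-retrieval")
            .withArchiveId("EXAMPLE-ARCHIVE-ID")
            .withTier("Expedited");               // "Standard" (the default) or "Bulk" also work

        InitiateJobResult result = glacier.initiateJob(new InitiateJobRequest()
            .withVaultName("my-archive-vault")
            .withJobParameters(params));

        System.out.println("Retrieval job started: " + result.getJobId());
    }
}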

To learn more, read about Data Retrieval in the Glacier FAQ.

As always, I am thrilled to be able to share this news with you, and I hope that you are equally excited!

If you want to learn more, we have a webinar coming up on December 12th. Register here.

Jeff;

 

CloudTrail Update – Capture and Process Amazon S3 Object-Level API Activity

by Jeff Barr | in Amazon CloudTrail, Amazon CloudWatch, Amazon S3, AWS Lambda

I would like to show you how several different AWS services can be used together to address a challenge faced by many of our customers. Along the way I will introduce you to a new AWS CloudTrail feature that launches today and show you how you can use it in conjunction with CloudWatch Events.

The Challenge
Our customers store many different types of mission-critical data in Amazon Simple Storage Service (S3) and want to be able to track object-level activity on their data. While some of this activity is captured and stored in the S3 access logs, the level of detail is limited and log delivery can take several hours. Customers, particularly in financial services and other regulated industries, are asking for additional detail, delivered on a more timely basis. For example, they would like to be able to know when a particular IAM user accesses sensitive information stored in a specific part of an S3 bucket.

In order to meet the needs of these customers, we are now giving CloudTrail the power to capture object-level API activity on S3 objects, which we call Data events (the original CloudTrail events are now called Management events). Data events include “read” operations such as GET, HEAD, and Get Object ACL as well as “write” operations such as PUT and POST. The level of detail captured for these operations is intended to provide support for many types of security, auditing, governance, and compliance use cases. For example, it can be used to scan newly uploaded data for Personally Identifiable Information (PII), audit attempts to access data in a protected bucket, or to verify that the desired access policies are in effect.

Processing Object-Level API Activity
Putting this all together, we can easily set up a Lambda function that will take a custom action whenever an S3 operation takes place on any object within a selected bucket or a selected folder within a bucket.

Before starting on this post, I created a new CloudTrail trail called jbarr-s3-trail:

I want to use this trail to log object-level activity on one of my S3 buckets (jbarr-s3-trail-demo). In order to do this I need to add an event selector to the trail. The selector is specific to S3, and allows me to focus on logging the events that are of interest to me. Event selectors are a new CloudTrail feature and are being introduced as part of today’s launch, in case you were wondering.

I indicate that I want to log both read and write events, and specify the bucket of interest. I can limit the events to part of the bucket by specifying a prefix, and I can also specify multiple buckets. I can also control the logging of Management events:

CloudTrail supports up to 5 event selectors per trail. Each event selector can specify up to 50 S3 buckets and optional bucket prefixes.
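
If you prefer to configure logging programmatically rather than in the console, here is a hedged sketch using the AWS SDK for Java. The trail and bucket names match the ones used in this walkthrough; the rest is an illustration of the PutEventSelectors API rather than a definitive recipe:

// Hedged example: attach an S3 Data event selector to an existing trail.
import com.amazonaws.services.cloudtrail.AWSCloudTrail;
import com.amazonaws.services.cloudtrail.AWSCloudTrailClientBuilder;
import com.amazonaws.services.cloudtrail.model.DataResource;
import com.amazonaws.services.cloudtrail.model.EventSelector;
import com.amazonaws.services.cloudtrail.model.PutEventSelectorsRequest;
import com.amazonaws.services.cloudtrail.model.ReadWriteType;

public class AddS3EventSelector {
    public static void main(String[] args) {
        AWSCloudTrail cloudTrail = AWSCloudTrailClientBuilder.defaultClient();

        // Log object-level (Data) events for every object in one bucket
        DataResource s3Objects = new DataResource()
            .withType("AWS::S3::Object")
            .withValues("arn:aws:s3:::jbarr-s3-trail-demo/");

        EventSelector selector = new EventSelector()
            .withReadWriteType(ReadWriteType.All)     // both read and write events
            .withIncludeManagementEvents(true)        // keep logging Management events too
            .withDataResources(s3Objects);

        cloudTrail.putEventSelectors(new PutEventSelectorsRequest()
            .withTrailName("jbarr-s3-trail")
            .withEventSelectors(selector));
    }
}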

I set this up, opened my bucket in the S3 Console, uploaded a file, and took a look at one of the entries in the trail. Here’s what it looked like:

{
  "eventVersion": "1.05",
  "userIdentity": {
    "type": "Root",
    "principalId": "99999999999",
    "arn": "arn:aws:iam::99999999999:root",
    "accountId": "99999999999",
    "username": "jbarr",
    "sessionContext": {
      "attributes": {
        "creationDate": "2016-11-15T17:55:17Z",
        "mfaAuthenticated": "false"
      }
    }
  },
  "eventTime": "2016-11-15T23:02:12Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "PutObject",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "72.21.196.67",
  "userAgent": "[S3Console/0.4]",
  "requestParameters": {
    "X-Amz-Date": "20161115T230211Z",
    "bucketName": "jbarr-s3-trail-demo",
    "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
    "storageClass": "STANDARD",
    "cannedAcl": "private",
    "X-Amz-SignedHeaders": "Content-Type;Host;x-amz-acl;x-amz-storage-class",
    "X-Amz-Expires": "300",
    "key": "ie_sb_device_4.png"
  }
}

Then I create a simple Lambda function:
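
A minimal handler might look like the following sketch in Java (assuming the aws-lambda-java-core library; the S3Watcher name matches the rule below, while the body is purely illustrative):

// Hypothetical handler that logs the CloudTrail Data event delivered by the
// CloudWatch Events rule; a real function would take a custom action here.
package example;

import java.util.Map;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class S3Watcher implements RequestHandler<Map<String, Object>, String> {

    @Override
    public String handleRequest(Map<String, Object> event, Context context) {
        // The CloudWatch Events envelope carries the CloudTrail record under "detail"
        Object detail = event.get("detail");
        context.getLogger().log("Received S3 object-level event: " + detail);

        // Custom action (PII scan, audit record, notification, ...) goes here
        return "ok";
    }
}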

Next, I create a CloudWatch Events rule that matches the API operation of interest (PutObject) and invokes my Lambda function (S3Watcher):

Now I upload some files to my bucket and check to see that my Lambda function has been invoked as expected:

I can also find the CloudWatch entry that contains the output from my Lambda function:

Pricing and Availability
Data events are recorded only for the S3 buckets that you specify, and are charged at the rate of $0.10 per 100,000 events. This feature is available in all commercial AWS Regions.

Jeff;

Human Longevity, Inc. – Changing Medicine Through Genomics Research

by Jeff Barr | in Amazon EC2, Amazon EMR, Amazon S3, AWS OpsWorks, Customer Success

Human Longevity, Inc. (HLI) is at the forefront of genomics research and wants to build the world’s largest database of human genomes along with related phenotype and clinical data, all in support of preventive healthcare. In today’s guest post, Yaron Turpaz,  Bryan Coon, and Ashley Van Zeeland talk about how they are using AWS to store the massive amount of data that is being generated as part of this effort to revolutionize medicine.

Jeff;


When Human Longevity, Inc. launched in 2013, our founders recognized the challenges ahead. A genome contains all the information needed to build and maintain an organism; in humans, a copy of the entire genome, which contains more than three billion DNA base pairs, is contained in all cells that have a nucleus. Our goal is to sequence one million genomes and deliver that information—along with integrated health records and disease-risk models—to researchers and physicians. They, in turn, can interpret the data to provide targeted, personalized health plans and identify the optimal treatment for cancer and other serious health risks far earlier than has been possible in the past. The intent is to transform medicine by fostering preventive healthcare and risk prevention in place of the traditional “sick care” model, in which people wind up seeing their doctors only after symptoms manifest.

Our work in developing and applying large-scale computing and machine learning to genomics research entails the collection, analysis, and storage of immense amounts of data from DNA-sequencing technology provided by companies like Illumina. Raw data from a single genome consumes about 100 gigabytes; that number increases as we align the genomic information with annotation and phenotype sources and analyze it for health insights.

From the beginning, we knew our choice of compute and storage technology would have a direct impact on the success of the company. Using the cloud was clearly the best option. We’re experts in genomics, and don’t want to spend resources building and maintaining an IT infrastructure. We chose to go all in on AWS for the breadth of the platform, the critical scalability we need, and the expertise AWS has developed in big data. We also saw that the pace of innovation at AWS—and its deliberate strategy of keeping costs as low as possible for customers—would be critical in enabling our vision.

Leveraging the Range of AWS Services

Spectral karyotype analysis / Image courtesy of Human Longevity, Inc.

Today, we’re using a broad range of AWS services for all kinds of compute and storage tasks. For example, the HLI Knowledgebase leverages a distributed system infrastructure comprised of Amazon S3 storage and a large number of Amazon EC2 nodes. This helps us achieve resource isolation, scalability, speed of provisioning, and near real-time response time for our petabyte-scale database queries and dynamic cohort builder. The flexibility of AWS services makes it possible for our customized Amazon Machine Images and pre-built, BTRFS-partitioned Amazon EBS volumes to achieve turn-up time in seconds instead of minutes. We use Amazon EMR for executing Spark queries against our data lake at the scale we need. AWS Lambda is a fantastic tool for hooking into Amazon S3 events and communicating with apps, allowing us to simply drop in code with the business logic already taken care of. We use Auto Scaling based on demand, and AWS OpsWorks for managing a Docker pipeline.

We also leverage the cost controls provided by Amazon EC2 Spot and Reserved Instance types. When we first started, we used on-demand instances, but the costs started to grow significantly. With Spot and Reserved Instances, we can allocate compute resources based on specific needs and workflows. The flexibility of AWS services enables us to make extensive use of dockerized containers through the resource-management services provided by Apache Mesos. Hundreds of dynamic Amazon EC2 nodes in both our persistent and spot abstraction layers are dynamically adjusted to scale up or down based on usage demand and the latest AWS pricing information. We achieve substantial savings by sharing this dynamically scaled compute cluster with our Knowledgebase service and the internal genomic and oncology computation pipelines. This flexibility gives us the compute power we need while keeping costs down. We estimate these choices have helped us reduce our compute costs by up to 50 percent from the on-demand model.

We’ve also worked with AWS Professional Services to address a particularly hard data-storage challenge. We have genomics data in hundreds of Amazon S3 buckets, many of them in the petabyte range and containing billions of objects. Within these collections are millions of objects that are unused, or used once or twice and never to be used again. It can be overwhelming to sift through these billions of objects in search of one in particular. It presents an additional challenge when trying to identify what files or file types are candidates for the Amazon S3-Infrequent Access storage class. Professional Services helped us with a solution for indexing Amazon S3 objects that saves us time and money.

Moving Faster at Lower Cost
Our decision to use AWS came at the right time, occurring at the inflection point of two significant technologies: gene sequencing and cloud computing. Not long ago, it took a full year and cost about $100 million to sequence a single genome. Today we can sequence a genome in about three days for a few thousand dollars. This dramatic improvement in speed and lower cost, along with rapidly advancing visualization and analytics tools, allows us to collect and analyze vast amounts of data in close to real time. Users can take that data and test a hypothesis on a disease in a matter of days or hours, compared to months or years. That ultimately benefits patients.

Our business includes HLI Health Nucleus, a genomics-powered clinical research program that uses whole-genome sequence analysis, advanced clinical imaging, machine learning, and curated personal health information to deliver the most complete picture of individual health. We believe this will dramatically enhance the practice of medicine as physicians identify, treat, and prevent diseases, allowing their patients to live longer, healthier lives.

— Yaron Turpaz (Chief Information Officer), Bryan Coon (Head of Enterprise Services), and Ashley Van Zeeland (Chief Technology Officer).

Learn More
Learn more about how AWS supports genomics in the cloud, and see how genomics innovator Illumina uses AWS for accelerated, cost-effective gene sequencing.

 

Amazon QuickSight Now Generally Available – Fast & Easy to Use Business Analytics for Big Data

by Jeff Barr | in Amazon QuickSight, Amazon RDS, Amazon Redshift, Amazon S3, Launch

After a preview period that included participants from over 1,500 AWS customers ranging from startups to global enterprises, I am happy to be able to announce that Amazon QuickSight is now generally available! When I invited you to join the preview last year, I wrote:

In the past, Business Intelligence required an incredible amount of undifferentiated heavy lifting. You had to pay for, set up and run the infrastructure and the software, manage scale (while users fret), and hire consultants at exorbitant rates to model your data. After all that your users were left to struggle with complex user interfaces for data exploration while simultaneously demanding support for their mobile devices. Access to NoSQL and streaming data? Good luck with that!

Amazon QuickSight provides you with very fast, easy to use, cloud-powered business analytics at 1/10th the cost of traditional on-premises solutions. QuickSight lets you get started in minutes. You log in, point to a data source, and begin to visualize your data. Behind the scenes, the SPICE (Super-fast, Parallel, In-Memory Calculation Engine) will run your queries at lightning speed and provide you with highly polished data visualizations.

Deep Dive into Data
Every customer that I speak with wants to get more value from their stored data. They realize that the potential value locked up within the data is growing by the day, but are sometimes disappointed to learn that finding and unlocking that value can be expensive and difficult. On-premises business analytics tools are expensive to license and can place a heavy load on existing infrastructure. Licensing costs and the complexity of the tools can restrict the user base to just a handful of specialists.  Taken together, all of these factors have led many organizations to conclude that they are not ready to make the investment in a true business analytics function.

QuickSight is here to change that! It runs as a service and makes business analytics available to organizations of all shapes and sizes. It is fast and easy to use, does not impose a load on your existing infrastructure, and is available for a monthly fee that starts at just $9 per user.

As you’ll see in a moment, QuickSight allows you to work on data that’s stored in many different services and locations. You can get to your Amazon Redshift data warehouse, your Amazon Relational Database Service (RDS) relational databases, or your flat files in S3. You can also use a set of connectors to access data stored in on-premises MySQL, PostgreSQL, and SQL Server databases, Microsoft Excel spreadsheets, Salesforce and other services.

QuickSight is designed to scale with you. You can add more users, more data sources, and more data without having to purchase more long-term licenses or roll more hardware into your data center.

Take the Tour
Let’s take a tour through QuickSight. The administrator for my organization has already invited me to use QuickSight, so I am ready to log in and get started. Here’s the main screen:

I’d like to start by getting some data from a Redshift cluster. I click on Manage data and review my existing data sets:

I don’t see what I am looking for, so I click on New data set and review my options:

I click on Redshift (manual connect) and enter the credentials so that I can access my data warehouse (if I had a Redshift cluster running within my AWS account it would be available as an auto-discovered source):

QuickSight queries the data warehouse and shows me the schemas (sets of tables) and the tables that are available to me. I’ll select the public schema and the all_flights table to get started:

Now I have two options. I can pull the table in to SPICE for quick analysis or I can query it directly. I’ll pull it in to SPICE:

Again, I have two options! I can click on Edit/Preview data and select the rows and columns to import, or I can click on Visualize to import all of the data and proceed to the fun part! I’ll go for Edit/Preview. I can see the fields (on the left), and I can select only those that are of interest using the checkboxes:

I can also click on New Filter, select a field from the popup menu, and then create a filter:

Both options (selecting fields and filtering on rows) allow me to control the data that I pull in to SPICE. This allows me to control the data that I want to visualize and also helps me to make more efficient use of memory. Once I am ready to proceed, I click on Prepare data & visualize. At this point the data is loaded in to SPICE and I’m ready to start visualizing it. I simply select a field to get started.  For example,  I can select the origin_state_abbr field and see how many flights originate in each state:

The miniaturized view on the right gives me some additional context. I can scroll up or down or select the range of values to display.  I can also click on a second field to learn more. I’ll click on flights, set the sort order to descending, and scroll to the top. Now I can see how many of the flights in my data originated in each state:

QuickSight’s AutoGraph feature automatically generates an appropriate visualization based on the data selected. For example, if I add the fl_date field, I get a state-by-state line chart over time:

Based on my query, the data types, and properties of the data, QuickSight also proposes alternate visualizations:

I also have my choice of many other visual types including vertical & horizontal bar charts, line charts, pivot tables, tree maps, pie charts, and heat maps:

Once I have created some effective visualizations, I can capture them and use the resulting storyboard to tell a data-driven story:

I can also share my visualizations with my colleagues:

Finally, my visualizations are accessible from my mobile device:

Pricing & SPICE Capacity
QuickSight comes with one free user and 1 GB of SPICE capacity for free, perpetually. This allows every AWS user to analyze their data and to gain business insights at no cost. The Standard Edition of Amazon QuickSight starts at $9 per month and includes 10 GB of SPICE capacity (see the QuickSight Pricing page for more info).

It is easy to manage SPICE capacity. I simply click on Manage QuickSight in the menu (I must have the ADMIN role in order to be able to make changes):

Then I can see where I stand:

I can click on Purchase more capacity to do exactly that:

I can also click on Release unused purchased capacity in order to reduce the amount of SPICE capacity that I own:

Get Started Today
Amazon QuickSight is now available in the US East (Northern Virginia), US West (Oregon), and EU (Ireland) regions and you can start using it today.

Despite the length of this blog post I have barely scratched the surface of QuickSight. Given that you can use it at no charge, I would encourage you to sign up, load some of your data, and take QuickSight for a spin!

We have a webinar coming up on January 16th where you can learn even more! Sign up here.

Jeff;

 

Genome Engineering Applications: Early Adopters of the Cloud

by Jeff Barr | in Amazon DynamoDB, Amazon S3, AWS Lambda, Customer Success

Our friends at the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Australia sent along the guest post below to tell us about how AWS powers an important new genome editing technique.

— Jeff


 

Recent developments in molecular engineering technology now enable the accurate editing of genomes. The new technology, called CRISPR-Cas9, can be programmed to recognize and edit specific locations in the genome by pattern-matching unique sequences of DNA. While this is a powerful new tool for researchers, the ability to scan and identify targets across the entire genome has created unprecedented demand for large-scale computation. Earlier this year, the US National Institutes of Health (NIH) approved the use of these technologies for human health. This has the potential to revolutionize cancer treatments and also adds a new time-critical dimension to the compute requirements.

A New Approach to Cancer Treatments
Approximately two in five people will be diagnosed with cancer at some point during their lifetime, and while overall cancer survival has doubled, there are still cancer types with very low survival rates, for example just 1% for pancreatic cancer. This is mainly due to the difficulty of finding therapeutic interventions that kill cancer cells but do not harm the healthy tissue in the body.

The new NIH-approved trial will leverage breakthroughs in the genome editing technology, CRISPR-Cas9, to develop a different treatment approach, in which the patient’s own immune system is boosted through specific modifications of the cells that natively fight cancer. This has the potential of being effective for a wide range of different tumors, with the current trial including patients with specific blood and solid cancers, as well as melanoma.

Cloud Services for Computationally Guided Genome Engineering
This new application in human health requires an increase in the robustness and efficiency of CRISPR-Cas9 design in order to meet the time constraints of clinical care. To address this issue, researchers in the eHealth program of the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Australia developed GT-Scan2, a novel software tool built on AWS cloud services.

“Compared to other available methods, GT-Scan2 identifies genomic location with higher sensitivity and specificity,” says Dr. Denis Bauer who is leading the transformational bioinformatics team.

GT-Scan2 shows the identified CRISPR target sites at the genomic position and annotates them with high or low activity as well as their off-target potential.

GT-Scan2 improves the effectiveness of the system by finding sites that are unique in the genome. This avoids diluting the effect due to “off-targets”, which are other sites in the genome with high sequence similarity. It also optimizes robustness by finding sites that are easier to modify.

“While it was known that the three-dimensional genome organization plays a role in CRISPR binding, GT-Scan2 is the first tool to also leverage other components that are crucial for Cas9 activity,” says Dr. Laurence Wilson whose research focuses on computational genome engineering.

Specifically, the off-target search is a compute-intensive task traditionally reserved for researchers at large institutes with high-performance compute infrastructure, as every location in the 3-billion-letter genomic sequence needs to be investigated. GT-Scan2 democratizes the ability to find optimal sites by offering this complex computation as a cloud service using AWS Lambda functions.

Scaling Instantaneously for Personalized Treatments
GT-Scan2 leverages the instantaneous scalability that the event-driven AWS Lambda service offers. This is crucial for personalized treatment, as the complexity of the targeted gene can vary dramatically.

“The off-target search as well as the robustness analysis can be subdivided into independent, modular tasks that can run in parallel,” says Aidan O’Brien, who designed and implemented the system within weeks of the service’s official Asia-Pacific launch at the AWS Summit 2016 in April this year, attesting to the intuitive nature of the service. A typical job takes less than a minute, and the variation between jobs ranges from 1 second to 5 minutes. This fast fluctuation in load over minutes rather than hours ruled out an EC2-based solution, as new instances would come online too slowly to keep the runtime stable.

GT-Scan2 is served directly from S3, making it a static web app with no server-side processing. It retrieves dynamic content (such as job results and parameters) from a DynamoDB database via API Gateway calls made from a JavaScript framework.

When a user submits a job, GT-Scan2 inserts the job parameters as an item into a DynamoDB table via an API call. This allows the solution to be freely scalable without creating a bottleneck. The database entry triggers the first Lambda function, which finds all putative CRISPR targets in the user-specified DNA sequence (fetched automatically upon user submission). Potential CRISPR target sites follow fixed rules and can easily be found using a regular expression; the search completes in seconds, and the results are inserted into a second DynamoDB table.
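
To make the idea concrete, here is a hedged sketch of such a regular-expression scan in Java. The NGG-PAM pattern follows the common SpCas9 convention and is an assumption made for illustration; it is not GT-Scan2’s actual implementation.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CrisprTargetScan {

    // 20-nt protospacer followed by an NGG PAM (SpCas9 convention).
    // The lookahead ensures that overlapping candidate sites are not missed.
    private static final Pattern TARGET = Pattern.compile("(?=([ACGT]{21}GG))");

    // Returns the 0-based start positions of candidate CRISPR target sites.
    public static List<Integer> findCandidateSites(String sequence) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = TARGET.matcher(sequence.toUpperCase());
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String demo = "TTGACGTACGTACGTACGTACGTAGGAACCTT";
        System.out.println(findCandidateSites(demo));  // prints [3]
    }
}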

Adapting to leverage the power of Lambda-based microservices

All potential targets need to be evaluated for their off-target risk using the efficient string matching tool, Bowtie. Though Bowtie only requires a reduced representation of the 3 billion letter genomic sequence, the sizes of these index files exceed the storage limitation for each Lambda instance. “GT-Scan2 divides the genome into smaller blocks to fit the Lambda specifications” explains Adrian White (Research & Technical Computing, APAC) who supported the CSIRO team during development. For an average run, GT-Scan2 hence triggers 500-1000 individual Lambda functions, which simultaneously update the scores for the different putative targets in DynamoDB. During this process, the frontend is polling this table via API Gateway and updating the webpage as results come in, eliminating the need for server-side compute.
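
One way to express this kind of fan-out from a coordinating function is to invoke a worker Lambda function asynchronously once per genome block, as in the following hedged sketch (AWS SDK for Java; the function name and payload format are placeholders, not GT-Scan2’s actual interface):

// Hypothetical fan-out: asynchronously invoke one worker Lambda function per genome block.
import com.amazonaws.services.lambda.AWSLambda;
import com.amazonaws.services.lambda.AWSLambdaClientBuilder;
import com.amazonaws.services.lambda.model.InvocationType;
import com.amazonaws.services.lambda.model.InvokeRequest;

public class OffTargetFanOut {
    public static void main(String[] args) {
        AWSLambda lambda = AWSLambdaClientBuilder.defaultClient();

        int blocks = 500;  // an average run triggers 500-1000 blocks
        for (int block = 0; block < blocks; block++) {
            String payload = String.format("{\"jobId\": \"demo-job\", \"block\": %d}", block);

            lambda.invoke(new InvokeRequest()
                .withFunctionName("offtarget-worker")        // placeholder function name
                .withInvocationType(InvocationType.Event)    // asynchronous invocation
                .withPayload(payload));
        }
    }
}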

“AWS’s Lambda has given us a great framework to develop a future-ready software package able to support medical genome engineering applications,” says Dr. Bauer. “We are specifically impressed with the ability to instantaneously scale at run time by spawning more Lambda functions to cope with the varying complexity of the different genes.” Other benefits Dr. Bauer cites include paying only for storage during periods of no use, jobs not competing with web server resources (the website is a static page with dynamic content updated through Angular 2 and the API Gateway), and not needing to maintain compute instances or apply OS security patches.

“One of the best things about Lambda is that users will be able to easily swap-in different machine learning algorithms that are better suited for specific CRISPR applications” says Dr. Wilson.

The GT-Scan2 Team, from left, Denis Bauer, Laurence Wilson, Aidan O’Brien

“The computational genome engineering community is one of the early adopters of our AWS Lambda technology,” explains Dr. Mia Champion (Technical Business Development Manager, Scientific Computing). “GT-Scan2’s use of API Gateway and DynamoDB is a very neat solution to ensure scalability and their clever use of epigenomics really sets them apart from other recent applications using lambda to perform CRISPR searches. I am looking forward to seeing GT-Scan2 adopted in medical applications.”

Amazon Aurora Update – Call Lambda Functions From Stored Procedures; Load Data From S3

by Jeff Barr | in Amazon Aurora, Amazon S3, AWS Lambda

Many AWS services work just fine by themselves, but even better together! This important aspect of our model allows you to select a single service, learn about it, get some experience with it, and then extend your span to other related services over time. On the other hand, opportunities to make the services work together are ever-present, and we have a number of them on our customer-driven roadmap.

Today I would like to tell you about two new features for Amazon Aurora, our MySQL-compatible relational database:

Lambda Function Invocation – The stored procedures that you create within your Amazon Aurora databases can now invoke AWS Lambda functions.

Load Data From S3 – You can now import data stored in an Amazon Simple Storage Service (S3) bucket into a table in an Amazon Aurora database.

Because both of these features involve Amazon Aurora and another AWS service, you must grant Amazon Aurora permission to access the service by creating an IAM Policy and an IAM Role, and then attaching the Role to your Amazon Aurora database cluster. To learn how to do this, see Authorizing Amazon Aurora to Access Other AWS Services On Your Behalf.

Lambda Function Integration
Relational databases use a combination of triggers and stored procedures to enable the implementation of higher-level functionality. The triggers are activated before or after some operations of interest are performed on a particular database table. For example, because Amazon Aurora is compatible with MySQL, it supports triggers on the INSERT, UPDATE, and DELETE operations. Stored procedures are scripts that can be run in response to the activation of a trigger.

You can now write stored procedures that invoke Lambda functions. This new extensibility mechanism allows you to wire your Aurora-based database to other AWS services. You can send email using Amazon Simple Email Service (SES), issue a notification using Amazon Simple Notification Service (SNS), publish metrics to Amazon CloudWatch, update an Amazon DynamoDB table, and more.

At the application level, you can implement complex ETL jobs and workflows, track and audit actions on database tables, and perform advanced performance monitoring and analysis.

Your stored procedure must call the mysql.lambda_async procedure. This procedure, as the name implies, invokes your desired Lambda function asynchronously, and does not wait for it to complete before proceeding. As usual, you will need to give your Lambda function permission to access any desired AWS services or resources.
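
As a hedged illustration, a stored procedure that forwards a message to a Lambda function could be created over JDBC as shown below. The endpoint, credentials, and Lambda function ARN are placeholders; the procedure name and signature follow the Aurora documentation (mysql.lambda_async(ARN, payload)):

// Hypothetical example: create an Aurora stored procedure that invokes a Lambda function.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateLambdaProcedure {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://my-aurora-cluster.cluster-example.us-east-1.rds.amazonaws.com:3306/mydb";
        try (Connection conn = DriverManager.getConnection(url, "admin", "password");
             Statement stmt = conn.createStatement()) {

            // The procedure hands its argument to mysql.lambda_async, which invokes
            // the Lambda function asynchronously and returns immediately.
            stmt.execute(
                "CREATE PROCEDURE notify_lambda (IN message TEXT) " +
                "BEGIN " +
                "  CALL mysql.lambda_async(" +
                "    'arn:aws:lambda:us-east-1:123456789012:function:AuroraNotify', " +
                "    CONCAT('{\"message\": \"', message, '\"}')); " +
                "END");

            // A trigger or application code can now invoke: CALL notify_lambda('row inserted');
        }
    }
}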

To learn more, read Invoking a Lambda Function from an Amazon Aurora DB Cluster.

Load Data From S3
As another form of integration, data stored in an S3 bucket can now be imported directly into Aurora (up until now you would have had to copy the data to an EC2 instance and import it from there).

The data can be located in any AWS region that is accessible from your Amazon Aurora cluster and can be in text or XML form.

To import data in text form, use the new LOAD DATA FROM S3 command. This command accepts many of the same options as MySQL’s LOAD DATA INFILE, but does not support compressed data. You can specify the line and field delimiters and the character set, and you can ignore any desired number of lines or rows at the start of the data.
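
Here is a hedged sketch of what such an import might look like when issued over JDBC; the endpoint, credentials, bucket, key, and table names are placeholders, and the cluster must already have the S3 access role described earlier attached:

// Hypothetical example: load a CSV file from S3 into an Aurora table.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class LoadFromS3 {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://my-aurora-cluster.cluster-example.us-east-1.rds.amazonaws.com:3306/mydb";
        try (Connection conn = DriverManager.getConnection(url, "admin", "password");
             Statement stmt = conn.createStatement()) {

            // Treat the file as comma-separated text and skip the header row
            stmt.execute(
                "LOAD DATA FROM S3 's3://my-import-bucket/customers/customers.csv' " +
                "INTO TABLE customers " +
                "FIELDS TERMINATED BY ',' " +
                "LINES TERMINATED BY '\\n' " +
                "IGNORE 1 LINES");
        }
    }
}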

To import data in XML form, use the new LOAD XML FROM S3 command. Your XML can look like this:

<row column1="value1" column2="value2" />
...
<row column1="value1" column2="value2" />

Or like this:

<row>
  <column1>value1</column1>
  <column2>value2</column2>
</row>
...

Or like this:

<row>
  <field name="column1">value1</field>
  <field name="column2">value2</field>
</row>
...

To learn more, read Loading Data Into a DB Cluster From Text Files in an Amazon S3 Bucket.

Available Now
These new features are available now and you can start using them today!

There is no charge for either feature; you’ll pay the usual charges for the use of Amazon Aurora, Lambda, and S3.

Jeff;

 

IPv6 Support Update – CloudFront, WAF, and S3 Transfer Acceleration

by Jeff Barr | in Amazon CloudFront, Amazon S3, AWS Web Application Firewall

As a follow-up to our recent announcement of IPv6 support for Amazon S3, I am happy to be able to tell you that IPv6 support is now available for Amazon CloudFront, Amazon S3 Transfer Acceleration, and AWS WAF and that all 60+ CloudFront edge locations now support IPv6. We are enabling IPv6 in a phased rollout that starts today and will extend across all of the networks over the next few weeks.

CloudFront IPv6 Support
You can now enable IPv6 support for individual Amazon CloudFront distributions. Viewers and networks that connect to a CloudFront edge location over IPv6 will automatically be served content over IPv6. Those that connect over IPv4 will continue to work as before. Connections to your origin servers will be made using IPv4.

Newly created distributions are automatically enabled for IPv6; you can modify an existing distribution by checking Enable IPv6 in the console or setting it via the CloudFront API:

Here are a couple of important things to know about this new feature:

  • Alias Records – After you enable IPv6  support for a distribution, the DNS entry for the distribution will be updated to include an AAAA record. If you are using Amazon Route 53 and an alias record to map all or part of your domain to the distribution, you will need to add an AAAA alias to the domain.
  • Log Files – If you have enabled CloudFront Access Logs, IPv6 addresses will start to show up in the c-ip field; make sure that your log processing system knows what to do with them.
  • Trusted Signers – If you make use of Trusted Signers in conjunction with an IP address whitelist, we strongly recommend the use of an IPv4-only distribution for Trusted Signer URLs that have an IP whitelist and a separate, IPv4/IPv6 distribution for the actual content. This model sidesteps an issue that would arise if the signing request arrived over an IPv4 address and was signed as such, only to have the request for the content arrive via a different, IPv6 address that is not on the whitelist.
  • CloudFormation – CloudFormation support is in the works. With today’s launch, distributions that are created from a CloudFormation template will not be enabled for IPv6. If you update an existing stack, the setting will remain as-is for any distributions referenced in the stack.
  • AWS WAF – If you use AWS WAF in conjunction with CloudFront, be sure to update your WebACLs and your IP rulesets as appropriate in order to whitelist or blacklist IPv6 addresses.
  • Forwarded Headers – When you enable IPv6 for a distribution, the X-Forwarded-For header that is presented to the origin will contain an IPv6 address. You need to make sure that the origin is able to process headers of this form.

To learn more, read IPv6 Support for Amazon CloudFront.

AWS WAF IPv6 Support
AWS WAF helps you to protect your applications from application-layer attacks (read New – AWS WAF to learn more).

AWS WAF can now inspect requests that arrive via IPv4 or IPv6 addresses. You can create web ACLs that match IPv6 addresses, as described in Working with IP Match Conditions:

All existing WAF features will work with IPv6 and there will be no visible change in performance. IPv6 addresses will appear in the Sampled Requests collected and displayed by WAF:

S3 Transfer Acceleration IPv6 Support
This important new S3 feature (read AWS Storage Update – Amazon S3 Transfer Acceleration + Larger Snowballs in More Regions for more info) now has IPv6 support. To use it, simply switch to the new dual-stack endpoint for your uploads. Change:

https://BUCKET.s3-accelerate.amazonaws.com

to

https://BUCKET.s3-accelerate.dualstack.amazonaws.com

Here’s some code that uses the AWS SDK for Java to create a client object and enable dual-stack transfer:

AmazonS3Client s3 = new AmazonS3Client();
s3.setS3ClientOptions(S3ClientOptions.builder().enableDualstack().setAccelerateModeEnabled(true).build());

Most applications and network stacks will prefer IPv6 automatically, and no further configuration should be required. You should plan to take a look at the IAM policies for your buckets in order to make sure that they will work as expected in conjunction with IPv6 addresses.

To learn more, read about Making Requests to Amazon S3 over IPv6.

Don’t Forget to Test
As a reminder, if IPv6 connectivity to any AWS region is limited or non-existent, IPv4 will be used instead. Also, as I noted in my earlier post, the client system can be configured to support IPv6 but connected to a network that is not configured to route IPv6 packets to the Internet. Therefore, we recommend some application-level testing of end-to-end connectivity before you switch to IPv6.

Jeff;

 

Streaming Real-time Data into an S3 Data Lake at MeetMe

by Jeff Barr | in Amazon Kinesis, Amazon S3, Guest Post

In today’s guest post, Anton Slutsky of MeetMe describes the implementation process for their Data Lake.

Jeff;


Anton Slutsky is an experienced information technologist with nearly two decades of experience in the field. He has an MS in Computer Science from Villanova University and a PhD in Information Science from Drexel University.

Modern Big Data systems often include structures called Data Lakes. In the industry vernacular, a Data Lake is a massive storage and processing subsystem capable of absorbing large volumes of structured and unstructured data and processing a multitude of concurrent analysis jobs. Amazon Simple Storage Service (S3) is a popular choice nowadays for Data Lake infrastructure as it provides a highly scalable, reliable, and low-latency storage solution with little operational overhead. However, while S3 solves a number of problems associated with setting up, configuring and maintaining petabyte-scale storage, data ingestion into S3 is often a challenge as types, volumes, and velocities of source data differ greatly from one organization to another.

In this blog, I will discuss our solution, which uses Amazon Kinesis Firehose to optimize and streamline large-scale data ingestion at MeetMe, which is a popular social discovery platform that caters to more than a million active daily users. The Data Science team at MeetMe needed to collect and store approximately 0.5 TB per day of various types of data in a way that would expose it to data mining tasks, business-facing reporting and advanced analytics. The team selected Amazon S3 as the target storage facility and faced a challenge of collecting the large volumes of live data in a robust, reliable, scalable and operationally affordable way.

The overall aim of the effort was to set up a process to push large volumes of streaming data into AWS data infrastructure with as little operational overhead as possible. While many data ingestion tools, such as Flume, Sqoop and others are currently available, we chose Amazon Kinesis Firehose because of its automatic scalability and elasticity, ease of configuration and maintenance, and out-of-the-box integration with other Amazon services, including S3, Amazon Redshift, and Amazon Elasticsearch Service.

Business Value / Justification
As it is common for many successful startups, MeetMe focuses on delivering the most business value at the lowest possible cost. With that, the Data Lake effort had the following goals:

  • Empowering business users with high-level business intelligence for effective decision making.
  • Enabling the Data Science team with data needed for revenue generating insight discovery.

When considering commonly used data ingestion tools, such as Sqoop and Flume, we estimated that the Data Science team would need to add an additional full-time Big Data engineer in order to set up, configure, tune, and maintain the data ingestion process, with additional time required from engineering to enable support redundancy. Such operational overhead would increase the cost of the Data Science efforts at MeetMe and would introduce unnecessary scope to the team, affecting the overall velocity.

Amazon Kinesis Firehose service alleviated many of the operational concerns and, therefore, reduced costs. While we still needed to develop some amount of in-house integration, scaling, maintaining, upgrading and troubleshooting of the data consumers would be done by Amazon, thus significantly reducing the Data Science team size and scope.

Configuring an Amazon Kinesis Firehose Stream
Kinesis Firehose offers the ability to create multiple Firehose streams each of which could be aimed separately at different S3 locations, Redshift tables or Amazon Elasticsearch Service indices. In our case, our primary goal was to store data in S3 with an eye towards other services mentioned above in the future.

Firehose delivery stream setup is a 3-step process. In Step 1, it is necessary to choose the destination type, which lets you define whether you want your data to end up in an S3 bucket, a Redshift table or an Elasticsearch index. Since we wanted the data in S3, we chose “Amazon S3” as the destination option. If S3 is chosen as the destination, Firehose prompts for other S3 options, such as the S3 bucket name. As described in the Firehose documentation, Firehose will automatically organize the data by date/time and the “S3 prefix” setting serves as the global prefix that will be prepended to all S3 keys for a given Firehose stream object. It is possible to change the prefix at a later date even on a live stream that is in the process of consuming data, so there is little need to overthink the naming convention early on.

The next step of the process is the stream configuration. Here, it is possible to override various defaults and specify other meaningful values. For example, selecting GZIP compression instead of the uncompressed default will greatly reduce the S3 storage footprint and, consequently, the S3 costs. Enabling Data encryption will encrypt the data at rest, which is important for sensitive data.

One important note is that the choice of the compression algorithm affects the resulting filenames (S3 keys) for the stream objects. Therefore, while it is possible to change these settings later on a live stream, it may be prudent to settle on the compression/encryption approach early to avoid possible issues with processing scripts.

As mentioned in Amazon Kinesis Firehose Limits, Kinesis Firehose has a set of default throughput quotas. Once those quotas are exceeded, Firehose will respond with an error message “ServiceUnavailableException: Slow down.” and will drop data. Therefore, in order to avoid data loss, it is important to estimate individual throughput requirements. If those requirements are likely to exceed the default quotas, it is possible to request additional throughput by submitting a limit increase request as described in the limits.

The final step (not shown) is to review the desired configuration and create the stream.

Setting up the Upload Process
At MeetMe, RabbitMQ serves as a service bus for most of the data that flows through the system. Therefore, the task of data collection for the most part amounts to consuming large volumes of RabbitMQ messages and uploading them to S3 with the help of Firehose streams. To accomplish this, we developed lightweight RabbitMQ consumers. While RabbitMQ consumers are implemented elsewhere (such as Flume), we opted to develop our own versions to enable integration with Firehose API.

Firehose provides two ways of uploading data – single record and bulk. With the single record approach, each individual record is packaged into an Amazon Firehose API framework object and each object is serialized to a Firehose endpoint via HTTP/REST. While this approach may be appropriate for some applications, we achieved better performance by using the bulk API methods. The bulk methods allow up to 500 records to be uploaded to Firehose with a single request.

To upload a batch of messages, the lightweight RabbitMQ consumer maintains a small internal buffer, which gets serialized to Firehose as often as possible by a predefined set of processor threads. Here’s the code:

new Thread(new Runnable()
{
  public void run()
  {
    logger.info("Kinesis writer thread started.  Waiting for records to process...");
    while(true)
    {
      try
      {
        if(!recordsQueue.isEmpty())
        {
           if(logger.isDebugEnabled())
             logger.debug("Uploading current batch to AWS: "+recordsQueue.size());
        
           List<MMMessage> records = recordsQueue;
           recordsQueue = new CopyOnWriteArrayList<MMMessage>();
        
           final int uploadThreshold = 499;
        
           List<Record> buffer = new ArrayList<Record>(uploadThreshold);
        
           for(int i = 0; i < records.size(); i++)
           {
             // Get a proprietary message object from an internal queue
             MMMessage mmmessage = records.get(i);
                 
             // Decode the message body bytes into a UTF-8 string
             String message = new String(mmmessage.body, "UTF-8");
                 
             // Check for new line and tab characters in data to avoid
             // issues with Hadoop/Spark processing line-based processing
             // later on
             message = CharMatcher.anyOf("\n").replaceFrom(message, "\\n");
             message = CharMatcher.anyOf("\t").replaceFrom(message, "\\t");
 
             // Wrap the message bytes with Amazon Firehose API wrapper    
             Record record = new Record().withData(ByteBuffer.wrap(message.getBytes()));
 
             buffer.add(record);
                 
             // If the current buffer is large enough,
             if(buffer.size() == uploadThreshold)
             {
               // send it to Firehose
               uploadBuffer(buffer);
               // and instantiate a new buffer
               buffer = new ArrayList<Record>(uploadThreshold);
             }
           }
           // don't forget to upload last buffer!
           uploadBuffer(buffer);                                
         }
       }
       catch(Exception e)
      {
        logger.error("Error in sending to Kinesis:"+e.getMessage(), e);
      }
    }
  }
}).start();

The uploadBuffer method is a simple wrapper over the bulk upload Firehose API:

private void uploadBuffer(final List<Record> buffer)
{
  // Make a new request object
  PutRecordBatchRequest request = new PutRecordBatchRequest();
  // Specify the stream name
  request.setDeliveryStreamName("MEETME_STREAM");
        
  // Set the data buffer
  request.setRecords(buffer);
 
  // Attempt to send to Firehose
  PutRecordBatchResult result = getAmazonClient().putRecordBatch(request);
        
  // Always check for failures!
  Integer failed = result.getFailedPutCount();
  if (failed != null && failed.intValue() > 0)
  {
    // If there are failures, find out what caused them
    logger.warn("AWS upload of [" + buffer.size() + "] resulted in " + failed + " failures");
                 
    // Dig into the responses to see if there are various kinds of failures
    List<PutRecordBatchResponseEntry> res = result.getRequestResponses();
    if (res != null)
    {
      for (PutRecordBatchResponseEntry r : res)
      {
        if (r != null)
        {
          logger.warn("Report " + r.getRecordId() + ", " + r.getErrorCode() + ", " + r.getErrorMessage()
                      + ", " + r.toString());
        }
        else
        {
          logger.warn("Report NULL");
        }
      }
    }
    else
    {
      logger.warn("BatchReport NULL");
    }
  }
}

Monitoring Firehose Streams
Once the Firehose streams are set up and internal consumer processes begin to send data, a common task is to monitor the data flow. Some of the reasons for paying attention to data flow are data volume considerations, potential error conditions, and capturing failures, among many others. With Amazon Firehose, monitoring is accomplished with the help of Amazon CloudWatch. Common delivery stream metrics are available under the Monitoring tab in each Firehose stream configuration, with additional metrics available through the CloudWatch Console.

While AWS provides an extensive set of monitoring facilities, in our experience it turned out to be important to carefully monitor internal data producer logs for errors. Such close monitoring using the syslog facility, Splunk, and other log monitoring tools allowed us to capture and fix specific errors and reduce the number of individual record failures to tolerable levels. Further, internal log monitoring allowed us to recognize early that our volumes were quickly exceeding default Firehose throughput quotas (see above).

Anton Slutsky, Director of Data Science, MeetMe

Now Available – IPv6 Support for Amazon S3

by Jeff Barr | in Amazon S3, Launch

As you probably know, every server and device that is connected to the Internet must have a unique IP address. Way back in 1981, RFC 791 (“Internet Protocol”) defined an IP address as a 32-bit entity, with three distinct network and subnet sizes (Classes A, B, and C – essentially large, medium, and small) designed for organizations with requirements for different numbers of IP addresses. In time, this format came to be seen as wasteful and the more flexible CIDR (Classless Inter-Domain Routing) format was standardized and put into use. The 32-bit entity (commonly known as an IPv4 address) has served the world well, but the continued growth of the Internet means that all available IPv4 addresses will ultimately be assigned and put to use.

In order to accommodate this growth and to pave the way for future developments, networks, devices, and service providers are now in the process of moving to IPv6. With 128 bits per IP address, IPv6 has plenty of address space (according to my rough calculation, 128 bits is enough to give 3.5 billion IP addresses to every one of the 100 octillion or so stars in the universe). While the huge address space is the most obvious benefit of IPv6, there are other more subtle benefits as well. These include extensibility, better support for dynamic address allocation, and additional built-in support for security.

Today I am happy to announce that objects in Amazon S3 buckets are now accessible via IPv6 addresses via new “dual-stack” endpoints. When a DNS lookup is performed on an endpoint of this type, it returns an “A” record with an IPv4 address and an “AAAA” record with an IPv6 address. In most cases the network stack in the client environment will automatically prefer the AAAA record and make a connection using the IPv6 address.

Accessing S3 Content via IPv6
In order to start accessing your content via IPv6, you need to switch to new dual-stack endpoints that look like this:

https://BUCKET.s3.dualstack.REGION.amazonaws.com

or this:

https://s3.dualstack.REGION.amazonaws.com/BUCKET

If you are using the AWS Command Line Interface (CLI) or the AWS Tools for Windows PowerShell read Using Dual-Stack endpoints from the AWS CLI for more info. If you are using one of the AWS SDKs, read Using Dual-Stack Endpoints from the AWS SDKs.
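
With the AWS SDK for Java, you can also opt a client in to the dual-stack endpoints without changing any bucket or key names. Here is a minimal sketch (the region shown is arbitrary and credentials are resolved from the environment):

// Hedged example: an S3 client that uses the dual-stack (IPv4 + IPv6) endpoints.
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.S3ClientOptions;

public class DualStackClient {
    public static void main(String[] args) {
        AmazonS3Client s3 = new AmazonS3Client();
        s3.setRegion(Region.getRegion(Regions.US_WEST_2));

        // Requests now go to BUCKET.s3.dualstack.us-west-2.amazonaws.com
        s3.setS3ClientOptions(S3ClientOptions.builder().enableDualstack().build());

        System.out.println(s3.listBuckets());
    }
}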

Things to Know
Here are some things that you need to know in order to make a smooth transition to IPv6:

Bucket and IAM Policies – If you use policies to grant or restrict access via IP address, update them to include the desired IPv6 ranges before you switch to the new endpoints. If you don’t do this, clients may incorrectly gain or lose access to the AWS resources. Update any policies that exclude access from certain IPv4 addresses by adding the corresponding IPv6 addresses.

IPv6 Connectivity – Because the network stack will prefer an IPv6 address to an IPv4 address, an unusual situation can arise under certain circumstances. The client system can be configured for IPv6 but connected to a network that is not configured to route IPv6 packets to the Internet. Be sure to test for end-to-end connectivity before you switch to the dual-stack endpoints.

Log Entries – Log entries will include the IPv4 or IPv6 address, as appropriate. If you analyze your log files using internal or third-party applications, you should ensure that they are able to recognize and process entries that include an IPv6 address.

S3 Feature Support – IPv6 support is available for all S3 features with the exception of Website Hosting, S3 Transfer Acceleration, and access via BitTorrent.

Region Support – IPv6 support is available in all commercial AWS Regions and in AWS GovCloud (US). It is not available in the China (Beijing) Region.

Jeff;

Amazon RDS for SQL Server – Support for Native Backup/Restore to Amazon S3

by Jeff Barr | in Amazon RDS, Amazon S3

Regular readers of this blog will know that I am a big fan of Amazon Relational Database Service (RDS). As a managed database service, it takes care of the more routine aspects of setting up, running, and scaling a relational database.

We first launched support for SQL Server in 2012. Continuing our effort to add features that have included SSL support, major version upgrades, transparent data encryption, enhanced monitoring, and Multi-AZ, we have now added support for SQL Server native backup/restore.

SQL Server native backups include all database objects: tables, indexes, stored procedures and triggers. These backups are commonly used to migrate databases between different SQL Server instances running on-premises or in the cloud. They can be used for data ingestion, disaster recovery, and so forth. The native backups also simplify the process of importing data and schemas from on-premises SQL Server instances, and will be easy for SQL Server DBAs to understand and use.

Support for Native Backup/Restore
You can now take native SQL Server database backups from your RDS instances and store them in an Amazon S3 bucket. Those backups can be restored to an on-premises copy of SQL Server or to another RDS-powered SQL Server instance.  You can also copy backups of your on-premises databases to S3 and then restore them to an RDS SQL Server instance. SQL Server Native Backup/Restore with Amazon S3 also supports backup encryption using AWS Key Management Service (KMS) across all SQL Server editions. Storing and transferring backups in and out of AWS through S3 provides you with another option for disaster recovery.

You can enable this feature by adding the SQL_SERVER_BACKUP_RESTORE option to an option group and associating the option group with your RDS SQL Server instance. This option must also be configured with your S3 bucket information and can include a KMS key to encrypt the backups.

Start by finding the desired option group:

Then add the SQL_SERVER_BACKUP_RESTORE option, specify (or create) an IAM role to allow RDS to access S3, point to a bucket, and (if you want) specify and configure encryption:

After you have set this up, you can use SQL Server Management Studio to connect to the database instance and invoke the following stored procedures (available within the msdb database) as needed; an example follows the list:

  • rds_backup_database – Back up a single database to an S3 bucket.
  • rds_restore_database – Restore a single database from S3.
  • rds_task_status – Track running backup and restore tasks.
  • rds_cancel_task – Cancel a running backup or restore task.
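
For example, a full backup of a single database could be started like this (a hedged sketch over JDBC; the instance endpoint, credentials, database name, and bucket ARN are placeholders, and the parameter names should be double-checked against the documentation linked below):

// Hypothetical example: start a native backup of an RDS SQL Server database to S3.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class NativeBackup {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:sqlserver://my-sql-server.example.us-east-1.rds.amazonaws.com:1433;databaseName=msdb";
        try (Connection conn = DriverManager.getConnection(url, "admin", "password");
             Statement stmt = conn.createStatement()) {

            // Queue the backup task; progress can then be checked with rds_task_status
            stmt.execute(
                "exec msdb.dbo.rds_backup_database " +
                "  @source_db_name = 'mydb', " +
                "  @s3_arn_to_backup_to = 'arn:aws:s3:::my-backup-bucket/mydb.bak', " +
                "  @overwrite_S3_backup_file = 1");
        }
    }
}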

To learn more, take a look at Importing and Exporting SQL Server Data.

Now Available
SQL Server Native Backup/Restore is now available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), EU (Frankfurt), Asia Pacific (Sydney), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Mumbai), and South America (São Paulo) Regions. There are no additional charges for using this feature with Amazon RDS for SQL Server; however, the usage of Amazon S3 storage will be billed at regular rates.

Jeff;