Category: Amazon DynamoDB

AWS Online Tech Talks – September 2017

As a school supply aficionado, the month of September has always held a special place in my heart. Nothing sets the tone for success like getting a killer deal on pens and a crisp college ruled notebook. Even if back to school shopping trips have secured a seat in your distant memory, this is still a perfect time of year to stock up on office supplies and set aside some time for flexing those learning muscles. A great way to get started: scan through our September Tech Talks and check out the ones that pique your interest. This month we are covering re:Invent, AI, and much more.

September 2017 – Schedule

Noted below are the upcoming scheduled live, online technical sessions being held during the month of September. Make sure to register ahead of time so you won’t miss out on these free talks conducted by AWS subject matter experts.

Webinars featured this month are:

Monday, September 11


9:00 – 9:40 AM PDT: What’s New with Amazon DynamoDB


10:30 – 11:10 AM PDT: Local Testing and Deployment Best Practices for Serverless Applications


12:00 – 12:40 PM PDT: Managing Secrets for Containers with Amazon ECS


Tuesday, September 12


9:00 – 9:40 AM PDT: Get Ready for re:Invent 2017 Content Overview


10:30 – 11:10 AM PDT: Deep Dive on User Sign-up and Sign-in with Amazon Cognito

Management Tools

12:00 – 12:40 PM PDT: Using CloudTrail to Enhance Compliance and Governance of S3


Wednesday, September 13

Big Data

9:00 – 9:40 AM PDT: Best Practices for Processing Managed Hadoop Workloads


10:30 – 11:10 AM PDT: Migrating Your Oracle Database to PostgreSQL


12:00 – 12:40 PM PDT: Configuration Management in the Cloud


Thursday, September 14

Big Data

9:00 – 9:40 AM PDT: Tackle Your Dark Data Challenge with AWS Glue


10:30 – 11:10 AM PDT: Deep Dive on MySQL Databases on AWS


12:00 – 12:40 PM PDT: Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Workflows


Tuesday, September 26


9:00 – 9:40 AM PDT: An Overview of AI on the AWS Platform

10:30 – 11:10 AM PDT: Introduction to Generative Adversarial Networks (GAN) with Apache MXNet


12:00 – 12:40 PM PDT: Revolutionizing Backup & Recovery Using Amazon S3


2:00 – 2:40 PM PDT: Securing Your Desktops with Amazon WorkSpaces


Wednesday, September 27

Security & Identity

9:00 – 9:40 AM PDT: Advanced DNS Traffic Management using Amazon Route 53


10:30 – 11:10 AM PDT: Deep Dive on Amazon EFS (with Encryption)

Hands on Lab

12:30 – 2:00 PM PDT: Hands on Lab: Windows Workloads


Thursday, September 28

Security & Identity

9:00 – 9:40 AM PDT: How to use AWS WAF to Mitigate OWASP Top 10 attacks


10:30 – 11:10 AM PDT: AWS Greengrass Technical Deep Dive with Demo

Hands on Lab

1:00 – 1:40 PM PDT: Design, Deploy, and Optimize SQL Server on AWS


The AWS Online Tech Talks series covers a broad range of topics at varying technical levels. These sessions feature live demonstrations & customer examples led by AWS engineers and Solution Architects. Check out the AWS YouTube channel for more on-demand webinars on AWS technologies.

– Sara

New – VPC Endpoints for DynamoDB

Starting today Amazon Virtual Private Cloud (VPC) Endpoints for Amazon DynamoDB are available in all public AWS regions. You can provision an endpoint right away using the AWS Management Console or the AWS Command Line Interface (CLI). There are no additional costs for a VPC Endpoint for DynamoDB.

Many AWS customers run their applications within a Amazon Virtual Private Cloud (VPC) for security or isolation reasons. Previously, if you wanted your EC2 instances in your VPC to be able to access DynamoDB, you had two options. You could use an Internet Gateway (with a NAT Gateway or assigning your instances public IPs) or you could route all of your traffic to your local infrastructure via VPN or AWS Direct Connect and then back to DynamoDB. Both of these solutions had security and throughput implications and it could be difficult to configure NACLs or security groups to restrict access to just DynamoDB. Here is a picture of the old infrastructure.

Creating an Endpoint

Let’s create a VPC Endpoint for DynamoDB. We can make sure our region supports the endpoint with the DescribeVpcEndpointServices API call.

aws ec2 describe-vpc-endpoint-services --region us-east-1
    "ServiceNames": [

Great, so I know my region supports these endpoints and I know what my regional endpoint is. I can grab one of my VPCs and provision an endpoint with a quick call to the CLI or through the console. Let me show you how to use the console.

First I’ll navigate to the VPC console and select “Endpoints” in the sidebar. From there I’ll click “Create Endpoint” which brings me to this handy console.

You’ll notice the AWS Identity and Access Management (IAM) policy section for the endpoint. This supports all of the fine grained access control that DynamoDB supports in regular IAM policies and you can restrict access based on IAM policy conditions.

For now I’ll give full access to my instances within this VPC and click “Next Step”.

This brings me to a list of route tables in my VPC and asks me which of these route tables I want to assign my endpoint to. I’ll select one of them and click “Create Endpoint”.

Keep in mind the note of warning in the console: if you have source restrictions to DynamoDB based on public IP addresses the source IP of your instances accessing DynamoDB will now be their private IP addresses.

After adding the VPC Endpoint for DynamoDB to our VPC our infrastructure looks like this.

That’s it folks! It’s that easy. It’s provided at no cost. Go ahead and start using it today. If you need more details you can read the docs here.

DynamoDB Accelerator (DAX) Now Generally Available

Earlier this year I told you about Amazon DynamoDB Accelerator (DAX), a fully-managed caching service that sits in front of (logically speaking) your Amazon DynamoDB tables. DAX returns cached responses in microseconds, making it a great fit for eventually-consistent read-intensive workloads. DAX supports the DynamoDB API, and is seamless and easy to use. As a managed service, you simply create your DAX cluster and use it as the target for your existing reads and writes. You don’t have to worry about patching, cluster maintenance, replication, or fault management.

Now Generally Available
Today I am pleased to announce that DAX is now generally available. We have expanded DAX into additional AWS Regions and used the preview time to fine-tune performance and availability:

Now in Five Regions – DAX is now available in the US East (Northern Virginia), EU (Ireland), US West (Oregon), Asia Pacific (Tokyo), and US West (Northern California) Regions.

In Production – Our preview customers are reporting that they are using DAX in production, that they loved how easy it was to add DAX to their application, and have told us that their apps are now running 10x faster.

Getting Started with DAX
As I outlined in my earlier post, it is easy to use DAX to accelerate your existing DynamoDB applications. You simply create a DAX cluster in the desired region, update your application to reference the DAX SDK for Java (the calls are the same; this is a drop-in replacement), and configure the SDK to use the endpoint to your cluster. As a read-through/write-through cache, DAX seamlessly handles all of the DynamoDB read/write APIs.

We are working on SDK support for other languages, and I will share additional information as it becomes available.

DAX Pricing
You pay for each node in the cluster (see the DynamoDB Pricing page for more information) on a per-hour basis, with prices starting at $0.269 per hour in the US East (Northern Virginia) and US West (Oregon) regions. With DAX, each of the nodes in your cluster serves as a read target and as a failover target for high availability. The DAX SDK is cluster aware and will issue round-robin requests to all nodes in the cluster so that you get to make full use of the cluster’s cache resources.

Because DAX can easily handle sudden spikes in read traffic, you may be able to reduce the amount of provisioned throughput for your tables, resulting in an overall cost savings while still returning results in microseconds.



Box Platform on AWS Marketplace – Lambda Blueprints & Sample Code

Box is a cloud-based file sharing and content management system, with an API that recently became available in AWS Marketplace (Box Platform – Cloud Content Management APIs). With an array of features for collaboration and an emphasis on security, Box has found a home in many enterprises (see their success stories page for a list).

The Box API allows developers to build content experiences into web and mobile apps. Today I would like to tell you about some AWS Lambda blueprints and templates that will help you to build AWS applications that use this API to simplify user authentication and to add metadata to newly uploaded content. The templates are based on the Box Node Lambda Sample and should be a robust starting point for your own development.

Let’s take a look at the blueprints and then review some handy blog posts written by our friends at Box.

Box Blueprints for Lambda
The blueprints show you how to call the Box APIS and to connect a Box webhook to a Lambda function via Amazon API Gateway. To find them, simply open up the Lambda Console and search for box:

The first blueprint uses security credentials stored in the BOX_CONFIG environment variable. You can set the variable from within the Lambda Console:

The code in this blueprint retrieves and logs the Box User object for the user identified by the credentials.

The second blueprint implements a Box webhook that sits behind an API Gateway endpoint. It accepts requests, validates them, and logs them to Amazon CloudWatch:

Handy Blog Posts
The developer relations team at Box has written some blog posts that show you how to use Box in conjunction with several AWS services:

Manage User Authentication with Box Platform using Amazon Cognito – This post shows you how to use Amazon Cognito to power a login page for your app users. Cognito will handle authentication and user pool management and the code outlined in the blog post will create an App User in Box the first time the user logs in. The code is available as box-node-cognito-lambdas-sample on GitHub.

Add Deep Learning-based Image Recognition to your Box App with Amazon Rekognition – This post shows you how to build an image tagging application that is powered by Amazon Rekognition. Users take and upload photos, which are automatically labeled with metadata that that is stored in Amazon DynamoDB. The code is activated by a webhook when a file is uploaded. You can find the code in the box-node-rekognition-webhook on GitHub.

Thanks to our friends at Box for taking the time to create these helpful developer resources!




New – Auto Scaling for Amazon DynamoDB

Amazon DynamoDB has more than one hundred thousand customers, spanning a wide range of industries and use cases. These customers depend on DynamoDB’s consistent performance at any scale and presence in 16 geographic regions around the world. A recent trend we’ve been observing is customers using DynamoDB to power their serverless applications. This is a good match: with DynamoDB, you don’t have to think about things like provisioning servers, performing OS and database software patching, or configuring replication across availability zones to ensure high availability – you can simply create tables and start adding data, and let DynamoDB handle the rest.

DynamoDB provides a provisioned capacity model that lets you set the amount of read and write capacity required by your applications. While this frees you from thinking about servers and enables you to change provisioning for your table with a simple API call or button click in the AWS Management Console, customers have asked us how we can make managing capacity for DynamoDB even easier.

Today we are introducing Auto Scaling for DynamoDB to help automate capacity management for your tables and global secondary indexes. You simply specify the desired target utilization and provide upper and lower bounds for read and write capacity. DynamoDB will then monitor throughput consumption using Amazon CloudWatch alarms and then will adjust provisioned capacity up or down as needed. Auto Scaling will be on by default for all new tables and indexes, and you can also configure it for existing ones.

Even if you’re not around, DynamoDB Auto Scaling will be monitoring your tables and indexes to automatically adjust throughput in response to changes in application traffic. This can make it easier to administer your DynamoDB data, help you maximize availability for your applications, and help you reduce your DynamoDB costs.

Let’s see how it works…

Using Auto Scaling
The DynamoDB Console now proposes a comfortable set of default parameters when you create a new table. You can accept them as-is or you can uncheck Use default settings and enter your own parameters:

Here’s how you enter your own parameters:

Target utilization is expressed in terms of the ratio of consumed capacity to provisioned capacity. The parameters above would allow for sufficient headroom to allow consumed capacity to double due to a burst in read or write requests (read Capacity Unit Calculations to learn more about the relationship between DynamoDB read and write operations and provisioned capacity). Changes in provisioned capacity take place in the background.

Auto Scaling in Action
In order to see this important new feature in action, I followed the directions in the Getting Started Guide. I launched a fresh EC2 instance, installed (sudo pip install boto3) and configured (aws configure) the AWS SDK for Python. Then I used the code in the Python and DynamoDB section to create and populate a table with some data, and manually configured the table for 5 units each of read and write capacity.

I took a quick break in order to have clean, straight lines for the CloudWatch metrics so that I could show the effect of Auto Scaling. Here’s what the metrics look like before I started to apply a load:

I modified the code in Step 3 to continually issue queries for random years in the range of 1920 to 2007, ran a single copy of the code, and checked the read metrics a minute or two later:

The consumed capacity is higher than the provisioned capacity, resulting in a large number of throttled reads. Time for Auto Scaling!

I returned to the console and clicked on the Capacity tab for my table. Then I clicked on Read capacity, accepted the default values, and clicked on Save:

DynamoDB created a new IAM role (DynamoDBAutoscaleRole) and a pair of CloudWatch alarms to manage the Auto Scaling of read capacity:

DynamoDB Auto Scaling will manage the thresholds for the alarms, moving them up and down as part of the scaling process. The first alarm was triggered and the table state changed to Updating while additional read capacity was provisioned:

The change was visible in the read metrics within minutes:

I started a couple of additional copies of my modified query script and watched as additional capacity was provisioned, as indicated by the red line:

I killed all of the scripts and turned my attention to other things while waiting for the scale-down alarm to trigger. Here’s what I saw when I came back:

The next morning I checked my Scaling activities and saw that the alarm had triggered several more times overnight:

This was also visible in the metrics:

Until now, you would prepare for this situation by setting your read capacity well about your expected usage, and pay for the excess capacity (the space between the blue line and the red line). Or, you might set it too low, forget to monitor it, and run out of capacity when traffic picked up. With Auto Scaling you can get the best of both worlds: an automatic response when an increase in demand suggests that more capacity is needed, and another automated response when the capacity is no longer needed.

Things to Know
DynamoDB Auto Scaling is designed to accommodate request rates that vary in a somewhat predictable, generally periodic fashion. If you need to accommodate unpredictable bursts of read activity, you should use Auto Scaling in combination with DAX (read Amazon DynamoDB Accelerator (DAX) – In-Memory Caching for Read-Intensive Workloads to learn more). Also, the AWS SDKs will detect throttled read and write requests and retry them after a suitable delay.

I mentioned the DynamoDBAutoscaleRole earlier. This role provides Auto Scaling with the privileges that it needs to have in order for it to be able to scale your tables and indexes up and down. To learn more about this role and the permissions that it uses, read Grant User Permissions for DynamoDB Auto Scaling.

Auto Scaling has complete CLI and API support, including the ability to enable and disable the Auto Scaling policies. If you have some predictable, time-bound spikes in traffic, you can programmatically disable an Auto Scaling policy, provision higher throughput for a set period of time, and then enable Auto Scaling again later.

As noted on the Limits in DynamoDB page, you can increase provisioned capacity as often as you would like and as high as you need (subject to per-account limits that we can increase on request). You can decrease capacity up to nine times per day for each table or global secondary index.

You pay for the capacity that you provision, at the regular DynamoDB prices. You can also purchase DynamoDB Reserved Capacity to further savings.

Available Now
This feature is available now in all regions and you can start using it today!


Amazon DynamoDB Accelerator (DAX) – In-Memory Caching for Read-Intensive Workloads

I’m fairly sure that you already know about Amazon DynamoDB. As you probably know, it is a managed NoSQL database that scales to accommodate as much table space, read capacity, and write capacity as you need. With response times measured in single-digit milliseconds, our customers are using DynamoDB for many types of applications including adtech, IoT, gaming, media, online learning, travel, e-commerce, and finance. Some of these customers store more than 100 terabytes in a single DynamoDB table and make millions of read or write requests per second. The Amazon retail site relies on DynamoDB and uses it to withstand the traffic surges associated with brief, high-intensity events such as Black Friday, Cyber Monday, and Prime Day.

While DynamoDB’s ability to deliver fast, consistent performance benefits just about any application and workload, there’s always room to do even better. The business value of some workloads (gaming and adtech come to mind, but there are many others) is driven by low-latency, high-performance database reads. The ability to pull data from DynamoDB as quickly as possible leads to faster & more responsive games or ads that drive the highest click-through rates.

Amazon DynamoDB Accelerator
In order to support demanding, read-heavy workloads, we are launching a public preview of the Amazon DynamoDB Accelerator, otherwise known as DAX.

DAX is a fully managed caching service that sits (logically) in front of your DynamoDB tables. It operates in write-through mode, and is API-compatible with DynamoDB. Responses are returned from the cache in microseconds, making DAX a great fit for eventually-consistent read-intensive workloads. DAX is seamless and easy to use. As a managed service, you simply create your DAX cluster and use it as the target for your existing reads and writes. You don’t have to worry about patching, cluster maintenance, replication, or fault management.

Each DAX cluster can contain 1 to 10 nodes; you can add nodes in order to increase overall read throughput. The cache size (also known as the working set) is based on the node size (dax.r3.large to dax.r3.8xlarge) that you choose when you create the cluster. Clusters run within a VPC, with nodes spread across Availability Zones.

You will need to use the DAX SDK for Java to communicate with DAX. This SDK communicates with your cluster using a low-level TCP interface that is fine-tuned for low latency and high throughput (we’ll support access to DAX through other languages as quickly as possible).

Creating a DAX Cluster
Let’s create a DAX cluster from the DynamoDB Console (API and CLI support is also available). I open up the console and click on Create cluster to get started:

I enter a name and description, choose a node type, and set the initial size of my cluster. Then I create an IAM role and policy that gives DAX permission to access my DynamoDB tables (I can also choose an existing role):

The console allows me to create a policy that grants access to a single table. I add additional tables to the policy using the IAM Console.

Next, I create a subnet group that DAX uses to place cluster nodes. I name the group and choose the desired subnets:

I accept the default settings and then click on Launch cluster:

My cluster is ready to use within minutes:

The next step is to update my application to use the DAX SDK for Java and to configure it to use the endpoint of my cluster ( in this case).

Once my application is up and running, I can visit the Metrics tab to see how well the cache is performing. The Amazon CloudWatch metrics include cache hits and misses, request counts, error counts, and so forth:

I can use the Alarms tab to create a CloudWatch Alarm for any of the metrics. Perhaps I want to know if an excessive number of cache misses are taking place:

I can use the Nodes tab to see the nodes in my cluster. I can also add new nodes or delete existing ones:

In order to see how DAX works, I installed the DAX Sample Application and ran it twice. The first run accessed DynamoDB directly and demonstrated the non-cached, baseline performance:

As you can see from the middle group of results, the queries ran in 2.9 to 11.3 milliseconds. The second run used DAX and showed the effect of caching on performance:

The first iteration of each test results in a cache miss. The subsequent iterations retrieve the results from the cache, and are (as you can see) quite a bit faster.

Things to Know
Here are a few things to keep in mind as you think about how to put DAX to use in your environment:

Java API – As I mentioned earlier, we are launching this public preview with support for Java, with plans to add support for other languages. DAX is API-compatible with DynamoDB so there’s no need to write your own caching logic or make changes to your code.

Consistency – DAX offers the best opportunity for performance gains when you are using eventually consistent reads that can be served from the in-memory cache (DAX always refers back to the DynamoDB table when processing consistent reads).

Write-Throughs – DAX is a write-through cache. However, if there is a weak correlation between what you read and what you write, you may want to direct your writes to DynamoDB. This will allow DAX to be of greater assistance for your reads.

Deprovisioning – After you have put DAX to use in your environment, you should be able to reduce the amount of read capacity provisioned for the underlying tables. This will reduce your costs (dramatically in many cases), while allowing DAX to provide spare capacity for sudden surges in usage.

Available Now
The public preview of DAX is available today in the US East (Northern Virginia), US West (Oregon), and EU (Ireland) Regions and you can sign up today. You can use the public preview at no charge and you can also learn more by reading the DAX Developer Guide.




New – Manage DynamoDB Items Using Time to Live (TTL)

AWS customers are making great use of Amazon DynamoDB. They love the speed and flexibility and build Ad Tech (reference architecture), Gaming (reference architecture), IoT (reference architecture), and other applications that take advantage of the consistent, single-digit millisecond latency. They also love the fact that DynamoDB is a managed, serverless database that scales to handle millions of requests per second to tables that are many terabytes in size.

Many DynamoDB users store data that has a limited useful life or is accessed less frequently over time. Some of them track recent logins, trial subscriptions, or application metrics. Others store data that is subject to regulatory or contractual limitations on how long it can be stored. Until now, these customers implemented their own time-based data management. At scale, this sometimes meant that they ran a couple of Amazon Elastic Compute Cloud (EC2) instances that did nothing more than scan DynamoDB items, check date attributes, and issue delete requests for items that were no longer needed. This added cost and complexity to their application.

New Time to Live (TTL) Management
In order to streamline this popular and important use case, we are launching a new Time to Live (TTL) feature today. You can enable this feature on a table-by-table basis, specifying an item attribute that contains the expiration time for the item.

Once the attribute has been specified and TTL management has been enabled (a single API call takes care of both operations), DynamoDB will find and delete items that have expired. This processing takes place automatically and in the background and does not affect read or write traffic to the table.

You can use DynamoDB streams (see DynamoDB Update – Triggers (Streams + Lambda) + Cross-Region Replication App for more info) to process or archive the actual deletions. Like other update records in a stream, the deletions are available on a rolling 24-hour basis. You can move the expired items to cold storage, log them, or update other tables using AWS Lambda and DynamoDB Triggers.

Here’s how you enable TTL for a table and specify the desired attribute:

The attribute must be in DynamoDB’s Number data type, and is interpreted as seconds per the Unix Epoch time system.

As you can see from the screen shot above, you can also enable DynamoDB Streams, and you can look at a preview of the items that will be deleted when you enable TTL.

You can also call the UpdateTimeToLive function from your code, or you can use the update-time-to-live command from the AWS Command Line Interface (CLI).

AWS customer TUNE is already making good use of this feature as part of their HasOffers product.


HasOffers helps customers to analyze the effectiveness of their marketing campaigns, storing massive amounts of ad engagement data in the process. Once the customer-defined time window for the campaign has passed, the data is no longer needed and can be deleted. Before we made the TTL feature available to TUNE, they manually identified and then deleted the stale data. This was labor and compute-intensive, and also consumed some of the provisioned throughput for the table.

Now, they simply set an expiration time for each item and leave the rest to DynamoDB. The stale data disappears automatically, with no impact on the available throughput. As a result, TUNE has been able to purge 85 terabytes of stale data and has reduced their costs by over $200K per year, while also simplifying their application logic.

Things to Know
Here are a couple of things to keep in mind as you are thinking about putting TTL to use in your application.

TTL Attribute – The TTL attribute can be indexed or projected, but it cannot be an element of a JSON document. As I indicated earlier, it must have the Number data type. You can use IAM to regulate access to this attribute, just as you can do for any other one. Items that do not have the designated TTL attribute will not be considered for deletion. In order to avoid a possible accidental deletion due to a malformed TTL value, items that appear to be older than 5 years will not be deleted.

Tables – You can apply a TTL to a new or an existing table. The process of enabling TTL for a table can take up to an hour, and you can only make one change per table at a time.

Background Processing – The scans and the deletions take place in the background and do not count against the provisioned throughput. Deletion times will vary based on the number and nature of the expired items. After the expiration but before the actual deletion, the items remain in the table and will appear in reads and scans.

Indexes – Items are removed from any Local Secondary Indexes immediately, and from Global Secondary Indexes in the usual eventually consistent fashion.

Pricing – There is no charge for the internal scan operation or for the deletion. You will pay for storage until the item is actually deleted.

Available Now
This feature is available now and you can start using it today! To learn more, read about Time to Live in the DynamoDB Developer Guide.



Genome Engineering Applications: Early Adopters of the Cloud

Our friends at the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Australia sent along the guest post below to tell us about how AWS powers an important new genome editing technique.

— Jeff


Recent developments in molecular engineering technology now enables the accurate editing of genomes. The new technology, called CRISPR-Cas9, can be programmed to recognize and edit specific locations in the genome by pattern-matching unique sequences of DNA. While this is a powerful new tool for researchers, the ability to scan and identify targets across the entire genome has created unprecedented demand for large-scale computation. Earlier this year, the US National Institutes of Health (NIH) has approved the use of these technologies for human health. This has the potential to revolutionize cancer treatments and also adds a new time-critical dimension to the compute requirements.

A New Approach to Cancer Treatments
Approximately two in five people will be diagnosed with cancer at some point during their lifetime and while overall cancer survival has doubled, there are still cancer types with very low survival rate, for example just 1% for pancreatic cancer. This is mainly due to the difficulty of finding therapeutic interventions that kill cancer cells but not harm the healthy tissue in the body.

The new NIH approved trial will leverage breakthroughs in the genome editing technology, CRISPR-Cas9, to develop a different treatment approach. In this, the patient’s own immune system is boosted through specific modifications of the cells that natively fight cancer. This has the potential of being effective for a wide range of different tumors, with the current trial including patients with specific blood and solid cancers, as well as melanoma.

Cloud Services for Computationally Guided Genome Engineering
This new application in human health requires an increase in robustness and efficiency of CRISPR-Cas9 design in order to meet the time constraints of clinical care. Built on AWS cloud-services, researchers in the eHealth program of the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Australia, developed GT-Scan2, a novel software tool to address this issue.

“Compared to other available methods, GT-Scan2 identifies genomic location with higher sensitivity and specificity,” says Dr. Denis Bauer who is leading the transformational bioinformatics team.

GT-Scan2 shows the identified CRISPR target sites at the genomic position and annotates them with high or low activity as well as their off-target potential.

GT-Scan2 improves the effectiveness of the system by finding sites that are unique in the genome. This avoids diluting the effect due to “off-target”, which are other sites in the genome with high sequence similarity. It also optimizes robustness by finding sites that are easier to modify.

“While it was known that the three-dimensional genome organization plays a role in CRISPR binding, GT-Scan2 is the first tool to also leverage other components that are crucial for Cas9 activity,” says Dr. Laurence Wilson whose research focuses on computational genome engineering.

Specifically the off-target search is a compute intensive task traditionally reserved for researchers at large institutes with high-performance-compute infrastructure as every location in the 3 billion letter long genomic sequence needs to be investigated. GT-Scan2 democratizes the ability to find optimal sites by offering this complex computation as a cloud-service using AWS Lambda functions.

Scaling Instantaneously for Personalized Treatments
GT-Scan2 leverages the instantaneous scalability that the event-driven AWS Lambda service offers. This is crucial for personalized treatment, as complexity of the targeted gene can vary dramatically.

“The off-target search as well as the robustness analysis can be subdivided into independent, modular tasks that can run in parallel” says Aidan O’Brien who designed and implemented the system within weeks after its official Asia-Pacific launch in April this year at the AWS Summit 2016 attesting to the intuitive nature of the service. A typical job takes less than a minute and the variation between jobs range from 1 second to 5 minutes. This fast fluctuation in load over minutes rather than hours ruled out an EC2-based solution as new instances would come online too slowly to keep the runtime stable.

GT-Scan2 is served directly from S3 making it a static web app without server-side processing. It retrieves the dynamic content (such as job results and parameters) via API calls using API Gateway from a database (DynamoDB) using a JavaScript framework.

When a user submits a job, GT-Scan2 inserts the job parameters as an item into a DynamoDB table via an API call. This allows the solution to be freely scalable without creating a bottleneck. The database entry triggers the first Lambda function, which finds all putative CRISPR targets in the user-specified DNA sequence (fetched automatically upon user submission). Potential CRISPR target sites have fixed rules and can be easily found using a regular expression that completes in seconds and are inserted into a second DynamoDB table.

Adapting to leverage the power of Lambda-based microservices

All potential targets need to be evaluated for their off-target risk using the efficient string matching tool, Bowtie. Though Bowtie only requires a reduced representation of the 3 billion letter genomic sequence, the sizes of these index files exceed the storage limitation for each Lambda instance. “GT-Scan2 divides the genome into smaller blocks to fit the Lambda specifications” explains Adrian White (Research & Technical Computing, APAC) who supported the CSIRO team during development. For an average run, GT-Scan2 hence triggers 500-1000 individual Lambda functions, which simultaneously update the scores for the different putative targets in DynamoDB. During this process, the frontend is polling this table via API Gateway and updating the webpage as results come in, eliminating the need for server-side compute.

“AWS’s Lambda has given us a great framework to develop a future-ready software package able to support medical genome engineering applications,” says Dr. Bauer. “We are specifically impressed with the ability to instantaneously scale at run time by spawning more Lambda functions to cope with the varying complexity of the different genes.” Other benefits Dr. Bauer quotes include only paying for storage during periods of no use and jobs not competing with web server resources as the website is a static page with dynamic content updated through Angular 2 and the API Gateway, as well as not needing to maintain compute instances (security patches of OS).

“One of the best things about Lambda is that users will be able to easily swap-in different machine learning algorithms that are better suited for specific CRISPR applications” says Dr. Wilson.

The GT-Scan2 Team, from left, Denis Bauer, Laurence Wilson, Aidan O’Brien

“The computational genome engineering community is one of the early adopters of our AWS Lambda technology,” explains Dr. Mia Champion (Technical Business Development Manager, Scientific Computing). “GT-Scan2’s use of API Gateway and DynamoDB is a very neat solution to ensure scalability and their clever use of epigenomics really sets them apart from other recent applications using lambda to perform CRISPR searches. I am looking forward to seeing GT-Scan2 adopted in medical applications.”

How Tokyu Hands Architected a Cost-Effective Shopping System with Amazon DynamoDB

I am a flâneur! I enjoy wandering around a new city, exploring the nooks and crannies, and figuring out what makes it unique and special. On one of my trips to Tokyo I was walking through Shibuya and found and amazing hobby store. The 8-floor building contained tools, supplies, and kits for almost every imaginable hobby. As you can read from the post below (written by my colleagues in Japan), this store, TOKYU HANDS, is now an AWS customer!


TOKYU HANDS improved its customer experience with an innovative retail point-of-sale and online shopping system. Here’s what Hideki Hasegawa-san (CTO) had to say about their new solution:

As a retailer, we are always conscious about costs and DynamoDB’s easy scalability make it cost-effective to operate. For a few hundred dollars a month, we get a fast, highly-available and scalable database. We don’t need to spend any money on expensive hardware or personnel to operate the database.

TOKYU HANDS is one of Japan’s biggest and most popular retailers. It is a one-stop shop for zakka, Japanese stationary items, creative do-it-yourself (DIY) solutions, and other home items. TOKYU HANDS operates 40 stores across Japan and Singapore. In addition to its retail locations, TOKYU HANDS operates an online store where customers can shop 24/7 using a computer or a mobile device. To keep pace with its rapid growth and new business opportunities, TOKYU HANDS operates a fast, flexible and scalable IT system built entirely on Amazon Web Services (AWS).

Prior to AWS, TOKYU HANDS’ IT systems were located in its on-premises data center. Operating and scaling the data center became a significant burden. So TOKYU HANDS decided to go all-in on AWS and migrated its applications from the on-premises data center to the cloud. Offloading infrastructure management to AWS allowed TOKYU HANDS to focus on delivering more value to its customers. Hasegawa-san, says “I like AWS because we can spend time and resources innovating for our customers, and not on infrastructure management. AWS offers a wide variety of fully managed services that makes it easy for us to architect our entire IT system. Amazon DynamoDB is one such service that is at the core of critical applications.”

The Challenge
TOKYU HANDS’ most important applications for customer experience are its e-commerce system and Point of Sale (POS) + Merchandising System. Its e-commerce system contains all of the business logic to keep the store running. Choosing the right database for the e-commerce system is the key to achieving the scalability, flexibility and customizability needed to match its pace of innovation. TOKYU HANDS’ POS + Merchandising System has a customer-facing application that processes customer orders at the store registers and for online purchases. Among other things, the POS + Merchandising System also keeps track of item inventory as well as customer purchase history. Rather than spending time on routine backend maintenance tasks, TOKYU HANDS wanted its software development team to focus on its customers’ experience by making the POS + Merchandising System better. With its previous architecture, developers spent an inordinate amount of time maintaining and operating the backend data store to support the POS + Merchandising System.

A telling example of the operational burden TOKYU HANDS endured was trying to scale the e-commerce system to handle traffic spikes during “Hands Messe” – an annual super sale similar to Black Friday. The Hands Messe sale generated several times more traffic than usual for TOKYU HANDS’ retail and online stores. In the past, scaling up the database system to handle the spike during Hands Messe was a time-consuming and difficult task. TOKYU HANDS spent a lot time adding, configuring, and operating hardware needed to handle the Hands Messe traffic. Often, they would experience node failures during the sale, resulting in system outages. Even when hardware didn’t fail, TOKYU HANDS experienced sub-optimal database performance. The net result was an inferior shopping experience for their customers and loss of revenue during the 2012 and 2013 seasons. For startups, such service interruptions erodes customer confidence. As a result of their painful experience in managing an on-premises database, TOKYU HANDS began searching for a fully-managed database optimized for availability and scalability.

Here is the Tokyu Hands dev team:

Back row: Hideki Hasegawa, Toshiharu Ozawa, Yoshimitsu Sugawara, Minoru Saito, and Taiji Inoue.

Front row: Seigo Miyoshi, Yusuke Usui, and Manami Osawa.

Burden-free and Cost-effective Operations with DynamoDB
“My first encounter with DynamoDB was at AWS re:Invent. Through presentations and conversations with solution architects, we learned about DynamoDB’s capabilities and were intrigued by its high availability, scalability and worry-free operations. When I learned that core applications for and other AWS services were using DynamoDB, I decided to give it a try”, said Hasegawa-san, explaining how they started using Amazon DynamoDB.

In just a few months, TOKYU HANDS was able to re-architect its e-commerce system to use Amazon DynamoDB. In 2014, TOKYU HANDS put the new e-commerce system to the test with its annual Hands Messe sale. Amazon DynamoDB’s multiple availability zone architecture ensured their tables were highly available. TOKYU HANDS did not have to worry about system down-time resulting from node failures. TOKYU HANDS was also able to handle traffic spikes easily by scaling up DynamoDB throughput just before the sale.  Unlike previous years, none of the database requests were rejected due to capacity constraints. After the sale ended, TOKYU HANDS dialed back throughput, thereby saving them money. TOKYU HANDS is considering purchasing reserved capacity for throughput in order to save even more.

Easy Application Development with Amazon DynamoDB
Using Amazon DynamoDB together with a host of other AWS services including Amazon S3, Amazon SQS, and Amazon SNS, TOKYU HANDS was able architect a brand new POS + Merchandising System. “Most of our team is made up of former clerks at our retail stores. They know our customers in and out, and we want them to use that knowledge to build powerful applications and not worry about infrastructure. With AWS, we are able to do just that”.

Here is a diagram of their architecture:

TOKYU HANDS has experienced three important benefits with AWS and Amazon DynamoDB:

  • Cost effective operations
  • Hands-off, worry-free operations
  • Easy development of high-availability applications

See how your business can leverage this fully managed AWS NoSQL service to achieve cost-effective scale by visiting our DynamoDB Getting Started and Developer Resources pages.  Then head over to the FAQs to learn how to leverage DynamoDB Streams and Triggers using serverless programming with AWS Lambda.

— AWS Japan Solution Architects


New CloudWatch Events – Track and Respond to Changes to Your AWS Resources

When you pull the curtain back on an AWS-powered application, you’ll find that a lot is happening behind the scenes. EC2 instances are launched and terminated by Auto Scaling policies in response to changes in system load, Amazon DynamoDB tables, Amazon SNS topics and Amazon SQS queues are created and deleted, and attributes of existing resources are changed from the AWS Management Console, the AWS APIs, or the AWS Command Line Interface (CLI).

Many of our customers build their own high-level tools to track, monitor, and control the overall state of their AWS environments. Up until now, these tools have worked in a polling fashion. In other words, they periodically call AWS functions such as DescribeInstances, DescribeVolumes, and ListQueues to list the AWS resources of various types (EC2 instances, EBS volumes, and SQS queues here) and to track their state. Once they have these lists, they need to call other APIs to get additional state information for each resources, compare it against historical data to detect changes, and then take action as they see fit. As their systems grow larger and more complex, all of this polling and state tracking can become onerous.

New CloudWatch Events
In order to allow you to track changes to your AWS resources with less overhead and greater efficiency, we are introducing CloudWatch Events today.

CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. Using simple rules that you can set up in a couple of minutes, you can easily route each type of event to one or more targets: AWS Lambda functions, Amazon Kinesis streams, Amazon SNS topics, and built-in targets.

You can think of CloudWatch Events as the central nervous system for your AWS environment. It is wired in to every nook and cranny of the supported services, and becomes aware of operational changes as they happen. Then, driven by your rules,  it activates functions and sends messages (activating muscles, if you will) to respond to the environment, making changes, capturing state information, or taking corrective action.

We are launching CloudWatch Events with an initial set of AWS services and events today, and plan to support many more over the next year or so.

Diving in to CloudWatch Events
The three main components that you need to know about are events, rules, and targets.

Events (represented as small blobs of JSON) are generated in four ways. First, they arise from within AWS when resources change state. For example, an event is generated when the state of an EC2 instance changes from pending to running or when Auto Scaling launches an instance. Second, events are generated by API calls and console sign-ins that are delivered to Amazon CloudWatch Events via CloudTrail. Third, your own code can generate application-level events and publish them to Amazon CloudWatch Events for processing. Fourth, they can be issued on a scheduled basis, with options for periodic or Cron-style scheduling.

Rules match incoming events and route them to one or more targets for processing. Rules are not processed in any particular order; all of the rules that match an event will be processed (this allows disparate parts of a single organization to independently look for and process events that are of interest).

Targets process events and are specified within rules. There are four initial target types: built-in, Lambda functions, Kinesis streams, and SNS topics, with more types on the drawing board. A single rule can specify multiple targets. Each event is passed to each target in JSON form. Each rule has the opportunity to customize the JSON that flows to the target. They can elect to pass the event as-is, pass only certain keys (and the associated values) to the target, or to pass a constant (literal) string.

CloudWatch Events in Action
Let’s go ahead and set up a rule or two! I’ll use a simple Lambda function called SomethingHappened. It will simply log the contents of the event:

Next, I switch to the new CloudWatch Events Console, click on Create rule and choose an event source (here’s the menu with all of the choices):

Just a quick note before going forward. Some of the AWS services fire events directly. Others are fired based on the events logged to CloudTrail; you’ll need to enable CloudTrail for the desired service(s) in order to receive them.

I want to keep tabs on my EC2 instances, so I choose EC2 from the menu. I can choose to create a rule that fires on any state transition, or on a transition to one or more states that are of interest:

I want to know about newly launched instances, so I’ll choose Running. I can make the rule respond to any of my instances in the region, or to specific instances. I’ll go with the first option; here’s my pattern:

Now I need to make something happen. I do this by picking a target. Again, here are my choices:

I simply choose Lambda and pick my function:

I’m almost there! I just need to name and describe my rule, and then click on Create rule:

I click on Create Rule and the rule is all set to go:

Now I can test it by launching an EC2 instance. In fact, I’ll launch 5 of them just to exercise my code! After waiting a minute or so for the instances to launch and to initialize, I can check my Lambda metrics to verify that my function was invoked:

This looks good (the earlier invocations were for testing). Then I can visit the CloudWatch logs to view the output from my function:

As you can see, the event contains essential information about the newly launched instance. Your code can call AWS functions in order to learn more about what’s going on. For example, you could call DescribeInstances to access more information about newly launched instances.

Clearly, a “real” function would do something a lot more interesting. It could add some mandatory tags to the instance, update a dynamic visualization, or send me a text message via SNS. If you want to do any (or all of these things), you would need to have a more permissive IAM role for the function, of course. I could make the rule more general (or create another one) if  I wanted to capture some of the other state transitions.

Scheduled Execution of Rules
I can also set up a rule that fires periodically or according to a pattern described in a Cron expression. Here’s how I would do that:

You might find it interesting to know that this is the underlying mechanism used to set up scheduled Lambda jobs, as announced at AWS re:Invent.

API Access
Like most AWS services, you can access CloudWatch Events through an API. Here are some of the principal functions:

  • PutRule to create a new rule.
  • PutTargets and RemoveTargets to connect targets to rules, and to disconnect them.
  • ListRules, ListTargetsByRule, and DescribeRule to find out more about existing rules.
  • PutEvents to submit a set of events to CloudWatch events. You can use this function (or the CLI equivalent) to submit application-level events.

Metrics for Events
CloudWatch Events reports a number of metrics to CloudWatch, all within the AWS/Events namespace. You can use these metrics to verify that your rules are firing as expected, and to track the overall activity level of your rule collection.

The following metrics are reported for the service as a whole:

  • Invocations – The number of times that target have been invoked.
  • FailedInvocations – The number of times that an invocation of a target failed.
  • MatchedEvents – The number of events that matched one or more rules.
  • TriggeredRules – The number of rules that have been triggered.

The following metrics are reported for each rule:

  • Invocations – The number of times that the rule’s targets have been invoked.
  • TriggeredRules – The number of times that the rule has been triggered.

In the Works
Like many emerging AWS services, we are launching CloudWatch Events with an initial set of features (and a lot of infrastructure behind the scenes) and some really big plans, including AWS CloudFormation support. We’ll adjust our plans based on your feedback, but you can expect coverage of many more AWS services and access to additional targets over time. I’ll do my best to keep you informed.

Getting Started
We are launching CloudWatch Events in the US East (Northern Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) regions. It is available now and you can start using it today!