Category: Amazon DynamoDB

DynamoDB Accelerator (DAX) Now Generally Available

Earlier this year I told you about Amazon DynamoDB Accelerator (DAX), a fully-managed caching service that sits in front of (logically speaking) your Amazon DynamoDB tables. DAX returns cached responses in microseconds, making it a great fit for eventually-consistent read-intensive workloads. DAX supports the DynamoDB API, and is seamless and easy to use. As a managed service, you simply create your DAX cluster and use it as the target for your existing reads and writes. You don’t have to worry about patching, cluster maintenance, replication, or fault management.

Now Generally Available
Today I am pleased to announce that DAX is now generally available. We have expanded DAX into additional AWS Regions and used the preview time to fine-tune performance and availability:

Now in Five Regions – DAX is now available in the US East (Northern Virginia), EU (Ireland), US West (Oregon), Asia Pacific (Tokyo), and US West (Northern California) Regions.

In Production – Our preview customers report that they are using DAX in production, that they found it easy to add DAX to their applications, and that their apps are now running 10x faster.

Getting Started with DAX
As I outlined in my earlier post, it is easy to use DAX to accelerate your existing DynamoDB applications. You simply create a DAX cluster in the desired region, update your application to reference the DAX SDK for Java (the calls are the same; this is a drop-in replacement), and configure the SDK to use the endpoint of your cluster. As a read-through/write-through cache, DAX seamlessly handles all of the DynamoDB read/write APIs.

We are working on SDK support for other languages, and I will share additional information as it becomes available.

DAX Pricing
You pay for each node in the cluster (see the DynamoDB Pricing page for more information) on a per-hour basis, with prices starting at $0.269 per hour in the US East (Northern Virginia) and US West (Oregon) Regions. With DAX, each of the nodes in your cluster serves as a read target and as a failover target for high availability. The DAX SDK is cluster-aware and will issue round-robin requests to all nodes in the cluster so that you can make full use of the cluster’s cache resources.

Because DAX can easily handle sudden spikes in read traffic, you may be able to reduce the amount of provisioned throughput for your tables, resulting in an overall cost savings while still returning results in microseconds.
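The cost math is simple to sketch. Here is a quick Python illustration using the US East (Northern Virginia) rate quoted above and an assumed 730-hour month; it is an estimate of the on-demand node charge only, not an official pricing calculator:

```python
# Estimate the monthly on-demand cost of a DAX cluster.
# $0.269/hour is the US East (Northern Virginia) rate quoted above;
# 730 is the approximate number of hours in a month (an assumption here).

HOURLY_RATE_USD = 0.269
HOURS_PER_MONTH = 730

def monthly_cost(node_count, hourly_rate=HOURLY_RATE_USD):
    """Return the estimated monthly on-demand cost for a DAX cluster."""
    return node_count * hourly_rate * HOURS_PER_MONTH

# A three-node cluster (each node is both a read target and a failover target):
print(f"3 nodes: ${monthly_cost(3):,.2f}/month")  # → 3 * 0.269 * 730 = $589.11
```

Weigh this against the provisioned read capacity you may be able to remove from the underlying tables once DAX is absorbing most of your reads.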



Box Platform on AWS Marketplace – Lambda Blueprints & Sample Code

Box is a cloud-based file sharing and content management system, with an API that recently became available in AWS Marketplace (Box Platform – Cloud Content Management APIs). With an array of features for collaboration and an emphasis on security, Box has found a home in many enterprises (see their success stories page for a list).

The Box API allows developers to build content experiences into web and mobile apps. Today I would like to tell you about some AWS Lambda blueprints and templates that will help you to build AWS applications that use this API to simplify user authentication and to add metadata to newly uploaded content. The templates are based on the Box Node Lambda Sample and should be a robust starting point for your own development.

Let’s take a look at the blueprints and then review some handy blog posts written by our friends at Box.

Box Blueprints for Lambda
The blueprints show you how to call the Box APIs and how to connect a Box webhook to a Lambda function via Amazon API Gateway. To find them, simply open up the Lambda Console and search for box:

The first blueprint uses security credentials stored in the BOX_CONFIG environment variable. You can set the variable from within the Lambda Console:

The code in this blueprint retrieves and logs the Box User object for the user identified by the credentials.

The second blueprint implements a Box webhook that sits behind an API Gateway endpoint. It accepts requests, validates them, and logs them to Amazon CloudWatch:
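Under the hood, validating a webhook request amounts to recomputing a signature over the raw body and comparing it in constant time. The blueprint itself is Node.js; the Python sketch below shows the general HMAC-SHA256 pattern, with the header name, key, and exact signing scheme treated as illustrative assumptions (consult Box's webhook documentation for the real scheme):

```python
# Sketch of webhook signature validation (hypothetical header/key names; the
# actual Box signing scheme is described in Box's webhook documentation).
# The idea: recompute an HMAC-SHA256 over the raw request body with a shared
# key and compare it, in constant time, against the signature header.
import base64
import hashlib
import hmac

def validate_signature(body: bytes, signature: str, key: str) -> bool:
    """Return True if `signature` matches the base64 HMAC-SHA256 of `body`."""
    expected = base64.b64encode(
        hmac.new(key.encode(), body, hashlib.sha256).digest()
    ).decode()
    return hmac.compare_digest(expected, signature)

def handler(event, context):
    """Lambda handler behind API Gateway: validate, then log the payload."""
    body = event["body"].encode()
    signature = event["headers"].get("box-signature-primary", "")
    if not validate_signature(body, signature, "my-webhook-key"):  # hypothetical key
        return {"statusCode": 401, "body": "invalid signature"}
    print(body)  # print() output lands in Amazon CloudWatch Logs
    return {"statusCode": 200, "body": "ok"}
```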

Handy Blog Posts
The developer relations team at Box has written some blog posts that show you how to use Box in conjunction with several AWS services:

Manage User Authentication with Box Platform using Amazon Cognito – This post shows you how to use Amazon Cognito to power a login page for your app users. Cognito will handle authentication and user pool management and the code outlined in the blog post will create an App User in Box the first time the user logs in. The code is available as box-node-cognito-lambdas-sample on GitHub.

Add Deep Learning-based Image Recognition to your Box App with Amazon Rekognition – This post shows you how to build an image tagging application that is powered by Amazon Rekognition. Users take and upload photos, which are automatically labeled with metadata that is stored in Amazon DynamoDB. The code is activated by a webhook when a file is uploaded. You can find the code in the box-node-rekognition-webhook repo on GitHub.

Thanks to our friends at Box for taking the time to create these helpful developer resources!




New – Auto Scaling for Amazon DynamoDB

Amazon DynamoDB has more than one hundred thousand customers, spanning a wide range of industries and use cases. These customers depend on DynamoDB’s consistent performance at any scale and presence in 16 geographic regions around the world. A recent trend we’ve been observing is customers using DynamoDB to power their serverless applications. This is a good match: with DynamoDB, you don’t have to think about things like provisioning servers, performing OS and database software patching, or configuring replication across availability zones to ensure high availability – you can simply create tables and start adding data, and let DynamoDB handle the rest.

DynamoDB provides a provisioned capacity model that lets you set the amount of read and write capacity required by your applications. While this frees you from thinking about servers and enables you to change provisioning for your table with a simple API call or button click in the AWS Management Console, customers have asked us how we can make managing capacity for DynamoDB even easier.

Today we are introducing Auto Scaling for DynamoDB to help automate capacity management for your tables and global secondary indexes. You simply specify the desired target utilization and provide upper and lower bounds for read and write capacity. DynamoDB will then monitor throughput consumption using Amazon CloudWatch alarms and then will adjust provisioned capacity up or down as needed. Auto Scaling will be on by default for all new tables and indexes, and you can also configure it for existing ones.

Even if you’re not around, DynamoDB Auto Scaling will be monitoring your tables and indexes to automatically adjust throughput in response to changes in application traffic. This can make it easier to administer your DynamoDB data, help you maximize availability for your applications, and help you reduce your DynamoDB costs.

Let’s see how it works…

Using Auto Scaling
The DynamoDB Console now proposes a sensible set of default parameters when you create a new table. You can accept them as-is or you can uncheck Use default settings and enter your own parameters:

Here’s how you enter your own parameters:

Target utilization is expressed as the ratio of consumed capacity to provisioned capacity. The parameters above leave sufficient headroom for consumed capacity to double in response to a burst of read or write requests (read Capacity Unit Calculations to learn more about the relationship between DynamoDB read and write operations and provisioned capacity). Changes in provisioned capacity take place in the background.
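The arithmetic behind target utilization is easy to illustrate. This little sketch is my own illustration of the idea, not DynamoDB's internal scaling logic:

```python
# Illustration of target-utilization math (not DynamoDB's actual
# implementation): given consumed capacity and a target utilization,
# compute the provisioned capacity Auto Scaling would aim for, clamped
# to the configured lower and upper bounds.
import math

def desired_capacity(consumed, target_utilization, minimum, maximum):
    """Provisioned capacity needed to hit the target utilization ratio."""
    needed = math.ceil(consumed / target_utilization)
    return max(minimum, min(maximum, needed))

# With a 50% target, 120 consumed units call for 240 provisioned units,
# leaving headroom for consumption to double before throttling:
print(desired_capacity(consumed=120, target_utilization=0.5,
                       minimum=5, maximum=1000))  # → 240
```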

Auto Scaling in Action
In order to see this important new feature in action, I followed the directions in the Getting Started Guide. I launched a fresh EC2 instance, installed (sudo pip install boto3) and configured (aws configure) the AWS SDK for Python. Then I used the code in the Python and DynamoDB section to create and populate a table with some data, and manually configured the table for 5 units each of read and write capacity.

I took a quick break in order to have clean, straight lines for the CloudWatch metrics so that I could show the effect of Auto Scaling. Here’s what the metrics look like before I started to apply a load:

I modified the code in Step 3 to continually issue queries for random years in the range of 1920 to 2007, ran a single copy of the code, and checked the read metrics a minute or two later:

The consumed capacity is higher than the provisioned capacity, resulting in a large number of throttled reads. Time for Auto Scaling!
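For reference, the load came from a script along these lines. This is a sketch; the table and key names ("Movies", "year") follow the Getting Started Guide's sample data and are assumptions here:

```python
# Sketch of the load generator: repeatedly query a DynamoDB table for a
# random year. Table and key names ("Movies", "year") follow the Getting
# Started Guide's movie sample and are assumptions here.
import random

def random_year(low=1920, high=2007):
    """Pick a year in the range covered by the sample data set."""
    return random.randint(low, high)

if __name__ == "__main__":
    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb", region_name="us-west-2").Table("Movies")
    while True:  # run until interrupted; start extra copies for more load
        table.query(KeyConditionExpression=Key("year").eq(random_year()))
```

Running several copies in parallel, as I did later, simply multiplies the consumed read capacity.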

I returned to the console and clicked on the Capacity tab for my table. Then I clicked on Read capacity, accepted the default values, and clicked on Save:

DynamoDB created a new IAM role (DynamoDBAutoscaleRole) and a pair of CloudWatch alarms to manage the Auto Scaling of read capacity:

DynamoDB Auto Scaling will manage the thresholds for the alarms, moving them up and down as part of the scaling process. The first alarm was triggered and the table state changed to Updating while additional read capacity was provisioned:

The change was visible in the read metrics within minutes:

I started a couple of additional copies of my modified query script and watched as additional capacity was provisioned, as indicated by the red line:

I killed all of the scripts and turned my attention to other things while waiting for the scale-down alarm to trigger. Here’s what I saw when I came back:

The next morning I checked my Scaling activities and saw that the alarm had triggered several more times overnight:

This was also visible in the metrics:

Until now, you would prepare for this situation by setting your read capacity well above your expected usage and paying for the excess capacity (the space between the blue line and the red line). Or, you might set it too low, forget to monitor it, and run out of capacity when traffic picked up. With Auto Scaling you can get the best of both worlds: an automatic response when an increase in demand suggests that more capacity is needed, and another automated response when the capacity is no longer needed.

Things to Know
DynamoDB Auto Scaling is designed to accommodate request rates that vary in a somewhat predictable, generally periodic fashion. If you need to accommodate unpredictable bursts of read activity, you should use Auto Scaling in combination with DAX (read Amazon DynamoDB Accelerator (DAX) – In-Memory Caching for Read-Intensive Workloads to learn more). Also, the AWS SDKs will detect throttled read and write requests and retry them after a suitable delay.

I mentioned the DynamoDBAutoscaleRole earlier. This role provides Auto Scaling with the privileges that it needs to have in order for it to be able to scale your tables and indexes up and down. To learn more about this role and the permissions that it uses, read Grant User Permissions for DynamoDB Auto Scaling.

Auto Scaling has complete CLI and API support, including the ability to enable and disable the Auto Scaling policies. If you have some predictable, time-bound spikes in traffic, you can programmatically disable an Auto Scaling policy, provision higher throughput for a set period of time, and then enable Auto Scaling again later.
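With the AWS SDK for Python, these settings live in the Application Auto Scaling API. Here's a sketch of registering and later deregistering a read-capacity scaling target; the table name and capacity bounds are placeholders:

```python
# Sketch of managing DynamoDB Auto Scaling via the Application Auto Scaling
# API with boto3. The table name and capacity bounds are placeholders.

def read_scaling_target(table_name, min_capacity, max_capacity):
    """Build the parameters for RegisterScalableTarget on read capacity."""
    return {
        "ServiceNamespace": "dynamodb",
        "ResourceId": f"table/{table_name}",
        "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }

if __name__ == "__main__":
    import boto3
    autoscaling = boto3.client("application-autoscaling")

    # Enable (or update) the scaling bounds for the table's read capacity...
    autoscaling.register_scalable_target(**read_scaling_target("Movies", 5, 500))

    # ...and later, ahead of a known traffic spike, remove the target so
    # that you can provision a fixed, higher throughput manually for a while.
    params = read_scaling_target("Movies", 5, 500)
    autoscaling.deregister_scalable_target(
        ServiceNamespace=params["ServiceNamespace"],
        ResourceId=params["ResourceId"],
        ScalableDimension=params["ScalableDimension"],
    )
```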

As noted on the Limits in DynamoDB page, you can increase provisioned capacity as often as you would like and as high as you need (subject to per-account limits that we can increase on request). You can decrease capacity up to nine times per day for each table or global secondary index.

You pay for the capacity that you provision, at the regular DynamoDB prices. You can also purchase DynamoDB Reserved Capacity for further savings.

Available Now
This feature is available now in all regions and you can start using it today!


Amazon DynamoDB Accelerator (DAX) – In-Memory Caching for Read-Intensive Workloads

I’m fairly sure that you already know about Amazon DynamoDB. As you probably know, it is a managed NoSQL database that scales to accommodate as much table space, read capacity, and write capacity as you need. With response times measured in single-digit milliseconds, our customers are using DynamoDB for many types of applications including adtech, IoT, gaming, media, online learning, travel, e-commerce, and finance. Some of these customers store more than 100 terabytes in a single DynamoDB table and make millions of read or write requests per second. The Amazon retail site relies on DynamoDB and uses it to withstand the traffic surges associated with brief, high-intensity events such as Black Friday, Cyber Monday, and Prime Day.

While DynamoDB’s ability to deliver fast, consistent performance benefits just about any application and workload, there’s always room to do even better. The business value of some workloads (gaming and adtech come to mind, but there are many others) is driven by low-latency, high-performance database reads. The ability to pull data from DynamoDB as quickly as possible leads to faster & more responsive games or ads that drive the highest click-through rates.

Amazon DynamoDB Accelerator
In order to support demanding, read-heavy workloads, we are launching a public preview of the Amazon DynamoDB Accelerator, otherwise known as DAX.

DAX is a fully managed caching service that sits (logically) in front of your DynamoDB tables. It operates in write-through mode, and is API-compatible with DynamoDB. Responses are returned from the cache in microseconds, making DAX a great fit for eventually-consistent read-intensive workloads. DAX is seamless and easy to use. As a managed service, you simply create your DAX cluster and use it as the target for your existing reads and writes. You don’t have to worry about patching, cluster maintenance, replication, or fault management.

Each DAX cluster can contain 1 to 10 nodes; you can add nodes in order to increase overall read throughput. The cache size (also known as the working set) is based on the node size (dax.r3.large to dax.r3.8xlarge) that you choose when you create the cluster. Clusters run within a VPC, with nodes spread across Availability Zones.

You will need to use the DAX SDK for Java to communicate with DAX. This SDK communicates with your cluster using a low-level TCP interface that is fine-tuned for low latency and high throughput (we’ll support access to DAX through other languages as quickly as possible).

Creating a DAX Cluster
Let’s create a DAX cluster from the DynamoDB Console (API and CLI support is also available). I open up the console and click on Create cluster to get started:

I enter a name and description, choose a node type, and set the initial size of my cluster. Then I create an IAM role and policy that gives DAX permission to access my DynamoDB tables (I can also choose an existing role):

The console allows me to create a policy that grants access to a single table. I add additional tables to the policy using the IAM Console.

Next, I create a subnet group that DAX uses to place cluster nodes. I name the group and choose the desired subnets:

I accept the default settings and then click on Launch cluster:

My cluster is ready to use within minutes:

The next step is to update my application to use the DAX SDK for Java and to configure it to use the endpoint of my cluster.

Once my application is up and running, I can visit the Metrics tab to see how well the cache is performing. The Amazon CloudWatch metrics include cache hits and misses, request counts, error counts, and so forth:

I can use the Alarms tab to create a CloudWatch Alarm for any of the metrics. Perhaps I want to know if an excessive number of cache misses are taking place:

I can use the Nodes tab to see the nodes in my cluster. I can also add new nodes or delete existing ones:

In order to see how DAX works, I installed the DAX Sample Application and ran it twice. The first run accessed DynamoDB directly and demonstrated the non-cached, baseline performance:

As you can see from the middle group of results, the queries ran in 2.9 to 11.3 milliseconds. The second run used DAX and showed the effect of caching on performance:

The first iteration of each test results in a cache miss. The subsequent iterations retrieve the results from the cache, and are (as you can see) quite a bit faster.
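The miss-then-hit pattern is inherent to any read-through cache. This toy simulation (plain Python, no DAX involved, purely illustrative) shows the mechanics:

```python
# Toy read-through cache to illustrate the miss-then-hit behavior seen in
# the test runs above. DAX does all of this for you; this is illustrative.
import time

class ReadThroughCache:
    def __init__(self, backing_fetch):
        self._fetch = backing_fetch  # stand-in for the "slow" DynamoDB read
        self._items = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:   # cache hit: served from memory
            self.hits += 1
        else:                    # cache miss: read through to the table
            self.misses += 1
            self._items[key] = self._fetch(key)
        return self._items[key]

def slow_read(key):
    time.sleep(0.01)  # stand-in for a ~10 ms DynamoDB read
    return f"item-{key}"

cache = ReadThroughCache(slow_read)
cache.get("movie-42")   # first read: miss, pays the full read latency
cache.get("movie-42")   # second read: hit, returned from memory
print(cache.hits, cache.misses)  # → 1 1
```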

Things to Know
Here are a few things to keep in mind as you think about how to put DAX to use in your environment:

Java API – As I mentioned earlier, we are launching this public preview with support for Java, with plans to add support for other languages. DAX is API-compatible with DynamoDB so there’s no need to write your own caching logic or make changes to your code.

Consistency – DAX offers the best opportunity for performance gains when you are using eventually consistent reads that can be served from the in-memory cache (DAX always refers back to the DynamoDB table when processing consistent reads).

Write-Throughs – DAX is a write-through cache. However, if there is a weak correlation between what you read and what you write, you may want to direct your writes to DynamoDB. This will allow DAX to be of greater assistance for your reads.

Deprovisioning – After you have put DAX to use in your environment, you should be able to reduce the amount of read capacity provisioned for the underlying tables. This will reduce your costs (dramatically in many cases), while allowing DAX to provide spare capacity for sudden surges in usage.

Available Now
The public preview of DAX is available today in the US East (Northern Virginia), US West (Oregon), and EU (Ireland) Regions and you can sign up today. You can use the public preview at no charge and you can also learn more by reading the DAX Developer Guide.




New – Manage DynamoDB Items Using Time to Live (TTL)

AWS customers are making great use of Amazon DynamoDB. They love the speed and flexibility and build Ad Tech (reference architecture), Gaming (reference architecture), IoT (reference architecture), and other applications that take advantage of the consistent, single-digit millisecond latency. They also love the fact that DynamoDB is a managed, serverless database that scales to handle millions of requests per second to tables that are many terabytes in size.

Many DynamoDB users store data that has a limited useful life or is accessed less frequently over time. Some of them track recent logins, trial subscriptions, or application metrics. Others store data that is subject to regulatory or contractual limitations on how long it can be stored. Until now, these customers implemented their own time-based data management. At scale, this sometimes meant that they ran a couple of Amazon Elastic Compute Cloud (EC2) instances that did nothing more than scan DynamoDB items, check date attributes, and issue delete requests for items that were no longer needed. This added cost and complexity to their application.

New Time to Live (TTL) Management
In order to streamline this popular and important use case, we are launching a new Time to Live (TTL) feature today. You can enable this feature on a table-by-table basis, specifying an item attribute that contains the expiration time for the item.

Once the attribute has been specified and TTL management has been enabled (a single API call takes care of both operations), DynamoDB will find and delete items that have expired. This processing takes place automatically and in the background and does not affect read or write traffic to the table.

You can use DynamoDB streams (see DynamoDB Update – Triggers (Streams + Lambda) + Cross-Region Replication App for more info) to process or archive the actual deletions. Like other update records in a stream, the deletions are available on a rolling 24-hour basis. You can move the expired items to cold storage, log them, or update other tables using AWS Lambda and DynamoDB Triggers.
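TTL deletions show up in the stream as REMOVE records carrying a service userIdentity, which lets you distinguish them from your application's own deletes. Here's a sketch of such a stream-processing Lambda function; the archival step is a hypothetical placeholder:

```python
# Sketch of a stream-processing Lambda that picks out TTL expirations.
# DynamoDB marks TTL deletions with a service userIdentity; application
# deletes lack it. The archive_item call is a hypothetical placeholder.

def is_ttl_deletion(record):
    """True if this stream record is a TTL expiration, not a user delete."""
    user = record.get("userIdentity", {})
    return (record.get("eventName") == "REMOVE"
            and user.get("type") == "Service"
            and user.get("principalId") == "dynamodb.amazonaws.com")

def handler(event, context):
    expired = [r["dynamodb"]["OldImage"]
               for r in event["Records"] if is_ttl_deletion(r)]
    for item in expired:
        archive_item(item)  # hypothetical: copy to cold storage, log, etc.
    return {"archived": len(expired)}

def archive_item(item):
    print("archiving:", item)  # stand-in for the real archival step
```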

Here’s how you enable TTL for a table and specify the desired attribute:

The attribute must use DynamoDB’s Number data type, and is interpreted as seconds since the Unix epoch.

As you can see from the screen shot above, you can also enable DynamoDB Streams, and you can look at a preview of the items that will be deleted when you enable TTL.

You can also call the UpdateTimeToLive function from your code, or you can use the update-time-to-live command from the AWS Command Line Interface (CLI).
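In Python, that might look like the following sketch: compute the expiration as epoch seconds when you write each item, and enable TTL once per table. The table name ("Sessions") and attribute name ("expires_at") are placeholders:

```python
# Sketch: enabling TTL on a table and writing an item with a TTL attribute.
# Table name ("Sessions") and attribute name ("expires_at") are placeholders;
# the attribute must be a Number holding seconds since the Unix epoch.
import time

def expires_in(days):
    """Epoch-seconds timestamp `days` from now, for the TTL attribute."""
    return int(time.time()) + days * 24 * 60 * 60

if __name__ == "__main__":
    import boto3
    dynamodb = boto3.client("dynamodb")

    # Enable TTL on the table (a single API call, as noted above):
    dynamodb.update_time_to_live(
        TableName="Sessions",
        TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
    )

    # Write an item that expires in 30 days:
    dynamodb.put_item(
        TableName="Sessions",
        Item={
            "session_id": {"S": "abc123"},
            "expires_at": {"N": str(expires_in(30))},
        },
    )
```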

AWS customer TUNE is already making good use of this feature as part of their HasOffers product.


HasOffers helps customers to analyze the effectiveness of their marketing campaigns, storing massive amounts of ad engagement data in the process. Once the customer-defined time window for a campaign has passed, the data is no longer needed and can be deleted. Before we made the TTL feature available to TUNE, they manually identified and then deleted the stale data. This was labor- and compute-intensive, and also consumed some of the provisioned throughput for the table.

Now, they simply set an expiration time for each item and leave the rest to DynamoDB. The stale data disappears automatically, with no impact on the available throughput. As a result, TUNE has been able to purge 85 terabytes of stale data and has reduced their costs by over $200K per year, while also simplifying their application logic.

Things to Know
Here are a couple of things to keep in mind as you are thinking about putting TTL to use in your application.

TTL Attribute – The TTL attribute can be indexed or projected, but it cannot be an element of a JSON document. As I indicated earlier, it must have the Number data type. You can use IAM to regulate access to this attribute, just as you can do for any other one. Items that do not have the designated TTL attribute will not be considered for deletion. In order to avoid a possible accidental deletion due to a malformed TTL value, items that appear to be older than 5 years will not be deleted.

Tables – You can apply a TTL to a new or an existing table. The process of enabling TTL for a table can take up to an hour, and you can only make one change per table at a time.

Background Processing – The scans and the deletions take place in the background and do not count against the provisioned throughput. Deletion times will vary based on the number and nature of the expired items. After the expiration but before the actual deletion, the items remain in the table and will appear in reads and scans.

Indexes – Items are removed from any Local Secondary Indexes immediately, and from Global Secondary Indexes in the usual eventually consistent fashion.

Pricing – There is no charge for the internal scan operation or for the deletion. You will pay for storage until the item is actually deleted.

Available Now
This feature is available now and you can start using it today! To learn more, read about Time to Live in the DynamoDB Developer Guide.



Genome Engineering Applications: Early Adopters of the Cloud

Our friends at the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Australia sent along the guest post below to tell us about how AWS powers an important new genome editing technique.

— Jeff


Recent developments in molecular engineering technology now enable the accurate editing of genomes. The new technology, called CRISPR-Cas9, can be programmed to recognize and edit specific locations in the genome by pattern-matching unique sequences of DNA. While this is a powerful new tool for researchers, the ability to scan and identify targets across the entire genome has created unprecedented demand for large-scale computation. Earlier this year, the US National Institutes of Health (NIH) approved the use of these technologies for human health. This has the potential to revolutionize cancer treatments and also adds a new time-critical dimension to the compute requirements.

A New Approach to Cancer Treatments
Approximately two in five people will be diagnosed with cancer at some point during their lifetime, and while overall cancer survival has doubled, there are still cancer types with very low survival rates; pancreatic cancer, for example, has a survival rate of just 1%. This is mainly due to the difficulty of finding therapeutic interventions that kill cancer cells without harming the healthy tissue in the body.

The new NIH-approved trial will leverage breakthroughs in genome editing technology, specifically CRISPR-Cas9, to develop a different treatment approach. In this approach, the patient’s own immune system is boosted through specific modifications of the cells that natively fight cancer. This has the potential of being effective for a wide range of different tumors, with the current trial including patients with specific blood and solid cancers, as well as melanoma.

Cloud Services for Computationally Guided Genome Engineering
This new application in human health requires an increase in the robustness and efficiency of CRISPR-Cas9 design in order to meet the time constraints of clinical care. To address this issue, researchers in the eHealth program of the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Australia developed GT-Scan2, a novel software tool built on AWS cloud services.

“Compared to other available methods, GT-Scan2 identifies genomic location with higher sensitivity and specificity,” says Dr. Denis Bauer who is leading the transformational bioinformatics team.

GT-Scan2 shows the identified CRISPR target sites at the genomic position and annotates them with high or low activity as well as their off-target potential.

GT-Scan2 improves the effectiveness of the system by finding sites that are unique in the genome. This avoids diluting the effect due to “off-target” sites, which are other locations in the genome with high sequence similarity. It also optimizes robustness by finding sites that are easier to modify.

“While it was known that the three-dimensional genome organization plays a role in CRISPR binding, GT-Scan2 is the first tool to also leverage other components that are crucial for Cas9 activity,” says Dr. Laurence Wilson whose research focuses on computational genome engineering.

Specifically, the off-target search is a compute-intensive task traditionally reserved for researchers at large institutes with high-performance computing infrastructure, as every location in the 3-billion-letter genomic sequence needs to be investigated. GT-Scan2 democratizes the ability to find optimal sites by offering this complex computation as a cloud service using AWS Lambda functions.

Scaling Instantaneously for Personalized Treatments
GT-Scan2 leverages the instantaneous scalability that the event-driven AWS Lambda service offers. This is crucial for personalized treatment, as the complexity of the targeted gene can vary dramatically.

“The off-target search as well as the robustness analysis can be subdivided into independent, modular tasks that can run in parallel,” says Aidan O’Brien, who designed and implemented the system within weeks of Lambda’s official Asia-Pacific launch at the AWS Summit in April this year, attesting to the intuitive nature of the service. A typical job takes less than a minute, and run times vary between jobs, ranging from 1 second to 5 minutes. This fast fluctuation in load over minutes rather than hours ruled out an EC2-based solution, as new instances would come online too slowly to keep the runtime stable.

GT-Scan2 is served directly from Amazon S3, making it a static web app without server-side processing. Using a JavaScript framework, it retrieves dynamic content (such as job results and parameters) from DynamoDB via API Gateway calls.

When a user submits a job, GT-Scan2 inserts the job parameters as an item into a DynamoDB table via an API call. This allows the solution to scale freely without creating a bottleneck. The database entry triggers the first Lambda function, which finds all putative CRISPR targets in the user-specified DNA sequence (fetched automatically upon submission). Potential CRISPR target sites follow fixed rules and can be easily found using a regular expression; this search completes in seconds, and the candidate targets are inserted into a second DynamoDB table.
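The post doesn't spell out GT-Scan2's matching rules, but for standard Cas9 a candidate site is commonly taken to be a 20-nucleotide protospacer followed by an NGG PAM, which a single regular-expression pass can find. This sketch uses that common convention, not GT-Scan2's actual rule set:

```python
# Illustration of finding putative CRISPR-Cas9 target sites with a regular
# expression. GT-Scan2's exact rules aren't given in the post; the common
# convention used here is a 20-nt protospacer followed by an NGG PAM.
import re

# Lookahead so that overlapping candidate sites are all reported.
TARGET_RE = re.compile(r"(?=([ACGT]{20}[ACGT]GG))")

def find_targets(sequence):
    """Return all 23-mer candidate sites (protospacer + PAM) on one strand."""
    return TARGET_RE.findall(sequence.upper())

seq = "ACGT" * 5 + "TGG" + "AAAA"   # contains one 23-mer ending in ...TGG
print(find_targets(seq))  # → ['ACGTACGTACGTACGTACGTTGG']
```

A full search would also scan the reverse complement; that step is omitted here for brevity.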

Adapting to leverage the power of Lambda-based microservices

All potential targets need to be evaluated for their off-target risk using the efficient string-matching tool Bowtie. Although Bowtie requires only a reduced representation of the 3-billion-letter genomic sequence, the sizes of these index files exceed the storage limit of a single Lambda instance. “GT-Scan2 divides the genome into smaller blocks to fit the Lambda specifications,” explains Adrian White (Research & Technical Computing, APAC), who supported the CSIRO team during development. For an average run, GT-Scan2 hence triggers 500-1000 individual Lambda functions, which simultaneously update the scores for the different putative targets in DynamoDB. During this process, the frontend polls this table via API Gateway and updates the webpage as results come in, eliminating the need for server-side compute.
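The block-splitting idea can be sketched as follows. The block size and overlap are illustrative values, not GT-Scan2's actual parameters; the overlap (site length minus one) ensures that a candidate site straddling a block boundary still appears whole in one block:

```python
# Illustrative genome chunking for per-Lambda processing. Block size and
# overlap are made-up numbers; the overlap (site length - 1) keeps a
# candidate site that straddles a block boundary inside one block.

SITE_LEN = 23  # protospacer (20) + PAM (3)

def split_genome(sequence, block_size, overlap=SITE_LEN - 1):
    """Yield (offset, block) pairs covering the sequence with overlap."""
    step = block_size - overlap
    for start in range(0, max(len(sequence) - overlap, 1), step):
        yield start, sequence[start:start + block_size]

blocks = list(split_genome("A" * 100, block_size=40))
# Each block is at most 40 letters, successive blocks overlap by 22 letters,
# and together they cover the entire 100-letter sequence.
```

Each (offset, block) pair would then be handed to its own Lambda invocation, with the offset used to map local match positions back to genomic coordinates.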

“AWS’s Lambda has given us a great framework to develop a future-ready software package able to support medical genome engineering applications,” says Dr. Bauer. “We are specifically impressed with the ability to instantaneously scale at run time by spawning more Lambda functions to cope with the varying complexity of the different genes.” Other benefits Dr. Bauer cites include paying only for storage during periods of no use, jobs not competing with web server resources (the website is a static page whose dynamic content is updated through Angular 2 and API Gateway), and not needing to maintain compute instances or apply OS security patches.

“One of the best things about Lambda is that users will be able to easily swap-in different machine learning algorithms that are better suited for specific CRISPR applications” says Dr. Wilson.

The GT-Scan2 Team, from left, Denis Bauer, Laurence Wilson, Aidan O’Brien

“The computational genome engineering community is one of the early adopters of our AWS Lambda technology,” explains Dr. Mia Champion (Technical Business Development Manager, Scientific Computing). “GT-Scan2’s use of API Gateway and DynamoDB is a very neat solution to ensure scalability, and their clever use of epigenomics really sets them apart from other recent applications using Lambda to perform CRISPR searches. I am looking forward to seeing GT-Scan2 adopted in medical applications.”

How Tokyu Hands Architected a Cost-Effective Shopping System with Amazon DynamoDB

I am a flâneur! I enjoy wandering around a new city, exploring the nooks and crannies, and figuring out what makes it unique and special. On one of my trips to Tokyo I was walking through Shibuya and found an amazing hobby store. The 8-floor building contained tools, supplies, and kits for almost every imaginable hobby. As you can read from the post below (written by my colleagues in Japan), this store, TOKYU HANDS, is now an AWS customer!


TOKYU HANDS improved its customer experience with an innovative retail point-of-sale and online shopping system. Here’s what Hideki Hasegawa-san (CTO) had to say about their new solution:

As a retailer, we are always conscious about costs, and DynamoDB’s easy scalability makes it cost-effective to operate. For a few hundred dollars a month, we get a fast, highly available, and scalable database. We don’t need to spend any money on expensive hardware or personnel to operate the database.

TOKYU HANDS is one of Japan’s biggest and most popular retailers. It is a one-stop shop for zakka, Japanese stationery items, creative do-it-yourself (DIY) solutions, and other home items. TOKYU HANDS operates 40 stores across Japan and Singapore. In addition to its retail locations, TOKYU HANDS operates an online store where customers can shop 24/7 using a computer or a mobile device. To keep pace with its rapid growth and new business opportunities, TOKYU HANDS operates a fast, flexible and scalable IT system built entirely on Amazon Web Services (AWS).

Prior to AWS, TOKYU HANDS’ IT systems were located in its on-premises data center. Operating and scaling the data center became a significant burden, so TOKYU HANDS decided to go all-in on AWS and migrated its applications from the on-premises data center to the cloud. Offloading infrastructure management to AWS allowed TOKYU HANDS to focus on delivering more value to its customers. Hasegawa-san says, “I like AWS because we can spend time and resources innovating for our customers, and not on infrastructure management. AWS offers a wide variety of fully managed services that make it easy for us to architect our entire IT system. Amazon DynamoDB is one such service that is at the core of critical applications.”

The Challenge
TOKYU HANDS’ most important applications for customer experience are its e-commerce system and Point of Sale (POS) + Merchandising System. Its e-commerce system contains all of the business logic to keep the store running. Choosing the right database for the e-commerce system is the key to achieving the scalability, flexibility and customizability needed to match its pace of innovation. TOKYU HANDS’ POS + Merchandising System has a customer-facing application that processes customer orders at the store registers and for online purchases. Among other things, the POS + Merchandising System also keeps track of item inventory as well as customer purchase history. Rather than spending time on routine backend maintenance tasks, TOKYU HANDS wanted its software development team to focus on its customers’ experience by making the POS + Merchandising System better. With its previous architecture, developers spent an inordinate amount of time maintaining and operating the backend data store to support the POS + Merchandising System.

A telling example of the operational burden TOKYU HANDS endured was trying to scale the e-commerce system to handle traffic spikes during “Hands Messe” – an annual super sale similar to Black Friday. The Hands Messe sale generated several times more traffic than usual for TOKYU HANDS’ retail and online stores. In the past, scaling up the database system to handle the spike during Hands Messe was a time-consuming and difficult task. TOKYU HANDS spent a lot of time adding, configuring, and operating hardware needed to handle the Hands Messe traffic. Often, they would experience node failures during the sale, resulting in system outages. Even when hardware didn’t fail, TOKYU HANDS experienced sub-optimal database performance. The net result was an inferior shopping experience for their customers and loss of revenue during the 2012 and 2013 seasons. For retailers, such service interruptions erode customer confidence. As a result of their painful experience in managing an on-premises database, TOKYU HANDS began searching for a fully-managed database optimized for availability and scalability.

Here is the Tokyu Hands dev team:

Back row: Hideki Hasegawa, Toshiharu Ozawa, Yoshimitsu Sugawara, Minoru Saito, and Taiji Inoue.

Front row: Seigo Miyoshi, Yusuke Usui, and Manami Osawa.

Burden-free and Cost-effective Operations with DynamoDB
“My first encounter with DynamoDB was at AWS re:Invent. Through presentations and conversations with solution architects, we learned about DynamoDB’s capabilities and were intrigued by its high availability, scalability, and worry-free operations. When I learned that core applications at Amazon and other AWS services were using DynamoDB, I decided to give it a try,” said Hasegawa-san, explaining how they started using Amazon DynamoDB.

In just a few months, TOKYU HANDS was able to re-architect its e-commerce system to use Amazon DynamoDB. In 2014, TOKYU HANDS put the new e-commerce system to the test with its annual Hands Messe sale. Amazon DynamoDB’s multiple availability zone architecture ensured their tables were highly available, so TOKYU HANDS did not have to worry about system down-time resulting from node failures. TOKYU HANDS was also able to handle traffic spikes easily by scaling up DynamoDB throughput just before the sale. Unlike previous years, none of the database requests were rejected due to capacity constraints. After the sale ended, TOKYU HANDS dialed back throughput, reducing costs. TOKYU HANDS is considering purchasing reserved capacity for throughput in order to save even more.
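The scale-up/scale-down pattern can be sketched with the AWS SDK for Python. This is only an illustrative sketch: the "orders" table name and the capacity figures are hypothetical, and production code would also wait for the table update to complete before proceeding:

```python
def scaled_capacity(baseline, factor):
    """Multiply baseline provisioned throughput by an expected traffic factor."""
    return {
        "ReadCapacityUnits": baseline["ReadCapacityUnits"] * factor,
        "WriteCapacityUnits": baseline["WriteCapacityUnits"] * factor,
    }

def set_throughput(table_name, capacity):
    """Apply new provisioned throughput to a DynamoDB table."""
    import boto3  # imported here so the pure helper above stays dependency-free
    dynamodb = boto3.client("dynamodb")
    dynamodb.update_table(TableName=table_name, ProvisionedThroughput=capacity)

# Hypothetical everyday capacity for the "orders" table.
baseline = {"ReadCapacityUnits": 100, "WriteCapacityUnits": 50}
```

Before the sale, one would call `set_throughput("orders", scaled_capacity(baseline, 5))` to prepare for an expected 5x spike, then apply the baseline again afterward to stop paying for unused capacity.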

Easy Application Development with Amazon DynamoDB
Using Amazon DynamoDB together with a host of other AWS services including Amazon S3, Amazon SQS, and Amazon SNS, TOKYU HANDS was able to architect a brand new POS + Merchandising System. “Most of our team is made up of former clerks at our retail stores. They know our customers in and out, and we want them to use that knowledge to build powerful applications and not worry about infrastructure. With AWS, we are able to do just that”.

Here is a diagram of their architecture:

TOKYU HANDS has experienced three important benefits with AWS and Amazon DynamoDB:

  • Cost effective operations
  • Hands-off, worry-free operations
  • Easy development of high-availability applications

See how your business can leverage this fully managed AWS NoSQL service to achieve cost-effective scale by visiting our DynamoDB Getting Started and Developer Resources pages. Then head over to the FAQs to learn how to use DynamoDB Streams and triggers with serverless programming via AWS Lambda.

— AWS Japan Solution Architects


New CloudWatch Events – Track and Respond to Changes to Your AWS Resources

When you pull the curtain back on an AWS-powered application, you’ll find that a lot is happening behind the scenes: EC2 instances are launched and terminated by Auto Scaling policies in response to changes in system load; Amazon DynamoDB tables, Amazon SNS topics, and Amazon SQS queues are created and deleted; and attributes of existing resources are changed from the AWS Management Console, the AWS APIs, or the AWS Command Line Interface (CLI).

Many of our customers build their own high-level tools to track, monitor, and control the overall state of their AWS environments. Up until now, these tools have worked in a polling fashion. In other words, they periodically call AWS functions such as DescribeInstances, DescribeVolumes, and ListQueues to list the AWS resources of various types (EC2 instances, EBS volumes, and SQS queues here) and to track their state. Once they have these lists, they need to call other APIs to get additional state information for each resource, compare it against historical data to detect changes, and then take action as they see fit. As their systems grow larger and more complex, all of this polling and state tracking can become onerous.

New CloudWatch Events
In order to allow you to track changes to your AWS resources with less overhead and greater efficiency, we are introducing CloudWatch Events today.

CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. Using simple rules that you can set up in a couple of minutes, you can easily route each type of event to one or more targets: AWS Lambda functions, Amazon Kinesis streams, Amazon SNS topics, and built-in targets.

You can think of CloudWatch Events as the central nervous system for your AWS environment. It is wired in to every nook and cranny of the supported services, and becomes aware of operational changes as they happen. Then, driven by your rules, it activates functions and sends messages (activating muscles, if you will) to respond to the environment, making changes, capturing state information, or taking corrective action.

We are launching CloudWatch Events with an initial set of AWS services and events today, and plan to support many more over the next year or so.

Diving into CloudWatch Events
The three main components that you need to know about are events, rules, and targets.

Events (represented as small blobs of JSON) are generated in four ways. First, they arise from within AWS when resources change state. For example, an event is generated when the state of an EC2 instance changes from pending to running or when Auto Scaling launches an instance. Second, events are generated by API calls and console sign-ins that are delivered to Amazon CloudWatch Events via CloudTrail. Third, your own code can generate application-level events and publish them to Amazon CloudWatch Events for processing. Fourth, they can be issued on a scheduled basis, with options for periodic or Cron-style scheduling.
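For example, an EC2 state-change event is a small JSON document along these lines (the IDs, account number, and timestamp here are placeholders, not real values):

```python
import json

# A hypothetical EC2 instance state-change event, as delivered to a target.
event_json = """
{
  "version": "0",
  "id": "12345678-90ab-cdef-1234-567890abcdef",
  "detail-type": "EC2 Instance State-change Notification",
  "source": "aws.ec2",
  "account": "123456789012",
  "time": "2016-01-14T01:02:03Z",
  "region": "us-east-1",
  "resources": ["arn:aws:ec2:us-east-1:123456789012:instance/i-abcd1111"],
  "detail": {"instance-id": "i-abcd1111", "state": "running"}
}
"""

event = json.loads(event_json)
print(event["source"], event["detail"]["state"])  # prints: aws.ec2 running
```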

Rules match incoming events and route them to one or more targets for processing. Rules are not processed in any particular order; all of the rules that match an event will be processed (this allows disparate parts of a single organization to independently look for and process events that are of interest).

Targets process events and are specified within rules. There are four initial target types: built-in, Lambda functions, Kinesis streams, and SNS topics, with more types on the drawing board. A single rule can specify multiple targets. Each event is passed to each target in JSON form, and each rule can customize the JSON that flows to its targets: it can pass the event as-is, pass only certain keys (and the associated values), or pass a constant (literal) string.

CloudWatch Events in Action
Let’s go ahead and set up a rule or two! I’ll use a simple Lambda function called SomethingHappened. It will simply log the contents of the event:
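A minimal version of such a function might look like this (sketched in Python here; the choice of runtime and logging style is incidental to the demo):

```python
import json

def lambda_handler(event, context):
    """Log the full incoming CloudWatch event and acknowledge it."""
    print("Received event: " + json.dumps(event, indent=2))
    return "SomethingHappened saw event " + event.get("id", "<no id>")
```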

Next, I switch to the new CloudWatch Events Console, click on Create rule and choose an event source (here’s the menu with all of the choices):

Just a quick note before going forward. Some AWS services fire events directly; for others, events are generated from the API activity logged to CloudTrail, so you’ll need to enable CloudTrail for the desired service(s) in order to receive them.

I want to keep tabs on my EC2 instances, so I choose EC2 from the menu. I can choose to create a rule that fires on any state transition, or on a transition to one or more states that are of interest:

I want to know about newly launched instances, so I’ll choose Running. I can make the rule respond to any of my instances in the region, or to specific instances. I’ll go with the first option; here’s my pattern:
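A pattern is itself a small JSON document: it names the fields an event must carry, with arrays acting as OR over acceptable values. The sketch below shows a pattern of this shape for instances entering the running state, together with a simplified matcher that illustrates (but does not reproduce) the service's matching semantics:

```python
def matches(pattern, event):
    """Simplified CloudWatch Events matching: every key in the pattern must
    appear in the event; lists in the pattern act as OR over values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True

# Pattern: any EC2 instance in the region entering the "running" state.
pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-abcd1111", "state": "running"},
}
print(matches(pattern, event))  # prints True
```

To match specific instances instead of all of them, the pattern would also constrain the instance-id field inside detail.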

Now I need to make something happen. I do this by picking a target. Again, here are my choices:

I simply choose Lambda and pick my function:

I’m almost there! I just need to name and describe my rule, and then click on Create rule:

I click on Create rule and the rule is all set to go:

Now I can test it by launching an EC2 instance. In fact, I’ll launch 5 of them just to exercise my code! After waiting a minute or so for the instances to launch and to initialize, I can check my Lambda metrics to verify that my function was invoked:

This looks good (the earlier invocations were for testing). Then I can visit the CloudWatch logs to view the output from my function:

As you can see, the event contains essential information about the newly launched instance. Your code can call AWS functions in order to learn more about what’s going on. For example, you could call DescribeInstances to access more information about newly launched instances.

Clearly, a “real” function would do something a lot more interesting. It could add some mandatory tags to the instance, update a dynamic visualization, or send me a text message via SNS. If you want to do any (or all) of these things, the function would need a more permissive IAM role, of course. I could make the rule more general (or create another one) if I wanted to capture some of the other state transitions.

Scheduled Execution of Rules
I can also set up a rule that fires periodically or according to a pattern described in a Cron expression. Here’s how I would do that:

You might find it interesting to know that this is the underlying mechanism used to set up scheduled Lambda jobs, as announced at AWS re:Invent.

API Access
Like most AWS services, you can access CloudWatch Events through an API. Here are some of the principal functions:

  • PutRule to create a new rule.
  • PutTargets and RemoveTargets to connect targets to rules, and to disconnect them.
  • ListRules, ListTargetsByRule, and DescribeRule to find out more about existing rules.
  • PutEvents to submit a set of events to CloudWatch events. You can use this function (or the CLI equivalent) to submit application-level events.
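Here is a hedged sketch of how a few of these calls fit together with boto3; the rule name, schedule expression, event source, and ARNs are all illustrative:

```python
import json

def create_scheduled_rule(events_client, function_arn):
    """Create a rule that fires every 5 minutes and attach a Lambda target."""
    events_client.put_rule(
        Name="EveryFiveMinutes",
        ScheduleExpression="rate(5 minutes)",
        State="ENABLED",
    )
    events_client.put_targets(
        Rule="EveryFiveMinutes",
        Targets=[{"Id": "1", "Arn": function_arn}],
    )

def publish_app_event(events_client, status):
    """Submit an application-level event for rule matching."""
    events_client.put_events(
        Entries=[{
            "Source": "com.example.myapp",  # hypothetical application source
            "DetailType": "Deployment Status",
            "Detail": json.dumps({"status": status}),
        }]
    )
```

In practice, `events_client` would be `boto3.client("events")`; the two helpers map directly onto the PutRule, PutTargets, and PutEvents functions listed above.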

Metrics for Events
CloudWatch Events reports a number of metrics to CloudWatch, all within the AWS/Events namespace. You can use these metrics to verify that your rules are firing as expected, and to track the overall activity level of your rule collection.

The following metrics are reported for the service as a whole:

  • Invocations – The number of times that targets have been invoked.
  • FailedInvocations – The number of times that an invocation of a target failed.
  • MatchedEvents – The number of events that matched one or more rules.
  • TriggeredRules – The number of rules that have been triggered.

The following metrics are reported for each rule:

  • Invocations – The number of times that the rule’s targets have been invoked.
  • TriggeredRules – The number of times that the rule has been triggered.

In the Works
Like many emerging AWS services, we are launching CloudWatch Events with an initial set of features (and a lot of infrastructure behind the scenes) and some really big plans, including AWS CloudFormation support. We’ll adjust our plans based on your feedback, but you can expect coverage of many more AWS services and access to additional targets over time. I’ll do my best to keep you informed.

Getting Started
We are launching CloudWatch Events in the US East (Northern Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) regions. It is available now and you can start using it today!


New – Store and Process Graph Data using the DynamoDB Storage Backend for Titan

Graph databases elegantly and efficiently represent entities (generally known as vertices or nodes) and relationships (edges) that connect them. Here’s a very simple example of a graph:

Bill and Candace have a daughter named Janet, and she has a son named Bob. This makes Candace Bob’s grandmother, and Bill his grandfather.

Once a graph has been built, it is processed by traversing the edges between the vertices. In the graph above, we could traverse from Bill to Janet, and from there to Bob. Graphs can be used to model social networks (friends and “likes”), business relationships (companies, employees, partners, suppliers, and customers), dependencies, and so forth. Both vertices and edges can be typed; some vertices could be people as in our example, and others places. Similarly some edges could denote (as above) familial relationships and others could denote “likes.” Every graph database allows additional information to be attached to each vertex and to each edge, often in the form of name-value pairs.
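The family graph above can be represented as a simple list of typed edges, with traversal following those edges from vertex to vertex. This plain-Python sketch only illustrates the idea; Titan's actual storage model and traversal engine are far more sophisticated:

```python
# Each edge is (from_vertex, label, to_vertex); name-value properties
# could be attached to vertices and edges with dicts.
edges = [
    ("Bill", "parent-of", "Janet"),
    ("Candace", "parent-of", "Janet"),
    ("Janet", "parent-of", "Bob"),
]

def out_neighbors(vertex, label):
    """Traverse outgoing edges with the given label."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]

def grandchildren(vertex):
    """A two-hop traversal: children of children."""
    return [gc for child in out_neighbors(vertex, "parent-of")
               for gc in out_neighbors(child, "parent-of")]

print(grandchildren("Bill"))  # prints ['Bob']
```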

Titan is a scalable graph database that is optimized for storing and querying graphs that contain hundreds of billions of vertices and edges. It is transactional, and can support concurrent access from thousands of users.

DynamoDB Storage Backend for Titan
Titan’s pluggable data storage layer already supports several NoSQL databases and key-value stores. This allows you to choose the backend that provides the performance and features required by your application, while giving you the freedom to switch from one backend to another with minimal changes to your application code.

Today we are making a new DynamoDB Storage Backend for Titan available. Storing your Titan graphs in Amazon DynamoDB lets you scale to handle huge graphs without having to worry about building, running, or maintaining your own database cluster. Because DynamoDB can scale to any size and provides high data availability and predictable performance, you can focus on your application instead of on your graph storage and processing infrastructure. You can also run Titan and DynamoDB Local on your laptop for development and testing.

The backend works with versions 0.4.4 and 0.5.4 of Titan. Both versions support fast traversals, edges that are both directed and typed, and stored relationships. The newer version adds support for vertex partitioning, vertex labels, and user-defined transaction logs. The backend is client-based; we did not make any changes to DynamoDB to support it. You are simply using DynamoDB as an efficient way to store your Titan graphs.

Version 0.4.4 of Titan is compatible with version 2.4 of the Tinkerpop stack; version 0.5.4 of Titan is compatible with version 2.5 of the stack. Tinkerpop is a collection of tools and algorithms that provides you with even more in the way of graph processing and analysis options.

Since I am talking about graphs, I should illustrate all of the items that I have talked about in the form of a graph! Here you go:

My colleague Alex Patrikalakis created the following Gremlin script. It replicates the graph above using Titan and DynamoDB:

conf = new BaseConfiguration()
conf.setProperty("storage.backend", "com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager")
conf.setProperty("storage.dynamodb.client.endpoint", "http://localhost:4567")
g = TitanFactory.open(conf)
titan = g.addVertex(null, [name:"Titan"])
blueprints = g.addVertex(null, [name:"Blueprints"])
pipes = g.addVertex(null, [name:"Pipes"])
gremlin = g.addVertex(null, [name:"Gremlin"])
frames = g.addVertex(null, [name:"Frames"])
furnace = g.addVertex(null, [name:"Furnace"])
rexster = g.addVertex(null, [name:"Rexster"])
DynamoDBStorageBackend = g.addVertex(null, [name:"DynamoDB Storage Backend for Titan"])
DynamoDBLocal = g.addVertex(null, [name:"DynamoDB Local"])
DynamoDB = g.addVertex(null, [name:"DynamoDB"])
g.addEdge(titan, blueprints, "implements")
g.addEdge(pipes, blueprints, "builds-on")
g.addEdge(gremlin, blueprints, "builds-on")
g.addEdge(frames, blueprints, "builds-on")
g.addEdge(furnace, blueprints, "builds-on")
g.addEdge(rexster, blueprints, "builds-on")
g.addEdge(titan, DynamoDBStorageBackend, "backed-by")
g.addEdge(DynamoDBStorageBackend, DynamoDBLocal, "connects-to")
g.addEdge(DynamoDBStorageBackend, DynamoDB, "connects-to")

Getting Started
The DynamoDB Storage Backend for Titan is available as a Maven project on GitHub. It runs on Windows, OS X, and Linux and requires Maven and Java 1.7 (or later). The Amazon DynamoDB Storage Backend for Titan includes installation instructions and an example that makes creative use of the Marvel Universe Social Graph public dataset. We have also created a CloudFormation template that will launch an EC2 instance that has the Titan/Rexster stack and the DynamoDB Storage Backend for Titan installed and ready to use.


New Logstash Plugin – Search DynamoDB Content using Elasticsearch

When I take a look at our recent service releases and combine it with some of the AWS-related repos on GitHub, an interesting trend becomes apparent. It seems to me that connecting AWS services to each other and to third party tools is becoming more and more common. For example, in a recent post, I showed you how to combine CloudWatch Logs, Elasticsearch, and Kibana to visualize event data. In another post I showed you how AWS OpsWorks can provision and manage ECS container instances. As I noted in that post, I think of this as “peanut butter and chocolate” — combining two good flavors into another that is even better.

DynamoDB + Elasticsearch
Today I would like to tell you about another way to connect several interesting pieces of technology together: DynamoDB and Elasticsearch by way of a DynamoDB Streams connector (plugin) for Logstash.

You can run Logstash on an Amazon Elastic Compute Cloud (EC2) instance or on-premises. After you configure it to take input from the DynamoDB tables and streams that you designate, it will track changes (inserts, updates, and deletions) to the tables and update your Elasticsearch cluster accordingly. You can also configure the plugin to scan the table (with checkpoints along the way) to bring your cluster into sync with the table.
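A pipeline using the plugin follows Logstash's usual input/output structure. The sketch below is illustrative only: the table name and cluster address are hypothetical, and the exact option names vary by plugin version, so consult the README for the precise settings:

```
input {
  dynamodb {
    table_name => "my-table"          # hypothetical table name
    endpoint => "dynamodb.us-east-1.amazonaws.com"
    streams_endpoint => "streams.dynamodb.us-east-1.amazonaws.com"
    view_type => "new_and_old_images"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]       # hypothetical cluster address
    index => "my-table-index"
  }
}
```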

Once your cluster is up, running, and tracking changes to your tables, you can perform efficient queries (structured, full-text, and multifield) using Elasticsearch. Your queries can make use of proximity matching and partial matching, and you can also control relevance using Elasticsearch’s scoring infrastructure (you can learn about these topics and more in Elasticsearch: The Definitive Guide).

Changes made to your DynamoDB tables are reflected in the stream very quickly (generally a second or less). The plugin will have access to these changes and will update your cluster as expeditiously as possible.

Download, Install, and Run
You can download the plugin from our new DynamoDB Community page, install it on an EC2 instance or on-premises, point it at your Elasticsearch cluster, and start searching your DynamoDB content today! Start by reading the documentation and following the directions in the README.