AWS customers are making great use of Amazon DynamoDB. They love the speed and flexibility and build Ad Tech (reference architecture), Gaming (reference architecture), IoT (reference architecture), and other applications that take advantage of the consistent, single-digit millisecond latency. They also love the fact that DynamoDB is a managed, serverless database that scales to handle millions of requests per second to tables that are many terabytes in size.
Many DynamoDB users store data that has a limited useful life or is accessed less frequently over time. Some of them track recent logins, trial subscriptions, or application metrics. Others store data that is subject to regulatory or contractual limitations on how long it can be stored. Until now, these customers implemented their own time-based data management. At scale, this sometimes meant that they ran a couple of Amazon Elastic Compute Cloud (EC2) instances that did nothing more than scan DynamoDB items, check date attributes, and issue delete requests for items that were no longer needed. This added cost and complexity to their application.
New Time to Live (TTL) Management
In order to streamline this popular and important use case, we are launching a new Time to Live (TTL) feature today. You can enable this feature on a table-by-table basis, specifying an item attribute that contains the expiration time for the item.
Once the attribute has been specified and TTL management has been enabled (a single API call takes care of both operations), DynamoDB will find and delete items that have expired. This processing takes place automatically and in the background and does not affect read or write traffic to the table.
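As a minimal sketch of what this could look like from code, the Python below computes an expiration time in Unix epoch seconds (the format TTL expects) and builds the parameters for the UpdateTimeToLive call. The table name `SessionData`, the attribute name `expires_at`, and the helper functions are hypothetical; the commented-out calls assume boto3 with credentials already configured.

```python
import time

def expiration_epoch(days_from_now, now=None):
    """Return an expiration time as whole Unix epoch seconds."""
    now = time.time() if now is None else now
    return int(now) + days_from_now * 24 * 60 * 60

def ttl_request(table_name, attribute_name):
    """Build the parameters for DynamoDB's UpdateTimeToLive call."""
    return {
        "TableName": table_name,
        "TimeToLiveSpecification": {
            "Enabled": True,
            "AttributeName": attribute_name,
        },
    }

# Hypothetical table and attribute names, for illustration only.
params = ttl_request("SessionData", "expires_at")

# An item that should expire 30 days from now; the TTL value is a Number.
item = {
    "session_id": {"S": "abc123"},
    "expires_at": {"N": str(expiration_epoch(30))},
}

# With boto3 installed and credentials configured, the calls would be:
#   import boto3
#   client = boto3.client("dynamodb")
#   client.update_time_to_live(**params)
#   client.put_item(TableName="SessionData", Item=item)
```

Building the request as plain data keeps the sketch runnable without an AWS account; only the commented-out lines talk to the service.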
You can use DynamoDB Streams (see DynamoDB Update – Triggers (Streams + Lambda) + Cross-Region Replication App for more info) to process or archive the actual deletions. Like other update records in a stream, the deletions are available on a rolling 24-hour basis. You can move the expired items to cold storage, log them, or update other tables using AWS Lambda and DynamoDB Triggers.
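Here is a rough sketch of a Lambda-style handler that picks TTL deletions out of a batch of stream records. It assumes that the stream is configured to include old images, and it relies on the fact that deletions performed by TTL carry a service identity (`dynamodb.amazonaws.com`) in the record's `userIdentity` field. The function name and the idea of returning the images (rather than, say, writing them to cold storage) are illustrative choices, not part of the feature.

```python
def archive_expired(event, context=None):
    """Collect the old images of items that were removed by TTL."""
    archived = []
    for record in event.get("Records", []):
        # Only deletions are of interest here.
        if record.get("eventName") != "REMOVE":
            continue
        # TTL deletions are performed by the DynamoDB service itself,
        # which distinguishes them from deletes issued by applications.
        identity = record.get("userIdentity") or {}
        if identity.get("principalId") != "dynamodb.amazonaws.com":
            continue
        old_image = record.get("dynamodb", {}).get("OldImage")
        if old_image is not None:
            # A real function might write this to cold storage instead.
            archived.append(old_image)
    return archived
```

Filtering on the identity means that deletes issued by your own code pass through untouched, so the archive only ever contains items that actually expired.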
Here’s how you enable TTL for a table and specify the desired attribute:
As you can see from the screen shot above, you can also enable DynamoDB Streams, and you can look at a preview of the items that will be deleted when you enable TTL.
HasOffers, a product from AWS customer TUNE, helps customers to analyze the effectiveness of their marketing campaigns, storing massive amounts of ad engagement data in the process. Once the customer-defined time window for the campaign has passed, the data is no longer needed and can be deleted. Before we made the TTL feature available to TUNE, they manually identified and then deleted the stale data. This was labor- and compute-intensive, and also consumed some of the provisioned throughput for the table.
Now, they simply set an expiration time for each item and leave the rest to DynamoDB. The stale data disappears automatically, with no impact on the available throughput. As a result, TUNE has been able to purge 85 terabytes of stale data and has reduced their costs by over $200K per year, while also simplifying their application logic.
Things to Know
Here are a couple of things to keep in mind as you are thinking about putting TTL to use in your application.
TTL Attribute – The TTL attribute can be indexed or projected, but it cannot be an element of a JSON document. It must have the Number data type, and its value is interpreted as an expiration time in Unix epoch seconds. You can use IAM to regulate access to this attribute, just as you can for any other attribute. Items that do not have the designated TTL attribute will not be considered for deletion. In order to avoid a possible accidental deletion due to a malformed TTL value, items that appear to be older than 5 years will not be deleted.
Tables – You can apply a TTL to a new or an existing table. The process of enabling TTL for a table can take up to an hour, and you can only make one change per table at a time.
Background Processing – The scans and the deletions take place in the background and do not count against the provisioned throughput. Deletion times will vary based on the number and nature of the expired items. After the expiration but before the actual deletion, the items remain in the table and will appear in reads and scans.
Indexes – Items are removed from any Local Secondary Indexes immediately, and from Global Secondary Indexes in the usual eventually consistent fashion.
Pricing – There is no charge for the internal scan operation or for the deletion. You will pay for storage until the item is actually deleted.
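Because expired items can linger in the table until the background process removes them, an application may want to hide them on the read path. The sketch below approximates the rules listed above (a missing or non-numeric attribute is ignored, and values that look more than 5 years old are left alone). It is an illustration of those rules, not DynamoDB's actual implementation; the attribute name `expires_at`, the function names, and the exact boundary handling are assumptions.

```python
from decimal import Decimal

# Approximate length of 5 years in seconds, for illustration.
FIVE_YEARS = 5 * 365 * 24 * 60 * 60

def eligible_for_ttl_deletion(item, ttl_attribute, now):
    """Return True if the item looks expired under the rules above.

    `now` is the current time in whole Unix epoch seconds.
    """
    value = item.get(ttl_attribute)
    # Items without a numeric TTL attribute are never considered.
    if isinstance(value, bool) or not isinstance(value, (int, float, Decimal)):
        return False
    age = now - value
    # Expired, but not so old that the value is probably malformed.
    return 0 < age <= FIVE_YEARS

def visible_items(items, ttl_attribute, now):
    """Hide items that have expired but may not have been deleted yet."""
    return [i for i in items if not eligible_for_ttl_deletion(i, ttl_attribute, now)]
```

A filter like this runs on data the application has already read, so it complements TTL rather than replacing it.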
This feature is available now and you can start using it today! To learn more, read about Time to Live in the DynamoDB Developer Guide.
Our friends at the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Australia sent along the guest post below to tell us about how AWS powers an important new genome editing technique.
Recent developments in molecular engineering technology now enable the accurate editing of genomes. The new technology, called CRISPR-Cas9, can be programmed to recognize and edit specific locations in the genome by pattern-matching unique sequences of DNA. While this is a powerful new tool for researchers, the ability to scan and identify targets across the entire genome has created unprecedented demand for large-scale computation. Earlier this year, the US National Institutes of Health (NIH) approved the use of these technologies for human health. This has the potential to revolutionize cancer treatments and also adds a new time-critical dimension to the compute requirements.
A New Approach to Cancer Treatments
Approximately two in five people will be diagnosed with cancer at some point during their lifetime, and while overall cancer survival has doubled, there are still cancer types with very low survival rates, for example just 1% for pancreatic cancer. This is mainly due to the difficulty of finding therapeutic interventions that kill cancer cells but do not harm the healthy tissue in the body.
The new NIH-approved trial will leverage breakthroughs in the genome editing technology, CRISPR-Cas9, to develop a different treatment approach. In this approach, the patient’s own immune system is boosted through specific modifications of the cells that natively fight cancer. This has the potential of being effective for a wide range of different tumors, with the current trial including patients with specific blood and solid cancers, as well as melanoma.
Cloud Services for Computationally Guided Genome Engineering
This new application in human health requires an increase in robustness and efficiency of CRISPR-Cas9 design in order to meet the time constraints of clinical care. To address this issue, researchers in the eHealth program of the Commonwealth Scientific and Industrial Research Organization (CSIRO) in Australia developed GT-Scan2, a novel software tool built on AWS cloud services.
“Compared to other available methods, GT-Scan2 identifies genomic location with higher sensitivity and specificity,” says Dr. Denis Bauer who is leading the transformational bioinformatics team.
GT-Scan2 shows the identified CRISPR target sites at the genomic position and annotates them with high or low activity as well as their off-target potential.
GT-Scan2 improves the effectiveness of the system by finding sites that are unique in the genome. This avoids diluting the effect through “off-targets”, which are other sites in the genome with high sequence similarity. It also optimizes robustness by finding sites that are easier to modify.
“While it was known that the three-dimensional genome organization plays a role in CRISPR binding, GT-Scan2 is the first tool to also leverage other components that are crucial for Cas9 activity,” says Dr. Laurence Wilson whose research focuses on computational genome engineering.
Specifically, the off-target search is a compute-intensive task traditionally reserved for researchers at large institutes with high-performance compute infrastructure, as every location in the 3-billion-letter genomic sequence needs to be investigated. GT-Scan2 democratizes the ability to find optimal sites by offering this complex computation as a cloud service using AWS Lambda functions.
Scaling Instantaneously for Personalized Treatments
GT-Scan2 leverages the instantaneous scalability that the event-driven AWS Lambda service offers. This is crucial for personalized treatment, as the complexity of the targeted gene can vary dramatically.
“The off-target search as well as the robustness analysis can be subdivided into independent, modular tasks that can run in parallel,” says Aidan O’Brien, who designed and implemented the system within weeks of the service’s official Asia-Pacific launch in April this year at the AWS Summit 2016, attesting to the intuitive nature of the service. A typical job takes less than a minute, and the variation between jobs ranges from 1 second to 5 minutes. This fast fluctuation in load over minutes rather than hours ruled out an EC2-based solution, as new instances would come online too slowly to keep the runtime stable.
When a user submits a job, GT-Scan2 inserts the job parameters as an item into a DynamoDB table via an API call. This allows the solution to be freely scalable without creating a bottleneck. The database entry triggers the first Lambda function, which finds all putative CRISPR targets in the user-specified DNA sequence (fetched automatically upon user submission). Potential CRISPR target sites follow fixed rules, so they can be found easily with a regular expression that completes in seconds; the matches are then inserted into a second DynamoDB table.
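To make the regular expression step concrete, here is a minimal sketch. The post does not spell out the rule that GT-Scan2 uses, so this assumes the commonly cited rule for Cas9 from Streptococcus pyogenes: a 20-nucleotide target followed by an NGG motif (the PAM). The function name is hypothetical, and only the forward strand is searched.

```python
import re

# Assumed rule: 20 nucleotides followed by an NGG motif (N is any base).
# The lookahead lets overlapping candidate sites all be reported.
TARGET = re.compile(r"(?=([ACGT]{20}[ACGT]GG))")

def find_targets(sequence):
    """Return (position, site) pairs for every candidate target site."""
    sequence = sequence.upper()
    return [(m.start(), m.group(1)) for m in TARGET.finditer(sequence)]
```

Each pair could then become one item in the second table, ready for the off-target scoring that follows.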
Adapting to leverage the power of Lambda-based microservices
All potential targets need to be evaluated for their off-target risk using the efficient string matching tool, Bowtie. Though Bowtie only requires a reduced representation of the 3 billion letter genomic sequence, the sizes of these index files exceed the storage limitation for each Lambda instance. “GT-Scan2 divides the genome into smaller blocks to fit the Lambda specifications” explains Adrian White (Research & Technical Computing, APAC) who supported the CSIRO team during development. For an average run, GT-Scan2 hence triggers 500-1000 individual Lambda functions, which simultaneously update the scores for the different putative targets in DynamoDB. During this process, the frontend is polling this table via API Gateway and updating the webpage as results come in, eliminating the need for server-side compute.
“AWS’s Lambda has given us a great framework to develop a future-ready software package able to support medical genome engineering applications,” says Dr. Bauer. “We are specifically impressed with the ability to instantaneously scale at run time by spawning more Lambda functions to cope with the varying complexity of the different genes.” Other benefits Dr. Bauer cites include paying only for storage during idle periods, jobs that do not compete with web server resources (the website is a static page whose dynamic content is updated through Angular 2 and API Gateway), and not needing to maintain compute instances (for example, applying OS security patches).
“One of the best things about Lambda is that users will be able to easily swap-in different machine learning algorithms that are better suited for specific CRISPR applications” says Dr. Wilson.
The GT-Scan2 Team, from left, Denis Bauer, Laurence Wilson, Aidan O’Brien
“The computational genome engineering community is one of the early adopters of our AWS Lambda technology,” explains Dr. Mia Champion (Technical Business Development Manager, Scientific Computing). “GT-Scan2’s use of API Gateway and DynamoDB is a very neat solution to ensure scalability and their clever use of epigenomics really sets them apart from other recent applications using lambda to perform CRISPR searches. I am looking forward to seeing GT-Scan2 adopted in medical applications.”
I am a flâneur! I enjoy wandering around a new city, exploring the nooks and crannies, and figuring out what makes it unique and special. On one of my trips to Tokyo I was walking through Shibuya and found an amazing hobby store. The 8-floor building contained tools, supplies, and kits for almost every imaginable hobby. As you can read from the post below (written by my colleagues in Japan), this store, TOKYU HANDS, is now an AWS customer!
TOKYU HANDS improved its customer experience with an innovative retail point-of-sale and online shopping system. Here’s what Hideki Hasegawa-san (CTO) had to say about their new solution:
As a retailer, we are always conscious about costs and DynamoDB’s easy scalability makes it cost-effective to operate. For a few hundred dollars a month, we get a fast, highly-available and scalable database. We don’t need to spend any money on expensive hardware or personnel to operate the database.
TOKYU HANDS is one of Japan’s biggest and most popular retailers. It is a one-stop shop for zakka, Japanese stationery items, creative do-it-yourself (DIY) solutions, and other home items. TOKYU HANDS operates 40 stores across Japan and Singapore. In addition to its retail locations, TOKYU HANDS operates an online store where customers can shop 24/7 using a computer or a mobile device. To keep pace with its rapid growth and new business opportunities, TOKYU HANDS operates a fast, flexible and scalable IT system built entirely on Amazon Web Services (AWS).
Prior to AWS, TOKYU HANDS’ IT systems were located in its on-premises data center. Operating and scaling the data center became a significant burden. So TOKYU HANDS decided to go all-in on AWS and migrated its applications from the on-premises data center to the cloud. Offloading infrastructure management to AWS allowed TOKYU HANDS to focus on delivering more value to its customers. Hasegawa-san says, “I like AWS because we can spend time and resources innovating for our customers, and not on infrastructure management. AWS offers a wide variety of fully managed services that make it easy for us to architect our entire IT system. Amazon DynamoDB is one such service that is at the core of critical applications.”
TOKYU HANDS’ most important applications for customer experience are its e-commerce system and Point of Sale (POS) + Merchandising System. Its e-commerce system contains all of the business logic to keep the store running. Choosing the right database for the e-commerce system is the key to achieving the scalability, flexibility and customizability needed to match its pace of innovation. TOKYU HANDS’ POS + Merchandising System has a customer-facing application that processes customer orders at the store registers and for online purchases. Among other things, the POS + Merchandising System also keeps track of item inventory as well as customer purchase history. Rather than spending time on routine backend maintenance tasks, TOKYU HANDS wanted its software development team to focus on its customers’ experience by making the POS + Merchandising System better. With its previous architecture, developers spent an inordinate amount of time maintaining and operating the backend data store to support the POS + Merchandising System.
A telling example of the operational burden TOKYU HANDS endured was trying to scale the e-commerce system to handle traffic spikes during “Hands Messe”, an annual super sale similar to Black Friday. The Hands Messe sale generated several times more traffic than usual for TOKYU HANDS’ retail and online stores. In the past, scaling up the database system to handle the spike during Hands Messe was a time-consuming and difficult task. TOKYU HANDS spent a lot of time adding, configuring, and operating the hardware needed to handle the Hands Messe traffic. Often, they would experience node failures during the sale, resulting in system outages. Even when hardware didn’t fail, TOKYU HANDS experienced sub-optimal database performance. The net result was an inferior shopping experience for their customers and loss of revenue during the 2012 and 2013 seasons. For any retailer, such service interruptions erode customer confidence. As a result of their painful experience in managing an on-premises database, TOKYU HANDS began searching for a fully-managed database optimized for availability and scalability.
Here is the Tokyu Hands dev team:
Back row: Hideki Hasegawa, Toshiharu Ozawa, Yoshimitsu Sugawara, Minoru Saito, and Taiji Inoue.
Front row: Seigo Miyoshi, Yusuke Usui, and Manami Osawa.
Burden-free and Cost-effective Operations with DynamoDB
“My first encounter with DynamoDB was at AWS re:Invent. Through presentations and conversations with solution architects, we learned about DynamoDB’s capabilities and were intrigued by its high availability, scalability and worry-free operations. When I learned that core applications for Amazon.com and other AWS services were using DynamoDB, I decided to give it a try”, said Hasegawa-san, explaining how they started using Amazon DynamoDB.
In just a few months, TOKYU HANDS was able to re-architect its e-commerce system to use Amazon DynamoDB. In 2014, TOKYU HANDS put the new e-commerce system to the test with its annual Hands Messe sale. Amazon DynamoDB’s multiple availability zone architecture ensured their tables were highly available. TOKYU HANDS did not have to worry about system down-time resulting from node failures. TOKYU HANDS was also able to handle traffic spikes easily by scaling up DynamoDB throughput just before the sale. Unlike previous years, none of the database requests were rejected due to capacity constraints. After the sale ended, TOKYU HANDS dialed back throughput, thereby saving them money. TOKYU HANDS is considering purchasing reserved capacity for throughput in order to save even more.
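Dialing throughput up before a sale and back down afterwards amounts to a pair of UpdateTable calls. The sketch below only builds the request parameters; the table name `Orders` and the capacity figures are hypothetical (the post does not give TOKYU HANDS' actual settings), and the commented-out lines assume boto3 with configured credentials.

```python
def throughput_request(table_name, read_units, write_units):
    """Build the parameters for a DynamoDB UpdateTable call that
    changes a table's provisioned throughput."""
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }

# Hypothetical figures: scale up just before the sale...
before_sale = throughput_request("Orders", read_units=5000, write_units=2000)
# ...and dial back once it is over.
after_sale = throughput_request("Orders", read_units=500, write_units=200)

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("dynamodb").update_table(**before_sale)
```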
Easy Application Development with Amazon DynamoDB
Using Amazon DynamoDB together with a host of other AWS services including Amazon S3, Amazon SQS, and Amazon SNS, TOKYU HANDS was able to architect a brand new POS + Merchandising System. “Most of our team is made up of former clerks at our retail stores. They know our customers in and out, and we want them to use that knowledge to build powerful applications and not worry about infrastructure. With AWS, we are able to do just that.”
Here is a diagram of their architecture:
TOKYU HANDS has experienced three important benefits with AWS and Amazon DynamoDB:
- Cost effective operations
- Hands-off, worry-free operations
- Easy development of high-availability applications
See how your business can leverage this fully managed AWS NoSQL service to achieve cost-effective scale by visiting our DynamoDB Getting Started and Developer Resources pages. Then head over to the FAQs to learn how to leverage DynamoDB Streams and Triggers using serverless programming with AWS Lambda.
— AWS Japan Solution Architects
When you pull the curtain back on an AWS-powered application, you’ll find that a lot is happening behind the scenes. EC2 instances are launched and terminated by Auto Scaling policies in response to changes in system load, Amazon DynamoDB tables, Amazon SNS topics and Amazon SQS queues are created and deleted, and attributes of existing resources are changed from the AWS Management Console, the AWS APIs, or the AWS Command Line Interface (CLI).
Many of our customers build their own high-level tools to track, monitor, and control the overall state of their AWS environments. Up until now, these tools have worked in a polling fashion. In other words, they periodically call AWS functions such as DescribeInstances, DescribeVolumes, and ListQueues to list the AWS resources of various types (EC2 instances, EBS volumes, and SQS queues here) and to track their state. Once they have these lists, they need to call other APIs to get additional state information for each resource, compare it against historical data to detect changes, and then take action as they see fit. As their systems grow larger and more complex, all of this polling and state tracking can become onerous.
New CloudWatch Events
In order to allow you to track changes to your AWS resources with less overhead and greater efficiency, we are introducing CloudWatch Events today.
CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. Using simple rules that you can set up in a couple of minutes, you can easily route each type of event to one or more targets: AWS Lambda functions, Amazon Kinesis streams, Amazon SNS topics, and built-in targets.
You can think of CloudWatch Events as the central nervous system for your AWS environment. It is wired in to every nook and cranny of the supported services, and becomes aware of operational changes as they happen. Then, driven by your rules, it activates functions and sends messages (activating muscles, if you will) to respond to the environment, making changes, capturing state information, or taking corrective action.
We are launching CloudWatch Events with an initial set of AWS services and events today, and plan to support many more over the next year or so.
Diving in to CloudWatch Events
The three main components that you need to know about are events, rules, and targets.
Events (represented as small blobs of JSON) are generated in four ways. First, they arise from within AWS when resources change state. For example, an event is generated when the state of an EC2 instance changes from pending to running or when Auto Scaling launches an instance. Second, events are generated by API calls and console sign-ins that are delivered to Amazon CloudWatch Events via CloudTrail. Third, your own code can generate application-level events and publish them to Amazon CloudWatch Events for processing. Fourth, they can be issued on a scheduled basis, with options for periodic or Cron-style scheduling.
Rules match incoming events and route them to one or more targets for processing. Rules are not processed in any particular order; all of the rules that match an event will be processed (this allows disparate parts of a single organization to independently look for and process events that are of interest).
Targets process events and are specified within rules. There are four initial target types: built-in, Lambda functions, Kinesis streams, and SNS topics, with more types on the drawing board. A single rule can specify multiple targets. Each event is passed to each target in JSON form. Each rule has the opportunity to customize the JSON that flows to the target: it can pass the event as-is, pass only certain keys (and the associated values), or pass a constant (literal) string.
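In API terms, these three options correspond to the optional `InputPath` and `Input` fields on each target passed to PutTargets. The sketch below builds target entries for each case. The helper function, the target IDs, and the ARN (which uses a placeholder account ID and function name) are hypothetical.

```python
import json

# Placeholder ARN, for illustration only.
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:MyFunction"

def make_target(target_id, arn, input_path=None, constant=None):
    """Build one entry for the Targets list of a PutTargets call."""
    if input_path is not None and constant is not None:
        raise ValueError("choose either input_path or constant, not both")
    entry = {"Id": target_id, "Arn": arn}
    if input_path is not None:
        # Pass only the part of the event selected by this JSONPath.
        entry["InputPath"] = input_path
    if constant is not None:
        # Pass a constant; Input must be valid JSON text.
        entry["Input"] = json.dumps(constant)
    return entry

as_is = make_target("whole-event", FUNCTION_ARN)
detail_only = make_target("detail-only", FUNCTION_ARN, input_path="$.detail")
literal = make_target("constant", FUNCTION_ARN, constant="instance launched")
```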
CloudWatch Events in Action
Let’s go ahead and set up a rule or two! I’ll use a simple Lambda function called SomethingHappened. It will simply log the contents of the event:
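A function along these lines could be as small as the following sketch. It is written in Python as an assumption (the post does not say which runtime was used), and `lambda_handler` is simply the conventional handler name. Anything printed by a Lambda function ends up in CloudWatch Logs.

```python
import json

def lambda_handler(event, context):
    """Log the full event; printed output is captured by CloudWatch Logs."""
    serialized = json.dumps(event, indent=2, sort_keys=True)
    print(serialized)
    return serialized
```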
Next, I switch to the new CloudWatch Events Console, click on Create rule and choose an event source (here’s the menu with all of the choices):
Just a quick note before going forward. Some of the AWS services fire events directly. Others are fired based on the events logged to CloudTrail; you’ll need to enable CloudTrail for the desired service(s) in order to receive them.
I want to keep tabs on my EC2 instances, so I choose EC2 from the menu. I can choose to create a rule that fires on any state transition, or on a transition to one or more states that are of interest:
I want to know about newly launched instances, so I’ll choose Running. I can make the rule respond to any of my instances in the region, or to specific instances. I’ll go with the first option; here’s my pattern:
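For reference, a pattern for this choice looks roughly like the dictionary below when expressed in Python. The small `matches` function is a deliberately simplified, local illustration of how a pattern is compared to an event (every key in the pattern must appear in the event with one of the listed values). The real service supports more than this, and the sample instance ID is made up.

```python
import json

# Match EC2 state-change events for any instance entering "running".
pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running"]},
}

def matches(pattern, event):
    """Simplified matcher: nested dicts recurse, lists mean 'any of'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True

# A trimmed-down sample event with a made-up instance ID.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}

print(json.dumps(pattern, indent=2))
```

Adding an `instance-id` list under `detail` would narrow the rule to specific instances, which is the second option mentioned above.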
Now I need to make something happen. I do this by picking a target. Again, here are my choices:
I simply choose Lambda and pick my function:
I’m almost there! I just need to name and describe my rule, and then click on Create rule:
I click on Create rule and the rule is all set to go:
Now I can test it by launching an EC2 instance. In fact, I’ll launch 5 of them just to exercise my code! After waiting a minute or so for the instances to launch and to initialize, I can check my Lambda metrics to verify that my function was invoked:
This looks good (the earlier invocations were for testing). Then I can visit the CloudWatch logs to view the output from my function:
As you can see, the event contains essential information about the newly launched instance. Your code can call AWS functions in order to learn more about what’s going on. For example, you could call DescribeInstances to access more information about newly launched instances.
Clearly, a “real” function would do something a lot more interesting. It could add some mandatory tags to the instance, update a dynamic visualization, or send me a text message via SNS. If you want to do any (or all of these things), you would need to have a more permissive IAM role for the function, of course. I could make the rule more general (or create another one) if I wanted to capture some of the other state transitions.
Scheduled Execution of Rules
I can also set up a rule that fires periodically or according to a pattern described in a Cron expression. Here’s how I would do that:
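The same thing can be expressed through the API by giving PutRule a schedule expression instead of an event pattern. The sketch below shows both styles; the rule names are hypothetical, and the commented-out line assumes boto3 with configured credentials. Cron expressions here have six fields (minutes, hours, day of month, month, day of week, year) and are evaluated in UTC.

```python
def scheduled_rule(name, schedule_expression):
    """Build the parameters for a PutRule call that fires on a schedule."""
    return {
        "Name": name,
        "ScheduleExpression": schedule_expression,
        "State": "ENABLED",
    }

# Periodic: fire every five minutes.
every_five_minutes = scheduled_rule("every-five-minutes", "rate(5 minutes)")

# Cron-style: fire at 08:00 UTC, Monday through Friday.
weekday_mornings = scheduled_rule("weekday-mornings", "cron(0 8 ? * MON-FRI *)")

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("events").put_rule(**every_five_minutes)
```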
You might find it interesting to know that this is the underlying mechanism used to set up scheduled Lambda jobs, as announced at AWS re:Invent.
As with most AWS services, you can access CloudWatch Events through an API. Here are some of the principal functions:
- PutRule to create a new rule.
- PutTargets and RemoveTargets to connect targets to rules, and to disconnect them.
- DescribeRule to find out more about existing rules.
- PutEvents to submit a set of events to CloudWatch Events. You can use this function (or the CLI equivalent) to submit application-level events.
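As an illustration of the last item, here is a sketch of an application-level event prepared for PutEvents. The source name `com.example.orders`, the detail type, and the payload are all made up; the `Detail` field must be a JSON string, which is why the payload is serialized first.

```python
import json

def application_event(source, detail_type, detail):
    """Build one entry for the Entries list of a PutEvents call."""
    return {
        "Source": source,
        "DetailType": detail_type,
        # Detail is carried as JSON text, not as a nested structure.
        "Detail": json.dumps(detail),
    }

# Hypothetical application event.
entry = application_event(
    "com.example.orders",
    "Order Placed",
    {"order_id": "1234", "total": 42.5},
)

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("events").put_events(Entries=[entry])
```

A rule whose pattern lists `com.example.orders` under `source` would then route these events just like the ones that AWS generates.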
Metrics for Events
CloudWatch Events reports a number of metrics to CloudWatch, all within the AWS/Events namespace. You can use these metrics to verify that your rules are firing as expected, and to track the overall activity level of your rule collection.
The following metrics are reported for the service as a whole:
- Invocations – The number of times that targets have been invoked.
- FailedInvocations – The number of times that an invocation of a target failed.
- MatchedEvents – The number of events that matched one or more rules.
- TriggeredRules – The number of rules that have been triggered.
The following metrics are reported for each rule:
- Invocations – The number of times that the rule’s targets have been invoked.
- TriggeredRules – The number of times that the rule has been triggered.
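To check one of the per-rule metrics from code, you could query CloudWatch with the rule name as a dimension. The sketch below builds the parameters for a GetMetricStatistics call; the rule name is hypothetical, the five-minute period is an arbitrary choice, and the commented-out line assumes boto3 with configured credentials.

```python
from datetime import datetime, timedelta, timezone

def rule_metric_query(rule_name, metric_name, hours=1, now=None):
    """Build GetMetricStatistics parameters for one rule's metric."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Events",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "RuleName", "Value": rule_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,          # five-minute buckets
        "Statistics": ["Sum"],
    }

# Hypothetical rule name.
query = rule_metric_query("ec2-running", "Invocations", hours=2)

# With boto3 installed and credentials configured:
#   import boto3
#   boto3.client("cloudwatch").get_metric_statistics(**query)
```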
In the Works
As with many emerging AWS services, we are launching CloudWatch Events with an initial set of features (and a lot of infrastructure behind the scenes) and some really big plans, including AWS CloudFormation support. We’ll adjust our plans based on your feedback, but you can expect coverage of many more AWS services and access to additional targets over time. I’ll do my best to keep you informed.
We are launching CloudWatch Events in the US East (Northern Virginia), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) regions. It is available now and you can start using it today!
Graph databases elegantly and efficiently represent entities (generally known as vertices or nodes) and relationships (edges) that connect them. Here’s a very simple example of a graph:
Bill and Candace have a daughter named Janet, and she has a son named Bob. This makes Candace Bob’s grandmother, and Bill his grandfather.
Once a graph has been built, it is processed by traversing the edges between the vertices. In the graph above, we could traverse from Bill to Janet, and from there to Bob. Graphs can be used to model social networks (friends and “likes”), business relationships (companies, employees, partners, suppliers, and customers), dependencies, and so forth. Both vertices and edges can be typed; some vertices could be people as in our example, and others places. Similarly, some edges could denote (as above) familial relationships and others could denote “likes.” Graph databases typically allow additional information to be attached to each vertex and to each edge, often in the form of name-value pairs.
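To make the idea of traversal concrete, here is the family graph as a tiny in-memory sketch in plain Python. It is not how Titan or any graph database stores data; the edge label `parent-of` and the helper names are invented for the example.

```python
# Each edge is (source vertex, edge label, destination vertex).
edges = [
    ("Bill", "parent-of", "Janet"),
    ("Candace", "parent-of", "Janet"),
    ("Janet", "parent-of", "Bob"),
]

def out(vertex, label):
    """Follow outgoing edges with the given label from a vertex."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]

def grandchildren(vertex):
    """Traverse two 'parent-of' edges in a row."""
    return [gc for child in out(vertex, "parent-of")
            for gc in out(child, "parent-of")]
```

Calling `grandchildren("Bill")` walks from Bill to Janet and from there to Bob, which is the traversal described above.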
Titan is a scalable graph database that is optimized for storing and querying graphs that contain hundreds of billions of vertices and edges. It is transactional, and can support concurrent access from thousands of users.
DynamoDB Storage Backend for Titan
Titan’s pluggable data storage layer already supports several NoSQL databases and key-value stores. This allows you to choose the backend that provides the performance and features required by your application, while giving you the freedom to switch from one backend to another with minimal changes to your application code.
Today we are making a new DynamoDB Storage Backend for Titan available. Storing your Titan graphs in Amazon DynamoDB lets you scale to handle huge graphs without having to worry about building, running, or maintaining your own database cluster. Because DynamoDB can scale to any size and provides high data availability and predictable performance, you can focus on your application instead of on your graph storage and processing infrastructure. You can also run Titan and DynamoDB Local on your laptop for development and testing.
The backend works with versions 0.4.4 and 0.5.4 of Titan. Both versions support fast traversals, edges that are both directed and typed, and stored relationships. The newer version adds support for vertex partitioning, vertex labels, and user defined transaction logs. The backend is client-based; we did not make any changes to DynamoDB to support it. You are simply using DynamoDB as an efficient way to store your Titan graphs.
Version 0.4.4 of Titan is compatible with version 2.4 of the TinkerPop stack; version 0.5.4 of Titan is compatible with version 2.5 of the stack. TinkerPop is a collection of tools and algorithms that provides you with even more in the way of graph processing and analysis options.
Since I am talking about graphs, I should illustrate all of the items that I have talked about in the form of a graph! Here you go:
My colleague Alex Patrikalakis created the following Gremlin script. It replicates the graph above using Titan and DynamoDB:
conf = new BaseConfiguration()
conf.setProperty("storage.backend", "com.amazon.titan.diskstorage.dynamodb.DynamoDBStoreManager")
conf.setProperty("storage.dynamodb.client.endpoint", "http://localhost:4567")
g = TitanFactory.open(conf)
titan = g.addVertex(null, [name:"Titan"])
blueprints = g.addVertex(null, [name:"Blueprints"])
pipes = g.addVertex(null, [name:"Pipes"])
gremlin = g.addVertex(null, [name:"Gremlin"])
frames = g.addVertex(null, [name:"Frames"])
furnace = g.addVertex(null, [name:"Furnace"])
rexster = g.addVertex(null, [name:"Rexster"])
DynamoDBStorageBackend = g.addVertex(null, [name:"DynamoDB Storage Backend for Titan"])
DynamoDBLocal = g.addVertex(null, [name:"DynamoDB Local"])
DynamoDB = g.addVertex(null, [name:"DynamoDB"])
g.addEdge(titan, blueprints, "implements")
g.addEdge(pipes, blueprints, "builds-on")
g.addEdge(gremlin, blueprints, "builds-on")
g.addEdge(frames, blueprints, "builds-on")
g.addEdge(furnace, blueprints, "builds-on")
g.addEdge(rexster, blueprints, "builds-on")
g.addEdge(titan, DynamoDBStorageBackend, "backed-by")
g.addEdge(DynamoDBStorageBackend, DynamoDBLocal, "connects-to")
g.addEdge(DynamoDBStorageBackend, DynamoDB, "connects-to")
g.commit()
The DynamoDB Storage Backend for Titan is available as a Maven project on GitHub. It runs on Windows, OS X, and Linux and requires Maven and Java 1.7 (or later). The project includes installation instructions and an example that makes creative use of the Marvel Universe Social Graph public dataset. We have also created a CloudFormation template that will launch an EC2 instance that has the Titan/Rexster stack and the DynamoDB Storage Backend for Titan installed and ready to use.
When I take a look at our recent service releases and combine them with some of the AWS-related repos on GitHub, an interesting trend becomes apparent. It seems to me that connecting AWS services to each other and to third party tools is becoming more and more common. For example, in a recent post, I showed you how to combine CloudWatch Logs, Elasticsearch, and Kibana to visualize event data. In another post I showed you how AWS OpsWorks can provision and manage ECS container instances. As I noted in that post, I think of this as “peanut butter and chocolate”: combining two good flavors into another that is even better.
DynamoDB + Elasticsearch
Today I would like to tell you about another way to connect several interesting pieces of technology together: DynamoDB and Elasticsearch by way of a DynamoDB Streams connector (plugin) for Logstash.
You can run Logstash on an Amazon Elastic Compute Cloud (EC2) instance or on-premises. After you configure it to take input from the DynamoDB tables and streams that you designate, it will track changes (inserts, updates, and deletions) to the tables and update your Elasticsearch cluster accordingly. You can also configure the plugin to scan the table (with checkpoints along the way) to bring your cluster into sync with the table.
Once your cluster is up, running, and tracking changes to your tables, you can perform efficient queries (structured, full-text, and multifield) using Elasticsearch. Your queries can make use of proximity matching and partial matching, and you can also control relevance using Elasticsearch’s scoring infrastructure (you can learn about these topics and more in Elasticsearch: The Definitive Guide).
Changes made to your DynamoDB tables are reflected in the stream very quickly (generally a second or less). The plugin will have access to these changes and will update your cluster as expeditiously as possible.
Download, Install, and Run
You can download the plugin from our new DynamoDB Community page, install it on an EC2 instance or on-premises, point it at your Elasticsearch cluster, and start searching your DynamoDB content today! Start by reading the documentation and following the directions in the README.
I’ve got some really good news for Amazon DynamoDB users! First, the DynamoDB Streams feature is now available and you can start using it today. As you will see from this blog post, it is now very easy to use AWS Lambda to process the change records from a stream. Second, we are making it really easy for you to replicate content from one DynamoDB table to another, either across regions or within a region.
Let’s dig in!
We launched a sneak preview of DynamoDB Streams last fall, just a couple of days before AWS re:Invent. As I wrote at the time, we built this feature because many AWS customers expressed a desire to be able to track the changes made to their DynamoDB tables.
DynamoDB Streams is now ready for production use. Once you enable it for a table, all changes (puts, updates, and deletes) are tracked on a rolling 24-hour basis and made available in near real-time as stream records. Multiple stream records are grouped into shards and returned as a unit for faster and more efficient processing.
The relative ordering of a sequence of changes made to a single primary key will be preserved within a shard. Further, a given key will be present in at most one of a set of sibling shards that are active at a given point in time. As a result, your code can simply process the stream records within a shard in order to accurately track changes to an item.
Your code can retrieve the shards, iterate through the records, and process them in any desired way. The records can be retrieved at approximately twice the rate of the table’s provisioned write capacity.
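The retrieve-and-iterate loop described above can be sketched roughly as follows. This is an illustrative outline rather than the Streams client library: it assumes a client object that exposes get_shard_iterator and get_records in the style of the AWS SDK for Python, and the helper name read_shard, the handle callback, and the max_empty_polls cutoff are all made up for this example.

```python
def read_shard(streams_client, stream_arn, shard_id, handle, max_empty_polls=3):
    """Read one shard from its oldest record onward, calling handle() in order.

    streams_client is assumed to behave like an SDK DynamoDB Streams client;
    handle is any callable that accepts a single stream record (hypothetical).
    """
    iterator = streams_client.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest available record
    )["ShardIterator"]

    processed = 0
    empty_polls = 0
    while iterator and empty_polls < max_empty_polls:
        page = streams_client.get_records(ShardIterator=iterator, Limit=100)
        records = page.get("Records", [])
        empty_polls = 0 if records else empty_polls + 1
        for record in records:
            # Records arrive in order within the shard, so per-key changes
            # are handled in the order they were made.
            handle(record)
            processed += 1
        # The next iterator is missing once a closed shard has been drained.
        iterator = page.get("NextShardIterator")
    return processed
```

Processing each shard sequentially like this is what lets the per-key ordering guarantee carry through to your own code; a real consumer would also need to discover shards and follow parent and child shards as they split.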
You can enable streams for a table at creation time by supplying a stream specification parameter when you call CreateTable. You can also enable streams for an existing table by supplying a similar specification to UpdateTable. In either case, the specification must include a flag (enable or disable streams), and a view type (store and return item keys only, new image only, old image only, or both new and old images).
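As a rough illustration of the flag and view type, the sketch below builds the StreamSpecification portion of an UpdateTable request as a plain dictionary. The helper name stream_specification is invented for this example, and user_table is only a placeholder table name; the four view type strings are the values the API accepts.

```python
# The four view types accepted in a stream specification.
VIEW_TYPES = {"KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES"}


def stream_specification(enabled, view_type=None):
    """Build a StreamSpecification value (helper name is hypothetical)."""
    if not enabled:
        return {"StreamEnabled": False}
    if view_type not in VIEW_TYPES:
        raise ValueError("view_type must be one of %s" % sorted(VIEW_TYPES))
    return {"StreamEnabled": True, "StreamViewType": view_type}


# Parameters for an UpdateTable call on an existing table (placeholder name).
update_table_request = {
    "TableName": "user_table",
    "StreamSpecification": stream_specification(True, "NEW_AND_OLD_IMAGES"),
}
```

With an SDK client in hand, a dictionary like this would typically be passed as keyword arguments to the update-table call; the same StreamSpecification value can be supplied to CreateTable alongside the usual key schema and throughput settings.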
Read the new DynamoDB Streams Developer Guide to learn more about this new feature.
You can create DynamoDB Streams on your DynamoDB tables at no charge. You pay only for reading data from your Streams. Reads are measured as read request units; each call to GetRecords is billed as a single request unit and can return up to 1 MB of data. See the DynamoDB Pricing page for more info.
DynamoDB Streams + Lambda = Database Triggers
AWS Lambda makes it easy for you to write, host, and run code (currently Node.js and Java) in the cloud without having to worry about fault tolerance or scaling, all on a very economical basis (you pay only for the compute time used to run your code, in 100 millisecond increments).
As the centerpiece of today’s launch of DynamoDB Streams in production status, we are also making it easy for you to use Lambda to process stream records without writing a lot of code or worrying about scalability as your tables grow larger and busier.
You can think of the combination of Streams and Lambda as a clean and lightweight way to implement database triggers, NoSQL style! Historically, relational database triggers were implemented within the database engine itself. As such, the repertoire of possible responses to an operation is limited to the operations defined by the engine. Using Lambda to implement the actions associated with the triggers (inserting, deleting, and changing table items) is far more powerful and significantly more expressive. You can write simple code to analyze changes (by comparing the new and the old item images), initiate updates to other forms of data, enforce business rules, or activate synchronous or asynchronous business logic. You can allow Lambda to manage the hosting and the scaling so that you can focus on the unique and valuable parts of your application.
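To make the "compare the new and the old item images" idea concrete, here is a small sketch of a trigger function. It is not the console blueprint, and it is written in Python purely for illustration (an assumption, since the text above lists Node.js and Java); the function names are made up. It relies only on the documented shape of a stream event: a Records list whose entries carry eventName and a dynamodb section with Keys and, depending on the view type, OldImage and NewImage.

```python
def summarize_change(record):
    """Describe one stream record: what happened and which attributes differ."""
    change = record["dynamodb"]
    old_image = change.get("OldImage", {})  # absent for INSERT events
    new_image = change.get("NewImage", {})  # absent for REMOVE events
    changed = sorted(
        name
        for name in set(old_image) | set(new_image)
        if old_image.get(name) != new_image.get(name)
    )
    return {
        "event": record["eventName"],  # INSERT, MODIFY, or REMOVE
        "keys": change["Keys"],
        "changed_attributes": changed,
    }


def lambda_handler(event, context):
    """Entry point: summarize every record in the batch and log the result."""
    summaries = [summarize_change(record) for record in event["Records"]]
    for summary in summaries:
        print(summary)
    return {"processed": len(summaries)}
```

A function like this only sees both images when the stream's view type is set to return new and old images; with a keys-only stream the changed_attributes list would always be empty.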
Getting set up to run your own code to handle changes is really easy. Let’s take a quick walk-through using a new table. After I create an invocation role for Lambda (so that it can access DynamoDB on my behalf), I open up the Lambda Console and click on Create a Lambda function. Then I choose the blueprint labeled dynamodb-process-stream:
Each blueprint configures an event source and a skeletal Lambda function to get you started. The Console prompts me to configure the event source. I connect it to one of my DynamoDB tables (user_table), indicate that my code can handle batches of up to 100 stream records, and that I want to process new records (I could also choose to process existing records dating back to the stream’s trim horizon):
The blueprint includes a function that I can use as-is for testing purposes; I simply give it a name (ProcessUserTableRecords) and choose an IAM role so that the function can access DynamoDB:
Now I confirm my intent. I will enable the event source (for real development you might want to defer this until after you have written and tested your code):
Clicking Create function will create the function and use my table’s update stream as an event source. I can see the status of this and the other event sources on the Event sources tab in the Lambda Console:
Ok, I am all set. At this point I have a function, it is connected to my table’s update stream, and it is ready to process records! To test this out I switch to the DynamoDB Console and insert a couple of items into my table in order to generate some activity on the stream:
Then I go back to the Lambda Console (browser tabs make all of this really easy, of course) and verify that everything worked as expected. A quick glance at the Monitoring tab confirms that my function ran twice, with no apparent errors:
That looks good, so I inspect the CloudWatch Logs for the function to learn more:
If I were building a real application, I could start with the code provided by the blueprint and add more functionality from there.
AWS customer Mapbox is already making use of DynamoDB Streams and Lambda. Take a look at their new blog post, Scaling the Mapbox Infrastructure with DynamoDB Streams.
To learn more about how to use DynamoDB and Lambda together, read the documentation on Using DynamoDB Streams and AWS Lambda. There is no charge for DynamoDB Triggers; you pay the usual rates for the execution of your Lambda functions (see the Lambda Pricing page for more information).
I believe that this new feature will allow you to make your applications simpler, more powerful, and more responsive. Let me know what you build!
Cross-Region DynamoDB Replication
As an example of what can be done with the new DynamoDB Streams feature, we are also releasing a new cross-region replication app for DynamoDB. This application makes use of the DynamoDB Cross Region Replication library that we published last year (you can also use this library as part of your own applications, of course).
You can use replication to duplicate your DynamoDB data across regions for several different reasons including disaster recovery and low-latency access from multiple locations. As you’ll see, the app makes it easy for you to set up and maintain replicas.
You can initiate the launch process from within the DynamoDB Console. CloudFormation will prompt you for the information that it needs to have in order to create the stack and the containers:
Give the stack (a collective name for the set of AWS resources launched by the template) a name and then click on Next. Then fill in the parameters (you can leave most of these at their default values):
The Metadata table contains the information that the replicator needs to have in order to know which tables to replicate and where the replicas are to be stored. After you launch the replication app you can access its online configuration page (the CloudFormation template will produce a URL) and set things up:
The replication app itself is available to you at no charge. You will be charged only for the resources that it uses (provisioned throughput and storage for the replica tables, data transfer between regions, reading data from the Streams, the EC2 instances, and the SQS queue that is used to control the application). See the DynamoDB Pricing page for more information.
Read about Cross Region Replication to learn how to set everything up!
Thousands of customers use Amazon DynamoDB to build popular applications for Gaming (Battle Camp), Mobile (The Simpsons Tapped Out), Ad-tech (AdRoll), Internet-of-Things (Earth Networks) and Modern Web applications (SmugMug).
We have made some improvements to DynamoDB in order to make it more powerful and easier to use. Here’s what’s new:
- You can now add, edit, and retrieve native JSON documents in the AWS Management Console.
- You can now use a friendly key condition expression to filter the data returned from a query by specifying a logical condition that must be met by the table's hash key, or by its hash and range keys.
Let’s take a closer look!
Native JSON Editing
As you may know, DynamoDB already has support for storage, display, and editing of JSON documents (see my previous post, Amazon DynamoDB Update – JSON, Expanded Free Tier, Flexible Scaling, Larger Items if this is news to you). You can store entire JSON-formatted documents (each up to 400 KB) as single DynamoDB items. This support is implemented within the AWS SDKs and lets you use DynamoDB as a full-fledged document store (a very common use case).
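For readers who have not seen the two representations side by side: in DynamoDB's own format every value is wrapped in a one-key map that names its type (S for string, N for number, BOOL, NULL, L for list, M for map). The sketch below converts a native JSON document into that typed form. The function names are made up for this example and the SDKs ship their own converters, so treat this as an illustration of the mapping rather than the SDK implementation; sets and binary values are left out.

```python
import json


def to_dynamodb_json(value):
    """Wrap a native JSON value in DynamoDB's typed attribute-value format."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):  # must be checked before int: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are carried as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_dynamodb_json(element) for element in value]}
    if isinstance(value, dict):
        return {"M": {key: to_dynamodb_json(element) for key, element in value.items()}}
    raise TypeError("unsupported type: %r" % type(value))


def item_to_dynamodb_json(document):
    """Convert a top-level JSON object into an item (attribute name -> typed value)."""
    return {name: to_dynamodb_json(value) for name, value in document.items()}


native = json.loads('{"name": "Luke", "age": 23, "jedi": true, "ships": ["X-wing"]}')
typed_item = item_to_dynamodb_json(native)
```

Note that the top level of an item is a plain map of attribute names, while nested objects are wrapped in M.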
You already have the ability to add, edit, and display JSON documents in the console in DynamoDB’s internal format. Here’s what this looks like:
Today we are adding support for adding, editing, and displaying documents in native JSON format. Here’s what the data from the example above looks like in this format:
You can work with the data in DynamoDB format by clicking on DynamoDB JSON. You can enter (or paste) JSON directly when you are creating a new item:
You can also view and edit the same information in structured form:
Key Condition Expressions
You already have the ability to specify a key condition when you call DynamoDB’s Query function. If you do not specify a condition, all of the items that match the given hash key will be returned. If you specify a condition, only items that meet the criteria that it specifies will be returned. For example, you could choose to retrieve all customers in Zip Code 98074 (the hash key) that have a last name that begins with “Ba.”
With today’s release, we are adding support for a new and easier to use expression-style syntax for the key conditions. You can now use the following expression to specify the query that I described in the preceding paragraph:
zip_code = "98074" and begins_with(last_name, "Ba")
The expression can include comparison operators (=, <, <=, >, >=), range tests (BETWEEN/AND), and prefix tests (begins_with). You can specify a key condition (the KeyConditions parameter) or a key condition expression (the KeyConditionExpression parameter) on a given call to the Query function, but you cannot specify both. We recommend the use of expressions for new applications. To learn more, read about Key Condition Expressions in the DynamoDB API Reference.
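Here is roughly how the zip code query might look when it is expressed as Query parameters. One detail to be aware of: in the API, literal values are not written inline in the expression; they are supplied through placeholders (names beginning with a colon) in ExpressionAttributeValues. The table name customers and the helper name below are made up for this sketch; zip_code and last_name are the attribute names from the example above.

```python
def customers_by_zip_and_prefix(table_name, zip_code, last_name_prefix):
    """Build Query parameters: exact match on the hash key, prefix match on the range key."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "zip_code = :zip AND begins_with(last_name, :prefix)",
        # Placeholders keep values out of the expression text; values use the
        # typed attribute-value format (S marks a string).
        "ExpressionAttributeValues": {
            ":zip": {"S": zip_code},
            ":prefix": {"S": last_name_prefix},
        },
    }


query_request = customers_by_zip_and_prefix("customers", "98074", "Ba")
```

A request like this uses only KeyConditionExpression, in line with the rule that the older key condition parameter and the expression cannot be combined in a single call.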
These features are available now and you can start using them today!
Let’s say you are building a data-intensive application. It might be for gaming, mobile, Internet of Things (IoT) or simply a modern web application. You are customer-driven and you derive satisfaction from being able to rapidly respond to the ever-changing needs of your users. Their requirements are quickly evolving and the application needs to evolve just as fast. One day they’ll ask you to start storing additional information; the next they’ll ask you for better retrieval functions keyed off of that new information.
Amazon DynamoDB is a great fit for an application built in the environment that I just described. Unlike traditional relational databases, DynamoDB provides a fast and flexible database that has no schema and supports JSON objects. This means that you can begin to store additional information (new attributes and values) as the data becomes available without having to alter the table definition or modify the existing items. As you receive requests to retrieve this information in new ways, you can use Global Secondary Indexes for flexibility and efficiency (you can even create them online with no down time).
As you may know, you have three query options for your DynamoDB tables:
- You can use a GetItem operation to retrieve a specific item,
- You can use a Query operation to retrieve specific items based on conditions, or
- You can use a Scan operation to retrieve all items.
Secondary Index Scans
Today we are giving you the ability to perform Scan operations on your table’s local and global secondary indexes. This will allow you to efficiently retrieve the precise set of attributes that you have projected into the index. You can also apply a filter to the results in order to retrieve only the items that meet one or more conditions. Because you can choose to project any desired subset of the table’s attributes into the index, you can create slim indexes that will make efficient use of your provisioned read throughput. The effect of this feature will become more and more pronounced as you add additional attributes to some or all of the items in the table.
You can use this new feature from the DynamoDB API by including the name of the desired index in your call to the Scan function. You can scan local or global secondary indexes. Scans on local secondary indexes can ask for non-projected attributes; scans on global secondary indexes cannot. The indexed attribute determines the order of the results returned by a scan operation.
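As a sketch of what an index scan might look like from code, the helpers below build Scan parameters that name an index and then page through the results. The table name, index name, and birth_year attribute are placeholders invented for this example, and the client is assumed to expose a scan method in the style of the AWS SDK for Python.

```python
def index_scan_request(table_name, index_name, filter_expression=None, values=None):
    """Build Scan parameters that target a secondary index instead of the table."""
    request = {"TableName": table_name, "IndexName": index_name}
    if filter_expression:
        # The filter is applied after items are read, so it trims the result
        # set but does not reduce the read capacity consumed.
        request["FilterExpression"] = filter_expression
        request["ExpressionAttributeValues"] = values or {}
    return request


def scan_all(client, request):
    """Follow LastEvaluatedKey until the scan has covered the whole index."""
    items = []
    start_key = None
    while True:
        page_request = dict(request)
        if start_key:
            page_request["ExclusiveStartKey"] = start_key
        page = client.scan(**page_request)
        items.extend(page.get("Items", []))
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return items


# Hypothetical names: a "people" table with an index on birth_year.
request = index_scan_request(
    "people", "birth_year-index", "birth_year > :year", {":year": {"N": "30"}}
)
```

Each page comes back with only the attributes projected into the index, which is what keeps a slim index cheap to scan.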
The AWS Management Console now supports scanning on secondary indexes. Suppose I have the following items in my table:
And the following Global Secondary Indexes:
I can scan on an index by clicking on Explore Table, choosing the desired index from the dropdown menu, and clicking on Start New Scan:
As you can see, the results contain only the attributes that I projected into the index. I can further qualify the scan by adding a filter:
This feature is available now and you can start using it today! The cost of a secondary index scan (measured in read capacity units) is identical to the cost of a query on the index. Local index scans that do not filter on or request non-projected attributes cost the same as a regular scan on the same table.
PS – In my sample data, birth dates for Luke and his father are based on the Tho Yor Arrival (in case you were wondering).
Developers all over the world are using Amazon DynamoDB to build applications that take advantage of its ability to provide consistent low-latency performance. The developers that I have talked to enjoy the flexibility provided by DynamoDB’s schemaless model, along with the ability to scale capacity up and down as needed. They also benefit from the DynamoDB Reserved Capacity model in situations where they are able to forecast their need for read and write throughput ahead of time.
A little over a year ago we made DynamoDB more flexible by adding support for Global Secondary Indexes. This important feature moved DynamoDB far beyond its roots as a key-value store by allowing lookups on attributes other than the primary key.
Today we are making Global Secondary Indexes even more flexible by giving you the ability to add and delete them from existing tables on the fly.
We are also making it easier for you to purchase Reserved Capacity directly from the AWS Management Console. As part of this change to a self-service model, you can now purchase more modest amounts of Reserved Capacity than ever before.
Let’s zoom in for a closer look!
Global Secondary Indexes on the Fly
Up until now you had to define the Global Secondary Indexes for each of your DynamoDB tables at the time you created the table. This static model worked well in situations where you fully understood your data model and had a good sense of the kinds of queries that you needed to use to build your application.
DynamoDB’s schemaless model means that you can add new attributes to an existing table by simply storing them. Perhaps your original table stored a first name, a last name, and an email address. Later, you decided to make your application location-aware by adding a zip code. With today’s release you can add a Global Secondary Index to the existing table. Even better, you can do this without taking the application offline or impacting the overall throughput of the table.
Here’s how you add a new index using the AWS Management Console. First, select the table and click on Create Index:
Then enter the details (you can use a hash key or a combination of a hash key and a range key):
The index will be created and ready to go before too long (the exact time depends on the number of items in the table and the amount of provisioned capacity). You can also delete indexes that you no longer need. All of this functionality is also available through DynamoDB’s UpdateTable API.
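For the API route, an UpdateTable request that adds or removes an index might be put together along these lines. The helper names, the customers table, the zip_code-index name, and the capacity figures are all placeholders chosen for this sketch; the nesting (GlobalSecondaryIndexUpdates containing Create or Delete actions) follows the UpdateTable request structure.

```python
def create_index_request(table_name, index_name, hash_attribute,
                         attribute_type="S", read_units=5, write_units=5):
    """Build UpdateTable parameters that add a hash-only Global Secondary Index."""
    return {
        "TableName": table_name,
        # The index key attribute must be declared with its type.
        "AttributeDefinitions": [
            {"AttributeName": hash_attribute, "AttributeType": attribute_type}
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [{"AttributeName": hash_attribute, "KeyType": "HASH"}],
                    "Projection": {"ProjectionType": "ALL"},
                    "ProvisionedThroughput": {
                        "ReadCapacityUnits": read_units,
                        "WriteCapacityUnits": write_units,
                    },
                }
            }
        ],
    }


def delete_index_request(table_name, index_name):
    """Build UpdateTable parameters that remove an index that is no longer needed."""
    return {
        "TableName": table_name,
        "GlobalSecondaryIndexUpdates": [{"Delete": {"IndexName": index_name}}],
    }


add_zip_index = create_index_request("customers", "zip_code-index", "zip_code")
```

Projecting ALL attributes is the simplest choice for a sketch; a narrower projection would make the index smaller and cheaper to read.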
There is no extra charge for this feature. However, you may need to provision additional write throughput in order to allow for the needs of the index creation process. You’ll pay the usual DynamoDB price for storage of the Global Secondary Indexes that you create.
Purchasing Reserved Capacity
DynamoDB’s unique provisioned capacity model makes it easy for you to build applications that can scale to any desired level of throughput. Instead of having to worry about adding hardware, tuning software, or rearchitecting your application as traffic grows, you can simply provision additional read or write capacity. The provisioning model even allows you to add capacity in anticipation of high traffic (perhaps your application is busiest during local business hours) and to remove it when it is not needed. This model allows you to create a cost structure that closely mirrors actual usage of your application and avoids unnecessary charges for idle resources.
In situations where you have enough confidence in your usage model and your predictions for growth over time, you can reduce your DynamoDB costs even more by purchasing Reserved Capacity for a one-year or three-year term. After you pay the upfront fee, you will be billed monthly for the amount of capacity that you purchase. By purchasing capacity up front, you will save 53% (one-year term) or 76% (three-year term) over the regular hourly rates.
In order to make Reserved Capacity accessible to more DynamoDB users, we have made two important changes. First, we have simplified the purchase process and made it accessible from within the Console. Second, we have reduced the minimum purchase to just 100 read or write capacity units. To purchase Reserved Capacity within a particular AWS region, open up the Console, choose the region, and click on the Reserved Capacity button:
Select the amount of read and/or write capacity that you need (in units of 100), choose a term, and fill in your email address:
Your purchases are visible in the Console:
You can read more about this feature in the recent post, On DynamoDB Provisioning: Simple, Flexible, and Affordable, in the AWS Startup Collection.
From our Customers
AWS customer Eddie Dingels (Lead Architect for Earth Networks) is already taking advantage of on-the-fly indexing and the new pricing model! In his words:
With online indexing, we can re-index tables to run new queries whenever we want. DynamoDB handles consistently changing the index while taking live traffic without a performance impact even on large data sets.
He’s also saving money:
DynamoDB has a very simple and innovative approach to database provisioning, it is truly pay as you go. Reserved capacity ends up dropping DynamoDB throughput costs by up to 76%, and today’s announcement makes it easier than ever for us to perform incremental purchases as we grow.
The new Reserved Capacity pricing model is available today in all regions. Online indexing is available today in the Asia Pacific (Tokyo), Asia Pacific (Singapore), EU (Ireland), US East (Northern Virginia), US West (Oregon), and US West (Northern California) regions. We expect to make it available in the EU (Frankfurt), South America (São Paulo), China (Beijing), and AWS GovCloud (US) regions within a week or so.
PS – Some of our developers put together a new article to show you how to Build a Mars Rover Application With DynamoDB. The code in this article takes advantage of the new JSON support and is a great way to exercise DynamoDB’s expanded free tier.