Category: Amazon DynamoDB

Driving Big Data Innovation on AWS – ISV Highlights, April 2017

by Kate Miller | in Amazon DynamoDB, Amazon Redshift, Amazon S3, APN Technology Partners, AWS Competencies, Big Data, Big Data Competency

Introduction by Terry Wise, Global Vice President, Channels & Alliances at AWS

What can your data do for you? More importantly, how can insights derived from your data help you drive additional value for end customers?

Our APN partners offer services and solutions that complement what AWS has to offer. As an example, many customers are choosing to build a data lake on AWS. NorthBay is a Big Data Competency Consulting partner that helped architect and implement a data lake on AWS for Eliza Corporation. You can read details of the solution they built here. Today, I want to tell you a bit about four of our AWS Big Data Competency ISVs and what makes them unique: Alteryx, Databricks, SnapLogic, and Treasure Data.


AWS Big Data Competency Holder in Data Integration

How is your time spent when you embark on a new data analytics project? For many, the time required to gather, prepare, and process their data cuts into the time they can spend actually analyzing and learning from their data. Alteryx’s mission is to change the game for these analysts through the company’s self-service data analytics platform. “Alteryx Analytics provides analysts the unique ability to easily prep, blend, and analyze all of their data using a repeatable workflow, then deploy and share analytics at scale for deeper insights in hours, not weeks. Analysts love the Alteryx Analytics platform because they can connect to and cleanse data from data warehouses, cloud applications, spreadsheets, and other sources, easily join this data together, then perform analytics – predictive, statistical, and spatial – using the same intuitive user interface, without writing any code,” says Bob Laurent, VP of product marketing at Alteryx. The company’s products are used by a number of AWS customers, including Chick-fil-A, Marketo, and The National Trust.

Alteryx integrates with Amazon Redshift and provides support for Amazon Aurora and Amazon S3. Using Alteryx on AWS, users can blend data stored in the AWS Cloud, such as data stored in Redshift, with data from other sources using Alteryx’s advanced analytic workflow. Earlier this year, the company virtualized its Alteryx Server platform to make it easy for users to deploy on AWS through the AWS Marketplace. “Organizations can deploy our Alteryx Server platform in the AWS Cloud within minutes, while maintaining the enterprise-class security and scalability of our popular on-premises solution. This gives organizations a choice for how they want to quickly share critical business insights with others in their organization,” explains Laurent.

See Alteryx in action by downloading a free 14-day trial of Alteryx Designer here, or launch Alteryx Server from the AWS Marketplace here. If you’re interested in becoming an Alteryx Partner, click here. To learn more about Alteryx, visit the company’s AWS-dedicated site.


AWS Big Data Competency Holder in Advanced Analytics

Are you looking for an efficient way to run Apache® Spark™ as you seek to create value from your data and build a sophisticated analytics solution on AWS? Then take a look at Databricks, founded by the team who created the Apache Spark project. “Databricks provides a just-in-time data platform, to simplify data integration, real-time experimentation, and robust deployment of production applications,” says John Tripier, Senior Director, Business Development at Databricks. The company’s mission is to help users of all types within an organization, from data scientists to data engineers to architects to business analysts, harness and maximize the power of Spark. Users can also take advantage of a wide range of BI tools and systems that integrate with the platform, including Tableau, Looker, and Alteryx. The company works with companies across a wide range of industries, including Capital One, 3M, NBC Universal, Viacom, and LendUp.

Databricks is hosted on AWS, and takes advantage of Amazon EC2 and Amazon S3. “Databricks is a cloud-native platform that deploys Spark clusters within the AWS accounts of our 500+ customers. We leverage the compute, storage, and security resources offered by AWS. We find AWS is a reliable and secure environment and enables fast implementation of infrastructure in regions all over the world,” says Tripier.

Want to give Databricks a spin? The company offers a free trial of their software here. Learn more about the Databricks platform here. And if you’re a Consulting Partner interested in learning more about becoming a Databricks Partner, click here. Databricks deploys in all regions, including AWS GovCloud, and is also an AWS Public Sector Partner.


AWS Big Data Competency Holder in Data Integration

Where does your data come from? For most companies, particularly enterprises, the answer is: a lot of places. SnapLogic is focused on helping enterprises easily connect applications, data, and things across on-premises, cloud, and hybrid environments through its Enterprise Integration Cloud (EIC). True to its name, the company provides Snaps, which are modular collections of integration components built for a specific data source, business application, or technology. “We help customers automate business processes, accelerate analytics, and drive digital transformation,” says Ray Hines, director of strategic partners and ISVs at SnapLogic. The company works with hundreds of customers, including Adobe, Box, and Earth Networks.

The SnapLogic Enterprise Integration Cloud integrates with Amazon Redshift, Amazon DynamoDB, and Amazon RDS. “We provided pre-built integrations with these services because our customers are rapidly adopting them for their cloud data warehousing needs,” explains Hines. The company’s solution can help simplify the onboarding process for Redshift, DynamoDB, and RDS customers. For instance, Snap Patterns provide pre-built data integrations for common use cases and a number of other features (learn more here).

Care to try out SnapLogic for AWS? Click here for a 30-day free trial of SnapLogic Integration Cloud for Redshift or download the data sheet here. You can request a custom demo here. Consulting Partners, learn more about becoming a SnapLogic Partner here.

Treasure Data

AWS Big Data Competency Holder in Data Management

Are you a marketer looking for the ability to use data to provide great experiences to end customers? Are you in sales operations, looking to create a centralized dashboard for real-time sales data? Give life to your customer data through Treasure Data. “Treasure Data simplifies data management. Our Live Customer Data platform keeps data connected, current, and easily accessible to the people and algorithms that drive business success,” says Stephen Lee, vice president of business development at Treasure Data. “We provide a turnkey solution that collects data from 300+ sources, stores the data at scale, and provides users the tools to analyze and activate their data in their application of choice.” The company works with customers across industries including Grindr, Warner Brothers, and Dentsu.

“We deployed our solution on AWS because of the scalability, reliability, and global footprint of the AWS Cloud and the ability to deploy without having capital expenditures. With AWS, we can easily deploy our solution in new regions. We’ve also found there to be a strong support ecosystem,” says Lee. Treasure Data’s Live Customer Data Platform integrates with Amazon Redshift, Amazon S3, and Amazon Kinesis, along with many other solutions including Tableau, Chartio, Qlik, Looker, and Heroku (see all integrations and learn some nifty integration recipes). Getting started with Treasure Data is easy. “Our Solution Architects work with our new customers to get their initial data sources set up, after which our customers can be up and running in minutes,” explains Lee.

You can request a custom demo here, or simply email the team directly. Consulting Partners interested in becoming a Treasure Data partner can visit the company’s partner page here.

Want to learn more about big data on AWS? Click here. Bookmark the AWS Big Data Blog for a wealth of technical blog posts you can look to as you begin to take advantage of AWS for big data.

This blog is intended for educational purposes and is not an endorsement of the third-party products. Please contact the firms for details regarding performance and functionality.

Have You Read Our 2016 AWS Partner Solutions Architect Guest Posts?

by Kate Miller | in Amazon DynamoDB, Amazon ECS, APN Competency Partner, APN Partner Highlight, APN Technical Content Launch, APN Technology Partners, Automation, AWS CloudFormation, AWS Lambda, AWS Marketplace, AWS Partner Solutions Architect (SA) Guest Post, AWS Product Launch, AWS Quick Starts, Big Data, Containers, Database, DevOps on AWS, Digital Media, Docker, Financial Services, Healthcare, NAT, Networking, Red Hat, SaaS on AWS, Security, Storage

In 2016, we hosted 38 guest posts from AWS Partner Solutions Architects (SAs), who work very closely with both Consulting and Technology Partners as they build solutions on AWS. As we kick off 2017, I want to take a look back at all of the fantastic content created by our SAs. A few key themes emerged throughout SA content in 2016, including a focus on building SaaS on AWS, DevOps and how to take advantage of particular AWS DevOps Competency Partner tools on AWS, Healthcare and Life Sciences, Networking, and AWS Quick Starts.

Partner SA Guest Posts

There’ll be plenty more to come from our SAs in 2017, and we want to hear from you. What topics would you like to see our SAs discuss on the APN Blog? What would be most helpful for you as you continue to take advantage of AWS and build your business? Tell us in the comments. We look forward to hearing from you!


How We Built a SaaS Solution on AWS, by CrowdTangle

by Kate Miller | in Amazon DynamoDB, Amazon RDS, Amazon Redshift, APN Partner Highlight, APN Technology Partners, AWS Elastic Beanstalk, Database, SaaS on AWS, Startups

The following is a guest post from Matt Garmur, CTO at CrowdTangle, a startup and APN Technology Partner who makes it easy for you to keep track of what’s happening on social media. Enjoy!

Horses were awesome.

If you had a messenger service 150 years ago, using horses was so much better than the alternative, walking. Sure, you had to hire people to take care of horses, feed them, and clean up after them, but the speed gains you got were easily worth the cost. And over time, your skills at building a business let you create systems that could handle each of these contingencies extremely efficiently.

And then cars came around, and you were out of luck.

Not immediately, of course. The first car on the street didn’t put you out of business. Even as cars got more mainstream, you still had the benefit of experience over startup car services. But once the first company grew up that was built with the assumption that cars existed, despite all your knowledge, you were in big trouble.

At CrowdTangle, we build some of the best tools in the world for helping people keep track of what’s happening on social media. We have a team of engineers and account folks helping top media companies, major league sports teams, and others find what they care about in real time (and we’re hiring!). Importantly, we started our company in 2011, which meant that AWS had been around for 5 years, and we could, and did, confidently build our entire business around the assumption that it would exist.

AWS was our car.

It may seem like an exaggeration, but it’s not. We were able to build an entirely different type of organization on AWS than we could have built five years prior. Specifically, it has impacted us in four critical ways: business model, hiring, projections, and speed, which of course are all different ways of saying, “cost,” and thus, “survival.”

First is the business model. When we started developing our company, we didn’t consider producing physical media to hold our software, nor did we consider installing it on-premises. By making our model Software as a Service (SaaS), we got a lot of immediate benefits: we were able to allow users to try our product with no more effort than going to a website; we could push features and fixes dozens of times a day; and we could know that everyone would get the same controlled experience. But by taking on the hosting ourselves, we would need to have a significant capital outlay at the start in order to simply deliver our product. Having AWS to begin on without those initial costs made SaaS a viable option for our growing startup.

Next is hiring. AWS has Amazon Relational Database Service (Amazon RDS), a managed database service, which means I don’t need to hire a DBA, since it’s coder-ready (and on Intel Xeon E5s, so we’re certainly not sacrificing quality). AWS has Elastic Beanstalk, a service that makes it simple for us to deploy our application on AWS, which means I can set up separate environments for front- and back-end servers, and scale them independently at the push of a button. Amazon DynamoDB, the company’s managed NoSQL database service, alleviates the need to keep four full-time engineers on staff just to keep my database ring up and running. We keep terabytes of real-time data, get single-digit millisecond response times, and from my perspective, it takes care of itself. My team can stay focused on what matters: driving the growth of our business, because we don’t need to spend a single hire on keeping the lights on.

Third is projections. If you’re in the horse world, your purchasing model for computers is to run as close to capacity as possible until it’s clear you need a capital outlay. Then you research the new machine, contact your supplier, spend a lot of money at once, wait for shipping, install it, and when it goes out of service, try to resell it and recover some of the cost. In the car world, if I think we might need more machinery, even for a short time, I request an instance, have it available immediately, and start paying pennies or dollars by the hour. If I’m done with that instance? Terminate and I stop paying for it. If I need a bigger instance? I simply provision a bigger instance on the spot.
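The difference between the two purchasing models can be made concrete with a little arithmetic. This is a back-of-the-envelope sketch; all prices are hypothetical placeholders, not actual AWS rates or CrowdTangle’s numbers.

```python
def upfront_cost(machine_price: float, resale_value: float) -> float:
    """Horse-world model: a large capital outlay now, partially
    recovered when the machine is resold at end of life."""
    return machine_price - resale_value


def on_demand_cost(hourly_rate: float, hours_used: float) -> float:
    """Car-world model: pay only for the hours the instance actually
    runs, then terminate it and stop paying."""
    return hourly_rate * hours_used


# A two-week experiment on a hypothetical $0.50/hour instance,
# versus a $5,000 server that might recover $1,500 at resale.
experiment = on_demand_cost(0.50, 24 * 14)  # 336 hours
server = upfront_cost(5000.0, 1500.0)
```

Under these made-up numbers, the short-lived experiment costs a small fraction of the capital outlay, which is the whole point of the on-demand model.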

Finally, I want to talk about speed. Because of our choice to build our solution on AWS, we have a lean team that can provision resources faster, and can constantly work on fun projects rather than having to focus on simple maintenance. Not only can we move quickly on the scoped projects, but we can do cheap R&D for the moonshots. Every new project could be a bust or our next million-dollar product, but they start the same — have an idea, clone an existing environment, put your project branch on it, trot it out for clients to play with, and spin it down when done.

We recently decided that an aggregation portion of our system was slower than we liked, and we researched moving it to Amazon Redshift. To do so, we spun up a small Redshift instance (note: no projections), did initial testing, then copied our entire production database into Redshift (note: R&D speed). “Production” testing proved the benefits, so now we have an entire secondary Amazon Kinesis-Redshift managed pipeline for our system (note: no hiring, despite adding systems), and the speed increase has opened the door for new products that weren’t possible for us under the prior method. How much would that experimentation cost in the horse world? What would it have taken to execute? Would any of those projects have been small enough to be worth taking a chance on? We place small bets all the time, and that’s what helps us remain a leader in our field.

Your next competitor will have grown up in the age of cars. How can you compete when you have horses?

To learn more about CrowdTangle, click here.

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.

APN Partner Webinar Series – AWS Database Services

by Kate Miller | in Amazon Aurora, Amazon DynamoDB, Amazon Redshift, APN Webcast, Database

Want to dive deep and learn more about AWS database offerings? This webinar series provides an exclusive deep dive into Amazon Aurora, Amazon Redshift, and Amazon DynamoDB. These webinars feature technical sessions led by AWS solutions architects and engineers, live demonstrations, customer examples, and Q&A with AWS experts.

Check out these upcoming webinars and register to attend!

Amazon Aurora Architecture Overview

September 26, 2016 | 11:30am-12:30pm PDT

This webinar provides a deep architecture overview of Amazon Aurora. Partners attending this webinar will learn how Amazon Aurora differs from other relational database engines, with a special focus on features such as high availability (HA) and up to 5x the performance of MySQL.

Register Here >>

Understanding the Aurora Storage Layer

October 3, 2016 | 11:30am-12:30pm PDT

This webinar will dive deep into the Amazon Aurora Storage Layer. Attendees will receive a technical overview of performance and availability features as well as insights into future enhancements.

Register Here >>

Amazon Aurora Migration Best Practices

October 10, 2016 | 11:30am-12:30pm PDT

This webinar will cover best practices for migrating from Oracle to Amazon Aurora. Partners attending this webinar will learn about common migration opportunities, challenges, and how to address them.

Register Here >>

Selecting an AWS Database

October 17, 2016 | 11:30am-12:30pm PDT

Amazon Aurora, Amazon Redshift, and Amazon DynamoDB are managed AWS database offerings well-suited for a variety of use cases. In this webinar, partners will learn best practices for selecting a database and how each offering fits into the broader AWS portfolio of database services.

Register Here >>

Amazon RDS PostgreSQL Deep Dive

October 24, 2016 | 11:30am-12:30pm PDT

Amazon RDS makes it easy to set up, operate, and scale PostgreSQL deployments in the cloud. Amazon RDS manages time-consuming administrative tasks such as PostgreSQL software upgrades, storage management, replication, and backups. This webinar will dive deep into the technical and business benefits of RDS PostgreSQL, including best practices for migrating from SQL Server and Oracle.

Register Here >>

We’ll be hosting more educational webinars for APN Partners throughout the end of the year. Stay tuned to the APN Blog for more information!

How Signiant Uses AWS Lambda and Amazon DynamoDB to run its SaaS Solution on AWS

by Mike Deck | in Amazon DynamoDB, AWS Competencies, AWS Lambda, AWS Partner Solutions Architect (SA) Guest Post, SaaS on AWS

By Mike Deck, Partner Solutions Architect, AWS

When AWS Lambda was launched in 2014, it unlocked an ability for AWS customers and partners to implement full-featured, scalable solutions without the need to deploy or manage any servers. I work with many SaaS partners who are now leveraging the serverless model for various components of their architecture. The first step in this journey is often to re-engineer ancillary workloads that can be easily re-implemented without servers. This can afford immediate reductions in infrastructure costs and operational surface area as well as provide valuable experience in building and running systems using this new paradigm.

Signiant, an Advanced APN Technology Partner, Digital Media Competency Partner, and Storage Competency Partner, is a textbook example of a firm that has put this pattern into practice. Over time, Signiant has repeatedly re-architected the part of its solution that processes the bounce and delivery notifications provided by Amazon Simple Email Service (Amazon SES) via an Amazon Simple Notification Service (Amazon SNS) topic. Each iteration of the company’s system has improved the scalability and operational efficiency of the solution, culminating in a simple, Lambda-based serverless architecture. In this post, I will walk through how Signiant re-architected its SaaS solution on AWS to take advantage of AWS Lambda and a serverless architecture.

Solution Overview

Signiant’s SaaS solution on AWS is called Media Shuttle. This product is used pervasively within the media and entertainment industry to quickly transfer very large files. Using a simple browser plugin or mobile app, users can send or share files of any size through a simple portal. Media Shuttle takes care of the transfer acceleration, security, scalability, elasticity, and resource management on the user’s behalf.

Architecture Evolution

One key feature of Media Shuttle is its delivery notification system, built on Amazon SES. When a file becomes available, the system sends an email to the user with a secure link for downloading the content. These notification emails occasionally bounce or get caught in spam filtering systems, preventing users from retrieving their files and generally resulting in a support call from the sender to figure out why the email was never received.

To improve the support team’s ability to resolve these issues while maintaining the privacy of the sender and email content, Signiant developed a simple system for tracking email bounces that has evolved over time. The initial solution, depicted below, was to subscribe an internal email distribution list to the Amazon SNS topic that received the bounce notifications. This provided simple alerts to the support team when emails bounced and was very easy to implement, but it presented scalability problems as adoption of the product grew. Pretty soon, there were thousands of notifications flooding the support team’s inboxes, and searching for a given customer’s email quickly became cumbersome.


In the next iteration of the solution, the email list was replaced by a database-backed web application running on Amazon Elastic Compute Cloud (Amazon EC2). The bounce notifications were delivered via the Amazon SNS topic to an Amazon Simple Queue Service (Amazon SQS) queue. The application would poll the queue and update a local database that the support team could then search through a simple web UI. Shortly after this version of the system was released, SES added the ability to capture notifications for deliveries in addition to bounces. Signiant added these notifications to the system as well, so that support engineers could see successful delivery statuses in addition to bounces.

The v2 architecture shown above worked well. It was more scalable than using an email distribution list, and the search capabilities were vastly improved. Despite these functional improvements, the new system required more maintenance than the team would have liked. They now had an additional server they were running just for this process, and the database they had chosen was having difficulties managing the increasing load. To optimize the system further, the team decided to re-engineer its solution to take advantage of the benefits of AWS Lambda and a serverless architecture.

The team designed a completely serverless architecture using Lambda to host the message processing logic and Amazon DynamoDB for its database. In the current architecture, instead of a PHP process polling a queue, they have a simple Lambda function written in Python subscribed to the SNS topics fed by SES. The Lambda function was easy to develop based on their existing PHP application that processed SQS messages. The relational database has been replaced by a DynamoDB table which is trivial to scale as the number of emails tracked continues to grow.
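A minimal sketch of such a Lambda function follows. The table name (`email-status`) and item attribute names are illustrative assumptions, not Signiant’s actual schema; the parsed fields follow the documented SES notification format.

```python
import json


def item_from_sns_record(record: dict) -> dict:
    """Flatten one SNS-delivered SES notification into a DynamoDB item.
    The output attribute names are illustrative assumptions."""
    ses = json.loads(record["Sns"]["Message"])  # SNS wraps the SES payload
    return {
        "message_id": ses["mail"]["messageId"],
        "status": ses["notificationType"],       # "Bounce" or "Delivery"
        "recipients": ses["mail"]["destination"],
        "sent_at": ses["mail"]["timestamp"],
    }


def lambda_handler(event, context):
    """Entry point invoked by SNS; writes each notification to DynamoDB.
    Requires AWS credentials, so boto3 is imported lazily here."""
    import boto3
    table = boto3.resource("dynamodb").Table("email-status")  # hypothetical name
    for record in event["Records"]:
        table.put_item(Item=item_from_sns_record(record))
```

Keeping the parsing in a separate pure function makes the message-processing logic easy to unit test without any AWS dependencies.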

Signiant’s current architecture for capturing email status is depicted above. While this system captures delivery status of SES emails, the pattern being employed is extremely versatile and can be applied to any event-driven data processing workflow. By moving to a serverless architecture, Signiant not only decreased the direct cost of running this system, but also removed the operational overhead of managing a one-off server for this isolated task. “Porting our previous message processor to run on Lambda was really straightforward and the new design is much simpler and more robust than our previous server-based system,” said Dave North, the Director of DevOps at Signiant. The new architecture also eliminated the scaling concerns present in the other versions of the system. Using AWS Lambda, the message processors now scale seamlessly without any additional configuration or management, and the database’s throughput can be increased with a simple parameter update.
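That “simple parameter update” is a single API call. A sketch, with a hypothetical table name and capacity numbers:

```python
def throughput_update(table_name: str, reads: int, writes: int) -> dict:
    """Build the arguments for dynamodb.update_table() to raise a
    table's provisioned read/write capacity."""
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": reads,
            "WriteCapacityUnits": writes,
        },
    }


# boto3.client("dynamodb").update_table(**throughput_update("email-status", 200, 400))
# would apply the change (requires AWS credentials).
```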


In this post, I’ve walked through how Signiant has evolved its architecture over time to take advantage of a serverless architecture design. Whether event data is delivered via an SNS topic as is the case here, sent directly to Lambda through a direct service integration as is the case with Amazon Simple Storage Service (Amazon S3), or generated from your own applications using the AWS SDK, you can build systems to capture, process and report on those events using this same basic architecture.

If you would like to see working code samples for the system discussed in this post, Signiant has open-sourced both the code for the Lambda functions and the reporting console application. Note that these are not managed by AWS. You can also get started building your own serverless SES notification handler using the SES notifications blueprint in the Lambda console.

Improving the Reader Experience: The Globe and Mail, ClearScale, and AWS

by Kate Miller | in Amazon DynamoDB, APN Consulting Partners, APN Partner Highlight, AWS Case Study, AWS Competency, DevOps on AWS, Digital Media

We recently announced that we’ll be opening an AWS Region in Canada in the coming year. Today, we’d like to share the story of The Globe and Mail, a Canadian customer already taking advantage of the benefits of the AWS platform, with the help of Premier APN Consulting Partner ClearScale. The company is also an AWS Big Data, DevOps, Marketing & Commerce, and Mobile Competency holder. ClearScale helped The Globe build an article recommendation engine on AWS.

“ClearScale was introduced to The Globe and Mail by one of the Amazon Web Services (AWS) representatives. We were more than impressed by a publication with a 170-year history looking to develop a cutting-edge recommendation engine for their mobile application,” explains Pavel Pragin, CEO, ClearScale. “The Globe & Mail was interested in a reliable cloud services partner that could deliver an innovative and reliable solution. ClearScale’s solution helped The Globe & Mail expand their readership and gain additional advertising opportunities while meeting their tight project deadlines.”

Who is The Globe and Mail? 

The Globe and Mail is Canada’s most read newspaper with a national weekly digital readership of 4.7 million. In print for 170 years, the newspaper delivers coverage of national, international, business, technology, arts, entertainment, and lifestyle news.

Why did The Globe and Mail engage AWS? 

The Globe and Mail was planning to launch a new application that enables its growing online readership to access stories and breaking news from mobile devices. And to increase reader engagement, it wanted to serve up targeted articles based on each reader’s individual interests. The Globe team considered building a custom system on premises, but concluded that hosting its article recommendation engine in the cloud would be faster and would provide greater flexibility for testing different algorithms. The Globe team had already been prototyping a number of recommendation algorithms on AWS. Having had a great experience, the team decided to use AWS for its official platform. The Globe uses a number of AWS services, including:

What are the benefits for The Globe and Mail? 

By building its personalized recommendation system on AWS, The Globe has experienced a number of benefits:

  • The team was able to get the service to market in just three months, less than half the time it would have taken had the newspaper built an on-premises solution
  • The Globe has obtained a flexible platform for testing, allowing the company to improve its mobile app over time
  • The Globe is dramatically increasing reader engagement; initial results show that parts of the mobile app that promote a personalized selection of articles based on the recommendation engine are seeing a 25% greater click-through than a selection of the current most popular articles that would otherwise be the top performers

Want to learn more? Read the full case study here.

To learn more about ClearScale, visit the company’s website, or check out the company’s AWS Partner Directory page.

Take Advantage of Upcoming AWS NoSQL and IoT Services Workshops – Seattle and Boston

by Kate Miller | in Amazon DynamoDB, APN Consulting Partners, APN Technology Partners, AWS Events, AWS IoT, Database

Are you looking to gain deep, hands-on technical training for AWS NoSQL and IoT services?

We’re excited to announce a series of NoSQL and IoT technical workshops for APN Partners we’ll be delivering in Seattle on Feb 23rd – 24th, and Boston on March 29th – 30th. Intended for Advanced and Premier APN Partners who hold at least one AWS Competency, these workshops will be led by our AWS Solution Architecture (SA), AWS Business Development (BD), and AWS Partner Network (APN) teams.

What Will the Workshops Cover?

The NoSQL workshop will cover data lifecycle management and an overview of how customers are using NoSQL to meet business challenges. Participants will learn how to create low-latency NoSQL databases through a deep dive into DynamoDB schema creation, tuning, and performance management, including caching with ElastiCache. In addition to DynamoDB best practices, we will also cover Cassandra and MongoDB on Amazon EC2, as well as tips for migrating from RDBMS to NoSQL databases.
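As a taste of the schema-creation material, here is the shape of a DynamoDB table definition with a partition (hash) key and a sort (range) key. The table and attribute names are hypothetical examples, not part of the official workshop agenda.

```python
def session_table_definition(table_name: str) -> dict:
    """Arguments for dynamodb.create_table(): a table keyed by user
    and creation time, with modest provisioned throughput."""
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "user_id", "KeyType": "HASH"},      # partition key
            {"AttributeName": "created_at", "KeyType": "RANGE"},  # sort key
        ],
        "AttributeDefinitions": [
            {"AttributeName": "user_id", "AttributeType": "S"},     # string
            {"AttributeName": "created_at", "AttributeType": "N"},  # number
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 5,
            "WriteCapacityUnits": 5,
        },
    }


# boto3.client("dynamodb").create_table(**session_table_definition("sessions"))
# would create the table (requires AWS credentials).
```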

The IoT workshop will cover an introduction to the IoT ecosystem, including core services such as AWS IoT, AWS Lambda, Amazon API Gateway, Amazon SNS, and Amazon SQS. We will also include a deep dive and walkthrough of AWS IoT integration with Amazon DynamoDB, Amazon Kinesis, and AWS Lambda. You will see demos of AWS IoT services with AWS Lambda and Amazon DynamoDB capture and AWS IoT Device Shadows, and participate in hands-on labs where you will set up your IoT device to capture data into Amazon DynamoDB using AWS IoT and Lambda.
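The capture path in that lab, an AWS IoT rule invoking Lambda which writes to DynamoDB, might look roughly like the sketch below. The `device-readings` table and the message field names are assumptions for illustration, not the lab’s published code.

```python
import json


def item_from_iot_message(payload: dict, captured_at_ms: int) -> dict:
    """Shape one device message for DynamoDB. The device_id/reading
    field names are assumed, not a documented message format."""
    return {
        "device_id": payload["device_id"],
        "captured_at": captured_at_ms,
        "reading": json.dumps(payload.get("reading", {})),
    }


def lambda_handler(event, context):
    """Invoked by an AWS IoT topic rule with the device payload as the
    event; writes the reading to DynamoDB (requires AWS credentials)."""
    import time
    import boto3
    table = boto3.resource("dynamodb").Table("device-readings")  # hypothetical
    table.put_item(Item=item_from_iot_message(event, int(time.time() * 1000)))
```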

Attendance at both workshops is not required but highly encouraged.

Who Should Attend?

We will be prioritizing seats for Advanced/Premier APN Partners who hold at least one AWS Competency. As both workshops will be technical, 400+ level, and hands-on, we encourage senior engineers to attend. There is no cost to attend these workshops. However, travel and related expenses must be covered by your firm.

What’s the Schedule?

Our first workshops will be hosted in Seattle and Boston, with more to come around the globe throughout 2016.

Seattle Workshops:

  • February 23rd: AWS NoSQL
  • February 24th: AWS IoT

Boston Workshops:

  • March 29th: AWS NoSQL
  • March 30th: AWS IoT

NoSQL Agenda:

  • 8:30 – 9:00
  • 9:00 – 10:00: Data Lifecycle Management
  • 10:00 – 10:30: How Customers are Using NoSQL to Meet Business Challenges
  • 10:30 – 11:00: Why NoSQL is a Strategic Opportunity
  • 11:00 – 11:30: Voice of the Customer
  • 11:30 – 12:15: Running Cassandra on Amazon EC2
  • 12:15 – 12:45
  • 12:45 – 1:30: Running MongoDB on Amazon EC2
  • 1:30 – 3:30: DynamoDB Deep Dive and Best Practices
  • 3:30 – 4:30: RDBMS to NoSQL Migrations
  • 4:30 – 5:30: Amazon ElastiCache: NoSQL; No Worries

IoT Agenda:

Time           Agenda
8:30 – 9:00
9:00 – 10:00   Introduction to the IoT Ecosystem. This first hour covers core services: AWS IoT, AWS Lambda, Amazon API Gateway, Amazon SNS, and Amazon SQS
10:00 – 12:00  Deep Dive and Walkthrough of AWS IoT. Covers integration with other services, including Amazon DynamoDB, Amazon Kinesis, AWS Lambda, and Amazon SNS
12:00 – 1:00
1:00 – 3:00    Q&A on Integration Points; Demos of AWS IoT Services with AWS Lambda, Amazon DynamoDB Capture, and AWS IoT Device Shadows
3:00 – 5:00    Labs, Hands-on Activities, and Examples. Includes lab setup, machine configuration, development environment setup, and obtaining code; partner use cases, walking through existing partner customers and potential use cases for AWS IoT

*Please note that these agendas are subject to change.

Interested in Attending?

If you’d like to learn more about the workshops, or would like to reserve a spot, please reach out to your Partner Development Manager.

Multi-Tenant Storage with Amazon DynamoDB

by Tod Golding | on | in Amazon DynamoDB, APN Technology Partners, AWS Partner Solutions Architect (SA) Guest Post, SaaS on AWS | | Comments

Tod Golding is an AWS Partner Solutions Architect (SA). He works closely with our SaaS Partner ecosystem. 

If you’re designing a true multi-tenant software as a service (SaaS) solution, you’re likely to devote a significant amount of time to selecting a strategy for effectively partitioning your system’s tenant data. On Amazon Web Services (AWS), your partitioning options mirror much of what you see in the wild. However, if you’re looking at using Amazon DynamoDB, you’ll find that the global, managed nature of this NoSQL database presents you with some new twists that will likely influence your approach.

Before we dig into the specifics of the DynamoDB options, let’s look at the traditional models that are generally applied to achieve tenant data partitioning. The list of partitioning solutions typically includes the following variations:

  • Separate database – each tenant has a fully isolated database with its own representation of the data
  • Shared database, separate schema – tenants all reside in the same database, but each tenant can have its own representation of the data
  • Shared everything – tenants all reside in the same database and all leverage a universal representation of the data

These options all have their strengths and weaknesses. If, for example, you’d like to support the ability for tenants to have their own data customizations, you might want to lean toward a model that supports separate schemas. If that’s not the case, you’ll likely prefer a more unified schema. Security and isolation requirements are also key factors that could shape your strategy. Ultimately, the specific needs of your solutions will steer you toward one or more of these approaches. In some cases, where a system is decomposed into more granular services, you may see situations where multiple strategies are applied. The requirements of each service may dictate which flavor of partitioning best suits that service.

With this as a backdrop, let’s look at how these partitioning models map to the different partitioning approaches that are available with DynamoDB.

Linked Account Partitioning (Separate Database)

This model is by far the most extreme of the available options. Its focus is on providing each tenant with its own table namespace and footprint within DynamoDB. While this seems like a fairly basic goal, it is not easily achieved. DynamoDB does not have the notion of an instance or some distinct, named construct that can be used to partition a collection of tables. In fact, all the tables you create in DynamoDB are global to a given region for your account.

Given these scoping characteristics, the best option for achieving this level of isolation is to introduce separate linked AWS accounts for each tenant. To leverage this approach, you need to start by enabling the AWS Consolidated Billing feature. This option allows you to have a parent payer account that is then linked to any number of child accounts.

Once the linked account mechanism is established, you can then provision a separate linked account for each new tenant (shown in the following diagram). These tenants would then have distinct AWS account IDs and, in turn, have a scoped view of DynamoDB tables that are owned by that account.
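As an illustration, the per-tenant account creation can be scripted. The sketch below (Python) builds the request parameters for a tenant's linked account; the account-name and email-alias conventions are hypothetical, and the resulting dictionary would be passed to an account-creation API such as AWS Organizations' CreateAccount.

```python
def linked_account_request(tenant_id, billing_domain="example.com"):
    """Build CreateAccount parameters for a tenant's linked account.

    The naming conventions here are illustrative only; pass the result
    to e.g. boto3.client("organizations").create_account(**params).
    """
    return {
        # One AWS account per tenant, named after the tenant ID.
        "AccountName": "tenant-{}".format(tenant_id),
        # Tenant-scoped email alias under the payer's billing domain.
        "Email": "aws+{}@{}".format(tenant_id, billing_domain),
    }
```

Automating this step is what keeps the linked-account model workable at all; without it, each new tenant becomes a manual provisioning exercise.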

While this model has its advantages, it is often cumbersome to manage. It introduces a layer of complexity and automation to the tenant provisioning lifecycle. It also seems impractical and unwieldy for environments where there might be a large collection of tenants. Caveats aside, there are some nice benefits that are natural byproducts of this model. Having this hard line between accounts makes it a bit simpler to manage the scope and schema of each tenant’s data. It also provides a rather natural model for evaluating and metering a tenant’s usage of AWS resources.

Tenant Table Name Partitioning (Shared Database, Separate Schema)

The linked account model represents a more concrete separation of tenant data. A less invasive approach would be to introduce a table naming scheme that adds a unique tenant context to each DynamoDB table. The following diagram represents a simplified version of this approach, prepending a tenant ID (T1, T2, and T3) to each table name to identify the tenant’s ownership of the table.
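As a minimal sketch of this scheme (Python; the tenant-prefix convention and the table schema are illustrative, not an AWS requirement), a provisioning routine might derive each tenant's physical table name and build the corresponding CreateTable request:

```python
def tenant_table_name(tenant_id, logical_name):
    # Illustrative convention: "T1_Customer", "T2_Customer", ...
    return "{}_{}".format(tenant_id, logical_name)

def tenant_create_table_request(tenant_id, logical_name):
    """Build CreateTable parameters for one tenant's copy of a shared
    logical schema; pass to boto3.client("dynamodb").create_table(**req).
    The key schema and throughput values are placeholders."""
    return {
        "TableName": tenant_table_name(tenant_id, logical_name),
        "KeySchema": [{"AttributeName": "CustomerId", "KeyType": "HASH"}],
        "AttributeDefinitions": [
            {"AttributeName": "CustomerId", "AttributeType": "S"}
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 5,
            "WriteCapacityUnits": 5,
        },
    }
```

Because each tenant gets its own physical tables, nothing stops you from varying the key schema per tenant, which is exactly the "separate schema" freedom this model buys.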

This model embraces all the freedoms that come with an isolated tenant scheme, allowing each tenant to have its own unique data representation. With this level of granularity, you’ll also find that this aligns your tenants with other AWS constructs. These include:

  • The ability to apply AWS Identity and Access Management (IAM) roles at the table level allows you to constrain table access to a given tenant role.
  • Amazon CloudWatch metrics can be captured at the table level, simplifying the aggregation of tenant metrics for storage activity.
  • Provisioned throughput is applied at the table level, allowing you to create distinct scaling policies for each tenant.

Provisioning also can be somewhat simpler under this model since each tenant’s tables can be created and managed independently.

The downside of this model tends to be more on the operational and management side. Clearly, with this approach, your operational views of a tenant will require some awareness of the tenant table naming scheme in order to filter and present information in a tenant-centric context. The approach also adds a layer of indirection to any code you might have that is metering tenant consumption of DynamoDB resources.

Tenant Index Partitioning (Shared Everything)

Index-based partitioning is perhaps the most agile and common technique that is applied by SaaS developers. This approach places all the tenant data in the same table(s) and partitions it with a DynamoDB index. This is achieved by populating the hash key of an index with a tenant’s unique ID. This essentially means that the keys that would typically be your hash key (Customer ID, Account ID, etc.) are now represented as range keys.  The following example provides a simplified view of an index that introduces a tenant ID as a hash key. Here, the customer ID is now represented as a range key.
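In Python, using DynamoDB's low-level request shapes, the pooled table and a tenant-scoped query might look like the following sketch (the table and attribute names are illustrative):

```python
# Pooled table: the tenant ID becomes the hash (partition) key, and the
# former hash key (customer ID) drops down to the range (sort) key.
ACCOUNT_TABLE = {
    "TableName": "Account",
    "KeySchema": [
        {"AttributeName": "TenantId", "KeyType": "HASH"},
        {"AttributeName": "CustomerId", "KeyType": "RANGE"},
    ],
    "AttributeDefinitions": [
        {"AttributeName": "TenantId", "AttributeType": "S"},
        {"AttributeName": "CustomerId", "AttributeType": "S"},
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
}

def tenant_query_request(tenant_id):
    """Query parameters that return only one tenant's items; pass to
    boto3.client("dynamodb").query(**params)."""
    return {
        "TableName": "Account",
        # The hash-key condition fences the query to a single tenant.
        "KeyConditionExpression": "TenantId = :tid",
        "ExpressionAttributeValues": {":tid": {"S": tenant_id}},
    }
```

Every read and write now carries the tenant ID in its key condition, which is both the strength of this model (one schema, one table) and its risk (forgetting the condition leaks data across tenants).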

This model, where the data for every tenant resides in a shared representation, simplifies many aspects of the multi-tenant model. It promotes a unified approach to managing and migrating the data for all tenants without requiring a table-by-table processing of the information. It also enables a simpler model for performing tenant-wide analytics of the data. This can be extremely helpful in assessing and profiling trends in the data.

Of course, there are also limitations with this model. Chief among these is the inability to have more granular, tenant-centric control over access, performance, and scaling. However, some may view this as an advantage since it allows you to have a more global set of policies that respond to the load of all tenants instead of absorbing the load of maintaining policies on a tenant-by-tenant basis. When you choose your partitioning approach, you’ll likely strike a balance between these tradeoffs.

Another consideration here is that this approach could be viewed as creating a single point of failure. Any problem with the shared table could affect the entire population of tenants.

Abstracting Client Access

Each technique outlined in this blog post requires some awareness of tenant context. Every attempt to access data for a tenant requires acquiring a unique tenant identifier and injecting that identifier into any requests to manage data in DynamoDB.

Of course, in most cases, end-users of the data should have no direct knowledge that their provider is a tenant of your service. Instead, the solution you build should introduce an abstraction layer that acquires and applies the tenant context to any DynamoDB interactions.

This data access layer will also enhance your ability to add security checks and business logic outside of your partitioning strategies, with minimal impact to end-users.
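A minimal sketch of such a layer in Python (the table and key names are illustrative, and the DynamoDB client is injected so the layer can be exercised without AWS credentials):

```python
class TenantDataAccess:
    """Injects the tenant ID into every DynamoDB request so that
    calling code never handles tenant context directly."""

    def __init__(self, dynamodb_client, tenant_id):
        self._client = dynamodb_client  # e.g. boto3.client("dynamodb")
        self._tenant_id = tenant_id     # resolved once, at sign-in

    def get_customer(self, customer_id):
        # The tenant ID is supplied by the layer, not the caller.
        return self._client.get_item(
            TableName="Account",
            Key={
                "TenantId": {"S": self._tenant_id},
                "CustomerId": {"S": customer_id},
            },
        )
```

Centralizing tenant context this way also gives you one natural place to add the security checks and business logic mentioned above.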

Supporting Multiple Environments

As you think about partitioning, you may also need to consider how the presence of multiple environments (development, QA, production, etc.) might influence your approach. Each partitioning model we’ve discussed here would require an additional mechanism to associate tables with a given environment.

The strategy for addressing this problem varies based on the partitioning scheme you’ve adopted. The linked account model is the least affected, since the provisioning process will likely just create separate accounts for each environment. However, with table name and index-based partitioning, you’ll need to introduce an additional qualifier to your naming scheme that will identify the environment associated with each table.

The key takeaway is that you need to think about whether and how environments might also influence your entire build and deployment lifecycle. If you’re building for multiple environments, the context of those environments likely needs to be factored into your overall provisioning and naming scheme.

Microservice Considerations

With the shift toward microservice architectures, teams are decomposing their SaaS solutions into small, autonomous services. A key tenet of this architectural model is that each service must encapsulate, manage, and own its representation of data. This means that each service can leverage whichever partitioning approach best aligns with the requirements and performance characteristics of that service.

The other factor to consider is how microservices might influence the identity of your DynamoDB tables. With each service owning its own storage, the provisioning process needs assurance that the tables it’s creating for a given service are guaranteed to be unique. This typically translates into adding some notion of the service’s identity into the actual name of the table. A catalog manager service, for example, might have a table that is an amalgam of the tenant ID, the service name, and the logical table name. This may or may not be necessary, but it’s certainly another factor you’ll want to keep in mind as you think about the naming model you’ll use when tables are being provisioned.
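An amalgamated name like the one described, extended with the environment qualifier from the previous section, could be sketched as follows (all component names and the separator are hypothetical conventions):

```python
def physical_table_name(environment, tenant_id, service, logical_name):
    """Combine environment, tenant, service, and logical table name into
    a unique physical DynamoDB table name. DynamoDB only requires that
    the result be unique per account and region; the ordering and the
    underscore separator are illustrative choices."""
    return "_".join([environment, tenant_id, service, logical_name])
```

A catalog manager service, under this convention, would provision something like `prod_T1_catalog_Products`, keeping every service's tables disjoint across tenants and environments.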

Agility vs. Isolation

It’s important to note that there is no single preferred model among the solutions outlined in this blog post. Each model has its merits and applicability to different problem domains. That said, it’s also important to consider agility when you’re building SaaS solutions. Agility is fundamental to the success of many SaaS organizations, and it’s essential that teams consider how each partitioning model might influence their ability to continually deploy and evolve both their applications and their business.

Each variation outlined here highlights some of the natural tension that exists in SaaS design. In picking a partitioning strategy, you must balance the simplicity and agility of a fully shared model with the security and variability offered by more isolated models.

The good news is that DynamoDB supports all the mechanisms you’ll need to implement each of the common partitioning models. As you dig deeper into DynamoDB, you’ll find that it actually aligns nicely with many of the core SaaS values. As a managed service, DynamoDB allows you to shift the burden of management, scale, and availability directly to AWS. The schemaless nature of DynamoDB also enables a level of flexibility and agility that is crucial to many SaaS organizations.

Kicking the Tires

The best way to really understand the merits of each of these partitioning models is to simply dig in and get your hands dirty. It’s important to examine the overall provisioning lifecycle of each partitioning approach and determine how and where it would fit into a broader build and deployment lifecycle. You’ll also want to look more carefully at how these partitioning models interact with AWS constructs. Each approach has nuances that can influence the experience you’ll get with the console, IAM roles, CloudWatch metrics, billing, and so on. Naturally, the fundamentals of how you’re isolating tenants and the requirements of your domain are also going to have a significant impact on the approach you choose.

Are you building SaaS on AWS? Check out the AWS SaaS Partner Program, an APN Program providing Technology Partners with support to build, launch, and grow SaaS solutions on AWS.