Category: Amazon RDS


New – USASpending.gov on an Amazon RDS Snapshot

My colleague Jed Sundwall runs the AWS Public Datasets program. He wrote the guest post below to tell you about an important new dataset that is available as an Amazon RDS Snapshot. In the post, Jed introduces the dataset and shows you how to create an Amazon RDS DB Instance from the snapshot.

Jeff;


I am very excited to announce that, starting today, the entire public USAspending.gov database is available for anyone to copy via Amazon Relational Database Service (RDS). USAspending.gov includes data on all spending by the federal government, including contracts, grants, loans, employee salaries, and more. The data is available via a PostgreSQL snapshot, which provides bulk access to the entire USAspending.gov database, and is updated nightly. At this time, the database includes all USAspending.gov data for the second quarter of fiscal year 2017, and data going back to the year 2000 will be added over the summer. You can learn more about the database and how to access it on its AWS Public Dataset landing page.

Through the AWS Public Datasets program, we work with AWS customers to experiment with ways that the cloud can make data more accessible to more people. Most of our AWS Public Datasets are made available through Amazon S3 because of its tremendous flexibility and ability to scale to serve any volume of any kind of data files. What’s exciting about the USAspending.gov database is that it provides a great example of how Amazon RDS can be used to share an entire relational database quickly and easily. Typically, sharing a relational database involves extract, transform, and load (ETL) processes that require redundant storage capacity, time for data transfer, and often scripts to migrate your database schema from one database engine to another. ETL processes can be so intimidating and cumbersome that they’re effectively impossible for many people to carry out.

By making their data available as a public Amazon RDS snapshot, the team at USAspending.gov has made it easy for anyone to get a copy of their entire production database for their own use within minutes. This will be useful for researchers and businesses who want to work with real data about all US Government spending and quickly combine it with their own data or other data resources.

Deploying the USASpending.gov Database Using the AWS Management Console
Let’s go through the steps involved in deploying the database in your AWS account using the AWS Management Console.

  1. Sign in to the AWS Management Console and select the US East (N. Virginia) region in the menu bar.
  2. Open the Amazon RDS Console and choose Snapshots in the navigation pane.
  3. In the filter for the search bar, select All Public Snapshots and search for 515495268755:
  4. Select the snapshot named arn:aws:rds:us-east-1:515495268755:snapshot:usaspending-db.
  5. Select Snapshot Actions -> Restore Snapshot. Select an instance size, and enter the other details, then click on Restore DB Instance.
  6. You will see that a DB Instance is being created from the snapshot, within your AWS account.
  7. After a few minutes, the status of the instance will change to Available.
  8. You can see the endpoint for your database on the main page along with other useful info:

Deploying the USASpending.gov Database Using the AWS CLI
You can also install the AWS Command Line Interface (CLI) and use it to create a DB Instance from the snapshot. Here’s a sample command:

$ aws rds restore-db-instance-from-db-snapshot --db-instance-identifier my-test-db-cli \
  --db-snapshot-identifier arn:aws:rds:us-east-1:515495268755:snapshot:usaspending-db \
  --region us-east-1

This will give you an ARN (Amazon Resource Name) that you can use to reference the DB Instance. For example:

$ aws rds describe-db-instances \
  --db-instance-identifier arn:aws:rds:us-east-1:917192695859:db:my-test-db-cli

This command will display the Endpoint.Address that you use to connect to the database.
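If you are scripting this, two more commands come in handy: one to wait until the new DB Instance is available and one to pull out just the endpoint address. Here is a hedged sketch that reuses the instance identifier from the example above:

$ aws rds wait db-instance-available --db-instance-identifier my-test-db-cli

$ aws rds describe-db-instances --db-instance-identifier my-test-db-cli \
  --query 'DBInstances[0].Endpoint.Address' --output text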

Connecting to the DB Instance
After following the AWS Management Console or AWS CLI instructions above, you will have access to the full USAspending.gov database within this Amazon RDS DB instance, and you can connect to it with any PostgreSQL client using the following credentials:

  • Username: root
  • Password: password
  • Database: data_store_api

If you use psql, you can access the database using this command:

$ psql -h my-endpoint.rds.amazonaws.com -U root -d data_store_api

You should change the database password after you log in:

ALTER USER "root" WITH ENCRYPTED PASSWORD '{new password}';

If you can’t connect to your instance but think you should be able to, you may need to check your VPC Security Groups and make sure inbound and outbound traffic on the port (usually 5432) is allowed from your IP address.
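For example, assuming the instance uses a security group with the placeholder ID sg-0123456789abcdef0, a rule like this opens port 5432 to a single IP address (replace 203.0.113.10 with your own address):

$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 5432 --cidr 203.0.113.10/32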

Exploring the Data
The USAspending.gov data is very rich, so it will be hard to do it justice in this blog post, but hopefully these queries will give you an idea of what’s possible. To learn about the contents of the database, please review the USAspending.gov Data Dictionary.

The following query will return the total amount of money the government is obligated to pay for contracts awarded by NASA that include “Mars” or “Martian” in the description of the award:

select sum(total_obligation) from awards, subtier_agency 
  where (awards.description like '% MARTIAN %' OR awards.description like '% MARS %') 
  AND subtier_agency.name = 'National Aeronautics and Space Administration';

As I write this, the result I get for this query is $55,411,025.42. Note that the database is updated nightly and will include more historical data in the coming months, so you may get a different result if you run this query.

Now, here’s the same query, but looking for awards with “Jupiter” or “Jovian” in the description:

select sum(total_obligation) from awards, subtier_agency
  where (awards.description like '%JUPITER%' OR awards.description like '%JOVIAN%') 
  AND subtier_agency.name = 'National Aeronautics and Space Administration';

The result I get is $14,766,392.96.

Questions & Comments
I’m looking forward to seeing what people can do with this data. If you have any questions about the data, please create an issue on the USAspending.gov API’s issue tracker on GitHub.

— Jed

AWS Database Migration Service – 20,000 Migrations and Counting

I first wrote about AWS Database Migration Service just about a year ago in my AWS Database Migration Service post. At that time I noted that over 1,000 AWS customers had already made use of the service as part of their move to AWS.

As a quick recap, AWS Database Migration Service and Schema Conversion Tool (SCT) help our customers migrate their relational data from expensive, proprietary databases and data warehouses (either on premises or in the cloud, but with restrictive licensing terms either way) to more cost-effective cloud-based databases and data warehouses such as Amazon Aurora, Amazon Redshift, MySQL, MariaDB, and PostgreSQL, with minimal downtime along the way. Our customers tell us that they love the flexibility and the cost-effective nature of these moves. For example, moving to Amazon Aurora gives them access to a database that is MySQL and PostgreSQL compatible, at 1/10th the cost of a commercial database. Take a peek at our AWS Database Migration Services Customer Testimonials to see how Expedia, Thomas Publishing, Pega, and Veoci have made use of the service.

20,000 Unique Migrations
I’m pleased to be able to announce that our customers have already used AWS Database Migration Service to migrate 20,000 unique databases to AWS and that the pace continues to accelerate (we reached 10,000 migrations in September of 2016).

We’ve added many new features to DMS and SCT over the past year. Here’s a summary:

Learn More
Here are some resources that will help you to learn more and to get your own migrations underway, starting with some recent webinars:

Migrating From Sharded to Scale-Up – Some of our customers implemented a scale-out strategy in order to deal with their relational workload, sharding their database across multiple independent pieces, each running on a separate host. As part of their migration, these customers often consolidate two or more shards onto a single Aurora instance, reducing complexity, increasing reliability, and saving money along the way. If you’d like to do this, check out the blog post, webinar recording, and presentation.

Migrating From Oracle or SQL Server to Aurora – Other customers migrate from commercial databases such as Oracle or SQL Server to Aurora. If you would like to do this, check out this presentation and the accompanying webinar recording.

We also have plenty of helpful blog posts on the AWS Database Blog:

  • Reduce Resource Consumption by Consolidating Your Sharded System into Aurora – “You might, in fact, save bunches of money by consolidating your sharded system into a single Aurora instance or fewer shards running on Aurora. That is exactly what this blog post is all about.”
  • How to Migrate Your Oracle Database to Amazon Aurora – “This blog post gives you a quick overview of how you can use the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS) to facilitate and simplify migrating your commercial database to Amazon Aurora. In this case, we focus on migrating from Oracle to the MySQL-compatible Amazon Aurora.”
  • Cross-Engine Database Replication Using AWS Schema Conversion Tool and AWS Database Migration Service – “AWS SCT makes heterogeneous database migrations easier by automatically converting source database schema. AWS SCT also converts the majority of custom code, including views and functions, to a format compatible with the target database.”
  • Database Migration—What Do You Need to Know Before You Start? – “Congratulations! You have convinced your boss or the CIO to move your database to the cloud. Or you are the boss, CIO, or both, and you finally decided to jump on the bandwagon. What you’re trying to do is move your application to the new environment, platform, or technology (aka application modernization), because usually people don’t move databases for fun.”
  • How to Script a Database Migration – “You can use the AWS DMS console or the AWS CLI or the AWS SDK to perform the database migration. In this blog post, I will focus on performing the migration with the AWS CLI.”
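To give you a flavor of the CLI approach mentioned in that last post, here is a hedged sketch of creating and starting a full-load migration task. The ARNs and the table-mapping file are placeholders, and the sketch assumes that the replication instance and the source and target endpoints already exist:

$ aws dms create-replication-task --replication-task-identifier my-first-migration \
  --source-endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE \
  --target-endpoint-arn arn:aws:dms:us-east-1:123456789012:endpoint:TARGET \
  --replication-instance-arn arn:aws:dms:us-east-1:123456789012:rep:INSTANCE \
  --migration-type full-load \
  --table-mappings file://table-mappings.json

$ aws dms start-replication-task \
  --replication-task-arn arn:aws:dms:us-east-1:123456789012:task:TASK \
  --start-replication-task-type start-replication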

The documentation includes five helpful walkthroughs:

There’s also a hands-on lab (you will need to register in order to participate).

See You at a Summit
The DMS team is planning to attend and present at many of our upcoming AWS Summits and would welcome the opportunity to discuss your database migration requirements in person.

Jeff;

 

 

Amazon RDS – 2016 in Review

Even though we published 294 posts on this blog last year, I left out quite a number of worthwhile launches! Today I would like to focus on Amazon Relational Database Service (RDS) and recap all of the progress that the teams behind this family of services made in 2016. The team focused on four major areas last year:

  • High Availability
  • Enhanced Monitoring
  • Simplified Security
  • Database Engine Updates

Let’s take a look at each of these areas…

High Availability
Relational databases are at the heart of many types of applications. In order to allow our customers to build applications that are highly available, RDS has offered multi-AZ support since early 2010 (read Amazon RDS – Multi-AZ Deployments For Enhanced Availability & Reliability for more info). Instead of spending weeks setting up multiple instances, arranging for replication, writing scripts to detect instance and network issues, making failover decisions, and bringing a new secondary instance online, you simply opt for Multi-AZ Deployment when you create the Database Instance. RDS also makes it easy for you to create cross-region read replicas.
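As a quick, hedged sketch of what that looks like from the AWS CLI (all identifiers, classes, credentials, and regions below are placeholders), you opt in to Multi-AZ with a single flag at creation time and add a cross-region read replica with one more call:

$ aws rds create-db-instance --db-instance-identifier my-ha-db \
  --engine mysql --db-instance-class db.m4.large --allocated-storage 100 \
  --master-username admin --master-user-password MySecretPassword \
  --multi-az --region us-east-1

$ aws rds create-db-instance-read-replica --db-instance-identifier my-ha-db-replica \
  --source-db-instance-identifier arn:aws:rds:us-east-1:123456789012:db:my-ha-db \
  --region eu-west-1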

Here are some of the other enhancements that we made in 2016 in order to help you to achieve high availability:

Enhanced Monitoring
We announced the first big step toward enhanced monitoring at the end of 2015 (New – Enhanced Monitoring for Amazon RDS) with support for MySQL, MariaDB, and Amazon Aurora and then made additional announcements in 2016:

Simplified Security
We want to make it as easy and simple as possible for you to use encryption to protect your data, whether it is at rest or in motion. Here are the enhancements that we made in this area last year:

Database Engine Updates
The open source community and the vendors of commercial databases add features and produce new releases at a rapid pace and we track their work very closely, aiming to update RDS as quickly as possible after each significant release. Here’s what we did in 2016:

Stay Tuned
We’ve already made some big announcements this year (you can find them in the AWS What’s New for 2017) with plenty more in store including the recently announced PostgreSQL-compatible version of Aurora, so stay tuned! You may also want to subscribe to the AWS Database Blog for detailed posts that will show you how to get the most from RDS, Amazon Aurora, and Amazon ElastiCache.

Jeff;

PS – This post does not include all of the enhancements that we made to AWS Database Migration Service or the Schema Conversion Tool last year. I’m working on another post on that topic.

New – Create an Amazon Aurora Read Replica from an RDS MySQL DB Instance

Migrating from one database engine to another can be tricky when the database is supporting an application or a web site that is running 24×7. Without the option to take the database offline, an approach that is based on replication is generally the best solution.

Today we are launching a new feature that allows you to migrate from an Amazon RDS DB Instance for MySQL to Amazon Aurora by creating an Aurora Read Replica. The migration process begins by creating a DB snapshot of the existing DB Instance and then using it as the basis for a fresh Aurora Read Replica. After the replica has been set up, replication is used to bring it up to date with respect to the source. Once the replication lag drops to zero, the replica has caught up with the source. At this point, you can make the Aurora Read Replica into a standalone Aurora DB cluster and point your client applications at it.

Migration takes several hours per terabyte of data, and works for MySQL DB Instances of up to 6 terabytes. Replication runs somewhat faster for InnoDB tables than it does for MyISAM tables, and also benefits from the presence of uncompressed tables. If migration speed is a factor, you can improve it by moving your MyISAM tables to InnoDB tables and uncompressing any compressed tables.

To migrate an RDS DB Instance, simply select it in the AWS Management Console, click on Instance Actions, and choose Create Aurora Read Replica:

Then enter your database instance identifier, set any other options as desired, and click on Create Read Replica:

You can monitor the progress of the migration in the console:

After the migration is complete, wait for the Replica Lag on the new Aurora Read Replica to reach zero (use the SHOW SLAVE STATUS command on the replica and look for Seconds_Behind_Master), indicating that the replica is in sync with the source. Then stop the flow of new transactions to the source MySQL DB Instance and promote the Aurora Read Replica to a DB cluster:

Confirm your intent and then wait (typically a minute or so) until the new cluster is available:

Instruct your application to use the cluster’s read and write endpoints, and you are good to go!
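If you prefer to script the migration, here is a rough AWS CLI sketch of the same flow. All of the identifiers, the instance class, the endpoint, and the account number are placeholders, so treat this as a starting point rather than a recipe:

# Create an Aurora cluster that replicates from the existing RDS MySQL DB Instance
$ aws rds create-db-cluster --db-cluster-identifier my-aurora-replica-cluster \
  --engine aurora \
  --replication-source-identifier arn:aws:rds:us-east-1:123456789012:db:my-mysql-db

# Add an instance to the new cluster
$ aws rds create-db-instance --db-instance-identifier my-aurora-replica-1 \
  --db-instance-class db.r3.large --engine aurora \
  --db-cluster-identifier my-aurora-replica-cluster

# Watch the replica lag until Seconds_Behind_Master reaches zero
$ mysql -h my-aurora-replica-cluster.cluster-xxxxxxxx.us-east-1.rds.amazonaws.com \
  -u admin -p -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master

# Promote the replica to a standalone Aurora DB cluster
$ aws rds promote-read-replica-db-cluster --db-cluster-identifier my-aurora-replica-cluster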

Jeff;

 

Look Before You Leap – December 31, 2016 Leap Second on AWS

If you are counting down the seconds before 2016 is history, be sure to add one at the very end!

The next leap second (the 27th so far) will be inserted on December 31, 2016 at 23:59:60 UTC. This will keep Earth time (Coordinated Universal Time) close to mean solar time and means that the last minute of the year will have 61 seconds.

The information in our last post (Look Before You Leap – The Coming Leap Second and AWS) still applies, with a few nuances and new developments:

AWS Adjusted Time – We will spread the extra second over the 24 hours surrounding the leap second (11:59:59 on December 31, 2016 to 12:00:00 on January 1, 2017). AWS Adjusted Time and Coordinated Universal Time will be in sync at the end of this time period.

Microsoft Windows – Instances that are running Microsoft Windows AMIs supplied by Amazon will follow AWS Adjusted Time.

Amazon RDS – The majority of Amazon RDS database instances will show “23:59:59” twice. Oracle versions 11.2.0.2, 11.2.0.3, and 12.1.0.1 will follow AWS Adjusted Time. For Oracle versions 11.2.0.4 and 12.1.0.2, contact AWS Support for more information.

Need Help?
If you have any questions about this upcoming event, please contact AWS Support or post in the EC2 Forum.

Jeff;

 

Amazon QuickSight Now Generally Available – Fast & Easy to Use Business Analytics for Big Data

After a preview period that included participants from over 1,500 AWS customers ranging from startups to global enterprises, I am happy to be able to announce that Amazon QuickSight is now generally available! When I invited you to join the preview last year, I wrote:

In the past, Business Intelligence required an incredible amount of undifferentiated heavy lifting. You had to pay for, set up and run the infrastructure and the software, manage scale (while users fret), and hire consultants at exorbitant rates to model your data. After all that your users were left to struggle with complex user interfaces for data exploration while simultaneously demanding support for their mobile devices. Access to NoSQL and streaming data? Good luck with that!

Amazon QuickSight provides you with very fast, easy to use, cloud-powered business analytics at 1/10th the cost of traditional on-premises solutions. QuickSight lets you get started in minutes. You log in, point to a data source, and begin to visualize your data. Behind the scenes, the SPICE (Super-fast, Parallel, In-Memory Calculation Engine) will run your queries at lightning speed and provide you with highly polished data visualizations.

Deep Dive into Data
Every customer that I speak with wants to get more value from their stored data. They realize that the potential value locked up within the data is growing by the day, but are sometimes disappointed to learn that finding and unlocking that value can be expensive and difficult. On-premises business analytics tools are expensive to license and can place a heavy load on existing infrastructure. Licensing costs and the complexity of the tools can restrict the user base to just a handful of specialists.  Taken together, all of these factors have led many organizations to conclude that they are not ready to make the investment in a true business analytics function.

QuickSight is here to change that! It runs as a service and makes business analytics available to organizations of all shapes and sizes. It is fast and easy to use, does not impose a load on your existing infrastructure, and is available for a monthly fee that starts at just $9 per user.

As you’ll see in a moment, QuickSight allows you to work on data that’s stored in many different services and locations. You can get to your Amazon Redshift data warehouse, your Amazon Relational Database Service (RDS) relational databases, or your flat files in S3. You can also use a set of connectors to access data stored in on-premises MySQL, PostgreSQL, and SQL Server databases, Microsoft Excel spreadsheets, Salesforce and other services.

QuickSight is designed to scale with you. You can add more users, more data sources, and more data without having to purchase more long-term licenses or roll more hardware into your data center.

Take the Tour
Let’s take a tour through QuickSight. The administrator for my organization has already invited me to use QuickSight, so I am ready to log in and get started. Here’s the main screen:

I’d like to start by getting some data from a Redshift cluster. I click on Manage data and review my existing data sets:

I don’t see what I am looking for, so I click on New data set and review my options:

I click on Redshift (manual connect) and enter the credentials so that I can access my data warehouse (if I had a Redshift cluster running within my AWS account it would be available as an auto-discovered source):

QuickSight queries the data warehouse and shows me the schemas (sets of tables) and the tables that are available to me. I’ll select the public schema and the all_flights table to get started:

Now I have two options. I can pull the table in to SPICE for quick analysis or I can query it directly. I’ll pull it in to SPICE:

Again, I have two options! I can click on Edit/Preview data and select the rows and columns to import, or I can click on Visualize to import all of the data and proceed to the fun part! I’ll go for Edit/Preview. I can see the fields (on the left), and I can select only those that are of interest using the checkboxes:

I can also click on New Filter, select a field from the popup menu, and then create a filter:

Both options (selecting fields and filtering on rows) allow me to control the data that I pull in to SPICE. This allows me to control the data that I want to visualize and also helps me to make more efficient use of memory. Once I am ready to proceed, I click on Prepare data & visualize. At this point the data is loaded in to SPICE and I’m ready to start visualizing it. I simply select a field to get started.  For example,  I can select the origin_state_abbr field and see how many flights originate in each state:

The miniaturized view on the right gives me some additional context. I can scroll up or down or select the range of values to display.  I can also click on a second field to learn more. I’ll click on flights, set the sort order to descending, and scroll to the top. Now I can see how many of the flights in my data originated in each state:

QuickSight’s AutoGraph feature automatically generates an appropriate visualization based on the data selected. For example, if I add the fl_date field, I get a state-by-state line chart over time:

Based on my query, the data types, and properties of the data, QuickSight also proposes alternate visualizations:

I also have my choice of many other visual types including vertical & horizontal bar charts, line charts, pivot tables, tree maps, pie charts, and heat maps:

Once I have created some effective visualizations, I can capture them and use the resulting storyboard to tell a data-driven story:

I can also share my visualizations with my colleagues:

Finally, my visualizations are accessible from my mobile device:

Pricing & SPICE Capacity
QuickSight comes with one free user and 1 GB of SPICE capacity for free, perpetually. This allows every AWS user to analyze their data and to gain business insights at no cost. The Standard Edition of Amazon QuickSight starts at $9 per month and includes 10 GB of SPICE capacity (see the QuickSight Pricing page for more info).

It is easy to manage SPICE capacity. I simply click on Manage QuickSight in the menu (I must have the ADMIN role in order to be able to make changes):

Then I can see where I stand:

I can click on Purchase more capacity to do exactly that:

I can also click on Release unused purchased capacity in order to reduce the amount of SPICE capacity that I own:

Get Started Today
Amazon QuickSight is now available in the US East (Northern Virginia), US West (Oregon), and EU (Ireland) regions and you can start using it today.

Despite the length of this blog post I have barely scratched the surface of QuickSight. Given that you can use it at no charge, I would encourage you to sign up, load some of your data, and take QuickSight for a spin!

We have a webinar coming up on January 16th where you can learn even more! Sign up here.

Jeff;

 

Amazon RDS for SQL Server – Support for Native Backup/Restore to Amazon S3

Regular readers of this blog will know that I am a big fan of Amazon Relational Database Service (RDS). As a managed database service, it takes care of the more routine aspects of setting up, running, and scaling a relational database.

We first launched support for SQL Server in 2012. Continuing our effort to add features that have included SSL support, major version upgrades, transparent data encryption, enhanced monitoring, and Multi-AZ, we have now added support for SQL Server native backup/restore.

SQL Server native backups include all database objects: tables, indexes, stored procedures and triggers. These backups are commonly used to migrate databases between different SQL Server instances running on-premises or in the cloud. They can be used for data ingestion, disaster recovery, and so forth. The native backups also simplify the process of importing data and schemas from on-premises SQL Server instances, and will be easy for SQL Server DBAs to understand and use.

Support for Native Backup/Restore
You can now take native SQL Server database backups from your RDS instances and store them in an Amazon S3 bucket. Those backups can be restored to an on-premises copy of SQL Server or to another RDS-powered SQL Server instance.  You can also copy backups of your on-premises databases to S3 and then restore them to an RDS SQL Server instance. SQL Server Native Backup/Restore with Amazon S3 also supports backup encryption using AWS Key Management Service (KMS) across all SQL Server editions. Storing and transferring backups in and out of AWS through S3 provides you with another option for disaster recovery.

You can enable this feature by adding the SQL_SERVER_BACKUP_RESTORE option to an option group and associating the option group with your RDS SQL Server instance. This option must also be configured with your S3 bucket information and can include a KMS key to encrypt the backups.

Start by finding the desired option group:

Then add the SQL_SERVER_BACKUP_RESTORE option, specify (or create) an IAM role to allow RDS to access S3, point to a bucket, and (if you want) specify and configure encryption:

After you have set this up, you can use SQL Server Management Studio to connect to the database instance and invoke the following stored procedures (available within the msdb database) as needed; a sample invocation follows the list:

  • rds_backup_database – Back up a single database to an S3 bucket.
  • rds_restore_database – Restore a single database from S3.
  • rds_task_status – Track running backup and restore tasks.
  • rds_cancel_task – Cancel a running backup or restore task.
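For example, here is a hedged sketch of kicking off a backup of a single database and then checking on the task with sqlcmd. The endpoint, credentials, database name, and bucket are placeholders, and the parameter names reflect the RDS documentation, so verify them against your engine version before relying on them:

$ sqlcmd -S my-sql-server.xxxxxxxx.us-east-1.rds.amazonaws.com -U admin -P MySecretPassword \
  -Q "exec msdb.dbo.rds_backup_database @source_db_name='mydb', @s3_arn_to_backup_to='arn:aws:s3:::my-backup-bucket/mydb.bak', @overwrite_S3_backup_file=1;"

$ sqlcmd -S my-sql-server.xxxxxxxx.us-east-1.rds.amazonaws.com -U admin -P MySecretPassword \
  -Q "exec msdb.dbo.rds_task_status @db_name='mydb';"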

To learn more, take a look at Importing and Exporting SQL Server Data.

Now Available
SQL Server Native Backup/Restore is now available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), EU (Frankfurt), Asia Pacific (Sydney), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Mumbai), and South America (São Paulo) Regions. There are no additional charges for using this feature with Amazon RDS for SQL Server; however, the usage of Amazon S3 storage will be billed at regular rates.

Jeff;

New – Cross-Region Read Replicas for Amazon Aurora

You already have the power to scale the read capacity of your Amazon Aurora instances by adding additional read replicas to an existing cluster. Today we are giving you the power to create a read replica in another region. This new feature will allow you to support cross-region disaster recovery and to scale out reads. You can also use it to migrate from one region to another or to create a new database environment in a different region.

Creating a read replica in another region also creates an Aurora cluster in the region. This cluster can contain up to 15 more read replicas, with very low replication lag (typically less than 20 ms) within the region (between regions, latency will vary based on the distance between the source and target). You can use this model to duplicate your cluster and read replica setup across regions for disaster recovery. In the event of a regional disruption, you can promote the cross-region replica to be the master. This will allow you to minimize downtime for your cross-region application. This feature applies to unencrypted Aurora clusters.

Before you actually create the read replica, you need to take care of a pair of prerequisites. You need to make sure that a VPC and the Database Subnet Groups exist in the target region, and you need to enable binary logging on the existing cluster.

Setting up the VPC
Because Aurora always runs within a VPC, ensure that the VPC and the desired Database Subnet Groups exist in the target region. Here are mine:

Enabling Binary Logging
Before you can create a cross-region read replica, you need to enable binary logging for your existing cluster. Create a new DB Cluster Parameter Group (if you are not already using a non-default one):

Enable binary logging (choose MIXED) and then click on Save Changes:

Next, Modify the DB Instance, select the new DB Cluster Parameter Group, check Apply Immediately, and click on Continue. Confirm your modifications, and then click on Modify DB Instance to proceed:

Select the instance and reboot it, then wait until it is ready.
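If you prefer the AWS CLI for this step, here is a hedged sketch that assumes your cluster is already attached to a custom (non-default) DB Cluster Parameter Group named my-aurora-params and that my-aurora-instance is the writer instance; both names are placeholders:

$ aws rds modify-db-cluster-parameter-group \
  --db-cluster-parameter-group-name my-aurora-params \
  --parameters "ParameterName=binlog_format,ParameterValue=MIXED,ApplyMethod=pending-reboot"

$ aws rds reboot-db-instance --db-instance-identifier my-aurora-instance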

Create Read Replica
With the prerequisites out of the way it is time to create the read replica! From within the AWS Management Console, select the source cluster and choose Create Cross Region Read Replica from the Instance Actions menu:

Name the new cluster and the new instance, and then pick the target region. Choose the DB Subnet Group and set the other options as desired, then click Create:

Aurora will create the cluster and the instance. The state of both items will remain at creating until the items have been created and the data has been replicated (this could take some time, depending on the amount of data stored in the existing cluster).
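If you would rather script this step, here is a hedged sketch of roughly equivalent CLI calls run against the target region; the cluster names, subnet group, account number, and regions are placeholders:

$ aws rds create-db-cluster --db-cluster-identifier my-dr-cluster \
  --engine aurora \
  --replication-source-identifier arn:aws:rds:us-east-1:123456789012:cluster:my-source-cluster \
  --db-subnet-group-name my-dr-subnet-group \
  --region eu-west-1

$ aws rds create-db-instance --db-instance-identifier my-dr-instance \
  --db-instance-class db.r3.large --engine aurora \
  --db-cluster-identifier my-dr-cluster --region eu-west-1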

This feature is available now and you can start using it today!

Jeff;

 

Machine Learning, Recommendation Systems, and Data Analysis at Cloud Academy

In today’s guest post, Alex Casalboni and Giacomo Marinangeli of Cloud Academy discuss the design and development of their new Inspire system.

Jeff;


Our Challenge
Mixing technology and content has been our mission at Cloud Academy since the very early days. We are builders and we love technology, but we also know content is king. Serving our members with the best content and creating smart technology to automate it is what kept us up at night for a long time.

Companies are always fighting for people’s time and attention and at Cloud Academy, we face those same challenges as well. Our goal is to empower people and help them learn new Cloud skills every month, but we kept asking ourselves: “How much content is enough? How can we understand our customers’ goals and help them select the best learning paths?”

With this vision in mind, about six months ago we created a project called Inspire, which focuses on machine learning, recommendation systems, and data analysis. Inspire solves our problem on two fronts. First, we see an incredible opportunity in improving the way we serve our content to our customers. It will allow us to provide better suggestions and create dedicated learning paths based on an individual’s skills, objectives and industries. Second, Inspire represented an incredible opportunity to improve our operations. We manage content that requires constant updates across multiple platforms with a continuously growing library of new technologies.

For instance, getting a notification to train on a new EC2 scenario that you’re using in your project can really make a difference in the way you learn new skills. By collecting data across our entire product, such as when you watch a video or when you’re completing an AWS quiz, we can gather that information to feed Inspire. Day by day, it keeps personalising your experience through different channels inside our product. The end result is a unique learning experience that will follow you throughout your entire journey and enable a customized continuous training approach based on your skills, job and goals.

Inspire: Powered by AWS
Inspire is heavily based on machine learning and AI technologies, enabled by our internal team of data scientists and engineers. Technically, this involves several machine learning models, which are trained on the huge amount of collected data. Once the Inspire models are fully trained, they need to be deployed in order to serve new predictions, at scale.

Here the challenge has been designing, deploying and managing a multi-model architecture, capable of storing our datasets, automatically training, updating and A/B testing our machine learning models, and ultimately offering a user-friendly and uniform interface to our website and mobile apps (available for iPhone and Android).

From the very beginning, we decided to focus on high availability and scalability. With this in mind, we designed an (almost) serverless architecture based on AWS Lambda. Every machine learning model we build is trained offline and then deployed as an independent Lambda function.

Given the current maximum execution time of 5 minutes, we still run the training phase on a separate EC2 Spot instance, which reads the dataset from our data warehouse (hosted on Amazon RDS), but we are looking forward to migrating this step to a Lambda function as well.

We are using Amazon API Gateway to manage RESTful resources and API credentials, by mapping each resource to a specific Lambda function.

The overall architecture is logically represented in the diagram below:

Both our website and mobile app can invoke Inspire with simple HTTPS calls through API Gateway. Each Lambda function logically represents a single model and aims at solving a specific problem. More in detail, each Lambda function loads its configuration by downloading the corresponding machine learning model from Amazon S3 (i.e. a serialized representation of it).

Behind the scenes, and without any impact on scalability or availability, an EC2 instance takes care of periodically updating these S3 objects, as the outcome of the offline training phase.

Moreover, we want to A/B test and optimize our machine learning models: this is transparently handled in the Lambda function itself by means of SixPack, an open-source A/B testing framework which uses Redis.

Data Collection Pipeline
As far as data collection is concerned, we use Segment.com as data hub: with a single API call, it allows us to log events into multiple external integrations, such as Google Analytics, Mixpanel, etc. We also developed our own custom integration (via webhook) in order to persistently store the same data in our AWS-powered data warehouse, based on Amazon RDS.

Every event we send to Segment.com is forwarded to a Lambda function – passing through API Gateway – which takes care of storing real-time data into an SQS queue. We use this queue as a temporary buffer in order to avoid scalability and persistency problems, even during downtime or scheduled maintenance. The Lambda function also handles the authenticity of the received data thanks to a signature, uniquely provided by Segment.com.

Once raw data has been written onto the SQS queue, an elastic fleet of EC2 instances reads each individual event – hence removing it from the queue without conflicts – and writes it into our RDS data warehouse, after performing the required data transformations.
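As a purely illustrative sketch of that consumer loop (the queue URL and warehouse details are hypothetical, and the real workers do more than this), the shape of the work is: receive a batch, transform and store it, then delete the processed messages:

$ QUEUE_URL=https://sqs.us-east-1.amazonaws.com/123456789012/inspire-events

# Long-poll for a batch of raw events
$ aws sqs receive-message --queue-url $QUEUE_URL \
  --max-number-of-messages 10 --wait-time-seconds 20 > batch.json

# ...transform each message body and write it to the RDS data warehouse...

# Remove a processed event from the queue
$ aws sqs delete-message --queue-url $QUEUE_URL \
  --receipt-handle "$(jq -r '.Messages[0].ReceiptHandle' batch.json)"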

The serverless architecture we have chosen drastically reduces the costs and problems of our internal operations, besides providing high availability and scalability by default.

Our Lambda functions have a pretty constant average response time – even during load peaks – and the SQS temporary buffer makes sure we have a fairly unlimited time and storage tolerance before any data gets lost.

At the same time, our machine learning models won’t need to scale up in a vertical or distributed fashion since Lambda takes care of horizontal scaling. Currently, they have an incredibly low average response time of 1ms (or less):

We consider Inspire an enabler for everything we do from a product and content perspective, both for our customers and our operations. We’ve worked to make this the core of our technology, so that its contributions can quickly be adapted and integrated by everyone internally. In the near future, it will be able to independently make decisions for our content team while focusing on our customers’ needs. At the end of the day, Inspire really answers our team’s doubts on which content we should prioritize, what works better and exactly how much of it we need. Our ultimate goal is to improve our customers’ learning experience by making Cloud Academy smarter and building real intelligence.

Join our Webinar
If you would like to learn more about Inspire, please join our April 27th webinar – How we Use AWS for Machine Learning and Data Collection.

Alex Casalboni, Senior Software Engineer, Cloud Academy
Giacomo Marinangeli, CTO, Cloud Academy

PS – Cloud Academy is hiring – check out our open positions!

Amazon RDS for SQL Server – Support for Windows Authentication

Regular readers of this blog will know that I am a big fan of Amazon Relational Database Service (RDS). As a managed database service, it takes care of the more routine aspects of setting up, running, and scaling a relational database.

We first launched support for SQL Server in 2012. Since that time we have added many features including SSL support, major version upgrades, transparent data encryption, and Multi-AZ.  Each of these features broadened the applicability of RDS for SQL Server and opened the door to additional use cases.

Many organizations store their account credentials and the associated permissions in Active Directory. The directory provides a single, coherent source for this information and allows for centralized management.  Given that you can use the AWS Directory Service to run the Enterprise Edition of Microsoft Active Directory in the AWS Cloud,  it is time to take the next step!

Support for Windows Authentication
You can now allow your applications to authenticate against Amazon RDS for SQL Server using credentials stored in the AWS Directory Service for Microsoft Active Directory (Enterprise Edition). Keeping all of your credentials in the same directory will save you time and effort because you will no longer have to find and update each copy. This may also improve your overall security profile.

You can enable this feature and choose an Active Directory when you create a new database instance that runs SQL Server. You can also enable it for an existing database instance. Here’s how you choose a directory when you create a new database instance (you can also create a new directory at this point):
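If you prefer the AWS CLI, here is a hedged sketch of enabling the feature on an existing instance; the instance identifier, directory ID, and role name are placeholders, and the IAM role must allow RDS to access the directory on your behalf:

$ aws rds modify-db-instance --db-instance-identifier my-sql-server-db \
  --domain d-1234567890 \
  --domain-iam-role-name my-rds-directory-access-role \
  --apply-immediately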

To learn more, read about Using Microsoft SQL Server Windows Authentication with a SQL Server DB Instance.

Now Available
This feature is now available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), Asia Pacific (Sydney), Asia Pacific (Tokyo), and Asia Pacific (Singapore) Regions and you can start using it today. There is no charge for the feature, but you will pay the standard rate for the use of AWS Directory Service for Microsoft Active Directory.

Jeff;