Category: Amazon RDS
I first wrote about AWS Database Migration Service just about a year ago in my AWS Database Migration Service post. At that time I noted that over 1,000 AWS customers had already made use of the service as part of their move to AWS.
As a quick recap, AWS Database Migration Service and Schema Conversion Tool (SCT) help our customers migrate their relational data from expensive, proprietary databases and data warehouses (either on premises on in the cloud, but with restrictive licensing terms either way) to more cost-effective cloud-based databases and data warehouses such as Amazon Aurora, Amazon Redshift, MySQL, MariaDB, and PostgreSQL, with minimal downtime along the way. Our customers tell us that they love the flexibility and the cost-effective nature of these moves. For example, moving to Amazon Aurora gives them access to a database that is MySQL and PostgreSQL compatible, at 1/10th the cost of a commercial database. Take a peek at our AWS Database Migration Services Customer Testimonials to see how Expedia, Thomas Publishing, Pega, and Veoci have made use of the service.
20,000 Unique Migrations
I’m pleased to be able to announce that our customers have already used AWS Database Migration Service to migrate 20,000 unique databases to AWS and that the pace continues to accelerate (we reached 10,000 migrations in September of 2016).
We’ve added many new features to DMS and SCT over the past year. Here’s a summary:
- March 2016 – General availability in eight AWS regions.
- May 2016 – Addition of Redshift as a migration target.
- July 2016 – Highly available architecture for ongoing replication.
- July 2016 – SSL-enabled endpoints & support for Sybase as a migration source and target.
- August 2016 – Expansion to Asia Pacific (Mumbai), Asia Pacific (Seoul), and South America (São Paulo) Regions.
- October 2016 – Expansion to US East (Ohio) Region.
- December 2016 – Support for Oracle SSL connections.
- December 2016 – Expansion to Canada (Central) and EU (London) Regions.
- January 2017 – Support for events and event subscriptions.
- February 2017 – Migration from Oracle and Teradata data warehouses to Amazon Redshift.
Here are some resources that will help you to learn more and to get your own migrations underway, starting with some recent webinars:
Migrating From Sharded to Scale-Up – Some of our customers implemented a scale-out strategy in order to deal with their relational workload, sharding their database across multiple independent pieces, each running on a separate host. As part of their migration, these customers often consolidate two or more shards onto a single Aurora instance, reducing complexity, increasing reliability, and saving money along the way. If you’d like to do this, check out the blog post, webinar recording, and presentation.
Migrating From Oracle or SQL Server to Aurora – Other customers migrate from commercial databases such as Oracle or SQL Server to Aurora. If you would like to do this, check out this presentation and the accompanying webinar recording.
We also have plenty of helpful blog posts on the AWS Database Blog:
- Reduce Resource Consumption by Consolidating Your Sharded System into Aurora – “You might, in fact, save bunches of money by consolidating your sharded system into a single Aurora instance or fewer shards running on Aurora. That is exactly what this blog post is all about.”
- How to Migrate Your Oracle Database to Amazon Aurora – “This blog post gives you a quick overview of how you can use the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS) to facilitate and simplify migrating your commercial database to Amazon Aurora. In this case, we focus on migrating from Oracle to the MySQL-compatible Amazon Aurora.”
- Cross-Engine Database Replication Using AWS Schema Conversion Tool and AWS Database Migration Service – “AWS SCT makes heterogeneous database migrations easier by automatically converting source database schema. AWS SCT also converts the majority of custom code, including views and functions, to a format compatible with the target database.”
- Database Migration—What Do You Need to Know Before You Start? – “Congratulations! You have convinced your boss or the CIO to move your database to the cloud. Or you are the boss, CIO, or both, and you finally decided to jump on the bandwagon. What you’re trying to do is move your application to the new environment, platform, or technology (aka application modernization), because usually people don’t move databases for fun.”
- How to Script a Database Migration – “You can use the AWS DMS console or the AWS CLI or the AWS SDK to perform the database migration. In this blog post, I will focus on performing the migration with the AWS CLI.”
The documentation includes five helpful walkthroughs:
- Migrating Databases to Amazon Web Services.
- Migrating an On-Premises Oracle Database to Amazon Aurora Using AWS Database Migration Service.
- Migrating an Amazon RDS Oracle Database to Amazon Aurora Using AWS Database Migration Service.
- Migrating MySQL-Compatible Databases to AWS.
- Migrating MySQL-Compatible Databases to Amazon Aurora.
See You at a Summit
The DMS team is planning to attend and present at many of our upcoming AWS Summits and would welcome the opportunity to discuss your database migration requirements in person.
Even though we published 294 posts on this blog last year, I left out quite a number of worthwhile launches! Today I would like to focus on Amazon Relational Database Service (RDS) and recap all of the progress that the teams behind this family of services made in 2016. The team focused on four major areas last year:
- High Availability
- Enhanced Monitoring
- Simplified Security
- Database Engine Updates
Let’s take a look at each of these areas…
Relational databases are at the heart of many types of applications. In order to allow our customers to build applications that are highly available, RDS has offered multi-AZ support since early 2010 (read Amazon RDS – Multi-AZ Deployments For Enhanced Availability & Reliability for more info). Instead of spending weeks setting up multiple instances, arranging for replication, writing scripts to detect network, instance, & network issues, making failover decisions, and bringing a new secondary instance online, you simply opt for Multi-AZ Deployment when you create the Database Instance. RDS also makes it easy for you to create cross-region read replicas.
Here are some of the other enhancements that we made in 2016 in order to help you to achieve high availability:
- Amazon RDS for MariaDB now covered by RDS SLA.
- Amazon RDS adds Multi-AZ support for SQL Server in Asia Pacific (Seoul) AWS Region.
- Amazon RDS for PostgreSQL now supports cross-region read replicas.
- Amazon RDS for PostgreSQL now supports logical replication.
- Amazon RDS now supports read replicas of encrypted database instances across regions.
- Additional Failover Control for Amazon Aurora.
- Cross-Region Read Replicas for Amazon Aurora.
- Amazon RDS now supports copying encrypted snapshots of encrypted DB instances across regions.
- Amazon RDS for SQL Server supports SQL Server Native Backup/Restore with S3.
We announced the first big step toward enhanced monitoring at the end of 2015 (New – Enhanced Monitoring for Amazon RDS) with support for MySQL, MariaDB, and Amazon Aurora and then made additional announcements in 2016:
- Enhanced Monitoring for Amazon RDS now available for MySQL 5.5.
- RDS Enhanced Monitoring is now available in Asia Pacific (Seoul).
- Amazon RDS Enhanced Monitoring is now available in South America (Sao Paulo) and China (Beijing).
- Enhanced Monitoring is now available for Amazon RDS for Oracle.
- Enhanced Monitoring and option to enforce SSL connections is now available for Amazon RDS for PostgreSQL.
- Enhanced Monitoring for SQL Server.
We want to make it as easy and simple as possible for you to use encryption to protect your data, whether it is at rest or in motion. Here are the enhancements that we made in this area last year:
- MariaDB audit plug-in now available for RDS MySQL and MariaDB.
- Amazon RDS for SQL Server supports Windows Authentication.
- Secure Sockets Layer (SSL) and Oracle Native Network Encryption (NNE) in all Amazon RDS for Oracle editions.
- Amazon RDS for Oracle now supports the Oracle Label Security (OLS) option.
- RDS now supports sharing encrypted database snapshots with other AWS accounts.
- RDS now allows specifying the VPC for your Amazon RDS DB Instance.
- Cross-account snapshot sharing for Amazon Aurora.
Database Engine Updates
The open source community and the vendors of commercial databases add features and produce new releases at a rapid pace and we track their work very closely, aiming to update RDS as quickly as possible after each significant release. Here’s what we did in 2016:
- SQL Server:
- Support for SQL Server 2008 R2 SP3 and SQL Server 2012 SP3.
- Support Microsoft SQL Server 2016.
- Outbound network access using custom DNS servers.
- January PSU patches, improved custom Oracle directories and read privileges support.
- Oracle Repository Creation Utility (RCU) and April PSU patches.
- 11g to 12c major version upgrade.
- Oracle Enterprise Manager (OEM) Cloud Control.
- License included offering for Oracle Standard Edition Two (SE2).
- Amazon Aurora:
- Cluster creation from a MySQL Backup.
- NUMA-aware scheduling.
- Parallel read-ahead.
- Single reader endpoint.
- Lambda integration.
- Data loading from S3.
- Spatial indexing & zero-downtime patching.
- Efficient storage of binary logs.
- Cluster-level Maintenance and Patching.
- Lock compression, performance schema, hot row contention improvements, improved memory handling, and smart read selector.
We’ve already made some big announcements this year (you can find them in the AWS What’s New for 2017) with plenty more in store including the recently announced PostgreSQL-compatible version of Aurora, so stay tuned! You may also want to subscribe to the AWS Database Blog for detailed posts that will show you how to get the most from RDS, Amazon Aurora, and Amazon ElastiCache.
Migrating from one database engine to another can be tricky when the database is supporting an application or a web site that is running 24×7. Without the option to take the database offline, an approach that is based on replication is generally the best solution.
Today we are launching a new feature that allows you to migrate from an Amazon RDS DB Instance for MySQL to Amazon Aurora by creating an Aurora Read Replica. The migration process begins by creating a DB snapshot of the existing DB Instance and then using it as the basis for a fresh Aurora Read Replica. After the replica has been set up, replication is used to bring it up to date with respect to the source. Once the replication lag drops to 0, the replication is complete. At this point, you can make the Aurora Read Replica into a standalone Aurora DB cluster and point your client applications at it.
Migration takes several hours per terabyte of data, and works for MySQL DB Instances of up to 6 terabytes. Replication runs somewhat faster for InnoDB tables than it does for MyISAM tables, and also benefits from the presence of uncompressed tables. If migration speed is a factor, you can improve it by moving your MyISAM tables to InnoDB tables and uncompressing any compressed tables.
To migrate an RDS DB Instance, simply select it in the AWS Management Console, click on Instance Actions, and choose Create Aurora Read Replica:
Then enter your database instance identifier, set any other options as desired, and click on Create Read Replica:
You can monitor the progress of the migration in the console:
After the migration is complete, wait for the Replica Lag to reach zero on the new Aurora Read Replica (use the
SHOW SLAVE STATUS command on the replica and look for “Seconds behind master”) to indicate that the replica is in sync with the source, stop the flow of new transactions to the source MySQL DB Instance, and promote the Aurora Read Replica to a DB cluster:
Confirm your intent and then wait (typically a minute or so) until the new cluster is available:
Instruct your application to use the cluster’s read and write endpoints, and you are good to go!
If you are counting down the seconds before 2016 is history, be sure to add one at the very end!
The next leap second (the 27th so far) will be inserted on December 31, 2016 at 23:59:60 UTC. This will keep Earth time (Coordinated Universal Time) close to mean solar time and means that the last minute of the year will have 61 seconds.
The information in our last post (Look Before You Leap – The Coming Leap Second and AWS), still applies, with a few nuances and new developments:
AWS Adjusted Time – We will spread the extra second over the 24 hours surrounding the leap second (11:59:59 on December 31, 2016 to 12:00:00 on January 1, 2017). AWS Adjusted Time and Coordinated Universal time will be in sync at the end of this time period.
Microsoft Windows – Instances that are running Microsoft Windows AMIs supplied by Amazon will follow AWS Adjusted Time.
Amazon RDS – The majority of Amazon RDS database instances will show “23:59:59” twice. Oracle versions 18.104.22.168, 22.214.171.124, and 126.96.36.199 will follow AWS Adjusted Time. For Oracle versions 188.8.131.52 and 184.108.40.206 contact AWS Support for more information.
After a preview period that included participants from over 1,500 AWS customers ranging from startups to global enterprises, I am happy to be able to announce that Amazon QuickSight is now generally available! When I invited you to join the preview last year, I wrote:
In the past, Business Intelligence required an incredible amount of undifferentiated heavy lifting. You had to pay for, set up and run the infrastructure and the software, manage scale (while users fret), and hire consultants at exorbitant rates to model your data. After all that your users were left to struggle with complex user interfaces for data exploration while simultaneously demanding support for their mobile devices. Access to NoSQL and streaming data? Good luck with that!
Amazon QuickSight provides you with very fast, easy to use, cloud-powered business analytics at 1/10th the cost of traditional on-premises solutions. QuickSight lets you get started in minutes. You log in, point to a data source, and begin to visualize your data. Behind the scenes, the SPICE (Super-fast, Parallel, In-Memory Calculation Engine) will run your queries at lightning speed and provide you with highly polished data visualizations.
Deep Dive into Data
Every customer that I speak with wants to get more value from their stored data. They realize that the potential value locked up within the data is growing by the day, but are sometimes disappointed to learn that finding and unlocking that value can be expensive and difficult. On-premises business analytics tools are expensive to license and can place a heavy load on existing infrastructure. Licensing costs and the complexity of the tools can restrict the user base to just a handful of specialists. Taken together, all of these factors have led many organizations to conclude that they are not ready to make the investment in a true business analytics function.
QuickSight is here to change that! It runs as a service and makes business analytics available to organizations of all shapes and sizes. It is fast and easy to use, does not impose a load on your existing infrastructure, and is available for a monthly fee that starts at just $9 per user.
As you’ll see in a moment, QuickSight allows you to work on data that’s stored in many different services and locations. You can get to your Amazon Redshift data warehouse, your Amazon Relational Database Service (RDS) relational databases, or your flat files in S3. You can also use a set of connectors to access data stored in on-premises MySQL, PostgreSQL, and SQL Server databases, Microsoft Excel spreadsheets, Salesforce and other services.
QuickSight is designed to scale with you. You can add more users, more data sources, and more data without having to purchase more long-term licenses or roll more hardware into your data center.
Take the Tour
Let’s take a tour through QuickSight. The administrator for my organization has already invited me to use QuickSight, so I am ready to log in and get started. Here’s the main screen:
I’d like to start by getting some data from a Redshift cluster. I click on Manage data and review my existing data sets:
I don’t see what I am looking for, so I click on New data set and review my options:
I click on Redshift (manual connect) and enter the credentials so that I can access my data warehouse (if I had a Redshift cluster running within my AWS account it would be available as an auto-discovered source):
QuickSight queries the data warehouse and shows me the schemas (sets of tables) and the tables that are available to me. I’ll select the public schema and the all_flights table to get started:
Now I have two options. I can pull the table in to SPICE for quick analysis or I can query it directly. I’ll pull it in to SPICE:
Again, I have two options! I can click on Edit/Preview data and select the rows and columns to import, or I can click on Visualize to import all of the data and proceed to the fun part! I’ll go for Edit/Preview. I can see the fields (on the left), and I can select only those that are interest using the checkboxes:
I can also click on New Filter, select a field from the popup menu, and then create a filter:
Both options (selecting fields and filtering on rows) allow me to control the data that I pull in to SPICE. This allows me to control the data that I want to visualize and also helps me to make more efficient use of memory. Once I am ready to proceed, I click on Prepare data & visualize. At this point the data is loaded in to SPICE and I’m ready to start visualizing it. I simply select a field to get started. For example, I can select the origin_state_abbr field and see how many flights originate in each state:
The miniaturized view on the right gives me some additional context. I can scroll up or down or select the range of values to display. I can also click on a second field to learn more. I’ll click on flights, set the sort order to descending, and scroll to the top. Now I can see how many of the flights in my data originated in each state:
QuickSight’s AutoGraph feature automatically generates an appropriate visualization based on the data selected. For example, if I add the fl_date field, I get a state-by-state line chart over time:
Based on my query, the data types, and properties of the data, QuickSight also proposes alternate visualizations:
I also have my choice of many other visual types including vertical & horizontal bar charts, line charts, pivot tables, tree maps, pie charts, and heat maps:
Once I have created some effective visualizations, I can capture them and use the resulting storyboard to tell a data-driven story:
I can also share my visualizations with my colleagues:
Finally, my visualizations are accessible from my mobile device:
Pricing & SPICE Capacity
QuickSight comes with one free user and 1 GB of SPICE capacity for free, perpetually. This allows every AWS user to analyze their data and to gain business insights at no cost. The Standard Edition of Amazon QuickSight starts at $9 per month and includes 10 GB of SPICE capacity (see the [QuickSight Pricing] page for more info).
It is easy to manage SPICE capacity. I simply click on Manage QuickSight in the menu (I must have the ADMIN role in order to be able to make changes):
Then I can see where I stand:
I can click on Purchase more capacity to do exactly that:
I can also click on Release unused purchased capacity in order to reduce the amount of SPICE capacity that I own:
Get Started Today
Amazon QuickSight is now available in the US East (Northern Virginia), US West (Oregon), and EU (Ireland) regions and you can start using it today.
Despite the length of this blog post I have barely scratched the surface of QuickSight. Given that you can use it at no charge, I would encourage you to sign up, load some of your data, and take QuickSight for a spin!
We have a webinar coming up on January 16th where you can learn even more! Sign up here.
Regular readers of this blog will know that I am a big fan of Amazon Relational Database Service (RDS). As a managed database service, it takes care of the more routine aspects of setting up, running, and scaling a relational database.
We first launched support for SQL Server in 2012. Continuing our effort to add features that have included SSL support, major version upgrades, transparent data encryption, enhanced monitoring and Multi-AZ, we have now added support for SQL Server native backup/restore.
SQL Server native backups include all database objects: tables, indexes, stored procedures and triggers. These backups are commonly used to migrate databases between different SQL Server instances running on-premises or in the cloud. They can be used for data ingestion, disaster recovery, and so forth. The native backups also simplify the process of importing data and schemas from on-premises SQL Server instances, and will be easy for SQL Server DBAs to understand and use.
Support for Native Backup/Restore
You can now take native SQL Server database backups from your RDS instances and store them in an Amazon S3 bucket. Those backups can be restored to an on-premises copy of SQL Server or to another RDS-powered SQL Server instance. You can also copy backups of your on-premises databases to S3 and then restore them to an RDS SQL Server instance. SQL Server Native Backup/Restore with Amazon S3 also supports backup encryption using AWS Key Management Service (KMS) across all SQL Server editions. Storing and transferring backups in and out of AWS through S3 provides you with another option for disaster recovery.
You can enable this feature by adding the SQL_SERVER_BACKUP_RESTORE option to an option group and associating the option group with your RDS SQL Server instance. This option must also be configured with your S3 bucket information and can include a KMS key to encrypt the backups.
Start by finding the desired option group:
Then add the SQL_SERVER_BACKUP_RESTORE option, specify (or create) an IAM role to allow RDS to access S3, point to a bucket, and (if you want) specify and configure encryption:
After you have set this up, you can use SQL Server Management Studio to connect to the database instance and invoke the following stored procedures (available within the msdb database) as needed:
rds_backup_database– Back up a single database to an S3 bucket.
rds_restore_database– Restore a single database from S3.
rds_task_status– Track running backup and restore tasks.
rds_cancel_task– Cancel a running backup or restore task.
To learn more, take a look at Importing and Exporting SQL Server Data.
SQL Server Native Backup/Restore is now available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), EU (Frankfurt), Asia Pacific (Sydney), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Mumbai), and South America (São Paulo) Regions. There are no additional charges for using this feature with Amazon RDS for SQL Server, however, the usage of the Amazon S3 storage will be billed at regular rates.
You already have the power to scale the read capacity of your Amazon Aurora instances by adding additional read replicas to an existing cluster. Today we are giving you the power to create a read replica in another region. This new feature will allow you to support cross-region disaster recovery and to scale out reads. You can also use it to migrate from one region to another or to create a new database environment in a different region.
Creating a read replica in another region also creates an Aurora cluster in the region. This cluster can contain up to 15 more read replicas, with very low replication lag (typically less than 20 ms) within the region (between regions, latency will vary based on the distance between the source and target). You can use this model to duplicate your cluster and read replica setup across regions for disaster recovery. In the event of a regional disruption, you can promote the cross-region replica to be the master. This will allow you to minimize downtime for your cross-region application. This feature applies to unencrypted Aurora clusters.
Before you get actually create the read replica, you need to take care of a pair of prerequisites. You need to make sure that a VPC and the Database Subnet Groups exist in the target region, and you need to enable binary logging on the existing cluster.
Setting up the VPC
Because Aurora always runs within a VPC, ensure that the VPC and the desired Database Subnet Groups exist in the target region. Here are mine:
Enabling Binary Logging
Before you can create a cross region read replica, you need to enable binary logging for your existing cluster. Create a new DB Cluster Parameter Group (if you are not already using a non-default one):
Enable binary logging (choose MIXED) and then click on Save Changes:
Next, Modify the DB Instance, select the new DB Cluster Parameter Group, check Apply Immediately, and click on Continue. Confirm your modifications, and then click on Modify DB Instance to proceed:
Select the instance and reboot it, then wait until it is ready.
Create Read Replica
With the prerequisites out of the way it is time to create the read replica! From within the AWS Management Console, select the source cluster and choose Create Cross Region Read Replica from the Instance Actions menu:
Name the new cluster and the new instance, and then pick the target region. Choose the DB Subnet Group and set the other options as desired, then click Create:
Aurora will create the cluster and the instance. The state of both items will remain at creating until the items have been created and the data has been replicated (this could take some time, depending on amount of data stored in the existing cluster.
This feature is available now and you can start using it today!
In today’s guest post, Alex Casalboni and Giacomo Marinangeli of Cloud Academy discuss the design and development of their new Inspire system.
Mixing technology and content has been our mission at Cloud Academy since the very early days. We are builders and we love technology, but we also know content is king. Serving our members with the best content and creating smart technology to automate it is what kept us up at night for a long time.
Companies are always fighting for people’s time and attention and at Cloud Academy, we face those same challenges as well. Our goal is to empower people, help them learn new Cloud skills every month, but we kept asking ourselves: “How much content is enough? How can we understand our customer’s goals and help them select the best learning paths?”
With this vision in mind about six months ago we created a project called Inspire which focuses on machine learning, recommendation systems and data analysis. Inspire solves our problem on two fronts. First, we see an incredible opportunity in improving the way we serve our content to our customers. It will allow us to provide better suggestions and create dedicated learning paths based on an individual’s skills, objectives and industries. Second, Inspire represented an incredible opportunity to improve our operations. We manage content that requires constant updates across multiple platforms with a continuously growing library of new technologies.
For instance, getting a notification to train on a new EC2 scenario that you’re using in your project can really make a difference in the way you learn new skills. By collecting data across our entire product, such as when you watch a video or when you’re completing an AWS quiz, we can gather that information to feed Inspire. Day by day, it keeps personalising your experience through different channels inside our product. The end result is a unique learning experience that will follow you throughout your entire journey and enable a customized continuous training approach based on your skills, job and goals.
Inspire: Powered by AWS
Inspire is heavily based on machine learning and AI technologies, enabled by our internal team of data scientists and engineers. Technically, this involves several machine learning models, which are trained on the huge amount of collected data. Once the Inspire models are fully trained, they need to be deployed in order to serve new predictions, at scale.
Here the challenge has been designing, deploying and managing a multi-model architecture, capable of storing our datasets, automatically training, updating and A/B testing our machine learning models, and ultimately offering a user-friendly and uniform interface to our website and mobile apps (available for iPhone and Android).
From the very beginning, we decided to focus high availability and scalability. With this in mind, we designed an (almost) serverless architecture based on AWS Lambda. Every machine learning model we build is trained offline and then deployed as an independent Lambda function.
Given the current maximum execution time of 5 minutes, we still run the training phase on a separate EC2 Spot instance, which reads the dataset from our data warehouse (hosted on Amazon RDS), but we are looking forward to migrating this step to a Lambda function as well.
We are using Amazon API Gateway to manage RESTful resources and API credentials, by mapping each resource to a specific Lambda function.
The overall architecture is logically represented in the diagram below:
Both our website and mobile app can invoke Inspire with simple HTTPS calls through API Gateway. Each Lambda function logically represents a single model and aims at solving a specific problem. More in detail, each Lambda function loads its configuration by downloading the corresponding machine learning model from Amazon S3 (i.e. a serialized representation of it).
Behind the scenes, and without any impact on scalability or availability, an EC2 instance takes care of periodically updating these S3 objects, as outcome of the offline training phase.
Moreover, we want to A/B test and optimize our machine learning models: this is transparently handled in the Lambda function itself by means of SixPack, an open-source A/B testing framework which uses Redis.
Data Collection Pipeline
As far as data collection is concerned, we use Segment.com as data hub: with a single API call, it allows us to log events into multiple external integrations, such as Google Analytics, Mixpanel, etc. We also developed our own custom integration (via webhook) in order to persistently store the same data in our AWS-powered data warehouse, based on Amazon RDS.
Every event we send to Segment.com is forwarded to a Lambda function – passing through API Gateway – which takes care of storing real-time data into an SQS queue. We use this queue as a temporary buffer in order to avoid scalability and persistency problems, even during downtime or scheduled maintenance. The Lambda function also handles the authenticity of the received data thanks to a signature, uniquely provided by Segment.com.
Once raw data has been written onto the SQS queue, an elastic fleet of EC2 instances reads each individual event – hence removing it from the queue without conflicts – and writes it into our RDS data warehouse, after performing the required data transformations.
The serverless architecture we have chosen drastically reduces the costs and problems of our internal operations, besides providing high availability and scalability by default.
Our Lambda functions have a pretty constant average response time – even during load peaks – and the SQS temporary buffer makes sure we have a fairly unlimited time and storage tolerance before any data gets lost.
At the same time, our machine learning models won’t need to scale up in a vertical or distributed fashion since Lambda takes care of horizontal scaling. Currently, they have an incredibly low average response time of 1ms (or less):
We consider Inspire an enabler for everything we do from a product and content perspective, both for our customers and our operations. We’ve worked to make this the core of our technology, so that its contributions can quickly be adapted and integrated by everyone internally. In the near future, it will be able to independently make decisions for our content team while focusing on our customers’ need. At the end of the day, Inspire really answers our team’s doubts on which content we should prioritize, what works better and exactly how much of it we need. Our ultimate goal is to improve our customer’s learning experience by making Cloud Academy smarter by building real intelligence.
Join our Webinar
If you would like to learn more about Inspire, please join our April 27th webinar – How we Use AWS for Machine Learning and Data Collection.
PS – Cloud Academy is hiring – check out our open positions!
Regular readers of this blog will know that I am a big fan of Amazon Relational Database Service (RDS). As a managed database service, it takes care of the more routine aspects of setting up, running, and scaling a relational database.
We first launched support for SQL Server in 2012. Since that time we have added many features including SSL support, major version upgrades, transparent data encryption, and Multi-AZ. Each of these features broadened the applicability of RDS for SQL Server and opened the door to additional use cases.
Many organizations store their account credentials and the associated permissions in Active Directory. The directory provides a single, coherent source for this information and allows for centralized management. Given that you can use the AWS Directory Service to run the Enterprise Edition of Microsoft Active Directory in the AWS Cloud, it is time to take the next step!
Support for Windows Authentication
You can now allow your applications to authenticate against Amazon RDS for SQL Server using credentials stored in the AWS Directory Service for Microsoft Active Directory (Enterprise Edition). Keeping all of your credentials in the same directory will save you time and effort because you will no longer have to find and update each copy. This may also improve your overall security profile.
You can enable this feature and choose an Active Directory when you create a new database instance that runs SQL Server. You can also enable it for an existing database instance. Here’s how you choose a directory when you create a new database instance (you can also create a new one):
To learn more, read about Using Microsoft SQL Server Windows Authentication with a SQL Server DB Instance.
This feature is now available in the US East (Northern Virginia), US West (Oregon), EU (Ireland), Asia Pacific (Sydney), Asia Pacific (Tokyo), and Asia Pacific (Singapore) Regions and you can start using it today. There is no charge for the feature, but you will pay the standard rate for the use of AWS Directory Service for Microsoft Active Directory.
Do you currently store relational data in an on-premises Oracle, SQL Server, MySQL, MariaDB, or PostgreSQL database? Would you like to move it to the AWS cloud with virtually no downtime so that you can take advantage of the scale, operational efficiency, and the multitude of data storage options that are available to you?
If so, the new AWS Database Migration Service (DMS) is for you! First announced last fall at AWS re:Invent, our customers have already used it to migrate over 1,000 on-premises databases to AWS. You can move live, terabyte-scale databases to the cloud, with options to stick with your existing database platform or to upgrade to a new one that better matches your requirements. If you are migrating to a new database platform as part of your move to the cloud, the AWS Schema Conversion Tool will convert your schemas and stored procedures for use on the new platform.
The AWS Database Migration Service works by setting up and then managing a replication instance on AWS. This instance unloads data from the source database and loads it into the destination database, and can be used for a one-time migration followed by on-going replication to support a migration that entails minimal downtime. Along the way DMS handles many of the complex details associated with migration, including data type transformation and conversion from one database platform to another (Oracle to Aurora, for example). The service also monitors the replication and the health of the instance, notifies you if something goes wrong, and automatically provisions a replacement instance if necessary.
The service supports many different migration scenarios and networking options One of the endpoints must always be in AWS; the other can be on-premises, running on an EC2 instance, or running on an RDS database instance. The source and destination can reside within the same Virtual Private Cloud (VPC) or in two separate VPCs (if you are migrating from one cloud database to another). You can connect to an on-premises database via the public Internet or via AWS Direct Connect.
Migrating a Database
You can set up your first migration with a couple of clicks! You simply create the target database, migrate the database schema, set up the data replication process, and initiate the migration. After the target database has caught up with the source, you simply switch to using it in your production environment.
I start by opening up the AWS Database Migration Service Console (in the Database section of the AWS Management Console as DMS) and clicking on Create migration.
The Console provides me with an overview of the migration process:
I click on Next and provide the parameters that are needed to create my replication instance:
For this blog post, I selected one of my existing VPCs and unchecked Publicly accessible. My colleagues had already set me up with an EC2 instance to represent my “on-premises” database.
After the replication instance has been created, I specify my source and target database endpoints and then click on Run test to make sure that the endpoints are accessible (truth be told, I spent some time adjusting my security groups in order to make the tests pass):
Now I create the actual migration task. I can (per the Migration type) migrate existing data, migrate and then replicate, or replicate going forward:
I could have clicked on Task Settings to set some other options (LOBs are Large Objects):
The migration task is ready, and will begin as soon as I select it and click on Start/Resume:
I can watch for progress, and then inspect the Table statistics to see what happened (these were test tables and the results are not very exciting):
At this point I would do some sanity checks and then point my application to the new endpoint. I could also have chosen to perform an ongoing replication.
The AWS Database Migration Service offers many options and I have barely scratched the surface. You can, for example, choose to migrate only certain tables. You can also create several different types of replication tasks and activate them at different times. I highly recommend you read the DMS documentation as it does a great job of guiding you through your first migration.
Price and Availability
The AWS Database Migration Service is available in the US East (Northern Virginia), US West (Oregon), US West (Northern California), EU (Ireland), EU (Frankfurt), Asia Pacific (Tokyo), Asia Pacific (Singapore), and Asia Pacific (Sydney) Regions and you can start using it today (we plan to add support for other Regions in the coming months).
Pricing is based on the compute resources used during the migration process, with a charge for longer-term storage of logs. See the Database Migration Service Pricing page for more information.