  • About Amazon.com

    Amazon.com is the world’s leading online retailer and the pioneer of customer reviews, 1-Click shopping, personalized recommendations, Prime, AWS, Kindle, Alexa, and many more products and services.

     

  • Benefits

    • Reduced operational effort by 90%
    • Moved 150 TB in 2 months
    • Experienced no downtime during migration
    • Increased availability tenfold

     


Amazon.com has the world’s largest selection of items—including billions of items in its catalog and more than 500 million available for sale in 2018. The Items and Offers Platform is Amazon’s technology platform for ingesting, processing, and hosting authoritative information about catalog entities across the company’s entire business.

The Item Master Service (IMS) is responsible for the ingestion and processing of data from various suppliers and production of an authoritative catalog. IMS publishes catalog data to the Amazon website, search indexes, and other services that Amazon customers and sellers interact with every day. IMS operates at a massive scale, processing more than five billion catalog updates a day.

As Amazon’s business grew, the size of the IMS database also grew—by up to 50 percent annually. Such growth required the engineering team to spend up to 40 percent of its time each year scaling the database persistence layer. The Oracle database that IMS originally used had limited database connections and input/output (I/O). The company had to deploy ever-more-powerful hardware to support more connections and higher I/O capacity, resulting in overprovisioning and high operating costs.

Though all the databases had redundancy to compensate for outages, failover to a replica significantly reduced system availability due to the need to access multiple tables partitioned across the databases. Whenever a database failed or the team needed to update the database software, system availability fell by up to 50 percent. As the system grew, points of failure increased and availability dropped.

The team decided to migrate IMS to Amazon DynamoDB, a nonrelational database that delivers reliable performance at any scale. IMS did not use the relational features of Oracle and was architected for the horizontal scalability inherent in nonrelational databases.

IMS is one of the longest-lived internal services at Amazon and has been migrated multiple times over the years. However, because of rapid data growth, by the time the team decided to migrate to DynamoDB, it faced a scale several orders of magnitude higher than anything it had faced in the past, with more than 600 billion records adding up to more than 150 TB of data.

The team estimated it would take several hundred hours of software-development effort to build a tool that could transform and migrate data reliably at this scale, so it chose AWS Database Migration Service (AWS DMS) instead. AWS DMS automatically handles the deployment, management, and monitoring of all the hardware and software needed for a migration, and it scales dynamically to match workloads. By using AWS DMS, the team avoided building its own tools and migrated the database in less than two months.

Before the team started the migration, it updated the application’s persistence layer to support reads and writes to the source, to the destination, or both. During the migration, writes were switched from the source database to the target database, and optimistic concurrency checks prevented new data from being overwritten with old data.
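A persistence layer of the kind described above can be sketched roughly as follows. This is only an illustration: the `Mode` and `PersistenceLayer` names are hypothetical, and plain dictionaries stand in for the Oracle and DynamoDB stores.

```python
from enum import Enum


class Mode(Enum):
    SOURCE = "source"            # legacy store only
    DESTINATION = "destination"  # new store only
    BOTH = "both"                # both stores


class PersistenceLayer:
    """Routes reads and writes to the source store, the destination store, or both.

    Hypothetical sketch: dicts stand in for the real databases.
    """

    def __init__(self, source: dict, destination: dict,
                 write_mode: Mode, read_mode: Mode):
        self.source = source
        self.destination = destination
        self.write_mode = write_mode
        self.read_mode = read_mode

    def _stores(self, mode: Mode) -> list:
        if mode is Mode.SOURCE:
            return [self.source]
        if mode is Mode.DESTINATION:
            return [self.destination]
        return [self.source, self.destination]

    def write(self, key: str, record: dict) -> None:
        # Write to every store selected by the current write mode.
        for store in self._stores(self.write_mode):
            store[key] = record

    def read(self, key: str) -> list:
        # Return every copy found in the stores selected by the read mode.
        return [s[key] for s in self._stores(self.read_mode) if key in s]
```

With `write_mode=Mode.DESTINATION` and `read_mode=Mode.BOTH`, new writes land only in the destination while reads can still find records that have not been backfilled yet.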

To ensure the IMS migration did not affect application availability and scale, the team performed the migration in two phases: live migration and backfill. During live migration, the migration team updated the application to switch writes from the Oracle database to DynamoDB. Both Oracle and DynamoDB served reads, and stale data was eliminated using the version number of the record.
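One way to picture version-based elimination of stale data when both stores serve reads is the sketch below. It assumes each record carries an integer `version` field (the field name is an assumption, not taken from the IMS schema) and that a missing copy is represented by `None`.

```python
from typing import Optional


def freshest(source_copy: Optional[dict],
             destination_copy: Optional[dict]) -> Optional[dict]:
    """Return the copy with the highest version, or None if neither store has one.

    The destination copy is listed first, so it wins when versions are equal.
    """
    candidates = [c for c in (destination_copy, source_copy) if c is not None]
    if not candidates:
        return None
    return max(candidates, key=lambda record: record["version"])
```

A reader that queries both stores would pass the two results through `freshest` and return only the winner, so an older copy never reaches the caller.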

After the team enabled live migration for the tables, it migrated “cold” data from Oracle to DynamoDB by using AWS DMS in two steps. First, the team transformed the Oracle data to a nonrelational document format by using AWS DMS object-mapping templates. Second, the team configured IMS to perform conditional “puts” on DynamoDB.
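A conditional put of the kind mentioned above might look like the following sketch. It only builds the request parameters, using DynamoDB condition-expression syntax, so it runs without AWS access. The attribute names `item_id` and `version` are assumptions for illustration; the actual IMS schema is not described in this article.

```python
from typing import Optional


def conditional_put_kwargs(item: dict,
                           key_attr: str = "item_id",
                           version_attr: str = "version") -> dict:
    """Build put_item parameters that only write if the item is absent or older.

    key_attr and version_attr are hypothetical attribute names.
    """
    return {
        "Item": item,
        # Succeeds when no item exists for this key, or the stored version is lower.
        "ConditionExpression": "attribute_not_exists(#k) OR #v < :incoming",
        "ExpressionAttributeNames": {"#k": key_attr, "#v": version_attr},
        "ExpressionAttributeValues": {":incoming": item[version_attr]},
    }


def would_succeed(existing: Optional[dict], item: dict,
                  version_attr: str = "version") -> bool:
    """Pure-Python mirror of the condition above, for reasoning locally."""
    return existing is None or existing[version_attr] < item[version_attr]
```

With boto3, these parameters could be passed as `table.put_item(**conditional_put_kwargs(item))`. A failed condition surfaces as a `ConditionalCheckFailedException`, which a backfill job can treat as "a newer record is already present" and skip.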

The team created enough AWS DMS instances to achieve sufficient throughput. Because the data stores were already partitioned, Amazon could run multiple instances of IMS on each table to achieve a migration rate of more than 100,000 records per second. The team validated the migrated data by verifying that the total number of records migrated matched the record counts in the source tables. Once the data migration began, Amazon moved seven tables holding approximately 150 TB of data across 24 Oracle databases to DynamoDB in less than two months, thanks in part to the efficiency of AWS DMS.
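The count-based validation and the throughput figure above can be illustrated with a short sketch. The function names and the per-table count dictionaries are hypothetical; how the team actually gathered counts is not described here.

```python
def count_mismatches(source_counts: dict, target_counts: dict) -> dict:
    """Return {table: (source_count, target_count)} for tables whose counts differ."""
    mismatches = {}
    for table in sorted(set(source_counts) | set(target_counts)):
        src = source_counts.get(table, 0)
        dst = target_counts.get(table, 0)
        if src != dst:
            mismatches[table] = (src, dst)
    return mismatches


def days_at_rate(records: int, records_per_second: float) -> float:
    """Days needed to move `records` at a constant sustained rate."""
    return records / records_per_second / 86_400  # 86,400 seconds per day
```

As a rough check, `days_at_rate(600_000_000_000, 100_000)` comes to about 69 days. Finishing in under two months is therefore consistent with a sustained rate somewhat above the 100,000 records per second floor quoted above, or with only part of the data needing backfill.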

The move of IMS to DynamoDB delivered all the benefits the team hoped for—and more. Previously, in preparation for peak-load events such as Amazon Prime Day, the engineering team had to fine-tune and test database configurations, which could take up to two weeks. DynamoDB reduced the engineering time required to just a few hours.

DynamoDB supports a simplified architecture that uses global secondary indexes to improve reliability and availability. Additionally, Amazon simplified the persistence layer for the application by eliminating the need to maintain partition information and related logic to route data to the appropriate database.

The availability of IMS improved by a factor of 10, ensuring consistent performance and reducing the operational workload of keeping the database up and running. With DynamoDB, the team used the point-in-time recovery feature to simplify backup and restore operations. Amazon obtained these benefits at a lower overall cost compared to the previous system, thanks in part to auto scaling that dynamically adjusted the provisioned read/write capacity based on usage instead of having a fixed capacity based on peak loads.
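To make the auto scaling point concrete, here is a simplified model of target-tracking capacity adjustment, plus the request parameters one might pass to the Application Auto Scaling API for a DynamoDB table. The table name, capacity limits, and 70 percent target are illustrative assumptions, and real auto scaling reacts to CloudWatch metrics over time rather than to a single reading.

```python
import math


def desired_capacity(consumed_units: float, target_percent: float,
                     min_capacity: int, max_capacity: int) -> int:
    """Capacity that would put utilization at the target, clamped to the limits."""
    wanted = math.ceil(consumed_units * 100 / target_percent)
    return max(min_capacity, min(max_capacity, wanted))


def scaling_requests(table_name: str, min_capacity: int, max_capacity: int,
                     target_percent: float = 70.0) -> tuple:
    """Build parameters for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "dynamodb",
        "ResourceId": f"table/{table_name}",
        "ScalableDimension": "dynamodb:table:WriteCapacityUnits",
    }
    register = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    policy = {
        **common,
        "PolicyName": f"{table_name}-write-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_percent,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
            },
        },
    }
    return register, policy
```

The contrast with fixed provisioning is that `desired_capacity` follows usage: a quiet period drops back toward `min_capacity` instead of holding peak-sized capacity all year.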

Today, IMS can grow as large as Amazon requires without putting an undue burden on the engineering team. The reliable, consistent performance of DynamoDB and fewer limits on scalability mean Amazon customers can always enjoy the experience they have come to expect.