In the summer of 2018, Dropbox experienced a capacity crunch in its on-premises metadata store due to fast data growth in some of the partitions. Dropbox’s database team had three choices: double the on-premises storage capacity (which would cost millions of dollars), delete swaths of metadata, or find a new, highly scalable yet cost-effective solution. The third option was the best, but achieving it would be a challenge. Dropbox had less than 2 years until its on-premises system would reach maximum capacity, and the implementation team for the project was made up of just two employees.
Those circumstances pushed Dropbox to pursue a managed solution from Amazon Web Services (AWS). Using Amazon DynamoDB, a fully managed, flexible NoSQL database that delivers single-digit millisecond performance at any scale, and Amazon Simple Storage Service (Amazon S3), a cloud object storage service, Dropbox rapidly developed a new managed storage system called Alki. This made room for virtually unlimited user metadata and not only saved the company millions of dollars—since it would not have to increase on-premises storage—but also reduced the cost per gigabyte by a factor of 5.5.
Migrating Audit Log Data from a Legacy Database to the Cloud
Founded in 2007 by two Massachusetts Institute of Technology students, Dropbox is a global collaboration tool and file sharing service. It has become one of the most successful startups in the world, with over 600 million users uploading more than 400 billion pieces of content.
Dropbox’s metadata stores were originally housed solely within the company’s main data store, Edgestore, hosted in an on-premises distributed database built on top of sharded MySQL clusters. By mid-2018, the rapidly growing cold metadata (data that is accessed infrequently but needs to be stored durably and available instantly) was less than 2 years away from overwhelming Edgestore. Yet increasing the capacity of the on-premises database would require splitting existing partitions and buying new machines to host them, which would double the cost of Edgestore by adding millions of dollars per year. Additionally, it no longer made sense to store cold metadata in the same database as hot, or frequently used, metadata. “If you’re writing data that’s not meant to be read often, it’s extremely expensive to use, not to mention pointless to store, in mediums that are optimized for retrieval speed,” says Jonathan Lee, tech lead for Dropbox’s Alki team.
As a result, two employees split off from the database team to build Alki, the solution that would cost-effectively store metadata. They focused particularly on audit logging data, Edgestore’s top cold metadata use case. Because the small Alki team faced a tight deadline that, if missed, could potentially lead to lost user metadata, it decided to implement managed services from AWS. Using Amazon DynamoDB and Amazon S3, Dropbox rapidly prototyped and deployed a cold metadata store on AWS within just a year. AWS Solutions Architects functioned like an extension of Dropbox’s Alki team, providing prescriptive guidance and implementation help.
“When building a storage system, you have to think about a lot of components, including replication, backups, and capacity management. Amazon DynamoDB and Amazon S3 fit that need well—they are industry standards,” says Lee. “These are problems that large teams take several years to solve. But by using Amazon DynamoDB and Amazon S3, we simplify these problems because AWS handles many of the complex tasks like data replication, data durability management, and hardware provisioning. Both Amazon DynamoDB and Amazon S3 grow automatically with our capacity needs. We no longer need to plan for on-premises capacity and budget for hardware purchases and then be stuck with our decisions for 4 years.”
Building Hot and Cold Metadata Stores Using AWS Solutions
The Alki team, aided by AWS Solutions Architects, constructed a log-structured merge-tree (LSM tree)–based metadata storage system, which has two layers of data storage: an upper layer for hot metadata and a lower layer for cold metadata. Amazon DynamoDB acts as the hot storage layer, ingesting audit logging data into six DynamoDB tables at 4,000–6,000 writes per second per table. Each of these tables stores 50–80 GB of data per day. At the end of each day, the team offloads the metadata from these tables into Amazon S3 for permanent storage, after which the tables in Amazon DynamoDB are deleted.
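The two-layer design described above can be illustrated with a minimal, self-contained sketch. This is not Alki's implementation: the class name `TwoTierLogStore`, its methods, and the in-memory structures are all hypothetical stand-ins, with a dictionary playing the role of the day's DynamoDB table and a list of sorted, immutable runs playing the role of objects in Amazon S3. It only shows the general LSM-style pattern of writing to a hot layer, flushing it to cold storage, and reading from newest to oldest.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


class TwoTierLogStore:
    """Toy LSM-style store: a mutable hot layer plus immutable sorted cold runs."""

    def __init__(self) -> None:
        # Hot layer: in-memory stand-in for the day's DynamoDB table (assumption).
        self.hot: Dict[str, str] = {}
        # Cold layer: stand-in for S3 objects; each run is a key-sorted list.
        self.cold_runs: List[List[Tuple[str, str]]] = []

    def put(self, key: str, value: str) -> None:
        """All writes land in the hot layer."""
        self.hot[key] = value

    def flush(self) -> None:
        """End-of-day offload: persist the hot layer as one sorted run, then drop it."""
        if not self.hot:
            return
        self.cold_runs.append(sorted(self.hot.items()))
        self.hot = {}

    def get(self, key: str) -> Optional[str]:
        """Check the hot layer first, then cold runs from newest to oldest."""
        if key in self.hot:
            return self.hot[key]
        for run in reversed(self.cold_runs):
            # (key,) sorts just before any (key, value) pair with the same key.
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


store = TwoTierLogStore()
store.put("event:1", "login")
store.flush()                  # "event:1" now lives only in the cold layer
store.put("event:2", "share")  # "event:2" is still hot
print(store.get("event:1"))    # login
print(store.get("event:2"))    # share
```

Keeping each cold run sorted and immutable is what makes the offload cheap in this sketch: a flush is a single sequential write, and a lookup in a run is a binary search rather than a scan.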
By the beginning of 2019, less than 6 months after the Alki team chose Amazon DynamoDB and Amazon S3, Alki was in its beta stage of production, ingesting all data and serving a subset of the reads. By October 2019, about 300 TB of audit log data, representing a quarter of all data stored in Edgestore, had been migrated to Alki, which was now in full production.
The scalability of Amazon DynamoDB and Amazon S3 helped the Dropbox team complete that data migration in less than 2 weeks. “Normally you might design a system for 10 times the scale you would expect in steady state,” explains Lee. “But we could scale 100–1,000 times on AWS without designing the system ahead of time.” The Alki team expected steady state to be 4,000 queries per second, yet it was able to provision Amazon DynamoDB for 600,000 queries per second during the migration.
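As a quick check on the figures quoted above, the following sketch computes how far the migration peak sat above the expected steady state. The function name `headroom_factor` and the variable names are illustrative only; the two throughput figures and the 10-times rule of thumb come from the text.

```python
def headroom_factor(peak_qps: float, steady_qps: float) -> float:
    """Return how many times larger the peak load is than the steady-state load."""
    return peak_qps / steady_qps


steady_state_qps = 4_000    # expected steady state, from the text
migration_qps = 600_000     # provisioned during the migration, from the text
conventional_multiple = 10  # the "10 times" design rule Lee mentions

factor = headroom_factor(migration_qps, steady_state_qps)
conventional_qps = steady_state_qps * conventional_multiple

print(f"Migration peak: {factor:.0f}x steady state")             # 150x
print(f"Conventional 10x design: {conventional_qps:,} qps")      # 40,000 qps
print(f"Peak vs. 10x design: {migration_qps / conventional_qps:.0f}x")  # 15x
```

The 150-times figure falls inside the 100–1,000 times range Lee describes, and it is 15 times what a conventional 10-times design would have been sized for.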
AWS Solutions Architects provided premium support to the Alki team throughout the migration, according to Lee. “We have nothing but positive things to say about our interaction with the AWS team working on Alki. It’s always been very proactive with helping us find issues, pointing out how we might make things faster or identifying areas where we might want to be more careful operationally,” Lee says. The Alki team and the AWS Solutions Architects were able to stay in constant communication through real-time channels. And the Alki team will continue to reap the benefits of that collaboration through the managed services of AWS. “Running a system durably takes expertise, and we didn’t have that expertise,” says Stas Ilinskiy, software engineer on the Alki team. “But by using Amazon DynamoDB, we also gain the people with the expertise to run it.”
Alki saved Dropbox millions of dollars in expansion costs and significantly reduced costs per user-gigabyte by using Amazon DynamoDB and Amazon S3. Storing the same data in Edgestore would cost Dropbox 5.5 times more than storing it in Alki per user-gigabyte per year.
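To put the 5.5 factor in more familiar terms, this short sketch converts it into a relative cost and a percentage saving. Only the ratio itself comes from the text; no absolute dollar figures are given in the source, so none are assumed here, and the variable names are illustrative.

```python
EDGESTORE_TO_ALKI_COST_RATIO = 5.5  # from the text: Edgestore costs 5.5x Alki

# Alki's cost as a fraction of Edgestore's, per user-gigabyte per year.
alki_fraction = 1 / EDGESTORE_TO_ALKI_COST_RATIO

# Relative saving from storing the same data in Alki instead of Edgestore.
savings_pct = (1 - alki_fraction) * 100

print(f"Alki costs about {alki_fraction:.1%} of Edgestore")  # about 18.2%
print(f"Relative saving: about {savings_pct:.1f}%")          # about 81.8%
```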
Continuing to Create a Superior User Storage Experience
The Alki team is exploring how it might use Amazon EMR to more efficiently offload the data from Amazon DynamoDB to Amazon S3, a process that is currently handled by Dropbox’s own batch processing system. Additionally, to further realize cost savings with Alki, Dropbox migrated another database with 300 TB of cold metadata to Alki from Edgestore in October 2020. This sets the stage for how Dropbox might use Alki in the future to optimize and further drive down costs: the company might use it as a general-purpose cold metadata store. “Rather than moving specific use cases over, could we integrate Alki with Edgestore and transparently move data between the two?” asks Lee. “That’s the next vision.”
By using Amazon DynamoDB and Amazon S3, the Alki team was able to rapidly launch durable, scalable metadata storage that has resulted in massive cost savings for Dropbox. The managed services offered by AWS make maintaining this storage a sustainable, long-term option. The solution has also enabled Dropbox to launch several projects that it couldn’t on Edgestore. “The whole Alki project was very hotly watched by all of upper management,” Lee says. “We are very happy with the performance of Alki and thus the performance of Amazon DynamoDB and Amazon S3.”
Dropbox, headquartered in San Francisco, provides one place to keep life organized and keep work moving. With more than 600 million registered users across 180 countries, Dropbox is on a mission to design a more enlightened way of working.
AWS Services Used
Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance.
Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale.
Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto.