AWS Cloud Enterprise Strategy Blog

Database Freedom: Let’s Take Off Our Database Blinders—For Good

Live blog post from re:Invent 2018, Las Vegas

For decades, enterprises have thought of data in terms of the relational database model. It is a brilliant model and has solved many of the data-handling problems of early IT. With a normalized database schema, we could reduce redundancy and bring out the relationships between data items more clearly. We could enforce referential integrity within the database engine through declarative constraints, and enforce transactional integrity within the database through ACID transactional mechanisms. With SQL we gained a common language that, at least in theory, gave us portability between different database products.
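To make those two guarantees concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The `customers`/`orders` schema is hypothetical and exists only for illustration. The example shows a declarative foreign-key constraint rejecting an order that points at a nonexistent customer, and a transaction in which a constraint violation on the second insert rolls back the first insert as well, so the two writes succeed or fail as a unit.

```python
import sqlite3


def demo() -> tuple[bool, int]:
    """Illustrate declarative referential integrity and atomic transactions.

    Uses a hypothetical customers/orders schema in an in-memory SQLite database.
    Returns (orphan_rejected, order_count_after_failed_transaction).
    """
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled on the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            amount      INTEGER NOT NULL CHECK (amount > 0)
        );
    """)
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Acme')")
    conn.commit()

    # Referential integrity: the engine itself refuses an order for customer 99.
    try:
        conn.execute("INSERT INTO orders (customer_id, amount) VALUES (99, 10)")
        orphan_rejected = False
    except sqlite3.IntegrityError:
        orphan_rejected = True
        conn.rollback()

    # Atomicity: the connection context manager commits on success and
    # rolls back on an exception, so the valid first insert is undone too.
    try:
        with conn:
            conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, 50)")
            conn.execute("INSERT INTO orders (customer_id, amount) VALUES (1, -5)")
    except sqlite3.IntegrityError:
        pass  # the CHECK (amount > 0) constraint failed; the whole unit rolled back

    (order_count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
    conn.close()
    return orphan_rejected, order_count


if __name__ == "__main__":
    print(demo())
```

Both checks live in the schema rather than in application code, which is the appeal the relational model offered; the rest of this post is about what that appeal cost when the same model was applied to every kind of data.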

We liked this model so much that we used it for pretty much all of our data. We used it even when our data was going to be retrieved from a single table rather than from a relation between tables. We used it even when the relational structure wasn’t a good logical model for the type of data we were collecting (for example, for time series data or for graph data). We used it when our data was rarely changing but frequently analyzed, and we used it when our data was geographically distributed, making relational joins expensive. We used it even though security and privacy capabilities had to be bolted on. We used it even as we broke our code into microservices, and in doing so we wound up with undesirably tight coupling between microservices through their shared databases.

The penalty we paid for this obsession with the relational model was that, as our databases scaled, we sacrificed performance; in many of these use cases the overhead of the relational model was simply an impediment. We also paid in more complex code, because we had to manipulate all our data to fit within the relational model, and in our continuous delivery processes, which had to handle data in cumbersome ways that resisted automation and made changes hard to roll back. Alterations to database schemas were expensive and awkward.

As Werner Vogels pointed out in his keynote at re:Invent yesterday, by far the majority of our data needs don’t really fit into the relational model. This is why the open source community, AWS, and many other software and service vendors have been offering alternatives to the relational model that perform optimally for these other use cases. These alternative types of databases are available; what remains is for enterprises to escape from the relational way of thinking and take advantage of the non-relational models where appropriate to increase performance, resilience, security, development speed, and agility.

According to Werner, a study of the way databases were being used showed that about 70% of data use cases could be better served by key-value stores than by relational databases. Amazon’s DynamoDB is optimized for those use cases. Other data, where there are many complex relationships between data items, can most effectively be handled by a graph database like Amazon Neptune. And newly announced this week are Amazon Timestream, a database optimized for time series data, and Amazon Quantum Ledger Database, which provides capabilities for data that must be cryptographically verifiable.

The important point is not that AWS offers databases in lots of different colors and trims, but that we offer databases optimized for the different use cases that enterprises face. One of the largest opportunities we are missing as a profession is choosing the database model that best fits the type of data we have and the way we access it. Instead, we insist on the relational model even when it makes little sense, and in doing so we reduce the performance, scalability, resilience, and security of our systems. Let’s keep the relational model for the places where it is useful, but be creative and deliberate about how we handle all the other types of data that are important to us today.

Mark

@schwartz_cio
A Seat at the Table: IT Leadership in the Age of Agility
The Art of Business Value
War and Peace and IT: Business Leadership, Technology, and Success in the Digital Age (now available for pre-order!)

Mark Schwartz

Mark Schwartz is an Enterprise Strategist at Amazon Web Services and the author of The Art of Business Value and A Seat at the Table: IT Leadership in the Age of Agility. Before joining AWS he was the CIO of U.S. Citizenship and Immigration Services (part of the Department of Homeland Security), CIO of Intrax, and CEO of Auctiva. He has an MBA from Wharton, a BS in Computer Science from Yale, and an MA in Philosophy from Yale.