Why purpose-built databases?

Background

To understand the importance of purpose-built databases, let's explore how application architecture has changed over the past few decades. Then, we'll see how databases work in modern application development.

The following sections cover a few reasons why a cloud-native database is a better fit for today's applications.

  • Evolution of application architectures
  • Application architectures have changed quite a bit in recent decades. Originally, companies would use on-premises mainframes to handle their critical applications. These mainframes would combine all aspects of an application—compute and storage—and would run for years on end without interruption.

    Applications then were split into pieces using a client-server architecture. The server would respond to requests from various clients, allowing for more distributed systems. The clients and servers could be on the same computer, but the separation allowed for more scalable systems when needed. Clients and servers could be split up so they would not be competing for resources.

    With the rise of the internet, the three-tier architecture became prominent. Applications were split into groups by function: a presentation tier served as the user interface, an application tier handled the business logic and processing, and a data tier provided long-term persistent storage. Again, this increased the scalability of applications because each tier could be scaled independently.

    Though the moves from mainframe to client-server and from client-server to three-tier architecture were horizontal splits—where different technical portions of the application would be handled at different layers—the next shift was a vertical split. In recent years, the notion of microservices has taken off. With microservices, you split your application into different services based on their functionality. Rather than having one application handle both your inventory data and order history data, you might split those into two separate services that are focused on their domain. This split allows for the scaling of the two services independently and for better agility between development teams.

  • Databases in a world of microservices
  • With this knowledge of the evolution of application architectures in mind, let's see how databases have evolved in tandem.

    In the beginning, most databases used a hierarchical data model. Data was stored as a tree, where parent records were connected to child records. Each child record could only be connected to a single parent record, which limited the ability to model more complex concepts, such as many-to-many relationships. Hierarchical databases were often deployed in on-premises data centers and used for internal systems.

    The relational database then took the world by storm. In a relational database, strict schemas are enforced and records are normalized to avoid data duplication. Schema enforcement helped with data integrity concerns and normalization helped save on storage costs because storage was the most expensive resource in the data center. The adoption of the SQL query language allowed developers to recombine normalized data as needed. During the rise of the relational database, remote data centers were used increasingly to host applications.

    In the early 2000s, two big shifts changed the database landscape. First, web-enabled global businesses were on the rise. With global applications, database capabilities were pushed to their limits. Because storage costs had dropped precipitously and relational databases were struggling to handle the performance needs, the database world saw the rise of NoSQL databases. These databases dropped certain features of relational databases—such as schema enforcement, transaction support, and the SQL query language—to provide better performance than relational databases.

    At the same time, more and more companies were moving their applications to the cloud. Companies enjoyed the flexible scaling model of the cloud because they could scale up and down according to actual customer demand rather than to estimates from prior quarters. With the change to microservices in application development, developers didn't need to choose a multi-purpose database that could handle everything. Rather, each development team could choose the right database for their application. Accordingly, purpose-built databases have become more common. When designing your application, you can choose the database that is the best fit for the job.

Factors to consider when choosing a purpose-built database

Choosing a database is one of the most important decisions you will make in your application architecture. The type of database you choose will affect the access patterns you can handle, the performance of your application, and the operations for which your team will be responsible. You should consider a number of factors, including your application workload, data shape, performance requirements, and operations burden.

  • Application workload
  • The first factor you should consider when choosing a purpose-built database is the application workload. The workload describes the type of data being stored by your application and the data access patterns.

    There are three broad categories of workloads:

    • Transactional: Also called OLTP (online transaction processing), this refers to application use cases that are characterized by a high number of concurrent operations, where each operation reads or writes a small number of rows. This applies to most user-facing applications, including ecommerce, mobile gaming, and social networking.
    • Analytical: Also called OLAP (online analytical processing), this refers to use cases that aggregate and summarize large volumes of data. There are usually far fewer concurrent queries in analytical data stores, but they are operating on many more rows per query. Analytical access patterns are usually for internal applications such as reporting.
    • Caching: In a caching workload, you compute and store frequently accessed data in a separate database for faster response times. This can reduce the load on your transactional database and improve response times to your end users. The database is rarely a primary source of data. Rather, the cache is a secondary source that stores derived data from your transactional workloads.

    Understanding the type of workload you have will make it easier for you to choose the right database for your use case.
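The difference between the transactional and analytical patterns described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the `reviews` table, its columns, and the sample rows are hypothetical stand-ins for the restaurant-review example, and SQLite is used here purely because it runs in memory, not because it is one of the databases covered in this course.

```python
import sqlite3

# Hypothetical schema and sample data; every name here is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews ("
    "id INTEGER PRIMARY KEY, restaurant TEXT, reviewer TEXT, rating INTEGER)"
)
conn.executemany(
    "INSERT INTO reviews (restaurant, reviewer, rating) VALUES (?, ?, ?)",
    [
        ("Noodle Bar", "ana", 5),
        ("Noodle Bar", "ben", 4),
        ("Taco Stand", "ana", 3),
        ("Taco Stand", "cy", 5),
        ("Taco Stand", "ben", 4),
    ],
)


def get_review(review_id):
    """Transactional (OLTP) pattern: fetch a single row by its key."""
    return conn.execute(
        "SELECT restaurant, reviewer, rating FROM reviews WHERE id = ?",
        (review_id,),
    ).fetchone()


def average_ratings():
    """Analytical (OLAP) pattern: scan and aggregate many rows per query."""
    return conn.execute(
        "SELECT restaurant, AVG(rating), COUNT(*) FROM reviews "
        "GROUP BY restaurant ORDER BY restaurant"
    ).fetchall()
```

A user-facing service would issue many concurrent calls like `get_review(1)`, whereas a reporting job would run a few queries like `average_ratings()` that touch every row.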

  • Data shape
  • The second factor to consider is the shape of your data. In considering data shape, ask about the types of entities you will be modeling and the relationships between your entities. How will you access your data? How often will entities be updated?

    The following are a few common data models and when you should choose them:

    • Relational: The relational database is known by most developers. In a relational database, you normalize your data into separate tables and assemble related entities together at query time. It is a good choice when you have multiple related entities with varying update patterns. The strict schema validation and normalized data model help to ensure data integrity across your application.
    • Key-value or wide-column: A key-value or wide-column data store is designed for scale. Your data is split across multiple storage nodes, and additional storage nodes can be added as your data grows. This partitioning scheme lets the database scale out almost without limit while keeping performance consistent as data volume increases.
    • Document: A document data model uses large records called documents to assemble heterogeneous bits of data that are frequently accessed together. Rather than spreading this data across multiple tables, you can keep the data together in a document for faster access.
    • Graph: A graph data model emphasizes relationships between data. With a graph database, you traverse relationships between objects to find hidden connections between data. A graph database is commonly used for social networking or fraud-detection services.

    Think about the access patterns you have with your data to help choose the right purpose-built database for your application.
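To make the four data models above more concrete, here is a minimal Python sketch that represents the same kind of restaurant-review data in each shape. All table names, keys, and helper functions (`node_for`, `friends_of_friends`) are hypothetical. The hash-based partitioning only illustrates the general idea of spreading keys across nodes and is not how any particular AWS database assigns partitions.

```python
import hashlib
import sqlite3

# Relational: normalized tables, reassembled with a JOIN at query time.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurants (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reviews (
        id INTEGER PRIMARY KEY,
        restaurant_id INTEGER REFERENCES restaurants(id),
        reviewer TEXT,
        rating INTEGER
    );
    INSERT INTO restaurants VALUES (1, 'Noodle Bar');
    INSERT INTO reviews VALUES (1, 1, 'ana', 5), (2, 1, 'ben', 4);
""")
relational_rows = conn.execute(
    "SELECT r.name, v.reviewer, v.rating FROM restaurants r "
    "JOIN reviews v ON v.restaurant_id = r.id ORDER BY v.id"
).fetchall()

# Document: data that is read together lives in one nested record.
document = {
    "name": "Noodle Bar",
    "reviews": [
        {"reviewer": "ana", "rating": 5},
        {"reviewer": "ben", "rating": 4},
    ],
}


def node_for(key, node_count):
    """Key-value: hash the key to choose which storage node holds the item."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


# Graph: relationships are first-class; here a simple "follows" adjacency map.
follows = {"ana": {"ben"}, "ben": {"cy"}, "cy": set()}


def friends_of_friends(user):
    """Traverse two hops to find indirect connections."""
    direct = follows.get(user, set())
    indirect = set()
    for friend in direct:
        indirect |= follows.get(friend, set())
    return indirect - direct - {user}
```

The relational form needs a join to rebuild a restaurant with its reviews, the document form returns everything in one read, the key-value helper shows how a key maps deterministically to a node, and the graph helper finds connections that are not stored directly.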

  • Performance requirements
  • Another important factor to consider when choosing a purpose-built database is the performance requirements of your application. This refers not only to the speed of your data access and the size of your records, but also to where your service will be used relative to your end users.

    If your service handles a critical workload for users who are awaiting a response, speed is of the utmost importance. You may want to use an in-memory cache to help decrease latency to users.

    On the other hand, if your service is serving internal analytics or is doing background data processing, speed might be less of a factor. You might be more concerned with whether your service can handle the amount of data coming into your system.

    Additionally, you should consider geographic requirements for your data. Some databases, such as Amazon DynamoDB and Amazon Aurora, make it easy to replicate your data across the world. Replication places your data closer to your users, which lowers response times.
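The in-memory cache mentioned above is often used in a read-through or "cache-aside" style: check the cache first, and fall back to the primary database only on a miss. The sketch below is a simplified, in-process illustration under that assumption. The `CacheAside` class, the `slow_top_restaurants` loader, and the time-to-live values are all hypothetical, and a production system would typically use a dedicated cache service rather than a Python dictionary.

```python
import time


class CacheAside:
    """Tiny in-process cache with a time-to-live; stands in for a real cache."""

    def __init__(self, load, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load          # function that queries the primary database
        self._ttl = ttl_seconds    # how long a cached value stays fresh
        self._clock = clock        # injectable clock, useful for testing
        self._entries = {}         # key -> (value, expiry time)
        self.misses = 0            # number of times the loader was called

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]        # cache hit: skip the primary database
        self.misses += 1           # cache miss or expired entry
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value


def slow_top_restaurants(city):
    # Hypothetical placeholder for an expensive query against the primary store.
    return [f"{city} Noodle Bar", f"{city} Taco Stand"]


cache = CacheAside(slow_top_restaurants, ttl_seconds=30.0)
```

Repeated calls such as `cache.get("Lisbon")` within the time-to-live reach the loader only once. The trade-off is that cached data may be slightly stale, which is why the cache is treated as a secondary source rather than the primary one.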

  • Operations burden
  • A final factor you should consider when choosing a purpose-built database is the operations burden of your database. Developing against your database is only half the battle because you also need to ensure that you have prepared for instance failures, configured backups, and created a plan for upgrades.

    When using AWS purpose-built databases, most of the operations burden is handled for you. AWS databases can automatically promote a replica instance in the event that your primary instance fails. Backups and restores are fully managed for you, and you don't need to think about software upgrades. By using fully managed databases from AWS, you can focus on developing features and innovating for your users.

In the rest of this course, you complete walkthroughs of five different AWS purpose-built databases. During this course, you use a restaurant-review application as a guiding example. You build an application whose users can review restaurants and view summaries of top-reviewed restaurants. In these walkthroughs, you use purpose-built databases to handle primary storage, caching, fraud detection, and more.
