What is a Database Schema?

A database schema is a logical structure that defines how data is organized within a database. Relational databases and some non-relational databases use schemas to describe the structure of their data, its interconnections, and internal processes. Database schemas provide a logical blueprint for data storage and organization, for greater user accessibility, scalability, and data integrity.

What are the benefits of a database schema?

As a database schema defines how a business organizes its data, there are several benefits to using one.

Improve organization

Businesses can organize their information into clear data structures to improve organization and ensure that the relationships between datasets are clear and consistent. A well-defined schema also enables businesses to scale their database management system more easily.

Enhance data integrity

By implementing rules about how your business stores data with a schema, you ensure a high level of integrity, even in complex storage systems. Maintaining consistent rules helps ensure data validity and meet compliance requirements.

Increase accessibility

A database schema offers several views of your overall data structures, at different levels of detail. Using these views, designers, administrators, and stakeholders can all discuss the structure, even without deep technical knowledge.

What are the steps to design a database schema?

Designing a database schema for a database management system commonly involves three steps, each adding more detail than the last.

1. Conceptual database schema

A conceptual database schema design is the highest-level view of a database, providing an overall view of the database without the minor details. A conceptual database schema design is typically a quick drawing by hand.

For example, relational databases store data in tables, with each table containing a set of related data. A conceptual database schema might describe a product table and its attributes, a customer table, and a many-to-many relationship between tables. However, a conceptual database schema may not contain finer implementation details, such as data types or access constraints.

A conceptual schema is useful for charting the overall data flow in an organization, without offering too much detail.
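As a rough illustration, a conceptual schema can be written down as nothing more than entity names, attribute names, and relationships. The entities and the `purchases` relationship below are hypothetical, chosen to match the product and customer example above.

```python
# Hypothetical conceptual model: names and relationships only,
# with no data types, keys, or access constraints.
entities = {
    "Product": ["name", "price"],
    "Customer": ["name", "email"],
}
relationships = [
    ("Customer", "purchases", "Product", "many-to-many"),
]

summary = [
    f"{left} {verb} {right} ({cardinality})"
    for left, verb, right, cardinality in relationships
]
print(summary)  # ['Customer purchases Product (many-to-many)']
```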

2. Logical database schema

A logical database schema design provides an outline of how data within a database is structured. It describes the relationships between entities and shows more detail about how data is organized. A logical database schema design is typically a digital data modeling exercise.

Each entity in the data schema is defined in relation to information such as:

  • Table names
  • Entity relationships
  • Attribute names
  • Default values
  • Data types
  • Security constraints
  • Procedures
  • Views
  • Indexes
  • Metadata

A complete logical schema design ensures data consistency and integrity by providing constraints for new and existing data.

Logical database schemas do not typically include any technical requirements.
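One way to write down several of the items listed above is as SQL data definition statements. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The `customer` table, its columns, and the index and view names are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only to illustrate the idea.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (                        -- table name
    customer_id INTEGER PRIMARY KEY,           -- attribute name and data type
    email       TEXT NOT NULL UNIQUE,          -- constraints
    status      TEXT NOT NULL DEFAULT 'active' -- default value
);
CREATE INDEX idx_customer_status ON customer (status);  -- index
CREATE VIEW active_customer AS                          -- view
    SELECT customer_id, email FROM customer WHERE status = 'active';
""")

# The default value fills in "status" when it is not supplied.
conn.execute("INSERT INTO customer (email) VALUES ('a@example.com')")
rows = conn.execute("SELECT * FROM active_customer").fetchall()
print(rows)  # [(1, 'a@example.com')]
```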

3. Physical database schema

A physical database schema describes exactly where data can be found within a broader database structure. It includes technical storage details, identifying the file locations, specific storage formats, and indexing strategies used by each table to store its data. A physical schema design is typically a combination of fixed database technical design patterns and user specifications.

The physical schema is the least conceptual form of database schema and offers real insight into data locations. Logical and physical schemas are required for database instantiation.

What are ways to model database schemas?

Different database schema styles suit distinct business needs and data types. Online transaction processing (OLTP) databases, such as product ordering systems, use the entity-relationship schema modeling technique. Online analytical processing (OLAP) databases, which support complex business queries, may require different modeling techniques, such as the star schema and snowflake schema.

Here are some of the most popular database schema styles.

Entity-Relationship (ER) schema

An entity-relationship schema assigns each entity to a table and then maps the connections between tables. E-R schemas can have several kinds of relationships between pieces of data: 1:1, 1:many, and many:many. This type of relational database schema uses tables, columns, and rows to construct data systems, and connects the tables through relationships and constraints.
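A minimal sketch of these relationship types, assuming a hypothetical ordering system with `customer`, `product`, `orders`, and `order_item` tables. A foreign key expresses the 1:many relationship, and a junction table expresses the many:many relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- 1:many: one customer places many orders
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
);
-- many:many: orders and products are linked through a junction table
CREATE TABLE order_item (
    order_id   INTEGER NOT NULL REFERENCES orders (order_id),
    product_id INTEGER NOT NULL REFERENCES product (product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
conn.execute("INSERT INTO product VALUES (10, 'Lamp'), (11, 'Desk')")
conn.execute("INSERT INTO orders VALUES (100, 1)")
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?)", [(100, 10, 2), (100, 11, 1)])

# Follow the relationships from order lines back to customer and product.
items = conn.execute("""
    SELECT c.name, p.name, oi.quantity
    FROM order_item AS oi
    JOIN orders   AS o ON o.order_id = oi.order_id
    JOIN customer AS c ON c.customer_id = o.customer_id
    JOIN product  AS p ON p.product_id = oi.product_id
    ORDER BY p.name
""").fetchall()
print(items)  # [('Ana', 'Desk', 1), ('Ana', 'Lamp', 2)]
```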

Star schema

Businesses can use a star schema to manage and organize large datasets based on two primary principles: facts and dimensions. In a star schema, facts are at the center of the structure and provide measurement-based pieces of data. Examples of such central facts are the number of transactions, website clicks, or total purchases. Dimensions then provide additional information about each fact, such as which customer made the purchase, where they made it from, and what product they bought.
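The following sketch shows one possible star layout: a central `fact_sales` table that holds measurements and two dimension tables that describe them. All table names, column names, and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- The fact table stores measurements plus one key per dimension.
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer (customer_key),
    product_key  INTEGER REFERENCES dim_product (product_key),
    quantity     INTEGER,
    amount       REAL
);
INSERT INTO dim_customer VALUES (1, 'Ana', 'EU'), (2, 'Ben', 'US');
INSERT INTO dim_product  VALUES (1, 'Lamp', 'Home'), (2, 'Desk', 'Office');
INSERT INTO fact_sales   VALUES (1, 1, 2, 40.0), (1, 2, 1, 150.0), (2, 1, 1, 20.0);
""")

# Aggregate the facts by an attribute of one dimension.
sales_by_region = conn.execute("""
    SELECT c.region, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(sales_by_region)  # [('EU', 190.0), ('US', 20.0)]
```

Each analytical query joins the fact table to only the dimensions it needs, which is why every dimension sits one join away from the center.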

Snowflake schema

A snowflake schema, much like a star schema, uses a central fact table that connects to multiple dimension tables. However, unlike in the star schema, the dimension tables in a snowflake schema have additional tables branching off them that offer more detail on those dimensions. A snowflake schema is useful for data that has a large number of dimensions and sub-dimensions. Both star and snowflake schemas are often used in business intelligence, because they let database users view data organized by specific business dimensions.
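As a sketch of the difference, the hypothetical layout below moves the product category out of the product dimension into its own `dim_category` table, so a query by category needs one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimension that branches off the product dimension
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    name         TEXT,
    category_key INTEGER REFERENCES dim_category (category_key)
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product (product_key),
    amount      REAL
);
INSERT INTO dim_category VALUES (1, 'Home'), (2, 'Office');
INSERT INTO dim_product  VALUES (1, 'Lamp', 1), (2, 'Rug', 1), (3, 'Desk', 2);
INSERT INTO fact_sales   VALUES (1, 40.0), (2, 60.0), (3, 150.0);
""")

# Fact -> dimension -> sub-dimension: two joins instead of one.
sales_by_category = conn.execute("""
    SELECT cat.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product  AS p   ON p.product_key = f.product_key
    JOIN dim_category AS cat ON cat.category_key = p.category_key
    GROUP BY cat.name
    ORDER BY cat.name
""").fetchall()
print(sales_by_category)  # [('Home', 100.0), ('Office', 150.0)]
```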

Hierarchical schema

A hierarchical database schema employs a tree-like structure, with a root node at the top that branches out into other node branches. In a hierarchical model, each ‘parent’ piece of data can have multiple child nodes, while each child node can only have one parent. For example, a hierarchical model could begin with a company, branch out to each department, and then branch further to individual employees within each department.
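The company, department, and employee example can be approximated in a relational table by giving each row a single parent reference. This is only an emulation of the hierarchical model, not a hierarchical database, and the `node` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    node_id   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    parent_id INTEGER REFERENCES node (node_id)  -- NULL for the root; at most one parent
);
INSERT INTO node VALUES
    (1, 'Company', NULL),
    (2, 'Engineering', 1),
    (3, 'Sales', 1),
    (4, 'Ana', 2),
    (5, 'Ben', 3);
""")

# Walk the tree from the root with a recursive common table expression.
paths = [row[0] for row in conn.execute("""
    WITH RECURSIVE tree (node_id, path) AS (
        SELECT node_id, name FROM node WHERE parent_id IS NULL
        UNION ALL
        SELECT n.node_id, tree.path || ' > ' || n.name
        FROM node AS n JOIN tree ON n.parent_id = tree.node_id
    )
    SELECT path FROM tree ORDER BY path
""")]
for path in paths:
    print(path)
```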

What is the process of designing an OLTP database schema?

The process of designing a database schema is called data modeling.

Here are the main steps to produce a data model of an OLTP system.

Gather requirements

Before creating a database, you must identify its purpose and outline the key information, such as the data you want it to contain and how you plan to use the database. The best database for you will vary depending on:

  • Specific data you use
  • Queries you need to interact with the database
  • Reports you want to generate

This step outlines your objectives, guiding your database schema design process.

Create entity-relationship diagrams

An entity relationship diagram (ERD) maps out how tables, database objects, and individual entities within a database connect. Creating a conceptual schema view of your database enables you to visualize how the database functions and gain insight into the data it stores.

At this stage, you can also define the naming conventions that tables, columns, database objects, and indexes use in your database. Conventions help everyone to have a standard approach when inputting data.

Organize data entities into tables

Based on your ERD map, you can now organize all of your data into specific tables. Each entity in your database structure should have its own table, with individual columns holding related attributes. Define the primary key that will allow you to easily identify and retrieve specific data values.
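A short sketch of an entity table with a primary key, assuming a hypothetical `product` entity identified by a `sku` column. The key retrieves exactly one row, and the database rejects a second row with the same key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        sku   TEXT PRIMARY KEY,   -- hypothetical key column
        name  TEXT NOT NULL,
        price REAL NOT NULL
    )
""")
conn.execute("INSERT INTO product VALUES ('LAMP-01', 'Lamp', 20.0)")

# The primary key identifies exactly one row.
row = conn.execute(
    "SELECT name, price FROM product WHERE sku = ?", ("LAMP-01",)
).fetchone()
print(row)  # ('Lamp', 20.0)

# A second row with the same key violates the primary key constraint.
try:
    conn.execute("INSERT INTO product VALUES ('LAMP-01', 'Other lamp', 25.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```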

Normalize data structures

Normalization is a process in database schema design aimed at reducing data redundancy and improving data integrity. It involves organizing data into tables in such a way that relationships among the data are well-structured and anomalies are minimized.

There are several normal forms, each with specific requirements. Each successive normal form addresses a different redundancy or dependency type to enhance data consistency and make the schema more robust.

1NF

1NF requires that each column contains atomic (indivisible) values and that each record is unique. It removes repeating groups and multi-valued fields.
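For instance, a field that packs several phone numbers into one value is not atomic. The sketch below, using made-up records, splits that field into one row per value.

```python
# Hypothetical un-normalized records: "phones" holds several values in one field.
raw = [
    {"customer_id": 1, "name": "Ana", "phones": "555-0100, 555-0101"},
    {"customer_id": 2, "name": "Ben", "phones": "555-0200"},
]

# One row per customer, with only single-valued attributes.
customers = [(r["customer_id"], r["name"]) for r in raw]

# One row per (customer, phone) pair, so every value is atomic.
customer_phones = [
    (r["customer_id"], phone.strip())
    for r in raw
    for phone in r["phones"].split(",")
]
print(customer_phones)
# [(1, '555-0100'), (1, '555-0101'), (2, '555-0200')]
```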

2NF

2NF builds on 1NF by ensuring that all non-key attributes are fully functionally dependent on the entire primary key (i.e., it eliminates partial dependencies).
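As an illustration, suppose an order line is keyed by the pair (order_id, product_id) but also stores the product name. The name depends on product_id alone, which is a partial dependency. The rows below are hypothetical.

```python
# Hypothetical rows: (order_id, product_id, product_name, quantity).
# The primary key is the pair (order_id, product_id).
order_items_raw = [
    (100, 10, "Lamp", 2),
    (100, 11, "Desk", 1),
    (101, 10, "Lamp", 5),
]

# product_name depends only on product_id, so it moves to its own table.
products = sorted({(product_id, name) for _, product_id, name, _ in order_items_raw})

# quantity depends on the whole key, so it stays with the order line.
order_items = [
    (order_id, product_id, qty) for order_id, product_id, _, qty in order_items_raw
]
print(products)     # [(10, 'Lamp'), (11, 'Desk')]
print(order_items)  # [(100, 10, 2), (100, 11, 1), (101, 10, 5)]
```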

3NF

3NF adds that all non-key attributes must depend only on the primary key, and not on other non-key attributes (i.e., it removes transitive dependencies).
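A transitive dependency can be spotted when one non-key attribute determines another. The sketch below checks this on hypothetical employee rows and then splits the table. The `determines` helper is an illustrative check against sample data only. It can show that a dependency is violated, but passing does not prove the dependency holds for all future data.

```python
def determines(rows, determinant, dependent):
    """Return True if each value at index `determinant` maps to one value at index `dependent`."""
    seen = {}
    for row in rows:
        if seen.setdefault(row[determinant], row[dependent]) != row[dependent]:
            return False
    return True

# Hypothetical rows: (emp_id, name, dept_id, dept_name); emp_id is the primary key.
employees_raw = [
    (1, "Ana", 20, "Engineering"),
    (2, "Ben", 30, "Sales"),
    (3, "Cho", 20, "Engineering"),
]

# dept_name is determined by dept_id, a non-key attribute: a transitive dependency.
print(determines(employees_raw, 2, 3))  # True

# Move the dependent attribute into its own table, keyed by dept_id.
departments = sorted({(dept_id, dept_name) for _, _, dept_id, dept_name in employees_raw})
employees = [(emp_id, name, dept_id) for emp_id, name, dept_id, _ in employees_raw]
print(departments)  # [(20, 'Engineering'), (30, 'Sales')]
print(employees)    # [(1, 'Ana', 20), (2, 'Ben', 30), (3, 'Cho', 20)]
```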

Implement security measures

Create a permissions structure so that only authorized users can access your database and view the information it contains. Define role-based access controls that assign distinct privileges, such as the ability to read, write, or delete information, to different user groups. This helps keep your sensitive data safe.
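In practice, a database engine usually provides its own role and privilege statements for this. The plain-Python sketch below only models the idea. The role names, users, and the `is_allowed` helper are hypothetical.

```python
# Hypothetical roles and the privileges each one grants.
ROLE_PRIVILEGES = {
    "analyst": {"read"},
    "editor": {"read", "write"},
    "admin": {"read", "write", "delete"},
}

# Hypothetical assignment of users to roles.
USER_ROLES = {"ana": "admin", "ben": "analyst"}


def is_allowed(user: str, action: str) -> bool:
    """Return True if the user's role grants the requested action."""
    role = USER_ROLES.get(user)
    # Unknown users have no role and therefore no privileges.
    return action in ROLE_PRIVILEGES.get(role, set())


print(is_allowed("ana", "delete"))  # True
print(is_allowed("ben", "write"))   # False
print(is_allowed("eve", "read"))    # False
```

Assigning privileges to roles rather than to individual users means a change in policy is made in one place.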

Test

Testing your database schema design with some basic queries and other interactions ensures everything functions as intended. Collecting data on how the database functions at this stage will inform any additional changes you need to make to ensure your schema is effective and free from performance issues.
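A minimal sketch of such checks, assuming a hypothetical two-table schema. Each check tries one statement and records whether the schema's constraints rejected it.

```python
import sqlite3


def build_schema() -> sqlite3.Connection:
    """Create a small hypothetical schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    );
    """)
    return conn


def is_rejected(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the statement violates a schema constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


conn = build_schema()
results = {
    "valid customer": is_rejected(conn, "INSERT INTO customer VALUES (1, 'a@example.com')"),
    "duplicate email": is_rejected(conn, "INSERT INTO customer VALUES (2, 'a@example.com')"),
    "valid order": is_rejected(conn, "INSERT INTO orders VALUES (1, 1, 3)"),
    "orphan order": is_rejected(conn, "INSERT INTO orders VALUES (2, 99, 1)"),
    "zero quantity": is_rejected(conn, "INSERT INTO orders VALUES (3, 1, 0)"),
}
for name, rejected in results.items():
    print(f"{name}: {'rejected' if rejected else 'accepted'}")
```

Valid rows should be accepted and invalid rows rejected. Any other outcome points to a missing or overly strict constraint.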

What is the difference between a database schema and a database instance?

A database schema refers to the overall design of a database, providing information about its structure, what it may include, and the relationships between datasets. However, a data schema is only the blueprint for data organization; the schema doesn’t contain any data.

A database instance is the live database that a schema describes, holding the data at any given moment. The instance is where the actual data values are, and it changes constantly as data is added, deleted, or updated. Unlike a database schema, a database instance contains the data itself.
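The distinction can be seen in a small sketch: the table definition stays the same while the rows change. The `customer` table is hypothetical, and `sqlite_master` is the catalog table where SQLite stores schema definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def schema(conn):
    """Return the stored table definitions (the blueprint)."""
    return conn.execute("SELECT sql FROM sqlite_master WHERE type = 'table'").fetchall()


def instance(conn):
    """Return the rows currently held (the data at this moment)."""
    return conn.execute("SELECT * FROM customer ORDER BY customer_id").fetchall()


schema_before = schema(conn)
conn.execute("INSERT INTO customer (name) VALUES ('Ana')")
conn.execute("INSERT INTO customer (name) VALUES ('Ben')")
conn.execute("DELETE FROM customer WHERE name = 'Ana'")

print(schema(conn) == schema_before)  # True
print(instance(conn))                 # [(2, 'Ben')]
```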

What is database schema conversion?

A database schema conversion is the process of adapting an existing database schema to a new format. This may involve adding or modifying tables, columns, indexes, constraints, or relationships between tables.

The goal is often to support new application requirements, improve performance, or move to a different database system. Schema conversion can enable more efficient data organization or support features of a new system.

Data migration may or may not require schema conversion, depending on the source and destination databases.
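As a small-scale illustration of adapting an existing schema, the sketch below adds a column and an index to a hypothetical `customer` table that already holds data. It stays within a single engine (SQLite) and does not represent a conversion between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO customer VALUES (1, 'Ana');
""")

# Adapt the schema: add a column with a default value, then index it.
conn.executescript("""
ALTER TABLE customer ADD COLUMN status TEXT NOT NULL DEFAULT 'active';
CREATE INDEX idx_customer_status ON customer (status);
""")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
rows = conn.execute("SELECT * FROM customer").fetchall()
print(columns)  # ['customer_id', 'name', 'status']
print(rows)     # [(1, 'Ana', 'active')]
```

Existing rows pick up the default value, so the data survives the change without a separate backfill step.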

How can AWS support your database schema requirements?

The process of data modeling is typically done outside the database. Once the model has been created, Amazon Relational Database Service (RDS) supports schema creation and management through standard SQL. Amazon RDS provides managed relational database management systems such as PostgreSQL, MySQL, and Amazon Aurora.

For database migrations, AWS Database Migration Service (DMS) is a managed migration service that helps you move your databases and analytics workloads to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime for applications that rely on it.

DMS Schema Conversion in AWS DMS makes database migrations between different types of databases more predictable. It can assess the complexity of your migration for your source data provider and convert database schemas and code objects. You can then apply the converted code to your target database.

The new generative AI capability in AWS DMS Schema Conversion automates some of the most time-intensive schema conversion tasks. The feature automatically converts up to 90 percent of schema objects in migrations from commercial databases to PostgreSQL.

You can also use the AWS Schema Conversion Tool (SCT) to convert your existing database schema from one database engine to another.

Get started with database schema conversion on AWS by creating a free account today.