What is a data mart?
A data mart is a data storage system that contains information specific to an organization's business unit. It contains a small and selected part of the data that the company stores in a larger storage system. Companies use a data mart to analyze department-specific information more efficiently. It provides summarized data that key stakeholders can use to quickly make informed decisions.
For example, a company might store data from various sources, such as supplier information, orders, sensor data, employee information, and financial records in their data warehouse or data lake. However, the company stores information relevant to, for instance, the marketing department, such as social media reviews and customer records, in a data mart.
How does a data mart compare to other types of data storage systems?
Companies use several different types of data storage systems for data management and analytics. Let’s look at some common types of data storage to understand the context in which companies use data marts.
A database is organized storage that computer systems use to store, search, retrieve, and analyze information. There are various types of databases, such as relational databases. A relational database stores information in tables consisting of rows and columns. Data in different tables is connected by a unique identifier known as a key. Keys are the non-repetitive values in specific columns.
Data mart vs. database
A data mart serves as the front-facing element for a department’s data. You can use a data mart to retrieve and analyze information. Meanwhile, a database collects, manages, and stores information. You can then use tools to process, format, and transfer the stored information to a data mart.
A data warehouse is an extensive database system that stores information for an entire business. It collects raw information from various sources, such as business software and social media feeds, and processes it into structured data stored in a tabular format. Businesses can connect an enterprise data warehouse to business intelligence tools to make smarter decisions.
Data mart vs. data warehouse
A data mart shares many of the qualities of a data warehouse. Where they differ is that a data warehouse contains enterprise-wide data about various topics. Meanwhile, a data mart stores information closely related to a specific subject. For example, a data warehouse might store information for the marketing, human resources, procurement, and customer support departments. However, a data mart might store only transactional data relevant to a single department. The appeal of building a data mart is that departments who manage their data marts have complete control over the loading and management of their data.
Many organizations are using technologies like data sharing to publish their data marts to a central data warehouse. By doing so they can be more agile by distributing ownership and isolating workloads. Similarly, data sharing allows departmental data marts to consume data shared from a data warehouse or other data marts.
A data lake is data storage that holds raw and unstructured information. It does not store information in files and folders. Instead, it stores unprocessed information in a flat hierarchy on massive storage. Data lakes store different types of raw information, including text documents, images, videos, and audio.
Data analysts use data lakes to conduct predictive analysis from unstructured data. For example, a data lake might store texts from social media reviews that businesses can use for sentiment analysis. Data analysts can use sentiment analysis to detect negative opinion trends for a company.
Data mart vs. data lake
Because data lakes store unprocessed data, some of the information might be duplicates or might not be meaningful to the company. Meanwhile, a data mart stores processed data that meets a specific need. A data lake could be the source of a data mart. Businesses determine data trends by looking at historical data in data marts, but they use data lakes to analyze the stored information deeply.
Online Analytical Processing (OLAP) is a method to represent data in multiple dimensions. For example, data analysts use an OLAP cube to simultaneously show sales revenue based on months, cities, and products. OLAP data structures are wide, with fields classified as either facts or dimensions and result in data duplication. This contrasts with conventional relational databases, which favor narrow structures and little data duplication.
Data mart vs. OLAP cube
OLAP is a specific information storage strategy which denormalizes data into wide tables. OLAP simplifies complex representations of multidimensional data. Some data marts might use OLAP to structure their information, but others use conventional, normalized structures. Business analysts benefit from OLAP structures to visualize information from a data mart.
Operational data store
An operational data store (ODS) is information storage that acts as an intermediary between data sources and the data warehouse. Data analysts use the ODS to provide near-real-time reporting about transactional data. The ODS supports simple queries and provides only a limited amount of information. For example, the ODS might store sales records only for the past 12 hours.
Data mart vs. ODS
A data mart extracts subject-oriented information from a data warehouse, but an ODS sends information into the data warehouse for processing. Data marts offer historical information that you can analyze, but an ODS provides an updated view of current operations. For example, you can use a data mart to identify sales patterns for the past quarter but receive hourly sales figure updates from the ODS.
Why is a data mart important?
These are some good reasons that companies might use a data mart.
Retrieve data more efficiently
By using a data mart, companies can access specific information more efficiently. Compared to a data warehouse, a data mart contains relevant and detailed information that a department accesses frequently. Therefore, business managers don’t need to search the entire data warehouse to generate performance reports or graphics.
Companies can create a subset of data from a data warehouse with a data mart. Employees within the department can then analyze the data and make decisions based on the same set of information.
Control information more effectively
A data mart gives employees highly granular access privileges. This means the company can authorize a certain person to view or retrieve specific data. It helps companies to improve data governance and enforce information access policies. For example, you can use data marts to provide user access to employees for specific information in a data warehouse.
Manage data flexibly
A data mart is smaller and contains fewer tables than a data warehouse. This means data engineers can manage and change information in a data mart without causing major database changes.
How does a data mart work?
A data mart turns raw information into structured, meaningful content for a specific business department. To do this, data engineers set up a data mart to receive information either from a data warehouse or directly from external data sources.
When it is connected to a data warehouse, the data mart retrieves a selection of information that is relevant to a business unit. Often, the information contains summarized data and excludes unnecessary or detailed data.
Extract, transform, and load (ETL) is a process for integrating and transferring information from various data sources into a single physical database. Data marts use ETL to retrieve information from external sources when it does not come from a data warehouse. The process involves the following steps.
- Extract: collect raw information from various sources
- Transform: structure the information into a common format
- Load: transfer the processed data to the database
ETL tools copy information from external sources like spreadsheets, apps, and text documents. The data mart then processes, organizes, and stores the information in a structured form.
Business analysts use software tools to retrieve, analyze, and represent data from the data mart. For example, they use the information stored in data marts for business intelligence analytics, reporting dashboards, and cloud applications.
Each data mart serves a small number of users. For example, the marketing manager and senior marketers have access to a data mart, so it takes less time to generate reports and graphs or to perform predictive analysis.
What are the types of data marts?
These are the different types of data marts.
Dependent data mart
A dependent data mart populates its storage with a subset of information from a centralized data warehouse. The data warehouse gathers all the information from data sources. Then, the data mart queries and retrieves subject-specific information from the data warehouse.
Pros and cons
Most data management and administration works are performed in the data warehouse. This means that business analysts do not need to be highly skilled in database management to use information from the data mart. Although dependent data marts make retrieving information much easier, they present a single point of failure. If the data warehouse fails, all the connected data marts will also fail.
Independent data mart
An independent data mart does not rely on a central data warehouse or any other data mart. Each data mart collects information from its sources instead of from a data warehouse. Independent data marts are suitable for smaller companies, but only specific departments need to access and analyze information.
Pros and cons
Companies can set up independent data marts with relative ease. However, managing them might be difficult. This is because business analysts need to perform database administrative work at each data mart. It is straightforward to share data between different data marts using strategies like data sharing; departments can read another department’s data and even augment it with their own data. However, a strong data cataloging strategy must be put into place to ensure each department knows what they are looking at.
Hybrid data mart
Hybrid data marts collect information from a data warehouse and from external sources. This allows companies the flexibility to test independent data sources before they direct the data to the data warehouse.
For example, suppose you launch a new product and want to analyze its initial sales data. The data mart uses sales information that comes directly from the e-commerce software and retrieves sales records for other products from the data mart. After the product becomes a permanent fixture in your store, you channel the transaction details to the data warehouse.
What are the structures of a data mart?
Data marts use these structures to store and represent information.
The star structure has a fact table at its center and branches to several dimension tables. This results in a star-shaped connection. The fact table is a data table that contains summarized data that you can use for analytical purposes. Meanwhile, dimension tables hold descriptive information in a fact table. Each dimension table links to the fact table with a foreign key. A foreign key is a unique identifier, such as a product ID or supplier ID.
For example, a fact table for sales transactions has the following columns:
- Sales ID
- Product ID
- Supplier ID
- Sales amount
A dimension table for products stores the following information:
- Product ID
- Product name
- Product cost
The supplier dimension table has the following columns:
- Supplier ID
- Supplier name
In a star structure, the dimension table is denormalized to not extend into additional tables. This means that the dimension table might contain redundant data but improves search and retrieval speed. It also takes less space to store dimension tables.
Business analysts can use a star-structured data mart to make complex queries straightforward. When they search for a specific sales record, the data management system searches through the fact table. When the data mart system finds the correct record, it uses the product ID and supplier ID to query data from the respective dimension tables.
A denormalized structure stores all the related data in a single table. It does not have complex joints between fact tables and dimension tables. Data analysts use a denormalized data mart because it improves query speed. For example, a search for a sales record takes place in a single denormalized table as follows:
- Sales ID
- Product name
- Product cost
- Model name
- Supplier name
- Sales amount
A denormalized data mart is suitable for real-time reporting because of its single-table approach. However, denormalizing the data mart leads to data redundancy. For example, the same product name might appear in multiple records. This leads to additional storage space and expensive implementation costs.
What are the steps in implementing a data mart?
Cloud data engineers set up a data mart by doing the following:
- Launch their cloud-native data platform.
- Populate the data mart with business data. They ensure that the data has the correct format and is relevant to the business users.
- Set up the data mart so that multiple users can access data from it. For example, they install a reporting dashboard in the data mart.
- Continue to monitor, optimize, and resolve issues when the data mart runs.
How can you implement a data mart on AWS?
Companies need to process increasing data volumes that stretch conventional data mart storage to the limit. Data marts that are installed on on-premises servers are difficult to scale. Cloud architecture offers cheaper, more scalable, and more manageable enterprise-level integration for data marts.
Amazon Redshift is a data warehousing solution that you can use to implement data marts in the cloud. You can get integrated insights by running real-time and predictive analytics on complex, scaled data across your operational databases, data lake, data warehouse, and thousands of third-party datasets. You can automatically create, train, and deploy machine learning (ML) models with ease. You can create data marts on Amazon Redshift and use them to make smarter decisions.
Amazon Redshift has some key features that make it a great solution for your data mart:
- With Amazon Redshift Serverless, considerations for the size and scale of the cluster are handled for you.
- Because of native data sharing, data in your data mart can access data in your data warehouse or can be shared to your data warehouse.
Get started with data marts by creating an AWS account today.
AWS Data Mart next steps
Instantly get access to the AWS free tier.
Get started building in the AWS Management Console.