AWS Marketplace

How Vista built a data mesh enabled by solutions available in AWS Marketplace

Vista, a Cimpress business, is the design and marketing partner to millions of small businesses around the world. Data plays a key role in enabling Vista to deliver value to its customers. To deliver high-impact, data-driven results faster, Vista employs a data mesh strategy.

In this post, I will explain what a data mesh is and show why Vista decided to create a data mesh. I’ll also show how Vista uses AWS services combined with solutions available in AWS Marketplace to build a multi-tenant data platform that enables its data mesh approach.

Addressing Vista’s data challenges

Previously, Vista had an almost-petabyte-size data warehouse hosted on-premises. A centralized team of specialists was required to curate the data and produce one-off data reports, such as spreadsheets, visualizations, and presentations. As Vista modernized its workloads, it moved away from monolithic applications and towards distributed, microservice architectures. Having a centralized data team led to some challenges, including the following:

  1. The central data team did not have deep domain knowledge of the data they were curating, making it difficult to efficiently deliver clean, correct, and meaningful datasets.
  2. The central data team was acting as a bottleneck for the many data requests that came from across the business, leading to request backlog.
  3. Lots of data was being collected, but not a lot of business value was being derived from this data because the data team could not keep up with demand.

What is a data mesh, and why is it useful?

A data mesh is an interdependent web of data products contributed from across an organization’s business domains. A domain is a department, such as marketing or customer care. A data product might be a personalized product recommendation algorithm that suggests products a customer may be interested in based on what other products they have viewed or purchased. In a data mesh, these data products are fully owned by the individual business domains.

Cross-functional teams include business stakeholders, product managers, engineers, and data scientists who work together within a domain to build data products. Once the data products are contributed to the data mesh, teams both within and outside the original domain can easily consume them.

How to create a data mesh

Creating a data mesh is conceptually similar to a familiar practice in software engineering: breaking down a monolith into microservices. To build a data mesh, you:

  1. Define domains within your business. For example, your domains might be marketing, customer care, and manufacturing. There should be a clear mapping between domains and the data that each domain produces and owns.
  2. Create cross-functional teams composed of business stakeholders, product managers, data engineers, data scientists, and machine learning practitioners. Each domain should have one or more teams.
  3. Define data products. Working backwards from business objectives, each domain defines one or more data products that add value to the overall business. For example, a data product could be a machine learning (ML)-powered recommendation engine that helps customers discover relevant products on the company website. The cross-functional teams within the domain own these data products top-to-bottom. I recommend using a data product catalog to make these data products easily discoverable across the business.
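The three steps above can be sketched in a few lines of Python. This is only an illustration of the concepts, not Vista's or Cimpress' implementation: the `DataProduct` and `DataProductCatalog` classes, their fields, and the team names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A data product owned end to end by one domain team (hypothetical model)."""
    name: str
    domain: str
    owner_team: str
    description: str = ""


@dataclass
class DataProductCatalog:
    """In-memory catalog that makes data products discoverable across domains."""
    products: dict[str, DataProduct] = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        # Each data product name must be unique across the whole mesh.
        if product.name in self.products:
            raise ValueError(f"data product already registered: {product.name}")
        self.products[product.name] = product

    def by_domain(self, domain: str) -> list[DataProduct]:
        return [p for p in self.products.values() if p.domain == domain]

    def search(self, keyword: str) -> list[DataProduct]:
        # Case-insensitive match on name or description.
        needle = keyword.lower()
        return [
            p for p in self.products.values()
            if needle in p.name.lower() or needle in p.description.lower()
        ]


catalog = DataProductCatalog()
catalog.register(DataProduct(
    name="product-recommendations",
    domain="marketing",
    owner_team="personalization",  # hypothetical team name
    description="ML-powered recommendation engine for the company website",
))
catalog.register(DataProduct(
    name="agent-performance-dashboard",
    domain="customer care",
    owner_team="care-analytics",  # hypothetical team name
    description="Dashboard of customer care agent performance",
))

# A team in any domain can discover products owned by another domain.
for product in catalog.search("dashboard"):
    print(product.name, "is owned by", product.owner_team)
```

The point of the sketch is that ownership (`domain`, `owner_team`) travels with each data product, while discovery happens through one shared catalog.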

In Vista’s case, the data mesh approach ensured product-first thinking. In turn, product-first thinking ensured Vista was focused primarily on creating data products that drove business value instead of being focused on the data itself. And having purpose-built, cross-functional teams own these data products enabled Vista to be nimbler and more autonomous by having all of the domain context and expertise within the team.

How a self-service, API-driven data platform supports a data mesh

As you may have noticed, my coauthors and I have not yet mentioned any specific technologies when defining a data mesh. This is because a data mesh does not require any specific technologies. A data mesh is an organizational approach rather than a strictly technical one. It produces business value by organizing teams specifically focused on data products. These data products can be anything from a CSV file to an executive dashboard to a product recommendation engine.

That said, a cross-functional team that creates data products can create these products faster and more easily if it is enabled by a data platform that is self-service and API-driven. Cimpress, Vista’s parent company, works closely with Vista on the research and development of a central data platform and data mesh control plane. These both facilitate Vista’s data product creation and management.

How Vista created a data mesh

To ensure that Vista’s data product teams have access to best-in-class solutions, Cimpress uses a combination of AWS managed services, free and open-source software (FOSS) running on AWS, and solutions available in AWS Marketplace.

The following diagram shows how the Vista business domains on the left and right contribute data products to a data product catalog in the middle, which is provided by the data mesh control plane at the bottom.

On the left are three Vista business domains: Manufacturing & Supply Chain, Channel Marketing, and Customer Experience & Care. The left column also shows examples of those domains’ associated data products, such as Carrier SLA Analytics in the Manufacturing & Supply Chain domain and the Agent Performance Dashboard in the Customer Experience & Care domain.

The right side shows three additional domains, Pricing & Promotions, Product & Personalization, and Customer & Business Performance Monitoring, each with three related data products.

Both sides feed into a Data Product Catalog in the middle, which facilitates data product usage across business domains.

The data product catalog is shown to be one component of the larger data mesh control plane, which provides data product management and governance, data contracts and quality assurance, and data mesh observability. Shown below the data mesh control plane is the Cimpress data platform, which provides self-service access to a set of tools that data product owners can use to build data products.

Cimpress’ data mesh control plane

The Cimpress data mesh control plane provides centralized capabilities that are required to organize a data mesh architecture between Vista’s business domains. The main components of Vista’s data mesh control plane are:

  1. Data Product Management and Governance: Responsible for the data products’ creation and lifecycle. It has modules for data and resource access, roles, and user management. This component is crucial to ensure that data is only accessed by authorized users and groups.
  2. Data Contracts and Quality Assurance: Responsible for quality and access assurance between data products. It frequently runs quality assurance rules defined by data product producers or consumers, ensuring data consistency, freshness, and security. This in turn builds trust between users of the data mesh and drives further adoption.
  3. Data Mesh Observability: Responsible for monitoring all of the platform and data resources. It collects their logs and metrics into common data stores that enable usage analytics, auditing, automated recommendations, and resource optimization across all system layers.
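To make the idea of producer-defined and consumer-defined quality rules more concrete, here is a minimal sketch in Python. It is not the Cimpress control plane: the `QualityRule` type, the `max_age` and `not_null` helpers, and the column names are assumptions made for illustration. It covers only freshness and consistency checks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

Row = dict[str, object]
Check = Callable[[list[Row], datetime], bool]


@dataclass(frozen=True)
class QualityRule:
    """One rule in a data contract (hypothetical model)."""
    name: str
    defined_by: str  # "producer" or "consumer"
    check: Check


def max_age(column: str, limit: timedelta) -> Check:
    """Freshness: the newest timestamp in `column` is no older than `limit`."""
    def check(rows: list[Row], now: datetime) -> bool:
        if not rows:
            return False  # an empty dataset cannot be considered fresh
        newest = max(row[column] for row in rows)
        return now - newest <= limit
    return check


def not_null(column: str) -> Check:
    """Consistency: every row has a non-null value in `column`."""
    def check(rows: list[Row], now: datetime) -> bool:
        return all(row.get(column) is not None for row in rows)
    return check


def evaluate(rules: list[QualityRule], rows: list[Row], now: datetime) -> dict[str, bool]:
    """Run every rule and return a mapping of rule name to pass or fail."""
    return {rule.name: rule.check(rows, now) for rule in rules}


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    {"order_id": 1, "updated_at": now - timedelta(hours=2)},
    {"order_id": None, "updated_at": now - timedelta(hours=30)},
]
rules = [
    QualityRule("fresh_within_24h", "producer", max_age("updated_at", timedelta(hours=24))),
    QualityRule("order_id_not_null", "consumer", not_null("order_id")),
]

results = evaluate(rules, rows, now)
print(results)  # {'fresh_within_24h': True, 'order_id_not_null': False}
```

In a real platform the rules would run on a schedule against the data product's storage and the results would feed the observability component. Here they run once against an in-memory list.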

How Cimpress’ managed data platform works

In addition to the data mesh control plane, Cimpress also provides a managed data platform. The following diagram shows that the data platform comprises six elements, most of which Cimpress procured and deployed from AWS Marketplace:

  1. Data ingestion: Fivetran is a data pipeline service that performs bulk ingestion from relational databases and third-party connectors. Cimpress also built a data ingestion component called River. For more information on how Cimpress built River, read How Cimpress Built a Self-service, API-driven Data Platform Ingestion Service.
  2. Data warehouse: Snowflake creates dedicated, on-demand virtual data warehouses for each data product or data domain.
  3. Data transformation: Databricks provides isolated data analytics notebook environments to data scientists and ML practitioners. Matillion provides extract, transform, and load (ETL) jobs for users who prefer a simple graphical interface to build pipelines.
  4. Metadata catalog: Amundsen, an open-source tool running on AWS services, provides the metadata store and catalog.
  5. Data orchestration and modeling: This enables creation and usage of isolated Apache Airflow environments with directed acyclic graphs (DAGs) and data models responsible for data production.
  6. Data modeling and visualization: Looker creates business intelligence dashboards and provides data modeling capabilities. Refer to the following diagram.

All of these data platform components work together and are self-service-accessible to data product creators either through an API or a graphical user interface. This in turn enables Cimpress’ new data product teams to create and share new data products faster while maintaining high quality. They can also avoid the gatekeeping processes that had previously slowed them down when using an on-premises data warehouse.
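For readers unfamiliar with the directed acyclic graphs mentioned under data orchestration, the following sketch shows what a DAG buys you: tasks run only after everything they depend on has finished, and circular dependencies are rejected. It deliberately uses Python's standard-library `graphlib` rather than Apache Airflow so that it runs without extra setup. The task names are hypothetical and do not describe Vista's pipelines.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "build_order_model": {"ingest_orders", "ingest_customers"},
    "refresh_dashboard": {"build_order_model"},
}


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(pipeline)
print(order)

# A cycle means the graph is not a DAG, so no valid run order exists.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected: not a valid DAG")
```

An orchestrator such as Apache Airflow adds scheduling, retries, and monitoring on top of this ordering guarantee.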

The Cimpress data platform team also works closely with the Vista data product teams to shape the roadmap for the Cimpress Data Platform itself. The focus is on selecting technologies and capabilities based on the requirements of the data product teams. Additionally, having access to thousands of partner solutions in AWS Marketplace makes it easier for the Cimpress data platform team to meet the requirements of the data product teams.

Business impact

To date, Vista has created over one hundred data products in its data mesh, including:

  • a customer-facing product recommendation engine that led to a 5% profit increase,
  • customer care dashboards that reduced manual reporting efforts by 95%,
  • a customer industry classifier that led to a 15% increase in clickthrough rates.

Conclusion

Vista’s data mesh approach helps the company deliver business value. By creating dedicated cross-functional teams, Vista uses domain-specific context to create data products that increase engagement and profit and reduce manual effort. These data product teams use the self-service Cimpress data platform to speed up their data product creation and management processes. The Cimpress data platform team uses the flexibility afforded by AWS Marketplace to simplify procurement of the multiple partner solutions that make up its data platform.

Next steps

To learn more about Vista’s data mesh story, read these posts:

To learn more about the concept of data mesh, read the original blog post by Zhamak Dehghani, who coined the term data mesh, and keep up to date with data-mesh-related AWS blog posts here.

About the authors

Ethan Fahy is an Enterprise Senior Solutions Architect at AWS based in Boston, MA. Ethan has a background in geophysics and enjoys building large-scale, cloud-native architectures to support scientific workloads.
Michał Zasadziński is a Principal Software Engineer at Cimpress and leads architecture and software engineering in the Cimpress Data Domain. He is passionate about building data systems that scale and impress users. He holds a distinguished PhD in root cause analysis and failure prevention in distributed computing. His second means of transport is windsurfing.
Atilla Tunçelli, Principal Software Engineer at Cimpress, is engineering lead on the Data Domain of Cimpress. He is responsible for technical excellence and architecture of products, leading high-performing software engineering teams, and being a technology advisor in the data space. He has 18 years of hands-on experience in the IT industry in broad areas such as real-time processing, embedded systems, image processing, data, and e-commerce. He holds a master’s degree in Computer Science from the University of Bahcesehir, Istanbul. He is also a part-time drummer.
Christopher Bova, Data Platform Director at Cimpress, leads the data function at Cimpress Technology. He is passionate about building teams and tools that allow Cimpress businesses to turn their data into strategic insights and business results. He is based in Arlington, MA, is a father of two, and makes a pretty tasty scratch pie crust.