AWS for Industries

Rapidly experimenting with Catena-X data space technology on AWS

The automotive industry has one of the most complex value chains, with tens of thousands of companies involved in making a car. Original equipment manufacturers (OEMs) typically lack visibility into their supply chain beyond Tier 2 suppliers, and they are often unaware of their products’ journey after delivery to third parties, including downstream partners like car sharing providers and dealerships. One of the main reasons for this lack of visibility is that data exchange protocols and semantics are not standardized between those companies. Expensive translation of data formats and semantics is necessary at each step in the value chain for data to flow. Additionally, the resulting data silos make it more challenging for OEMs to comply with regulations such as the amended EU Battery Regulation, which introduces the EU Battery Passport requirement, the EU Corporate Sustainability Reporting Directive, or the German Supply Chain Due Diligence Act.

The Catena-X Automotive Network e.V. (Catena-X) aims to help solve some of these issues by supporting the automotive industry’s transition from peer-to-peer data exchange to an interoperable, transparent solution based on open standards and collaboration. Catena-X is an automotive consortium formed by European automakers and their suppliers, focused on creating an automotive-specific “data space”, i.e., a space for decentralized data sharing. The Catena-X data space is designed to improve information flow across the automotive value chain. Catena-X promotes a federated architecture, similar to data meshes, where data providers retain control and share only metadata through a decentralized catalog. Data exchange via Catena-X occurs directly between partners and adheres to contractually agreed usage policies. Amazon Web Services (AWS) joined Catena-X as a member in January 2023 to help enable automotive customers to participate in the Catena-X data space. Noteworthy initiatives of AWS APN Partners related to Catena-X include guidance from Think-it on building data spaces for sustainability use cases, and T-Systems’ Managed Connect Services for Dataspace Connections on AWS.

To help customers explore data space technology while establishing solutions to address use cases like product carbon footprint, parts traceability, or digital product passes, AWS released the sample code Minimum Viable Dataspace (MVD) for Catena-X on AWS. This sample code gives automotive customers single-command access to Catena-X’s APIs and technology stack in an AWS sandbox environment. This blog post covers how to deploy and use this sample code with two fictional data space participants, Alice and Bob, who use Eclipse Tractus-X Dataspace Connector (EDC) connectors to perform data space operations like data asset creation, contract negotiation, and data transfer.

Overview of solution

The Tractus-X EDC connector is an important Catena-X data space component, acting as the interface between an enterprise’s data assets and the data space. It lets members advertise data assets, negotiate terms for consuming them, and transfer data between participants.

The two connectors contained in the “MVD for Catena-X on AWS” sample on GitHub are deployed to a cluster on Amazon Elastic Kubernetes Service (Amazon EKS), a managed Kubernetes service. Each connector uses an Amazon Aurora PostgreSQL database to persist application metadata, such as assets, policies, and negotiations. Actual data assets are stored in buckets on Amazon Simple Storage Service (Amazon S3), an object storage service. The password-protected connector APIs are exposed to the Internet through Elastic Load Balancing, specifically a network load balancer (NLB). All data is encrypted in transit and at rest, with resources isolated within an Amazon Virtual Private Cloud (Amazon VPC). Figure 1 illustrates the solution architecture.

Figure 1: Minimum Viable Dataspace (MVD) for Catena-X on AWS architecture

On top of this infrastructure, the MVD sample deploys Catena-X’s Minimum Tractus-X Dataspace (MXD) with the basic components of a Catena-X compatible data space: a central identity wallet and Keycloak instance, as well as vault instances and two EDC connectors for the data space’s participants, each with a data plane and a control plane container.

Data space deployment walk-through

This section shows how you can experiment with the MVD in your own AWS account. For this, you will need the following:

  • AWS account;
  • AWS Command Line Interface (CLI), a unified tool to manage AWS services; and
  • Local installation of Terraform, kubectl and Git.
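Before cloning the repository, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch, not part of the MVD sample; the helper name check_tools is hypothetical, and the binary names (aws, terraform, kubectl, git) are assumed to match a default installation.

```shell
# Report which of the given command-line tools are missing from PATH.
# Returns 0 if all are present, 1 otherwise.
check_tools() {
  local missing tool
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Binary names are assumptions; adjust them if your installation differs.
if check_tools aws terraform kubectl git; then
  echo "All prerequisites found."
fi
```

Running aws sts get-caller-identity afterwards is one way to confirm that the AWS CLI is pointed at the account you intend to use as a sandbox.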

Step 1: Clone the MVD repository

git clone https://github.com/aws-samples/minimum-viable-dataspace-for-catenax.git

Step 2: Follow the deployment and setup instructions in the file docs/data-exchange-tutorial.md in the cloned repository.

Example data exchange

The docs/data-exchange-tutorial.md file also contains a scenario in which Alice and Bob exchange data within the MVD. Alice offers a file stored in her Amazon S3 bucket to Bob. Her goal is to ensure that the file can be consumed only by Bob, and she wants Bob to know that he can use the contents of the file only while he is an active member of the data space. To achieve this, Alice has to create objects within her instance of the data space connector. She needs an Asset that represents the file. To restrict access to and usage of the data, she has to define Policies, and she has to associate the Policies with the Asset by creating a Contract.
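To make these objects more concrete, here is a rough sketch of what the JSON bodies for Alice's Asset, Policies, and Contract might look like, written as shell functions that only print the payloads. Everything in it is an assumption for illustration: the function names, the identifiers, the bucket and file names, the partner number, and the JSON field names, which are loosely modeled on common EDC conventions. The tutorial file and the Tractus-X EDC OpenAPI definition remain the authoritative source for the actual schema.

```shell
# All identifiers, field names and values are illustrative assumptions;
# verify them against the tutorial file and the Tractus-X EDC OpenAPI definition.

# Asset: metadata that points at the file in Alice's S3 bucket.
asset_payload() {  # args: asset_id bucket_name object_name
  cat <<EOF
{
  "@context": { "@vocab": "https://w3id.org/edc/v0.0.1/ns/" },
  "@id": "$1",
  "properties": { "description": "File offered by Alice" },
  "dataAddress": { "type": "AmazonS3", "bucketName": "$2", "objectName": "$3" }
}
EOF
}

# Policy: a single permission with one equality constraint.
policy_payload() {  # args: policy_id left_operand right_operand
  cat <<EOF
{
  "@context": {
    "@vocab": "https://w3id.org/edc/v0.0.1/ns/",
    "odrl": "http://www.w3.org/ns/odrl/2/"
  },
  "@id": "$1",
  "policy": {
    "@type": "odrl:Set",
    "odrl:permission": [ {
      "odrl:action": "use",
      "odrl:constraint": {
        "odrl:leftOperand": "$2",
        "odrl:operator": { "@id": "odrl:eq" },
        "odrl:rightOperand": "$3"
      }
    } ]
  }
}
EOF
}

# Contract: ties an access policy and a usage policy to one Asset.
contract_payload() {  # args: contract_id access_policy_id usage_policy_id asset_id
  cat <<EOF
{
  "@context": { "@vocab": "https://w3id.org/edc/v0.0.1/ns/" },
  "@id": "$1",
  "accessPolicyId": "$2",
  "contractPolicyId": "$3",
  "assetsSelector": {
    "operandLeft": "https://w3id.org/edc/v0.0.1/ns/id",
    "operator": "=",
    "operandRight": "$4"
  }
}
EOF
}

# Hypothetical values for the Alice and Bob scenario.
asset_payload "alice-file-1" "alice-bucket" "data.json"
policy_payload "bob-only" "BusinessPartnerNumber" "BPNL000000000002"
policy_payload "active-member" "Membership" "active"
contract_payload "alice-contract-1" "bob-only" "active-member" "alice-file-1"
```

In this sketch, the first policy expresses "only Bob may see the offer" and the second expresses "usable only while an active member", matching Alice's two goals. Each payload could be piped into cURL with --data @-, using the endpoint URLs and credentials given in the tutorial file.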

After Alice has created these objects by calling her connector’s control plane API, Bob can see the Asset and the Contract in Alice’s data catalog. He can then initiate a contract Negotiation for the Asset, which results in an EDC Contract Agreement. Based on this Contract Agreement, Bob can initiate a Transfer of the file represented by the Asset to his own Amazon S3 bucket. This domain model is a simplified version of the full domain model on the Tractus-X website. Figure 2 illustrates the relationships and responsibilities of the objects. The numbers in the figure indicate the order in which the objects are created.
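The ordering matters here: Bob can only start the Transfer once the Negotiation has produced a Contract Agreement. Assuming the negotiation completes asynchronously, a client script would typically poll for its state before moving on. The sketch below shows one way to do that; the helper wait_for_state, the stub get_negotiation_state, the POLL_INTERVAL variable, and the state name FINALIZED are all assumptions for illustration, not taken from the MVD sample.

```shell
# Poll a command until it prints the expected state or attempts run out.
# Returns 0 once the state matches, 1 if it never does.
wait_for_state() {  # args: expected_state max_attempts command [args...]
  local expected attempts i state
  expected=$1
  attempts=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state=$("$@")
    if [ "$state" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "${POLL_INTERVAL:-1}"
  done
  return 1
}

# Hypothetical stand-in for a call that fetches the negotiation state from
# Bob's connector; a real version would query the control plane API.
get_negotiation_state() {
  echo "FINALIZED"
}

if wait_for_state "FINALIZED" 5 get_negotiation_state; then
  echo "Agreement reached; Bob can now start the transfer."
else
  echo "Negotiation did not finish in time." >&2
fi
```

The same helper could be reused to wait for the Transfer to finish, with whatever command and terminal state name your connector version reports.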

Figure 2: Partial MVD domain model

The objects shown in Figure 2 can be managed by calling APIs of the data space connectors’ control plane, which we deployed in the data space deployment section. The tutorial file on GitHub provides those API calls in the form of cURL commands. To learn more about the API, have a look at the Tractus-X EDC OpenAPI definition. Follow the instructions in the docs/data-exchange-tutorial.md file to try out the scenario with your deployment of the MVD.

Conclusion

This blog post shows how to deploy the “MVD for Catena-X on AWS” sample code to a sandbox environment in an AWS account, and how to experiment with the Tractus-X EDC API to create data assets, policies, and contracts, negotiate contracts, and transfer data between Amazon S3 buckets. The MVD serves as a starting point for Catena-X onboarding, API implementation, and prototyping of use cases. Take a look at the Eclipse Tractus-X MXD’s tutorial resources if you want to explore further data space use cases.

While the EDC connector and this walkthrough handle technical data exchange, solutions for data space use cases would deploy additional business logic adjacent to the connector, either directly to the Kubernetes cluster, or separately. Have a look at Tractus-X’s Managed Simple Data Exchanger or Digital Product Pass reference applications for ways to get started.

Florian Seidel

In his role as Global Solutions Architect for a strategic automotive customer at AWS, Florian advises customers on how to unleash the full potential of the AWS Cloud. His diverse interests span analytics, machine learning, AI, and developing resilient distributed systems on AWS.

Jonas Bürkel

As Senior Solutions Architect at AWS based in Germany, Jonas works with customers in the manufacturing industry to help build their solutions in the cloud to meet unique business and technical requirements. He is passionate about data ecosystems and industrial decarbonization.