Implementing an Operational Data Mesh with Palantir Foundry on AWS to Transform Your Organization

By Keith Kelly, Head of Solutions Architecture – Palantir
By Francesca Ferretti, Forward Deployed Architect – Palantir
By Mehmet Bakkaloglu, Principal Solutions Architect – AWS

Enterprise data architectures and strategies have become more scalable, performant, and cost effective over the past 30 years. They have evolved from centralized delivery paradigms to providing consumption-oriented data modeling that enables easier data visualization and insights.

Today, data architectures and strategies continue to respond to the need for discoverability and consumer desire to directly connect with producers.

Data mesh is one such approach, originally defined by Zhamak Dehghani, and provides a methodology for how organizations can organize around data domains by delivering data as a product. Data mesh has four principles: domain-driven data ownership, data as a product, self-service infrastructure, and federated computational governance.

In this post, we discuss how Palantir Foundry runs on Amazon Web Services (AWS) to help customers deliver and transform their data architectures through such an approach while leveraging and building on existing investments.

Palantir is an AWS Partner and AWS Marketplace Seller that builds software that lets organizations integrate their data, decisions, and operations.

How Does Data Mesh Help?

Data mesh defines a new generation of highly available data architectures—not only availability in terms of service, but the availability of products to end consumers and the availability of a variety of self-service tools to producers.

Service Availability

Data mesh inherits good principles from ancestor architectures like data lakes and joins them with compelling new requirements. For example, while volume, velocity, and variety are still valid conditions of modern data products and require an elastic hyper-scalable infrastructure, a successful data product strategy implicitly requires a certain level of democratization.

This democratization lies in the decentralization of certain historically-centralized responsibilities, such as infrastructure and security management. Specifically, data product teams require total ownership over their products in order to meet infrastructure needs, and must have the autonomy to grant or prevent access to available services.

Such democratization is also referred to as “federated computational governance.” It can be accomplished only if the data platform provides infrastructure and security management features that can be used with minimal to no technical skills.

Palantir Foundry on AWS provides Rubix, an optimized version of Kubernetes that enables the underlying AWS infrastructure to scale up and down according to need, without involving a central IT operations team. Foundry also gives data product teams full autonomy to assign computational resources to specific data product workloads, leveraging Foundry-hosted spark-profiles and environment-configurations.

A multimodal platform security framework that combines different access control mechanisms (such as role-based, classification-based, or purpose-based) ensures data product teams are able to monitor who is accessing what at all times.
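To make the idea of combining access control mechanisms concrete, here is a minimal conceptual sketch in Python. It is not Foundry's API: the `User`, `DataProduct`, and `can_access` names, the fields, and the rule that every mechanism must agree are all assumptions made for illustration. The sketch also records each decision, mirroring the requirement that teams can see who is accessing what.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical user model; field names are illustrative only.
    name: str
    roles: frozenset = frozenset()
    clearances: frozenset = frozenset()  # classification markings the user holds
    purposes: frozenset = frozenset()    # purposes the user is approved for


@dataclass(frozen=True)
class DataProduct:
    # Hypothetical data product policy; field names are illustrative only.
    name: str
    allowed_roles: frozenset
    required_markings: frozenset
    allowed_purposes: frozenset


def can_access(user, product, purpose, audit_log):
    """Grant access only if all three mechanisms agree, and log the decision."""
    checks = {
        # Role-based: the user holds at least one permitted role.
        "role": bool(user.roles & product.allowed_roles),
        # Classification-based: the user holds every required marking.
        "classification": product.required_markings <= user.clearances,
        # Purpose-based: the stated purpose is approved on both sides.
        "purpose": purpose in user.purposes and purpose in product.allowed_purposes,
    }
    granted = all(checks.values())
    audit_log.append(
        {"user": user.name, "product": product.name,
         "purpose": purpose, "granted": granted, "checks": checks}
    )
    return granted
```

Requiring all mechanisms to pass is one possible design choice; a real platform may combine them differently. The audit log keeps the per-mechanism result so a product team can see why a request was denied, not just that it was.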

Availability to End Consumers

Data products must be available for consumers at all times. This one simple concept underscores the need for data products to be easily consumable through different ports (interoperable and addressable), to speak the language of consumers (discoverable and self-describing), and to be trusted and secure.

Palantir Foundry implements any data product (whether a dataset or model) as a REST API, which serves as the standard output port for product consumption. By default, data products are rendered through multiple interoperable ports, such as the API and the ontology.

The ontology exploits an object storage layer powered by AWS infrastructure, which makes the data product easy and fast to interact with when the consumption is happening through the vast set of Foundry end-user applications, ranging from visualization and dashboarding to time-series analytics.

For AWS customers, a new feature has been made available that allows interacting with Foundry datasets through an Amazon S3-compatible API as though they are S3 buckets.
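As a rough illustration of what "S3-compatible" means for a consumer, the sketch below points a standard S3 client at a custom endpoint and lists the files behind a dataset. The endpoint URL, the credentials, and the way a Foundry dataset maps to a bucket name are all assumptions here and should be taken from the Foundry documentation. Only the `boto3` calls named in the comments (`boto3.client` and `list_objects_v2`) are real library API.

```python
def s3_compatible_client_kwargs(endpoint_url, access_key_id, secret_access_key,
                                region="us-east-1"):
    """Build keyword arguments for boto3.client() aimed at an S3-compatible endpoint.

    All values are placeholders supplied by the caller; nothing here is a
    documented Foundry setting.
    """
    return {
        "service_name": "s3",
        "endpoint_url": endpoint_url,
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "region_name": region,
    }


def list_dataset_files(client, bucket, prefix=""):
    """Return every object key under a prefix, following S3 pagination."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        # Ask for the next page using the token from the previous response.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


# Usage, assuming boto3 is installed and the placeholders are filled in:
#   import boto3
#   client = boto3.client(**s3_compatible_client_kwargs(
#       "https://<s3-compatible-endpoint>", "<access-key-id>", "<secret>"))
#   files = list_dataset_files(client, "<dataset-as-bucket>")
```

Passing the client in as a parameter keeps the listing logic independent of how the connection is configured, so the same function works against Amazon S3 itself or any S3-compatible endpoint.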

In general, these data products and their output ports are discoverable through different capabilities within Foundry. These range from a Smart Source, which scans any digital resource existing in or connected to Foundry, to more curated forms like Data Catalog, Ontology Manager, or Modelling Objectives.

Self-Service Availability

One of the key principles of data mesh architecture is to provide a platform-level self-serve data infrastructure. This allows producers and consumers to choose the tools they are most comfortable using.

An ideal data platform is one that organizations can build not just inside of but on top of an existing toolchain. This maximizes past investments while still guaranteeing full interoperability with all data products, across old and new technologies.

For this reason, Palantir Foundry on AWS includes standard interoperable output ports as well as a number of discovery and input ports to leverage Foundry as the unique data mesh supervision plane while integrating data products originating from other systems.

With 200+ native connectors developed over the years (among them Amazon S3 and other AWS services) and massive research and development investments in user-friendly data integration, Foundry offers HyperAuto, a data exploration and integration tool for data products originating from standard enterprise resource planning (ERP) and customer relationship management (CRM) systems.

Beyond Data Mesh: Foundry Operational Data Mesh

Palantir Foundry on AWS provides everything needed to deliver a successful data strategy that fulfills all data mesh principles. This post is focused on specific features as others are covered elsewhere.

Figure 1 – Palantir Foundry on AWS.

Principles alone are not sufficient to fully operationalize your data mesh and ensure its business impact. The key challenge that data mesh ultimately addresses is improving the efficiency of data-driven decision-making across the full organization at scale.

The four principles of data mesh all support this but, as with other paradigm shifts, applying the principles is not a guarantee of successful execution or transformation. The most challenging part is transitioning the organizational structure, mindset, and processes at the operational level.

Data mesh transformation requires the right supporting technology, but it should not be driven as a technology project. There’s a danger in extrapolating from the success of a small-scale or isolated proof of concept (POC), because such a POC does not demonstrate better operational scalability than previous paradigms. To keep this front and center, Palantir recommends focusing on the operational data mesh.

The operational data mesh cannot be a POC or isolated initiative focused on a discrete part of the organization or on isolated systems. It democratizes access to all data from across the breadth of the organization, facilitating decision-making and digital continuity across full enterprise value chains and at all levels of enterprise management. Metrics should be put in place as soon as possible to baseline and measure scope and efficiency.

An operational data mesh program should be able to show quick and sustainable progress on such metrics, as well as increasing operational business impact. Palantir recommends “transitioning” people, process, and tools towards an operational data mesh of ever-increasing maturity, rather than the “deployment” or “creation” of a data mesh, which generally indicates a technical project.

Nearly two decades ago, AWS realized the way to scale system hardware sustainably without exploding integration and management costs was to abstract and standardize as much as possible. Likewise, to scale sustainably without exploding integration and management costs, the operational data mesh should abstract and standardize data product delivery and consumption. This inevitably leads to accelerated innovation for business impact as enterprise budget and resources are used for higher-level purposes instead of lower-level technical activities.

Foundry provides this kind of higher-level abstraction and does not require technology skills to produce or consume data, allowing producers and consumers to work directly together rather than through intermediaries.

Figure 2 – Palantir Foundry producers and consumers.

Operational data mesh also brings semantic, kinetic, and dynamic elements of your business together and facilitates closing the feedback loop between insights and decision-making. It’s essential to be able to capture decisions and feed them back into the generation of decision support insights, especially in the current era of rapidly progressing artificial intelligence.

The operational data mesh grants autonomy to producers and consumers so they can interact together within policy but without intermediary teams. Foundry enables close collaboration between all producers and consumers, from operational systems developers and advanced analytics teams to business operations and executives.

Benefits of implementing with Foundry include:

Makes it easy to discover and provision the data (democratization) without sacrificing the governance controls required to protect data from misuse.
Enables control and management by the teams that generate data as a byproduct of their domain responsibilities (and that best understand its contents and sensitivity).
Allows each business team to do analytics in its own bespoke environment, while also collaborating with teams in different analytical or operational environments.
Connects the output of analytics and models to the end decision maker or to operational users in the field across various teams and roles, while consolidating the findings, knowledge, and performance of actions to learn with discipline.

Palantir Foundry and AWS

Palantir Foundry and AWS together provide flexibility at each step of the ontology hydration. Data, models, and applications can be synchronized between the two to achieve operational connectivity.

For instance, users can connect to the ontology from Amazon SageMaker in their AWS account via the Foundry Python SDK to build and train machine learning (ML) models. The models deployed to an endpoint can then be registered in Foundry Catalog and invoked by new data products developed in Foundry or consumed in Foundry applications.
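To illustrate the last step, invoking a model that has been deployed to an endpoint, here is a minimal sketch. It uses the `invoke_endpoint` call of the `boto3` `sagemaker-runtime` client, which is real API. The endpoint name, the `{"instances": [...]}` payload shape, and the response format are assumptions that depend on the model container. How Foundry itself registers or calls the endpoint is not shown. The stub class exists only so the sketch can be tried without an AWS account.

```python
import io
import json


def invoke_model(runtime_client, endpoint_name, records):
    """Send JSON records to a SageMaker endpoint and return the decoded reply.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    The {"instances": ...} payload shape is an assumption; use whatever the
    deployed model container expects.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps({"instances": records}).encode("utf-8"),
    )
    # The real client returns a streaming body; read it fully, then decode.
    return json.loads(response["Body"].read())


class StubRuntime:
    """Stand-in for the real client so the sketch can be exercised offline."""

    def __init__(self, reply):
        self.reply = reply
        self.last_request = None

    def invoke_endpoint(self, **kwargs):
        self.last_request = kwargs
        return {"Body": io.BytesIO(json.dumps(self.reply).encode("utf-8"))}


# Offline dry run with a made-up endpoint name and made-up data.
stub = StubRuntime({"predictions": [0.9]})
result = invoke_model(stub, "example-endpoint", [{"tenure_months": 3}])
```

With real credentials, `StubRuntime` would be replaced by `boto3.client("sagemaker-runtime")` and the rest of the function would stay the same.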

Foundry Catalog can be enriched with model details by connecting to the Amazon SageMaker Model Registry. Additionally, Foundry and AWS interoperability allows AWS artificial intelligence (AI) services to be invoked directly in Foundry, taking advantage of ready-made intelligence.

Figure 3 – Palantir Foundry’s Amazon SageMaker integration (example 1).

In the second example shown below, users can build cloud-native applications on top of the ontology utilizing services such as Amazon API Gateway, AWS Lambda, and Amazon DynamoDB. The ontology APIs (OPIs) allow the application to wield and consume the ontology in a first-class way. OPIs can use actions defined on the objects, including complex logic, models, and writeback.
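A cloud-native application of this kind could expose ontology objects through Amazon API Gateway and AWS Lambda roughly as sketched below. The handler signature and the proxy-integration response shape (`statusCode`, `headers`, `body`) follow the standard Lambda conventions. The `fetch_object` callable is a hypothetical stand-in for a call to the ontology APIs, and the `objectType` and `primaryKey` path parameter names are assumptions chosen for this example.

```python
import json


def _response(status, payload):
    """Format a reply in the shape API Gateway proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def make_handler(fetch_object):
    """Build a Lambda handler around a lookup function.

    fetch_object(object_type, primary_key) is hypothetical: it should return
    a dict for a known object or None when nothing matches.
    """
    def handler(event, context):
        # pathParameters is None when the route has no path variables.
        params = event.get("pathParameters") or {}
        object_type = params.get("objectType")
        primary_key = params.get("primaryKey")
        if not object_type or not primary_key:
            return _response(400, {"error": "objectType and primaryKey are required"})
        obj = fetch_object(object_type, primary_key)
        if obj is None:
            return _response(404, {"error": "object not found"})
        return _response(200, obj)

    return handler


# Local dry run with an in-memory lookup and made-up data.
_demo_store = {("Aircraft", "N123"): {"tailNumber": "N123", "status": "in-service"}}
handler = make_handler(lambda object_type, key: _demo_store.get((object_type, key)))
```

Injecting the lookup keeps the handler testable without network access. In a deployment, the same handler could wrap an authenticated ontology client, with Amazon DynamoDB holding application state if needed.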

Figure 4 – Cloud-native application on top of ontology (example 2).

These are just two examples of how the ontology can be hydrated and wielded with AWS services. Note that AWS services can also be used for data preparation, visualization, and analytics.

Conclusion

Palantir Foundry on AWS allows customers to transition progressively but rapidly to an operational data mesh paradigm. This allows enterprises to leverage their existing investments for rapid value, and provides the flexibility to rationalize and optimize the technical landscape. The transition is implemented in parallel with the organizational and process changes required to scale and fully embrace the operational data mesh paradigm.

Several of Palantir’s largest customers across financial, manufacturing, utilities, and other industries have already done this while delivering continuous and ever-increasing business value.

To find out more about how you can transition to an operational data mesh, please contact Palantir. You can also learn more about Palantir in AWS Marketplace.

AWS Partner Network (APN) Blog
