Best practices for CDP design and implementation using AWS and Databricks

This post is co-authored by Steve Sobel, Global Industry Leader for Media & Entertainment at Databricks, and Dan Morris, Technical Director, Communications, Media & Entertainment at Databricks.

Media and entertainment companies are dealing with more data than ever before and increasingly struggle to gain the insights needed to grow their business.

This deluge of data results from an increasingly direct relationship between media and entertainment companies and their customers, with many of them entering the competitive direct-to-consumer (D2C) market. In 2020, US media consumers viewed 26% of their video content over IP versus 25% of broadcast content (Nielsen Research, published in Forbes).

Even in broadcast, the imminent rollout of ATSC 3.0 will bring addressability and the need for a customer data strategy/infrastructure.

Organizations look to CDPs as a means to differentiate

Understanding your customers, your content, and your customer experience is crucial to delighting customers, driving advertising revenue, reducing content costs, increasing the lifetime value of customers and content, attracting new customers, increasing subscriptions, reducing churn, and innovating and accelerating outcomes. Many media companies have adopted, or are in the process of adopting, Customer Data Platforms (CDPs) to help them collect data on and better analyze their customer base. A CDP is software that collects and unifies customer data from multiple sources, including first- and third-party sources, to build a single, coherent, complete view of each customer. However, according to Gartner, CDPs are both one of the most disappointing investments brands made in 2020 and the number one investment they want to make in 2021.
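
To make the idea of a single customer view concrete, here is a minimal Python sketch of unifying records from several sources into one profile per customer. It is an illustration only: the field names (`source`, `email`, `plan`, `favorite_genre`) and the `unify_profiles` helper are hypothetical, and matching on a normalized email address is a simplifying assumption. Production CDPs typically rely on an identity graph rather than a single exact-match key.

```python
from collections import defaultdict


def unify_profiles(records):
    """Merge per-source records that share an email into one profile each.

    Assumption: a lowercased, trimmed email is a good enough join key
    for this sketch. Real identity resolution is considerably richer.
    """
    profiles = defaultdict(lambda: {"sources": set(), "attributes": {}})
    for record in records:
        key = record["email"].strip().lower()
        profile = profiles[key]
        profile["sources"].add(record["source"])
        for name, value in record.items():
            if name not in ("email", "source"):
                # Keep the first value seen for each attribute.
                profile["attributes"].setdefault(name, value)
    return dict(profiles)


# Hypothetical records arriving from three different systems.
records = [
    {"source": "web", "email": "Ana@example.com", "plan": "premium"},
    {"source": "mobile_app", "email": "ana@example.com ", "favorite_genre": "drama"},
    {"source": "partner_feed", "email": "bo@example.com", "plan": "free"},
]

profiles = unify_profiles(records)
```

In this sketch the web and mobile app records for the same person collapse into one profile that remembers both sources, which is the behavior the definition above describes.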

Comparing “build” vs “buy” approaches for CDPs

When considering investing in a CDP, it’s important to view the CDP and its corresponding capabilities as an extension of a strong data management strategy and lake house architecture, not as an alternative. By adopting this view, you amplify the value of your CDP investment through interoperability. For example, use cases such as content valuation, which require user-level data but fall outside the remit of a CDP, are easily enabled because the data created and harmonized in your CDP is readily accessible in your lake house and through the managed services provided by AWS and Databricks. Similarly, with connective tissue between your lake house and your CDP, your data teams will be empowered to infuse their domain knowledge, insights, and models into the customer interactions that are powered by your CDP.

For companies that have decided to invest in a CDP, the main decision is which components to build versus which components to buy or license. Even off-the-shelf CDP platforms require M&E customers to have a strategy for where important data resides and who has access to the core data versus access to the insights. As a general rule, we recommend keeping the important data in company-managed data lakes built on Amazon Simple Storage Service (Amazon S3). The control, access, and management of first-party customer data, including Personally Identifiable Information (PII), is not only a significant competitive advantage for brands, it’s also a regulatory necessity. Data protection regulations, such as GDPR in the European Union and CCPA in California, force enterprises to know where all of their data is housed and how they are keeping it safe.

Even when companies have CDPs, we hear that data sets and insights are still being used individually by teams or company divisions but not being aggregated sufficiently to have a comprehensive view across divisions, platforms, content types, etc. Ultimately, customers are concerned about where their data resides and who can leverage it effectively.

And finally, third-party cookies, which have been a staple of media marketers and media advertising spend, will soon be gone. Firefox and Safari already block third-party cookies by default, and as of summer 2021 Google will end support for third-party cookies in Chrome (this date is a moving target). Businesses that use third-party cookies to track customer journeys increasingly need to think of alternative ways to reach their target audiences.

In general, customers take one of three approaches:

1. Fully integrated suite CDPs (Adobe, Salesforce)

This approach is best for customers who want one tool, want it quickly, and have little or no concern for cost. Customers need the appropriate budget and dedicated data engineering resources for integration and stand up. A soft cost here is the need for internal alignment as out-of-the-box solutions require some end users to accommodate the platform’s tools and workflows. Hard costs include paying for the whole platform even if you are only using a portion as well as lock-in to a single vendor for features and pace of innovation.

Fully Integrated Suite CDP

2. Modular CDPs (using TapAd for identity graph, Amplitude for activation, QuickSight for dashboards, etc.)

This hybrid approach allows customers to pick which aspects of the CDP they build and which they buy or license. This provides the potential for “best of breed” tools for each use case and allows customers to avoid spending money on features they do not need. It also allows customers to add functionality as their needs expand. Customers will still need to dedicate data engineering resources, and in most cases this will be a larger time commitment than the full stack option. Customers can more directly control CDP costs but may spend more time reviewing and contracting with vendors.

Modular CDP

3. Fully DIY: AWS + Databricks end-to-end

The final option is for customers to build the entire CDP themselves on top of their existing lake house (AWS + Databricks) foundation. This is for “builders” who have the budget and the internal resources. The upside is complete flexibility, data control, and workflow management. Customers can build at the pace they require and leverage testing to make data-driven development decisions.

Designing a CDP that fits the entire organization

As we’ve mentioned, the CDP should be an extension of the data lake house, not a standalone application. What this means in practice is the necessity of selecting and implementing an architecture that works for the main stakeholders – those tasked with building and maintaining the CDP as well as those who will use it.

Oftentimes, we see that the root source of CDP disappointment is that the needs of one of these main stakeholders weren’t considered before implementation – for example, an IT organization implementing a CDP without regard for supporting all the use cases marketing requires, or a marketing organization stitching a CDP from several third-party providers that can’t be supported by the IT organization long term.


Stakeholder	CTO/CIO organization	Marketing and product teams	Data scientists and analysts
Remit	Builds and maintains the CDP as part of the lake house	Use the CDP to execute their vision for a better, more personalized customer experience	Leverage the customer data for use cases that fall outside the remit of a CDP (e.g. content valuation, product analytics, content recs)
Concern	-Better align with the guiding principles of the CTO/CIO -Reduce vendor lock-in by maintaining core logic internally -Use the right tool for the right job (e.g. ETL, object store, etc.) -Additive and future proof -Mitigate need for multiple copies of the same data -You own your data -Reduced TCO for all the reasons above	-Managing general-purpose segment definitions in one place will lead to a more consistent user experience -Garner support and alignment from technology organizations + data teams (including data scientists) -Improve data quality and completeness more easily	-More easily contribute models and model output for specific use cases, while also leveraging data that lives outside the CDP (e.g. content metadata for segments)
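
As a rough illustration of what managing general-purpose segment definitions in one place can look like, the following Python sketch keeps every segment rule in a single registry that any downstream team can apply to a customer profile. The segment names, profile fields, and 30-day threshold are all hypothetical, and a real implementation on a lake house would more likely express these rules as shared queries or tables than as in-memory functions.

```python
# Single, shared registry of segment rules (all names are hypothetical).
SEGMENTS = {
    "premium_subscribers": lambda profile: profile.get("plan") == "premium",
    "at_risk": lambda profile: profile.get("days_since_last_view", 0) > 30,
}


def assign_segments(profile, segments=SEGMENTS):
    """Return the sorted names of every segment whose rule matches the profile."""
    return sorted(name for name, rule in segments.items() if rule(profile))


# Example profile with illustrative fields.
example_profile = {"plan": "premium", "days_since_last_view": 45}
example_segments = assign_segments(example_profile)
```

Because marketing, product, and data science teams would all call the same registry, a change to a rule is made once and applies everywhere, which is the consistency benefit the table describes.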

Getting started

There is no one “right approach” when it comes to building a CDP – it depends on where your company is in its cloud journey, the scope of your business needs, and the resources that you have available. If you do not have the proper internal resources, specifically data engineering and data science talent, but have the budget, then a full stack CDP approach is best. However, as internal data capabilities mature within an enterprise, it makes sense to outsource some development to AWS Consulting Partners with expertise in building media CDPs with AWS and Databricks, buying best-in-breed tools for specific functions while building core functions, such as personalization or segmentation, in house.

We have customers that have taken all of these approaches, or started with one and then moved into another as they hired more data team resources. What’s important is that with AWS and Databricks at the core, you can build a modern CDP that can evolve with your business and scale to your users and use cases.

If you are further along in your journey, and have opted for a modular, or DIY approach, below are advanced use cases we have built to take full advantage of your CDP:

AWS for M&E Blog