AWS for Industries

How differential privacy helps unlock insights without revealing data at the individual level

In today’s data-driven world, organizations are constantly seeking ways to extract valuable insights from their data assets, especially when collaborating with their partners. Companies across industries such as advertising, healthcare, media, entertainment, finance, and insurance rely on insights generated from first- and third-party datasets to develop new products and services, improve business decision-making, assess the impact of marketing campaigns, and increase revenue opportunities. For example, a pharmaceutical company might want to analyze patient data with an academic hospital to assess the efficacy of new drugs, or an auto insurer might want to complement insurance policies with market insights about a driving population. These datasets often contain information about individuals that can be extracted by comparing aggregated statistics. In the advertising industry, for example, an advertiser querying ad impressions data from a publisher can learn which users viewed an ad by adding or removing users from their analysis one by one and observing the difference in query results.

In this blog post, we outline what differential privacy is, the applications of this proven framework, and challenges to applying it effectively. You will learn about AWS Clean Rooms Differential Privacy, how this new capability makes it easy for you to apply differential privacy and protect the privacy of your users, as well as common use cases across industries.

Overview of differential privacy

Differential privacy is a mathematically proven framework for data privacy protection. Its primary benefit is protecting data at the individual level: a controlled amount of randomness obscures the presence or absence of any single individual in the dataset being analyzed, so that the addition or removal of any individual’s data cannot be detected. By introducing this controlled noise into query results, differential privacy effectively masks individual contributions while keeping the results accurate enough to provide meaningful insights.

Differential privacy also includes a component called the privacy budget. The privacy budget is a finite resource that is consumed each time a query runs; it therefore limits the number of queries that can be run on your datasets, so that the noise in query results cannot be averaged out to reveal private information about an individual.
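
To make the noise-plus-budget idea concrete, here is a minimal, illustrative sketch (not a production mechanism, and not how any particular service implements it). The `PrivateCounter` class and its names are hypothetical; it answers counting queries with Laplace noise and refuses queries once the budget is spent:

```python
import numpy as np

class PrivateCounter:
    """Toy differentially private counter with a finite privacy budget.

    Each query with privacy parameter epsilon adds Laplace(1/epsilon)
    noise (a count has sensitivity 1: one person changes it by at most 1)
    and consumes epsilon from the total budget. Once the budget is
    exhausted, no further queries are allowed, so repeated queries
    cannot average the noise away.
    """
    def __init__(self, data, total_budget):
        self.data = data
        self.remaining = total_budget

    def noisy_count(self, predicate, epsilon):
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon
        true_count = sum(1 for row in self.data if predicate(row))
        return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

users = [{"age": a} for a in (25, 31, 42, 58, 67)]
counter = PrivateCounter(users, total_budget=1.0)
print(counter.noisy_count(lambda u: u["age"] > 40, epsilon=0.5))  # noisy estimate of the true count (3)
```

A smaller epsilon means more noise per answer but slower budget consumption; choosing that trade-off is exactly the calibration challenge discussed later in this post.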

Applications of differential privacy

Differential privacy can be applied to various types of data analysis based on the customer use case.

Using SQL, differential privacy can be applied by adding noise to query answers. For example, when an analyst in the advertising industry runs a SQL query on user interaction data protected by differential privacy to measure the reach of an advertising campaign, the SQL engine introduces a carefully calibrated amount of noise into the query result at runtime. This approach ensures that the unique reach of the ad campaign can be determined across devices and platforms while preserving individual user privacy.
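
The pattern can be sketched with an in-memory SQLite table: compute the true aggregate with SQL, then perturb it before releasing the answer. This is illustrative only (the table, the `noisy_reach` helper, and the data are made up for this sketch, and real engines handle sensitivity far more carefully):

```python
import sqlite3
import numpy as np

# Hypothetical ad impressions table for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE impressions (user_id TEXT, campaign TEXT)")
conn.executemany(
    "INSERT INTO impressions VALUES (?, ?)",
    [("u1", "spring"), ("u1", "spring"), ("u2", "spring"),
     ("u3", "spring"), ("u4", "fall")],
)

def noisy_reach(campaign, epsilon):
    """Distinct-user reach of a campaign, released with Laplace noise."""
    (true_reach,) = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM impressions WHERE campaign = ?",
        (campaign,),
    ).fetchone()
    # Adding or removing one user changes the distinct count by at most 1
    # (sensitivity 1), so Laplace(1/epsilon) noise suffices here.
    return true_reach + np.random.laplace(scale=1.0 / epsilon)

print(round(noisy_reach("spring", epsilon=1.0)))  # close to the true reach of 3
```

Note that `u1` appears twice but contributes once to reach; the noise hides whether any particular user is in the table, not how often they appear in the result.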

In machine learning, differential privacy can be applied to models by adding noise to quantities such as gradients during training.1 For example, when a data scientist in the healthcare field trains a predictive model for patient outcomes, they can employ differential privacy by introducing controlled noise to the gradients during the model training process. This helps preserve the privacy of individual patient records while still permitting their use for accurate predictions.
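
The core recipe from reference 1 (DP-SGD) is to clip each example’s gradient, then add noise calibrated to the clip norm. The sketch below applies it to a toy linear regression; the function name and hyperparameters are illustrative, not a recommended configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(weights, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step for least-squares linear regression (sketch).

    Per-example gradients are clipped so no single record can move the
    model by more than clip_norm, then Gaussian noise scaled to the clip
    norm is added before averaging.
    """
    grads = []
    for xi, yi in zip(X, y):
        g = 2 * (xi @ weights - yi) * xi          # per-example gradient
        norm = np.linalg.norm(g)
        g = g / max(1.0, norm / clip_norm)        # clip to clip_norm
        grads.append(g)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    noisy_mean = (np.sum(grads, axis=0) + noise) / len(X)
    return weights - lr * noisy_mean

X = rng.normal(size=(32, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=32)
w = np.zeros(3)
for _ in range(200):
    w = dp_sgd_step(w, X, y)
# w drifts toward the true coefficients, up to the injected noise
```

Tracking how much privacy budget a full training run consumes requires an accountant over all steps, which is the hard part that libraries and services handle for you.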

Another emerging application is differentially private synthetic data generation, where the input dataset is transformed by a differentially private algorithm into a synthetic dataset that preserves the statistical properties of the original. Downstream data scientists and analysts can then perform their analyses on the synthetic dataset instead of the original, while the data owner protects the privacy of their customers.2
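
A deliberately simplified version of this idea: release a noisy histogram (a one-way marginal) of the real data, then sample synthetic records from it. This is far simpler than the multi-marginal methods in reference 2; the `dp_synthetic` helper and its parameters are assumptions for this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_synthetic(ages, bins, epsilon, n_synth):
    """Generate synthetic ages from a differentially private histogram.

    Perturb the histogram counts with Laplace noise (each individual
    affects exactly one bin count by 1), clip negatives, normalize into
    a distribution, then sample synthetic records from it.
    """
    counts, edges = np.histogram(ages, bins=bins)
    noisy = counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape)
    probs = np.clip(noisy, 0, None)
    probs = probs / probs.sum()
    # Pick a bin per synthetic record, then a uniform age inside the bin.
    idx = rng.choice(len(probs), size=n_synth, p=probs)
    return rng.uniform(edges[idx], edges[idx + 1])

real_ages = rng.integers(18, 80, size=500)
synthetic = dp_synthetic(real_ages, bins=10, epsilon=1.0, n_synth=500)
```

The synthetic records preserve the age distribution approximately, but no synthetic row corresponds to any real individual, which is what makes sharing them safe.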

Implementing differential privacy

Differential privacy is a sophisticated technique, and applying it effectively demands a deep understanding of complex mathematical theory. Configuring it is challenging because data analysts must calibrate the right level of noise: enough to preserve the privacy of their users, but not so much that it compromises the utility of query results. Additionally, companies using this technique may want to enable their partners to conduct a wide variety of analyses on their data, including highly complex and customized queries. Supporting such a requirement with differential privacy is difficult due to the intricate calculations involved in calibrating the noise across query components such as aggregations, joins, and transformations.

AWS Clean Rooms Differential Privacy: empowering privacy-enhanced data collaboration with flexible and configurable privacy controls

That’s why we built AWS Clean Rooms Differential Privacy, now generally available, to help you protect the privacy of your users with mathematically backed controls in just a few steps.

AWS Clean Rooms Differential Privacy provides intuitive controls to configure the noise level and the privacy budget, helping you estimate how many queries your partners can run in a data collaboration and adjust these controls as you see fit to meet each partner’s needs. AWS Clean Rooms Differential Privacy does not require any additional configuration or setup from the collaboration members to query data, making it simple for them to continue using data collaborations as they did before.

Customers are using AWS Clean Rooms Differential Privacy to more confidently and easily permit their collaboration partners to analyze their data while decreasing reliance on preapproving or auditing queries. Using AWS Clean Rooms Differential Privacy, they aim to scale existing partnerships and strengthen new ones where there is less established trust. They also appreciate that, using AWS Clean Rooms Differential Privacy, they can permit ad-hoc queries on their data because it protects user-level data by automatically adding a carefully calibrated amount of noise in query results at runtime.

How differential privacy works in AWS Clean Rooms

You can set up AWS Clean Rooms Differential Privacy by applying a custom analysis rule in your AWS Clean Rooms collaboration. You can then configure its controls to fit your specific business use cases in just a few simple choices, all without requiring any additional expertise or setup from your partners.

Use cases for AWS Clean Rooms Differential Privacy

  • Optimize media planning – Advertisers can plan their advertising spend by determining user overlap with marketing partners without revealing which customers are in common.
  • Expand advertising campaign measurement – Advertisers can measure the return on investment (ROI) of marketing with a media publisher to optimize campaigns based on aggregate advertising insights.
  • Improve auto insurance assessments – Auto insurance companies can complement policy creation with market insights about a driving population without revealing data about individuals.
  • Accelerate biomedical research – Healthcare companies can advance clinical research insights by collaborating with medical institutions without revealing information about an individual patient.

Conclusion

As organizations continue to evolve and scale their data privacy, security, and governance approaches, and as the need for privacy-enhanced data collaboration with partners grows, differential privacy emerges as a powerful technique for generating insights without revealing information at the individual level. AWS Clean Rooms Differential Privacy helps protect privacy with mathematically backed, intuitive controls in just a few steps. To learn more about the use cases, benefits, or customers of AWS Clean Rooms Differential Privacy, check out our website and contact an AWS Clean Rooms expert.

References

1 Deep Learning with Differential Privacy, CCS’16 October 24-28, 2016, Vienna, Austria.

2 Private Synthetic Data for Multitask Learning and Marginal Queries, 36th Conference on Neural Information Processing Systems (NeurIPS 2022).

Amit Choudhary

Amit Choudhary is a Product Manager for AWS Clean Rooms Differential Privacy. He loves to build products that make it easy for customers to use privacy-preserving technologies (PETs) such as differential privacy.

Allison Milone

Allison Milone is a Product Marketer for the Advertising & Marketing Industry at Amazon Web Services.

Jonathan Harmms

Jonathan Harmms is a Product Marketer for the Advertising & Marketing Industry at Amazon Web Services.

Sergul Aydore

Sergul Aydore is a Senior Applied Scientist at AWS working on Trustworthy Machine Learning.