How Pinterest worked with AWS to create a new way to manage data access: Part 1
This is the first of a two-part series on how Pinterest and AWS forged a grassroots collaboration to solve a complex data access problem.
In November 2020, Pinterest sent a note to AWS reporting something “surprising and inspiring:” Pinterest had begun using its new data access control system, built on Amazon S3 Access Points, to amplify underrepresented voices on the Pinterest platform.
The message capped off a collaboration between AWS and Pinterest that had begun 22 months earlier, with a seemingly routine meeting, a puzzling technical challenge, and a group of engineers who refused to give up on a problem that needed to be solved.
Note: S3 Access Points aliases, launched on July 26, 2021, make it easy to use an access point with any application that requires an S3 bucket name.
Controlling data access at the individual level
Pinterest and Amazon S3 developer teams got together in February 2019 to discuss new ways to ensure that specific data would be used only for the purposes Pinterest intended. The Pinterest team also needed to make sure that the rapidly expanding scale of Pinterest’s data use wouldn’t outgrow the access controls provided by the company’s current data management system, built on Amazon S3.
At the time, Pinterest’s storage was rapidly growing. Pinterest protected confidential data by creating clusters in Amazon S3 and assigning access permissions to those clusters for groups of users. In particular, Pinterest wanted its developers to be able to grant data access to specific users and processes while blocking access to all others.
A complementary collaboration
Everyone agreed that Pinterest needed a fine-grained access control system, and while many people had ideas about what access controls were needed, there wasn’t much consensus about “how.” Should Pinterest build a fine-grained access control system using Amazon S3 capabilities and, if so, which ones? Or should Pinterest develop a completely new system? There were arguments for and against every option. While the “how” remained an open question, everyone believed that this was an important problem to solve.
Pinterest prioritizes Pinner trust and worked closely with AWS. The teams knew that the problem Pinterest faced today would be a problem many organizations would face tomorrow, so it was important to think it through carefully.
“This was an opportunity to walk in the shoes of the customer and see the constraints they’re dealing with,” said Doug Youd, AWS Senior Solutions Architect. “Working closely with Pinterest would give us a view into real world problems and how customers went about solving them. All of us on the project were aware that we were working on a problem that Pinterest had defined, but we were innovating for everyone.”
Pinterest and AWS had complementary strengths, and the two companies’ appreciation for each other gave the project its glue. This was important for a grassroots effort that brought together engineers from two different companies in a problem-solving venture fueled by curiosity, forward-thinking, and customer obsession.
Where to go next
One approach that Pinterest had considered was creating its own permissions mechanism that would sit on top of the AWS infrastructure and that Pinterest would manage itself. However, this seemed like a risky option because the solution would take a lot of time and resources to build. Plus, Pinterest would have to take on management of a critical system and be able to scale it reliably, with no downtime, as the company grew.
Pinterest preferred, instead, to create a solution that took advantage of Amazon S3’s capabilities, as this seemed to be the fastest path to getting a cost-effective solution out the door. This approach would also take advantage of the scaling capabilities of S3, a service that was fully managed by AWS. The challenge with going this route seemed to be that AWS Identity and Access Management (IAM) limited to 10 the number of access groups of which an individual could be a member. Pinterest needed a much larger number to create smaller, more focused access groups. Could that limit be increased? And if so, by how much?
The next step was to get a better understanding of the capabilities of S3. For this, AWS tapped S3 data management and security experts, who suggested that the Pinterest and AWS teams get together for a white-boarding session to understand the numbers: How many data sets did Pinterest anticipate needing? What were the permissions and policy constraints?
The conversation was particularly timely because the S3 team was working on a new S3 feature called Amazon S3 Access Points. Amazon S3 Access Points makes it easier to manage data access at scale for shared data sets. With S3 Access Points, a customer can manage multiple access patterns for shared data by creating separate access points, each with its own customized access policy. S3 Access Points became the center of Pinterest’s fine-grained access control solution for scalable S3 storage.
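To make the idea of a per-access-point policy concrete, here is a minimal sketch in Python that builds a read-only policy document for a single access point. It is an illustration only, not Pinterest’s configuration: the account ID, Region, access point name, role ARN, and key prefix are all hypothetical placeholders, and the helper function names are invented for this example.

```python
import json


def access_point_arn(region: str, account_id: str, name: str) -> str:
    """Build the ARN of an S3 access point from its parts."""
    return f"arn:aws:s3:{region}:{account_id}:accesspoint/{name}"


def read_only_policy(ap_arn: str, principal_arn: str, prefix: str = "") -> dict:
    """Return a policy document that lets one principal read objects
    under `prefix` through the given access point (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:GetObject"],
                # Objects reached through an access point are addressed
                # as <access-point-arn>/object/<key>.
                "Resource": f"{ap_arn}/object/{prefix}*",
            }
        ],
    }


if __name__ == "__main__":
    # All identifiers below are made-up placeholders.
    ap = access_point_arn("us-west-2", "111122223333", "analytics-readers")
    policy = read_only_policy(
        ap, "arn:aws:iam::111122223333:role/analytics", prefix="reports/"
    )
    print(json.dumps(policy, indent=2))
```

Because each access point carries its own policy, a second access point on the same bucket could grant a different team a different prefix or a different set of actions. A document like this could then be attached to the access point, for example by passing `json.dumps(policy)` as the `Policy` argument of the boto3 S3 Control client’s `put_access_point_policy` call.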
During a series of whiteboard sessions at the AWS offices in San Francisco that spanned three months during the summer of 2019, the project team created a complex matrix of scenarios to help everyone understand all the different use cases Pinterest anticipated, how the Pinterest access control system would integrate with S3 Access Points, and how extensive the Pinterest data policies needed to be. As the team worked through requirements and constraints, it painstakingly charted a detailed picture of what Pinterest needed S3 Access Points to be able to do. In addition, AWS increased the number of access groups per individual to 30 so Pinterest could achieve the access control it wanted.
“Leveraging the scale and capabilities of Amazon S3 access control features, without having to reinvent them, gave us time to focus on ensuring that the experience for our customers was smooth and delightful,” said Vedant Radhakrishnan, Security Software Engineer at Pinterest.
“S3 had more capabilities than we realized,” said Keith Regier, Engineering Manager at Pinterest. “After working with the AWS team, we built a prototype that was successful.”
Pinterest Fine Grain Access Control and amplifying underrepresented voices
The Fine Grain Access Control (FGAC) system Pinterest built with Amazon S3 Access Points combines human and automated elements. The developer within Pinterest who owns a particular data set stored in S3 determines which individuals should be included in the group that has access to that data. From there, an automated Credential Vending Service takes over. When a user requests access to a particular data set, the service builds an access token based on the access groups of which the person is a member. With this token, the system can determine whether or not the person should be granted access to that data. With thousands of Pinterest engineers, this automation makes the process of granting or denying access fast and reliable.
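The flow described above (group membership in, access token out, then an allow-or-deny decision) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Pinterest’s Credential Vending Service: the data set names, group names, users, token fields, and the 15-minute lifetime are all hypothetical, and a real service would issue cloud credentials rather than an in-memory object.

```python
import time
from dataclasses import dataclass

# Hypothetical: groups that each data set owner has approved for access.
DATASET_GROUPS = {
    "ads-metrics": {"ads-analytics"},
    "inclusive-signals": {"inclusive-product"},
}

# Hypothetical: group memberships for each user.
USER_GROUPS = {
    "alice": {"inclusive-product", "search-infra"},
    "bob": {"ads-analytics"},
}


@dataclass(frozen=True)
class AccessToken:
    user: str
    groups: frozenset
    expires_at: float


def vend_token(user, now=None, ttl_seconds=900):
    """Build a short-lived token from the user's current group memberships."""
    now = time.time() if now is None else now
    groups = frozenset(USER_GROUPS.get(user, ()))
    return AccessToken(user=user, groups=groups, expires_at=now + ttl_seconds)


def is_allowed(token, dataset, now=None):
    """Grant access only if the token is unexpired and shares at least
    one group with the data set's approved groups."""
    now = time.time() if now is None else now
    if now >= token.expires_at:
        return False
    approved = DATASET_GROUPS.get(dataset, set())
    return bool(token.groups & approved)


if __name__ == "__main__":
    token = vend_token("alice")
    print(is_allowed(token, "inclusive-signals"))
    print(is_allowed(token, "ads-metrics"))
```

In this sketch the human step is the content of `DATASET_GROUPS`, which the data set owner would maintain, while `vend_token` and `is_allowed` stand in for the automated part. Unknown users and unknown data sets are denied by default.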
“Pinterest developers now have access to all the data tools they need to get their job done, along with the ability to lock down data as needed,” said Keith.
Although access control was still a hypothetical problem when Pinterest first described it, the company quickly found a specific use for its new FGAC solution. As part of its commitment to providing a platform for everyone, Pinterest was working to ensure that Pinners could discover ideas that were inclusive of the global community. Data was critical to this effort.
By October 2020, Pinterest was using FGAC to support its efforts to amplify underrepresented creators and businesses on its platform. FGAC gave Pinterest the ability to equip specific developers to design experiences to help all Pinners feel represented while locking down data so that it wouldn’t be used for any other purpose, such as advertising or additional targeting.
This was an unanticipated but important first use for FGAC.
“When we first began working on FGAC, we didn’t imagine that amplifying the voices of underrepresented people would be one of the first use cases,” said Pinterest’s Chief Architect David Chaiken. “But that’s one of the great things about working at Pinterest: it’s often the unexpected and inspiring work that gets prioritized.”
“We had no idea that our first meeting would launch such an intense collaboration journey,” said Pinterest’s AWS Account Manager John McGuire. “But this is the kind of adventure we love, solving tough customer problems. And this one has been especially satisfying because Pinterest has given us the opportunity to learn so much along the way.”
Pinterest will continue expanding its use of FGAC, but the story doesn’t end there. The work revealed the benefits to both Pinterest and AWS of collaborating across company walls to solve tough technical problems. Doug and Keith say that the Pinterest-AWS project team members plan to look for other opportunities to innovate together.
“Working through the problem-solving process with AWS helped us build a solution,” said Keith. “But it also started a conversation between our two companies about where Pinterest wants to go in the future and what support we’ll need from AWS. We all entered the Fine Grain Access Control process together, and AWS became invested in the solution. This project mattered to all of us. That’s the biggest reason it was successful.”
You can learn more about the Pinterest-AWS FGAC collaboration in Part 2 of this two-part series. And learn more about the wide range of Amazon S3 features you can use to store, access, govern, and analyze your data to reduce costs, increase agility, and accelerate innovation.