Guidance for Data Sharing With Nonprofit Funders, Research Participants & Communities on AWS

Overview

This Guidance helps nonprofit research institutes build a modern data sharing portal. Nonprofit funders seek data sharing policies that will give them visibility into a nonprofit’s goal progress and output. In addition to funders, agencies are starting to require researchers to share data with research participants and communities. In January 2023, the US National Institutes of Health (NIH)* announced that most of the 300,000 researchers and 2,500 institutions that the NIH funds annually will need to include a data management plan in their grant applications and eventually make their data publicly available. This Guidance can help nonprofits achieve this goal by showing how funders can add to the raw data that nonprofit researchers use and how these researchers can share their findings with research participants and communities.

*Data Management & Sharing Policy Overview, National Institutes of Health (NIH), January 2023

How it works

These technical details feature an architecture diagram to illustrate how to effectively use this solution. The architecture diagram shows the key components and their interactions, providing an overview of the architecture's structure and functionality step-by-step.

Well-Architected Pillars

This architecture can be deployed using infrastructure as code (IaC). IaC helps you recover from failures by automating the process of launching new environments and infrastructure. Because IaC deployments are repeatable, they stay consistent and are easier to run in production. Additionally, you can use Amazon CloudWatch to monitor the deployed services.
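
As a rough illustration of the IaC approach, the sketch below builds a minimal AWS CloudFormation template in Python and serializes it to JSON. It is only an example under stated assumptions: this Guidance does not prescribe a particular IaC tool, and the logical ID ResearchDataBucket and the function names are hypothetical.

```python
import json


def build_template() -> dict:
    """Return a minimal CloudFormation template describing one versioned S3 bucket."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative data lake bucket (hypothetical example)",
        "Resources": {
            # "ResearchDataBucket" is a placeholder logical ID chosen for this sketch.
            "ResearchDataBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    # Versioning keeps prior object versions recoverable.
                    "VersioningConfiguration": {"Status": "Enabled"},
                },
            }
        },
    }


def render_template() -> str:
    """Serialize the template deterministically so repeated runs produce identical text."""
    return json.dumps(build_template(), indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_template())
```

Because the template is plain data, rendering it twice yields the same text, which is the repeatability that makes it practical to relaunch an environment after a failure.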

Read the Operational Excellence whitepaper 

Amazon Cognito provides managed user authentication and access control for the data sharing portal. AWS Lake Formation enforces security and governance of the data lake.
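
One way to picture data lake governance is a column-level grant, where a funder or community group can query only selected columns of a table. The sketch below builds the request for such a grant in the shape accepted by the Lake Formation GrantPermissions operation. The role ARN, database, table, and column names are hypothetical, and this Guidance does not specify which permissions to grant.

```python
from typing import Dict, Sequence


def build_grant_request(
    principal_arn: str, database: str, table: str, columns: Sequence[str]
) -> Dict[str, object]:
    """Build keyword arguments for a column-level SELECT grant in Lake Formation."""
    if not columns:
        raise ValueError("at least one column must be shared")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Only these columns become queryable by the principal.
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


if __name__ == "__main__":
    # All identifiers below are placeholders for illustration.
    request = build_grant_request(
        "arn:aws:iam::123456789012:role/FunderAnalyst",
        "research_db",
        "survey_results",
        ["region", "outcome_score"],
    )
    print(request)
    # With boto3 installed and credentials configured, the request could be sent with:
    #   boto3.client("lakeformation").grant_permissions(**request)
```

Keeping the request as plain data makes it easy to review which columns are exposed before any permission is actually granted.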

Read the Security whitepaper 

The serverless services in this architecture are automatically deployed across multiple Availability Zones, so that if one Availability Zone fails, services are available in another Availability Zone. Additionally, this architecture decouples storage from compute and uses stateless services to enhance reliability and availability. When compute and storage are decoupled, compute happens independently from when data is stored. If there are failures in compute, the person or process performing the data storage operation does not have to wait for confirmation that the issue is resolved. Instead, compute processes can execute automated retries and independently notify users of failures.
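
The retry-and-notify behavior described above can be sketched in a few lines of Python. This is a generic exponential backoff loop, not the mechanism any specific AWS service uses internally; the function names, the default delays, and the notify callback are assumptions made for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    notify: Optional[Callable[[str], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying with exponential backoff; notify and re-raise if every attempt fails."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == max_attempts:
                # Out of attempts: tell the user, then surface the original error.
                if notify is not None:
                    notify(f"task failed after {max_attempts} attempts: {exc}")
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def make_flaky_task(failures: int) -> Callable[[], str]:
    """Demo helper (hypothetical): a task that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def task() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("transient compute failure")
        return "processed"

    return task


if __name__ == "__main__":
    print(run_with_retries(make_flaky_task(2), base_delay=0.1, notify=print))
```

The data is already stored before the task runs, so the person who uploaded it never waits on these retries; they hear about a problem only if every attempt fails.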

Read the Reliability whitepaper 

Multiple services used in this architecture, including Amazon QuickSight, Amazon SageMaker, Amazon S3, and AWS Glue, offer AWS Free Tier usage. With this offer, you can experiment and fine-tune architecture configurations within Free Tier limits without incurring additional costs.

Read the Performance Efficiency whitepaper 

The serverless services in this architecture scale automatically, meaning they can manage growing data volumes while using only the minimum resources required. To manage costs over time, we recommend implementing a standardized process to identify and remove unused resources, such as unused data, SageMaker resources, and extract, transform, load (ETL) jobs.
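
A standardized cleanup process can start from something as simple as the sketch below, which flags resources that have been idle longer than a threshold. The inventory records, the resource kinds, and the 90-day default are all hypothetical; in practice the last-used timestamps would come from whatever inventory or logging source you maintain.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, NamedTuple


class Resource(NamedTuple):
    name: str
    kind: str            # e.g. "s3-prefix", "sagemaker-notebook", "glue-job"
    last_used: datetime  # timezone-aware timestamp of the most recent use


def find_unused(
    resources: Iterable[Resource], now: datetime, max_idle_days: int = 90
) -> List[Resource]:
    """Return resources whose last use is older than max_idle_days before now."""
    cutoff = now - timedelta(days=max_idle_days)
    return [r for r in resources if r.last_used < cutoff]


if __name__ == "__main__":
    # Placeholder inventory for illustration only.
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    inventory = [
        Resource("raw/2021-survey/", "s3-prefix", datetime(2024, 1, 1, tzinfo=timezone.utc)),
        Resource("nightly-etl", "glue-job", datetime(2024, 5, 20, tzinfo=timezone.utc)),
    ]
    for resource in find_unused(inventory, now):
        print(f"review for removal: {resource.kind} {resource.name}")
```

Running a check like this on a schedule, and reviewing the output before deleting anything, is one way to make the process repeatable.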

Read the Cost Optimization whitepaper 

This architecture supports S3 Lifecycle policies, which automatically move data to lower-cost storage classes, such as infrequent-access or archival (cold) storage, as it ages. To discover which data is a candidate for these transitions, you can monitor access patterns with tools such as Amazon S3 Storage Class Analysis. Together, these help reduce the amount of resources needed to maintain data storage.
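
For a concrete picture of an S3 Lifecycle policy, the sketch below builds a rule that moves objects under a prefix to an infrequent-access class and later to an archival class. The prefix, rule ID, and day counts are assumptions chosen for illustration, not values recommended by this Guidance.

```python
from typing import Dict


def build_lifecycle_configuration(
    prefix: str = "raw/", ia_after_days: int = 30, archive_after_days: int = 90
) -> Dict[str, object]:
    """Build an S3 Lifecycle configuration with two age-based transitions."""
    # S3 does not transition objects to STANDARD_IA before they are 30 days old.
    if ia_after_days < 30:
        raise ValueError("ia_after_days must be at least 30")
    if archive_after_days <= ia_after_days:
        raise ValueError("archive_after_days must be greater than ia_after_days")
    return {
        "Rules": [
            {
                "ID": "tier-down-aging-data",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    config = build_lifecycle_configuration()
    print(config)
    # With boto3 installed and credentials configured, this could be applied with:
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="example-bucket",  # placeholder bucket name
    #       LifecycleConfiguration=config,
    #   )
```

The day thresholds are best chosen from observed access patterns rather than guessed, since moving data that is still read frequently can increase retrieval costs.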

Read the Sustainability whitepaper 

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.