AWS Public Sector Blog

Alex’s Lemonade Stand Foundation uses AWS to advance cutting-edge pediatric cancer research worldwide

Alexandra “Alex” Scott was diagnosed with neuroblastoma, a type of childhood cancer, shortly before her first birthday. She battled cancer bravely through her early childhood and, at four years old, hosted her first front-yard lemonade stand to raise funds for childhood cancer research. That first stand raised $2,000. She went on to hold a lemonade stand every year, and her story inspired others to do the same. By the time she passed away in 2004, she had raised more than $1 million. Alex’s Lemonade Stand Foundation (ALSF) emerged from her commitment to finding cures for childhood cancer. Today, the Foundation bearing her name is a national fundraising movement that changes the lives of children with cancer by funding impactful research, raising awareness, supporting families, and empowering the global research community to help cure childhood cancer.

Increasing access to user-friendly pediatric cancer data with the cloud

In 2017, ALSF founded the Childhood Cancer Data Lab (Data Lab) to address an important gap in the pediatric cancer field: vast amounts of accumulated data were not being put to use at scale. ALSF user experience research revealed that childhood cancer researchers often spent up to 30% of their time searching through repositories of public data, downloading and processing datasets, and comparing results across datasets. This duplicated effort cost researchers time and resources that they could otherwise have dedicated to important research activities.

To address this gap, the Data Lab first built refine.bio to make public datasets interoperable and reusable. refine.bio is an openly available collection of normalized bulk gene expression data. It harmonizes data across many different technologies into one universal repository, reducing the time that scientists have to spend normalizing datasets, a process that can take up to two weeks. refine.bio is powered by Amazon Web Services (AWS), and with the support of the AWS IMAGINE Grant in 2019, the number of samples available via refine.bio surpassed 1 million. More recently, the Data Lab expanded its suite of products by launching the Single-cell Pediatric Cancer Atlas (ScPCA), a publicly available atlas of single-cell gene expression data from pediatric cancer samples. Whereas refine.bio’s bulk gene expression data measure expression across a whole sample, ScPCA measures it at the level of individual cells. This difference in unit of measurement is significant because it allows scientists to examine individual cell populations across a variety of pediatric cancer types.
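To give a sense of what normalizing expression data across samples can involve, here is a minimal sketch of quantile normalization, one widely used technique for putting samples on a common scale. This is an illustration only: the post does not describe refine.bio's actual processing steps, and the function name and toy data below are hypothetical.

```python
from statistics import mean


def quantile_normalize(samples):
    """Quantile-normalize expression values across samples.

    `samples` maps a sample name to a list of expression values, one per
    gene, with every sample listing genes in the same order. (Hypothetical
    input layout, chosen for illustration.)
    """
    if not samples:
        return {}
    names = list(samples)
    n_genes = len(samples[names[0]])
    if any(len(samples[name]) != n_genes for name in names):
        raise ValueError("all samples must cover the same number of genes")

    # Sort each sample's values, then average across samples at each rank.
    sorted_values = [sorted(samples[name]) for name in names]
    rank_means = [mean(col[i] for col in sorted_values) for i in range(n_genes)]

    normalized = {}
    for name in names:
        values = samples[name]
        # Gene indices ordered from lowest to highest expression.
        order = sorted(range(n_genes), key=lambda i: values[i])
        result = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            # Each gene receives the cross-sample mean for its rank.
            result[gene_index] = rank_means[rank]
        normalized[name] = result
    return normalized


# Toy data: two samples measured on different scales.
toy = {"sample_a": [5, 2, 3], "sample_b": [40, 10, 60]}
normalized = quantile_normalize(toy)
```

After normalization, every sample shares the same set of values, so differences between samples reflect the relative ranking of genes rather than the measurement scale of each technology. Ties are broken by gene order in this sketch; production tools usually handle them more carefully.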

These complementary efforts by the Data Lab through refine.bio and the ScPCA Portal accelerate research by making data findable and accessible, helping researchers reveal commonalities with other conditions and uncover potentially effective childhood cancer treatments that the U.S. Food and Drug Administration (FDA) has already approved in other settings.

Saving scientists’ time to accelerate research with AWS

The main goal behind ScPCA is to produce a cutting-edge atlas of gene expression profiles for a variety of childhood cancer types from different organ sites, helping pediatric cancer researchers probe the underlying mechanisms in service of finding a cure for childhood cancer. Over the course of the project, ScPCA investigators generated massive volumes of raw sequencing data and shared them with the Data Lab to be processed uniformly. The Data Lab processes the data with its open source pipeline, which packages the data in a standardized, user-friendly way. The Data Lab then built a web interface on AWS, known as the ScPCA Portal, to release the processed data to the public as an open resource for discovery. The ScPCA Portal offers no-cost access to the processed data in one convenient location.

Researchers can access and download samples from a broad range of cancer types by following a step-by-step guide on how to get started with an ScPCA dataset. The original datasets are stored on Amazon Simple Storage Service (Amazon S3) and processed using AWS Batch, which enables scientists to simply and efficiently run hundreds of thousands of batch computing jobs on AWS. The packaged summarized data are also stored on Amazon S3. Storing and processing datasets of this size and scope to share with the broader research community would be challenging without cloud technology.
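As a rough illustration of this storage-and-compute pattern, the sketch below builds one AWS Batch job request per sample, pointing each job at a raw-data location and an output location in Amazon S3. It is not the Data Lab's pipeline: the bucket names, queue name, job definition name, and environment variable names are all hypothetical, and the code only assembles request parameters without contacting AWS.

```python
import re


def build_batch_job_requests(sample_ids, raw_bucket, output_bucket,
                             job_queue, job_definition):
    """Build one AWS Batch job request (as a dict) per sample.

    All bucket, queue, and job definition names are supplied by the caller;
    the values used in the example below are made up for illustration.
    """
    requests = []
    for sample_id in sample_ids:
        # Batch job names allow letters, numbers, hyphens, and underscores,
        # up to 128 characters, so replace anything else with a hyphen.
        safe_id = re.sub(r"[^A-Za-z0-9_-]", "-", sample_id)
        requests.append({
            "jobName": f"process-{safe_id}"[:128],
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
            "containerOverrides": {
                # Hypothetical variables a processing container might read.
                "environment": [
                    {"name": "INPUT_URI",
                     "value": f"s3://{raw_bucket}/{sample_id}/"},
                    {"name": "OUTPUT_URI",
                     "value": f"s3://{output_bucket}/{sample_id}/"},
                ],
            },
        })
    return requests


job_requests = build_batch_job_requests(
    ["sample-001", "sample-002"],
    raw_bucket="example-raw-sequencing",      # hypothetical bucket
    output_bucket="example-processed-data",   # hypothetical bucket
    job_queue="example-queue",                # hypothetical queue
    job_definition="example-pipeline",        # hypothetical job definition
)
```

With AWS credentials and real resources in place, each dict could be passed to the boto3 Batch client as keyword arguments, for example `boto3.client("batch").submit_job(**request)`. Keeping request construction separate from submission makes the per-sample fan-out easy to inspect before any compute is launched.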

Because the portal makes data openly available to childhood cancer researchers and the heavy lifting of processing the raw sequencing data is already done, researchers have more time to focus on analysis and accelerate impact and innovation. Scientists tend to use repositories of publicly available data as a launching point for devising a research direction, or they review the data to check and confirm their hypotheses.

Figure 1. Alex’s Lemonade Stand Foundation provides funding to pediatric cancer researchers and the Childhood Cancer Data Lab (Data Lab). Researchers provide raw sequencing data to the Data Lab, which then uniformly processes the data for the ScPCA Portal, making the data accessible to researchers worldwide.

Expanding access to critical data to make Alex’s vision a reality

Launching ScPCA was a major milestone for ALSF. While there are other data portals that look at single-cell data, this collection is uniquely dedicated to pediatric cancer data. In addition, single-cell profiling is a cutting-edge technique that may not be available to all childhood cancer researchers, so hosting the data openly on the cloud expands accessibility. Currently, data from seven ScPCA projects are accessible on the portal, including 226 patient samples representing 29 diagnoses. As new processed data become available, they will be added to the portal for researchers to use in analyses.

Ultimately, ALSF wants to offer resources that are useful and that will accelerate progress. In addition to building tools like refine.bio and ScPCA, the Data Lab also offers in-person and virtual training workshops that teach researchers skills to better understand their data, to collaborate with other scientific community members, and to improve their ability to work with data analysis tools. Looking forward, ALSF is excited to continue supporting childhood cancer researchers to make Alex’s vision of a world without childhood cancer a reality. The organization actively seeks community feedback to inform portal improvements that better serve researchers’ needs. To scale the initiative, ALSF envisions a day when researchers can process and add summarized data to the ScPCA Portal themselves, increasing the breadth of data and collective support available worldwide.

Learn more about the cloud for nonprofits, and check out more nonprofit stories.

Jennifer O’Malley

Jennifer O’Malley is the scientific community manager for the Childhood Cancer Data Lab. She joined Alex’s Lemonade Stand Foundation in 2019, raising funds and awareness for pediatric cancer research as a special events coordinator. In 2021, she assumed her current role on the Data Lab team. Jen now manages Data Lab communications, administers programs and services, identifies opportunities to maximize reach, and maintains a supportive community for pediatric cancer researchers. Jen received her Bachelor of Arts in English and her Master of Arts in Public Relations and Communications from Hofstra University.

Jaclyn N. Taroni

Jaclyn N. Taroni, PhD is the director of the Childhood Cancer Data Lab. She joined the Data Lab in 2018, serving as the principal data scientist prior to assuming the director role in 2021. In her time with Alex's Lemonade Stand Foundation, Jaclyn led the establishment of the training workshop program, served as the scientific lead for refine.bio, and oversaw the overall development of the Single-cell Pediatric Cancer Atlas project. Jaclyn received her PhD from the Geisel School of Medicine at Dartmouth and postdoctoral training at the Perelman School of Medicine at the University of Pennsylvania.

Angela Wu

Angela Wu is the content manager on the Amazon Web Services (AWS) worldwide public sector grants team. She loves telling stories about how technology connects communities and inspires social change.

Irene Wu

Irene Wu is a program manager at Amazon Web Services (AWS). With a background in education policy and international development, Irene works with nonprofit organizations to advance their missions and scale their impact with cloud technology.