JustGiving is one of the world’s largest online social platforms for charitable fundraising. The London-based organization’s 24 million registered users have helped raise $3.5 billion for more than 13,000 causes as diverse as Alzheimer’s research, Haiti earthquake relief, and initiatives to stop human trafficking. Launched in 2001, JustGiving charges a 5 percent transaction fee on donations, the profits of which are reinvested into the development and innovation of its platform.
JustGiving’s growth has been fueled by its integration with social networks, support for mobile donations, social feeds, and the launch of campaigns and crowdfunding products—all of which have enabled JustGiving users to amplify their fundraising, reach more people, and raise more money for good causes. The growth has also challenged the organization’s technology and operations teams, which follow the company’s mission of delivering an engaging and user-friendly experience across all platforms while keeping costs and overhead as low as possible.
“What we’ve seen in the past few years is people increasingly sharing their fundraising activities through a variety of networks and channels,” says Richard Atkinson, JustGiving’s chief information officer. “There’s a lot of viral social phenomena coming from nowhere, and the result is that our spikes in traffic are getting even spikier. We had a collocated data center environment that made scaling difficult due to additional costs and complexity that would have been costly and impractical.”
The dramatic increases in data were also overwhelming the organization’s internal analytics processes, says Richard Freeman, Ph.D., solutions architect and data scientist at JustGiving. “We had been growing our analytics team and capabilities so rapidly that our Microsoft SQL Server data warehouse was unable to process the new data volume, velocity, and query complexity required by our data scientists and analysts,” Freeman says.
To address these issues, the company decided to move to the cloud for its general operations and to host a new big-data analytics platform called RAVEN (Reporting, Analytics, Visualization, Experimental, Networks), which would work alongside the existing data warehouse. The goal was to give JustGiving’s data analysts tools for running experiments on clickstream, log, transactional, and external data sources. The analytics team also wanted to run more traditional reporting and examine key performance indicators (KPIs) without the need to continually repeat different extract, transform, and load (ETL) processes.
JustGiving chose Amazon Web Services for its test and production environments, with a particular focus on enhancing analytics capabilities. Atkinson says the choice boiled down to trust: “We’re a trusted brand for 24 million users and 13,000 causes that are using us to raise funds,” he says. “We wanted to find a solution for cloud services that we could bring into that trust network. AWS was really the only player for that.”
JustGiving reengineered its software as
The organization also uses AWS for its RAVEN analytics platform, relying on several AWS services, including Amazon Redshift, Amazon Elastic MapReduce (Amazon EMR), Amazon Kinesis, AWS Lambda, Amazon DynamoDB, Amazon Simple Queue Service (Amazon SQS), and Amazon Simple Notification Service (Amazon SNS).
“Many vendors propose a graphical interface to big-data integration, but we found that in real life it was more efficient to load and query the data with actual SQL code triggered on an ad hoc basis for our data-science experiments or automated for KPI dashboards and reports,” says Freeman. “Using AWS, we have built an event-driven ETL pipeline with systems that communicate through a robust hosted SNS and SQS-based messaging process. We also looked at existing open-source workflow frameworks, but these require dedicated machines that need to be set up and supported. They were too complex to customize for our use cases.”
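The event-driven pattern Freeman describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not JustGiving's actual code: the event names (`file_landed`, `table_loaded`), the step functions, and the payload fields are all hypothetical. The only AWS-specific detail it relies on is the envelope SNS uses when it delivers to an SQS queue without raw message delivery, where the SQS body is a JSON document whose `Message` field carries the published payload as a string.

```python
import json

# Hypothetical ETL steps. A real pipeline would run SQL against the warehouse;
# here each step only returns a description of the work it would do.
def load_table(payload):
    return f"load {payload['table']} from {payload['source']}"

def refresh_kpis(payload):
    return f"refresh KPI reports that depend on {payload['table']}"

# Hypothetical mapping from event name to the step it triggers.
HANDLERS = {
    "file_landed": load_table,
    "table_loaded": refresh_kpis,
}

def handle_sqs_message(body):
    """Unwrap an SNS notification delivered through SQS and dispatch it."""
    envelope = json.loads(body)
    if envelope.get("Type") != "Notification":
        return None  # e.g. a subscription confirmation, not an ETL event
    payload = json.loads(envelope["Message"])
    handler = HANDLERS.get(payload.get("event"))
    if handler is None:
        return None  # unknown event type: ignore rather than fail
    return handler(payload)

def make_sqs_body(payload):
    """Build a minimal SNS-style envelope for local experiments."""
    return json.dumps({"Type": "Notification", "Message": json.dumps(payload)})
```

Calling `handle_sqs_message(make_sqs_body({"event": "file_landed", "table": "donations", "source": "s3://example-bucket/donations.csv"}))` triggers the hypothetical load step. Because each step is chosen by event name, new steps can be added to the mapping without changing the dispatch logic.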
Freeman lauds the functionality enabled by the AWS platform. “AWS offered exactly what we needed for rapid prototyping,
For the JustGiving analytics team, Freeman says Amazon Redshift has proven to be an efficient product for data exploration and querying large structured datasets, which include billions of data points on different clusters. “We built a whole suite of tools for running event-driven ETL jobs and integrating with internal and external APIs,” he says. “Queries that took 30 minutes in SQL Server now take just seconds to run. We can run more complex queries that were not possible before, and we’ve even found that simpler graph-type queries, such as the relationship of charity and events to users, can be executed faster than using a dedicated graph database. And for the first time, we can provide our business users with a joined view of transactional and non-transactional data, such as page visits, donations, and sharing funnels. Redshift is faster, easier to use, and provides more useful tools than we had before to help support our production environment.”
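As a rough illustration of what a "graph-type" query looks like in plain SQL, the sketch below finds users connected to a given user through a shared charity by using a self-join. It is an assumption-laden example: the `donations` table, its columns, and the sample rows are hypothetical, and Python's built-in SQLite stands in for Amazon Redshift so that the example runs locally.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE donations (user_id TEXT, charity_id TEXT);
INSERT INTO donations VALUES
  ('alice', 'c1'), ('bob', 'c1'), ('carol', 'c2'), ('alice', 'c2');
""")

def users_sharing_a_charity(conn, user_id):
    """Return other users who donated to at least one of the same charities."""
    rows = conn.execute(
        """
        SELECT DISTINCT other.user_id
        FROM donations AS mine
        JOIN donations AS other
          ON other.charity_id = mine.charity_id
         AND other.user_id <> mine.user_id
        WHERE mine.user_id = ?
        ORDER BY other.user_id
        """,
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

The self-join treats each donation row as an edge between a user and a charity, so a one-hop relationship query needs no dedicated graph database.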
JustGiving uses Amazon EMR to run large automated ETL and analytics processes on terabytes of data without having to manage cluster infrastructure setup and maintenance. “Before Amazon EMR, our data scientists were limited to algorithms that can run on a single machine, and they could only work on sample data sets. Anything larger required days of computation,” Freeman says. “Now with Amazon EMR running Hadoop or Spark clusters, they can easily launch clusters with hundreds of Amazon EC2 instances to compute scalable graph processing, natural language processing, and machine learning and streaming analytics algorithms. For example, we use this in recommending crowdfunding projects, understanding user networks, automating charity tagging, and increasing user engagement.”
The analytics team uses Amazon Kinesis, AWS Lambda, and Amazon DynamoDB in tandem to perform tasks that were complex to implement in the past. For example, website clickstream events are written in near real time to Amazon Kinesis. An AWS Lambda function runs code in response to events, processing them and writing them to Amazon DynamoDB. Additionally, Lambda is used for testing, event monitoring, and active notification, while DynamoDB is used as a persistent data store for Kinesis events and as a visualization monitoring tool.
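The Kinesis-to-Lambda-to-DynamoDB flow described above might look something like the sketch below. It is not JustGiving's implementation: the clickstream fields (`page`, `timestamp`), the item layout, and the injected `put_item` callable are assumptions made for illustration. The one detail taken from AWS itself is the shape of the event Lambda receives from Kinesis, in which each record carries a base64-encoded `data` field alongside a `partitionKey` and `sequenceNumber`. Injecting the writer keeps the example runnable without AWS credentials; in a deployed function it would wrap a DynamoDB client call.

```python
import base64
import json

def kinesis_records_to_items(event):
    """Decode a Lambda Kinesis event into dicts shaped for a key-value store."""
    items = []
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        # Kinesis payloads arrive base64-encoded; here they are assumed to be JSON.
        click = json.loads(base64.b64decode(kinesis["data"]))
        items.append({
            "event_id": kinesis["sequenceNumber"],  # hypothetical key choice
            "user_id": kinesis["partitionKey"],     # assumes partitioning by user
            "page": click.get("page"),
            "timestamp": click.get("timestamp"),
        })
    return items

def make_handler(put_item):
    """Build a Lambda-style handler that writes each item via put_item."""
    def handler(event, context=None):
        items = kinesis_records_to_items(event)
        for item in items:
            put_item(item)
        return {"written": len(items)}
    return handler
```

For a local trial, `make_handler(store.append)` with `store = []` collects the decoded items in a list instead of writing them to a table.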
Atkinson notes that the new analytics platform is part of a new way of approaching IT that is benefitting JustGiving. “Before using AWS, JustGiving was basing decisions on a single high-level data source,” he says. “Now, using the new AWS tools, we can extract much more granular data from many sources based on millions of donations and billions of events, and then use that information to provide a better platform for our visitors.”
The scalability and cost-effectiveness of the AWS platform, particularly the pay-as-you-go business model, are also key to helping the organization continue on its growth path. “We’ve always mapped our costs to operational cycles and to the creation of value,” Atkinson says. “Previously, when we had a lot of money tied up in hardware, we’d spend money and that asset would sit there, often at low utilization rates. With AWS, we have automated our pipeline and, with our new analytics platform, we’re only spending money on tools and data that are producing insights, adding value, and supporting decisions in real time for all the users visiting JustGiving.”
To learn more about how AWS can help you process and analyze big data, visit our Big Data details page: http://aws.amazon.com/big-data/.