Hacking for Social Good: How AWS Hypercharged Our Hackathon
This post was co-authored by Ferdinand von den Eichen, Cofounder and CTO, kineo.ai; and Julia Ostheimer, Machine Learning Scientist, kineo.ai and Team Member, DSSG Berlin.
In recent years, digital technologies have transformed businesses and nonprofit organizations alike. Nonprofits are often especially overwhelmed by the amount of data they accumulate and lack the resources to generate value from it. In response to this challenge, the charitable organization Data Science for Social Good (DSSG) Berlin was founded in 2015 with the mission to enlist volunteer data scientists and analysts to help nonprofits make good use of their data. With more than 250 volunteers worldwide, the DSSG community supports organizations in social good areas through projects in education, the environment, public health, and more. To bring these volunteers and nonprofits together, DSSG Berlin hosted Datathon, a data science hackathon in which each participating organization poses a challenge that participants work on throughout the weekend.
Achieving speed and excellence with AWS during a hackathon
Hackathons thrive on the participants’ ability to move at breakneck speed, and Datathon 2021 was no exception. Data from the participating nonprofit organizations was prepared by DSSG’s data ambassadors months before the event, and organized by Datathon 2021 sponsor kineo.ai, Berlin’s Machine Learning (ML) consultancy. Kineo.ai’s mission is to enable German businesses to adopt AI, from the verification of early ideas all the way to production and millions of requests and data points.
The prepared data contained millions of entries on weather, COVID-19, and other factors. Traditional approaches weren’t going to cut it, so the team used Kubeflow Pipelines running on top of Amazon Elastic Kubernetes Service (Amazon EKS) to obtain the data. They then used AWS Glue to catalog and partition the data, combined with Amazon Athena to provide quick, convenient ad hoc querying.
During the event itself, participants focused on generating insights from these various data points, with Amazon SageMaker Data Wrangler at the heart of the solution. Using its simple, streamlined APIs, participants accessed portions of the data and loaded them into the tools of their choice, refining as needed. While some participants were content to carry out that kind of exploration on their local machines, others found they needed more horsepower to accelerate model training. Those participants were able to transfer their existing code quickly to Amazon SageMaker Studio Notebooks, where more CPU and GPU capacity gave them significantly faster training. Of course, plenty of teams just wanted to use ML rather than implement everything from scratch. These teams made great use of some of the AWS high-level ML APIs, such as Amazon Textract, Amazon Comprehend, and Amazon Forecast.
37 participants. 8 data ambassadors. 6 Datathon volunteers. 3 organizations.
More than 30 participants from Germany and around Europe joined the Datathon from remote locations.
Use Case 1: Investigating influences on Germany’s bike traffic
The German Cyclists Association, the Allgemeine Deutsche Fahrrad-Club (ADFC) e. V., provided the first use case. Here, the main objective was to understand the impact of weather conditions and the COVID-19 pandemic on bike traffic. To improve Germany’s bike infrastructure, participants correlated different data sources to extract meaningful insights to boost the ADFC’s political lobbying efforts.
The teams applied different data science techniques, from dynamic time-series visualizations as seen in Figure 2, to time-series decomposition (trend, seasonality, residuals) and correlation analysis. One of the main findings was a negative correlation between rainy weather and bike traffic during rush hours (correlation coefficient of -0.28), suggesting that rain reduces the number of people biking.
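To make the correlation figure concrete, here is a minimal sketch of how such a coefficient can be computed. It assumes the reported value is a Pearson coefficient, which the post does not state. The function name `pearson` and the sample values are illustrative placeholders, not Datathon data.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustrative values (assumption): daily rainfall in mm and
# rush-hour bike counts at one hypothetical counting station.
rain_mm = [0.0, 2.5, 0.0, 7.1, 0.3, 0.0, 4.2]
rush_hour_bikes = [1250, 1100, 1300, 820, 1210, 1280, 990]

r = pearson(rain_mm, rush_hour_bikes)
print(f"correlation between rain and rush-hour bike counts: {r:.2f}")
```

A negative value means that wetter days tend to coincide with lower counts. The coefficient measures association only, so it does not prove on its own that rain causes the drop.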
For Berlin in particular, data showed an increase in bike traffic over the past 4 years, accompanied by more bike-related accidents. One team sourced bike accident data and compared it with the location of bike counting stations (Figure 3). A future step for the ADFC could be to track the number of accidents near certain counting stations to evaluate which locations are more likely to be accident hotspots.
The team investigating the influence of the COVID-19 pandemic on bike traffic in Berlin was able to provide insights on the varying levels of bike traffic during separate lockdown instances in Berlin (Figure 4).
Use Case 2: Assessing the parking situation in Berlin
With data provided by the district office of Friedrichshain-Kreuzberg in Berlin, the goal for this use case was to generate insights into the district’s parking situation through an analysis of OpenStreetMap (a free wiki world map) data. Participants applied geospatial analysis techniques to correlate parking information with demographic data and to visualize areas in terms of their parking spot density.
Along with some general analytics, such as the density of parking spots per km² (Figure 5), participants discovered that even though 9.8% of Germany’s population has severe disabilities, only 0.05% of the parking spots in the Berlin district of Friedrichshain-Kreuzberg are built to accommodate people with disabilities. Figure 6 illustrates the distribution of parking spots for people with disabilities using blue dots.
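Both metrics in this finding, spot density per km² and the share of accessible spots, reduce to simple ratios. The sketch below shows the arithmetic. The function names and all input numbers are hypothetical placeholders for illustration, not the district's actual counts.

```python
def spots_per_km2(spot_count, area_km2):
    """Parking spot density: number of spots divided by the area in km²."""
    if area_km2 <= 0:
        raise ValueError("area_km2 must be positive")
    return spot_count / area_km2


def accessible_share_percent(accessible_spots, total_spots):
    """Share of spots reserved for people with disabilities, in percent."""
    if total_spots <= 0:
        raise ValueError("total_spots must be positive")
    return 100 * accessible_spots / total_spots


# Hypothetical inputs (assumption): one map tile with 5,000 spots on 20 km²,
# and 3 accessible spots out of 1,200 counted spots.
example_density = spots_per_km2(spot_count=5000, area_km2=20.0)
example_share = accessible_share_percent(accessible_spots=3, total_spots=1200)

print(f"{example_density:.0f} spots per km², {example_share:.2f}% accessible")
```

Computing the density per grid cell or neighborhood, rather than once for the whole district, is what makes a density map like the one described possible.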
Another participant examined Point of Interest (POI) locations in Berlin, such as schools, pubs, and tourist attractions, to see how many parking spots are available in the surrounding areas. To illustrate this, a parking consumption ratio was built by dividing the number of counted cars in the vicinity of a POI by the number of counted parking spots (Figure 7).
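The ratio described above can be sketched in a few lines. The function name, the POI names, and the counts are hypothetical. How "vicinity" was defined (for example, a radius around each POI) is not stated in the post, so the sketch simply takes the two counts as given.

```python
def parking_consumption_ratio(counted_cars, counted_spots):
    """Cars counted near a POI divided by parking spots counted near it.

    Returns None when no spots were counted, because the ratio is undefined.
    """
    if counted_spots <= 0:
        return None
    return counted_cars / counted_spots


# Hypothetical POIs (assumption): name -> (counted cars, counted parking spots)
pois = {
    "school": (45, 60),
    "pub": (30, 20),
    "museum": (12, 0),
}

ratios = {
    name: parking_consumption_ratio(cars, spots)
    for name, (cars, spots) in pois.items()
}

for name, ratio in ratios.items():
    print(name, "undefined" if ratio is None else f"{ratio:.2f}")
```

A ratio above 1 means more cars were counted than parking spots near that POI, which would point to parking pressure in the area.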
Use Case 3: Analyzing skill shortages at the German Red Cross
Finally, Germany’s national Red Cross organization, the Deutsches Rotes Kreuz e.V., is experiencing a shortage of skilled workers. The main objective of this use case was to build a clearer picture of the shortage by analyzing data from the German Red Cross job platform. Participants applied natural language processing and data analysis techniques to highlight which jobs would be more likely to attract new applicants and which factors may be driving the skill shortage. Due to privacy regulations, detailed findings from this use case cannot be presented.
Datathon 2021 was a great success, driven by the desire in the IT and data science community to do social good. With the seamless integration of the DSSG Berlin and kineo.ai teams, and the infrastructure support of AWS, participants provided quantifiable insights for the nonprofit organizations, and many will go on to present their findings at events such as The Future of Care event, hosted by the German Red Cross. AWS services were critical to enhancing and speeding up these efforts, which have the potential to impact so many. If you’d like to volunteer, participate in the 2022 Datathon, or help out with nonprofit data projects, reach out to DSSG Berlin.
Ferdinand von den Eichen is Co-Founder and Managing Director at Kineo.ai. He has been building cloud architectures on AWS, Azure and Google for the last 10 years. He is excited to push the boundaries of AI tech and wants to build modern organisations on the basis of personal growth, ownership, trust and empowerment.
Julia Ostheimer is a Machine Learning Scientist at Kineo.ai, where she focuses on developing impactful solutions by combining human domain expertise with machine intelligence. In her free time, she volunteers for DSSG Berlin as a team member and was part of the core team organizing the Datathon 2021.