How Civitas Learning collaborated with AWS Data Lab to build a flexible Data Science platform
The AWS Data Lab Architects listened to our requirements and designed a solution custom-fit to our needs that has allowed us to turn data science ideas developed in notebooks into repeated value for all of the institutions we serve.
This is a guest post by Daya Wimalasuriya, Principal Data Scientist at Civitas Learning.
Civitas Learning is a mission-driven education technology company that relies on the power of machine learning analytics to help higher-education institutions improve student success outcomes. We use AWS analytics and machine learning services to continuously track data changes and associate them with machine learning model behavior, identify intervention opportunities, and present significant data changes and action items back to our customers through a comprehensive dashboard. This case study describes how Civitas quickly adapted this framework at the onset of the COVID-19 pandemic to give our community college and university customers confidence that their Civitas models were still performing well despite the unprecedented changes happening within the education industry.
In this case study, we'll cover:
- An overview of Civitas’s core architecture
- How Civitas designed and built this core architecture leveraging the AWS Data Lab
- How Civitas applied the architecture to address COVID-19 analytics
Overview of Civitas’s core architecture
The underlying architecture used to deliver Civitas’s COVID-19 solution in 2020 was originally developed in collaboration with the AWS Data Lab for a different project. Working with the AWS Data Lab, Civitas’s data science team designed and built a dashboard in 2019 to track the efficacy of student success initiatives implemented by higher-education institutions using machine learning models. A core component of this architecture is parameterized Jupyter notebooks that are run through Papermill in Amazon SageMaker with a custom Docker image. The architecture also uses AWS Step Functions to orchestrate the analyses and Amazon QuickSight to present the results via dashboards that refresh on a set schedule. The benefit of this architecture is its adaptability to a wide range of use cases. By attending the AWS Data Lab, Civitas’s data science team deepened their knowledge of AWS services and left equipped with the skills needed to repurpose the architecture 6 months later to track changes to machine learning models caused by COVID-19.
Preparing for the AWS Data Lab
Before Civitas Learning sent a team of four data scientists and one software engineer to Seattle, WA to build a data science platform with the AWS Data Lab, we completed a significant amount of prep work in the weeks leading up to the lab to set the team up for success. This pre-work included a series of online meetings in which we communicated our business and technical requirements in depth to our dedicated AWS Data Lab Architects and designed an initial architecture to use as a starting point in the lab. Most of Civitas’s code was already in Jupyter notebooks, and we wanted to use Papermill, an open-source tool, to parameterize the execution of those notebooks. Our AWS Data Lab Architects worked with us to keep these elements in the final architecture, pairing them with AWS Step Functions, Amazon Elastic Container Registry (Amazon ECR), and Amazon SageMaker for the optimal solution. This allowed us to use our existing Jupyter notebooks with the new architecture with minimal changes.
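To make the parameterization idea concrete, here is a minimal sketch of how Papermill executes a notebook with injected parameters. The notebook paths, parameter names, and S3 bucket below are hypothetical examples, not Civitas's actual code.

```python
# Illustrative sketch of parameterized notebook execution with Papermill.
# All paths, parameter names, and the bucket name are placeholders.

def build_notebook_params(institution_id, term):
    """Build the parameters Papermill injects into the notebook's `parameters` cell."""
    return {
        "institution_id": institution_id,
        "term": term,
    }

def run_analysis_notebook(institution_id, term, output_dir="s3://example-bucket/output"):
    """Execute one copy of the analysis notebook per institution/term."""
    # Imported lazily so this sketch can be read without papermill installed.
    import papermill as pm

    params = build_notebook_params(institution_id, term)
    output_path = f"{output_dir}/analysis-{institution_id}-{term}.ipynb"
    pm.execute_notebook(
        "analysis_template.ipynb",  # template notebook with a cell tagged `parameters`
        output_path,                # executed copy, with parameters injected, kept for review
        parameters=params,
    )
    return output_path
```

Because Papermill saves each executed notebook as its own output file, every run is also a self-documenting record of the analysis.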
AWS Data Lab Experience
Over the four days of the lab, AWS Data Lab Architects guided us in building a prototype of the architecture in our development environment. On the first day, we kicked off with an architecture review session. Based on the pre-lab discussions, we arrived at the Build Lab with a notional architecture that used AWS Batch and AWS Step Functions to deploy our code. During the kick-off session, the AWS Data Lab team proposed an alternative solution that parameterizes and schedules the execution of our Jupyter notebooks using Amazon SageMaker training jobs. This solution achieved our goals even faster!
Having direct access to the expertise of our AWS Data Lab Architects and other AWS service experts made building our solution that much easier. Guest speakers also joined our lab to take deep dives into the technologies that would be part of the new architecture. We were new to many of the services in our prototype, and access to AWS experts helped us clear blockers as they emerged and ensure we were building according to trusted best practices. Figure 1 shows the architectural diagram of the solution we developed.
Figure 1: The architecture developed during the AWS Data Lab. We created a Docker image in Amazon ECR that can run parameterized Jupyter notebooks. An Amazon CloudWatch Events rule invokes an AWS Step Functions state machine to begin the notebook execution, and Amazon SageMaker spins up ephemeral compute resources with the Docker image to run the job.
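As a sketch of the orchestration layer in Figure 1, the following builds an Amazon States Language (ASL) definition whose single state runs the notebook container as a SageMaker training job and waits for it to finish. All ARNs, image URIs, instance types, and bucket names are placeholder assumptions, not Civitas's actual resources.

```python
# Sketch of a Step Functions state machine that runs a notebook container
# as a SageMaker training job. The `.sync` integration makes Step Functions
# wait until the training job completes before the state ends.
import json

def build_state_machine_definition(image_uri, role_arn, output_bucket):
    """Return an ASL definition (as a dict) for the notebook-runner workflow."""
    return {
        "Comment": "Run a parameterized notebook as a SageMaker training job",
        "StartAt": "RunNotebook",
        "States": {
            "RunNotebook": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": {
                    # `.$` pulls the job name from the execution input
                    "TrainingJobName.$": "$.job_name",
                    "AlgorithmSpecification": {
                        "TrainingImage": image_uri,  # notebook-runner image in Amazon ECR
                        "TrainingInputMode": "File",
                    },
                    "RoleArn": role_arn,
                    "OutputDataConfig": {"S3OutputPath": f"s3://{output_bucket}/results"},
                    "ResourceConfig": {
                        "InstanceCount": 1,
                        "InstanceType": "ml.m5.xlarge",  # illustrative choice
                        "VolumeSizeInGB": 30,
                    },
                    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
                },
                "End": True,
            }
        },
    }

# The definition would be serialized with json.dumps() and passed to
# Step Functions when creating the state machine.
```

A scheduled CloudWatch Events rule can then start an execution of this state machine on the dashboard's refresh cadence.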
In only four days, our team was able to build a prototype of the entire architecture, with only minor follow-up items to complete after the lab. In addition to building the architecture, we walked away with a template that would allow us to automate multiple data science projects in the future, such as ranking the potential impact of large numbers of possible initiatives on student segments for continuous institutional improvement. As a bonus, we also gained a deeper understanding of different AWS services, like Amazon QuickSight, which would help our business with the onset of the COVID-19 pandemic.
Application in COVID-19 Analytics
When the COVID-19 crisis hit in early 2020, we already had substantial experience with the notebook automation architecture described above. By then, we had applied it to a few projects: one focused on evaluating the performance of newly built models, and another aimed at identifying “engagement opportunities” for universities, such as sending nudges to specific groups of students encouraging them to register for the next term or sending referrals to tutoring. While these projects did not use the full breadth of the architecture we built in the AWS Data Lab, they gave us confidence that the architecture was functional and could be applied to a new use case without much difficulty.
This was crucial when the COVID-19 crisis hit the academic world and Civitas Learning received many inquiries from institutions concerned about their data and models. COVID-19 caused significant changes in student behavior, such as moving a large number of students from on-ground to online modality. Our customers wanted to know whether our models were still accurate and whether there were significant changes in their datasets that they were not aware of. This provided a perfect setting for a new application of the notebook automation architecture. Because of our past experience, it was straightforward for our team to set up a dashboard that visualized what was happening with the data and model performance for each institution. We already had significant parts of the code in existing Jupyter notebooks, and because our architecture integrates easily with Jupyter notebooks, we could seamlessly transition from experimentation to production. This significantly shortened our development time, and we produced the first version of the dashboard within one month, allowing us to answer our customers' questions in a timely manner.
In terms of model performance, we were fairly confident that our models could cope with the changes because they use multiple data sources with derived and normalized features that provide a sophisticated picture of student behavior, and the dashboard proved us right. In some cases, the changes in prediction scores were minimal, and we could verify that the underlying data had not changed significantly. Where there were noticeable changes in the prediction scores, we were able to identify the reasons for those shifts from changes in the underlying data (for example, students registering later for the next term compared with the previous year). For some online institutions, we saw a material increase in new student enrollments, which shifts the student mix in a less favorable direction because new students drop out at a higher rate. This leads to lower predictions, especially when coupled with a smaller number of students applying for financial aid. Such insights help institutions identify students in need so they can offer the right intervention to ensure their success.
Figure 2 illustrates how the dashboard explains changes in model scores relative to scores from the past. The plot on the left shows the features with the most significant changes in data availability; the availability of financial aid features has dropped by a significant amount, which would affect the model scores. The plot on the right shows the features with the largest differences in z-scores compared with past data; the loan ratio has a significantly lower value, so it may also have an effect on scores.
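The z-score comparison behind the right-hand plot can be sketched with the standard library alone. The feature names, sample values, and the flagging threshold below are illustrative assumptions, not Civitas's actual data or logic.

```python
# Minimal sketch of flagging feature drift via z-scores (stdlib only).
# Feature names, values, and the threshold are hypothetical examples.
from statistics import mean, stdev

def feature_z_score(current_values, historical_values):
    """Z-score of the current feature mean against the historical distribution."""
    mu = mean(historical_values)
    sigma = stdev(historical_values)
    return (mean(current_values) - mu) / sigma

def flag_drifted_features(current, historical, threshold=2.0):
    """Return features whose current mean shifted more than `threshold` std devs."""
    flagged = {}
    for name, hist_values in historical.items():
        z = feature_z_score(current[name], hist_values)
        if abs(z) > threshold:
            flagged[name] = round(z, 2)
    return flagged
```

A feature like loan ratio that drops well below its historical range would be flagged with a large negative z-score, which is exactly the kind of shift the dashboard surfaces for review.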
Figure 2: A subset of the Amazon QuickSight dashboard to highlight the key model features with material non-stationarities that can impact model performance
Our team was very pleased with how quickly we were able to get the architecture discussed with the AWS Data Lab team into production.
After applying the knowledge we gained during the AWS Data Lab, we quickly created a comprehensive solution to monitor the key features of our data and models. This speed of development allowed us to identify how different institutions and their students may have been affected by COVID-19, as well as how changes in the underlying data affected our models. It also empowered us to get critical insights quickly, which was crucial during a time of unprecedented uncertainty.
The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.
About the Author
Daya Wimalasuriya is the Principal Data Scientist at Civitas Learning. He was the coordinator at Civitas Learning for the AWS Data Lab engagement discussed in this case study. Daya has worked for more than 5 years at Civitas Learning on a wide range of projects, from improving production model performance to integrated machine learning analytics R&D to consultancy engagements. He earned his doctoral degree in 2011 from the University of Oregon for his dissertation on ontology-based information extraction and held a faculty position before joining Civitas Learning. Civitas Learning would also like to acknowledge the contributions of Data Scientists Keisuke Irie, Joseph Kim, and Tianjiao Zhu to the project.
About AWS Data Lab
AWS Data Lab offers accelerated, joint engineering engagements between customers and AWS technical resources to create tangible deliverables that accelerate data and analytics modernization initiatives. During the lab, AWS Data Lab Solutions Architects and AWS service experts support the customer by providing prescriptive architectural guidance, sharing best practices, and removing technical roadblocks. Customers leave the engagement with an architecture or working prototype that is custom fit to their needs, a path to production, deeper knowledge of AWS services, and new relationships with AWS service experts.
Companies of all sizes across all industries are transforming their businesses every day using AWS. Contact our experts and start your own AWS Cloud journey today.