ClassDojo Gains Control of Its Data with Mutt Data's Help

Executive Summary

ClassDojo needed to create a loosely coupled architecture for its data pipelines taking advantage of Amazon Web Services (AWS) resources while considering budget constraints. AWS Partner Mutt Data deployed an expert Data Engineering advisory and hands-on team to design and implement a Modern Data Platform, its tooling, pipelines, components, and integrations that could serve the whole company.

Modernizing ClassDojo's Architecture and Stacks

ClassDojo handles huge amounts of data that must be processed appropriately before its product and analytics teams can consume it. The initial setup was a tightly coupled architecture in which a single tool handled both CI/CD and ETL/ELT pipeline jobs. Problems were mounting, data availability was decreasing, and the pipelines were not cost-effective.

The challenge was to create a loosely coupled architecture for ClassDojo's data pipelines that took advantage of AWS resources while staying within budget constraints.

"We’ve worked with the team at Mutt Data for more than one year, they are an extension of our Data Engineering team. They worked on fixing up our data warehouse, integrating our systems with different data sources, building out our self-service data platform, and creating anomaly monitoring for KPIs."

Sam Chaudhary, Co-Founder and CEO, ClassDojo

Choosing the Right Partner

For ClassDojo, developing and implementing a brand-new Modern Data Platform that conformed to its vision meant completely rebuilding its data foundation. The company was not going to leave anything to chance, which is why picking the right partner for the project became a top priority.

When searching for a partner, ClassDojo was after two main attributes. First, they needed a company that aligned with their vision. Second, they wanted a partner with a proven track record of building modern data platforms.

In Mutt Data, ClassDojo found a team of data experts specializing in building out DataOps and ML pipelines, with vast experience implementing solutions across a dozen industries and for renowned clients. Together with Mutt Data, ClassDojo was able to mold its vision into an actionable plan.

Leveraging AWS Resources to Implement a Modern Data Platform

ClassDojo selected AWS Select Consulting Partner Mutt Data to deploy an expert Data Engineering advisory and hands-on team to design and implement a Modern Data Platform, along with the tooling, pipelines, components, and integrations needed to serve the whole company.

Airflow, running on Amazon Managed Workflows for Apache Airflow (Amazon MWAA), was used to manage data pipelines, replacing ClassDojo's previous workflow scheduler with a scalable solution. Airflow DAG templates were also used to simplify the creation of DBT-based workflows, which are idempotent by default and therefore easier to reprocess and roll back.
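The idempotency property mentioned above can be illustrated with a minimal sketch. This is not ClassDojo's or Mutt Data's actual template: the table name `daily_events`, the helper `load_partition`, and the use of the standard-library `sqlite3` module as a stand-in for a warehouse are all assumptions made for illustration. The idea is that each run replaces exactly one date partition inside a single transaction, so rerunning a task for the same date yields the same result and a failed run leaves the table untouched.

```python
import sqlite3


def load_partition(conn, ds, rows):
    """Idempotently (re)load one daily partition.

    The partition for `ds` is deleted and re-inserted in one transaction,
    so a rerun overwrites rather than duplicates, and a failure rolls back.
    `rows` is an iterable of (user_id, events) pairs (hypothetical schema).
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM daily_events WHERE ds = ?", (ds,))
        conn.executemany(
            "INSERT INTO daily_events (ds, user_id, events) VALUES (?, ?, ?)",
            [(ds, user_id, events) for user_id, events in rows],
        )


# In-memory database standing in for the real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_events (ds TEXT, user_id INTEGER, events INTEGER)"
)

rows = [(1, 10), (2, 7)]
load_partition(conn, "2022-01-01", rows)
load_partition(conn, "2022-01-01", rows)  # reprocessing the same day

row_count = conn.execute("SELECT COUNT(*) FROM daily_events").fetchone()[0]
total_events = conn.execute("SELECT SUM(events) FROM daily_events").fetchone()[0]
```

After both calls the table still holds two rows for that date, which is what makes backfills and reruns safe to schedule without manual cleanup.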

Additionally, Kinesis Firehose delivery to Amazon Simple Storage Service (Amazon S3) and Amazon Redshift was used to set up streamlined workflows; DBT was used to create, document, and test views and tables; and AWS Glue, AWS Lambda, and Amazon EMR were used to apply specific transformations to large datasets.

Finally, testing, monitoring, anomaly detection, and data governance were addressed. CI/CD was set up to automatically test incremental loads and rollback steps, Amazon CloudWatch was chosen for monitoring, and custom anomaly detection software was implemented to monitor raw, intermediate, and final KPI tables. To complement the data governance tools, Amazon Elastic Container Registry (Amazon ECR) and Amazon Elastic Container Service (Amazon ECS) were used to manage and run Docker containers hosting different services.
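The case study does not describe how the custom anomaly detection works. As a rough illustration of what monitoring a KPI series can look like, the sketch below uses a simple trailing-window z-score, which is a stand-in technique and not the method Mutt Data implemented. The function name `find_anomalies`, the window size, the threshold, and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def find_anomalies(values, window=7, threshold=3.0):
    """Return indices of values that deviate strongly from recent history.

    Each value is compared with the mean and sample standard deviation of
    the `window` values immediately before it; it is flagged when the
    absolute z-score exceeds `threshold`.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily KPI values with one sudden drop at index 8.
daily_active_users = [100, 102, 98, 101, 99, 103, 100, 101, 40, 102]
anomalous_days = find_anomalies(daily_active_users)  # -> [8]
```

A production system would typically also account for seasonality and trend; the point here is only that each KPI table can be checked automatically after every load.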

Different AWS services were used for different parts of the solution. For Raw Data Ingestion, the pipelines were handled using Kinesis Firehose, Kinesis Data Analytics, Amazon S3, Amazon Redshift, AWS Glue, Amazon Managed Workflows for Apache Airflow (Amazon MWAA), and Amazon Athena. For the A/B Testing Platform, Amazon MWAA, Amazon Redshift, and Amazon EC2 were used. For the DataOps Platform, Airflow, Great Expectations, DBT, and Python were used together with Amazon MWAA, Amazon Redshift, and Amazon S3. Finally, for PII Detection, Datahub, Airflow, and Python were used alongside Amazon OpenSearch Service, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon RDS, Amazon MWAA, and Amazon S3.

"The implemented solution solves ClassDojo's challenge by combining their ambition to constantly improve, our best practices and in-house trained expert data team, and the latest toolings from AWS Solutions."

Mateo De Monasterio, Co-Founder and Chief Revenue Officer, Mutt Data

How the Modern Data Platform Enabled Growth and Scale

The implemented solution enabled ClassDojo to iterate on product development faster, relying on the new architecture to grow its data-driven operations. ClassDojo strengthened its ability to build and refine the components needed to construct and control cost-effective pipelines that feed high-quality metrics to the teams consuming them.

The result is a Modern Data Platform that improves control over the different data pipelines (facilitating their development with transformation steps), improves monitoring through CloudWatch with reduced response times in case of failures, and enhances analytical performance.

The solution improved Data Warehousing efficiency by a factor of ten, allowing for more connections, operations, and ways to build data models.

About ClassDojo

ClassDojo is a global community of 50M+ teachers and families who come together to share kids' most important learning moments in school and at home—through photos, videos, messages, and more.

Its app is used by teachers, children, and families in 95% of pre-kindergarten through eighth grade schools in the United States, as well as in a further 180 countries. The company has raised more than $191 million in funding to date.

About AWS Partner Mutt Data

Mutt Data is an AWS Select Consulting Partner dedicated to guiding companies into an AI-Fueled future by building Modern Data Systems that combine the latest technologies and best practices in Cloud Infrastructure, Data Architecture, Data Engineering, and Machine Learning. With over twenty-five years of collective experience building big data architectures and eight years implementing machine learning systems, Mutt Data’s team of more than one hundred data science and engineering experts has implemented more than eighty AI and data-based projects.

Published January 2022