Jobcase Scales ML Workflows to Support Billions of Daily Predictions Using Amazon Redshift ML
As an online community for workers and job seekers, Jobcase uses machine learning (ML) models to analyze its database of millions of job listings to match members with job recommendations. With more than 20 million unique visitors per month, the site scores billions of matches every day. To support this workload, the company needed to improve the scalability of its ML-based job-search recommendation engine while remaining cost efficient.
Jobcase was already using Amazon Web Services (AWS) to ingest and store over 100 TB of compressed data. But the company wanted to significantly reduce the need to move large amounts of that data between Amazon Redshift—a data warehouse that makes it simple to query and combine exabytes of structured and semistructured data—and its ML environment. Using Amazon Redshift ML, which analysts can use to create, train, and apply ML models using familiar SQL commands in Amazon Redshift, Jobcase can perform predictions on billions of records in a matter of minutes. Using AWS, Jobcase has improved scalability while decreasing its cost-to-performance ratio. Now, the company can support its growing community efficiently and test new features more quickly.
“Amazon Redshift is one of the most important tools we have in growing Jobcase as a company.”
Ajay Joshi, Distinguished Engineer, Jobcase
Reducing Overhead for ML Workflows
Jobcase is a community-supported work platform where more than 110 million registered members across the United States connect to help each other and find opportunities. While many job-search websites skew toward professional positions, Jobcase’s search tools and social features focus on a broader spectrum of everyday roles, including hourly and service workers, tradespeople, and technicians. Identifying strong matches lets the company suggest quality jobs for members and helps employers hire qualified workers. When someone searches for work opportunities on Jobcase, the company analyzes around 30 million listings in its directory, comparing the qualities of each with the member’s preferences. Its infrastructure must be able to perform these ML tasks at scale, retrieving and making predictions on billions of records per day.
Jobcase has used Amazon Redshift for over 8 years as its primary data warehouse, which acts as the source of truth for all its data analytics work. “Our database ingests billions of events every day,” says Ajay Joshi, distinguished engineer at Jobcase. “All our production systems generate data that flows into Amazon Redshift. The company depends on it.”
The company’s previous ML workflow, which involved moving data from Amazon Redshift to a separate environment to run its ML software before returning the data to the database, was inefficient, error-prone, and costly. To overcome these challenges, Jobcase migrated to Amazon Redshift ML so that it could perform its ML functions inside the data warehouse, with no data movement required. The company began testing Amazon Redshift ML in December 2020 and deployed it to production in July 2021. “The new system on AWS basically fit into our pipeline as is,” says Joshi. “We were able to quickly deploy several models into production that immediately started yielding benefits.”
Improving Scalability and Speed Using Amazon Redshift ML
Using Amazon Redshift ML simplifies the way Jobcase generates predictions from its ML models. “Through Amazon Redshift ML, we can fit a wide range of sophisticated ML model classes to the data directly in our Amazon Redshift data warehouse,” says Clay Martin, senior data scientist at Jobcase. Just 4 weeks after deploying the new models on Amazon Redshift ML, the company had already seen up to 5 percent improvement in its engagement metrics for specific email and push notification channels. “A 5 percent improvement in engagement metrics translates to an improved member experience and member retention and a corresponding increase in revenue,” says Martin. Jobcase can now perform model inference on billions of records in a matter of minutes instead of 4 to 5 hours.
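To make the in-warehouse workflow concrete, here is a minimal sketch of the two kinds of SQL statement involved: a CREATE MODEL statement that trains a model from a query, and a SELECT that calls the resulting prediction function. The Python below only assembles the statements as strings and does not connect to a cluster. Jobcase has not published its schema or models, so every table, column, model, function, role, and bucket name here is a hypothetical placeholder.

```python
"""Sketch: assemble Amazon Redshift ML statements as SQL strings.

Nothing is executed here. All identifiers (tables, columns, model,
function, IAM role, bucket) are hypothetical placeholders.
"""


def build_create_model_sql(model, training_query, target, function, iam_role, s3_bucket):
    """Return a CREATE MODEL statement that trains on the rows of training_query."""
    return "\n".join([
        f"CREATE MODEL {model}",
        f"FROM ({training_query})",
        f"TARGET {target}",          # column the model learns to predict
        f"FUNCTION {function}",      # SQL function created for inference
        f"IAM_ROLE '{iam_role}'",
        f"SETTINGS (S3_BUCKET '{s3_bucket}');",
    ])


def build_inference_sql(function, key_columns, feature_columns, source_table):
    """Return a SELECT that scores every row of source_table inside the warehouse."""
    keys = ", ".join(key_columns)
    features = ", ".join(feature_columns)
    return (
        f"SELECT {keys}, {function}({features}) AS predicted_click\n"
        f"FROM {source_table};"
    )


# Hypothetical feature columns describing a member/job pair.
FEATURES = ["distance_miles", "title_similarity", "days_since_posted"]

create_sql = build_create_model_sql(
    model="job_match_model",
    training_query=f"SELECT {', '.join(FEATURES)}, clicked FROM match_history",
    target="clicked",
    function="predict_click",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftMLRole",
    s3_bucket="example-bucket",
)

inference_sql = build_inference_sql(
    function="predict_click",
    key_columns=["member_id", "job_id"],
    feature_columns=FEATURES,
    source_table="candidate_matches",
)

print(create_sql)
print()
print(inference_sql)
```

Because the prediction function is called from an ordinary SELECT, scoring runs where the data already lives, which is the property the case study credits for removing the export and re-import steps.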
The recommendation system generates specific job listing recommendations—as well as search suggestions and company recommendations—for each of its millions of active members. As the community grows, the costs of maintaining complex data pipelines increase. “We do an average of five to six billion offline predictions every day,” says Joshi. Using the in-database local inference features of Amazon Redshift ML removes the need to transfer data between separate environments. As a result, Jobcase saves money and reduces complexity while increasing the scale of its ML workloads.
Additionally, Jobcase can work through large tests more quickly than before. “Previously, we would have to perform tests on small user cohorts over 1 to 2 months,” says Martin. “Using Amazon Redshift ML, we can run tests on entire datasets in less than a week.” The shorter test cycle lets the company build and iterate on its models faster. In addition, the service’s ability to train and deploy models automatically contributes to increased productivity across Jobcase’s teams. “We’re a small company relative to the amount of data we process,” says Joshi. “Running predictions quickly and with little work required to deploy the models on Amazon Redshift ML frees us up to focus on adding value to other aspects of our product.”
Equally important, by using Amazon Redshift ML, Jobcase can scale its ML workloads without increasing costs. “To achieve high performance at this scale on a different system, we would have to spend a significant amount of time and money optimizing it,” says Joshi. Instead, the company faced no increase in cost when it began using Amazon Redshift ML because the feature works within its existing Amazon Redshift cluster. The elasticity of working in the cloud makes it simple for Jobcase to work at scale, even as the company’s user base grows. “We have always been on the cutting edge when building on AWS,” says Joshi. “We’ve had a great relationship with the teams at AWS, and that has been phenomenal.”
Performing Data Analytics at Scale Using AWS
Jobcase plans to scale its use of Amazon Redshift ML to other teams within the organization. “We’re already seeing people on other teams deploy Amazon Redshift models,” says Martin. “Making this accessible throughout the organization is another valuable aspect of its scalability.” By gaining the ability to scale its data warehouse and ML workflows without raising costs or using excessive resources, Jobcase can deliver value to its growing community. “Amazon Redshift is one of the most important tools we have in growing Jobcase as a company,” says Joshi.
About Jobcase
Jobcase is an online community dedicated to empowering and advocating for the world’s workers. Its technology offers access to jobs, tools, resources, and community-powered knowledge, helping more than 110 million members prepare for any role.
Benefits of AWS
- Achieved up to 5% improvement in engagement metrics for specific email and push notification channels, with no cost increase
- Reduced testing time from 1 to 2 months to under a week
- Improved scalability to support more than 110 million members
- Cut the time to make billions of predictions from 4 to 5 hours to around 15 minutes
- Eliminated the need to move data to a separate ML environment
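For a sense of scale, the inference-time figures above imply a speedup that can be worked out directly. The short sketch below does that arithmetic; it takes the reported "around 15 minutes" at face value, so the result is approximate, and the function name is only illustrative.

```python
def speedup(before_minutes, after_minutes):
    """Return how many times faster the new runtime is than the old one."""
    return before_minutes / after_minutes


AFTER_MINUTES = 15  # reported as "around 15 minutes", so treat as approximate

low = speedup(4 * 60, AFTER_MINUTES)   # previous runtime of 4 hours
high = speedup(5 * 60, AFTER_MINUTES)  # previous runtime of 5 hours

print(f"Inference speedup: roughly {low:.0f}x to {high:.0f}x")
# prints "Inference speedup: roughly 16x to 20x"
```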
AWS Services Used
Amazon Redshift
Amazon Redshift uses SQL to analyze structured and semistructured data across data warehouses, operational databases, and data lakes, using AWS-designed hardware and machine learning to deliver the best price performance at any scale.
Amazon Redshift ML
Amazon Redshift ML makes it easy for data analysts and database developers to create, train, and apply machine learning models using familiar SQL commands in Amazon Redshift data warehouses.