Careem Improves Fraud Prevention with AWS Machine Learning

2021

Dubai-based Careem became the Middle East’s first unicorn when it was acquired by Uber for $3.1 billion in 2019. A pioneer of the region’s ride-hailing economy, Careem is now expanding its services to include mass transportation, delivery, and payments as an everyday super app.

But its size and popularity—it has around 50 million customer accounts—have also made it a prime target for fraudsters constantly looking for new loopholes to exploit and different ways to hijack genuine accounts.

Careem needed a way to detect and stop losses from fraud that were damaging both its revenue and its brand reputation.

It turned to Amazon Web Services (AWS) and is now fighting back, using analytics and machine learning to automatically identify fraudsters and stop them in their tracks before any crime can be committed.

“Amazon Neptune is fully managed, which is a big advantage to us in terms of how many people we’d have to have working on this project, and the potential cost of infrastructure and maintenance.”

Kevin O’Brien
Senior data scientist, Careem

When Fraudsters Attack

Careem sees many types of fraud, and criminals are always finding new loopholes to bypass the specific measures it puts in place against known fraud patterns.

In the past, tackling these different kinds of fraud was a never-ending game of cat-and-mouse. Careem used to have to create rules or machine learning models for each specific type of fraud. But this was problematic on two levels.

First, it only allowed Careem to identify and block an account after the fraud had been committed and detected—the money had already been lost.

Second, fraudsters quickly spotted when Careem had worked out how to detect a given type of fraud, and they would simply move on to a new loophole.

A Smarter Way

It was clear that Careem needed a smarter and faster way to detect fraudulent accounts and stop fraud before it was committed.

“Instead of continuously creating very specific tools to detect very specific fraud use cases, we wanted to build a project that was almost a blanket detection mechanism over all the users, regardless of what type of loophole they found or whatever type of attack they try to make,” says Kevin O’Brien, senior data scientist at Careem.

Careem opted for a graph database as a way of detecting potentially fraudulent patterns in real time across user and account activity, and evaluated several of the major providers in the market. 

It chose AWS and the automated real-time analysis and monitoring capabilities of Amazon Neptune, in part because it is a managed service. 

“Amazon Neptune is fully managed, which is a big advantage to us in terms of how many people we’d have to have working on this project, and the potential cost of infrastructure and maintenance,” says O’Brien. “Instead, that’s all completely managed by AWS.” 

Careem was already using AWS for all its cloud computing and data warehouse operations, so opted to stay in the same environment for its fraud prevention project. 

Careem also preferred the Gremlin query language, which Amazon Neptune supports, over query languages such as Cypher that are used by other graph database providers. Gremlin allows developers to write queries in a range of programming languages, including Groovy, Java, and Python.

Detecting Patterns by Focusing on Identity

To improve its fraud detection capability using Amazon Neptune, Careem started focusing on the identity of users in addition to its efforts tackling specific types of fraud as they arose.

The Amazon Neptune graph database allows Careem to make connections between different users and datapoints and identify patterns that might indicate fraudulent activity.
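As a rough illustration of the idea, and not Careem's actual implementation, the sketch below links accounts that share an identifier such as a device or payment card and groups them into connected clusters. The account IDs, the identifier names, and the `build_clusters` helper are all hypothetical, and in production the graph would live in Amazon Neptune rather than in memory.

```python
from collections import defaultdict


def build_clusters(links):
    """Group users that are connected through shared identifiers.

    `links` is an iterable of (user_id, identifier) pairs, for example
    ("user:1", "device:A"). All names here are hypothetical.
    """
    # Build an undirected graph: users <-> identifiers.
    neighbours = defaultdict(set)
    for user, identifier in links:
        neighbours[user].add(identifier)
        neighbours[identifier].add(user)

    seen = set()
    clusters = []
    for node in neighbours:
        if node in seen or not node.startswith("user:"):
            continue
        # Depth-first walk over everything reachable from this user.
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(neighbours[current] - seen)
        # Keep only the user nodes of the connected component.
        clusters.append({n for n in component if n.startswith("user:")})
    return clusters


links = [
    ("user:1", "device:A"),
    ("user:2", "device:A"),  # shares a device with user:1
    ("user:2", "card:X"),
    ("user:3", "card:X"),    # shares a card with user:2
    ("user:4", "device:B"),  # not linked to anyone else
]
clusters = build_clusters(links)
```

Here `user:1`, `user:2`, and `user:3` end up in one cluster even though `user:1` and `user:3` share nothing directly, which is the kind of indirect connection a graph makes easy to see.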

The first version of the fraud prevention project went live in October 2020 using historical user data going back to 2012 from Careem’s in-house sources, such as its data warehouse. This data is extracted, transformed, and formatted into CSV files on Amazon Simple Storage Service (Amazon S3) before being loaded into Amazon Neptune. The historical data is then supplemented in real time as users perform new actions, such as logging in from a new device, adding a new credit card, changing a phone number, or making a profile change. On average, data is added to or updated in the Amazon Neptune graph more than 100,000 times per day.
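To make the CSV step more concrete, here is a minimal sketch of writing vertex and edge files in the layout that the Amazon Neptune bulk loader expects for Gremlin data (`~id` and `~label` columns for vertices; `~id`, `~from`, `~to`, and `~label` for edges). The `to_neptune_csv` helper, the `signup_year` property, the `uses` edge label, and the sample records are assumptions made for illustration; Careem's real schema is not described in this case study, and the upload to Amazon S3 and the loader call are left out.

```python
import csv
import io


def to_neptune_csv(users, links):
    """Return (vertex_csv, edge_csv) strings in the Gremlin bulk-load layout.

    `users` is a list of (user_id, signup_year) tuples and `links` a list of
    (user_id, identifier) tuples. Both shapes are hypothetical.
    """
    vertex_buf = io.StringIO()
    vertices = csv.writer(vertex_buf, lineterminator="\n")
    vertices.writerow(["~id", "~label", "signup_year:Int"])
    for user_id, signup_year in users:
        vertices.writerow([user_id, "user", signup_year])
    # Identifier vertices (devices, cards) carry no extra properties here,
    # so their property cell is left empty.
    for identifier in sorted({ident for _, ident in links}):
        vertices.writerow([identifier, identifier.split(":")[0], ""])

    edge_buf = io.StringIO()
    edges = csv.writer(edge_buf, lineterminator="\n")
    edges.writerow(["~id", "~from", "~to", "~label"])
    for n, (user_id, identifier) in enumerate(links):
        edges.writerow([f"e{n}", user_id, identifier, "uses"])
    return vertex_buf.getvalue(), edge_buf.getvalue()


users = [("user:1", 2012), ("user:2", 2019)]
links = [("user:1", "device:A"), ("user:2", "device:A")]
vertex_csv, edge_csv = to_neptune_csv(users, links)
```

Keeping vertices and edges in separate files mirrors how the bulk loader distinguishes them by their header row.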

This creates a cluster of data connected to each user, which is analyzed by a simple algorithmic analytics engine that Careem built in Python and that sits on top of Amazon Neptune.

When an account is flagged as potentially fraudulent, it is either blocked automatically, if the data shows it has historically been an untrustworthy account, or sent for manual review, if it is a trustworthy or high-value account such as that of a corporate customer.
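The routing rule described above can be sketched in a few lines. This is a simplified reading of the text, not Careem's code: the `Account` fields and the `route` function are hypothetical names, and how Careem actually decides that an account is trustworthy or high-value is not stated in the source.

```python
from dataclasses import dataclass


@dataclass
class Account:
    account_id: str
    flagged: bool     # set by the analytics engine
    trusted: bool     # hypothetical: good standing in historical data
    high_value: bool  # hypothetical: for example, a corporate customer


def route(account):
    """Return 'allow', 'manual_review', or 'block' for one account."""
    if not account.flagged:
        return "allow"
    # Trusted or high-value accounts get a human check before any block.
    if account.trusted or account.high_value:
        return "manual_review"
    return "block"


decisions = [
    route(Account("a1", flagged=False, trusted=True, high_value=False)),
    route(Account("a2", flagged=True, trusted=False, high_value=True)),
    route(Account("a3", flagged=True, trusted=False, high_value=False)),
]
```

Sending trusted and high-value accounts to manual review trades some speed for a lower risk of wrongly blocking a genuine customer.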

Reducing Losses with Improved Accuracy

Careem has blocked tens of thousands of fraudulent user accounts since the first phase of the project went live in October 2020, and the results are impressive: around 90 percent of the accounts that the system blocked automatically were correctly identified as fraudulent. This means Careem is blocking these fake accounts before any fraud is committed, which helps reduce losses.

After the success of this first phase of the project, Careem is now working with AWS on an updated version that will improve the accuracy further by using the machine learning capability in Amazon Neptune ML.

Using around 10 times more historical data, Careem will be able to apply advanced deep learning instead of a simple rules-based approach, training the system to learn what a fraudulent user looks like in the graph database. This should vastly improve recall, meaning the system correctly detects a larger share of the fraudulent accounts among all the users it analyzes, while raising fraud prediction accuracy well beyond 90 percent.
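For readers less familiar with these metrics, the short sketch below shows how precision and recall are commonly computed. It assumes the 90 percent figure is a precision measure (the share of blocked accounts that really were fraudulent), which is how the text reads but is not stated outright. The counts are invented for illustration and are not Careem's numbers.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from confusion-matrix counts."""
    # Precision: of the accounts blocked, how many were really fraudulent.
    precision = true_positives / (true_positives + false_positives)
    # Recall: of the fraudulent accounts, how many were actually caught.
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Invented counts: 900 correct blocks, 100 wrong blocks, 600 missed fraudsters.
precision, recall = precision_recall(900, 100, 600)
```

With these made-up counts the system would be right about 90 percent of its blocks while still missing a sizeable share of fraudulent accounts, which is why recall is tracked separately.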

“We are very confident this second version of our solution will improve on our current fraud prevention capabilities,” says O’Brien. “And this is another great reason why we chose Amazon Neptune.”


About Careem

Dubai-based Careem is a pioneer of the ride-hailing economy and is now expanding its services to include mass transportation, delivery, and payments. Founded in 2012, Careem has operations in over 100 cities across 14 countries in the Middle East, Africa, and South Asia. It was acquired by Uber for $3.1 billion in 2019.

Benefits of AWS

  • Security & Compliance
  • Agility & Performance
  • Availability
  • Innovation

AWS Services Used

Amazon Neptune

Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets.

Amazon Redshift

With Redshift, you can query and combine exabytes of structured and semi-structured data across your data warehouse, operational database, and data lake using standard SQL.

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

Amazon SageMaker

Amazon SageMaker helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.

