Using AI to Fight Fraud
Cryptocurrencies like Bitcoin have seen a blizzard of headlines over the past few years. These digital tokens share some of the qualities of hard currency and can be bought, traded and spent. In fact, an entire market has grown around the trading of digital currencies, with investors and speculators keeping close tabs on every fluctuation.
At the center is San Francisco-based Coinbase, a digital wallet and exchange platform where over 20 million merchants and consumers have traded more than $150 billion in cryptocurrencies since its founding in 2012.
Like all financial services companies, Coinbase needs to provide a seamless experience for consumers while taking steps to secure the environment in which they operate. For this, the company relies on artificial intelligence (AI) using machine learning tools from Amazon Web Services (AWS).
“AI has been in the DNA of the company from the very beginning,” says Soups Ranjan, director of data science at Coinbase. “One of the biggest risk factors that a cryptocurrency exchange must get right is fraud, and machine learning forms the linchpin of our anti-fraud system.”
Using Amazon SageMaker, a tool to easily build, train and deploy machine learning models, engineers at Coinbase developed a machine learning-driven system that recognizes mismatches and anomalies in sources of user identification, allowing them to quickly take action against potential sources of fraud.
“ID authentication online is actually a very hard problem,” notes Ranjan. “When you go into a bar and the bouncer looks at your driver license, he can shine light of a certain frequency and look for hidden messages like holograms.”
That’s not possible online, so Coinbase uses SageMaker to develop machine learning algorithms for image analysis to defeat scammers. For example, a face-similarity algorithm automatically extracts faces from uploaded IDs and compares each face with the faces on every other uploaded ID. Scammers often reuse the same photo across multiple IDs, because otherwise they would have to edit the face in several places on each ID. With this face-similarity algorithm, the company can quickly detect the forgery.
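The comparison step described above can be illustrated with a minimal sketch. It is not Coinbase's implementation: it assumes each uploaded ID has already been reduced to a numeric face embedding by some face-extraction model, and the names `cosine_similarity` and `find_reused_faces`, the sample vectors, and the 0.95 threshold are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_reused_faces(embeddings, threshold=0.95):
    """Return pairs of ID documents whose face embeddings are near-identical.

    `embeddings` maps a (hypothetical) document identifier to the face
    embedding extracted from that document. The threshold is an assumed
    value, not one taken from the article.
    """
    flagged = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        if cosine_similarity(vec_a, vec_b) >= threshold:
            flagged.append((id_a, id_b))
    return flagged


# Hypothetical embeddings: id_1 and id_2 carry almost the same face.
uploaded = {
    "id_1": [1.0, 0.0, 0.0],
    "id_2": [0.99, 0.05, 0.0],
    "id_3": [0.0, 1.0, 0.0],
}
suspicious_pairs = find_reused_faces(uploaded)
```

Comparing every pair is quadratic in the number of uploaded IDs, so a system at real scale would more plausibly use an approximate nearest-neighbor index; the pairwise loop is used here only to keep the idea visible.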
“The reality is, it’s easy for customers to move to different services for cryptocurrency,” Ranjan says. “Machine learning helps us balance risks for Coinbase, with flexibility for customers where we want them to have the best experience possible.”
The insights gained from building anti-fraud algorithms also allow Coinbase to tailor experiences by user type, distinguishing retail-level investors who buy and hold from sophisticated pro users who trade frequently. In a recent customer segmentation exercise, a Coinbase analyst wrote a clustering algorithm on a laptop and then ran it through SageMaker to analyze how customers use cryptocurrencies, separating those interested exclusively in trading from those investing for the long haul.
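A segmentation exercise of this kind could look roughly like the sketch below. The article does not say which clustering algorithm or features the analyst used, so this assumes plain k-means over two hypothetical features per customer (trades per month and average holding period in days), with made-up sample data and a deterministic initialization chosen only to keep the example reproducible.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: returns (assignments, centroids).

    Centroids are initialized from the first k points, an assumption made
    so the sketch is deterministic; random restarts are more common.
    """
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (Euclidean distance).
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Hypothetical customers: (trades per month, average holding period in days).
customers = [(2, 300), (1, 400), (150, 2), (200, 1), (3, 350), (180, 3)]
labels, centers = kmeans(customers, k=2)
```

On this sample the low-frequency, long-holding customers end up in one cluster and the high-frequency traders in the other. With real data the features would normally be scaled first so that no single feature dominates the distance.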
But risk management is only one side of the story. Given its digital roots, it’s no surprise that cryptocurrency, like more traditional financial markets, goes hand in hand with a tremendous amount of data. “Our data warehouse gathers data from various microservices, including blockchain and user data—that’s hundreds of terabytes altogether,” says Ranjan. “That number has doubled since the beginning of the year.”
However, since Coinbase operates in a highly regulated environment, the company takes extra measures to ensure customer data is protected, even from its own data scientists and engineers. Any code that runs on Coinbase production servers is reviewed by multiple people before it goes into production. “One of our core tenets is that we are a security-first company because we are storing cryptocurrencies on behalf of our customers,” says Ranjan.
Restricted access to data in a highly secure environment makes doing machine learning that much harder. Coinbase overcomes this challenge by allowing machine learning engineers access to data logs only via code that’s been thoroughly reviewed and committed into Amazon Elastic Container Registry—machine learning engineers can’t actually log into the production servers and run code that hasn’t been reviewed.
At the end of the day, cryptocurrencies rely on trust for their existence. Companies like Coinbase rely on AWS to build and maintain that trust by constantly working to stay ahead of risks.