Experian gathers, analyzes, and processes credit data at massive scale to help businesses make smarter decisions, individuals gain access to financial services, and lenders minimize risk. Headquartered in Dublin, Ireland, the company has 16,000 employees in 37 countries, with clients in 80 countries.
Experian’s Consumer Information Services, also known as the consumer credit bureau, drives the bulk of the company’s revenue with a database of credit information on more than 220 million consumers in the United States. In the world of financial services, lenders use this data to create new products, personalize offerings, and make sound decisions about credit risk, all of which depend on accurate, up-to-date information. Experian prioritizes data quality, processing more than 1.8 billion updates per month. Using Amazon Web Services (AWS), the company is improving the speed and scale of its data operations.
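To put the update volume in perspective, 1.8 billion updates per month works out to roughly 60 million per day, or about 700 per second on average. The short sketch below shows the arithmetic. It assumes a 30-day month and an even spread of updates over time; neither assumption comes from Experian, and the function names are illustrative only.

```python
# Back-of-the-envelope estimate of average update throughput.
# Assumptions (not from Experian): a 30-day month and updates spread
# evenly over time. Real batch workloads are likely to be burstier.

UPDATES_PER_MONTH = 1_800_000_000  # "more than 1.8 billion updates per month"
DAYS_PER_MONTH = 30                # assumed month length
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds


def average_updates_per_day(updates_per_month: int,
                            days_per_month: int = DAYS_PER_MONTH) -> float:
    """Average number of updates per day over the assumed month."""
    return updates_per_month / days_per_month


def average_updates_per_second(updates_per_month: int,
                               days_per_month: int = DAYS_PER_MONTH) -> float:
    """Average number of updates per second over the assumed month."""
    return updates_per_month / (days_per_month * SECONDS_PER_DAY)


if __name__ == "__main__":
    per_day = average_updates_per_day(UPDATES_PER_MONTH)
    per_second = average_updates_per_second(UPDATES_PER_MONTH)
    print(f"{per_day:,.0f} updates per day")
    print(f"{per_second:,.0f} updates per second")
```

Because the published figure is a lower bound ("more than"), these averages are lower bounds as well.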
Experian first started its journey to the cloud as an efficiency project, to reduce the cost of ownership of its batch-processing environment. “Along the way we realized there was a massive market opportunity,” says Vijay Mehta, senior vice president at Experian. “We, like our customers, have hundreds of data scientists working in siloed environments using extracts of data to provide bespoke solutions. By accelerating the process and increasing the amount of data that could be used, we have empowered our internal data scientists and our customers to discover unique insights that drive true competitive advantage.”
Leveraging AWS and related tools, Experian built a sandbox environment that enables modeling against petabytes of full-file credit data. The environment is accessible to both Experian data scientists and customers’ own analytics teams. The cloud-based environment makes it feasible to run models against large amounts of data, improving the accuracy of results. The large number of variables in the full-file data supports enhanced discovery with the ability to incorporate characteristics including scores, inquiries, demographics, geographical information, and more.
“By providing clients with the ability to build models against hundreds of terabytes of full-file credit data, we help them deploy new solutions to market faster, with better outcomes and more relevance to their customers,” says Mehta. For example, Experian clients are using the environment to discover new market segments and design products that appeal to them. They are improving risk scoring by developing more accurate ways to measure who is likely to repay a loan on time. They are also enhancing their sales and marketing efforts by understanding consumer needs and behavior at a detailed level.
Experian settled on the Apache Hadoop distributed-processing environment to handle what would eventually be a petabyte-scale system. “We didn’t want to roll our own solution, and we eventually decided to partner with Cloudera because of the toolset they offer,” says Mehta. Cloudera is an AWS Advanced Technology Partner that provides Cloudera Enterprise, a modern data-management and analytics platform that lets customers rapidly process and explore all their cloud data, regardless of where it lives.
Experian leverages Apache Spark through the PySpark API for high-performance parallel processing, as well as Apache Kafka for message queuing and ingestion where applicable. Compute functions are programmed in Java. Microservices and API functionality are provided by Apigee, while replication of data from Experian data centers into AWS is handled by Attunity; both organizations are also AWS Advanced Technology Partners. The entire solution is built using Amazon Simple Storage Service (Amazon S3) and a 2,360-core cluster of Amazon Elastic Compute Cloud (Amazon EC2) instances.
Given the sensitivity of Experian’s data, the company cut no corners on security. It encrypts data at the level of the Hadoop Distributed File System (HDFS), with the same algorithms that the government uses for national defense data. “We want to lead the charge to the cloud in our industry, and to do that requires providing the highest assurances to our customers that data is secure and private,” says Mehta.
Ultimately, Experian has found AWS to be an ideal fit for its business goals. “Using AWS is extremely safe, because we have full encryption in transit and at rest,” says Mehta. “It’s also very flexible, because we can access new tools as soon as they become available and apply machine learning across vast amounts of data.”
The Experian Analytical Sandbox took only 10 months to build and has become one of the company’s fastest-growing products. “One reason for the Sandbox’s rapid adoption is the freedom clients have to explore the data in a self-service manner using tools such as R, Python, H2O, and so on,” says Mehta. “Not only can they work with off-the-shelf models and strategies we provide, they also can gain proprietary insights with their own tools.”
Experian has big plans for the future of credit data in the cloud. “This is the first step in a larger transformation,” says Mehta. “We are using technology as a differentiator. AWS gives us an environment that is elastic and flexible, and lowers time to market, which is very powerful for us.”