AWS Startups Blog
How FinTech LendUp Strives for (And Achieves) Precision
Guest post by Mengxi Lu, Senior Principal Software Engineer, and John Cenzano-Fong, Senior Data Engineer, LendUp
LendUp builds technology, credit products and educational experiences for the 56% of the US population who are shut out of, or mistreated by, mainstream banking because of poor credit or income volatility. We want to provide anyone with a path to better financial health.
As a fintech, we have to act a lot more like an enterprise than a typical startup of our size. Our product line currently includes credit cards, a variety of loans, and more. We’re heavily regulated, and to keep expanding our market share rapidly, we work with many external partners, each with its own contractual agreement governing the financial information we provide to them. Our product-specific predictive modeling is also an important ingredient in our secret sauce. The glue that allows us to respond to all of these challenges is precise data and metadata. And we’re not alone: lots of companies face similar challenges.
Why precision matters
A lot of an organization’s energy can go into investigating and debating different teams’ interpretations of data. These inconsistencies can also hobble decision-making and your ability to communicate confidently with the outside world. For example, when we speak with investors, we want to be able to identify the net present value of customers and the components that go into that value based on the products they use.
What makes precision difficult
Precision sounds great and obvious, but it’s hard. Most data we want to use for analytics originates from a production system optimized for online processing, not reporting. As a data team we don’t have control of upstream systems, and in a dynamic company like ours, an understanding of data structures and content can quickly become outdated. Key details can be overlooked or system idiosyncrasies glossed over. Transforming this data with an incomplete understanding or simply making mistakes along the way are obstacles to precision.
A Practical Path to Precision
To tackle this, we focus on six areas, each approaching the problem from a different angle:
● Awareness: We work collaboratively with other teams to maintain an up-to-date understanding of our data landscape.
● Tools: We’ve adopted and built tools to assist in data movement, discovery, quality and security. We measure both the data and how these tools are used, so everything is auditable.
● Processes: Guardrails for working with data help. We want documented and well-publicized processes so that the tools and best practices we’ve developed through careful thought and experience are used consistently.
● Validation: Continuous monitoring and verification of the assumptions behind the three items above provide an important feedback loop. Are we still in touch with the data landscape? Are tools working the way they should? Are processes still appropriate?
● Data Management and Governance: Ensuring that there is strong ownership and change management associated with data helps increase quality, reduce risk and encourage usage.
● Suitability: We develop datasets tailored to the use case, technical skills, and security considerations of different audiences in the company, such as finance vs. compliance.
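The validation loop above can be sketched as a set of named assumptions checked against table statistics on a schedule. This is a minimal illustration, not LendUp’s tooling: the Check class, the run_checks helper, the sample statistics, and the thresholds (a 9,000 to 12,000 row range and a 24-hour freshness window) are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List


@dataclass
class Check:
    """A named assumption about a dataset, expressed as a predicate over its stats."""
    name: str
    predicate: Callable[[dict], bool]


def run_checks(stats: dict, checks: List[Check]) -> List[str]:
    """Return the names of checks whose assumption no longer holds."""
    return [c.name for c in checks if not c.predicate(stats)]


# Hypothetical statistics for one table, e.g. gathered by a scheduled job.
now = datetime(2020, 1, 2, 12, 0, tzinfo=timezone.utc)
stats = {
    "row_count": 10_250,
    "last_loaded_at": datetime(2020, 1, 1, 6, 0, tzinfo=timezone.utc),
}

checks = [
    # Hypothetical expected volume for this table.
    Check("row_count_in_expected_range", lambda s: 9_000 <= s["row_count"] <= 12_000),
    # Hypothetical freshness requirement: loaded within the last 24 hours.
    Check("loaded_within_24h", lambda s: now - s["last_loaded_at"] <= timedelta(hours=24)),
]

failures = run_checks(stats, checks)
print(failures)  # the last load was 30 hours ago, so the freshness check fails
```

Running such checks continuously, and alerting on any non-empty result, is what turns the questions above into a feedback loop rather than a one-off review.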
Understand the Data Landscape
At LendUp, we lean heavily on Amazon Redshift, our primary data source for analytics. To keep things organized, we split our data into bands of schemas based on how refined the data is, and we follow a few naming conventions to make governance easier:
● Schemas that contain raw data coming from outside Redshift are prefixed with “src” (for source).
● Derived datasets we consider building blocks for analysis get stored in schemas prefixed with “ods” (operational data store, a pretty standard data warehousing term).
● Carefully-curated datasets that are ready to dump straight into a report with no manipulation are stored in “mart” (another warehousing term) schemas.
● A schema called “trans” (for transient) holds the intermediate work products of our transformations and helps us troubleshoot transformations that had issues.
● We provide views in the public schema to create a simplified interface for users and a layer of abstraction to make operations easier.
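As a rough illustration of how these conventions can be enforced, the sketch below assigns a schema name to its band and flags names that fit none of them. It assumes prefixed schemas use an underscore separator (for example src_loans), which the conventions above don’t specify; the helper names and sample schema names are hypothetical.

```python
from typing import List, Optional

# Schemas whose names start with one of these prefixes (plus "_") belong to that band.
PREFIX_BANDS = ("src", "ods", "mart")
# Bands that consist of a single, exactly named schema.
EXACT_BANDS = ("trans", "public")


def schema_band(schema: str) -> Optional[str]:
    """Return the band a schema belongs to, or None if it breaks the naming convention."""
    if schema in EXACT_BANDS:
        return schema
    for prefix in PREFIX_BANDS:
        if schema.startswith(prefix + "_"):
            return prefix
    return None


def nonconforming(schemas: List[str]) -> List[str]:
    """List schemas that fit none of the bands, e.g. ad hoc scratch schemas."""
    return [s for s in schemas if schema_band(s) is None]


schemas = ["src_loans", "ods_customers", "mart_finance", "trans", "public", "scratch_john"]
print(nonconforming(schemas))  # ['scratch_john']
```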
We also align user access control (for auditability) and Redshift WLM (workload management) with these schema bands. This lets us integrate our data discovery tool with Redshift, expose the relevant metadata to each group of stakeholders, and dynamically manage computational resources based on different access patterns.
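One way to keep access control aligned with the schema bands is to generate grants from a per-band policy. The sketch below assumes access is managed through Redshift user groups; the group names and the policy itself are hypothetical, while GRANT USAGE ON SCHEMA and GRANT SELECT ON ALL TABLES IN SCHEMA are standard Redshift statements.

```python
from typing import Dict, List

# Hypothetical policy: which user groups may read each schema band.
# Bands missing from the policy (such as "trans") get no read grants.
READ_ACCESS: Dict[str, List[str]] = {
    "src": ["data_engineering"],
    "ods": ["data_engineering", "analysts"],
    "mart": ["data_engineering", "analysts", "finance", "compliance"],
    "public": ["data_engineering", "analysts", "finance", "compliance"],
}


def grant_statements(schema: str, band: str) -> List[str]:
    """Build Redshift GRANT statements giving each permitted group read access to a schema."""
    statements = []
    for group in READ_ACCESS.get(band, []):
        statements.append(f"GRANT USAGE ON SCHEMA {schema} TO GROUP {group};")
        statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO GROUP {group};")
    return statements


for sql in grant_statements("mart_finance", "mart"):
    print(sql)
```

GRANT SELECT ON ALL TABLES only covers tables that already exist; tables created later need ALTER DEFAULT PRIVILEGES or a re-run of the grants. The same user groups can also be assigned to WLM queues, so each audience’s queries are routed to a queue sized for its access pattern.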
Precision Takes Effort
We’ve learned a lot along the way, including that we’ll never be “finished.” Whenever you’re dealing with precise data, issues will come up. But we feel confident that our approach is serving us well: as we’ve steadily refined our processes, we’ve run into fewer data challenges. Moving forward, we’d like to automate more; for example, we want to generate some baseline CheckUp tests automatically so that we increase coverage and identify data issues quickly. We’re also taking steps to further centralize Amazon S3 as our primary data lake, which will give us more operational flexibility in the future, while still using Redshift as the primary focal point for data consumption. All of this is important and interesting, but it’s also a lot of work, so it’s worth mentioning that we’re hiring!
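Automatically creating baseline tests could look something like the following: derive not-null and uniqueness checks from column metadata and run each as a SQL query that should return zero. CheckUp’s actual interface isn’t described here, so Column, baseline_tests, and the sample loans table are hypothetical, and the demo uses an in-memory SQLite database as a stand-in for Redshift.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Column:
    """Minimal column metadata, e.g. read from the warehouse catalog."""
    name: str
    nullable: bool
    is_key: bool


def baseline_tests(table: str, columns: List[Column]) -> List[Tuple[str, str]]:
    """Derive (test_name, SQL) pairs; each query returns 0 when the test passes."""
    tests = []
    for col in columns:
        if not col.nullable:
            tests.append((
                f"{table}.{col.name}.not_null",
                f"SELECT COUNT(*) FROM {table} WHERE {col.name} IS NULL",
            ))
        if col.is_key:
            tests.append((
                f"{table}.{col.name}.unique",
                f"SELECT COUNT(*) FROM (SELECT {col.name} FROM {table} "
                f"GROUP BY {col.name} HAVING COUNT(*) > 1) AS dupes",
            ))
    return tests


# Demo data: a duplicated loan_id and a NULL amount should both be caught.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (loan_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?)",
    [(1, 10, 250.0), (2, 11, 300.0), (2, 12, None)],
)

cols = [
    Column("loan_id", nullable=False, is_key=True),
    Column("customer_id", nullable=False, is_key=False),
    Column("amount", nullable=False, is_key=False),
]

failed = [
    name for name, sql in baseline_tests("loans", cols)
    if conn.execute(sql).fetchone()[0] != 0
]
print(failed)  # ['loans.loan_id.unique', 'loans.amount.not_null']
```

Generating tests from metadata gives every new table some coverage on day one; hand-written tests can then focus on business rules the catalog can’t express.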