Canopy Uses Machine Learning to Automate Financial Statement Processing on AWS


For individuals with financial assets across multiple sources, maintaining a single, comprehensive view of their net worth can be time-consuming because they must manually track and compile their holdings. This is a constant challenge for high-net-worth individuals, who tend to have a more diversified asset allocation.

Founded in 2013 in Singapore, Canopy aims to resolve this issue. Canopy provides high-net-worth individuals with a consolidated view of their varied financial holdings by analyzing their financial statements, then extracting and collating the relevant information onto a single dashboard. With Canopy’s platform, high-net-worth individuals can easily keep track of their assets while comparing financial performance, strategy, and market timing with their peers.

As an Amazon Web Services (AWS) cloud-native platform, Canopy had automated much of its day-to-day operations. However, it was still analyzing financial statements manually, and wanted to automate this process with machine learning (ML) and optical character recognition (OCR) to make it more efficient.

“Applying machine learning to any data analysis is a complex undertaking—Amazon SageMaker uses ML to automatically extract text and data, going beyond simple OCR and enabling us to automatically process nearly 100,000 financial documents to date,” says Amit Gupta, chief technology officer, Canopy.

Moving Forward with Machine Learning

When Canopy first started operations, its data team manually scanned through customers’ financial documents from multiple sources. Canopy connects to approximately 400 custodian banks and receives data in various formats, including application programming interfaces (APIs), data feeds, reporting services, and the Society for Worldwide Interbank Financial Telecommunication (SWIFT) format.

The team also received customer transaction statements as emails, Excel files, Portable Document Format (PDF) files, and scanned images, all of which made analyzing customer data time-consuming and costly. Canopy set out to automate the process and make its business future-ready.
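As an illustration of the format variety described above, the following minimal Python sketch sorts incoming statement files into processing routes by file extension. It is not Canopy's implementation: the route names, the extension mapping, and the helper functions `route_statement` and `group_by_route` are all hypothetical, and a real pipeline would likely inspect file contents rather than trust extensions.

```python
from pathlib import PurePath

# Hypothetical mapping from file extension to a processing route.
# Route names are illustrative only; they do not come from Canopy's platform.
ROUTES = {
    ".pdf": "document_extraction",
    ".png": "document_extraction",
    ".jpg": "document_extraction",
    ".jpeg": "document_extraction",
    ".tiff": "document_extraction",
    ".xls": "spreadsheet",
    ".xlsx": "spreadsheet",
    ".eml": "email",
}


def route_statement(filename: str) -> str:
    """Return the processing route for a file, or 'manual_review' if unknown."""
    suffix = PurePath(filename).suffix.lower()
    return ROUTES.get(suffix, "manual_review")


def group_by_route(filenames):
    """Group file names by their processing route, preserving input order."""
    grouped = {}
    for name in filenames:
        grouped.setdefault(route_statement(name), []).append(name)
    return grouped


if __name__ == "__main__":
    batch = ["Statement_Q1.PDF", "holdings.xlsx", "scan_001.jpg", "notes.txt"]
    for route, names in group_by_route(batch).items():
        print(route, names)
```

Sending unknown formats to a manual-review route keeps unexpected inputs visible instead of silently dropping them.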

“We were spending hundreds of menial hours, every week, processing financial statements, which was not sustainable for business growth. We began experimenting with open-source ML models on our own, and within a year and a half, we managed to semi-automate the processing of our clients’ financial data,” says Gupta.

Soon after, Canopy hit a wall in its automation journey: the team had to continuously update its ML models to recognize and process new information that appeared in 20 percent of the financial records received each month. Even though the team was spending less time analyzing customer data, it now had to focus on data processing and on improving data quality for the ML models, which took time away from managing client investments and relationships.

With its previous setup, Canopy could not retrain the ML models while they were in use, so the team resorted to working on weekends to minimize downtime on its platform. The retraining process could take up to 48 hours a week. Canopy turned to AWS for advice on how to streamline this process and improve its OCR capabilities.

“We started with asking if the process of retraining our ML models could be fully automated—this is where AWS’ counsel proved invaluable,” Gupta says. “The AWS team pointed us in the right direction with Amazon SageMaker, and guided us during its implementation to ensure that we were always supported.”

Amazon SageMaker enabled Canopy to efficiently develop its ML models and improve its OCR capabilities without having to hire more data engineers. The solution allows Canopy to consolidate the building, training, and deployment of ML models on one platform. SageMaker updates the ML models automatically whenever it discovers new information while parsing financial records.
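The source does not describe how the automatic updates are triggered. As a rough sketch of one possible approach, the Python snippet below flags a batch for retraining when the share of records containing unrecognized fields reaches a threshold. The `ParsedRecord` type, the function names, and the 5 percent threshold are assumptions made for illustration. No SageMaker API is called here; in practice the decision would be followed by a separate training job while the current model keeps serving.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ParsedRecord:
    """Hypothetical result of parsing one financial record."""
    record_id: str
    unrecognized_fields: int  # fields the current model could not map


def unrecognized_share(records: Sequence[ParsedRecord]) -> float:
    """Fraction of records with at least one unrecognized field."""
    if not records:
        return 0.0
    flagged = sum(1 for r in records if r.unrecognized_fields > 0)
    return flagged / len(records)


def should_retrain(records: Sequence[ParsedRecord], threshold: float = 0.05) -> bool:
    """Return True when the unrecognized share reaches the assumed threshold."""
    return unrecognized_share(records) >= threshold


if __name__ == "__main__":
    # Two of ten records contain new information, mirroring the
    # 20 percent figure quoted earlier in the article.
    batch = [ParsedRecord(f"r{i}", 1 if i < 2 else 0) for i in range(10)]
    print(unrecognized_share(batch), should_retrain(batch))
```

Keeping the decision separate from the training step makes the threshold easy to tune without touching the training code.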

Prepared for the Future

With its ML capabilities, Canopy now processes 2,000 client financial records a month, which allows its data team to focus on product innovation and has helped achieve 300 percent growth in the business. As of 2021, it serves thousands of clients and has $120 billion in assets under management.

The company is looking to scale to meet a ten-fold increase in user demand, now that it has streamlined its data processing with AWS.

Looking ahead, Canopy plans to expand its operations into the United States in 2021 and has set the target of doubling its assets under management by the end of 2021. To support its growth plans, the company intends to engage AWS Managed Services (AMS) for greater assistance with its backend operations.

“AWS has helped us get our ML capabilities to a position where we can process months’ worth of data in days — if we saw a ten-fold increase in the number of financial documents we had to process for clients tomorrow, we can easily meet that. We now have greater freedom to expand our business, and that is exactly what we plan to do,” concludes Gupta.

About Canopy

Founded in 2013, Canopy is an asset aggregator platform for high-net-worth individuals. Canopy provides its clients with a single view of their financial holdings across asset classes and markets by processing their financial statements and consolidating the relevant information in its platform’s customer interface. It counts Credit Suisse as a flagship client and investor.

Benefits of AWS

  • Ability to digitize PDFs into APIs at scale
  • Confidence in scaling to meet a ten-fold increase in user demand
  • Ability to simultaneously train and deploy machine learning models on one platform

AWS Services Used

Amazon SageMaker

Amazon SageMaker helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.
