AWS Architecture Blog

Audra Devoto

Author: Audra Devoto

Audra is a Data Scientist with a background in metagenomics and many years of experience working with large genomics datasets on AWS. At Metagenomi, she builds infrastructure to support large-scale analysis projects and enables the discovery of novel enzymes from MGXdb.

AWS architecture showing protein vector processing workflow with ECR, Lambda, and LanceDB

A scalable, elastic database and search solution for 1B+ vectors built on LanceDB and Amazon S3

In this post, we explore how Metagenomi built a scalable database and search solution for over 1 billion protein vectors using LanceDB and Amazon S3. The solution enables rapid enzyme discovery by transforming proteins into vector embeddings and running nearest-neighbor searches over them through a serverless architecture that combines AWS Lambda, AWS Step Functions, and Amazon S3.
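To make the idea of a nearest-neighbor search over embeddings concrete, here is a minimal, self-contained sketch. It is not the LanceDB API and not Metagenomi's implementation: the protein names, the three-dimensional toy vectors, and the helper functions `cosine_similarity` and `nearest_neighbors` are all hypothetical, and the search is an exact linear scan, which a system at the billion-vector scale would presumably replace with an indexed search.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, vectors, k=2):
    """Return the names of the k vectors most similar to the query.

    Exact brute-force scan: every stored vector is scored once.
    """
    scored = ((cosine_similarity(query, vec), name) for name, vec in vectors.items())
    return [name for _, name in heapq.nlargest(k, scored)]


# Hypothetical toy embeddings; real protein embeddings have far more dimensions.
toy_embeddings = {
    "protein_a": [1.0, 0.0, 0.0],
    "protein_b": [0.9, 0.1, 0.0],
    "protein_c": [0.0, 1.0, 0.0],
}

hits = nearest_neighbors([1.0, 0.05, 0.0], toy_embeddings, k=2)
print(hits)
```

The query vector points almost entirely along the first axis, so the two proteins whose embeddings share that direction rank highest, while the orthogonal one is excluded.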