Overview
Life sciences teams generate data faster than they can organize it. Every sequencer, flow cytometer, mass spectrometer, and plate reader produces files that end up on local drives, instrument computers, or ad-hoc S3 folders — invisible to the rest of the organization.
Connected Lab solves this by turning raw S3 objects into governed, searchable data packages.
HOW DATA MOVES
Quilt does not move data. Raw instrument files are transferred to Amazon S3 by your existing infrastructure — AWS Storage Gateway for on-premises instruments, AWS DataSync for batch transfers, direct S3 upload via CLI or SDK, or any pipeline that writes to S3. Quilt's role begins once data lands in the bucket.
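As an illustration of the "direct S3 upload via SDK" path, here is a minimal sketch using boto3. The `raw/<instrument>/<run>/` key layout, the function names, and the example values are assumptions made for this sketch, not a convention that Quilt requires.

```python
from pathlib import Path


def build_object_key(instrument: str, run_id: str, filename: str) -> str:
    """Build an S3 key for one instrument file.

    The 'raw/<instrument>/<run>/<file>' layout is a hypothetical convention.
    """
    return f"raw/{instrument}/{run_id}/{Path(filename).name}"


def upload_run_file(local_path: str, bucket: str, instrument: str, run_id: str) -> str:
    """Upload one local file to S3 and return the key it was written to."""
    # boto3 is a third-party package; it is imported here so that the
    # key-building helper above can be used without it installed.
    import boto3

    key = build_object_key(instrument, run_id, local_path)
    boto3.client("s3").upload_file(local_path, bucket, key)
    return key
```

A call such as `upload_run_file("/data/out/reads.fastq.gz", "my-lab-raw", "sequencer-01", "run-0042")` would need AWS credentials and an existing bucket; the bucket name here is a placeholder.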
HOW QUILT PROCESSES NEW DATA
When objects arrive in an S3 bucket managed by Quilt, the platform's event-driven indexing system (SNS/SQS notifications) detects the new files and indexes them in Elasticsearch. Indexing is both shallow (file name, size, metadata) and deep (the contents of supported file types, including CSV, Parquet, PDF, Jupyter notebooks, FASTQ, and more). Objects become searchable in near real time.
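To make the event flow concrete, the sketch below shows how a consumer could unpack an S3 "object created" notification that was published to SNS and delivered to SQS. It illustrates the standard AWS message shapes only; it is not Quilt's indexer, and the function name is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def extract_new_objects(sqs_body: str) -> list[dict]:
    """Return bucket, key, and size for each newly created object in one message."""
    envelope = json.loads(sqs_body)
    # With SNS fan-out, the S3 event is a JSON string in the "Message" field.
    # With raw message delivery (or S3 writing straight to SQS), the body is
    # already the S3 event.
    event = json.loads(envelope["Message"]) if "Message" in envelope else envelope

    found = []
    for record in event.get("Records", []):
        # Skip deletions and other event types; only new objects matter here.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        obj = record["s3"]["object"]
        found.append(
            {
                "bucket": record["s3"]["bucket"]["name"],
                # Keys arrive URL-encoded in S3 event notifications.
                "key": unquote_plus(obj["key"]),
                "size": obj.get("size"),
            }
        )
    return found
```

An indexer would then read each object's headers (shallow indexing) or contents (deep indexing) and write a document to the search cluster; that part is omitted here.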
Scientists or pipelines then create Quilt Packages — immutable, versioned bundles of related files with metadata — via the Python SDK, the Quilt web catalog, or automatically via the Nextflow plugin. Metadata Workflows (JSON Schema-based validation gates) enforce that every package includes required labels, controlled vocabularies, and documentation before it can be pushed to a bucket.
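Package creation through the Python SDK could look roughly like the sketch below. It uses the quilt3 `Package` API (`set_dir`, `set_meta`, `push`); the function name, commit message, and every example value (package name, bucket, workflow id, metadata fields) are placeholders assumed for illustration.

```python
def push_run_package(
    local_dir: str,
    package_name: str,
    bucket: str,
    metadata: dict,
    workflow_id: str,
) -> str:
    """Bundle a local directory into a Quilt package and push it to a bucket."""
    # quilt3 is a third-party package (pip install quilt3).
    import quilt3

    pkg = quilt3.Package()
    pkg.set_dir(".", local_dir)  # add every file under local_dir
    pkg.set_meta(metadata)       # package-level metadata
    pushed = pkg.push(
        package_name,
        registry=f"s3://{bucket}",
        message="Add instrument run",  # placeholder commit message
        workflow=workflow_id,          # metadata workflow configured for the bucket
    )
    # The top hash identifies this immutable package revision.
    return pushed.top_hash


# Hypothetical usage; needs AWS credentials and a Quilt-managed bucket:
# push_run_package(
#     "./run-0042", "genomics/run-0042", "my-lab-packages",
#     {"assay": "RNA-seq", "instrument": "sequencer-01"}, "ngs-run",
# )
```

If the metadata does not satisfy the selected workflow's schema, the push is expected to fail rather than create a package revision.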
WHAT SCIENTISTS EXPERIENCE
Scientists open the Quilt web catalog, run a metadata or full-text search, and find the exact dataset they need in seconds. They can preview files inline (images, CSVs, notebooks, PDFs, FASTQ, BAM), browse package version history, and download or programmatically access data via the Python SDK. No S3 console navigation. No asking bioinformatics for file paths.
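For the programmatic route, a minimal sketch of reading one file from a package with quilt3 follows. `Package.browse` and `PackageEntry.fetch` are quilt3 calls; the function name and the example package, bucket, and file names are assumptions.

```python
from typing import Optional


def fetch_package_file(
    package_name: str,
    bucket: str,
    logical_key: str,
    dest: str,
    top_hash: Optional[str] = None,
) -> dict:
    """Download one file from a package and return the package metadata."""
    # quilt3 is a third-party package (pip install quilt3).
    import quilt3

    # Load the package manifest only; no file data is downloaded yet.
    # top_hash=None selects the latest revision; pass a hash to pin a version.
    pkg = quilt3.Package.browse(
        package_name, registry=f"s3://{bucket}", top_hash=top_hash
    )
    pkg[logical_key].fetch(dest)  # download just this one file
    return pkg.meta


# Hypothetical usage; needs AWS credentials and read access to the bucket:
# meta = fetch_package_file(
#     "genomics/run-0042", "my-lab-packages", "reads.fastq.gz", "./reads.fastq.gz"
# )
```

Pinning `top_hash` is what makes an analysis reproducible: the same hash always resolves to the same set of file versions.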
WHAT IT INCLUDES
- Event-driven indexing of S3 objects via SNS/SQS with near-real-time Elasticsearch updates
- Deep content indexing for CSV, Parquet, JSON, PDF, PPTX, Jupyter notebooks, and more
- Immutable, versioned Quilt Packages with cryptographic hash verification (SHA-256)
- Metadata Workflows: JSON Schema validation gates that enforce required metadata, controlled vocabularies, and README files before packages are accepted
- Web catalog with inline file preview, search, and package browsing
- Python SDK (quilt3) for programmatic package creation, access, and automation
- Nextflow plugin (nf-quilt) for automatic packaging of pipeline outputs
- Package promotion across buckets (raw → staging → production) with per-bucket workflow rules
- AWS IAM-enforced access control with role-based policies managed in the Quilt admin panel
- Full audit trail via AWS CloudTrail integration
- Deploys as a CloudFormation stack in your AWS account (ECS Fargate, Elasticsearch, RDS, Lambda)
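The hash verification item in the list above rests on a simple idea: a file is unchanged if its SHA-256 digest matches the digest recorded when it was packaged. The sketch below shows that check with the Python standard library. It is a generic illustration, not Quilt's manifest format or hashing code, and the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path: str, recorded_hex: str) -> bool:
    """True if the file's current digest equals the digest recorded earlier."""
    return sha256_of_file(path) == recorded_hex.lower()
```

Any change to the file's bytes, however small, produces a different digest, so a mismatch signals that the file is not the one that was packaged.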
PROVEN AT SCALE
Resilience (biomanufacturing): Replaced 3 legacy platforms. 200+ scientists onboarded in month one. NGS processing cut from 7–10 weeks to under 1 hour. 3,000+ hours saved per month. $3M cost savings. 7.6x ROI.
Tessera Therapeutics (gene writing): Over 1PB centralized. Nextflow plugin auto-packages every pipeline run. 3x faster NGS analysis, 50% reduction in data retrieval time, >80% daily usage across scientists and data engineers.
Entact Bio (precision medicine): 90%+ reduction in lookup time. 3x dataset reuse. Audit-ready for IND filing.
Inari Agriculture: 50% faster retrieval. Scientists self-serve; data requests to IT eliminated.
Use cases
Cloud-Optimized Research Datasets
Life sciences instruments generate terabytes of research data that lands in S3 in raw, unstructured form. Quilt turns these files into versioned, metadata-enriched packages indexed in Elasticsearch — making datasets searchable, reproducible, and governed from the moment they arrive. Inari cut retrieval time by 50%; Tessera manages over 1PB of cloud-optimized genomics data with 3x faster analysis.
Plant and Animal Genomics
Genomics pipelines produce massive sequencing outputs that quickly become unfindable in S3. Quilt's Nextflow plugin automatically packages pipeline results with metadata at the end of every run, and Metadata Workflows enforce required labels before data is accepted. Inari Agriculture uses Quilt to manage versioned genomics packages across research teams, with scientists self-serving instead of filing data requests.
Data Catalog
Quilt is a scientific data catalog that indexes S3 objects in near real time via SNS/SQS event-driven notifications. Scientists search across all managed buckets using metadata facets, full-text content search (CSV, Parquet, PDF, notebooks), and Elasticsearch queries. They then browse, preview, and access results directly from the web catalog or Python SDK without navigating S3 folder structures.
Details
Integration guide
Quilt deploys as a CloudFormation stack in the customer's AWS account. It runs on Amazon ECS (Fargate), with Amazon S3 as the data layer, Elasticsearch for metadata and content indexing, Amazon RDS (PostgreSQL) for user management, and Amazon Athena for SQL queries over package metadata. Quilt does not move data: raw instrument data reaches S3 via AWS Storage Gateway, AWS DataSync, or direct upload. When objects land in a Quilt-managed S3 bucket, SNS/SQS event notifications trigger near-real-time indexing in Elasticsearch, making files searchable by metadata and content. Scientists or pipelines then create versioned Quilt Packages via the Python SDK (quilt3), the web catalog, or the Nextflow plugin (nf-quilt), with Metadata Workflows (JSON Schema validation) enforcing required labels and controlled vocabularies before packages are accepted.
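To show what a metadata validation gate checks, here is a deliberately small sketch that enforces two JSON Schema keywords, `required` and `enum`, plus a string type check. It is a hand-rolled subset for illustration only, not Quilt's validator or a full JSON Schema implementation, and the schema fields and vocabulary values are invented examples.

```python
# Hypothetical schema: field names and allowed values are examples only.
SCHEMA = {
    "type": "object",
    "required": ["assay", "instrument", "operator"],
    "properties": {
        "assay": {"type": "string", "enum": ["RNA-seq", "WGS", "flow-cytometry"]},
        "instrument": {"type": "string"},
        "operator": {"type": "string"},
    },
}


def validation_errors(metadata: dict, schema: dict) -> list[str]:
    """Return human-readable problems; an empty list means the metadata passes."""
    errors = []
    # Required labels must be present.
    for field in schema.get("required", []):
        if field not in metadata:
            errors.append(f"missing required field: {field}")
    # Present fields must have the right type and use the controlled vocabulary.
    for field, rules in schema.get("properties", {}).items():
        if field not in metadata:
            continue
        value = metadata[field]
        if rules.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{field}: expected a string")
        elif "enum" in rules and value not in rules["enum"]:
            errors.append(f"{field}: {value!r} is not in the controlled vocabulary")
    return errors
```

A gate built on this idea would reject the package whenever the list is non-empty, so incomplete or mislabeled data never reaches the governed bucket.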