AWS Big Data Blog

How Taxbit achieved cost savings and faster processing times using Amazon S3 Tables

In this post, we discuss how Taxbit partnered with Amazon Web Services (AWS) to streamline their crypto tax analytics solution using Amazon S3 Tables, achieving 82% cost savings and five times faster processing times.

Taxbit is a leading tax compliance suite serving cryptocurrency exchanges, digital platforms, and government agencies, generating more than 100 million forms for users and reconciling more than 500 billion digital asset transactions. The suite powers a complex environment that handles real-time pricing data from 29 cryptocurrency exchanges covering over 10,000 digital assets.

Recently, Taxbit ran into challenges with their pricing data infrastructure. As data volumes grew, infrastructure costs rose sharply, putting pressure on operational budgets. At the same time, the system struggled to ingest the growing number of pricing data points, creating persistent bottlenecks in the data pipeline. As a result, customers encountered missing data and slow processing times, which led to dissatisfaction. On top of these operational challenges, any new design also had to satisfy Taxbit's strict regulatory compliance requirements. Together, these issues led Taxbit to modernize their pricing data infrastructure with a focus on helping to meet regulatory standards.

“During peak workloads, our solutions process hundreds of millions of digital asset transactions across blockchain and cryptocurrency exchanges,” says Clark Roberts, CTO at Taxbit. “Our legacy database architecture was becoming a bottleneck, leading to increased costs and slower response times for our enterprise and government customers.”

Solution overview

Taxbit’s modernized architecture uses Amazon S3 Tables with Apache Iceberg as the foundation, combined with purpose-built AWS services for data ingestion, processing, and analytics. The solution processes real-time pricing data from 29 cryptocurrency exchanges covering more than 10,000 digital assets. This architecture is shown in the following diagram.

Figure: AWS cloud architecture diagram of the data pipeline for processing digital asset market data.

The pipeline combines several AWS services. At its foundation, Amazon S3 Tables provides scalable storage for large volumes of pricing data. For processing and transformation, the solution combines Amazon EMR and AWS Glue. These services handle extract, transform, and load (ETL) operations as well as asynchronous API workloads.

Real-time data handling is managed through Amazon Kinesis, enabling streaming of pricing updates. AWS Lambda functions perform multiple tasks, including periodic polling of vendor APIs, transformation of streaming data, and data enrichment. The orchestration of these components is managed by AWS Step Functions, helping to ensure coordination of data workflows. Completing the architecture, Amazon Athena provides query capabilities, supporting both synchronous APIs and one-time analytical queries. This approach creates a scalable system built to handle both real-time and batch processing workflows while maintaining high performance and reliability.
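To make the orchestration role of Step Functions more concrete, the following sketch builds an Amazon States Language definition in Python. The post doesn't publish Taxbit's actual workflow. The steps, the Glue job name, the Lambda function name, the Athena workgroup, and the query are all hypothetical. The service integration ARNs (`glue:startJobRun.sync`, `lambda:invoke`, `athena:startQueryExecution.sync`) are standard Step Functions integrations. Amazon EMR steps are omitted to keep the example short.

```python
import json

# Hypothetical resource names, not taken from the post.
GLUE_JOB_NAME = "pricing-etl"              # hypothetical Glue job
ENRICH_FUNCTION = "enrich-pricing-batch"   # hypothetical Lambda function
ATHENA_WORKGROUP = "pricing-analytics"     # hypothetical Athena workgroup


def build_pricing_workflow() -> dict:
    """Return an illustrative ASL definition: Glue ETL, then Lambda enrichment,
    then an Athena sanity-check query."""
    return {
        "Comment": "Illustrative batch pricing workflow (sketch)",
        "StartAt": "RunGlueEtl",
        "States": {
            "RunGlueEtl": {
                "Type": "Task",
                # .sync makes Step Functions wait for the Glue job run to finish
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": GLUE_JOB_NAME},
                "Next": "EnrichBatch",
            },
            "EnrichBatch": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                # Pass the whole state input through as the Lambda payload
                "Parameters": {"FunctionName": ENRICH_FUNCTION, "Payload.$": "$"},
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 5,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,
                    }
                ],
                "Next": "ValidateCounts",
            },
            "ValidateCounts": {
                "Type": "Task",
                "Resource": "arn:aws:states:::athena:startQueryExecution.sync",
                "Parameters": {
                    # Hypothetical table and column names
                    "QueryString": "SELECT count(*) FROM prices WHERE price_date = current_date",
                    "WorkGroup": ATHENA_WORKGROUP,
                },
                "End": True,
            },
        },
    }


definition = json.dumps(build_pricing_workflow(), indent=2)
print(definition)
```

The resulting JSON string is what you would supply as the state machine definition. The retry with exponential backoff on the Lambda step is a common pattern for transient failures, not a detail confirmed by the post.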

Data ingestion layer

The ingestion layer operates through two key components: API integration and stream processing. The API integration uses Lambda functions to systematically poll multiple external APIs. These polling operations are orchestrated by Amazon EventBridge, which manages the scheduled data collection tasks. Additionally, WebSocket listeners maintain continuous connections to capture real-time price updates as they occur.
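A minimal sketch of the scheduled polling side follows. It shows a Lambda handler invoked by an EventBridge schedule that polls each vendor and normalizes the responses into one common schema. The vendor names, the raw payload fields (`exchange`, `symbol`, `price`, `ts`), and the output schema are hypothetical, because the post doesn't describe Taxbit's vendor formats. The HTTP call is left as a placeholder and is injectable so that the logic can be exercised without network access.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable

# Hypothetical vendor identifiers; the post does not name Taxbit's price vendors.
VENDORS = ["vendor-a", "vendor-b"]


def fetch_from_vendor(vendor: str) -> Iterable[dict]:
    """Placeholder for the real HTTP call to a vendor pricing API."""
    raise NotImplementedError(f"call the {vendor} pricing API here")


def normalize_quote(vendor: str, raw: dict, polled_at: str) -> dict:
    """Map one hypothetical vendor payload onto a common price schema."""
    return {
        "source": vendor,
        "exchange": raw["exchange"].lower(),
        "asset": raw["symbol"].upper(),
        "price_usd": float(raw["price"]),
        "observed_at": raw["ts"],
        "polled_at": polled_at,
    }


def handler(event: dict, context=None,
            fetch: Callable[[str], Iterable[dict]] = fetch_from_vendor) -> dict:
    """Lambda entry point for an EventBridge scheduled rule (sketch)."""
    # EventBridge scheduled events carry the trigger time in the "time" field
    polled_at = event.get("time") or datetime.now(timezone.utc).isoformat()
    quotes = []
    for vendor in VENDORS:
        for raw in fetch(vendor):
            quotes.append(normalize_quote(vendor, raw, polled_at))
    # A real function would now put `quotes` onto a Kinesis data stream.
    return {"count": len(quotes), "quotes": quotes}
```

For example, `handler({"time": "2024-05-01T00:05:00Z"}, fetch=lambda v: [{"exchange": "Coinbase", "symbol": "btc", "price": "64000.5", "ts": "2024-05-01T00:00:00Z"}])` returns two normalized quotes, one per hypothetical vendor. For tax calculations, a production pipeline might prefer `decimal.Decimal` over `float` for prices.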

On the stream processing side, Amazon Kinesis Data Streams serves as the backbone for handling real-time data ingestion at scale. As data flows in, Lambda functions perform transformations and enrichment operations to prepare the data for downstream use. Throughout this process, custom validation checks are applied to help ensure the quality and completeness of the data, helping to maintain the integrity of the pricing information pipeline.
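The following is one way the transform, validate, and enrich step could look in a Lambda function that consumes a Kinesis data stream. The Kinesis event shape is standard: each record's payload arrives base64-encoded under `Records[].kinesis.data`. The field names, validation rules, and enrichment fields are assumptions for illustration, not Taxbit's actual checks. Invalid records are set aside rather than retried, because retrying a malformed payload won't fix it.

```python
import base64
import json

# Hypothetical schema; the post does not list Taxbit's record fields.
REQUIRED_FIELDS = {"exchange", "asset", "price_usd", "observed_at"}


def validate(record) -> list:
    """Return validation errors for one price record (an empty list means valid)."""
    if not isinstance(record, dict):
        return ["payload is not a JSON object"]
    errors = [f"missing {name}" for name in sorted(REQUIRED_FIELDS - set(record))]
    price = record.get("price_usd")
    if "price_usd" in record and (
        isinstance(price, bool) or not isinstance(price, (int, float)) or price <= 0
    ):
        errors.append("price_usd must be a positive number")
    if "observed_at" in record and not isinstance(record["observed_at"], str):
        errors.append("observed_at must be an ISO-8601 string")
    return errors


def enrich(record: dict) -> dict:
    """Add illustrative derived fields: a date for partitioning and a pair label."""
    enriched = dict(record)
    enriched["price_date"] = record["observed_at"][:10]  # assumes ISO-8601 timestamps
    enriched["pair"] = f"{record['asset']}-USD"
    return enriched


def handler(event: dict, context=None) -> dict:
    """Lambda handler for a Kinesis Data Streams event source (sketch)."""
    valid, rejected = [], []
    for rec in event["Records"]:
        seq = rec["kinesis"]["sequenceNumber"]
        try:
            # Kinesis delivers each record's payload base64-encoded
            payload = json.loads(base64.b64decode(rec["kinesis"]["data"]))
        except ValueError:  # covers bad base64, bad UTF-8, and malformed JSON
            rejected.append({"sequenceNumber": seq, "errors": ["undecodable payload"]})
            continue
        errors = validate(payload)
        if errors:
            rejected.append({"sequenceNumber": seq, "errors": errors})
        else:
            valid.append(enrich(payload))
    # A real function would write `valid` downstream and `rejected` to a quarantine location.
    return {"valid": valid, "rejected": rejected}
```

Keeping rejected records together with their sequence numbers and error reasons gives a simple audit trail for the completeness checks that the post describes.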

Data storage layer

At the storage layer, Taxbit uses Amazon S3 Tables, which stores tabular data as Apache Iceberg tables in a format designed for analytical queries. S3 Tables automatically performs table maintenance such as compaction, reducing the operational work of managing the data. Because Iceberg records each commit as a snapshot, Taxbit can use time travel to query earlier versions of a table, which supports both their audit requirements and historical data analysis.
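As an illustration of time travel, the following sketch builds Athena SQL that reads an Iceberg table as of a past timestamp or a specific snapshot ID. It uses the `FOR TIMESTAMP AS OF` and `FOR VERSION AS OF` clauses that Athena supports for Iceberg. The table and column names are hypothetical. The `where` argument is interpolated as-is, so this sketch is only suitable for trusted input. Time travel can only reach snapshots that table maintenance hasn't yet expired, so snapshot retention settings would need to match the audit window.

```python
from datetime import datetime, timezone


def time_travel_query(table: str, as_of: datetime, where: str = "") -> str:
    """Build Athena SQL that reads an Iceberg table as it existed at `as_of`."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    ts = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    sql = f"SELECT * FROM {table} FOR TIMESTAMP AS OF TIMESTAMP '{ts} UTC'"
    if where:
        sql += f" WHERE {where}"  # trusted input only; not injection-safe
    return sql


def snapshot_query(table: str, snapshot_id: int) -> str:
    """Build Athena SQL that reads a specific Iceberg snapshot by ID."""
    return f"SELECT * FROM {table} FOR VERSION AS OF {int(snapshot_id)}"


print(time_travel_query('"pricing"."daily_prices"',
                        datetime(2024, 12, 31, 23, 59, 59, tzinfo=timezone.utc),
                        "asset = 'BTC'"))
# SELECT * FROM "pricing"."daily_prices" FOR TIMESTAMP AS OF TIMESTAMP '2024-12-31 23:59:59 UTC' WHERE asset = 'BTC'
```

A query like this could reproduce the exact prices used for a historical tax calculation, which is the audit use case that the post mentions.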

The data organization strategy is designed to maximize efficiency and accessibility. Data is systematically partitioned by date and exchange, allowing for targeted data retrieval and improved query performance. The implementation of columnar storage further enhances query efficiency by minimizing unnecessary data scans. Additionally, version control mechanisms are in place to maintain clear data lineage, enabling precise tracking of data changes and transformations over time.
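The benefit of partitioning by date and exchange can be shown with a small in-memory simulation: a query that filters on both keys scans only one partition instead of the whole table. This is a conceptual sketch, not how Athena or Iceberg prune files internally. In Iceberg, the date partition would typically be a day transform over a timestamp column, and pruning relies on table metadata. The record fields are hypothetical.

```python
from collections import defaultdict


def partition_key(record: dict) -> tuple:
    """Partition by (date, exchange), mirroring the layout described above."""
    return (record["observed_at"][:10], record["exchange"])


def build_partitions(records) -> dict:
    """Group records into partitions keyed by (date, exchange)."""
    parts = defaultdict(list)
    for record in records:
        parts[partition_key(record)].append(record)
    return parts


def query(parts: dict, day: str, exchange: str, asset: str):
    """Scan only the matching partition; return (matching rows, rows scanned)."""
    candidates = parts.get((day, exchange), [])
    rows = [r for r in candidates if r["asset"] == asset]
    return rows, len(candidates)
```

With four records spread over two days and two exchanges, a query for one day and one exchange touches only the records in that single partition. The other partitions are skipped entirely.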

Analytics layer

At the analytics layer, the query engine forms the foundation, using Amazon Athena to facilitate flexible ad-hoc analysis of the pricing data. This is complemented by Presto-based queries that handle complex aggregations efficiently. The system includes carefully crafted execution plans optimized for common query patterns, designed to provide consistent and reliable performance.

To maximize efficiency, the analytics layer incorporates several performance optimizations. The system uses Athena query result reuse to avoid re-running identical queries, and it runs queries in parallel to handle multiple simultaneous requests.
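A sketch of both optimizations follows. It enables result reuse through the `ResultReuseConfiguration` parameter of the Athena `StartQueryExecution` API and starts several queries concurrently from a thread pool. The client is passed in, so you would normally supply `boto3.client("athena")`. boto3 clients are generally safe to share across threads. The database, workgroup, catalog, and 60-minute reuse window are assumptions. An S3 Tables integration exposes its own catalog name, which you would pass in place of the default. Athena concurrency quotas also cap how many queries can run at once.

```python
from concurrent.futures import ThreadPoolExecutor


def start_query(athena, sql: str, database: str, workgroup: str,
                catalog: str = "AwsDataCatalog", reuse_minutes: int = 60) -> str:
    """Start an Athena query with result reuse enabled; return its execution ID."""
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Catalog": catalog, "Database": database},
        WorkGroup=workgroup,
        # Reuse results of an identical earlier query if they are recent enough
        ResultReuseConfiguration={
            "ResultReuseByAgeConfiguration": {
                "Enabled": True,
                "MaxAgeInMinutes": reuse_minutes,
            }
        },
    )
    return response["QueryExecutionId"]


def start_queries_in_parallel(athena, queries, database: str, workgroup: str,
                              max_workers: int = 4) -> list:
    """Submit several queries concurrently; return execution IDs in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(start_query, athena, q, database, workgroup) for q in queries]
        return [f.result() for f in futures]
```

`start_query_execution` only starts a query. Callers would still poll `get_query_execution` until each query succeeds, then read the rows with `get_query_results`.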

Security and compliance

The data protection strategy implements multiple layers of security, starting with AWS Key Management Service (AWS KMS) encryption for all data at rest. This is complemented by TLS encryption for data in transit, helping to secure data movement throughout the system. Access to data and resources is controlled through AWS Identity and Access Management (IAM), providing fine-grained permissions that enforce the principle of least privilege.
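To illustrate the at-rest and in-transit controls, the following sketch builds two common S3 configurations as Python dictionaries:

- A bucket policy that denies any request not made over TLS, using the `aws:SecureTransport` condition key.
- A default encryption rule that uses SSE-KMS.

These configurations apply to a general purpose S3 bucket, such as a hypothetical raw landing bucket. S3 Tables table buckets have their own encryption and policy settings. The bucket name and key ARN are placeholders, and the post doesn't publish Taxbit's actual policies.

```python
import json


def deny_insecure_transport_policy(bucket: str) -> dict:
    """Bucket policy that rejects any S3 request not sent over TLS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                # aws:SecureTransport is "false" for plain-HTTP requests
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


def kms_default_encryption(kms_key_arn: str) -> dict:
    """Default bucket encryption rule using SSE-KMS with an S3 Bucket Key."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                "BucketKeyEnabled": True,
            }
        ]
    }


print(json.dumps(deny_insecure_transport_policy("raw-pricing-landing"), indent=2))
```

With boto3, you could apply these configurations through `s3.put_bucket_policy(Bucket=..., Policy=json.dumps(policy))` and `s3.put_bucket_encryption(Bucket=..., ServerSideEncryptionConfiguration=config)`. Least-privilege IAM policies for each Lambda function and job would sit alongside them.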

The audit trail component provides comprehensive monitoring and compliance capabilities. AWS CloudTrail logging captures detailed records of system activities, enabling thorough security analysis and incident investigation. Data lineage tracking maintains clear records of data movement and transformations throughout the pipeline. These features are augmented by robust compliance reporting capabilities, helping the system demonstrate adherence to regulatory requirements and internal governance policies. Together, these security controls create an environment that protects sensitive data, maintains transparency, and provides accountability.

Business impact

Most notably, Taxbit achieved an 82% reduction in storage infrastructure costs while delivering processing speeds five times faster than their previous architecture. Calculations reached approximately 99.99% data completeness, and the workload now supports more than 10,000 digital assets.

The benefits extend beyond these quantitative improvements. Customer experience has improved, with transaction pricing times shrinking from hours to minutes. Higher throughput increased operational efficiency, enabling faster data loading while reducing compute costs. The new architecture also established a scalable foundation that provides faster data access and the flexibility to expand into new markets. Finally, the modern infrastructure has enabled Taxbit to pursue new product offerings by supporting advanced analytics and real-time insights that were previously unattainable. These capabilities created business opportunities and revenue streams that weren’t possible under the constraints of the legacy system.

Conclusion

Taxbit’s implementation of Amazon S3 Tables has transformed their cryptocurrency tax compliance solutions, delivering 82% cost savings and five times faster processing speeds. The modernized architecture, combining Amazon EMR, AWS Glue, Amazon Kinesis, and Lambda, now processes transactions in minutes instead of hours. Additionally, the architecture has helped Taxbit maintain approximately 99.99% data accuracy across more than 10,000 digital assets. Beyond operational improvements, this transformation has enabled new product offerings and real-time analytics capabilities. By partnering with AWS, Taxbit addressed their scaling challenges and built a foundation for continued innovation in the digital asset space.

For more information, see Amazon S3 Tables.


About the authors

Larry Christensen

Larry is a Principal Engineer at Taxbit based in the Salt Lake City area. He’s spearheaded many architectural, big data, and AI transformations across Taxbit.

Washim Nawaz

Washim is an Analytics Specialist Solutions Architect at AWS with extensive professional experience building and tuning data warehouse and data lake solutions. He is passionate about helping customers modernize their data platforms with efficient, performant, and scalable analytics solutions. Outside of work, he enjoys watching sports and traveling.

Derek Ziehl

Derek is a Senior Technical Account Manager (TAM) at AWS. He has a background designing large-scale network systems and managing cloud migrations. As a TAM he enjoys enabling customers to run resilient, optimized workloads on AWS.

Pranjal Gururani

Pranjal is a Solutions Architect at AWS based in Seattle. He works with a variety of customers to architect cloud solutions that address their business challenges. In his spare time, he enjoys hiking, kayaking, skydiving, and spending time with family.