AWS Database Blog
How Kajabi optimized costs with Amazon Aurora upgrades
Kajabi is an online business platform that helps experts, coaches, and educators package their expertise into digital products and online courses. It removes the typical headaches of website development, payment processing, and marketing automation. Unlike other solutions that require multiple subscriptions and integrations, Kajabi provides everything under one digital roof. It offers website building, course creation, email marketing, and sales funnels. This allows knowledge entrepreneurs to focus on what they do best: sharing their expertise with audiences around the globe.
As Kajabi’s platform grew to serve creators across 22 global regions, their team faced a challenge managing a large database infrastructure. The database totaled 39 terabytes (TB), divided into 12 TB in tables and 27 TB in indexes, with high-volume tables accumulating several gigabytes during peak hours.
Kajabi needed to upgrade their critical databases running on Aurora PostgreSQL v12 because the version was reaching end of support. They also wanted to achieve cost optimization with Amazon Aurora I/O-Optimized billing. They could not tolerate an extended downtime as it would disrupt countless online courses, coaching sessions, and digital product sales happening around the clock. Traditional database upgrade approaches proved inadequate for Kajabi’s scale, dynamic workload patterns, and near-zero downtime requirements. The situation demanded a creative solution.
In this post, we show you how Kajabi navigated complex Aurora PostgreSQL database upgrades and achieved an 80.53% cost reduction through strategic planning and technical execution. You’ll discover their hybrid approach combining Amazon Aurora blue/green deployments with PostgreSQL native replication. You’ll also learn about their implementation of Aurora I/O-Optimized storage and the key lessons from their journey. Whether you’re managing large-scale databases or planning your own upgrade path, Kajabi’s experience offers valuable insights. You’ll see how to balance performance requirements with cost optimization while maintaining continuous availability.
Amazon Aurora PostgreSQL: Kajabi’s database solution of choice
Amazon Aurora PostgreSQL combines the performance and reliability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. You can choose from two database cluster storage configurations in Amazon Aurora, each with different cost and performance characteristics.
- Aurora I/O-Optimized is designed specifically for I/O-intensive applications.
- Aurora Standard is designed for applications with moderate I/O needs and offers cost-effective pricing.
In existing clusters, you can switch the storage configuration to Aurora I/O-Optimized once every 30 days or switch back to Aurora Standard anytime.
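As a rough illustration, the storage configuration of an existing cluster can be changed through the RDS ModifyDBCluster API. The sketch below assumes Python with boto3; the helper names and the cluster identifier `example-cluster` are hypothetical placeholders and not taken from Kajabi's setup. The storage type values `aurora-iopt1` and `aurora` correspond to Aurora I/O-Optimized and Aurora Standard.

```python
def storage_switch_request(cluster_id: str, io_optimized: bool) -> dict:
    """Build keyword arguments for the RDS ModifyDBCluster call."""
    return {
        "DBClusterIdentifier": cluster_id,
        # "aurora-iopt1" selects Aurora I/O-Optimized, "aurora" selects Aurora Standard
        "StorageType": "aurora-iopt1" if io_optimized else "aurora",
        "ApplyImmediately": True,
    }


def apply_storage_switch(request: dict) -> dict:
    """Send the request to AWS (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the request builder has no dependencies

    return boto3.client("rds").modify_db_cluster(**request)


# Hypothetical cluster name, shown only to illustrate the request shape.
request = storage_switch_request("example-cluster", io_optimized=True)
print(request)
```

Keeping the request builder separate from the API call makes it easy to review the change before it is sent.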
Migration challenges
Kajabi’s database migration presented multiple technical challenges that required ingenuity and precision. The team faced massive data volumes with near-zero downtime requirements, a critical constraint for a platform serving thousands of users who expect continuous service. A high-volume email analytics table posed a specific problem because of its daily fluctuation pattern: it accumulated gigabytes of data during peak hours before returning to baseline overnight. This required a replication strategy as dynamic as the data itself. The team considered the approaches listed in Upgrade strategies for Amazon Aurora PostgreSQL and Amazon Relational Database Service (Amazon RDS) for PostgreSQL 12. In the following section, we summarize these strategies, highlight the winning method, and explain why the others wouldn’t work.
Kajabi’s massive database was on Amazon Aurora PostgreSQL v12.x, with the standard support ending on February 28, 2025. They took a phased approach with this upgrade to minimize extra complexities. They decided to move to an intermediate version 13.x and then to 15.x. They also wanted to avoid staying on extended support, which constrained their upgrade timelines.
Methods for upgrading Amazon Aurora PostgreSQL major versions
Kajabi’s team carefully evaluated several potential upgrade strategies, each with its own trade-offs and considerations. Their evaluation revealed why traditional methods often fall short for modern, high-availability applications.
The seemingly straightforward in-place upgrade method was the first to be ruled out. Upgrading the database directly on the existing cluster would require over an hour of downtime because of the database size, and the upgrade applies to the entire cluster simultaneously. For Kajabi’s 24/7 global platform serving thousands of entrepreneurs, this extended downtime was simply unacceptable.
The team explored database snapshot upgrades, which involve creating a cluster snapshot, restoring it to a new cluster with a newer engine version, and validating the upgraded cluster. However, given the size of Kajabi’s database (39 TB), the restoration time alone would exceed their downtime constraints, making this approach impractical.
Out-of-place upgrades using AWS Database Migration Service (AWS DMS) initially seemed promising with its change data capture capability, but technical limitations quickly emerged. AWS DMS could not handle certain data types, particularly timestamp with time zone, which was crucial for Kajabi’s global operations. The additional AWS DMS costs also made this option unfeasible.
Logical replication with pglogical presented its own challenges. While it offered continuous replication capabilities, its complex setup requirements and inability to replicate sequences, data definition language (DDL), and large objects posed significant blockers for Kajabi’s workload.
The winning strategy emerged as a hybrid approach combining Aurora blue/green deployments with PostgreSQL native replication. Its primary advantage was that, by handling the cloning, upgrade, and replication, Kajabi engineers were able to distribute the load across replication slots. The method also:
- Created a fully managed staging environment for thorough testing
- Used the Aurora PostgreSQL fast database cloning feature
- Enabled production validation while maintaining the safety of the current environment
- Provided the flexibility to handle Kajabi’s specific workload patterns
By implementing this strategy with carefully configured replication slots and precise traffic management through PgBouncer, Kajabi achieved their upgrade goals while maintaining the high availability their customers demanded. This approach demonstrated that sometimes the best solution is to combine methods thoughtfully to address specific business needs.
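To make the clone-then-upgrade part of this approach more concrete, here is a minimal sketch of the two RDS API requests involved, assuming Python with boto3. The helper names and cluster identifiers are hypothetical, and the sketch does not cover creating instances in the clone, waiting for the clone to become available, or setting up replication, all of which a real playbook would need.

```python
def clone_request(source_cluster: str, clone_cluster: str) -> dict:
    """Arguments for RestoreDBClusterToPointInTime that create an Aurora clone."""
    return {
        "SourceDBClusterIdentifier": source_cluster,
        "DBClusterIdentifier": clone_cluster,
        "RestoreType": "copy-on-write",  # clone instead of a full copy
        "UseLatestRestorableTime": True,
    }


def upgrade_request(cluster: str, engine_version: str) -> dict:
    """Arguments for ModifyDBCluster that request a major version upgrade."""
    return {
        "DBClusterIdentifier": cluster,
        "EngineVersion": engine_version,
        "AllowMajorVersionUpgrade": True,
        "ApplyImmediately": True,
    }


def send(operation: str, request: dict) -> dict:
    """Call one RDS operation by name (requires boto3 and valid credentials)."""
    import boto3  # imported lazily so the builders above have no dependencies

    return getattr(boto3.client("rds"), operation)(**request)


# Hypothetical identifiers; each request would be sent as a separate step,
# for example send("restore_db_cluster_to_point_in_time", clone), and the
# upgrade only after the clone is available and has a DB instance.
clone = clone_request("prod-cluster", "prod-cluster-clone")
upgrade = upgrade_request("prod-cluster-clone", "15.10")
print(clone, upgrade, sep="\n")
```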
The migration team balanced two key objectives: achieving cost savings through Aurora I/O-Optimized storage and maintaining the performance Kajabi’s end customers (heroes) relied on. Technical complexities of moving between Aurora PostgreSQL versions created additional hurdles. Compatibility issues required a multi-stage migration path. These obstacles demanded careful planning and precise execution.
Solution overview
Kajabi’s primary production cluster was the heart of the system, supported by read replicas that distributed the query load. By implementing PostgreSQL native replication, with replication slots allocated to spread load across the highest-traffic tables, they achieved data consistency while maintaining performance. Kajabi balanced tables across multiple replication slots and isolated each high-traffic table into its own slot and publication. For the 39 TB database, this resulted in 6 to 8 slots. The dedicated slots provided an extra layer of stability: without them, the high-traffic tables would have backlogged replication for the other tables.
Connection management, often overlooked but critical for performance, was addressed with PgBouncer. PgBouncer is a lightweight connection pooler for PostgreSQL that manages database connections efficiently by:
- Connection pooling: Maintaining a pool of connections that can be reused by clients, reducing the overhead of establishing new connections
- Connection limits: Helping prevent database overload by limiting the maximum number of connections
- Connection modes: Supporting transaction pooling, session pooling, and statement pooling to match different application needs
It also helps deliver performance benefits with reduced resource consumption and higher throughput. Throughout this architecture, the existing monitoring systems were integrated to maintain visibility across the entire solution.
PgBouncer was part of the application for load balancing purposes, and in this implementation, it helped to incrementally execute the cutover process.
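The slot allocation described earlier in this section (one publication per high-traffic table, with the remaining tables balanced across shared publications) can be sketched as a small planning script. This is an illustration under stated assumptions, not Kajabi's tooling: the table names, sizes, and function names are hypothetical, and the balancing rule (largest table first into the currently smallest group) is one simple choice among many. With default settings, each subscription created on the target for a publication uses its own replication slot on the source.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    name: str
    tables: list = field(default_factory=list)
    size_gb: float = 0.0


def plan_publications(table_sizes_gb: dict, hot_tables: set, shared_groups: int) -> list:
    """Isolate each hot table; spread the rest across shared groups by size."""
    hot = [Publication(f"pub_hot_{t}", [t], table_sizes_gb[t]) for t in sorted(hot_tables)]
    shared = [Publication(f"pub_shared_{i + 1}") for i in range(shared_groups)]
    rest = sorted(
        (t for t in table_sizes_gb if t not in hot_tables),
        key=lambda t: table_sizes_gb[t],
        reverse=True,
    )
    for table in rest:
        # Greedy balancing: put the next-largest table in the smallest group.
        target = min(shared, key=lambda p: p.size_gb)
        target.tables.append(table)
        target.size_gb += table_sizes_gb[table]
    return hot + [p for p in shared if p.tables]


def create_publication_sql(pub: Publication) -> str:
    """Render the statement to run on the source (identifiers assumed safe)."""
    return f"CREATE PUBLICATION {pub.name} FOR TABLE {', '.join(pub.tables)};"


# Hypothetical tables and sizes in GB.
sizes = {"email_events": 900, "orders": 100, "users": 60, "posts": 50}
plan = plan_publications(sizes, hot_tables={"email_events"}, shared_groups=2)
for pub in plan:
    print(create_publication_sql(pub))
```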
Figure 1 shows the architecture diagram before the upgrade, where Sidekiq worker Pods and web Pods indicate Amazon Elastic Kubernetes Service (Amazon EKS) pods.

Figure 1: Live traffic before upgrading from PostgreSQL 12 to PostgreSQL 13 (and later from version 13 to version 15)
During the cutover window, the team disabled traffic from live users by using a Cloudflare maintenance page and paused workers. After the replication fully synchronized from PostgreSQL 12 to 13, they used PgBouncer to cut over to the new cluster. When the new database was fully operational, they disabled the maintenance page, and live traffic began using the PostgreSQL 13 cluster (see Figure 2).

Figure 2: Live traffic using PostgreSQL 13 database cluster after a successful replication
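A PgBouncer-based cutover of this kind can be sketched as follows. This is a hedged illustration rather than Kajabi's actual procedure: the alias, host, and database names are hypothetical, and the helper functions are ours. `PAUSE`, `RELOAD`, and `RESUME` are PgBouncer admin console commands; the `[databases]` entry is rewritten in `pgbouncer.ini` between the pause and the reload.

```python
def databases_entry(alias: str, host: str, dbname: str, port: int = 5432) -> str:
    """Render one line of the [databases] section in pgbouncer.ini."""
    return f"{alias} = host={host} port={port} dbname={dbname}"


def cutover_steps(alias: str) -> list:
    """Admin console commands, in order, for repointing one database alias."""
    return [
        f"PAUSE {alias};",   # hold clients and wait for in-flight work to finish
        "RELOAD;",           # pick up the rewritten [databases] entry
        f"RESUME {alias};",  # release clients, now routed to the new cluster
    ]


# Hypothetical alias and endpoint for the upgraded cluster.
print(databases_entry("app_main", "new-cluster.example.internal", "app"))
for command in cutover_steps("app_main"):
    print(command)
```

Because clients keep connecting to the same PgBouncer alias, the application configuration does not need to change during the switch.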
Upgrade strategy
The team developed a two-phase upgrade approach:
Phase 1: Initial upgrade and cost optimization
- Upgrade to an intermediate version (PostgreSQL v13) supporting Aurora I/O-Optimized storage.
- Establishment of baseline performance metrics including CPU, I/O, slow / inefficient / costly query tracking, replication lag (must be low since reads are sent to replicas), shared buffer usage, vacuum frequency and timings, read/write latencies, deadlock monitoring, total I/O and more.
Phase 2: Feature enhancement and modernization (repeat as needed to reach the latest supported version)
- Upgrade to PostgreSQL v15 for performance optimization
- Implementation of advanced PostgreSQL features
- Enhancement of replication infrastructure
During the upgrade, Kajabi balanced cost considerations with their performance requirements. Rather than overprovisioning resources across database components, they allocated dedicated resources only where necessary. This approach helped them maintain their performance service level agreements (SLAs) while optimizing infrastructure costs. The team also explored future optimization opportunities, including table partitioning to improve performance and reduce storage and maintenance costs for their largest tables. Other key measures were optimization of vacuum and maintenance windows and fine-tuning of auto-vacuum parameters.
Phase 1: Initial upgrade and cost optimization
In this stage, Kajabi upgraded their databases to an intermediate version that supported Aurora I/O-Optimized storage, bringing them to a baseline where they could take advantage of Aurora I/O-Optimized billing. Kajabi chose the Aurora I/O-Optimized storage configuration after an extensive evaluation that estimated savings of 82%. The estimate was for an Aurora database cluster with instance class r6g.16xlarge, a storage size of 30 TB, and total I/O usage of 361,631 million I/Os. The estimate followed the method in Estimate cost savings for the Amazon Aurora I/O-Optimized feature using Amazon CloudWatch.
The entire process was tested in lower and staging environments before production implementation. As a result, Kajabi upgraded their databases with minimal downtime. The actual cutover took only a few minutes. More importantly, it enabled Aurora I/O-Optimized billing configuration, resulting in cost savings. The following cost categories were tracked to determine realized savings from November 2024 to February 2025:
- Aurora:StorageIOUsage
- InstanceUsageIOOptimized:db.r6g.16xl
- Aurora:StorageUsage
- InstanceUsage:db.r6g.16xl
- Aurora:BackupUsage
- Aurora:IO-OptimizedStorageUsage
While InstanceUsageIOOptimized:db.r6g.16xl increased from $0 to $13,849, Aurora:StorageIOUsage fell from $90,500 to $4.51. The total spend across the observed categories dropped from $106,536 (November 1, 2024) to $20,733 (February 1, 2025), a cost saving of 80.53%.
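The savings figure can be checked with simple arithmetic on the totals quoted above. The short sketch below uses only the rounded dollar amounts from this post, so it reproduces the reported 80.53% only to within rounding. It also shows the share of the November spend that went to I/O charges, which is the kind of ratio AWS pricing guidance points to when it suggests I/O-Optimized for clusters whose I/O charges exceed roughly 25% of total Aurora spend.

```python
def percent_saved(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100


total_nov = 106_536  # total across tracked categories, November 2024
total_feb = 20_733   # total across tracked categories, February 2025
io_nov = 90_500      # Aurora:StorageIOUsage, November 2024

saving = percent_saved(total_nov, total_feb)
io_share = io_nov / total_nov * 100

print(f"Saving: {saving:.1f}%")
print(f"I/O share of November spend: {io_share:.1f}%")
```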
Phase 2: Feature enhancement and modernization
Building on their initial success, Kajabi embarked on the second phase, upgrading their Aurora PostgreSQL database from version 13.16 to 15.10. They skipped version 14 because the required downtime for each upgrade was something Kajabi couldn’t afford. PostgreSQL v15 offered a parallel vacuum feature. Kajabi ran nightly vacuums on their high-traffic tables which were auto-vacuuming constantly through high-traffic hours (daytime). This comprehensive upgrade encompassed their main site database and two critical payment processing systems that formed the backbone of their ecommerce operations.
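A nightly vacuum job of the kind mentioned above could generate its statements as in the following sketch. The table names, worker count, and function names are hypothetical. The `PARALLEL` option of `VACUUM` is available in PostgreSQL 13 and later and applies to the index vacuuming phase.

```python
def vacuum_statement(table: str, parallel_workers: int = 4) -> str:
    """Build a manual VACUUM statement with parallel index processing."""
    if parallel_workers < 0:
        raise ValueError("parallel_workers must be zero or greater")
    return f"VACUUM (ANALYZE, PARALLEL {parallel_workers}) {table};"


def nightly_plan(hot_tables: list, parallel_workers: int = 4) -> list:
    """One statement per high-traffic table, in the order given."""
    return [vacuum_statement(t, parallel_workers) for t in hot_tables]


# Hypothetical high-traffic tables, scheduled for low-traffic hours.
for statement in nightly_plan(["email_events", "page_views"], parallel_workers=2):
    print(statement)
```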
Drawing from their first upgrade’s success, the team implemented a dual-strategy approach. For their main production database, they deployed PostgreSQL native replication with a carefully calibrated configuration of 6 to 8 replication slots. Kajabi dedicated specific replication slots to their highest-traffic tables, which resolved the periodic write-ahead log (WAL) backlog issues they had previously faced. To optimize the system, the team ran vacuum analyze before cutover, implemented DDL migration freezes to prevent divergence, and used PgBouncer for precise traffic management during the transition.
Although Kajabi was comfortable using Aurora blue/green deployments, that approach wouldn’t work for the primary database. A few large, high-traffic tables would back up the WAL past the point where a single publication/subscription could keep up with the volume of changes.
For their payment-related databases, Kajabi employed an Amazon Aurora blue/green deployment strategy. The disk size and traffic were manageable, so they could keep up with traffic using a single replication slot. The process began methodically: cloning the current production database, upgrading the clone to PostgreSQL 15.10, and establishing logical replication from the production database to the clone. Each step followed a detailed event playbook, from systematic table vacuuming on the clone to the coordinated cutover using PgBouncer.
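One way to watch for this kind of backlog is to compare each logical slot's confirmed position with the current WAL position on the source. The sketch below is an assumption-laden illustration, not Kajabi's monitoring: the query uses the standard `pg_replication_slots` view and `pg_current_wal_lsn()`, while the slot names, sample LSNs, threshold, and helper functions are hypothetical. An LSN such as `16/B374D848` is two hexadecimal numbers, the high and low 32 bits of a byte position.

```python
LAG_QUERY = """
SELECT slot_name, confirmed_flush_lsn, pg_current_wal_lsn() AS current_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';
"""


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN like '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def slot_lag_bytes(rows: list) -> dict:
    """Map slot name to bytes of WAL not yet confirmed by the subscriber."""
    return {
        name: lsn_to_int(current) - lsn_to_int(flushed)
        for name, flushed, current in rows
    }


def lagging_slots(rows: list, threshold_bytes: int) -> list:
    """Names of slots whose lag exceeds the threshold, sorted for stable output."""
    lags = slot_lag_bytes(rows)
    return sorted(name for name, lag in lags.items() if lag > threshold_bytes)


# Sample rows shaped like the query result (values are made up).
rows = [("slot_hot", "0/1000", "0/3000"), ("slot_shared", "0/2F00", "0/3000")]
print(slot_lag_bytes(rows))
print(lagging_slots(rows, threshold_bytes=1024))
```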
This carefully orchestrated approach resulted in a smooth upgrade that minimized downtime while maintaining data integrity. With PostgreSQL 15.10, Kajabi could use the latest features and optimizations, enhancing their platform’s performance and efficiency for their users.
The PostgreSQL v13 to v15 cutover completed with under 10 minutes of downtime, about 5 minutes less than the previous upgrade, because of lessons learned and added automation. The total cutover process took about 2 hours, including post-upgrade verification. There was no performance or query degradation following the upgrade. Kajabi evaluated existing load testing tools and analyzed the results of pgreplay and similar tools for replicating production load. They also monitored slow or inefficient queries and database metrics during the upgrade to address any query degradation. In the future, Kajabi would like to evaluate the new database under real-world query load, as a valuable addition to the upgrade process.
Through this phased journey, Kajabi demonstrated a strategic approach to database management that balanced performance with cost optimization.
Lessons learned: Insights from the migration frontlines
Kajabi’s migration journey was more than a technical exercise. It was a valuable learning experience with insights applicable beyond their specific use case. Here are the key lessons:
- Customization is key. Kajabi’s tailored replication strategy, with dedicated slots for high-traffic tables, addressed their unique data patterns effectively. This approach allowed them to monitor replication slots and traffic patterns alongside database sizes. Kajabi also used performance insights that helped identify the high-volume write queries. After a few trials, they determined which databases to group together and which ones to isolate during the replication process. One of the biggest wins following the upgrade from PostgreSQL 12 to 13 was the introduction of manual, parallelized nightly vacuum jobs across high-traffic tables. Many tables had previously been in a state of constant auto-vacuuming throughout the day, especially during peak hours. The nightly vacuum jobs relieved much of the pressure on these tables, and many of them now vacuum only overnight, during low-traffic hours.
- The power of thorough planning. Kajabi’s success hinged on meticulous preparation. By testing extensively in lower environments and creating detailed playbooks, they minimized risks and surprises during migrations.
- Balancing act of cost and performance. Through strategic resource allocation, Kajabi optimized costs without sacrificing performance. This demonstrated that cost-cutting and performance improvement aren’t mutually exclusive goals.
- The virtue of a phased approach. By breaking the process into stages, Kajabi managed risks more effectively and gained valuable insights that informed subsequent steps.
The experiences from Phase 1 resulted in creating a series of scripts that targeted:
- adding replication slots
- initiating replication
- verifying that there are no data gaps
- finalizing the cutover to the new cluster
Previously manual steps were automated. These included verifying that PgBouncer was not routing to the old cluster, swapping the PgBouncer configuration to the new cluster, and more. This automation helped reduce the downtime from approximately 15 minutes for the PostgreSQL 13 upgrade to under 10 minutes for the PostgreSQL 15 upgrade.
- Collaboration breeds success. Close cooperation with AWS account and specialist teams proved crucial in overcoming challenges. This underscored the importance of using external expertise when navigating complex technical landscapes.
These lessons guided Kajabi through their migration and prepared them for future database challenges.
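The data-gap verification step mentioned among the automation scripts could look something like the following sketch. It is an assumption about one reasonable check, not a description of Kajabi's scripts: it compares per-table snapshots of row count and maximum primary key taken from the source and the target once writes are paused. The table names, values, and function name are hypothetical.

```python
def find_gaps(source_stats: dict, target_stats: dict) -> dict:
    """Return tables whose (row_count, max_id) differ or are missing on the target."""
    gaps = {}
    for table, source_value in source_stats.items():
        target_value = target_stats.get(table)
        if target_value != source_value:
            gaps[table] = {"source": source_value, "target": target_value}
    return gaps


# Hypothetical snapshots: table -> (row_count, max_id).
source = {"orders": (1200, 5001), "users": (300, 300), "posts": (50, 75)}
target = {"orders": (1200, 5001), "users": (299, 299)}

print(find_gaps(source, target))
```

An empty result would indicate that the target matches the source on these two measures, which is a necessary but not sufficient condition for a complete copy.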
Conclusion
In this post, we shared how Kajabi upgraded their existing Aurora PostgreSQL databases using a phased, customized approach. Kajabi achieved cost savings of 80.53% and established a framework for future growth and innovation. They demonstrated that it’s possible to optimize costs without compromising performance or reliability, a balance many growing tech companies struggle to achieve.
In addition to upgrading to Aurora PostgreSQL 16, Kajabi is planning to re-architect one of their largest and highest-throughput tables, which will allow them to use Aurora blue/green deployments in future upgrades.