How InterWiz reduced AI costs by 90% with Amazon Bedrock

By: Syed Sabih ur Rehman, Pre-Sales Solutions Architect – Emumba
By: Ore Okebukola, Partner Solutions Architect – AWS

Emumba

If you’re building an AI-powered software as a service (SaaS) product, you’ve likely watched your AI infrastructure costs grow faster than your revenue. What starts at $500 per month during prototyping can reach $50,000 per month in production. In this post, you’ll learn how Emumba, an AWS Advanced Tier Services Partner, helped InterWiz cut AI costs by 90%, improve response times by 55%, and maintain 99.9% uptime with no service disruption by migrating their AI workloads to Amazon Bedrock.

When you’re locked into a single AI provider, rising costs limit growth and compress margins. You lose the ability to evaluate better-performing models. You can’t tune costs independently per workload. When competitors adopt newer models, you’re left waiting on your provider’s release schedule. The result is a difficult tradeoff: raise prices and risk losing customers or absorb costs and sacrifice profitability. Migrating to Amazon Bedrock resolved that tradeoff for InterWiz. InterWiz is an AI-powered recruitment company conducting more than 1,000 interviews monthly. Rising AI costs were blocking their path to profitable growth, and they had a tight window to address it.

The InterWiz story: From cost crisis to competitive advantage

InterWiz automates candidate screening through structured, AI-driven interviews at scale, so recruiters can focus on higher-value work. The system asks contextual follow-up questions, probes for deeper insights, and generates comprehensive candidate evaluations. InterWiz was gaining strong customer traction but faced two business constraints, a cost constraint and an architecture limitation, that limited their growth. At $0.25 per interview using GPT-4 Turbo, AI expenses consumed 40% of per-interview revenue. This made aggressive growth unprofitable. InterWiz was losing deals to competitors with lower AI costs who offered better pricing. They needed to scale to achieve venture-backed growth targets, but their infrastructure costs made scaling unsustainable. Different interview functions have different requirements. Question generation needs strong reasoning and creativity. Real-time interviews require low latency and reliable adherence to instructions. Candidate evaluation demands consistency and cost efficiency at scale. With access to only one model family, InterWiz couldn’t tune each function independently. Response latency averaging 850 milliseconds created noticeable pauses during live interviews, affecting candidate experience.

Why Emumba

InterWiz chose to work with an AWS Partner Network member rather than attempting direct implementation, a decision that proved valuable. InterWiz engaged Emumba, an AWS Advanced Tier Services Partner with the AWS Generative AI Services Competency, to lead the migration. Three factors drove this decision:

AWS Partner Network benefits – As an AWS Advanced Tier Services Partner, Emumba unlocked direct access to AWS funding programs, including proof-of-concept (POC) credits and Well-Architected reviews, plus dedicated support from AWS technical account teams and solutions architects. This reduced migration risk, kept the work aligned with AWS best practices, and lowered upfront costs.
Proven migration framework – Emumba’s systematic seven-phase methodology covered assessment, model evaluation, prompt optimization, progressive rollout, and multi-provider redundancy. Each phase included defined checkpoints and rollback controls, so InterWiz had a structured path to production with minimal disruption risk.
Amazon Bedrock expertise – Emumba’s deep experience with Amazon Bedrock helped InterWiz run a comprehensive evaluation of Claude by Anthropic in Amazon Bedrock and Meta’s LLama in Amazon Bedrock, identifying the most suitable fit for each interview function.

The migration approach

The goal was clear: reduce AI costs by at least 60%, improve response times to under 500 milliseconds, and maintain or improve interview quality while conducting thousands of live interviews without service disruption. Emumba’s seven-phase migration framework guided the transition from model assessment through full production deployment:

Collect information
Select candidate models
Prompt optimization
Model evaluation
Compare and select model
Migrate
Optimize

The following diagram illustrates this framework.

Figure 1: Emumba’s systematic seven-phase migration framework

Emumba conducted comprehensive model assessments using Amazon Bedrock, which provides access to high-performing foundation models (FMs) through a single API, with built-in capabilities for building generative AI applications with security, privacy, and responsible AI. Emumba’s team evaluated Anthropic’s Claude and Meta’s LLama models against InterWiz’s specific requirements.

Emumba designed a specialized architecture that assigned each interview function to the FM best suited for it. Anthropic’s Claude 3.5 Sonnet, a reasoning-focused FM, powered intelligent question generation. Meta’s LLama 3.3 70B, a cost-efficient FM optimized for speed, handled real-time interviews, contextual follow-ups, and candidate evaluation.

Migrating from GPT-4 Turbo introduced prompt compatibility challenges. Because Anthropic’s Claude and Meta’s LLama follow different instruction patterns, Emumba’s team rewrote and tested prompts for each model using Amazon Bedrock prompt optimization tooling, including prompt caching (which can reduce costs by up to 90% and latency by up to 85%) and Amazon Bedrock Intelligent Prompt Routing (which can reduce costs by up to 30% by automatically directing requests to the most cost-effective FM). To validate quality, Emumba’s team applied a large language model (LLM)-as-a-judge evaluation approach, using an FM to score outputs against a defined quality rubric and confirming that outputs met or exceeded baseline standards before each module moved to production.

The results

The migration followed a structured 3-month timeline. Month 1 focused on model assessment and architecture design. Month 2 prioritized progressive rollout of low-risk modules. Month 3 completed full migration with optimization. Emumba measured these outcomes over a 90-day post-migration period across production interview workflows:

90% cost reduction – AI expenses dropped from $0.25 to $0.025 per interview. As InterWiz scales from 1,000 to a projected 10,000 monthly interviews by year-end 2026, this migration will save $27,000 annually, transforming the cost-to-revenue ratio from barely sustainable to highly profitable at scale.
55% latency improvement – Response times decreased from 850 ms to 450 ms. Candidate feedback improved noticeably, with interviewees specifically mentioning the conversation felt “more natural” and “less robotic.”
99.9% uptime – The multi-provider architecture with automatic fallback maintained continuous service availability, even when primary or secondary providers experienced regional issues.

The following graphic summarizes the measurable business outcomes InterWiz achieved.

Figure 2: Business outcomes from InterWiz’s migration to Amazon Bedrock (measured over 90-day post-migration period)

“Migrating our AI stack from Azure OpenAI to Amazon Bedrock was a complex undertaking, but AWS CoE at Emumba handled it with precision and engineering depth. They built the abstraction layers, tuned our prompts, and managed rollout risks so thoroughly that we experienced no disruption, only improvement. The new setup is faster, cheaper, and far more scalable. We couldn’t have asked for a smoother transition.”

— Zishan Iqbal, CEO, InterWiz AI

The competitive impact

Beyond immediate cost savings, the migration created measurable strategic advantages. InterWiz can now profitably serve market segments previously off-limits due to pricing constraints and can invest in advanced features such as role-specific question strategies and industry-tailored evaluation criteria.As AWS adds new FMs to Amazon Bedrock, InterWiz can evaluate and integrate them within days, not months, without waiting on a provider’s roadmap. This architectural flexibility compounds as model innovation accelerates.

Three strategic takeaways for AI-powered SaaS leaders

Architectural decisions made during prototyping often become strategic liabilities at scale, and this post highlights three beneficial outcomes of the InterWiz migration:

Cost predictability at scale – The transparent pricing and multi-model flexibility of Amazon Bedrock provide a path to managing FM costs as you grow. By systematically evaluating which models deliver the most suitable cost-performance balance for specific functions, InterWiz reduced AI costs by 90% without sacrificing quality. You can apply the same approach to your own workloads.
Model choice as strategic advantage – Amazon Bedrock provides access to multiple leading FMs, including Anthropic’s Claude, Cohere Command in Amazon Bedrock, Meta’s LLama, Amazon Nova, and Amazon Titan, through a single API, so you can select the most suitable model for each job. When new FMs emerge, you can adopt them immediately.
AWS Partner Network benefits for regulated industries – For companies in regulated industries, Amazon Bedrock is in scope for System and Organization Controls 1, 2, and 3 reports and is a Health Insurance Portability and Accountability Act (HIPAA) eligible service. AWS also supports customers’ General Data Protection Regulation (GDPR) compliance efforts through data residency controls and contractual commitments.

Under the AWS Shared Responsibility Model, your organization is responsible for implementing the controls required to meet your specific compliance obligations. Working with an AWS Advanced Tier Services Partner such as Emumba provides validated expertise backed by AWS quality assurance, access to funding programs that lower financial barriers, and a path to continuous model innovation through the expanding FM catalog of Amazon Bedrock.

Looking forward

InterWiz’s journey demonstrates what is possible when an AWS Partner Network member guides a production generative AI migration: 90% cost reduction, 55% latency improvement, and a zero-downtime transition completed in 3 months. Building for multi-model flexibility from the start is one of the most consequential infrastructure decisions you can make as an AI-powered SaaS company.Companies that migrate to Amazon Bedrock are positioned to take advantage of FM innovation as it happens. Those locked into single-provider architectures carry higher switching costs as the model landscape evolves. A migration assessment is a practical first step to understanding your potential savings and risk profile.

To learn more about Emumba, refer to Emumba in AWS Partner Network. To contact Emumba, visit the Emumba AWS Partner contact page.

Emumba – AWS Partner spotlight

When you work with Emumba, you gain access to production-grade generative AI implementations, cloud migration and modernization expertise, and data strategy capabilities, backed by more than 300 certified engineers across the Bay Area, Islamabad, and Dubai. Emumba is an AWS Advanced Tier Services Partner with the AWS Generative AI Services Competency, helping companies in healthcare, travel, and software industries build AI infrastructure that scales profitably.

Contact Emumba | Partner Overview | AWS Marketplace

AWS Partner Network (APN) Blog

How InterWiz reduced AI costs by 90% with Amazon Bedrock

The InterWiz story: From cost crisis to competitive advantage

Why Emumba

The migration approach

The results

The competitive impact

Three strategic takeaways for AI-powered SaaS leaders

Looking forward

Emumba – AWS Partner spotlight

Resources

Follow

Learn

Resources

Developers

Help