You CAN Manage, Forecast, and Evaluate AI Costs

As a former CFO, I view AI from a financial perspective, not a technological one. How can you control AI costs? How do you know your company is getting value from every dollar it spends on AI? And how can you forecast spending when AI is evolving quickly and future use cases remain uncertain?

Because AI is so exciting, vendors talk about models, tokens, and architectures. But that gives the CFO little guidance on cost control, forecasting, and ROI. As a result, the technology can appear expensive and risky, even though it may be one of the highest-ROI initiatives a company can undertake.

One difficulty is that AI doesn’t fit neatly into traditional budgeting cycles. The technology evolves every three months. Prices drop. Capabilities expand. Usage scales unpredictably. Model inference costs have dropped roughly 75% in the last 18 months and are likely to continue falling. Companies use AI more as its capabilities increase and teams find new ways to use it. Then those teams produce new applications faster and faster. Total spend rises, but the value generated per dollar may be growing even faster.

Where does this leave the CFO?

Managing Costs

The biggest savings are found in how you design and operate AI workloads.

Choose the Right Model

Not every task requires the largest, most expensive LLM. Simple classification, data extraction, or routing tasks can run efficiently on smaller, lower-cost models. Reserve frontier models for complex reasoning where the incremental intelligence justifies the incremental cost.

Glean’s CEO recently noted that 95% of enterprise AI still runs on the most expensive models. We’ve seen organizations cut inference costs by 60%–80% simply by aligning model size to task complexity. It’s not hard: Tools like Amazon Bedrock Intelligent Prompt Routing automatically evaluate each prompt and route it to the most cost-effective model within a model family that can still deliver the required quality.
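
To make the arithmetic concrete, here is a minimal Python sketch of routing by task type. The tier names, per-token prices, task categories, and workload volumes are all hypothetical placeholders, not real vendor pricing. A managed router such as the one mentioned above evaluates each prompt individually; this sketch uses a fixed lookup only to show how the savings arise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_1k_tokens: float  # hypothetical blended price, not a real quote


# ASSUMPTION: two illustrative tiers with made-up prices.
SMALL = ModelTier("small-model", 0.0005)
FRONTIER = ModelTier("frontier-model", 0.0150)

# ASSUMPTION: task types considered simple enough for the small tier.
SIMPLE_TASKS = {"classification", "extraction", "routing"}


def choose_model(task_type: str) -> ModelTier:
    """Send simple tasks to the small tier, everything else to the frontier tier."""
    return SMALL if task_type in SIMPLE_TASKS else FRONTIER


def monthly_cost(workload: dict[str, int], route: bool) -> float:
    """workload maps task type -> tokens per month."""
    total = 0.0
    for task_type, tokens in workload.items():
        tier = choose_model(task_type) if route else FRONTIER
        total += tokens / 1000 * tier.usd_per_1k_tokens
    return total


# Hypothetical monthly token volumes by task type.
workload = {
    "classification": 40_000_000,
    "extraction": 30_000_000,
    "complex_reasoning": 30_000_000,
}

all_frontier = monthly_cost(workload, route=False)
routed = monthly_cost(workload, route=True)
savings = 1 - routed / all_frontier

print(f"All frontier: ${all_frontier:,.2f}")
print(f"Routed:       ${routed:,.2f}")
print(f"Savings:      {savings:.0%}")
```

With these made-up inputs, the all-frontier bill is $1,500 and the routed bill is $485. The figures only illustrate the mechanism; substitute your own volumes and prices to see what routing is worth to you.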

Eliminate Waste

Most companies unknowingly pay for the same answers repeatedly. If your AI system processes the same customer query 1,000 times a day, you shouldn’t pay for 1,000 inferences. Caching, whether at the prompt, semantic, or response level, can reduce redundant processing by 40%–60%. For high-volume, repeated interactions like customer support, HR FAQs, or internal knowledge lookup, this is one of the fastest levers to lower unit costs. And it requires no re-engineering of logic, just smart execution.
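
As an illustration, here is a minimal sketch of the simplest variant, a response-level cache keyed on the normalized prompt text. The `ResponseCache` class and the `fake_model` function are hypothetical stand-ins for your own code and your real model call. This is exact-match caching only; semantic caching (matching differently worded questions) needs embeddings and is not shown, and a production cache would also need expiry and size limits.

```python
import hashlib
from typing import Callable


class ResponseCache:
    """Exact-match response cache: pay for one inference per distinct question."""

    def __init__(self, model_call: Callable[[str], str]):
        self._model_call = model_call
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize case and whitespace so trivial variations share one entry.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1          # served from cache, no inference cost
            return self._store[key]
        self.misses += 1            # only a miss triggers a paid model call
        answer = self._model_call(prompt)
        self._store[key] = answer
        return answer


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a paid inference call."""
    return f"answer to: {prompt.strip().lower()}"


cache = ResponseCache(fake_model)
for _ in range(1000):
    cache.ask("What is our PTO policy?")
cache.ask("what is our   PTO policy? ")  # differs only in case and spacing

print(f"Paid inferences: {cache.misses}, served from cache: {cache.hits}")
```

In this toy run, 1,001 requests result in a single paid inference. Real traffic is far less uniform, so measure your actual hit rate before counting on a specific saving.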

Examine Your Prompts

What many call “prompt engineering,” I call cost engineering. The way you structure your prompts directly impacts token usage, latency, and output quality. A well-scoped prompt with clear constraints produces accurate results in fewer tokens. Teams that treat prompts as code—versioned, tested, optimized—see consistent cost reductions of 30%–50% without sacrificing output quality. Shorter prompts, fewer examples, and tighter instructions are small changes that compound into real savings.
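
The link between prompt length and cost is simple arithmetic, sketched below. The per-token prices and the token counts for the "verbose" and "trimmed" prompts are hypothetical; in practice you would take token counts from your provider's usage reports.

```python
def request_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_1k_input: float,
    usd_per_1k_output: float,
) -> float:
    """Cost of one request, with input and output tokens priced separately."""
    return (
        input_tokens / 1000 * usd_per_1k_input
        + output_tokens / 1000 * usd_per_1k_output
    )


# ASSUMPTION: illustrative prices, not a real rate card.
PRICE_IN = 0.003
PRICE_OUT = 0.015

# ASSUMPTION: token counts before and after tightening the prompt.
verbose = request_cost(1500, 400, PRICE_IN, PRICE_OUT)
trimmed = request_cost(700, 300, PRICE_IN, PRICE_OUT)
reduction = 1 - trimmed / verbose

requests_per_month = 1_000_000
print(f"Verbose prompt: ${verbose * requests_per_month:,.0f} per month")
print(f"Trimmed prompt: ${trimmed * requests_per_month:,.0f} per month")
print(f"Reduction:      {reduction:.0%}")
```

Tracking a number like this for each prompt version, alongside a quality check, is one way to treat prompts as code: a change that raises cost per request without improving output is a regression.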

Separate Real-Time from Batch Workloads

Real-time inference is expensive because it demands immediate compute and low latency. But not every task needs it. Report generation, data enrichment, document summarization—anything not in front of a live user can be batched. Asynchronous batch processing significantly reduces the cost per inference. You can reserve real-time spending for moments that demand it: customer interactions, real-time decision support, or dynamic content generation.
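
A rough way to size this opportunity is to tag each workload as user-facing or not and reprice the rest at a batch rate. In the sketch below, the workload list, the dollar amounts, and the 50% `BATCH_DISCOUNT` are all assumptions for illustration; check your provider's actual batch pricing.

```python
from dataclasses import dataclass

# ASSUMPTION: batch inference priced at half the real-time rate.
BATCH_DISCOUNT = 0.5


@dataclass(frozen=True)
class Workload:
    name: str
    user_facing: bool         # True if a live user is waiting on the answer
    realtime_cost_usd: float  # monthly cost if run in real time


def monthly_cost(workloads: list[Workload], batch_where_possible: bool) -> float:
    """Total monthly cost, optionally moving non-user-facing work to batch."""
    total = 0.0
    for w in workloads:
        if batch_where_possible and not w.user_facing:
            total += w.realtime_cost_usd * (1 - BATCH_DISCOUNT)
        else:
            total += w.realtime_cost_usd
    return total


# Hypothetical workloads and costs.
workloads = [
    Workload("customer chat", user_facing=True, realtime_cost_usd=4000.0),
    Workload("report generation", user_facing=False, realtime_cost_usd=3000.0),
    Workload("document summarization", user_facing=False, realtime_cost_usd=2000.0),
    Workload("data enrichment", user_facing=False, realtime_cost_usd=1000.0),
]

all_realtime = monthly_cost(workloads, batch_where_possible=False)
split = monthly_cost(workloads, batch_where_possible=True)

print(f"Everything real-time: ${all_realtime:,.0f}")
print(f"Batch where possible: ${split:,.0f}")
```

The useful part of this exercise is the tagging itself: it forces each team to state whether anyone is actually waiting on the result.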

These are just a few examples of how engineering decisions dramatically affect cost. You have more control over expenses than you might think.

Forecasting Spending

When finance leaders ask me, “How do I budget for AI?” they’re really asking, “How do I commit capital to something uncertain without risking my credibility?” The answer is disciplined execution. Build a solid foundation, then deliver measurable results—repeatedly. The evidence you gather improves your forecasting accuracy over time.

Phase 1

Building your initial foundation is mostly a fixed cost. It includes your platform setup, data preparation, security, compliance, workforce upskilling, and initial pilots. Compute spending at this point is minimal, typically 10%–15% of total investment. The real costs are in data readiness, integration, change management, and governance. Skip them, and you’ll hit roadblocks when you try to scale. You can budget here with confidence. Treat it like any early-stage investment and keep it controlled, measured, and tied to learning.

Phase 2

As adoption increases, total inference costs grow, but unit costs decline: caching kicks in, prompts are optimized, and routing gets smarter, so cost per outcome drops. As models improve, the value created can grow faster than spend. Plan in ranges, not point estimates, and use conservative, base, and aggressive scenario bands. Expect variability and build in flexibility.
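
Scenario bands can be built with a very small model: usage grows each month while unit cost declines. The sketch below is one possible setup, not a recommended forecast. The starting volume, starting unit cost, and the growth and decline rates in `SCENARIOS` are hypothetical and should be replaced with figures from your own pilots.

```python
def forecast_spend(
    months: int,
    start_requests: float,
    start_unit_cost: float,
    monthly_growth: float,
    monthly_unit_cost_decline: float,
) -> list[float]:
    """Monthly spend when usage compounds upward and unit cost compounds downward."""
    spend = []
    requests, unit_cost = start_requests, start_unit_cost
    for _ in range(months):
        spend.append(requests * unit_cost)
        requests *= 1 + monthly_growth
        unit_cost *= 1 - monthly_unit_cost_decline
    return spend


# ASSUMPTION: illustrative rates for each band.
SCENARIOS = {
    "conservative": {"monthly_growth": 0.05, "monthly_unit_cost_decline": 0.02},
    "base": {"monthly_growth": 0.10, "monthly_unit_cost_decline": 0.04},
    "aggressive": {"monthly_growth": 0.20, "monthly_unit_cost_decline": 0.06},
}

# ASSUMPTION: 100,000 requests in month one at $0.01 per request.
forecasts = {
    name: forecast_spend(12, 100_000, 0.01, **rates)
    for name, rates in SCENARIOS.items()
}
annual = {name: sum(months) for name, months in forecasts.items()}

for name, total in annual.items():
    print(f"{name:>12}: ${total:,.0f} over 12 months")
```

Note that "aggressive" here means higher adoption, so it is the high-spend band even though its unit cost falls fastest. Revisit the rates each quarter as real usage data replaces the assumptions.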

Phase 3

As AI matures within your organization, it stops being a project and becomes embedded in workflows. You stop asking “How much does this model cost?” and begin asking, “What is the cost to automate this process? To reduce this risk? To accelerate this decision?” Investment is tied directly to business impact, and AI becomes a value driver.

Organizations move through these phases at different speeds. Some compress them into 12–18 months; others take longer. The key is to apply the right financial model at each stage instead of forcing a one-size-fits-all approach.

Measuring Results

Perhaps the most important shift is cultural rather than technical. Instead of measuring tokens and their costs, you should measure outcomes. Business results can come in various forms: time saved, errors avoided, or throughput increased. A $0.50 inference that saves an analyst two hours of work isn’t expensive—it’s one of the best deals in enterprise software. A free service that produces unusable output is infinitely expensive.

Most organizations can tell you what they spent on AI in the last quarter, but few can tell you what they got in return. To accurately measure results, you need:

A baseline measurement of the process before AI touches it
A clear definition of the outcome you’re targeting (e.g., “reduce report generation from 5 hours to 20 minutes”)
An owner who is accountable for the business result, not the deployment
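
Once the baseline and the target outcome are defined, the value of each AI-assisted run is straightforward to compute. The sketch below uses the report-generation example from the list above (5 hours down to 20 minutes) and a $0.50 inference cost. The $60 loaded hourly rate and the `outcome_value` helper are assumptions for illustration, and time saved is only one of several outcomes you might value.

```python
def outcome_value(
    baseline_minutes: float,
    with_ai_minutes: float,
    hourly_rate_usd: float,
    ai_cost_usd: float,
) -> dict[str, float]:
    """Value of one AI-assisted run, measured against the pre-AI baseline."""
    if ai_cost_usd <= 0:
        raise ValueError("ai_cost_usd must be positive to compute a return multiple")
    minutes_saved = baseline_minutes - with_ai_minutes
    labor_value = minutes_saved / 60 * hourly_rate_usd
    return {
        "minutes_saved": minutes_saved,
        "labor_value_usd": labor_value,
        "net_value_usd": labor_value - ai_cost_usd,
        "return_multiple": labor_value / ai_cost_usd,
    }


# Baseline: 5 hours (300 minutes). Target: 20 minutes. AI cost per report: $0.50.
# ASSUMPTION: a fully loaded analyst cost of $60 per hour.
report = outcome_value(
    baseline_minutes=300,
    with_ai_minutes=20,
    hourly_rate_usd=60.0,
    ai_cost_usd=0.50,
)

for metric, value in report.items():
    print(f"{metric}: {value:,.2f}")
```

If the output is unusable and the analyst redoes the work, `with_ai_minutes` equals or exceeds the baseline and the net value turns negative, which is why the baseline measurement and an accountable owner matter more than the token price.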

This is familiar to anyone who lived through the first days of the cloud, when early adopters had higher costs but received exponentially more capability per dollar spent. The companies that thrived then didn’t obsess over the cost per compute hour. They connected every dollar to business outcomes and invested where the return was clear. AI is following the same path.

The Bottom Line: CFOs and AI

So where should CFOs focus?

Companies getting AI right are neither the biggest nor the smallest spenders. They’re the ones who can trace AI dollars to business outcomes. They use AI to drive efficiency, reduce cycle times, and free talent for higher-value work.

Given the uncertainty, you should value flexibility. It’s best to avoid long-term, fixed-quantity contracts and opt for consumption-based pricing that lets you scale up as usage grows and scale down as efficiency improves. Budget for increasing investment—but expect improving unit economics over time.

AI can be a substantial cost for companies. But its returns might have the biggest impact on your P&L. You have more control over its costs than you might expect, so you shouldn’t be afraid to invest wherever that impact can be greatest.

AWS Executive in Residence Blog