Guide
LLM Unit Economics: A Working Model for AI-Native B2B SaaS
The SaaS unit economics framework that built the cloud era assumed one thing: serving customers is roughly free at the margin. CAC, LTV, payback period, gross margin — every formula in the playbook leans on the assumption that variable cost per customer is small enough to ignore once the product is built.
That assumption breaks with AI-native SaaS. When 30–50% of revenue from a customer is going straight back out the door to OpenAI or Anthropic, the variable cost is no longer a rounding error. It is the dominant feature of the P&L. The unit economics math has to be rebuilt.
This guide is that rebuild. It walks through what changes, the new formulas, what "healthy" looks like at scale, and how to instrument the data to actually compute the numbers without hand-rolled spreadsheets every quarter.
What classical SaaS unit economics assumed
The reason the classical framework worked is that it described a business where:
- Variable cost per customer is dominated by hosting and support, both of which scale sublinearly with revenue. Doubling customers does not double infrastructure cost; it adds 20–40%.
- Gross margin is structural, set by the product architecture, and once hit at scale, is stable. A SaaS product running at 75% gross margin at $10M ARR will run at 78% gross margin at $100M ARR.
- The interesting question is acquisition. Because gross margin is stable and high, the management challenge is CAC payback, churn, and expansion. The cost side is solved.
Every formula in the SaaS canon — LTV/CAC, payback period, magic number — encodes those assumptions. They are not wrong; they describe the business that existed before AI inference was a line item.
What changes with AI
Three things break the classical model when AI is a material input:
1. Variable cost is now significant and customer-specific
A heavy AI user can cost you 50% of their MRR in inference. A light AI user costs 5%. The variance between customers is enormous, and it is impossible to estimate from outside the request log. Per-seat or per-account averaging hides which customers are actually profitable.
2. Gross margin is dynamic, not structural
In classical SaaS, gross margin is set by your architecture and improves slowly with scale. In AI-native SaaS, gross margin moves week to week with model price changes, traffic mix shifts, agent behavior changes, and prompt engineering improvements. A new model release from OpenAI can move your gross margin two points in a quarter. A misconfigured agent can move it five points in a week.
You cannot manage what you cannot see at this resolution. Quarterly gross margin review is too slow.
3. The interesting question shifts from CAC to gross margin per customer
When variable cost is large and customer-specific, the question "is this customer profitable?" stops being trivial. A customer with a $50K ARR contract and $30K of AI spend is not equivalent to a $50K ARR customer with $5K of AI spend. They are different business outcomes that happen to have the same top-line number.
The thing that matters for valuation, retention strategy, and pricing is contribution margin per customer, not just ARR. And contribution margin per customer requires per-request cost attribution, which classical SaaS finance never had to build. (See Per-customer attribution.)
The new unit-economics formulas
The replacement framework has four primitives:
Cost per request
The atomic unit. Every LLM call has a cost: input tokens × input price + output tokens × output price + (optional) overhead.
cost_per_request = (input_tokens × p_in) + (output_tokens × p_out)
For agentic workflows, "request" is the user-initiated action, not the individual LLM call. A single user prompt that triggers 200 internal calls has a cost that is the sum of all 200.
Cost per request is the only number you can actually instrument cleanly at the source. Everything else aggregates from it.
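A minimal sketch of this arithmetic in Python, assuming a per-call record that carries a model name and token counts. The model name and price table are placeholders, not real provider rates:

```python
# Hypothetical price table: (input $/1M tokens, output $/1M tokens).
# These numbers are placeholders; substitute your providers' actual rates.
PRICE_PER_1M_TOKENS = {
    "example-model": (2.50, 10.00),
}

def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    p_in, p_out = PRICE_PER_1M_TOKENS[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

def cost_per_request(calls: list[dict]) -> float:
    # For agentic workflows a "request" is the user-initiated action,
    # so sum the cost of every internal LLM call it triggered.
    return sum(
        cost_per_call(c["model"], c["input_tokens"], c["output_tokens"])
        for c in calls
    )
```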
Cost per customer
Sum of cost-per-request over all requests attributable to that customer in a period:
cost_per_customer_month = Σ cost_per_request for that customer in the month
The dimension that matters here is customer, not user, not seat. A 50-seat customer with one heavy power user can cost more than five light 50-seat customers combined. Aggregate at the level that maps to the contract.
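A sketch of the aggregation, assuming each request record already carries a customer identifier and a precomputed cost in dollars (the field names are illustrative):

```python
from collections import defaultdict

def cost_per_customer_month(requests: list[dict]) -> dict[str, float]:
    # Aggregate to the contract-level customer dimension, not user or seat.
    totals: defaultdict[str, float] = defaultdict(float)
    for r in requests:
        totals[r["customer_id"]] += r["cost_usd"]
    return dict(totals)
```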
Contribution margin per customer
The per-customer version of gross margin:
contribution_margin_per_customer = MRR - cost_per_customer_month - other_variable_cost_per_customer
contribution_margin_% = contribution_margin_per_customer / MRR
This is the number that says whether each individual customer is profitable. Aggregate it across the customer base and you get gross margin in the classical sense, but the interesting analysis is the distribution, not the average.
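In code, per customer-month. The inputs are whatever your billing and cost records produce; the names here are illustrative:

```python
def contribution_margin(mrr: float, ai_cost: float,
                        other_variable: float = 0.0) -> tuple[float, float]:
    """Return (margin in dollars, margin as a fraction of MRR)."""
    dollars = mrr - ai_cost - other_variable
    return dollars, dollars / mrr

# The distribution is the analysis: compute it per customer, then look
# at the spread, not the mean.
# margins = sorted(contribution_margin(mrr, cost)[1] for mrr, cost in rows)
```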
Cost per feature / per workflow
For product and engineering decisions, the same math applied to a different dimension:
cost_per_feature_month = Σ cost_per_request tagged to that feature
revenue_per_feature_month = (allocated revenue, often by usage proportion)
contribution_margin_per_feature = revenue - cost - other_variable
Feature-level unit economics is what tells you which features to invest in, which to deprecate, and which to price. Without it, every feature is a black box that may or may not be paying for itself.
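One way to sketch the allocation, splitting revenue by usage proportion. That basis is a common choice, not a standard; use whatever mapping your pricing supports:

```python
def feature_contribution_margins(
    feature_costs: dict[str, float],  # sum of cost_per_request per feature tag
    total_revenue: float,
    usage_share: dict[str, float],    # each feature's share of usage; sums to 1
) -> dict[str, float]:
    return {
        f: total_revenue * usage_share[f] - feature_costs[f]
        for f in feature_costs
    }
```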
A worked example
Take a hypothetical AI-native B2B SaaS company. Roughly $31M ARR, 240 customers, average $11K MRR. Looks healthy on the top line.
| Customer cohort | Customers | Avg MRR | Avg monthly AI cost | Contribution margin |
|---|---|---|---|---|
| Top decile (heavy users) | 24 | $24,500 | $19,400 | 21% |
| 2nd–4th decile | 72 | $14,800 | $7,200 | 51% |
| 5th–8th decile | 96 | $7,500 | $1,500 | 80% |
| Bottom two deciles (light users) | 48 | $4,200 | $300 | 93% |
The top decile generates 23% of the revenue but only 9% of the contribution dollars. The bottom two deciles are small in revenue but run at 93% margin. The same business looks like a high-growth SaaS company at the bottom and a commodity reseller at the top.
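Those shares fall straight out of the table. A quick recomputation from the hypothetical cohorts above:

```python
cohorts = [  # (name, customers, avg MRR, avg monthly AI cost)
    ("Top decile",         24, 24_500, 19_400),
    ("2nd-4th decile",     72, 14_800,  7_200),
    ("5th-8th decile",     96,  7_500,  1_500),
    ("Bottom two deciles", 48,  4_200,    300),
]
revenue = {n: c * mrr for n, c, mrr, _ in cohorts}
margin = {n: c * (mrr - ai) for n, c, mrr, ai in cohorts}
total_rev, total_margin = sum(revenue.values()), sum(margin.values())
for name, *_ in cohorts:
    print(f"{name}: {revenue[name] / total_rev:.0%} of revenue, "
          f"{margin[name] / total_margin:.0%} of contribution")
# Top decile: 23% of revenue, 9% of contribution.
```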
Three things become obvious from this table that are invisible in aggregate:
- The top decile has a pricing problem, not a cost problem. They are using the product as designed; the pricing is wrong for their usage pattern. The fix is contract restructuring or usage-based pricing, not engineering optimization.
- The middle two cohorts are the strategic core. They are profitable, scaling, and the place where retention investment pays off most.
- The aggregate gross margin (about 56%, which is what the table works out to) hides everything important. It is a useful headline number for investors. It is a useless number for management decisions.
This is what unit economics analysis looks like when AI is a material input. It is per-customer, per-feature, distribution-aware, and updated continuously rather than quarterly.
What "healthy" looks like
There is no single benchmark, but the patterns observed across AI-native SaaS in 2025–2026 suggest the following ranges. These are not prescriptive; they are calibration points.
| Metric | Concerning | Watching | Healthy |
|---|---|---|---|
| Aggregate gross margin (AI as COGS) | < 40% | 40–55% | 55–75% |
| Top-decile customer contribution margin | < 10% | 10–25% | 25–40% |
| Bottom-decile customer contribution margin | < 70% | 70–85% | 85–95% |
| AI spend as % of revenue | > 50% | 30–50% | 15–30% |
| AI spend growth rate vs. revenue growth rate | AI > 1.5× revenue | AI = 1.0–1.5× revenue | AI < revenue |
The last row is the most important leading indicator. If AI spend is growing faster than revenue, your gross margin is compressing every quarter and the unit economics are getting worse, not better. If AI spend growth is below revenue growth, you are gaining operating leverage on the variable cost side and your business is improving.
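A toy projection makes the compression concrete. The starting point roughly matches the worked example above; the growth rates are invented for illustration:

```python
revenue, ai_spend = 2_575_000.0, 1_142_000.0  # monthly; ~56% gross margin
rev_growth, ai_growth = 0.08, 0.12  # per quarter; AI growing at 1.5x revenue
for q in range(1, 5):
    revenue *= 1 + rev_growth
    ai_spend *= 1 + ai_growth
    print(f"Q{q}: gross margin {(revenue - ai_spend) / revenue:.1%}")
# Margin erodes one to two points per quarter at these rates.
```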
ICONIQ's 2025 data on AI-native B2B companies showed a median AI spend at roughly 30% of revenue, with the heavy tail going up to 50%+. That is the reference point against which to benchmark your own numbers — but it is a moving target as the category matures.
The value capture question
Beneath the math is a strategic question that AI-native founders have to answer: how much of the revenue from AI-derived value gets captured by the AI provider, vs. by you?
Three patterns are emerging:
The thin-wrapper problem. If your product is mostly a prompt template plus a chat UI, the customer is paying you for a marginal value-add over what they could build themselves with an OpenAI key. Your gross margin will compress to roughly the value of the marginal effort. This is unsustainable; the wrapper either thickens (add proprietary data, workflows, integrations) or the business commodifies.
The proprietary-data moat. If your product depends on data you have that the customer cannot easily replicate, you can charge a premium that the LLM cost does not erode. Gross margins remain healthy. The vast majority of AI-native B2B winners are following this pattern.
The orchestration moat. If your product wraps LLM calls in workflow orchestration, integrations, and operational knowledge that takes a long time to replicate, you are charging for the orchestration, not the inference. Gross margins recover as you automate the orchestration cost down.
Unit economics analysis is what tells you which pattern you are in. If your top-decile customers are at 20% contribution margin, the value is not flowing to you fast enough — either the price is too low or the LLM cost is too high. If your top-decile customers are at 50% contribution margin, you have a moat and the math is working.
How to instrument
The data requirements for this framework are not optional. To compute the formulas above, you need:
1. Per-request cost record with token counts, model identifier, latency, status, and cost in dollars
2. Customer dimension on every request so cost can be aggregated to the customer level
3. Feature/workflow dimension on every request for product analysis
4. Revenue mapping per customer per period (typically from billing system)
5. Reconciliation against provider invoices so the per-request total matches the bill
Most companies have (1) somewhere — usually in observability tools. (2) and (3) are the gap; engineering teams have to instrument them deliberately. (4) and (5) are the finance integration that closes the loop, and they only work if the close is operationalized monthly. (See The AI month close.)
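As a sketch, the per-request record might look like the following. The field names are illustrative, not a standard schema; shape it to your own pipeline:

```python
from dataclasses import dataclass

@dataclass
class RequestCostRecord:
    request_id: str
    timestamp: str       # ISO 8601
    customer_id: str     # item 2: the contract-level dimension
    feature: str         # item 3: feature/workflow tag
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: int
    status: str          # e.g. "ok", "rate_limited", "error"
    cost_usd: float      # computed at write time; reconciled against invoices
```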
The instrumentation is the work. Once it exists, the unit economics analysis is mechanical. Without it, every analysis is a multi-week reconstruction project from logs.
The frequency question
Unit economics in classical SaaS is a quarterly review artifact. In AI-native SaaS, it has to be at least monthly and ideally weekly, because the inputs change that fast.
The reasons:
- Model pricing changes. Provider price drops happen with little notice and meaningfully shift gross margin. You want to know within a week.
- Traffic mix shifts. A customer onboarding to a new feature can move their cost profile dramatically inside a billing cycle. You want to see it before the next contract renewal.
- Agent behavior drift. Agentic workflows have non-deterministic cost profiles. Yesterday's agent that took 12 steps may take 30 steps tomorrow because of a prompt or model change.
- Pricing experiments. If you are testing usage-based pricing tiers, you need contribution margin per cohort updated continuously, not quarterly.
The companies handling this well have moved AI unit economics into the same dashboard tier as DAU and revenue, with weekly review as standard practice and monthly reporting to the board.
What this changes for the business
If you do this analysis for the first time, three things tend to fall out:
- Some customers move to the firing list. Not literally — usually they move to renegotiation or restructuring. Customers who consume more AI than they pay for are running an arbitrage on you, and once you can name them, you can fix them.
- Some features move to deprecation. Features with negative contribution margin and small revenue contribution are net costs. Killing them or hiding them behind a paywall is a margin lift.
- Pricing strategy changes. Once you understand cost-per-customer, you can build pricing tiers that align with cost. Most AI-native companies arrive at some form of usage-based or hybrid pricing inside 18 months of doing this analysis seriously.
The framework is not theoretical. Every AI-native company that has scaled past $20M ARR has had to confront these numbers. The teams that confronted them early and built the muscle preserved their margins. The teams that delayed are now reverse-engineering the data from logs and trying to reprice contracts mid-flight, which is much harder.
A closing note for finance leaders
If you are reading this in a finance role at an AI-native company that has not done this analysis, the practical first step is small: pick your top 20 customers by ARR, get the AI cost attributable to them for the last full month, and compute contribution margin for each. The exercise will take a week if your data is reasonable and a month if it is not. Either way, the picture you get back will reshape how you think about the business.
That is what AI unit economics is — the picture of the business that classical SaaS finance is not equipped to produce.
Spendline captures cost at the request level as traffic flows through the proxy and aggregates contribution margin per customer on demand — no log reconstruction, no end-of-quarter spreadsheet rebuild. If you are computing these numbers from logs today, request a pilot and we will set you up with a working model in two weeks.