Billing Architecture: Tokens, Gates, and the Economics of AI-Native SaaS

Maxime Champoux · January 31, 2026 · 10 min read

Every API call to GPT-4 costs between $0.002 and $0.12 depending on context length and complexity. For a traditional SaaS company, the marginal cost of serving one more user rounds to zero. For an AI-native product, every single query hits the P&L. This changes everything about how you build, price, and grow a software business.

At Well, we process financial data through LLMs. Bank transactions get categorized, reconciled, and explained in natural language. Each of those operations burns inference tokens. Our cost per query varies by 10x depending on whether a user asks a simple categorization question or triggers a full reconciliation across multiple accounts. We learned quickly that the standard SaaS pricing playbook does not apply when your infrastructure costs scale linearly with engagement.

The unit economics problem

SaaS built its empire on a simple truth: software costs almost nothing to copy. You build the product once, host it on shared infrastructure, and each additional customer adds revenue with negligible incremental cost. Gross margins of 80-90% became the norm. Investors modeled businesses on the assumption that scale would always improve margins.

AI-native products break this model. When your core value proposition runs through an LLM, every user interaction carries a direct cost. More engagement means more spending. The metrics that SaaS investors memorized over the past decade need recalibration.

Consider two scenarios. A traditional project management tool serves 1,000 daily active users. The infrastructure cost barely changes whether those users create 10 tasks or 10,000 tasks. Storage is cheap. Compute for CRUD operations is negligible. Now take an AI-native financial assistant serving 1,000 daily active users. If each user sends 5 queries per day, and each query costs an average of $0.03 in inference, that is $150 per day in pure LLM costs. Scale to 10,000 users and you are burning $1,500 daily on inference alone, before you pay for anything else.
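The arithmetic in the second scenario can be reproduced in a few lines. This is a minimal sketch: the function name and the 30-day month used for the monthly figure are assumptions for illustration, while the user counts, query rate, and $0.03 average come from the paragraph above.

```python
def daily_inference_cost(daily_active_users: int,
                         queries_per_user: float,
                         cost_per_query: float) -> float:
    """Estimated LLM spend per day, in dollars."""
    return daily_active_users * queries_per_user * cost_per_query


for users in (1_000, 10_000):
    cost = daily_inference_cost(users, queries_per_user=5, cost_per_query=0.03)
    # The monthly figure assumes a 30-day month and flat usage.
    print(f"{users:>6} DAU -> ${cost:,.0f}/day, about ${cost * 30:,.0f}/month")
```

The point of writing it out is that every input is a multiplier: doubling users, queries per user, or cost per query doubles the bill.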

The relationship between revenue and cost of goods sold becomes something closer to a marketplace than a software company. You need to think about take rates, not just subscription revenue.

This is not a theoretical problem. Startups that raised on traditional SaaS assumptions are discovering mid-flight that their unit economics do not behave as modeled. Series A decks showing 85% gross margins turn into board meetings explaining why margins contracted to 55% as usage grew. The cost curve slopes the wrong way.

How we built the token system

We tried three pricing approaches before landing on the current architecture.

First, we tested flat-rate monthly subscriptions with unlimited AI queries. This lasted six weeks. A small percentage of power users consumed 40x the median query volume. Our inference costs for that cohort exceeded their subscription revenue by 3x. Unlimited plans in AI-native products attract exactly the users you cannot afford to serve.
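Those two figures imply a break-even point for a flat-rate plan. The sketch below rests on several assumptions: inference cost scales linearly with query volume, every user pays the same flat price, and "exceeded by 3x" means cost was three times revenue. The function names are hypothetical.

```python
def cost_to_revenue(volume_multiple: float,
                    power_multiple: float = 40.0,
                    power_cost_ratio: float = 3.0) -> float:
    """Inference cost as a fraction of subscription revenue for a user at
    `volume_multiple` times the median query volume (linear-cost assumption)."""
    return power_cost_ratio * volume_multiple / power_multiple


def break_even_multiple(power_multiple: float = 40.0,
                        power_cost_ratio: float = 3.0) -> float:
    """Multiple of median volume at which inference cost equals revenue."""
    return power_multiple / power_cost_ratio


print(f"median user: inference is {cost_to_revenue(1):.1%} of revenue")
print(f"break-even at {break_even_multiple():.1f}x the median volume")
```

Under these assumptions the median user is comfortably profitable, and the plan only turns upside down for users above roughly 13x the median, which is exactly the cohort an unlimited plan attracts.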

Second, we experimented with strict per-query pricing. Users purchased query packs. This killed engagement. People became reluctant to ask questions, which defeated the purpose of building an AI assistant. Usage dropped 60% compared to the unlimited period. The product felt punitive, like a taxi meter running while you think.

Third, we built a hybrid system. A base subscription included a set number of conversations per month, with overage pricing for heavy users. This worked better but felt arbitrary. Users did not understand why some queries cost more than others, and the billing was hard to predict.

The system we run today is token-based with activation rewards. Every user starts with a base allocation of tokens. Tokens represent conversation credits, roughly mapping to one AI-powered interaction each. Then we layered in a gamified onboarding flow: complete setup steps and earn bonus tokens.

Connect your first bank account: 10 bonus tokens. Invite a teammate: 15 bonus tokens. Set up your first budget category: 5 bonus tokens. We designed 20 activation tasks that collectively grant 140 bonus conversations. The psychology is deliberate. Users associate value creation with token earning. Setting up a data source is not a chore; it is earning capacity.

This approach solved two problems simultaneously. Activation rates improved because users had a tangible incentive to complete setup. And our cost exposure became predictable because tokens cap maximum usage per period while still feeling generous.
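A token ledger with activation rewards can be sketched as follows. This is an illustration, not our production code. The three bonus amounts are the ones listed above. The task keys, the class name, and the base allocation of 20 are hypothetical.

```python
from dataclasses import dataclass, field

# Bonus amounts for the three tasks named in the text; the keys are hypothetical.
ACTIVATION_REWARDS = {
    "connect_bank_account": 10,
    "invite_teammate": 15,
    "create_budget_category": 5,
}


@dataclass
class TokenLedger:
    balance: int  # base allocation; the size used below is an assumption
    completed: set = field(default_factory=set)

    def complete_task(self, task: str) -> int:
        """Grant the bonus for a task once; repeat completions earn nothing."""
        if task in self.completed:
            return 0
        bonus = ACTIVATION_REWARDS[task]  # raises KeyError for unknown tasks
        self.completed.add(task)
        self.balance += bonus
        return bonus

    def spend(self) -> bool:
        """Charge one token for one conversation; False means the cap is hit."""
        if self.balance == 0:
            return False
        self.balance -= 1
        return True


ledger = TokenLedger(balance=20)
ledger.complete_task("connect_bank_account")
ledger.complete_task("connect_bank_account")  # ignored: already rewarded
ledger.complete_task("invite_teammate")
print(ledger.balance)  # 45
```

Two properties matter here: rewards are granted at most once per task, and `spend` refuses once the balance reaches zero, which is what makes the maximum inference exposure per user predictable.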

The gating decisions

Not every feature sits behind the same gate. We operate three tiers: a free exploration layer, a standard tier, and Pro.

Free users get limited tokens and access to basic categorization and reporting. They can connect data sources, see their transactions, and ask the AI simple questions. The free tier exists to demonstrate value, not to be a permanent home.

Standard users pay a monthly subscription and receive a larger token allocation with the activation bonus system described above. They get the full conversational AI experience for day-to-day financial management.

Pro unlocks two specific capabilities: automated reconciliation and cross-workspace reporting. These are gated for economic reasons, not artificial scarcity. Reconciliation is our most expensive operation. A single reconciliation run can consume 50-100x the inference tokens of a standard query because it requires the LLM to compare, match, and validate transactions across multiple data sources in sequence. Cross-workspace reporting aggregates data across entities, which means multiple parallel LLM calls.

We gate these features because their unit economics demand a higher price point. Giving them away would make our cost structure unsustainable. But we also gate them because the users who need reconciliation and cross-workspace reporting are typically businesses with higher willingness to pay. The gating aligns cost structure with customer segmentation.

One decision we got wrong early: we initially gated smart categorization suggestions behind Pro. Users on the free and standard tiers got basic rule-based categorization. This was a mistake. Categorization is the first moment where users feel the AI working. Gating it meant free users never experienced the core value proposition. We moved it to all tiers within a month and saw trial-to-paid conversion increase by 22%.
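The gating rules in this section reduce to a minimum tier per feature. The sketch below is one way to encode that, assuming hypothetical feature keys and an ordered tier enum. It reflects the corrected placement of smart categorization on all tiers.

```python
from enum import IntEnum


class Tier(IntEnum):
    FREE = 0
    STANDARD = 1
    PRO = 2


# Minimum tier required per feature. The feature keys are hypothetical names.
MIN_TIER = {
    "smart_categorization": Tier.FREE,       # moved out of Pro to all tiers
    "basic_reporting": Tier.FREE,
    "automated_reconciliation": Tier.PRO,    # most expensive operation
    "cross_workspace_reporting": Tier.PRO,   # multiple parallel LLM calls
}


def can_use(tier: Tier, feature: str) -> bool:
    """True when the user's tier meets the feature's minimum tier."""
    return tier >= MIN_TIER[feature]


print(can_use(Tier.FREE, "smart_categorization"))          # True
print(can_use(Tier.STANDARD, "automated_reconciliation"))  # False
```

Keeping the table in one place makes a change like the categorization fix a one-line edit rather than a hunt through feature checks.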

The generosity problem

Early-stage AI products face a specific tension. You need users to engage deeply enough to discover value. But every interaction costs you money. In traditional SaaS, generous free tiers are cheap. In AI-native SaaS, generosity has a direct cost.

Our data shows 60-70% of new users churn within the first week. That means most of the tokens we allocate to new users are consumed by people who will never pay. Every free token granted to a churning user is pure cost with zero future return.

The tempting response is to restrict the free experience. Give fewer tokens. Gate more features. Make users pay before they can truly explore. We tried this. Conversion rates dropped. It turns out that restricting the free experience does not reduce churn; it just makes churning users leave faster. The users who would have converted also leave because they never got deep enough to see the value.

Our current approach is to be deliberately generous with the activation-linked tokens. The 140 bonus conversations from completing setup tasks are substantial. But the key insight is that users who complete setup tasks have dramatically lower churn. The correlation between activation completion and 30-day retention is 0.74. So the tokens are not wasted on churning users because churning users rarely complete the activation flow. The system self-selects.

We still lose money on some cohorts. Users who complete activation but never convert to paid consume their full token allocation at our expense. This is the cost of acquisition in an AI-native business. We think of it the same way a marketplace thinks about subsidized early transactions. The difference is that our subsidy is not a discount on a physical good; it is compute time that evaporates whether or not the user converts.

Tracking this cost forced us to build internal tooling we had not anticipated. We now monitor cost-per-activated-user, cost-per-converted-user, and token redemption curves segmented by acquisition channel. Some channels produce users who burn tokens quickly on complex queries but convert at lower rates. Others produce users who convert faster but use the product less intensively. Optimizing across these dimensions is a new discipline that traditional SaaS acquisition models do not prepare you for.
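The two cost metrics named above are simple ratios once inference spend is attributed to a cohort. The following is a rough sketch with made-up numbers. The channel names, the figures, and the `ChannelCohort` structure are illustrative assumptions, not our data or schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChannelCohort:
    channel: str
    inference_cost: float  # total inference spend on the cohort, in dollars
    activated: int         # users who completed the activation flow
    converted: int         # users who became paying customers


def cost_per(total_cost: float, users: int) -> Optional[float]:
    """Cost per user, or None when the cohort has no such users."""
    return total_cost / users if users else None


# Illustrative numbers only.
cohorts = [
    ChannelCohort("paid_search", inference_cost=900.0, activated=300, converted=30),
    ChannelCohort("referral", inference_cost=400.0, activated=200, converted=40),
]

for c in cohorts:
    per_activated = cost_per(c.inference_cost, c.activated)
    per_converted = cost_per(c.inference_cost, c.converted)
    print(f"{c.channel}: ${per_activated:.2f}/activated, ${per_converted:.2f}/converted")
```

In this invented example the two channels look similar per activated user but differ by 3x per converted user, which is the kind of gap that only shows up when both ratios are tracked side by side.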

Inference cost variability

Not all AI queries are created equal. A simple "What category is this transaction?" query uses minimal context and generates a short response. It might cost $0.005 in inference. A complex query like "Compare my Q3 spending across all categories against my budget and explain the variances" requires loading transaction history, budget data, and prior context. That query might cost $0.08.

This 10-16x cost range within a single product creates a billing architecture challenge. If you charge the same token rate for all queries, you either overprice simple interactions or underprice complex ones. We chose to absorb the variance. One token equals one conversation, regardless of complexity.
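Absorbing the variance means the real cost of one token is a weighted average over the query mix. A minimal sketch, using the two per-query costs quoted above and two hypothetical monthly mixes:

```python
# Per-query costs quoted in the text; the class labels are hypothetical.
COST_PER_QUERY = {"simple": 0.005, "complex": 0.08}


def blended_cost(mix: dict) -> float:
    """Expected inference cost of one token for a given query mix.

    `mix` maps a query class to its share of conversations; shares must sum to 1.
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("query mix shares must sum to 1")
    return sum(share * COST_PER_QUERY[kind] for kind, share in mix.items())


# Two hypothetical months: one dominated by simple queries, one more analytical.
light_month = blended_cost({"simple": 0.9, "complex": 0.1})  # about $0.0125
heavy_month = blended_cost({"simple": 0.6, "complex": 0.4})  # about $0.035
print(f"light: ${light_month:.4f}  heavy: ${heavy_month:.4f}")
```

With the token price fixed, a shift like this nearly triples the cost behind each token, so the mix has to be watched as closely as the volume.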

This means our margins fluctuate based on the query mix. Months where users ask more complex analytical questions have lower margins than months dominated by simple categorization queries. We track the query complexity distribution weekly and model our pricing against the trailing 90-day average cost per conversation.
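A trailing average of this kind can be kept with a fixed-size window of daily totals. The class below is a sketch under assumptions: it takes one (cost, conversations) record per day and weights the average by volume. The class and method names are hypothetical.

```python
from collections import deque


class TrailingCost:
    """Cost per conversation over the most recent `window_days` days."""

    def __init__(self, window_days: int = 90):
        # Each entry is (inference_cost, conversations) for one day;
        # deque(maxlen=...) drops the oldest day automatically.
        self.days = deque(maxlen=window_days)

    def add_day(self, cost: float, conversations: int) -> None:
        self.days.append((cost, conversations))

    def average(self) -> float:
        total_cost = sum(cost for cost, _ in self.days)
        total_conversations = sum(n for _, n in self.days)
        if total_conversations == 0:
            raise ValueError("no conversations in the window")
        return total_cost / total_conversations


tracker = TrailingCost(window_days=90)
tracker.add_day(cost=30.0, conversations=1_000)
tracker.add_day(cost=45.0, conversations=1_500)
print(f"${tracker.average():.3f} per conversation")
```

Dividing total cost by total conversations, rather than averaging the daily averages, keeps a quiet weekend from counting as much as a heavy weekday.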

The alternative, charging different token rates for different query types, adds friction and confusion. Users should not need to think about whether their question is "expensive" before asking it. That calculation kills the natural interaction pattern that makes AI products valuable.

Pricing an AI-native product

After two years of iteration, here is what we believe about pricing AI-native SaaS:

Flat-rate unlimited plans do not work unless you can absorb 10-50x variance in per-user costs. Very few early-stage companies have that margin of safety.

Pure usage-based pricing kills engagement. Users optimize for fewer interactions, which means they extract less value, which means they are more likely to churn. You save on inference costs but lose on lifetime value.

Token or credit systems with base allocations work because they create a psychological framework. Users understand they have a budget. They spend it on what matters. The constraint makes each interaction feel more intentional without feeling punitive.

Activation-linked rewards align your costs with user commitment. Users who invest time in setup are more likely to stay. Giving them more capacity reinforces the behavior you want.

Feature gating should follow cost structure, not just value perception. Gate the features that are genuinely expensive to serve at the tier where pricing supports the unit economics.

Price against your trailing cost curve, not your current costs. LLM inference costs have dropped roughly 10x in two years. If you price against today's costs, you will be overpriced in six months. If you price against projected costs, you might not survive until the projections materialize. Trailing averages split the difference.

What this means for SaaS broadly

The AI-native pricing problem is not limited to startups building with LLMs. Every established SaaS company adding AI features faces the same calculus. Salesforce, HubSpot, and Notion have all introduced AI features with separate usage limits or premium tiers. They are discovering what we learned from day one: AI features cannot be given away at scale.

This will reshape the SaaS market over the next five years. Companies that previously competed on feature breadth at a flat rate will need to develop hybrid pricing models. The ones that figure out how to make usage-based components feel natural, rather than punitive, will have an advantage.

For investors evaluating AI-native companies, the key metrics shift. Gross margin needs to be reported with and without inference costs isolated. Query volume trends matter as much as user growth. Cost per query trajectory tells you whether the business will reach profitability as it scales or whether scale will amplify losses.
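Reporting gross margin both ways is a small calculation. The sketch below uses illustrative figures chosen to echo the 85% and 55% margins mentioned earlier. The function names and the dollar amounts are assumptions, not reported financials.

```python
def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cogs) / revenue


def margin_report(revenue: float, inference_cost: float, other_cogs: float) -> dict:
    """Gross margin with inference costs isolated and with them included."""
    return {
        "excluding_inference": gross_margin(revenue, other_cogs),
        "including_inference": gross_margin(revenue, other_cogs + inference_cost),
        "inference_share_of_revenue": inference_cost / revenue,
    }


# Illustrative figures only.
report = margin_report(revenue=100_000.0, inference_cost=30_000.0, other_cogs=15_000.0)
for name, value in report.items():
    print(f"{name}: {value:.0%}")
```

Showing the two margins together separates the part of the cost base that behaves like classic SaaS from the part that grows with every query.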

The SaaS playbook is not dead. Recurring revenue, low churn, and expansion are still the goals. But the path to those outcomes runs through a billing architecture that accounts for the real cost of intelligence. Build the system wrong and growth becomes a liability. Build it right and every token spent by a user creates measurable value on both sides of the transaction.

Maxime Champoux

CEO & co-founder, Well

Maxime is the CEO and co-founder of Well. He built Well to rebuild finance around AI-native data, not spreadsheets.
