How to Bill Your Users for AI: A Guide for Platform Builders

To bill your users for AI, you need four systems working together: real-time metering of every API call, a rating engine that converts tokens to dollars with your markup, a prepaid wallet or payment collection layer, and a dashboard that gives users visibility into their spending. Most teams underestimate this by 3 to 6 months of engineering time.

Key Takeaways

Platform billing for AI is fundamentally different from SaaS billing because your costs scale per-token with every user request
You need real-time metering, not batch jobs. If a user burns $200 in tokens overnight, you need to know immediately
Stripe alone is not enough. It handles payments but not metering, margin calculation, or balance enforcement
The three approaches: build it yourself (6+ months), stitch together tools (3+ months), or use a purpose-built platform like Lava
Prepaid wallets with automatic top-ups are becoming the default pattern for consumer AI billing

If you are building an AI-powered product, you have already figured out the hard part: making the AI work. Now comes the part nobody warns you about. You need to charge your users for it.

Not your enterprise customers. Not your API consumers. Your end users. The people inside your app who are generating completions, running agents, and burning through tokens without thinking about what any of it costs. You are paying OpenAI or Anthropic per token, and somehow you need to pass that cost through to the people actually using your product, collect payment, and come out with margin intact.

This is the platform billing problem. And it is fundamentally different from billing developers or enterprises.

Why Platform Billing Is Harder Than It Looks

When OpenAI bills a developer, the developer has a credit card on file, understands token pricing, and monitors their own usage. When you bill your end user, none of that is true. Your user does not know what a token is. They do not want to think about per-request costs. They just want the AI feature to work.

That creates three problems at once:

1. Margin management is invisible. GPT-4o costs you roughly $2.50 per million input tokens and $10 per million output tokens. Claude 3.5 Sonnet runs about $3 and $15 respectively. If your user sends a long conversation with a lot of context, a single request might cost you $0.01 to $0.05. That sounds small until you have thousands of users making dozens of requests per day. If you are not metering every request and applying your markup in real time, you are flying blind on margin.

2. Metering has to be invisible too. You need to track every API call, measure token counts (input and output, separately), attribute costs to the right user, and do all of this without adding latency to the user experience. If your metering system adds 200ms to every request, your users will notice.

3. Payment collection from consumers is its own problem. Enterprise billing means invoices and net-30 terms. Consumer billing means credit cards, prepaid balances, automatic top-ups, and users who will churn the moment they see a confusing charge. You need a checkout flow, a balance system, and a way to cut off access gracefully when funds run out.

Most teams underestimate all three of these. They think "we will just add Stripe" and discover that Stripe handles payment collection but not metering, not margin calculation, and not the real-time balance enforcement that prevents users from running up costs they cannot pay for.

Stripe is not enough

Stripe handles payment collection but not metering, not margin calculation, and not real-time balance enforcement. Platform billing for AI requires infrastructure that sits between your app and the AI providers.

Three Approaches to Solving This

Option 1: Build It Yourself

You wire up token counting on every API response. You build a usage tracking database. You integrate Stripe for payments. You create a dashboard so users can see their balance. You implement automatic top-ups, low balance alerts, and access cutoffs. You build an admin panel so your team can see margins per user.

This works, and some teams do it well. But it takes months of engineering time, and it is not your core product. Every hour spent on billing infrastructure is an hour not spent on the AI features your users actually care about. You also need to maintain it as providers change pricing, as you add new models, and as your user base grows.

Realistic timeline: 2-4 months of engineering for a basic version. Ongoing maintenance indefinitely.

2-4 months of engineering, minimum

And that is just the basic version. You still need to maintain it as providers change pricing, as you add new models, and as your user base grows. Every hour on billing is an hour not on your product.

Option 2: Absorb the Cost

Many early-stage products just bundle AI usage into a flat subscription. Cursor charges $20/month for Pro and includes a generous allocation of fast requests. ChatGPT Plus is $20/month for access to GPT-4o. This is simple and it works for getting to market quickly.

The problem is margin math. If your average user consumes $3/month in API costs and you charge $20, great. But usage distributions are not uniform. Your power users might consume $30 or $50 in API costs while paying the same $20. A small percentage of heavy users can destroy your unit economics.

This model works when you are pre-product-market-fit and optimizing for learning, not margin. It breaks when you scale past a few thousand users and the power-user tail starts eating your runway.

Option 3: Use a Billing Layer

Instead of building billing infrastructure from scratch or ignoring the problem, you use a platform that sits between your application and the AI providers. Your API calls get proxied through a billing layer that meters usage automatically, attributes costs to users, and handles payment collection through a hosted checkout.

This is the approach Lava was built for. Your application makes AI requests through a gateway. The gateway counts tokens, calculates cost, applies your markup, and deducts from the user's prepaid balance. The user topped up their balance through a checkout flow you embedded in your app. When their balance gets low, they get prompted to add funds.

You set the margin. The infrastructure handles the rest.

What Good Platform Billing Requires

Whatever approach you choose, here is what you actually need to get right:

Real-time usage tracking. Not daily summaries. Not batch processing overnight. Every request needs to be metered and attributed within seconds so that balance enforcement works and users can see current usage.

Transparent pricing for end users. Your users need to understand what they are paying for, even if they do not understand tokens. Show them a balance, show them usage, and make it clear how their money is being spent. Trust drives retention.

Automatic top-ups and balance management. Do not make users manually add funds every time they run low. Offer automatic top-ups at configurable thresholds. Reduce friction, reduce churn.

Margin control per model and provider. You might want a 2x markup on GPT-4o but a 3x markup on a cheaper model. You might want to subsidize certain usage to drive adoption. Your billing layer needs to support this granularity.

Multi-provider support. If you are routing requests to OpenAI for some tasks and Anthropic for others, your billing needs to handle both. Different providers have different token pricing, different rate limits, and different response formats. Your metering has to normalize across all of them.

The Margin Math

Here is a concrete example. Say you are building an AI writing assistant and you route requests to GPT-4o.

Your costs per 1M tokens: $2.50 input, $10.00 output. A typical user session generates roughly 2,000 input tokens and 1,000 output tokens. That is about $0.015 per session. If a user runs 100 sessions per month, your cost per user is roughly $1.50.

$0.015

Cost per session

2K input + 1K output tokens on GPT-4o

Markup needed

To cover costs + payment fees + margin

$2.70

Net margin per user

At 100 sessions/month after fees

If you charge users at a 3x markup, you are collecting about $4.50 per month per user. After payment processing fees (roughly 2.9% + $0.30 per transaction on Stripe), your net margin per user is around $2.70 per month.

That margin only works if you are actually metering and enforcing. If 10% of your users are power users doing 500 sessions per month, their cost is $7.50 but they are still only paying $4.50 at flat pricing. Those users are costing you money. Usage-based billing with real metering catches this automatically.

Who Is Doing This Well

Cursor bundles AI usage into subscription tiers and uses rate limits to manage power users. Simple, but they had to build custom infrastructure to track and throttle usage per user.

Zapier uses task-based pricing where each AI action counts as a task against the user's plan. This translates AI costs into a unit the user already understands.

Custom AI apps built on platforms like Lava let the end user prepay for a balance and consume it as they use AI features. The developer sets the markup, and the platform handles metering and payment collection.

The pattern that scales best is clear: meter actual usage, let users prepay or auto-top-up, and give yourself per-model control over markup.

How Lava Helps

Lava is built specifically for this problem. You are a platform builder, and you need to charge your end users for AI usage without spending months building billing infrastructure.

Lava Gateway proxies your AI requests to 600+ models across 30+ providers through a single API. Every request is automatically metered. Token counts, costs, and attribution are tracked in real time. You swap providers or models without touching your billing code.

Lava Monetize handles the end-user payment side. Your users fund a wallet through a hosted checkout. As they use AI features, their balance decreases. When it runs low, they get prompted to top up (or it happens automatically). You set the markup. You see your margins on every request.

You do not have to choose between building it yourself and absorbing costs until it hurts. There is a third option: let the billing layer do what billing layers do, and get back to building the product your users actually came for.