
How to Set Up Billing for Your AI App: A Step-by-Step Guide

Setting up billing for an AI app requires choosing a pricing model that matches your cost structure, metering every API call in real time, collecting payment before or after usage, and giving users clear visibility into what they are spending. Most teams underestimate this work by months. The good news is that the patterns are well established, and you do not have to build everything from scratch.

Key Takeaways

  • AI billing is different from SaaS billing. Your costs scale per-token with every request, so flat subscriptions alone will erode your margins as usage grows
  • Pick your pricing unit early. Tokens, requests, credits, or compute time. This decision shapes your entire billing architecture
  • Real-time metering is non-negotiable. Batch processing means you cannot enforce spending limits or prevent cost overruns
  • Prepaid wallets reduce risk. Collecting payment before usage eliminates bad debt and simplifies your financial model
  • Users need spending visibility. Dashboards, alerts, and cost breakdowns build trust and reduce churn from surprise bills

The Three Billing Models for AI Apps

Before writing any code, you need to decide how you are going to charge. There are three proven models, and each comes with real tradeoffs.

Subscriptions with usage limits. Customers pay a flat monthly fee and get an allocation of AI usage. Cursor charges $20/month for Pro with a set number of fast requests. ChatGPT Plus is $20/month with priority GPT-4o access. This model is simple to understand and simple to sell. The risk is that your heaviest users blow past the allocation and cost you more than they pay. You need hard limits or overage charges to make the math work.

Prepaid credits. Customers buy credits upfront and spend them as they use your product. OpenAI and Anthropic both use this model for API access. A developer loads $10 in credits and burns through them based on token consumption. This is the safest model for you because customers can only spend what they have already paid for. The downside is purchase friction, which you solve with automatic top-ups.

Pure usage-based billing. Customers pay after the fact based on exactly what they consumed. AWS, Google Cloud, and most infrastructure companies bill this way. It is the most precise alignment between cost and revenue. The risk is credit exposure. A customer racks up $5,000 in usage and their card declines. Now you are chasing money. For more on how these models compare, see our deep dive on usage-based billing.

Start with prepaid, add flexibility later

Prepaid wallets with automatic top-ups give you the safety of collecting payment upfront while keeping the experience nearly as frictionless as postpaid billing. You can always add invoicing for enterprise customers later.
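The wallet-plus-top-up flow can be sketched in a few lines. This is a minimal illustration, not a production implementation: the `charge_card` callback is a hypothetical stand-in for an actual payment call (e.g. a Stripe charge), and real balances would live in a database, not in memory.

```python
# Sketch: prepaid wallet with automatic top-ups. charge_card() is a
# hypothetical payment callback that returns True on a successful charge.

class Wallet:
    def __init__(self, balance_cents: int, topup_threshold_cents: int,
                 topup_amount_cents: int, charge_card):
        self.balance_cents = balance_cents
        self.topup_threshold_cents = topup_threshold_cents
        self.topup_amount_cents = topup_amount_cents
        self.charge_card = charge_card

    def debit(self, amount_cents: int) -> bool:
        """Deduct usage; fail gracefully if funds are insufficient."""
        if amount_cents > self.balance_cents:
            return False  # caller shows an "add funds" prompt
        self.balance_cents -= amount_cents
        # Auto top-up keeps the experience nearly as frictionless as postpaid.
        if self.balance_cents < self.topup_threshold_cents:
            if self.charge_card(self.topup_amount_cents):
                self.balance_cents += self.topup_amount_cents
        return True

wallet = Wallet(1_000, 500, 1_000, charge_card=lambda cents: True)
wallet.debit(600)            # balance drops to 400, auto top-up fires
print(wallet.balance_cents)  # 1400
```

The important property is that a debit can never exceed the balance: the worst case is a refused request, never bad debt.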

Step 1: Choose Your Pricing Unit

Your pricing unit is the thing customers see on their bill. This decision shapes everything downstream, from your metering infrastructure to your pricing page.

Tokens are the native unit for language models. GPT-4o costs $2.50 per million input tokens and $10 per million output tokens. Claude Sonnet 4.5 runs $3 input and $15 output per million tokens. Token-based pricing gives you the most precise cost alignment, but most end users have no idea what a token is. This works for developer-facing APIs. It does not work for consumer products.
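The arithmetic behind token pricing is simple; the hard part is doing it for every request. A quick sketch, using the GPT-4o rates quoted above:

```python
# Sketch: provider cost of one request at published per-million-token rates.

def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the provider cost in dollars for a single request."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# A 2,000-token prompt with an 800-token completion on GPT-4o:
cost = request_cost(2_000, 800, input_price_per_m=2.50, output_price_per_m=10.00)
print(f"${cost:.4f}")  # 0.005 + 0.008 = $0.0130
```

Note that output tokens cost 4x more than input tokens here, which is why metering must track the two counts separately rather than a single total.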

Requests are simpler. One API call equals one billable unit. This is easy to explain but hides wide cost variance. A 50-token request and a 50,000-token request both count as one, even though one costs 1,000x more to serve.

Credits abstract the underlying costs behind an internal currency. One credit might equal 1,000 tokens, or one image generation, or ten seconds of audio. Credits let you normalize different actions into a single, understandable unit. This is the model Midjourney, Runway, and most consumer AI products use. For a full breakdown, see our guide to credit-based pricing.
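A credit system is essentially a conversion table in front of your raw meters. The rates below are illustrative, not any real product's pricing:

```python
# Sketch: normalizing heterogeneous actions into a single credit unit.
# Units of raw usage that buy exactly one credit (illustrative rates):
UNITS_PER_CREDIT = {
    "tokens": 1_000,      # 1 credit per 1,000 tokens
    "image": 1,           # 1 credit per generated image
    "audio_seconds": 10,  # 1 credit per 10 seconds of audio
}

def to_credits(action: str, quantity: float) -> float:
    """Convert a raw usage quantity into billable credits."""
    return quantity / UNITS_PER_CREDIT[action]

print(to_credits("tokens", 4_500))      # 4.5
print(to_credits("audio_seconds", 30))  # 3.0
```

Keeping the conversion table in one place also lets you reprice an action (say, when a provider cuts token prices) without touching the billing pipeline.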

Compute time works for GPU-heavy workloads like image generation, video rendering, or model fine-tuning. Replicate charges per second of compute. This aligns well when processing time is the primary cost driver.

  • Tokens — best for developer APIs. Precise cost alignment, complex for consumers.
  • Credits — best for consumer apps. Simple UX, abstracts underlying complexity.
  • Requests — simplest to implement. Easy to explain, hides cost variance.

Step 2: Build or Buy Your Metering

Metering is the foundation of everything. Every API call needs to be captured, timestamped, attributed to a user, and priced. If your metering is wrong, your billing is wrong.

The critical requirement is real-time processing. If a user burns through $200 in tokens overnight and you do not know until a batch job runs in the morning, you have already lost that money. Real-time metering means you can enforce limits as they happen, not after the damage is done.

Building metering yourself means building an event ingestion pipeline that handles bursts without dropping data. A single customer might send 500 requests in a minute during a batch job. You need deduplication (because retries happen), aggregation that rolls up events without double-counting, and a storage layer that supports both real-time queries and historical analysis.

Most teams start with a database table and an increment query. That works until about 10,000 requests per day. Past that, you need something more robust: a queue or streaming layer, a deduplication mechanism, and time-series storage. Budget two to three months of engineering time for metering alone.
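The deduplication requirement is worth making concrete. A minimal sketch, assuming each event carries a caller-supplied idempotency key; in production the seen-set and totals would live in Redis or a time-series store, not process memory:

```python
# Sketch: idempotent metering ingestion. A unique event_id per usage event
# means provider or client retries never double-count.

from dataclasses import dataclass

@dataclass
class UsageEvent:
    event_id: str  # idempotency key supplied by the caller
    user_id: str
    tokens: int

class Meter:
    def __init__(self):
        self._seen: set = set()
        self.totals: dict = {}  # user_id -> total tokens

    def ingest(self, event: UsageEvent) -> bool:
        """Record an event exactly once; drop duplicates from retries."""
        if event.event_id in self._seen:
            return False
        self._seen.add(event.event_id)
        self.totals[event.user_id] = self.totals.get(event.user_id, 0) + event.tokens
        return True

meter = Meter()
meter.ingest(UsageEvent("evt_1", "user_42", 1200))
meter.ingest(UsageEvent("evt_1", "user_42", 1200))  # retry, ignored
print(meter.totals["user_42"])  # 1200
```

The same totals structure is what a real-time limit check reads from, which is why batch aggregation is not enough.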

The alternative is routing your AI traffic through a gateway that meters automatically. Your application makes requests through the gateway. The gateway counts tokens, calculates cost, attributes usage to the right customer, and makes all of that data available in real time. No custom pipeline to build or maintain.

Batch metering breaks spending limits

If your metering runs on a nightly cron job, you cannot enforce real-time spending caps. A runaway script or a single power user can burn through your budget before you even know it happened.

Step 3: Set Up Payment Collection

You have metered the usage and calculated the cost. Now you need to collect money. There are two main approaches, and the right choice depends on your customer profile.

Prepaid wallets are the default for consumer and SMB products. Customers load funds through a checkout flow. As they use AI features, their balance decreases in real time. When the balance gets low, they either top up manually or automatic top-ups kick in. The key advantage is zero credit risk. If the wallet is empty, the request fails gracefully. The user sees a clear prompt to add funds. No bad debt, no collections, no surprise bills on either side.

Post-pay invoicing is common for enterprise customers. You bill monthly based on actual usage, with net-30 or net-60 payment terms. This reduces friction for large buyers who have procurement processes and cannot prepay. The tradeoff is credit exposure and accounts receivable overhead.

Either way, you will need Stripe or a similar payment processor for the underlying card charges. But Stripe handles payment collection, not billing logic. It does not meter your AI usage, enforce spending limits, or manage wallet balances. You need a billing layer between your app and Stripe that handles rating, balance management, and enforcement.
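That billing layer is conceptually small: rate the usage, check the balance, and only then spend money with the provider. A sketch under stated assumptions (the rate and markup figures are illustrative, and the provider call is elided):

```python
# Sketch: the billing layer between your app and the payment processor.
# Rates the request, enforces the wallet balance, then forwards upstream.

def rate(tokens: int, price_per_m: float, markup: float) -> int:
    """Price usage in cents: provider cost plus your markup."""
    cost_dollars = tokens / 1_000_000 * price_per_m
    return round(cost_dollars * (1 + markup) * 100)

def handle_request(balance_cents: int, estimated_tokens: int):
    """Enforce the balance before spending money with the provider."""
    charge = rate(estimated_tokens, price_per_m=10.0, markup=0.30)
    if charge > balance_cents:
        return False, balance_cents  # graceful failure: prompt to top up
    # ... forward the request to the model provider here ...
    return True, balance_cents - charge

ok, remaining = handle_request(balance_cents=500, estimated_tokens=100_000)
print(ok, remaining)  # True 370
```

Stripe only ever sees the top-up charges; all of the per-request rating and enforcement happens in this layer.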

2.9% + $0.30: Stripe's per-transaction fee. Factor this into your markup calculation or you lose margin on every charge.

Step 4: Add Spending Controls

AI usage can spike unpredictably. A user testing a new feature, a batch job processing thousands of documents, or an agent running in a loop can all create sudden cost surges. Spending controls protect both you and your customers.

Hard limits cut off access when a threshold is reached. The user's requests start failing with a clear error message. This is the safest approach but creates the worst experience if it happens unexpectedly.

Soft limits send alerts but allow usage to continue up to a higher cap. This gives users a warning without abruptly breaking their workflow.

Automatic cutoffs with grace periods are the middle ground. When a user hits 90% of their limit, they get notified. At 100%, new requests are queued or throttled rather than immediately rejected. This prevents bill shock while keeping the product usable.

Budget alerts notify users (and your team) when spending reaches configurable thresholds, like 50%, 80%, and 100% of a monthly budget. Even if you do not enforce hard limits, alerts give users the information to self-regulate.

The key insight is that spending controls are a feature, not a restriction. Customers actively want them. A usage limit that prevents a surprise $500 bill builds more trust than unlimited access that leads to one.

Step 5: Give Users Visibility

The fastest way to lose a customer on usage-based billing is to surprise them with a bill they did not expect. Visibility is not a nice-to-have. It is core infrastructure that directly affects retention.

Your users need to see:

  • Current balance or spend for the billing period, updated in real time
  • Usage breakdown by model, feature, or action type so they know where costs are going
  • Daily or weekly trends so they can spot unusual patterns before they become expensive
  • Projected costs for the rest of the billing period based on current usage rate
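The projection in the last bullet can start as a simple linear extrapolation of the current run rate, which is a rough but honest first version:

```python
# Sketch: naive linear projection of period-end spend from usage so far.

def projected_spend(spent_so_far: float, days_elapsed: int,
                    days_in_period: int) -> float:
    """Extrapolate the current daily run rate to the full billing period."""
    if days_elapsed == 0:
        return 0.0
    daily_rate = spent_so_far / days_elapsed
    return daily_rate * days_in_period

# $42 spent in the first 12 days of a 30-day period:
print(f"${projected_spend(42.0, 12, 30):.2f}")  # $105.00
```

A fancier forecast can come later; even this number, shown in the dashboard, is enough to let users self-correct before the bill arrives.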

This requires your metering and rating systems to be working correctly and feeding data to the frontend with minimal lag. A dashboard that shows yesterday's data is better than nothing, but real-time data is what builds genuine trust.

For a comprehensive look at how billing, visibility, and user experience come together, see our guide on billing end users for AI.

Transparency drives retention

AI products with real-time usage dashboards see significantly lower churn from billing disputes. When customers can see exactly what they spent and why, they rarely question the bill. The anxiety comes from not knowing.

Putting It All Together

Setting up billing for an AI app is not a one-afternoon task, but it is also not a mystery. The pattern is proven: choose a pricing unit that matches your customers and cost structure, meter every request in real time, collect payment through prepaid wallets or invoicing, add spending controls that protect everyone, and give users a clear view of what they are spending. The companies that get this right, from OpenAI to Cursor to the thousands of AI startups shipping today, all follow the same playbook. The main decision is whether you build each piece yourself or use infrastructure that handles it for you.

How Lava Helps

Lava gives you the entire billing stack for AI, from metering to payment collection, so you can ship billing in days instead of months.

Lava Gateway proxies your AI requests to 600+ models across 30+ providers through a single API. Every request is automatically metered with token-level granularity. You get real-time usage tracking, cost attribution per user, and multi-model support without writing a single line of instrumentation code. Swap providers or models without touching your billing logic.

Lava Monetize handles the payment and visibility side. Your users fund prepaid wallets through a hosted checkout. As they use AI features, their balance decreases in real time. Automatic top-ups, spending limits, and usage alerts are built in. You set the markup per model, and Lava handles rating, balance enforcement, and the customer-facing usage dashboard.

Instead of spending months stitching together metering pipelines, Stripe webhooks, and custom dashboards, you integrate once and get back to building the product your users actually care about.

Ready to simplify your AI billing?

Lava handles metering, billing, and payouts so you can focus on building your AI product.