Issue №02 · BYOK & Cost

What is a BYOK prompt tool?

Jonathan Lasley · 12 min read
Italic 'Your keys. Your bill. No middleman.' stepped down a navy specimen plate with copper registration marks.

A BYOK prompt tool routes every LLM call through an API key you own, directly to the provider. The platform charges a flat fee for the workbench features you use; the token bill goes to your Anthropic, OpenAI, or Google account. No inference markup, no middleman on spend, no proxy between you and the model.


What a BYOK prompt tool actually is

BYOK stands for bring your own key. In the context of a prompt engineering workbench, it means the platform never holds or proxies the LLM provider credentials you use for inference. You paste your Anthropic, OpenAI, or Google API key into the workbench once; the workbench stores it encrypted; every prompt run calls the provider directly on your behalf.

The opposite posture is proxy-only, where the platform maintains its own provider relationships, hides the key from you, and bills you per token or per request with a margin baked in. A middle ground also exists. We'll name it in the postures section below.

Three things flow from the BYOK definition:

  1. Your provider bill is your provider bill. Every cache hit, every batch discount, every tier rate cut your provider ships lands on your invoice. The workbench can't mark it up because it never sees the payment.
  2. The workbench charges for workbench features. Versioning, critique, compare, eval suites, API access, team seats. Not tokens.
  3. If the workbench disappears, your provider relationship is intact. Keys are still keys. Prompts are text files. Traces live in your provider's console.

Durability is a cost category too, even if it doesn't show up on a line item. Humanloop shut down on September 8, 2025 (Anthropic acqui-hired the team; no customer assets transferred; Humanloop deleted customer data at sunset, per the Humanloop migration notes). Teams that had used Humanloop as a proxy for managed keys rebuilt their provider relationships from scratch. Teams already on BYOK kept going.

How BYOK works (the data path, five minutes of setup)

Mechanically, BYOK looks like this:

  1. You generate an API key in your provider's console. Anthropic, OpenAI, and Google all work the same way: open the console, create a workspace-scoped key, copy the value once.
  2. You paste the key into the workbench. Prompt Assay encrypts it at rest using authenticated encryption (Supabase Vault's XChaCha20-Poly1305 as the primary path, with an AES-256-GCM fallback whose associated data binds the ciphertext to the row's workspace, provider, label, and key version), persists the ciphertext, and never surfaces the plaintext to the browser again. See how the BYOK data path works end-to-end for the architecture diagram, or the Anthropic setup walkthrough if you want the exact click path.
  3. You run a prompt. The server resolves your workspace's encrypted key, decrypts it in memory, makes the LLM request, streams the response back, and discards the decrypted key. The provider sees your account. The workbench sees the prompt text, the model choice, the response, and the usage metadata, but not your provider-side billing.

That's the whole data path. Setup runs about five minutes per provider. Once per provider, not once per prompt. If your workspace has procurement concerns about which services see which data, the exact answer is: the provider sees all inference traffic (it's their API); the workbench sees the prompt text, the model choice, and the response; nothing else.
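The row-binding idea in step 2 can be sketched with a toy encrypt-then-MAC in Python (stdlib only). This is an illustration of the associated-data concept, not Prompt Assay's implementation; the real path uses XChaCha20-Poly1305 or AES-256-GCM, and all names and values here are hypothetical. The associated data canonically encodes workspace, provider, label, and key version, so a ciphertext copied onto another row fails authentication:

```python
import hashlib
import hmac
import os

def canonical_aad(workspace: str, provider: str, label: str, key_version: int) -> bytes:
    """Length-prefixed encoding, so field boundaries can't be shifted."""
    parts = [workspace.encode(), provider.encode(), label.encode(), str(key_version).encode()]
    return b"".join(len(p).to_bytes(4, "big") + p for p in parts)

def seal(master_key: bytes, plaintext: bytes, aad: bytes) -> bytes:
    # Toy encrypt-then-MAC for illustration only -- do NOT use in production.
    nonce = os.urandom(16)
    stream = hashlib.sha256(master_key + nonce).digest() * (len(plaintext) // 32 + 1)
    ct = bytes(a ^ b for a, b in zip(plaintext, stream))
    # The tag covers the associated data, binding the ciphertext to its row.
    tag = hmac.new(master_key, aad + nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag

def open_sealed(master_key: bytes, blob: bytes, aad: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    expect = hmac.new(master_key, aad + nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):
        raise ValueError("ciphertext does not belong to this row")
    stream = hashlib.sha256(master_key + nonce).digest() * (len(ct) // 32 + 1)
    return bytes(a ^ b for a, b in zip(ct, stream))
```

Sealing a key under one workspace's associated data and opening it with another workspace's fails the tag check, which is exactly the property that makes a stolen ciphertext useless outside its row.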

BYOK data path specimen: three columns showing what the provider sees (keys issued, then nothing), what the workbench sees (ciphertext at rest, decrypted per call, then discarded), and what the provider API sees (all inference traffic billed to your account).

You'll want to connect a key to a free workspace before running the cost math in the next section, because the crossover point depends on your workload and the rates your provider quotes you.

The cost math (three scales)

Set aside markup for a moment. A BYOK cost model has two lines:

  • Platform fee. Flat subscription. Doesn't move with usage.
  • Provider bill. Linear in tokens. Cache hits and batch reads cut it. Fast models are cheaper than frontier models.

A proxy-only tool collapses these into one line and adds a margin. A BYOK tool keeps them separate.

For any workload, the decision turns on whether the platform fee plus your provider bill is lower than the proxy-only tool's combined line. The answer depends on scale.
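The two-line model is small enough to write down. A sketch, assuming your provider gives a 90% discount on cache reads (Anthropic's published cache-read rate); the rates themselves are whatever your provider quotes:

```python
def byok_monthly_cost(platform_fee: float, input_mtok: float, output_mtok: float,
                      in_rate: float, out_rate: float,
                      cache_hit: float = 0.0, cache_discount: float = 0.9) -> float:
    """Flat platform fee plus a provider bill that is linear in tokens.

    Rates are $/MTok; cache_hit is the fraction of input tokens served
    from cache, billed at (1 - cache_discount) of the input rate.
    """
    cached = input_mtok * cache_hit * in_rate * (1 - cache_discount)
    uncached = input_mtok * (1 - cache_hit) * in_rate
    return platform_fee + cached + uncached + output_mtok * out_rate
```

Plugging in your own token counts and rates gives the BYOK side of any comparison in one call; the platform fee is the only term that doesn't move with usage.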

Consider three example workloads:

  • 100K input tokens per month. You're prototyping. Platform fees dominate. A $49/month Solo subscription plus a few dollars of direct Anthropic Haiku 4.5 usage totals under $60/month. Any tool with comparable features at a similar subscription is fine; the BYOK advantage at this scale is attribution, not savings.
  • 1M input tokens per month. A small production feature. The direct provider bill might run $10 to $50 depending on model mix and cache hit rate. Proxy-only tools that bill per-trace or per-token start to out-run the flat subscription. A Team tier at $99/seat/month with BYOK keeps the platform fee predictable while the token bill scales at provider cost.
  • 10M input tokens per month and up. A real production workload. Put numbers on it. On Claude Sonnet 4.6 ($3 per million input tokens) with a 70% cache-hit rate, the cached portion (7M tokens) reads at Anthropic's 90% cache-read discount for about $2.10; the uncached portion (3M tokens) costs $9. Add output tokens and the direct bill lands in the $18-$40 range depending on response length. A proxy-only tool that bundles tokens and adds even a 15% margin across this volume has to beat that arithmetic by sheer platform-feature value alone, which is a hard case to make every renewal cycle. See how each tier prices out.

Here's how the $18-$20 direct-to-Anthropic figure breaks down for the BYOK-mandatory case:

| Cost line at 10M input tokens/mo · Sonnet 4.6 · 70% cache hit | Monthly |
| --- | --- |
| Cached input (7M tokens × $0.30/MTok) | ~$2.10 |
| Uncached input (3M tokens × $3/MTok) | ~$9.00 |
| Output tokens (illustrative, 500K × $15/MTok) | ~$7.50 |
| Total direct to Anthropic | ~$18-20 |
| Plus: flat platform subscription (BYOK-mandatory) | Free / $49 / $99 seat |
| Plus: seat + trace overage (partial-BYOK) | Variable with workload |
| Proxy-only: one bundled bill with margin baked in | Opaque |

The specific break-even depends on your model mix, your cache hit rate, and what the proxy-only tool charges. The mechanical truth is that platform fee plus provider bill scales slowly, proxy-only spend scales linearly with traffic plus margin, and at production scale they diverge.
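The table's arithmetic, spelled out. Rates come from the table itself; the 500K-token output figure is illustrative, as noted there:

```python
IN_RATE = 3.00      # $/MTok, Sonnet 4.6 input
CACHE_READ = 0.30   # $/MTok after Anthropic's 90% cache-read discount
OUT_RATE = 15.00    # $/MTok, Sonnet 4.6 output

cached   = 7.0 * CACHE_READ   # 7M cached input tokens
uncached = 3.0 * IN_RATE      # 3M uncached input tokens
output   = 0.5 * OUT_RATE     # 500K output tokens (illustrative)

total = cached + uncached + output
print(f"cached=${cached:.2f} uncached=${uncached:.2f} output=${output:.2f} total=${total:.2f}")
# cached=$2.10 uncached=$9.00 output=$7.50 total=$18.60
```

Swap in your own cache-hit rate and output volume to see how the "~$18-20" line moves; the input side is fixed by the provider's rate card, so response length is the main swing factor.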

This isn't a hypothetical for the market as a whole. Enterprise LLM API spending grew from $3.5B in late 2024 to $8.4B by mid-2025, a 140% increase (Menlo Ventures 2025 mid-year update). Gartner projects $2.52T in worldwide AI spending in 2026, up 44% year-over-year. Every percentage point of inference margin skimmed off that volume adds up fast. BYOK takes the skim off the table.

Three BYOK postures in the market

Prompt tools sort into three postures. Tell them apart by one question: where does the key live?

Posture 1 · BYOK-mandatory. Your provider key is required for every LLM call; the platform has no default keys and no fallback. Examples include Prompt Assay, Vellum, and PromptLayer in its BYOK configuration. Cost: flat platform fee plus direct provider bill, no markup possible. Governance: clean data path; security reviews are simple because the platform never holds credentials for calls you didn't authorize. Durability: high; if the platform vanishes, your provider relationship is intact.

Posture 2 · Partial-BYOK. The platform supports your provider key for inference, but still requires its own auth layer and may price per-trace or per-seat on top. LangSmith is the canonical example: you connect your Anthropic or OpenAI key for the traced requests, but platform access requires a LangSmith account with its own tier pricing ($39/seat/month for Plus, as of April 2026, with 10,000 included base traces and $2.50 per additional 1,000). Cost: your provider bill direct, plus the platform's seat and trace fees. Governance: cleaner than proxy-only; inference traffic goes direct. Durability: your provider relationship survives platform churn; any traces the platform holds may not.

Posture 3 · Proxy-only. The platform holds the keys (often its own, sometimes yours), routes all inference through its infrastructure, and charges a margin on tokens. Some historical prompt-management platforms operated this way. Cost: one combined bill with opaque markup. Governance: inference traffic passes through a third party's servers, so data-processing agreements matter. Durability: the provider relationship is mediated, so when any intermediary platform sunsets the rebuild is larger than for direct-provider customers.

At the same 10M input tokens per month on Sonnet 4.6, the three postures produce structurally different bills:

Cost structure at 10M input tokens per month on Sonnet 4.6. BYOK-mandatory keeps the platform fee separated from a direct provider bill, no markup. Partial-BYOK layers a seat fee and trace overage on top of a direct provider bill. Proxy-only bundles everything into one opaque line with margin on every token.

Cost is one axis. Governance and durability are the other two:

| Posture | Provider key required | Platform markup on tokens | If the platform sunsets |
| --- | --- | --- | --- |
| BYOK-mandatory | Yes, always | None (flat fee only) | Keys and provider relationship intact |
| Partial-BYOK | Yes, for inference | No on tokens; yes per-seat and per-trace | Provider relationship intact; platform-held traces may not be |
| Proxy-only | Optional or N/A | Margin baked into every call | Provider relationship mediated; rebuild required |

Two quick tests to place a tool on this spectrum:

  • Can I run a prompt without giving the platform a key to a specific provider? If yes, they have default keys; you're on Posture 2 or 3.
  • Does my provider's usage dashboard show every call the platform makes on my behalf? If yes, you're on Posture 1 or the inference side of Posture 2. If no, you're on Posture 3.

Humanloop's September 2025 sunset is a recent reminder that platform posture matters in practice, whatever tier you were on.

What about the objections?

"BYOK is more setup than a SaaS wrapper." Five minutes per provider, once. You trade that five minutes for linear savings as workload scales and for a cleaner security review.

"The platform fee plus provider bill ends up costing more than a bundled tier." Below roughly 500K input tokens per month the platform subscription dominates on both sides, so the gap is small. Above 1M input tokens per month the BYOK side pulls ahead by the margin a proxy-only tool would otherwise skim. Run the numbers against the tier list and your provider's pricing page for your exact model mix.

"What if my key leaks?" Every major provider supports key rotation in the console; rotated keys invalidate within seconds. BYOK-mandatory platforms never transmit the plaintext key to the client, so the blast radius is bounded to any prompts run during the leak window. Workspace-scoped keys limit blast radius further.

"Does BYOK hurt caching, batch pricing, or rate limits?" No. Those are provider-side features tied to your account. BYOK means requests land in your account, so you keep every discount your provider ships.

"Switching from my current tool is expensive." One afternoon of parallel-run, typically. Export the prompts, connect keys in the new workbench, run both against a test set for a week, cut over. The public migration walkthrough covers the operational steps for Humanloop refugees specifically.

If those answers leave residual friction, add a provider key to a free workspace and run one prompt against your real workload before deciding.

Frequently Asked Questions

Reader notes at the edge of the argument.

Ship your next prompt in the workbench.

Prompt Assay is the workbench for shipping production LLM prompts. Version every change. Critique, improve, and compare across GPT, Claude, and Gemini. Bring your own keys. No demo call. No card. No sales gate.

Open the editor · Read the docs

  1. 01 · April 2026 · Migrate from Humanloop: a 2026 re-home guide

     Humanloop shut down Sep 2025. If the replacement you picked isn't sticking, this 2026 guide covers the durable asset, destinations, and BYOK math.

     Comparisons & Migrations · 13 min read

Issue №02 · Published APRIL 20, 2026 · Prompt Assay