Issue №07 · Comparisons & Migrations

LangSmith alternatives without per-trace billing

Jonathan Lasley · 12 min read
Figure: LangSmith auto-upgrade, $2.50 to $5.00 per 1,000 traces when any of three feedback triggers fire.

LangSmith bills per trace, and any annotation flips a trace from $2.50 to $5.00 per thousand. That auto-upgrade is why per-trace billing goes superlinear at agentic-workload scale. Four credible alternatives exist in 2026: Langfuse, Helicone, Phoenix, and Braintrust. Prompt Assay handles the workbench half (authoring, critique, versioning, evals); pair it with one of those for tracing.

What LangSmith actually is, and what it costs

LangSmith is LangChain's commercial observability and evaluation platform. It records traces (every LLM call, tool call, and agent step), persists them for later review, and layers dataset management, evaluations, and prompt management on top. It's not LangChain-locked: the official SDKs cover Python, TypeScript, and Java, and the wrap_openai helper instruments any provider call without LangChain code in the request path.

The pricing on LangSmith's own page (verified April 2026) reads as follows on the Plus plan:

  • $39 per seat per month
  • 10,000 base-retention traces included
  • $2.50 per 1,000 traces beyond the base allotment, at 14-day retention
  • $5.00 per 1,000 traces at extended retention (400 days)

For a five-seat team running a typical chat product (50,000 traces a month, no annotation pressure), that math sits around $295/mo. Comfortable. The bill behaves predictably. For most teams onboarding LangSmith for the first time, the pricing model is fine.

The numbers stop being comfortable at agentic scale. We'll cover why next, because the cost mechanic isn't volume alone. It's volume plus a feedback rule most teams don't notice until renewal. If you're earlier in that funnel and just want the BYOK economics framing, the BYOK posture for prompt tools post covers that ground.

Why LangSmith gets expensive at scale

The cost mechanic is documented in LangSmith's own support docs: any trace that receives a feedback signal auto-upgrades from base retention ($2.50 per thousand) to extended retention ($5.00 per thousand). The three trigger conditions are explicit:

  1. A trace placed in an annotation queue for human review
  2. Feedback added to the trace, manually or via the SDK feedback API (typical in eval pipelines)
  3. An automation rule that flags or routes the trace

All three happen automatically the moment a team adopts the platform's other features. If you're using the eval suite, every eval run adds feedback. If you have an automation rule that captures error traces for review, every error trace upgrades. If you have a human review queue, every queued trace upgrades.
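
The documented rule is simple enough to model as a predicate. A minimal sketch of the behavior, where the field and function names are ours for illustration, not LangSmith's actual schema:

```python
# Minimal model of LangSmith's documented retention upgrade rule:
# any one trigger flips a trace from base to extended retention.
# Field names are illustrative, not LangSmith's actual trace schema.
from dataclasses import dataclass

@dataclass
class Trace:
    in_annotation_queue: bool = False       # trigger 1: human review queue
    has_feedback: bool = False              # trigger 2: manual or SDK feedback
    matched_automation_rule: bool = False   # trigger 3: flag/route rule

def retention_tier(trace: Trace) -> str:
    """Return 'extended' ($5.00/k) if any trigger fired, else 'base' ($2.50/k)."""
    if (trace.in_annotation_queue
            or trace.has_feedback
            or trace.matched_automation_rule):
        return "extended"
    return "base"

# An eval run that attaches feedback is enough to upgrade the trace:
print(retention_tier(Trace(has_feedback=True)))   # extended
print(retention_tier(Trace()))                    # base
```

The point of the model is that the conditions are OR'd: adopting any one of the platform's review, eval, or automation features upgrades the traces it touches.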

The "base discount" is theoretical for any team using LangSmith past the trace-only mode it was first marketed on.

Figure: auto-upgrade trigger map. Three conditions (annotation queue, feedback added, automation rule) each flip a trace from base 14-day retention at $2.50 per thousand to extended 400-day retention at $5.00 per thousand.

What that does to the bill at agentic scale

Agentic workloads change the trace shape entirely. A single user request that fans out to 12 tool calls produces 13 traces, not 1. We've watched teams cross from 100,000 traces a month to 2 million traces a month inside one quarter just by promoting a chat product to an agent product. The base allotment is unchanged at 10,000.

Worked example, 1 million traces per month at a 10% feedback rate (modest by agent standards):

  • 10,000 free at base
  • 890,000 at $2.50/k base = $2,225
  • 100,000 at $5.00/k extended (the 10% receiving feedback) = $500
  • Subtotal $2,725, plus seats

At 10 million traces a month with the same feedback rate, that subtotal is $27,475. A common workaround is to drop sample rates to 0.1%, but at that point observability is cosmetic; you're recording one trace in a thousand and missing every regression that doesn't happen during the sampled window.
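
The worked example generalizes to a small calculator using the Plus-plan rates quoted above (the function name and structure are ours):

```python
def langsmith_plus_monthly(traces: int, feedback_rate: float,
                           seats: int = 5, seat_price: float = 39.0) -> float:
    """Estimate a LangSmith Plus monthly bill from the published rates:
    10k traces included, $2.50/1k at base retention, $5.00/1k for
    traces upgraded to extended retention by a feedback trigger."""
    FREE = 10_000
    extended = int(traces * feedback_rate)      # upgraded traces
    base = max(traces - FREE - extended, 0)     # billable base-retention traces
    trace_cost = base / 1000 * 2.50 + extended / 1000 * 5.00
    return trace_cost + seats * seat_price

# The three scenarios from the text, seats included:
print(langsmith_plus_monthly(50_000, 0.0))       # 295.0
print(langsmith_plus_monthly(1_000_000, 0.10))   # 2920.0
print(langsmith_plus_monthly(10_000_000, 0.10))  # 27670.0
```

Varying feedback_rate is the instructive part: at 1M traces, moving from 0% to 10% feedback adds $250/month, but the volume term dominates either way.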

Figure: LangSmith Plus monthly bill at three workload scales for a five-seat team: $295 at 50K traces with no annotation; $2,920 at 1M traces with 10% annotation; $27,670 at 10M traces at the same annotation rate. Prompt Assay Team is flat at $99 per seat regardless of trace volume.

This isn't a malicious pricing model. It's a pricing model designed when LLM products were chat products, and it didn't get re-tuned when the median product became an agent. It's also why teams renewing in 2026 are quietly running the math and looking at alternatives. We've heard a version of "what did we spend on LangSmith last quarter" become a board-level question more than once.

The four credible alternatives in 2026

| Tool | License | Self-host | Hosted | Pricing shape | Best for |
| --- | --- | --- | --- | --- | --- |
| Langfuse | MIT (open source) | Yes | Yes | Free self-host; usage-tier hosted | 1:1 LangSmith replacement, durability concern |
| Helicone | Apache 2.0 | Yes | Yes | Gateway-mode + async-log; flat tiers | Gateway features (caching, rate-limiting) |
| Phoenix (Arize) | Elastic License 2.0 | Yes | Yes (since June 2025) | Free self-host; Free/$50/Enterprise on cloud | Arize-aligned teams, eval-driven |
| Braintrust | Closed source | No | Yes | Trace + eval bundled | Eval-heavy workflows, no self-host need |

Langfuse

Langfuse is the closest one-to-one LangSmith replacement on feature surface: tracing, datasets, evals, prompt management, all open-source under MIT. The self-host story is mature; a small Langfuse deployment runs on a single VM with a managed Postgres and ClickHouse for tracing storage.

The durability story changed materially in January 2026. ClickHouse acquired Langfuse at a $400M Series D valuation. For teams that bounced off Humanloop's September 2025 sunset, this is a real signal: Langfuse now has the storage primitive its tracing layer depends on under the same corporate roof.

Helicone

Helicone is structurally different. It supports two integration patterns: a gateway/proxy mode (you swap your provider base URL to oai.helicone.ai, and Helicone routes the request to the provider while logging it) and an async log mode (your application calls the provider directly, then fires a separate logging request after the fact, off the critical path). The async mode means Helicone outages don't take your product down.

Helicone is Apache 2.0 and self-hostable via Docker Compose (Postgres, ClickHouse, Redis, MinIO). The proxy mode unlocks gateway features like caching, rate-limiting per API key, and request retries that LangSmith doesn't offer in the same shape.
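
The async-log pattern is worth seeing in miniature: the provider call stays on the critical path, the observability write does not. A generic sketch, where `log_worker` and the in-memory sink stand in for an HTTP POST to a logging endpoint such as Helicone's async API (none of these names are Helicone's SDK):

```python
# Generic async-log sketch: the provider call is synchronous, the log
# write is queued for a background worker. A sink outage blocks only
# the worker thread, never the request path.
import queue
import threading

log_queue: "queue.Queue" = queue.Queue()

def log_worker(sink):
    # Drain the queue in the background; None is the shutdown signal.
    while True:
        record = log_queue.get()
        if record is None:
            break
        sink(record)
        log_queue.task_done()

def handle_request(call_provider, prompt: str) -> str:
    response = call_provider(prompt)                          # critical path
    log_queue.put({"prompt": prompt, "response": response})   # fire-and-forget
    return response

# Demo with stand-ins for the provider and the log endpoint:
logged = []
threading.Thread(target=log_worker, args=(logged.append,), daemon=True).start()
answer = handle_request(lambda p: p.upper(), "hello")
log_queue.join()   # in the demo only, wait for the log write to land
print(answer)      # HELLO
print(logged)      # [{'prompt': 'hello', 'response': 'HELLO'}]
```

This is the structural reason the async mode can't take your product down: the worst case is lost logs, not failed requests.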

Phoenix (Arize)

Phoenix is Arize's open-core observability and eval framework. The license matters: it's Elastic License 2.0, not Apache. Self-host is free, but the ELv2 prohibits a third party from offering Phoenix as a managed service.

Arize itself runs managed Phoenix Cloud as of 2025: Free at 25,000 spans/month, Pro at $50/month with 50,000 spans, Enterprise with custom pricing. The self-host path stays free. Phoenix fits cleanly when the team is already on Arize's larger ML observability stack.

Braintrust

Braintrust is closed-source and hosted-only. The product is sharper on the eval side than the trace side: it's the alternative that prioritizes a "your evals are the spec for your prompts" workflow over generic LLM observability. For teams that ran into LangSmith's pricing while heavily annotating eval datasets, Braintrust is closer to the workflow they actually wanted.

Self-host is Enterprise-only, which is the tradeoff for smaller teams. If a compliance review rejects vendor-proxied traffic and Enterprise pricing isn't on the table, Braintrust isn't the answer. Teams also evaluating PromptLayer on a similar shortlist should read the PromptLayer alternatives breakdown; the pricing shapes are different but the buying motion overlaps.

Where Prompt Assay fits, honestly

Prompt Assay is not an observability platform. It doesn't ingest production traces, it doesn't run tracing storage, and it doesn't have a request-path SDK that wraps your inference calls. Anyone telling you Prompt Assay is a LangSmith replacement on its own is selling you something.

What Prompt Assay does cover is the workbench half of what most teams adopted LangSmith for:

  • Prompt authoring with version control (diff, restore, branching, annotations on the version itself)
  • Six-dimension critique (Clarity, Completeness, Structure, Technique Usage, Robustness, Efficiency) on the prompt before it ships
  • Multi-provider Compare across Claude, GPT, and Gemini for the same prompt
  • Eval suites with test cases, rubrics, and LLM-as-a-judge graders
  • An AI pair in the editor (Brainstorm, Critique, Improve, Rewrite, Compare)

If your real use of LangSmith was 30% trace plumbing, 70% "manage prompts and run evals so we stop shipping regressions," that 70% is what Prompt Assay handles. The 30% is what you pair with Langfuse self-host (or Helicone, or Phoenix Cloud, depending on your stack).

On the Team tier, the workspace is shared across the team with roles, invitations, an audit log, and org-scoped API keys for shipping prompts to production from code. Prompt registry, eval suites, and version history live inside the team workspace, so the "who changed this prompt and when" question has an answer instead of a Slack archaeology dig.

The pricing is flat: $49 per month on Solo, $99 per seat per month on Team, custom on Enterprise. There's no per-trace meter, because we never see your inference traffic. BYOK is mandatory at every paid tier: your Anthropic, OpenAI, and Google keys connect directly to the providers, so your bill stays with your provider, not us. The BYOK setup is documented and takes about 60 seconds; the trust page covers the encryption-at-rest and key-isolation specifics if compliance review needs them.

If BYOK setup cost is the friction you're worried about, you're already paying it. LangSmith's wrap_openai helper and @traceable decorator route inference through your own provider keys today. Migrating to Prompt Assay swaps which platform's wrapper sits around the inference call, not whether keys are involved.
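
The wrapper pattern is the same in either direction: a thin layer around a client you construct with your own key. A generic sketch of that shape (this is not LangSmith's wrap_openai or any vendor's SDK; `trace_sink` stands in for whichever tracing backend you pair in):

```python
# Generic BYOK wrapper sketch: the platform layer records metadata to the
# side, while the underlying provider call (and your API key) is untouched.
import functools
import time

def wrap_client(call, trace_sink):
    """Wrap a provider call so each invocation is traced off to the side."""
    @functools.wraps(call)
    def traced(*args, **kwargs):
        start = time.perf_counter()
        result = call(*args, **kwargs)   # your key, your provider, your bill
        trace_sink({"latency_s": time.perf_counter() - start, "args": args})
        return result
    return traced

traces = []
# Stand-in for e.g. a chat-completion method on a client built with your key:
completion = wrap_client(lambda prompt: f"echo: {prompt}", traces.append)
print(completion("ping"))   # echo: ping
print(len(traces))          # 1
```

Swapping observability vendors means swapping `trace_sink` and the wrapper, never the key handling underneath.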

The honest pairing recommendation:

  • Langfuse self-host + Prompt Assay is the default. Open source for tracing, flat fee for the workbench, neither bills per trace.
  • Helicone + Prompt Assay if you want gateway-mode caching alongside async logging.
  • LangSmith Developer (free) + Prompt Assay if you want to keep LangSmith for trace history and move authoring + critique + evals to Prompt Assay. The cost reduction comes from not running paid LangSmith seats on your full team.

Open the editor and connect a key. No credit card, no demo call.

Migrating from LangSmith, practical path

This is what the migration actually looks like for a five-to-ten-engineer team. The goal is no observability gap during the transition.

  1. Keep LangSmith on the lowest tier for 30 to 60 days

    Don't cancel on the first day. The Developer tier is free and keeps your historical traces queryable while the new pipeline ramps up. Set a calendar reminder for the cancellation date.

  2. Stand up replacement tracing

    A small Langfuse self-host on a $20/month VM with managed Postgres takes a few hours including the SDK swap. The Helicone async-log path is faster (one base-URL change for gateway mode, or a non-blocking log call for async mode).

  3. Move authoring, critique, and evals to Prompt Assay

    Import existing prompts as the first version of each new Prompt Assay record. Connect your provider keys. Re-run any test cases you had as a Prompt Assay eval suite. The eval suite docs cover the rubric and judge-model setup.

  4. Repoint production traces to the new tracing tool, ramp down LangSmith

    Cut the SDK over in one deploy. Watch the new tracing tool for 48 hours. When you're confident no signal is missing, downgrade the LangSmith tier to free or cancel.
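
One way to make that cutover (and any rollback) a single config change rather than a code change is to select the tracing sink behind an environment flag. A sketch under our own naming, where TRACING_BACKEND and the in-memory sinks are hypothetical stand-ins for real exporters:

```python
# Sketch: pick the tracing sink from an env flag, so the step-4 cutover
# is one deploy of a config change. TRACING_BACKEND is our invented name.
import os

def noop_sink(record: dict) -> None:
    pass   # safe default: drop traces rather than crash the request path

def make_trace_sink(backends: dict, default: str = "none"):
    """Return the sink named by TRACING_BACKEND, falling back to a no-op."""
    choice = os.environ.get("TRACING_BACKEND", default)
    return backends.get(choice, noop_sink)

# Stand-ins for real exporters (LangSmith today, Langfuse after cutover):
langsmith_records, langfuse_records = [], []
backends = {
    "langsmith": langsmith_records.append,
    "langfuse": langfuse_records.append,
}

os.environ["TRACING_BACKEND"] = "langfuse"   # the one-line cutover
sink = make_trace_sink(backends)
sink({"event": "llm_call"})
print(len(langfuse_records), len(langsmith_records))  # 1 0
```

During the 48-hour watch window you can also register both sinks under one flag value to dual-write, then drop the old one once you trust the new signal.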

The pattern is the same one teams used migrating off Humanloop. The Humanloop migration writeup covers the durability framing that applies here too: when a vendor's pricing or stability changes the calculus, you don't fight the migration, you sequence it so nothing drops.

Ship your next prompt in the workbench.

Prompt Assay is the workbench for shipping production LLM prompts. Version every change. Critique, improve, and compare across GPT, Claude, and Gemini. Bring your own keys. No demo call. No card. No sales gate.

Open the editor · Read the docs

  1. PromptLayer alternatives: the honest comparison (04 · April 2026 · Comparisons & Migrations · 14 min read)

     PromptLayer alternatives compared honestly: current 2026 pricing, BYOK posture, and when Prompt Assay, LangSmith, Langfuse, or Braintrust fits better.

  2. Migrate from Humanloop: a 2026 re-home guide (01 · April 2026 · Comparisons & Migrations · 13 min read)

     Humanloop shut down Sep 2025. If the replacement you picked isn't sticking, this 2026 guide covers the durable asset, destinations, and BYOK math.

  3. What is a BYOK prompt tool? (02 · April 2026 · BYOK & Cost · 12 min read)

     A BYOK prompt tool routes every LLM call through your own API key. Here's what that means for cost, setup, and the three postures in the market.

Issue №07 · Published APRIL 25, 2026 · Prompt Assay