Promptfoo alternatives after OpenAI acquisition

OpenAI acquired Promptfoo on March 9, 2026. The MIT license on the open-source CLI is preserved; the new steward is OpenAI. The stated integration target is OpenAI Frontier, OpenAI's security-testing and red-teaming surface for AI agents. For teams that picked Promptfoo for provider-neutrality, four credible 2026 alternatives cover the eval and red-teaming use cases without OpenAI's stewardship: Braintrust, Langfuse, Prompt Assay, and, for LangChain-heavy codebases, LangSmith.
What Promptfoo actually is, and what changed on March 9
Promptfoo is an open-source command-line tool for evaluating, red-teaming, and security-testing LLM applications. The configuration is a YAML file (promptfooconfig.yaml) that names prompts, providers, and assertions; the CLI runs the matrix and reports results. The provider catalog spans Anthropic, OpenAI, Google, Bedrock, Azure, local models via Ollama, and a long list of others, with framework adapters that let the same eval suite run cross-vendor without code changes. Founded in 2024 by Ian Webster and Michael D'Angelo, Promptfoo raised $23M total across its rounds and reached an $86M valuation in July 2025, per OpenAI's acquisition announcement.
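To make the YAML-driven workflow concrete, here is a representative (not verbatim) promptfooconfig.yaml. The field names (prompts, providers, tests, vars, assert) follow Promptfoo's documented schema; the prompt text and provider ids are illustrative examples.

```yaml
# Illustrative promptfooconfig.yaml: one prompt, two providers, one test case.
prompts:
  - "Summarize the following ticket in two sentences: {{ticket}}"
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022
tests:
  - vars:
      ticket: "Customer reports a login loop after password reset."
    assert:
      - type: contains
        value: "password reset"
      - type: javascript
        value: output.length < 400
```

Running the CLI against this file executes the prompt-by-provider matrix and reports each assertion's pass/fail per cell, which is the cross-vendor workflow the rest of this piece assumes.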
On March 9, 2026, OpenAI announced it had acquired Promptfoo. The deal value was not disclosed. Promptfoo's technology is being integrated into OpenAI Frontier for automated security testing and red-teaming of AI agents. Per OpenAI's own announcement, the company "expects to continue building out Promptfoo's open source offering." That language is forward-looking ("expects to"), not a binding commitment about feature parity, multi-provider scope, or a defined long-term release cadence.
The TechCrunch coverage carries no commitment about supporting non-OpenAI providers (Anthropic, Google, Bedrock, local models) going forward. The OSS license stays MIT either way; the question buyers are publicly asking is whether the MIT codebase will keep accumulating non-OpenAI provider improvements at the same pace once the maintainers' day jobs are inside OpenAI.
What the acquisition means for buyers
Three concerns are real and worth naming honestly. None of them require Promptfoo to actually break or stop working. They're roadmap-stewardship concerns, not platform-death concerns.
Provider-neutrality stewardship. Multi-provider eval depends on someone caring about every provider equally. A maintainer team employed by OpenAI has natural reasons to ship OpenAI-first features faster. The MIT license means the codebase can be forked if the trajectory diverges, but a fork without funded maintainers is a different product than a steward-led OSS project.
Eval-tool roadmap independence. Promptfoo's eval-CI primitives (assertions, datasets, model-graded checks) sit adjacent to, and partly compete with, OpenAI's own evals product. A buyer who wants their eval-tool roadmap to optimize for cross-model rigor rather than OpenAI ecosystem alignment now has a question to weigh in 2026 that didn't exist in 2025.
Security-testing workflows. Red-teaming is the surface most directly absorbed into OpenAI Frontier. If your security team uses Promptfoo specifically for cross-vendor red-teaming (testing the same agent against Claude, GPT, and Gemini for jailbreak resistance), the post-acquisition direction is harder to predict than the eval surface.
The DEV Community publication "Top 5 AI Agent Eval Tools After Promptfoo's Exit" went up shortly after the announcement, capturing the same concerns from the developer side. The signal is real; the question is which alternative fits which workflow.
The four alternatives that earn their slot in 2026
The right alternative depends on the specific use that drove Promptfoo adoption. Eval-CI workflows, observability-first tracing, and prompt authoring are three different rooms in the same house, and Promptfoo was opinionated about being the eval-CI room without taking over the others.
Braintrust · for eval-first workflows with hosted observability
Braintrust is the closest spiritual successor to "Promptfoo's eval-CI workflow, but hosted." Tracing plus evals plus storage in one platform. Multi-provider via an AI Gateway endpoint that accepts OpenAI, Anthropic, and Google SDK calls. Pricing in May 2026: Starter Free (1GB processed, 10K scores, 14-day retention), Pro $249/mo (5GB, 50K scores, 30-day retention), Enterprise custom.
Braintrust raised an $80M Series B at an $800M valuation in February 2026, led by ICONIQ. The independence story is durable for the medium term. The trade for that durability is gateway-mode posture: BYOK keys you supply still pass through Braintrust's proxy, which a security-conscious team should weigh against the managed-gateway benefits.
Closed-source and hosted-only on Pro; self-host is Enterprise-only.
Langfuse · for OSS observability with eval primitives, OTel-native
Langfuse is open-source under MIT (excluding ee folders), with both hosted and self-host paths. Tracing, prompt management, evals, datasets, dashboards, and a playground all under one roof. The v3 architecture is OpenTelemetry-native, which makes it framework-agnostic across Anthropic, OpenAI, Google, and adapter integrations.
May 2026 pricing: Hobby Free (50K units/mo, 30-day retention), Core $29/mo, Pro $199/mo, Enterprise $2,499/mo. Self-host on a small VM with managed Postgres and ClickHouse runs in the low hundreds of dollars per month at small scale.
ClickHouse acquired Langfuse on January 16, 2026, which gave the tracing layer the storage primitive it depended on under the same corporate roof. The acquisition explicitly committed to keeping Langfuse open-source under MIT and continuing Langfuse Cloud as a standalone service with the same SLAs and support · which is a stronger durability statement than Promptfoo's post-acquisition language. (See the Langfuse alternatives breakdown for when Langfuse itself is not the right fit.)
Langfuse is structurally closest to Promptfoo for buyers who valued OSS plus framework-agnostic plus eval primitives in one tool.
Prompt Assay · for the workbench half, paired with a tracing or security tool
Prompt Assay is the prompt-engineering workbench: authoring, six-dimension critique on every prompt before it ships, two-version Compare with model-graded structural diff, prompt-level versioning with diff and restore, an AI pair (Brainstorm, Critique, Improve, Rewrite, Compare) inside the editor, and eval suites with LLM-as-a-judge graders.
The honest framing is that Prompt Assay covers the authoring and prompt-iteration workflow that the prompts block of Promptfoo's YAML represented; the eval-CI surface inside Prompt Assay (test cases, rubrics, graders) covers the assertions side. Red-teaming and security testing are not surfaces Prompt Assay specializes in · pair with a security-testing tool of your choice for that workflow.
Pricing is flat: Free $0, Solo $49/mo, Team $99 per seat per month, Enterprise contact-sales. BYOK-mandatory at every paid tier with no inference markup; provider keys connect directly to Anthropic, OpenAI, and Google with no parent-company tilt. The trust page covers the encryption-at-rest and key-handling specifics.
LangSmith · for LangChain-heavy codebases (with caveats)
LangSmith is mostly orthogonal to the Promptfoo use case · it's observability-first with prompt management and evals attached, not an eval-CI CLI · but for LangChain-native codebases that ran Promptfoo for assertion-style checks, LangSmith's eval surface plus its native LangChain trace ingestion can subsume both. The caveat is the per-trace cost curve once annotation queues, evaluators, or run-rule matches start auto-upgrading affected traces to the extended tier. The LangSmith alternatives breakdown covers that mechanic in full.
How the migration actually works
Promptfoo's configuration is YAML. The migration to any of the alternatives is parsing the YAML and re-creating the prompts, providers, and assertions in the new tool's surface. There is no automated importer, but the YAML structure is well-defined and the parsing is a few hundred lines of Python or TypeScript at most.
The core fields to migrate:
- prompts · the prompt strings or files. Land each as a new prompt in the destination tool; if you used variable interpolation, preserve it.
- providers · the model configs. Re-encode in the destination tool's provider config or run-time call.
- tests · the test cases with vars, assert blocks, and expected outputs. In Prompt Assay these become eval suite test cases with rubrics and graders. In Braintrust they become dataset rows with score functions. In Langfuse they become datasets with eval methods.
- redteam configurations · the red-teaming probe sets are the surface most coupled to Promptfoo's CLI; if red-teaming is your primary use, factor that into which tool you pick.
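The parse behind that field list can be sketched in a few lines of Python. This hypothetical helper takes a dict already loaded from promptfooconfig.yaml (via yaml.safe_load or similar) and flattens it into destination-neutral records; the function name and output shape are illustrative assumptions, not any destination tool's import format.

```python
def extract_eval_suite(config: dict) -> dict:
    """Flatten a parsed promptfooconfig.yaml into plain records.

    Assumes the standard Promptfoo top-level keys: prompts, providers,
    tests, and optionally redteam. The output shape is our own.
    """
    suite = {"prompts": [], "providers": [], "cases": [], "warnings": []}

    # Prompt strings (or file:// references) carry over as-is,
    # preserving any {{variable}} interpolation.
    suite["prompts"] = list(config.get("prompts", []))

    # Providers may be bare strings ("openai:gpt-4o-mini") or dicts
    # with an "id" plus options; keep just the id here.
    for provider in config.get("providers", []):
        pid = provider if isinstance(provider, str) else provider.get("id")
        suite["providers"].append(pid)

    # Each test case keeps its vars and assertion list together so the
    # destination tool can map them to its own dataset/grader concepts.
    for test in config.get("tests", []):
        suite["cases"].append({
            "vars": test.get("vars", {}),
            "asserts": test.get("assert", []),
        })

    # Red-team probe sets don't translate cleanly; flag them instead
    # of silently dropping them.
    if "redteam" in config:
        suite["warnings"].append("redteam config present: migrate manually")

    return suite
```

From here, each destination is a loop over suite["prompts"] and suite["cases"] against that tool's API or UI; the provider list usually maps to runtime configuration rather than stored records.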
For teams running both Promptfoo and a hosted tool today (a common pattern · YAML in source control for pre-deploy, hosted tool for production observability), the migration is just consolidating the eval definitions in the hosted tool's surface and removing the CLI dependency. For teams using Promptfoo as the only eval surface, the migration is more substantial, and the destination choice matters.
The sequenced walkthrough lives at /migrate/promptfoo; the side-by-side destinations comparison lives at /alternatives/promptfoo.
Where Prompt Assay fits, honestly
Prompt Assay is a specific shape. It is the workbench: authoring, critique, versioning, eval suites. It is not a tracing platform, not an AI gateway, and not a security red-teaming surface.
For Promptfoo refugees specifically, the fit is good in three scenarios:
- You used Promptfoo for the prompt-iteration loop more than for security testing. YAML-defined prompts that you ran assertions against, where the assertions were quality checks (does the output cover X, does it stay under Y tokens, does it match the schema). That workflow lands cleanly in Prompt Assay's eval suites.
- Provider-neutrality is a hard requirement. Prompt Assay is BYOK-mandatory across Anthropic, OpenAI, and Google with no provider preference. We never sit in the inference request path. There is no parent-company roadmap pressure on which provider gets the next adapter improvement.
- You want authoring depth alongside the eval surface. Six-dimension critique on every prompt before it ships, two-version Compare with model-graded structural diff, prompt-level versioning with branching and annotations · these are what Promptfoo did not do, and they're the part of the workflow that lives upstream of the eval CI.
If your primary Promptfoo use was red-teaming and security probes, Prompt Assay is the wrong tool · pair with a dedicated security-testing surface and use Prompt Assay for the authoring and quality-eval half.
Ready to evaluate Prompt Assay against your real prompts?
Plug your existing prompts into the workbench and run six-dimension critique, two-version Compare, and an eval suite against your golden test cases. Free tier, no credit card, no demo call. Open the editor at /signup. The BYOK setup is documented at /docs/byok/overview; multi-provider keys connect directly with no inference markup.
Further Reading
- №08 · April 2026 · Langfuse alternatives: the honest comparison. Langfuse alternative tools compared in 2026: five buyer scenarios where Langfuse isn't the right fit, with honest tool recommendations for each. Comparisons & Migrations · 14 min read
- №07 · April 2026 · LangSmith alternatives without per-trace billing. LangSmith's auto-upgrade-on-feedback can turn per-trace billing superlinear. Compare Langfuse, Helicone, Phoenix, Braintrust, and where Prompt Assay fits. Comparisons & Migrations · 13 min read
- №04 · April 2026 · PromptLayer alternatives: the honest comparison. PromptLayer alternatives compared honestly: current 2026 pricing, BYOK posture, and when Prompt Assay, LangSmith, Langfuse, or Braintrust fits better. Comparisons & Migrations · 14 min read
Issue №09 · Published MAY 1, 2026 · Prompt Assay