Promptfoo alternatives after OpenAI acquisition

OpenAI acquired Promptfoo on March 9, 2026. The MIT license on the open-source CLI is preserved; the new steward is OpenAI. The stated integration target is OpenAI Frontier, OpenAI's security-testing and red-teaming surface for AI agents. For teams that picked Promptfoo for provider-neutrality, four credible 2026 alternatives cover the eval and red-teaming use cases without OpenAI's stewardship: Braintrust, Langfuse, Prompt Assay, and, for LangChain-heavy codebases, LangSmith.
What Promptfoo actually is, and what changed on March 9
Promptfoo is an open-source command-line tool for evaluating, red-teaming, and security-testing LLM applications. The configuration is a YAML file (promptfooconfig.yaml) that names prompts, providers, and assertions; the CLI runs the matrix and reports results. The provider catalog spans Anthropic, OpenAI, Google, Bedrock, Azure, local models via Ollama, and a long list of others, with framework adapters that let the same eval suite run cross-vendor without code changes. Founded in 2024 by Ian Webster and Michael D'Angelo, Promptfoo raised $23M total across its rounds and reached an $86M valuation in July 2025, per OpenAI's acquisition announcement.
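To make the YAML-driven workflow concrete, here is a representative (not verbatim) promptfooconfig.yaml. The field names (prompts, providers, tests, vars, assert) follow Promptfoo's documented schema; the prompt text and provider ids are illustrative examples.

```yaml
# Illustrative promptfooconfig.yaml: one prompt, two providers, one test case.
prompts:
  - "Summarize the following ticket in two sentences: {{ticket}}"
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20241022
tests:
  - vars:
      ticket: "Customer reports a login loop after password reset."
    assert:
      - type: contains
        value: "password reset"
      - type: javascript
        value: output.length < 400
```

Running the CLI against this file executes the prompt-by-provider matrix and reports each assertion's pass/fail per cell, which is the cross-vendor workflow the rest of this piece assumes.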
On March 9, 2026, OpenAI announced it had acquired Promptfoo. The deal value was not disclosed. Promptfoo's technology is being integrated into OpenAI Frontier for automated security testing and red-teaming of AI agents. Per OpenAI's own announcement, the company "expects to continue building out Promptfoo's open source offering." That language is forward-looking ("expects to"), not a binding commitment about feature parity, multi-provider scope, or a defined long-term release cadence.
The TechCrunch coverage carries no commitment about supporting non-OpenAI providers (Anthropic, Google, Bedrock, local models) going forward. The OSS license stays MIT either way; the question buyers are publicly asking is whether the MIT codebase will keep accumulating non-OpenAI provider improvements at the same pace once the maintainers' day jobs are inside OpenAI.
What the acquisition means for buyers
Three concerns are real and worth naming honestly. None of them require Promptfoo to actually break or stop working. They're roadmap-stewardship concerns, not platform-death concerns.
Provider-neutrality stewardship. Multi-provider eval depends on someone caring about every provider equally. A maintainer team employed by OpenAI has natural reasons to ship OpenAI-first features faster. The MIT license means the codebase can be forked if the trajectory diverges, but a fork without funded maintainers is a different product than a steward-led OSS project.
Eval-tool roadmap independence. Promptfoo's eval-CI primitives (assertions, datasets, model-graded checks) sit adjacent to, and partly compete with, OpenAI's own evals product. A buyer who wants their eval-tool roadmap to optimize for cross-model rigor rather than OpenAI ecosystem alignment now has a question to weigh in 2026 that didn't exist in 2025.
Security-testing workflows. Red-teaming is the surface most directly absorbed into OpenAI Frontier. If your security team uses Promptfoo specifically for cross-vendor red-teaming (testing the same agent against Claude, GPT, and Gemini for jailbreak resistance), the post-acquisition direction is harder to predict than the eval surface.
The DEV Community publication "Top 5 AI Agent Eval Tools After Promptfoo's Exit" went up shortly after the announcement, capturing the same concerns from the developer side. The signal is real; the question is which alternative fits which workflow.
The four alternatives that earn their slot in 2026
The right alternative depends on the specific use that drove Promptfoo adoption. Eval-CI workflows, observability-first tracing, and prompt authoring are three different rooms in the same house, and Promptfoo was opinionated about being the eval-CI room without taking over the others.
Braintrust · for eval-first workflows with hosted observability
Braintrust is the closest spiritual successor to "Promptfoo's eval-CI workflow, but hosted." Tracing plus evals plus storage in one platform. Multi-provider via an AI Gateway endpoint that accepts OpenAI, Anthropic, and Google SDK calls. Pricing in May 2026: Starter Free (1GB processed, 10K scores, 14-day retention), Pro $249/mo (5GB, 50K scores, 30-day retention), Enterprise custom.
Braintrust raised an $80M Series B at an $800M valuation in February 2026, led by ICONIQ. The independence story is durable for the medium term. The trade for that durability is gateway-mode posture: BYOK keys you supply still pass through Braintrust's proxy, which a security-conscious team should weigh against the managed-gateway benefits.
Closed-source and hosted-only on Pro; self-host is Enterprise-only.
Langfuse · for OSS observability with eval primitives, OTel-native
Langfuse is open-source under MIT (excluding ee folders), with both hosted and self-host paths. Tracing, prompt management, evals, datasets, dashboards, and a playground all under one roof. The v3 architecture is OpenTelemetry-native, which makes it framework-agnostic across Anthropic, OpenAI, Google, and adapter integrations.
May 2026 pricing: Hobby Free (50K units/mo, 30-day retention), Core $29/mo, Pro $199/mo, Enterprise $2,499/mo. Self-host on a small VM with managed Postgres and ClickHouse runs in the low hundreds of dollars per month at small scale.
ClickHouse acquired Langfuse on January 16, 2026, which gave the tracing layer the storage primitive it depended on under the same corporate roof. The acquisition explicitly committed to keeping Langfuse open-source under MIT and continuing Langfuse Cloud as a standalone service with the same SLAs and support · which is a stronger durability statement than Promptfoo's post-acquisition language. (See the Langfuse alternatives breakdown for when Langfuse itself is not the right fit.)
Langfuse is structurally closest to Promptfoo for buyers who valued OSS plus framework-agnostic plus eval primitives in one tool.
Prompt Assay · for the workbench half, paired with a tracing or security tool
Prompt Assay is the prompt-engineering workbench: authoring, six-dimension critique on every prompt before it ships, two-version Compare with model-graded structural diff, prompt-level versioning with diff and restore, an AI pair (Brainstorm, Critique, Improve, Rewrite, Compare) inside the editor, and eval suites with LLM-as-a-judge graders.
The honest framing is that Prompt Assay covers the authoring and prompt-iteration workflow that the prompts block of Promptfoo's YAML represented; the eval-CI surface inside Prompt Assay (test cases, rubrics, graders) covers the assertions side. Red-teaming and security testing are not surfaces Prompt Assay specializes in · pair with a security-testing tool of your choice for that workflow.
Pricing is flat: Free $0, Solo $49/mo, Team $99 per seat per month, Enterprise contact-sales. BYOK-mandatory at every paid tier with no inference markup; provider keys connect directly to Anthropic, OpenAI, and Google with no parent-company tilt. The trust page covers the encryption-at-rest and key-handling specifics.
LangSmith · for LangChain-heavy codebases (with caveats)
LangSmith is mostly orthogonal to the Promptfoo use case · it's observability-first with prompt management and evals attached, not an eval-CI CLI · but for LangChain-native codebases that ran Promptfoo for assertion-style checks, LangSmith's eval surface plus its native LangChain trace ingestion can subsume both. The caveat is the per-trace cost curve once annotation queues, evaluators, or run-rule matches start auto-upgrading affected traces to the extended tier. The LangSmith alternatives breakdown covers that mechanic in full.
How the migration actually works
Promptfoo's configuration is YAML. The migration to any of the alternatives is parsing the YAML and re-creating the prompts, providers, and assertions in the new tool's surface. There is no automated importer, but the YAML structure is well-defined and the parsing is a few hundred lines of Python or TypeScript at most.
The core fields to migrate:
- prompts · the prompt strings or files. Land each as a new prompt in the destination tool; if you used variable interpolation, preserve it.
- providers · the model configs. Re-encode in the destination tool's provider config or run-time call.
- tests · the test cases with vars, assert blocks, and expected outputs. In Prompt Assay these become eval suite test cases with rubrics and graders. In Braintrust they become dataset rows with score functions. In Langfuse they become datasets with eval methods.
- redteam configurations · the red-teaming probe sets are the surface most coupled to Promptfoo's CLI; if red-teaming is your primary use, factor that into which tool you pick.
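The parse behind that field list can be sketched in a few lines of Python. This hypothetical helper takes a dict already loaded from promptfooconfig.yaml (via yaml.safe_load or similar) and flattens it into destination-neutral records; the function name and output shape are illustrative assumptions, not any destination tool's import format.

```python
def extract_eval_suite(config: dict) -> dict:
    """Flatten a parsed promptfooconfig.yaml into plain records.

    Assumes the standard Promptfoo top-level keys: prompts, providers,
    tests, and optionally redteam. The output shape is our own.
    """
    suite = {"prompts": [], "providers": [], "cases": [], "warnings": []}

    # Prompt strings (or file:// references) carry over as-is,
    # preserving any {{variable}} interpolation.
    suite["prompts"] = list(config.get("prompts", []))

    # Providers may be bare strings ("openai:gpt-4o-mini") or dicts
    # with an "id" plus options; keep just the id here.
    for provider in config.get("providers", []):
        pid = provider if isinstance(provider, str) else provider.get("id")
        suite["providers"].append(pid)

    # Each test case keeps its vars and assertion list together so the
    # destination tool can map them to its own dataset/grader concepts.
    for test in config.get("tests", []):
        suite["cases"].append({
            "vars": test.get("vars", {}),
            "asserts": test.get("assert", []),
        })

    # Red-team probe sets don't translate cleanly; flag them instead
    # of silently dropping them.
    if "redteam" in config:
        suite["warnings"].append("redteam config present: migrate manually")

    return suite
```

From here, each destination is a loop over suite["prompts"] and suite["cases"] against that tool's API or UI; the provider list usually maps to runtime configuration rather than stored records.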
For teams running both Promptfoo and a hosted tool today (a common pattern · YAML in source control for pre-deploy, hosted tool for production observability), the migration is just consolidating the eval definitions in the hosted tool's surface and removing the CLI dependency. For teams using Promptfoo as the only eval surface, the migration is more substantial, and the destination choice matters.
The sequenced walkthrough lives at /migrate/promptfoo; the side-by-side destinations comparison lives at /alternatives/promptfoo.
Where Prompt Assay fits, honestly
Prompt Assay is a specific shape. It is the workbench: authoring, critique, versioning, eval suites. It is not a tracing platform, not an AI gateway, and not a security red-teaming surface.
For Promptfoo refugees specifically, the fit is good in three scenarios:
- You used Promptfoo for the prompt-iteration loop more than for security testing. YAML-defined prompts that you ran assertions against, where the assertions were quality checks (does the output cover X, does it stay under Y tokens, does it match the schema). That workflow lands cleanly in Prompt Assay's eval suites.
- Provider-neutrality is a hard requirement. Prompt Assay is BYOK-mandatory across Anthropic, OpenAI, and Google with no provider preference. We never sit in the inference request path. There is no parent-company roadmap pressure on which provider gets the next adapter improvement.
- You want authoring depth alongside the eval surface. Six-dimension critique on every prompt before it ships, two-version Compare with model-graded structural diff, prompt-level versioning with branching and annotations · these are what Promptfoo did not do, and they're the part of the workflow that lives upstream of the eval CI.
If your primary Promptfoo use was red-teaming and security probes, Prompt Assay is the wrong tool · pair with a dedicated security-testing surface and use Prompt Assay for the authoring and quality-eval half.
Ready to evaluate Prompt Assay against your real prompts?
Plug your existing prompts into the workbench and run six-dimension critique, two-version Compare, and an eval suite against your golden test cases. Free tier, no credit card, no demo call. Open the editor at /signup. The BYOK setup is documented at /docs/byok/overview; multi-provider keys connect directly with no inference markup.
Further Reading
- №08 · April 2026 · Langfuse alternatives: the honest comparison. Langfuse alternative tools compared in 2026: five buyer scenarios where Langfuse isn't the right fit, with honest tool recommendations for each. Comparisons & Migrations · 14 min read
- №07 · April 2026 · LangSmith alternatives without per-trace billing. LangSmith's auto-upgrade-on-feedback can turn per-trace billing superlinear. Compare Langfuse, Helicone, Phoenix, Braintrust, and where Prompt Assay fits. Comparisons & Migrations · 13 min read
- №04 · April 2026 · PromptLayer alternatives: the honest comparison. PromptLayer alternatives compared honestly: current 2026 pricing, BYOK posture, and when Prompt Assay, LangSmith, Langfuse, or Braintrust fits better. Comparisons & Migrations · 14 min read
Issue №09 · Published MAY 1, 2026 · Prompt Assay