I. Opening

Behind every great agent is a great prompt.

This is where it gets written.

The authoring workbench. Enhance your craft.

5 in-editor AI actions · 6 critique dimensions · 3 LLM providers
BYOK · Free tier · No demo call
Specimen 001 · customer_support_triage.md · v7
01 You are a customer support triage assistant.
02 
03 Read the ticket and decide how to route it.
04 Produce a short summary.
05 
06 Be friendly. Be concise. Be thorough.
07 Always include the customer's full name.
08 Never mention account numbers.
09 
10 Return the result.
11 
12 {{ticket}}
line 04 · vague output spec
line 06 · contradictory instruction
line 12 · missing example
Critique score 38 / 60 · CLARITY 7/10 · STRUCTURE 8/10 · ROBUSTNESS 5/10
Works with every major model
Anthropic Claude · OpenAI GPT · Google Gemini
Your keys. Your bill. No markup.
II. Try the Critique

See the AI pair grade three drafts of the same prompt.

The same task, written three ways. Click a tab to see how the six-dimension critique scores each draft and what it says about why.

OK draft · Better, but still leaks
You are a customer support triage assistant.
Read the ticket and decide how to route it.
Produce a short summary.
Be friendly. Be concise.
Always include the customer's name.
Return the result.
Critique score 38 / 60

  • Clarity 7/10

    Role is now specific. Goal is clearer.

  • Completeness 5/10

    Output spec is still loose. "Short summary" is undefined.

  • Structure 8/10

    Reads as a procedural list.

  • Technique 6/10

    Implicit task framing. No examples.

  • Robustness 5/10

    No edge cases handled. PII not addressed.

  • Efficiency 7/10

    Tighter than the weak draft, still wordy.

Excerpts shown. Real prompts run longer. Inside the editor, run the critique any time.

III. A Different Lane

Everyone moved upstream or downstream. We stayed where the craft lives.

Every other tool moved upstream to agents or downstream to traces. The middle, where prompts get written, is empty. We sit there.

You open the editor. You write. The AI pair critiques, improves, rewrites, brainstorms, and compares. You version, evaluate, and ship. Everything else, we leave to everyone else.

The category, end to end
Upstream
Agent frameworks
Orchestration, tools, memory, planning loops.
Here
Authoring
Where the prompt itself gets written, critiqued, and shipped.
Downstream
Logs and traces
Observability, request replay, eval analytics.
IV. The Instruments

Three instruments, one workbench.

Instrument one

An AI pair in the editor.

Five named actions next to your draft: spot weaknesses, propose fixes, start over, think out loud, compare two drafts. Your provider does the thinking. You stay in flow.

[critique][improve][rewrite][brainstorm][compare]
Instrument two

A versioned library with diffs and branches.

Every save is a version. Every change has a diff. Branch to try an idea, restore any point in history, annotate any version. The git workflow your prompts deserve.

@@ v6 → v7 @@
- Produce a summary.
+ Produce a short summary of the issue.
Instrument three

Evaluation suites with LLM-as-judge.

Run any prompt against a batch of test cases. Keyword checks for hard rules, judge models for anything qualitative. Cost, latency, and pass rates tracked per run.

V. The Cabinet

Every feature on the workbench.

Nine features. Skim the titles in ten seconds; read anything that catches.

Featured · critique

Six dimensions, one score.

Clarity, completeness, structure, technique, robustness, efficiency. Each scored with a written reason. The radar chart shows where the prompt leaks.

Critique score 38 / 60
  • CLARITY 7
  • COMPLETENESS 5
  • STRUCTURE 8
  • TECHNIQUE 6
  • ROBUSTNESS 5
  • EFFICIENCY 7

Targeted edits, not rewrites.

Before-and-after suggestions, applied one click at a time. No wholesale rewrites unless you ask.

Clarity · line 04
- Produce a summary.
+ Produce a short summary of the issue.
Apply →

A full rewrite, with its reasoning.

The whole prompt rewritten, with the techniques and reasons. Saved as a new version on accept.

technique: few-shot
technique: chain-of-thought
→ version saved as v8

A conversation, not a form.

Chat with the AI pair about where the prompt should go. Suggested edits appear inline. Nobody else ships this.

you: tighten the role framing
pair: try "a senior support triage specialist"

Two versions, examined side by side.

Structural diff of any two versions. Named improvements, regressions, maturity signals. See what got better, drifted, or broke.

@@ v6 → v7 @@
- summary
+ short summary of the issue
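Under the hood, a structural diff starts from the classic longest-common-subsequence idea. A minimal line-diff sketch of that idea follows; it is illustrative only, not PromptAssay's actual compare algorithm, and `lineDiff` is a hypothetical name.

```typescript
// Minimal LCS-based line diff: emits "- " for removed, "+ " for added,
// and "  " for unchanged lines. Illustrative, not the product's algorithm.
function lineDiff(a: string[], b: string[]): string[] {
  const n = a.length;
  const m = b.length;
  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
  const lcs: number[][] = Array.from({ length: n + 1 }, () =>
    new Array<number>(m + 1).fill(0),
  );
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table, preferring whichever side preserves the longer match.
  const out: string[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      out.push("  " + a[i]); i++; j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push("- " + a[i]); i++;
    } else {
      out.push("+ " + b[j]); j++;
    }
  }
  while (i < n) out.push("- " + a[i++]);
  while (j < m) out.push("+ " + b[j++]);
  return out;
}

const v6 = ["Produce a summary.", "Return the result."];
const v7 = ["Produce a short summary of the issue.", "Return the result."];
console.log(lineDiff(v6, v7).join("\n"));
```

The named improvements and regressions layer sits on top of a diff like this: once changed lines are isolated, a judge model can label each hunk.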

Reusable context blocks, with variable slots.

System instructions, examples, guardrails, task frames. Compose into any prompt with named slots. Edit once, propagate everywhere.

role/senior-support · guardrails/pii · format/json-out · examples/triage-3 · policy/escalation
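Named-slot composition boils down to template substitution. A minimal sketch, assuming a simple `{{name}}` placeholder syntax like the `{{ticket}}` slot in the specimen above; `fillSlots` is a hypothetical helper, not the product's API.

```typescript
// Fill {{name}} slots in a fragment. Unknown slots are left untouched
// so a missing value is visible instead of silently blank.
type Slots = Record<string, string>;

function fillSlots(fragment: string, slots: Slots): string {
  return fragment.replace(/\{\{(\w+)\}\}/g, (match, name: string) =>
    name in slots ? slots[name] : match,
  );
}

// Compose a guardrail fragment into a prompt.
const guardrail = "Never mention {{forbidden_field}}.";
const line = fillSlots(guardrail, { forbidden_field: "account numbers" });
// line === "Never mention account numbers."
```

Edit the fragment once and every prompt that composes it picks up the change on its next resolve.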

Every provider, one editor.

Streaming output, variables, cost and latency per run. Switch between Anthropic, OpenAI, and Google in the model menu.

Claude Sonnet · $0.004 · 820ms
GPT-4.1 · $0.006 · 640ms
Gemini 2.5 · $0.003 · 910ms

Eval suites with judge models.

Batch test cases with keyword checks and judge models. Per-criterion rationales. Pass-fail per case, aggregate per suite, cost and latency tracked.

customer_support_triage · 14 / 16 pass
avg 820ms · cost $0.042 · claude-sonnet-4-6
summary_extraction · 16 / 16 pass
avg 640ms · cost $0.033 · gpt-4.1
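The keyword half of a suite is deterministic and easy to picture. A hedged sketch of what such a check can look like; `KeywordCase`, `keywordCheck`, and `passRate` are illustrative names, not PromptAssay's API.

```typescript
// A keyword check: hard include/exclude rules applied to a captured output.
interface KeywordCase {
  output: string;        // model output captured for this test case
  mustInclude: string[]; // every one of these must appear
  mustExclude: string[]; // none of these may appear
}

function keywordCheck(c: KeywordCase): boolean {
  const text = c.output.toLowerCase();
  return (
    c.mustInclude.every((k) => text.includes(k.toLowerCase())) &&
    c.mustExclude.every((k) => !text.includes(k.toLowerCase()))
  );
}

// Aggregate pass-fail into the suite-level roll-up.
function passRate(cases: KeywordCase[]): string {
  const passed = cases.filter(keywordCheck).length;
  return `${passed} / ${cases.length} pass`;
}
```

Judge models cover the qualitative criteria the same way, with a rationale per criterion instead of a substring match.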

Move your library in, move it out, no lock-in.

Anthropic JSON, OpenAI JSON, Markdown, bundled archives. Every format imports and every format exports.

In / Out
· anthropic.json
· openai.json
· library.md
· bundle.tar.gz

The whole workbench, free to try.

Open the editor
VI. The Procedure

A day in the life of a prompt.

Brainstorm & draft

Talk it through with the AI pair, then draft in the editor.

Critique

Score it on six dimensions. See where it leaks and why.

Improve

Apply targeted edits one by one, or accept the full rewrite.

Run

Stream output from any provider. Track cost, latency, and run history.

Evaluate

Run a suite. See pass-fail per case and the aggregate roll-up.

Ship

Version it, tag it, pull it via API or SDK from your app.

VII. The Keys

Your keys. Your bill. No markup on a single token.

  • BYOK on every paid tier. We never resell inference. Your provider account is the only place usage is billed.

  • Keys are encrypted per organization. Never logged, never sent to a client, never used outside the workflows you trigger.

  • One editor, three providers. Draft once, run against Anthropic, OpenAI, or Google. Switch the model, not the tool.

Your provider connections
Anthropic · Claude Sonnet 4.6 · connected
OpenAI · GPT-4.1 · connected
Google · Gemini 2.5 Pro · connected
Billed to your provider · $0.00 from us
VIII. Solo and Teams

One workbench, two ways to work in it.

For the prompt engineer

Everything you need to ship a great prompt, alone.

  • Personal library with fragments and versions.
  • The full AI pair: critique, improve, rewrite, brainstorm, compare.
  • Every provider, one editor, BYOK.
  • Public API and TypeScript SDK when you are ready to ship.
Starts free. Solo tier at $49 / month, unlimited calls on your keys.
Start solo
For the team

A shared library your whole team can work in.

  • Org workspaces with owner, admin, and member roles.
  • Shared fragments and versioned libraries, reviewed like code.
  • Evaluation suites the whole team can run and see.
  • SAML SSO and custom tier controls on Enterprise.
Team tier at $99 / seat / month. SAML SSO on Enterprise.
IX. Under Glass

A public API and a typed SDK.

Pull any versioned prompt at runtime. REST or TypeScript SDK for type-safe clients. API-key auth, org-scoped, rate-limited, documented.

[rest][typescript][api keys][org-scoped][versioned]
pull_prompt.ts (TypeScript)
import Anthropic from "@anthropic-ai/sdk";
import { PromptAssay } from "promptassay-sdk";

const client = new PromptAssay({
  apiKey: process.env.PROMPTASSAY_API_KEY!,
});

// Pull the resolved prompt with all fragments assembled.
const { data: prompt } = await client.prompts.getResolved(promptId);

// Send it to your provider. PromptAssay never touches the call.
const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: prompt.resolved_content }],
});
X. On the Record

Trust, before testimonials.

Responses stay with your provider

We do not retain provider responses on our servers. The one exception: evaluation test outputs, saved with each test case so you can review past runs.

Encrypted key storage

Provider keys are encrypted at rest and never leave the server.

Role-based access

Owner, admin, and member roles enforced at every layer.

We do not train on your content

PromptAssay does not train, fine-tune, or aggregate your content into any model. Documented in section 5 of the privacy policy.

Early access · April 2026

The first wave of prompt engineers is in the workbench.

Real reactions from real users. Customer logos and full case studies will land here as the program grows.

I've never seen a tool score a prompt on six dimensions and tell me why each one slipped. This is the workflow I've been hacking together in scratch files for two years.
Senior engineer, internal early access
Versioning, the AI critique, and BYOK in one place. The last one is the part nobody else gets right.
Reddit, r/LocalLLaMA discussion
XI. Subscription

Pricing that respects your time.

Free to start. Self-serve through Team. Contact us only when you need Enterprise controls. BYOK at every paid tier.

Free
$0 / month

For trying the editor and seeing what the AI pair does to your prompts.

  • Personal workspace
  • 500 AI calls a month — enough to explore
  • All five in-editor AI actions, on the free tier
  • 7-day version view
  • Single seat
  • BYOK: bring an Anthropic, OpenAI, or Google key
Start free
No card. Upgrade any time.
Most popular
Solo
$49 / month

For the prompt engineer who is shipping.

  • Unlimited AI calls on your keys
  • Full version history
  • Every in-editor AI action
  • Public API and TypeScript SDK
  • Single seat
Start solo
BYOK required. Cancel any time.
Team
$99 / seat / month

For teams reviewing prompts like code.

  • Everything in Solo
  • Shared org library
  • Owner, admin, member roles
  • Team evaluation suites
  • Invitations and seat management
  • 3 to 15 seats
Start a team
Minimum 3 seats. Annual billing available.
Enterprise
Contact us

For engineering orgs that need SSO and custom controls.

  • Everything in Team
  • SAML SSO
  • Unlimited seats
  • Custom tier controls
  • Data processing agreement
  • Priority support
Talk to us
Custom contract. Talk to us.
BYOK at every paid tier. Your keys. Your bill. No markup.
XII. Marginalia

Questions, answered plainly.

Is BYOK really available on every paid tier?

Yes, on every paid tier. We never resell inference. Your provider account is where your usage is billed and where your logs live. We think this is the only honest way to run this kind of platform.

Open the editor. Write a better prompt.

Free to start. Your keys, your bill, no demo call.