Behind every great agent is a great prompt.
The authoring workbench. Enhance your craft.
- 5 in-editor AI actions
- 6 critique dimensions
- 3 LLM providers
You are a customer support triage assistant.

Read the ticket and decide how to route it.
Produce a short summary.

Be friendly. Be concise. Be thorough.
Always include the customer's full name.
Never mention account numbers.

Return the result.

{{ticket}}
See the AI pair grade three drafts of the same prompt.
The same task, written three ways. Click a tab to see how the six-dimension critique scores each draft and what it says about why.
You are a customer support triage assistant. Read the ticket and decide how to route it. Produce a short summary. Be friendly. Be concise. Always include the customer's name. Return the result.
- Clarity: 7/10
Role is now specific. Goal is clearer.
- Completeness: 5/10
Output spec is still loose. "Short summary" is undefined.
- Structure: 8/10
Reads as a procedural list.
- Technique: 6/10
Implicit task framing. No examples.
- Robustness: 5/10
No edge cases handled. PII not addressed.
- Efficiency: 7/10
Tighter than the weak draft, still wordy.
Excerpts shown. Real prompts run longer. Inside the editor, run the critique any time.
Everyone moved upstream or downstream. We stayed where the craft lives.
Every other tool moved upstream to agents or downstream to traces. The middle, where prompts get written, is empty. We sit there.
You open the editor. You write. The AI pair critiques, improves, rewrites, brainstorms, and compares. You version, evaluate, and ship. Everything else, we leave to everyone else.
Three instruments, one workbench.
An AI pair in the editor.
Five named actions next to your draft: spot weakness, propose fixes, start over, think out loud, compare two drafts. Your provider does the thinking. You stay in flow.
A versioned library with diffs and branches.
Every save is a version. Every change has a diff. Branch to try an idea, restore any point in history, annotate any version. The git workflow your prompts deserve.
@@ v6 → v7 @@
- Produce a summary.
+ Produce a short summary of the issue.
Evaluation suites with LLM-as-judge.
Run any prompt against a batch of test cases. Keyword checks for hard rules, judge models for anything qualitative. Cost, latency, and pass rates tracked per run.
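The keyword half of a suite can be sketched in a few lines. This is an illustrative TypeScript sketch, not PromptAssay's actual schema; the `KeywordCheck` shape and `runKeywordCheck` helper are hypothetical names, shown only to make the hard-rule idea concrete:

```typescript
// Hypothetical shapes; the real suite schema may differ.
interface KeywordCheck {
  mustInclude: string[]; // hard-rule strings that must appear in the output
  mustExclude: string[]; // forbidden strings, e.g. account-number prefixes
}

// Pass or fail a single model output against a keyword check.
function runKeywordCheck(output: string, check: KeywordCheck): boolean {
  const text = output.toLowerCase();
  const hasAll = check.mustInclude.every((kw) => text.includes(kw.toLowerCase()));
  const hasNone = check.mustExclude.every((kw) => !text.includes(kw.toLowerCase()));
  return hasAll && hasNone;
}

// Example: the triage prompt must name the customer and never leak an account number.
const check: KeywordCheck = {
  mustInclude: ["Jane Doe"],
  mustExclude: ["ACCT-"],
};
```

Anything a keyword check cannot express, like tone or completeness, is where the judge model takes over.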
Every feature on the workbench.
Nine features. Skim the titles in ten seconds; read anything that catches your eye.
Six dimensions, one score.
Clarity, completeness, structure, technique, robustness, efficiency. Each scored with a written reason. The radar chart shows where the prompt leaks.
- CLARITY: 7
- COMPLETENESS: 5
- STRUCTURE: 8
- TECHNIQUE: 6
- ROBUSTNESS: 5
- EFFICIENCY: 7
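One way the six dimension scores could roll up into a single number is a plain unweighted mean. The real aggregation, if it is weighted, is not documented here, so treat this as an illustrative sketch using the scores above:

```typescript
// Scores from the critique above; the aggregation shown is a plain mean,
// which is an assumption, not PromptAssay's documented formula.
const dimensions = {
  clarity: 7,
  completeness: 5,
  structure: 8,
  technique: 6,
  robustness: 5,
  efficiency: 7,
};

// Unweighted mean of the six dimension scores.
const overall =
  Object.values(dimensions).reduce((sum, s) => sum + s, 0) /
  Object.values(dimensions).length;
```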
Targeted edits, not rewrites.
Before-and-after suggestions, applied one click at a time. No wholesale rewrites unless you ask.
A full rewrite, with its reasoning.
The whole prompt rewritten, with the techniques and reasons. Saved as a new version on accept.
A conversation, not a form.
Chat with the AI pair about where the prompt should go. Suggested edits appear inline. Nobody else ships this.
Two versions, examined side by side.
Structural diff of any two versions. Named improvements, regressions, maturity signals. See what got better, drifted, or broke.
@@ v6 → v7 @@
- summary
+ short summary of the issue
Reusable context blocks, with variable slots.
System instructions, examples, guardrails, task frames. Compose into any prompt with named slots. Edit once, propagate everywhere.
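Slot filling can be pictured as simple template substitution. The `Fragment` shape and `resolveFragment` helper below are hypothetical, shown only to illustrate how a named slot composes into a prompt; the `{{slot}}` syntax mirrors the editor's variables:

```typescript
// Hypothetical fragment shape; the real schema may differ.
interface Fragment {
  name: string;
  content: string; // may contain {{slot}} placeholders
}

// Fill named slots in a fragment, leaving unknown slots untouched.
function resolveFragment(fragment: Fragment, slots: Record<string, string>): string {
  return fragment.content.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in slots ? slots[name] : match
  );
}

const guardrail: Fragment = {
  name: "pii-guardrail",
  content: "Never mention account numbers. Address the customer as {{customer_name}}.",
};

const resolved = resolveFragment(guardrail, { customer_name: "Jane Doe" });
```

Because the fragment is stored once, editing `guardrail` updates every prompt that composes it.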
Every provider, one editor.
Streaming output, variables, cost and latency per run. Switch between Anthropic, OpenAI, and Google in the model menu.
Eval suites with judge models.
Batch test cases with keyword checks and judge models. Per-criterion rationales. Pass-fail per case, aggregate per suite, cost and latency tracked.
Move your library in, move it out, no lock-in.
Anthropic JSON, OpenAI JSON, Markdown, bundled archives. Every format imports and every format exports.
The whole workbench, free to try.
Open the editor
A day in the life of a prompt.
Talk it through with the AI pair, then draft in the editor.
Score it on six dimensions. See where it leaks and why.
Apply targeted edits one by one, or accept the full rewrite.
Stream output from any provider. Track cost, latency, and run history.
Run a suite. See pass-fail per case and the aggregate roll-up.
Version it, tag it, pull it via API or SDK from your app.
Your keys. Your bill. No markup on a single token.
BYOK on every paid tier. We never resell inference. Your provider account is the only place usage is billed.
Keys are encrypted per organization. Never logged, never sent to a client, never used outside the workflows you trigger.
One editor, three providers. Draft once, run against Anthropic, OpenAI, or Google. Switch the model, not the tool.
One workbench, two ways to work in it.
Everything you need to ship a great prompt, alone.
- Personal library with fragments and versions.
- The full AI pair: critique, improve, rewrite, brainstorm, compare.
- Every provider, one editor, BYOK.
- Public API and TypeScript SDK when you are ready to ship.
A shared library your whole team can work in.
- Org workspaces with owner, admin, and member roles.
- Shared fragments and versioned libraries, reviewed like code.
- Evaluation suites the whole team can run and see.
- SAML SSO and custom tier controls on Enterprise.
A public API and a typed SDK.
Pull any versioned prompt at runtime. REST or TypeScript SDK for type-safe clients. API-key auth, org-scoped, rate-limited, documented.
import Anthropic from "@anthropic-ai/sdk";
import { PromptAssay } from "promptassay-sdk";

const client = new PromptAssay({
  apiKey: process.env.PROMPTASSAY_API_KEY!,
});

// Pull the resolved prompt with all fragments assembled.
const { data: prompt } = await client.prompts.getResolved(promptId);

// Send it to your provider. PromptAssay never touches the call.
const anthropic = new Anthropic();
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: prompt.resolved_content }],
});
Trust, before testimonials.
We do not retain provider responses on our servers. The one exception: evaluation test outputs, saved with each test case so you can review past runs.
Provider keys are encrypted at rest and never leave the server.
Owner, admin, and member roles enforced at every layer.
PromptAssay does not train, fine-tune, or aggregate your content into any model. Documented in section 5 of the privacy policy.
The first wave of prompt engineers is in the workbench.
Real reactions from real users. Customer logos and full case studies will land here as the program grows.
I've never seen a tool score a prompt on six dimensions and tell me why each one slipped. This is the workflow I've been hacking together in scratch files for two years.
Versioning, the AI critique, and BYOK in one place. The last one is the part nobody else gets right.
Pricing that respects your time.
Free to start. Self-serve through Team. Contact us only when you need Enterprise controls. BYOK at every paid tier.
For trying the editor and seeing what the AI pair does to your prompts.
- Personal workspace
- 500 AI calls a month — enough to explore
- All five in-editor AI actions, on the free tier
- 7-day version view
- Single seat
- BYOK: bring an Anthropic, OpenAI, or Google key
For the prompt engineer who is shipping.
- Unlimited AI calls on your keys
- Full version history
- Every in-editor AI action
- Public API and TypeScript SDK
- Single seat
For teams reviewing prompts like code.
- Everything in Solo
- Shared org library
- Owner, admin, member roles
- Team evaluation suites
- Invitations and seat management
- 3 to 15 seats
For engineering orgs that need SSO and custom controls.
- Everything in Team
- SAML SSO
- Unlimited seats
- Custom tier controls
- Data processing agreement
- Priority support
Questions, answered plainly.
Open the editor. Write a better prompt.
Free to start. Your keys, your bill, no demo call.