Skills overview

What a Skill is in Prompt Assay, how it differs from a prompt, and the workbench surfaces for authoring, critiquing, and evaluating one.

Updated 2026-05-06 · By Jon Lasley

A Skill is a multi-file capability bundle following the agentskills.io specification: a SKILL.md with YAML frontmatter, an optional scripts/ directory of helper code, and an optional references/ directory of long-form context. Anthropic's Agent Skills overview describes how Claude Code consumes the format; OpenAI Custom GPTs and Google Gemini Gems consume variants of the same shape.
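To make the SKILL.md shape concrete, here is a minimal Python sketch that splits a SKILL.md into its frontmatter and body and reports missing keys. It rests on several assumptions: the frontmatter is treated as flat `key: value` pairs rather than full YAML, `name` and `description` are taken to be the required keys, and the sample Skill and the helper names (`split_frontmatter`, `missing_keys`) are hypothetical. It is not Prompt Assay's validator.

```python
# Assumption: the frontmatter keys a well-formed SKILL.md is expected to carry.
REQUIRED_KEYS = ("name", "description")

# Hypothetical sample Skill, for illustration only.
SAMPLE_SKILL_MD = """\
---
name: release-notes
description: Drafts release notes from a list of merged pull requests.
---

# Release notes

Summarize each pull request in one line, grouped by area.
"""


def split_frontmatter(text):
    """Return (frontmatter_dict, body) for a SKILL.md-style document.

    Handles only flat 'key: value' lines; a real tool would use a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    try:
        # Index of the closing delimiter, searching after the opening one.
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("missing closing '---' frontmatter delimiter") from None

    meta = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        meta[key.strip()] = value.strip()

    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def missing_keys(meta):
    """List required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not meta.get(key)]


meta, body = split_frontmatter(SAMPLE_SKILL_MD)
print(meta["name"], missing_keys(meta))
```

The scripts/ and references/ directories would sit beside this file in the bundle; the sketch only covers the one file that is always present.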

Where a prompt is one block of text aimed at one model call, a Skill is a reusable capability that ships across providers. Prompt Assay's Skills workbench gives you the same author + critique + improve + version primitives you have for prompts, plus a multi-provider Behavioral Eval that scores how reliably the same Skill activates across Claude, GPT, and Gemini. The community `anthropics/skills` repository is the canonical reference set if you want to see well-formed Skills before authoring your own.

Where to find it

Click Skills in the left sidebar. It sits next to Prompts as a peer artifact type and is visible to every workspace. The list page lives at /skills, and individual workbenches live at /skills/<id>.

What you get

Three authoring paths: start from scratch in the multi-file editor; import an Anthropic SKILL.md, OpenAI Custom GPT, or Gemini Gem; or Convert a working prompt into a complete Skill bundle in one shot via the AI pair (preview-before-apply, BYOK-only; see Convert panel).
Multi-file authoring: SKILL.md plus scripts/ and references/ in a single bundle, with a CodeMirror editor and inline frontmatter validation.
19-rule linter with a seven-rule security tier (security-skill-*) that flags hardcoded secrets, untrusted fetch, runtime install commands, and other capability-specific risks.
Six-dimension critique scoring Discovery Fidelity, Instruction Quality, Example Coverage, Cross-Provider Portability, Token Efficiency, and Security & Safety Posture.
Behavioral Eval that runs your trigger + non-trigger probes across 2-5 BYOK models in parallel and scores activation accuracy + adherence with an LLM judge.
Versioning with diff and restore, identical to prompts.
Public Skill Reports with an opt-in body publish, robots-noindex by default, and a Shields.io badge for your README.
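One plausible reading of the Behavioral Eval's "activation accuracy" is the fraction of probes, per model, where the Skill activated exactly when it should have. The Python sketch below works under that assumption. This page does not document the actual scoring formula, so the `ProbeResult` record, the model labels, and the sample data are all hypothetical, and the LLM-judged adherence score is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    model: str            # hypothetical model label
    should_trigger: bool  # True for a trigger probe, False for a non-trigger probe
    activated: bool       # whether the Skill was observed to activate


def activation_accuracy(results):
    """Per-model fraction of probes where activation matched expectation.

    Assumed definition: a trigger probe counts as correct when the Skill
    activates, and a non-trigger probe counts as correct when it stays inactive.
    """
    totals = {}
    correct = {}
    for result in results:
        totals[result.model] = totals.get(result.model, 0) + 1
        if result.activated == result.should_trigger:
            correct[result.model] = correct.get(result.model, 0) + 1
    return {model: correct.get(model, 0) / totals[model] for model in totals}


# Made-up probe outcomes for two hypothetical models.
RESULTS = [
    ProbeResult("model-a", should_trigger=True, activated=True),
    ProbeResult("model-a", should_trigger=True, activated=False),   # missed trigger
    ProbeResult("model-a", should_trigger=False, activated=False),
    ProbeResult("model-a", should_trigger=False, activated=False),
    ProbeResult("model-b", should_trigger=True, activated=True),
    ProbeResult("model-b", should_trigger=False, activated=True),   # false activation
]

scores = activation_accuracy(RESULTS)
print(scores)  # {'model-a': 0.75, 'model-b': 0.5}
```

Counting missed triggers and false activations in the same score is why the eval asks for both trigger and non-trigger probes: a Skill that activates on everything would otherwise look perfect.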

BYOK still applies

Every LLM call from Skills (critique, improve, eval) goes through your BYOK keys: same gate, same impersonation block, same tier limits as prompts. There is no demo budget on Skills.
