Authoring a Skill from scratch
Frontmatter contract, body conventions, scripts/ vs references/ vs body, and the common pitfalls that trip the linter on the first save.
This guide walks you through building a Skill from scratch in Prompt Assay. We'll author a commit-message-formatter Skill: a small capability that turns a paste of git diff output into a Conventional Commit message with a one-line subject (≤72 chars), a blank line, and a body explaining the change. Discovery is the load-bearing concern. The Skill should activate when the user pastes a diff or asks for a commit message, and stay dormant otherwise.
Skills follow the agentskills.io spec, which Anthropic's Agent Skills (see platform.claude.com/docs/en/agents-and-tools/agent-skills/overview), OpenAI Custom GPTs, and Google Gemini Gems all consume variants of. The same SKILL.md ships to all three providers; PA's six-dimension critique and Behavioral Eval are the layer that tells you whether it actually works on each of them.
Where to start
- Create the skill: From `/skills`, click New skill. Pick a name (kebab-case, see the constraint table below) and a one-sentence description. PA scaffolds a SKILL.md with valid frontmatter you can edit.
- Open the workbench: The file tree on the left shows `SKILL.md` selected by default. Click it to load the CodeMirror editor. The AI Pair on the right has Critique, Improve, Rewrite, Brainstorm, and Behavioral Eval tabs.
- Save as you go: Cmd/Ctrl-S commits a new version. The version dropdown in the header shows history; the diff view at `/skills/<id>/versions` works the same as it does for prompts.
The frontmatter contract
Every SKILL.md opens with a YAML block. Two fields are required (name and description); the rest are optional but lift discovery and portability scores when set. Prompt Assay validates frontmatter inline as you type and surfaces structured errors above the editor.
```
---
name: commit-message-formatter
description: |
  Turns pasted git diff output or a request like "write me a commit message"
  into a Conventional Commit. Use when the user shares diff text, references
  a staged change, or asks for help phrasing a commit. Do not use for
  non-git copy editing.
compatibility:
  - claude
  - openai
  - gemini
allowed-tools: []
license: MIT
metadata:
  author: Prompt Assay
  version: 1.0.0
  tags:
    - git
    - commit-message
    - conventional-commits
---
```
Required fields
| Field | Constraints | Why it matters |
|---|---|---|
| name | 1-64 chars, kebab-case ([a-z0-9-]+), no leading or trailing hyphen, no consecutive hyphens, must equal the directory name when imported as a file bundle. The agentskills.io spec also reserves anthropic and claude as namespaces, so avoid them in your name. | Discovery routing. The model matches the user's request against the name + description; a name like helper will lose to commit-message-formatter every time. |
| description | ≤1024 chars, third-person voice, plain text (no XML or markdown that the runtime might re-render). Should answer: what does this Skill do, AND when should it activate. | This is the single largest input to Discovery Fidelity (D1). Vague descriptions activate on too much; over-specific descriptions miss real triggers. |
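The name constraints above collapse into one pattern check. Here is a minimal sketch of an equivalent validator; this is illustrative only, not Prompt Assay's actual linter code, and the reserved-namespace test assumes the reservation covers both the bare words and prefixed forms:
```
import re

# Kebab-case: lowercase alphanumeric runs joined by single hyphens. This shape
# rules out leading, trailing, and consecutive hyphens by construction.
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
RESERVED = ("anthropic", "claude")  # namespaces reserved by the agentskills.io spec

def validate_name(name: str) -> list[str]:
    """Return human-readable violations; an empty list means the name is valid."""
    errors = []
    if not 1 <= len(name) <= 64:
        errors.append("name must be 1-64 characters")
    if not NAME_RE.fullmatch(name):
        errors.append("name must be kebab-case with no leading, trailing, or doubled hyphens")
    # Assumption: reserved namespaces block both "claude" and "claude-*" shapes.
    if any(name == ns or name.startswith(ns + "-") for ns in RESERVED):
        errors.append("name must avoid the reserved anthropic/claude namespaces")
    return errors

assert validate_name("commit-message-formatter") == []
assert validate_name("-helper") != []    # leading hyphen
assert validate_name("my--skill") != []  # consecutive hyphens
```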
Optional fields
| Field | Shape | When to use |
|---|---|---|
| compatibility | Array of provider ids (claude, openai, gemini). Empty or omitted means the Skill works everywhere. | Set this when the Skill genuinely depends on a provider feature (e.g. Claude tool use). Otherwise leave it out, because Cross-Provider Portability (D4) penalizes unjustified narrowing. |
| allowed-tools | Array of tool names the agent runtime is permitted to invoke when this Skill is active. | Use the ServerName:tool_name form (e.g. Filesystem:read_file). An empty array means the Skill is purely advisory. |
| license | SPDX identifier (MIT, Apache-2.0, CC0-1.0). | Required for any Skill you intend to publish. Missing license is a soft-flag in the linter. |
| metadata | Free-form object. Common keys: author, version, tags. | Surfaces in the workbench header. The version field here is independent of PA's own version history; bump it when you cut a release for downstream consumers. |
Body conventions
After the frontmatter, write the SKILL.md body in imperative voice. The model reads this as instructions to follow when activated, not as documentation about the Skill. Three sections are conventional:
- When to use: concrete activation triggers, paired with negative examples (when NOT to use). This is where you raise non-trigger discrimination.
- How it works: the steps the model should take. Imperative, numbered, atomic. Avoid hedges ("you could", "perhaps") because they degrade adherence.
- Examples: at least 2-3 worked examples in fenced blocks showing input → expected output. Example Coverage (D3) scores how many distinct invocation paths you cover.
## When to use
Activate when the user:
- Pastes `git diff` or `git log` output and asks for a commit message
- Says "write me a commit message" with context about what changed
- Asks to convert prose into Conventional Commit format
Do NOT activate for:
- General code-review requests (use a code-review Skill instead)
- Pull request descriptions (different format, longer)
## How it works
1. Identify the primary change type: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `perf`, `build`, `ci`, `style`, or `revert`.
2. Pick a scope (optional, in parentheses) only if the change clearly belongs to a single component.
3. Write a subject ≤72 chars, imperative mood, no trailing period.
4. Insert a blank line, then a body that explains *why* (not *what*; the diff is the *what*).
## Examples
Input: a diff that adds a null check in `lib/auth.ts`.
Output:
```
fix(auth): handle null session in middleware refresh

Refresh middleware previously assumed a non-null session and crashed
on first-load when the cookie was absent. Returns early instead.
```
If your worked examples run long, move the overflow into the references/ folder so the runtime can attach them on demand without paying the token cost on every activation.
The scripts/ folder
Files under scripts/ are deterministic helpers the agent runtime executes. Use them when the work has a predictable shape that doesn't need an LLM: parsing a diff, formatting a date, validating against a schema, hitting a known REST endpoint.
- Path whitelist: `scripts/[a-z0-9._-]+\.(py|js|ts|sh)`. Lowercase only, no path traversal, no nested folders. Prompt Assay enforces this at both the application layer and the database CHECK constraint.
- Per-file size cap: 1 MiB. Bundle size is a Token Efficiency signal even though scripts aren't loaded into the model context.
- No execution in PA: Behavioral Eval mentions scripts as 'available to the model' but never runs them. Treat scripts as something a downstream runtime will execute, not as something Prompt Assay validates dynamically.
- Header comment: every shell script needs a header comment explaining what it does. Missing headers fire `security-skill-undocumented-shell` in the linter.
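To make that concrete, here is what a deterministic helper for this Skill might look like: a hypothetical scripts/diff_stats.py that summarizes pasted diff output with no LLM judgment involved. The filename and behavior are illustrative, chosen to satisfy the whitelist and header-comment rules above:
```
#!/usr/bin/env python3
# diff_stats.py - summarize a unified diff read from stdin: changed files
# plus added/removed line counts. Deterministic; no LLM judgment needed.
import sys

def summarize(diff_text: str) -> dict[str, tuple[int, int]]:
    """Map each changed file to (lines_added, lines_removed)."""
    stats: dict[str, tuple[int, int]] = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ b/"):
            current = line[len("+++ b/"):]
            stats.setdefault(current, (0, 0))
        elif current and line.startswith("+") and not line.startswith("+++"):
            added, removed = stats[current]
            stats[current] = (added + 1, removed)
        elif current and line.startswith("-") and not line.startswith("---"):
            added, removed = stats[current]
            stats[current] = (added, removed + 1)
    return stats

if __name__ == "__main__":
    for path, (added, removed) in summarize(sys.stdin.read()).items():
        print(f"{path}: +{added} -{removed}")
```
Anything with this shape belongs in scripts/: one correct answer, no judgment, and the model shouldn't be re-deriving line counts in context.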
The references/ folder
Files under references/ are long-form context the model reads when the SKILL.md instructs it to ("see references/conventions.md for the full type list"). Use them when the Skill needs detailed reference material that would bloat the body.
- Path whitelist: `references/[a-z0-9._-]+\.md`. One level deep only, lowercase, markdown only.
- Reference, don't duplicate: if the same content appears in both the body and a reference, the linter flags it as a redundancy and Token Efficiency drops.
- No frontmatter: only SKILL.md takes frontmatter. Reference files are plain markdown.
Body, scripts/, or references/?
| Put it in... | When the content is... |
|---|---|
| SKILL.md body | Instructional. Tells the model what to do. Short enough to load on every activation. Includes 2-3 worked examples. |
| scripts/ | Deterministic. The work has one correct answer and no LLM judgment is needed. The runtime can run it without prompting. |
| references/ | Lookup material. Long lists, exhaustive type tables, full API references the model only needs sometimes. Linked from the body, not always loaded. |
Improve can extract sections into new files
When SKILL.md is bloated with long examples or detailed reference material, the Improve tab can propose extracting a region into a new references/<name>.md or scripts/<name>.{py,js,ts,sh} file. Suggestions of this kind carry a chip in the Improve panel that reads Extracts to references/auth.md (or similar) and the diff card shows the new file's content above the SKILL.md before/after diff.
Click Apply on an extract suggestion and three things happen at once: the new file lands in your bundle's file tree, the SKILL.md region you were told to remove gets replaced with the AI's pointer text (typically See [references/auth.md](references/auth.md) or similar), and the editor flashes the change so you can verify the splice landed correctly. Save the skill to commit the new version. The next Critique run will score the leaner SKILL.md against D5 Token Efficiency directly.
- Path validation: the proposed `target_path` runs through the same whitelist regex as the file-tree Add dialog: `..` is rejected, length is capped at 80, and only lowercase letters, digits, dots, underscores, and hyphens are allowed (see the sketch after this list).
- Collision check: an extract suggestion targeting a path that already exists in your bundle is rejected at apply time. If you see this, rename or delete the existing file first.
- Size caps: the per-file 1 MiB cap and the 4 MiB total bundle cap apply at apply time. The AI is told about both; if a proposed extraction would exceed either, the apply path refuses.
- Apply All ignores extract suggestions: the bulk-apply button only runs in-place edits because each extract creates a new file (and `create_new` script suggestions need explicit confirmation per item). Apply extract suggestions one at a time.
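A minimal sketch of the checks those bullets describe, stated as code; this is illustrative, not Prompt Assay's actual apply path:
```
import re

# One-level-deep paths only; the extension set depends on the target folder.
TARGET_RE = re.compile(r"references/[a-z0-9._-]+\.md|scripts/[a-z0-9._-]+\.(py|js|ts|sh)")

def valid_target_path(path: str, existing: set[str]) -> bool:
    """Mirror the documented rules: whitelist regex, no '..', <=80 chars, no collision."""
    return (
        len(path) <= 80
        and ".." not in path
        and TARGET_RE.fullmatch(path) is not None
        and path not in existing
    )

assert valid_target_path("references/auth.md", existing=set())
assert not valid_target_path("references/../skill.md", existing=set())           # traversal
assert not valid_target_path("references/auth.md", {"references/auth.md"})       # collision
```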
When an extract suggestion proposes a brand-new script (`creation_mode: "create_new"` with `target_kind: "script"`), the apply opens a confirmation modal showing the full script content before the file lands in your bundle. Read it carefully: the linter's security tier scans the new file after creation, but secrets, `curl ... | bash` patterns, or untrusted fetches are advisory findings, not blockers. Prompt Assay never executes scripts; the risk is what downstream runtimes (Claude Code, Anthropic Skill SDK) might do.
Bundle density signals
Iterative critique → improve cycles can quietly accrete files. Anthropic's published reference skills carry 1-9 files per directory (median 4); past ~8 files in either references/ or scripts/, readers lose the mental map. The file tree surfaces two soft-cap captions to keep authoring honest:
- 8-11 files in any dir (muted advisory): *"Past 8 files per dir, readers lose the mental map · consider grouping by domain."*
- 12+ files in any dir (warning-tone nudge): *"Bundle is dense · consider consolidating overlapping references or splitting into multiple skills."*
Both are advisory; there's no hard cap. The agentskills.io spec genuinely allows unlimited files (progressive disclosure means unused files cost zero context). The signals exist so the workbench tells you when readability is starting to suffer, not because the runtime will reject the bundle.
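The two tiers reduce to a simple threshold rule. A sketch of the logic as documented (the function name is made up; the caption strings are the workbench's own):
```
def density_caption(files_in_dir: int) -> str | None:
    """Advisory caption for a references/ or scripts/ directory, per the documented tiers."""
    if files_in_dir >= 12:
        return "Bundle is dense · consider consolidating overlapping references or splitting into multiple skills."
    if files_in_dir >= 8:
        return "Past 8 files per dir, readers lose the mental map · consider grouping by domain."
    return None  # advisory only; there is no hard cap
```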
Common pitfalls
- Vague description → low D1. "Helps with git stuff" activates on every git mention. Be concrete: "Turns pasted git diff output into a Conventional Commit message."
- XML or markdown in `description` → portability hit. Some runtimes re-render the description and your `<important>` tags become literal text.
- Provider-specific syntax in body → low D4. XML tags work on Claude, function-call schemas work on GPT, but cross-provider Skills should stay in plain markdown.
- Hardcoded API keys → fires `security-skill-secret-in-body` or `security-skill-secret-in-script`. Use environment variables and document them in the body.
- `curl ... | bash` → fires `security-skill-curl-bash`. Never run unverified code at install time.
- Examples that look identical → low D3. Three examples that all show the same shape don't cover three invocation paths; they cover one. Vary inputs.
- Missing non-trigger probes → can't measure false-positive activation. Behavioral Eval needs both kinds of probes to score Discovery Fidelity meaningfully.
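As a toy illustration of the kinds of patterns the security rules above flag (deliberately simplified; real secret detection is far more involved than one regex):
```
import re

# Illustrative patterns only, not PA's actual rule engine.
SECRET_RE = re.compile(
    r"(api[_-]?key|secret|token)\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]", re.I
)
CURL_BASH_RE = re.compile(r"curl\s+[^\n|]*\|\s*(ba)?sh")

def scan(text: str) -> list[str]:
    """Return the linter rule ids a body-like text would plausibly trip."""
    findings = []
    if SECRET_RE.search(text):
        findings.append("security-skill-secret-in-body")  # or -in-script, by file kind
    if CURL_BASH_RE.search(text):
        findings.append("security-skill-curl-bash")
    return findings

assert scan('API_KEY = "sk_live_abcdefghijklmnop"') == ["security-skill-secret-in-body"]
assert scan("curl https://example.com/install.sh | bash") == ["security-skill-curl-bash"]
```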
Next steps
- Run Critique to score the bundle on six dimensions and surface critical improvements.
- Run Behavioral Eval to test trigger and non-trigger probes across 2-5 BYOK models in parallel.
- Once both scores are healthy, save the report, publish a Skill Report, and drop a README badge into your distribution.