Authoring a Skill from scratch
Frontmatter contract, body conventions, scripts/ vs references/ vs body, and the common pitfalls that trip the linter on the first save.
This guide walks you through building a Skill from scratch in Prompt Assay. We'll author a commit-message-formatter Skill: a small capability that turns a paste of git diff output into a Conventional Commit message with a one-line subject (≤72 chars), a blank line, and a body explaining the change. Discovery is the load-bearing concern. The Skill should activate when the user pastes a diff or asks for a commit message, and stay dormant otherwise.
Skills follow the agentskills.io spec, which Anthropic's Agent Skills (see platform.claude.com/docs/en/agents-and-tools/agent-skills/overview), OpenAI Custom GPTs, and Google Gemini Gems all consume variants of. The same SKILL.md ships to all three providers; PA's six-dimension critique and Behavioral Eval are the layer that tells you whether it actually works on each of them.
Where to start
- Create the skill: From `/skills`, click New skill. Pick a name (kebab-case, see the constraint table below) and a one-sentence description. PA scaffolds a SKILL.md with valid frontmatter you can edit.
- Open the workbench: The file tree on the left shows `SKILL.md` selected by default. Click it to load the CodeMirror editor. The AI Pair on the right has Critique, Improve, Rewrite, Brainstorm, and Behavioral Eval tabs.
- Save as you go: Cmd/Ctrl-S commits a new version. The version dropdown in the header shows history; the diff view at `/skills/<id>/versions` works the same as it does for prompts.
The frontmatter contract
Every SKILL.md opens with a YAML block. Two fields are required (name and description); the rest are optional but lift discovery and portability scores when set. Prompt Assay validates frontmatter inline as you type and surfaces structured errors above the editor.
```
---
name: commit-message-formatter
description: |
  Turns pasted git diff output or a request like "write me a commit message"
  into a Conventional Commit. Use when the user shares diff text, references
  a staged change, or asks for help phrasing a commit. Do not use for
  non-git copy editing.
compatibility:
  - claude
  - openai
  - gemini
allowed-tools: []
license: MIT
metadata:
  author: Prompt Assay
  version: 1.0.0
  tags:
    - git
    - commit-message
    - conventional-commits
---
```
Required fields
| Field | Constraints | Why it matters |
|---|---|---|
| name | 1-64 chars, kebab-case ([a-z0-9-]+), no leading or trailing hyphen, no consecutive hyphens, must equal the directory name when imported as a file bundle. The agentskills.io spec also reserves anthropic and claude as namespaces, so avoid them in your name. | Discovery routing. The model matches the user's request against the name + description; a name like helper will lose to commit-message-formatter every time. |
| description | ≤1024 chars, third-person voice, plain text (no XML or markdown that the runtime might re-render). Should answer: what does this Skill do, AND when should it activate. | This is the single largest input to Discovery Fidelity (D1). Vague descriptions activate on too much; over-specific descriptions miss real triggers. |
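The name constraints above collapse into one pattern check. Here is a minimal sketch of an equivalent validator; this is illustrative only, not Prompt Assay's actual linter code, and the reserved-namespace test assumes the reservation covers both the bare words and prefixed forms:
```
import re

# Kebab-case: lowercase alphanumeric runs joined by single hyphens. This shape
# rules out leading, trailing, and consecutive hyphens by construction.
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
RESERVED = ("anthropic", "claude")  # namespaces reserved by the agentskills.io spec

def validate_name(name: str) -> list[str]:
    """Return human-readable violations; an empty list means the name is valid."""
    errors = []
    if not 1 <= len(name) <= 64:
        errors.append("name must be 1-64 characters")
    if not NAME_RE.fullmatch(name):
        errors.append("name must be kebab-case with no leading, trailing, or doubled hyphens")
    # Assumption: reserved namespaces block both "claude" and "claude-*" shapes.
    if any(name == ns or name.startswith(ns + "-") for ns in RESERVED):
        errors.append("name must avoid the reserved anthropic/claude namespaces")
    return errors

assert validate_name("commit-message-formatter") == []
assert validate_name("-helper") != []    # leading hyphen
assert validate_name("my--skill") != []  # consecutive hyphens
```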
Optional fields
| Field | Shape | When to use |
|---|---|---|
| compatibility | Array of provider ids (claude, openai, gemini). Empty or omitted means the Skill works everywhere. | Set this when the Skill genuinely depends on a provider feature (e.g. Claude tool use). Otherwise leave it out, because Cross-Provider Portability (D4) penalizes unjustified narrowing. |
| allowed-tools | Array of tool names the agent runtime is permitted to invoke when this Skill is active. | Use the ServerName:tool_name form (e.g. Filesystem:read_file). An empty array means the Skill is purely advisory. |
| license | SPDX identifier (MIT, Apache-2.0, CC0-1.0). | Required for any Skill you intend to publish. Missing license is a soft-flag in the linter. |
| metadata | Free-form object. Common keys: author, version, tags. | Surfaces in the workbench header. The version field here is independent of PA's own version history; bump it when you cut a release for downstream consumers. |
Body conventions
After the frontmatter, write the SKILL.md body in imperative voice. The model reads this as instructions to follow when activated, not as documentation about the Skill. Three sections are conventional:
- When to use: concrete activation triggers, paired with negative examples (when NOT to use). This is where you raise non-trigger discrimination.
- How it works: the steps the model should take. Imperative, numbered, atomic. Avoid hedges ("you could", "perhaps") because they degrade adherence.
- Examples: at least 2-3 worked examples in fenced blocks showing input → expected output. Example Coverage (D3) scores how many distinct invocation paths you cover.
## When to use
Activate when the user:
- Pastes `git diff` or `git log` output and asks for a commit message
- Says "write me a commit message" with context about what changed
- Asks to convert prose into Conventional Commit format
Do NOT activate for:
- General code-review requests (use a code-review Skill instead)
- Pull request descriptions (different format, longer)
## How it works
1. Identify the primary change type: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `perf`, `build`, `ci`, `style`, or `revert`.
2. Pick a scope (optional, in parentheses) only if the change clearly belongs to a single component.
3. Write a subject ≤72 chars, imperative mood, no trailing period.
4. Insert a blank line, then a body that explains *why* (not *what*; the diff is the *what*).
## Examples
Input: a diff that adds a null check in `lib/auth.ts`.
Output:
```
fix(auth): handle null session in middleware refresh

Refresh middleware previously assumed a non-null session and crashed
on first-load when the cookie was absent. Returns early instead.
```
If your worked examples run long, move the overflow into the references/ folder so the runtime can attach them on demand without paying the token cost on every activation.
The scripts/ folder
Files under scripts/ are deterministic helpers the agent runtime executes. Use them when the work has a predictable shape that doesn't need an LLM: parsing a diff, formatting a date, validating against a schema, hitting a known REST endpoint.
- Path whitelist: `scripts/[a-z0-9._-]+\.(py|js|ts|sh)`. Lowercase only, no path traversal, no nested folders. Prompt Assay enforces this at both the application layer and the database CHECK constraint.
- Per-file size cap: 1 MiB. Bundle size is a Token Efficiency signal even though scripts aren't loaded into the model context.
- No execution in PA: Behavioral Eval mentions scripts as 'available to the model' but never runs them. Treat scripts as something a downstream runtime will execute, not as something Prompt Assay validates dynamically.
- Header comment: every shell script needs a header comment explaining what it does. Missing headers fire `security-skill-undocumented-shell` in the linter.
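To make that concrete, here is what a deterministic helper for this Skill might look like: a hypothetical scripts/diff_stats.py that summarizes pasted diff output with no LLM judgment involved. The filename and behavior are illustrative, chosen to satisfy the whitelist and header-comment rules above:
```
#!/usr/bin/env python3
# diff_stats.py - summarize a unified diff read from stdin: changed files
# plus added/removed line counts. Deterministic; no LLM judgment needed.
import sys

def summarize(diff_text: str) -> dict[str, tuple[int, int]]:
    """Map each changed file to (lines_added, lines_removed)."""
    stats: dict[str, tuple[int, int]] = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ b/"):
            current = line[len("+++ b/"):]
            stats.setdefault(current, (0, 0))
        elif current and line.startswith("+") and not line.startswith("+++"):
            added, removed = stats[current]
            stats[current] = (added + 1, removed)
        elif current and line.startswith("-") and not line.startswith("---"):
            added, removed = stats[current]
            stats[current] = (added, removed + 1)
    return stats

if __name__ == "__main__":
    for path, (added, removed) in summarize(sys.stdin.read()).items():
        print(f"{path}: +{added} -{removed}")
```
Anything with this shape belongs in scripts/: one correct answer, no judgment, and the model shouldn't be re-deriving line counts in context.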
The references/ folder
Files under references/ are long-form context the model reads when the SKILL.md instructs it to ("see references/conventions.md for the full type list"). Use them when the Skill needs detailed reference material that would bloat the body.
- Path whitelist: `references/[a-z0-9._-]+\.md`. One level deep only, lowercase, markdown only.
- Reference, don't duplicate: if the same content appears in both the body and a reference, the linter flags it as a redundancy and Token Efficiency drops.
- No frontmatter: only SKILL.md takes frontmatter. Reference files are plain markdown.
Body, scripts/, or references/?
| Put it in... | When the content is... |
|---|---|
| SKILL.md body | Instructional. Tells the model what to do. Short enough to load on every activation. Includes 2-3 worked examples. |
| scripts/ | Deterministic. The work has one correct answer and no LLM judgment is needed. The runtime can run it without prompting. |
| references/ | Lookup material. Long lists, exhaustive type tables, full API references the model only needs sometimes. Linked from the body, not always loaded. |
Improve can extract sections into new files
When SKILL.md is bloated with long examples or detailed reference material, the Improve tab can propose extracting a region into a new references/<name>.md or scripts/<name>.{py,js,ts,sh} file. Suggestions of this kind carry a chip in the Improve panel that reads Extracts to references/auth.md (or similar) and the diff card shows the new file's content above the SKILL.md before/after diff.
Click Apply on an extract suggestion and three things happen at once: the new file lands in your bundle's file tree, the SKILL.md region you were told to remove gets replaced with the AI's pointer text (typically See [references/auth.md](references/auth.md) or similar), and the editor flashes the change so you can verify the splice landed correctly. Save the skill to commit the new version. The next Critique run will score the leaner SKILL.md against D5 Token Efficiency directly.
- Path validation: the proposed `target_path` runs through the same whitelist regex as the file-tree Add dialog: `..` is rejected, length is capped at 80, and only lowercase letters, digits, dots, underscores, and hyphens are allowed (see the sketch after this list).
- Collision check: an extract suggestion targeting a path that already exists in your bundle is rejected at apply time. If you see this, rename or delete the existing file first.
- Size caps: the per-file 1 MiB cap and the 4 MiB total bundle cap apply at apply time. The AI is told about both; if a proposed extraction would exceed either, the apply path refuses.
- Apply All ignores extract suggestions: the bulk-apply button only runs in-place edits because each extract creates a new file (and `create_new` script suggestions need explicit confirmation per item). Apply extract suggestions one at a time.
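A minimal sketch of the checks those bullets describe, stated as code; this is illustrative, not Prompt Assay's actual apply path:
```
import re

# One-level-deep paths only; the extension set depends on the target folder.
TARGET_RE = re.compile(r"references/[a-z0-9._-]+\.md|scripts/[a-z0-9._-]+\.(py|js|ts|sh)")

def valid_target_path(path: str, existing: set[str]) -> bool:
    """Mirror the documented rules: whitelist regex, no '..', <=80 chars, no collision."""
    return (
        len(path) <= 80
        and ".." not in path
        and TARGET_RE.fullmatch(path) is not None
        and path not in existing
    )

assert valid_target_path("references/auth.md", existing=set())
assert not valid_target_path("references/../skill.md", existing=set())           # traversal
assert not valid_target_path("references/auth.md", {"references/auth.md"})       # collision
```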
When an extract suggestion proposes a brand-new script (`creation_mode: "create_new"` with `target_kind: "script"`), the apply opens a confirmation modal showing the full script content before the file lands in your bundle. Read it carefully: the linter's security tier scans the new file after creation, but secrets, `curl ... | bash` patterns, or untrusted fetches are advisory findings, not blockers. Prompt Assay never executes scripts; the risk is what downstream runtimes (Claude Code, Anthropic Skill SDK) might do.
Bundle density signals
Iterative critique → improve cycles can quietly accrete files. Anthropic's published reference skills carry 1-9 files per directory (median 4); past ~8 files in either references/ or scripts/, readers lose the mental map. The file tree surfaces two soft-cap captions to keep authoring honest:
- 8-11 files in any dir (muted advisory): *"Past 8 files per dir, readers lose the mental map · consider grouping by domain."*
- 12+ files in any dir (warning-tone nudge): *"Bundle is dense · consider consolidating overlapping references or splitting into multiple skills."*
Both are advisory; there's no hard cap. The agentskills.io spec genuinely allows unlimited files (progressive disclosure means unused files cost zero context). The signals exist so the workbench tells you when readability is starting to suffer, not because the runtime will reject the bundle.
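The two tiers reduce to a simple threshold rule. A sketch of the logic as documented (the function name is made up; the caption strings are the workbench's own):
```
def density_caption(files_in_dir: int) -> str | None:
    """Advisory caption for a references/ or scripts/ directory, per the documented tiers."""
    if files_in_dir >= 12:
        return "Bundle is dense · consider consolidating overlapping references or splitting into multiple skills."
    if files_in_dir >= 8:
        return "Past 8 files per dir, readers lose the mental map · consider grouping by domain."
    return None  # advisory only; there is no hard cap
```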
Common pitfalls
- Vague description → low D1. "Helps with git stuff" activates on every git mention. Be concrete: "Turns pasted git diff output into a Conventional Commit message."
- XML or markdown in `description` → portability hit. Some runtimes re-render the description and your `<important>` tags become literal text.
- Provider-specific syntax in body → low D4. XML tags work on Claude, function-call schemas work on GPT, but cross-provider Skills should stay in plain markdown.
- Hardcoded API keys → fires `security-skill-secret-in-body` or `security-skill-secret-in-script`. Use environment variables and document them in the body.
- `curl ... | bash` → fires `security-skill-curl-bash`. Never run unverified code at install time.
- Examples that look identical → low D3. Three examples that all show the same shape don't cover three invocation paths; they cover one. Vary inputs.
- Missing non-trigger probes → can't measure false-positive activation. Behavioral Eval needs both kinds of probes to score Discovery Fidelity meaningfully.
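As a toy illustration of the kinds of patterns the security rules above flag (deliberately simplified; real secret detection is far more involved than one regex):
```
import re

# Illustrative patterns only, not PA's actual rule engine.
SECRET_RE = re.compile(
    r"(api[_-]?key|secret|token)\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]", re.I
)
CURL_BASH_RE = re.compile(r"curl\s+[^\n|]*\|\s*(ba)?sh")

def scan(text: str) -> list[str]:
    """Return the linter rule ids a body-like text would plausibly trip."""
    findings = []
    if SECRET_RE.search(text):
        findings.append("security-skill-secret-in-body")  # or -in-script, by file kind
    if CURL_BASH_RE.search(text):
        findings.append("security-skill-curl-bash")
    return findings

assert scan('API_KEY = "sk_live_abcdefghijklmnop"') == ["security-skill-secret-in-body"]
assert scan("curl https://example.com/install.sh | bash") == ["security-skill-curl-bash"]
```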
Next steps
- Run Critique to score the bundle on six dimensions and surface critical improvements.
- Run Behavioral Eval to test trigger and non-trigger probes across 2-5 BYOK models in parallel.
- Once both scores are healthy, save the report, publish a Skill Report, and drop a README badge into your distribution.