Journal · Est. 2026

Notes from the workbench.

Prompt engineering, context engineering, agents, and the AI advances worth paying attention to. Opinionated, specific, and shipped from the same workbench we build.

Current issue · No. 12 · Evaluation & Testing

GPT-4 accuracy dropped from 84 to 51 percent on identical questions in 2023, behind a stable model name.

Prompt drift: a 2026 detection playbook

Prompt drift is a change in output quality over time even though the prompt itself did not change. The 2026 cadence, three causes, and a four-step playbook.

MAY 2026 · 12 min read

Catalogue · No. 11 to No. 01

Issue №11 · MAY 2026 · Evaluation & Testing

15 LLM-as-a-judge prompt templates (copy-paste)

15 copy-paste LLM-as-a-judge templates as YAML, organized by dimension: 5 foundations plus 10 specialized rubrics for RAG, code, summarization, and agents.

15 min read

Issue №10 · MAY 2026 · Prompt Engineering

What is an Agent Skill?

An Agent Skill is a versioned folder of instructions and resources an LLM agent loads on demand. How Skills work, and how they differ from prompts and MCP.

19 min read

Issue №09 · MAY 2026 · Comparisons & Migrations

Promptfoo alternatives after the OpenAI acquisition

Promptfoo joined OpenAI on March 9, 2026. MIT license preserved; the steward changed. For multi-provider eval and red-teaming, the credible 2026 alternatives.

11 min read

Issue №08 · APR 2026 · Comparisons & Migrations

Langfuse alternatives: the honest comparison

Langfuse alternatives compared in 2026: five buyer scenarios where Langfuse isn't the right fit, with honest tool recommendations for each.

14 min read

Issue №07 · APR 2026 · Comparisons & Migrations

LangSmith alternatives without per-trace fees

LangSmith's auto-upgrade-on-feedback can turn per-trace billing superlinear. Compare Langfuse, Helicone, Phoenix, Braintrust, and where Prompt Assay fits.

13 min read

Issue №06 · APR 2026 · Prompt Engineering

How to version prompts: the 2026 guide

Prompt versioning captures every prompt change as an immutable artifact. Seven concrete steps, worked examples, and where it fits in your stack.

13 min read

Issue №05 · APR 2026 · Evaluation & Testing

How to set up prompt regression testing

A 7-step guide to building regression tests for production prompts. Catch breakage before deploy with golden datasets, scoring rubrics, and LLM judges.

18 min read

Issue №04 · APR 2026 · Comparisons & Migrations

PromptLayer alternatives: the honest comparison

PromptLayer alternatives compared honestly: current 2026 pricing, BYOK posture, and when Prompt Assay, LangSmith, Langfuse, or Braintrust fits better.

14 min read

Issue №03 · APR 2026 · Prompt Engineering

Sixty prompt engineering techniques, organized

A 2026 field guide to the 60 prompt engineering techniques worth knowing, organized into 10 workflow families with canonical examples from The Prompt Report.

13 min read

Issue №02 · APR 2026 · BYOK & Cost

What is a BYOK prompt tool?

A BYOK prompt tool routes every LLM call through your own API key. Here's what that means for cost, setup, and the three postures in the market.

12 min read

Issue №01 · APR 2026 · Comparisons & Migrations

Migrate from Humanloop: a 2026 re-home guide

Humanloop shut down in September 2025. If the replacement you picked isn't sticking, this 2026 guide covers the durable asset, destinations, and BYOK math.

15 min read