Journal · Est. 2026

Notes from the workbench.

Prompt engineering, context engineering, agents, and the AI advances worth paying attention to. Opinionated, specific, and shipped from the same workbench we build on.

Current issue · No. 12 · Evaluation & Testing
GPT-4 accuracy dropped from 84 to 51 percent on identical questions in 2023, behind a stable model name.

Prompt drift: a 2026 detection playbook

Prompt drift is when output quality changes over time even though the prompt didn't change. The 2026 cadence, three causes, and a four-step playbook.

12 min read
Read the issue

Catalogue

No. 11 to No. 01
Issue №11 · Evaluation & Testing

15 LLM-as-a-judge prompt templates (copy-paste)

15 copy-paste LLM-as-a-judge templates as YAML, organized by dimension. 5 foundations plus 10 specialized rubrics for RAG, code, summarization, agents.

15 min read
Issue №10 · Prompt Engineering

What is an Agent Skill?

An Agent Skill is a versioned folder of instructions and resources an LLM agent loads on demand. How Skills work, and how they differ from prompts and MCP.

19 min read
Issue №09 · Comparisons & Migrations

Promptfoo alternatives after the OpenAI acquisition

Promptfoo joined OpenAI on March 9, 2026. MIT license preserved; the steward changed. For multi-provider eval and red-teaming, the credible 2026 alternatives.

11 min read
Issue №08 · Comparisons & Migrations

Langfuse alternatives: the honest comparison

Langfuse alternative tools compared in 2026: five buyer scenarios where Langfuse isn't the right fit, with honest tool recommendations for each.

14 min read
Issue №07 · Comparisons & Migrations

LangSmith alternatives without per-trace fees

LangSmith's auto-upgrade-on-feedback can turn per-trace billing superlinear. Compare Langfuse, Helicone, Phoenix, Braintrust, and where Prompt Assay fits.

13 min read
Issue №06 · Prompt Engineering

How to version prompts: the 2026 guide

Prompt versioning captures every prompt change as an immutable artifact. Seven concrete steps, worked examples, and where it fits in your stack.

13 min read
Issue №05 · Evaluation & Testing

How to set up prompt regression testing

A 7-step guide to building regression tests for production prompts. Catch breakage before deploy with golden datasets, scoring rubrics, and LLM judges.

18 min read
Issue №04 · Comparisons & Migrations

PromptLayer alternatives: the honest comparison

PromptLayer alternatives compared honestly: current 2026 pricing, BYOK posture, and when Prompt Assay, LangSmith, Langfuse, or Braintrust fits better.

14 min read
Issue №03 · Prompt Engineering

Sixty prompt engineering techniques, organized

A 2026 field guide to the 60 prompt engineering techniques worth knowing, organized into 10 workflow families with canonical examples from The Prompt Report.

13 min read
Issue №02 · BYOK & Cost

What is a BYOK prompt tool?

A BYOK prompt tool routes every LLM call through your own API key. Here's what that means for cost, setup, and the three postures in the market.

12 min read
Issue №01 · Comparisons & Migrations

Migrate from Humanloop: a 2026 re-home guide

Humanloop shut down in September 2025. If the replacement you picked isn't sticking, this 2026 guide covers the durable asset, destinations, and BYOK math.

15 min read