Last month we changed how Mentor126.ai handles assessment scoring. The new logic was cleaner, faster, and better aligned with how sales managers actually think about rep development. It took four hours to build.
It also broke three other features.
Not because the code was wrong. The code was exactly what we asked for. The problem was that the scoring change had implications for the profile assignment engine, the course recommendation logic, and the benchmarking comparisons, and none of those implications were written down anywhere the AI could find them.
We'd made the decision to change scoring in a conversation. We'd discussed the downstream effects in a Slack thread. We'd even agreed on how to handle the profile assignment edge case, verbally, on a call. Then we prompted Claude Code to implement the scoring change and nothing else, because that's what was in the ticket.
The AI did its job perfectly. We failed to give it the full picture.
This keeps happening. Not just to us.
The pattern
If you're building with AI agents (Claude Code, Cursor, Copilot, whatever), you've probably noticed a version of this. The agent is fast, capable, and increasingly good at writing production-quality code. But it keeps doing things you thought you had already resolved.
It's not forgetting. It never knew.
The decisions that govern how your product works live in a scattered constellation of artifacts: a PRD that was accurate three weeks ago, a Notion page with seven conflicting comment threads, a Slack message from your co-founder that starts with "actually, I changed my mind about…", and the most dangerous artifact of all: your own memory.
Every time you start a new AI coding session, you're doing a lossy compression of that constellation into a prompt. You leave things out. You forget which decisions superseded which. You reference a requirement that was quietly deprecated two sprints ago.
The agent doesn't know any of this. It can't. There's no system of record for it to check.
What a system of record would look like
Think about how a well-run legal system works. When a court makes a ruling, it becomes precedent. Future cases reference it. If a new ruling overturns the old one, both the original decision and the reversal are on the record. You can trace the logic. You can see what changed and why.
Now think about how decisions work at your startup.
Someone decides to change the pricing model. That decision lives in a meeting note, maybe. The implications for the billing system, the onboarding flow, and the sales demo script are discussed in three separate conversations. Two happen in Slack. One over coffee. None of them are connected to the original decision in any retrievable way.
Six weeks later, an AI agent builds a new feature that references the old pricing model. Because that's what's in the PRD. And nobody updated the PRD, because updating PRDs is boring and there's always something more urgent.
This is the canonical document problem. Not that documents don't exist. They do, in abundance. The problem is that no document is authoritative. No document is the canonical version. And when nothing is canonical, everything is negotiable, including decisions you made months ago.
Why this matters more now
When humans wrote all the code, the canonical document problem was a management tax. It slowed things down, caused the occasional misalignment, and generated some frustrating meetings. But humans have context. They remember the conversation over coffee. They can ping someone on Slack and ask "didn't we decide to do X?" They compensate.
AI agents don't compensate. They execute. If the context is wrong, the output is wrong: confidently, quickly, and at scale.
We're entering a period where the speed of code generation is outpacing the speed of decision documentation by an order of magnitude. You can ship a feature in an hour. Updating the requirements to reflect what you decided last Tuesday? That takes discipline nobody has budgeted for.
The faster your AI agents can build, the more expensive your documentation debt becomes. That's the inversion nobody's talking about.
What we're doing about it
At Mentor126.ai, we started tracking what I've been calling "context cost": the time and rework attributable to decisions that weren't captured, requirements that drifted, or implications that were discussed but never documented. The API cost to run Claude Code is $400–700 for a major build sprint. The context cost is multiples of that.
The fix isn't "better prompting." It's a better way to capture and surface company context.
A canonical version of every important decision: what was decided, when, why, and what it affects. Not a wiki that nobody updates. Not a PRD that's three versions behind. A living document that the AI agent can reference before it writes a single line of code.
I'm not going to pretend we've solved this. We haven't. But we've started building the infrastructure for it, and the early results are promising enough that I'll be writing more about the approach over the coming weeks.
If you're building with AI agents and you keep finding yourself re-explaining things you thought were settled, the problem isn't the agent. It's that you have no system of record for the decisions that govern your product. Everything downstream of that (the prompts, the code, the features) is only as good as the documents upstream.
Fix the documents. The rest follows.