The three-layer architecture is easy to draw on a whiteboard. Raw sources at the bottom. Canonical layer in the middle. Schema or constitution at the top. Karpathy drew it. I drew it. Anyone building in this space draws roughly the same picture.
The whiteboard version takes five minutes. Building the canonical layer takes months. That's where every implementation either works or quietly fails.
The raw sources are easy. They're immutable. You ingest them and leave them alone. The schema is a config file: rules, behaviors, guardrails. It changes occasionally and deliberately. But the canonical layer is where the actual thinking lives, and thinking is messy. It contradicts itself. It evolves. It has dependencies nobody mapped. It rots.
Here are the three things people underestimate about building a canonical layer, based on six months of getting them wrong at Mentor126.ai.
1. Conflict resolution is the whole game
The moment you have more than ten canonical entries, some of them will conflict. Not because anyone made a mistake. Because the business evolved and the old decisions didn't get retired.
At Mentor126.ai we decided in January that assessment scoring would use simple averages. In March we changed to weighted averages based on feedback from design partners. Both decisions were documented. Neither one referenced the other. As far as the canonical layer was concerned, both were true.
An AI agent asked to build a scoring feature had two valid answers. It picked one. It picked the wrong one. The code compiled, the tests passed, and the feature worked perfectly against a requirement we'd already abandoned.
The fix isn't just detecting conflicts. Detection is the easy part: compare two entries and ask the LLM if they contradict each other. The hard part is resolving them. Which decision wins? Is the older one superseded or still partially valid? Did the newer decision intend to override the older one, or are they governing different scopes?
These are judgment calls. The system can surface the conflict. A human has to resolve it. And the resolution itself needs to be a new entry in the canonical layer: "Decision 47 supersedes Decision 12 for assessment scoring. Decision 12 remains valid for benchmark comparisons." That's the level of specificity required. Anything less and the conflict just moves downstream.
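That resolution pattern, supersession scoped to a specific domain rather than a blanket override, can be made mechanical once a human has made the call. A minimal sketch in Python; the `Decision` record and its field names are hypothetical, not Mentor126.ai's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    """One canonical entry. Field names are illustrative only."""
    id: int
    text: str
    # scope -> id of the decision this one supersedes *within that scope*,
    # e.g. {"assessment_scoring": 12}
    supersedes: dict[str, int] = field(default_factory=dict)

def resolve(decisions: list[Decision], scope: str) -> list[Decision]:
    """Return the decisions still valid for a scope: drop any entry that
    a newer entry explicitly supersedes for that scope, keep the rest."""
    superseded = {d.supersedes[scope] for d in decisions if scope in d.supersedes}
    return [d for d in decisions if d.id not in superseded]

d12 = Decision(12, "Assessment scoring uses simple averages")
d47 = Decision(47, "Assessment scoring uses weighted averages",
               supersedes={"assessment_scoring": 12})

# Decision 47 wins for scoring; Decision 12 stays valid for benchmarks.
scoring = resolve([d12, d47], "assessment_scoring")      # only d47
benchmarks = resolve([d12, d47], "benchmark_comparisons")  # d12 and d47
```

The point of the scope key is exactly the specificity the resolution entry demands: "supersedes for assessment scoring" is machine-checkable, while a bare "supersedes" would silently retire Decision 12 everywhere, including the benchmark comparisons where it's still correct.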
2. Implications are invisible until they break something
Every decision has downstream effects. Change the scoring algorithm and you've changed what the profile assignment engine sees. Change the profile assignment thresholds and you've changed which courses get recommended. Change the course recommendation logic and you've changed the learner's 90-day roadmap.
None of these implications are obvious when you're making the original decision. They become obvious when something breaks.
This is the implication mapping problem. Every canonical entry needs to declare what it affects. Not what it decided. What it touches. "This decision about scoring affects: profile assignment, benchmark comparisons, the Skills Fingerprint radar chart, and the re-assessment trigger at Day 25."
At Mentor126.ai we now maintain a "what it affects" field on every decision record. It's the fourth field in our five-field template. When a new decision is added, the system checks every existing "what it affects" list for overlaps. If the new decision touches something an older decision also touches, the system flags it for review.
This isn't perfect. The "what it affects" list is only as complete as the person who wrote it. But it catches about 60% of the downstream conflicts before they reach the codebase. Before we had this, the catch rate was zero. We found conflicts when something broke in production.
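The overlap check itself is simple set intersection once each decision carries its "what it affects" list. A sketch of that flagging step, assuming hypothetical component names; this is not the production implementation:

```python
def affected_overlaps(new_affects: set[str],
                      existing: dict[int, set[str]]) -> dict[int, set[str]]:
    """Compare a new decision's 'what it affects' set against every
    existing decision's set; return {decision_id: shared_components}
    for each overlap, so a human can review those decisions."""
    return {
        decision_id: shared
        for decision_id, affects in existing.items()
        if (shared := new_affects & affects)  # non-empty intersection
    }

existing = {
    12: {"profile_assignment", "benchmark_comparisons"},
    31: {"course_recommendations", "roadmap_day_90"},
}
new = {"profile_assignment", "skills_fingerprint"}

flags = affected_overlaps(new, existing)
# Decision 12 also touches profile_assignment -> flag it for review.
```

The 60% catch rate lives and dies on the quality of those declared sets, which is why the check routes overlaps to a human rather than resolving anything itself.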
3. "Just use a wiki" is the answer that never works
Every six months someone on a product team says "we should put all our decisions in a wiki." The wiki gets created. Twenty pages get written in the first week. By month two, three pages are current and the rest are artifacts of a past era that nobody trusts.
This has happened at every company I've worked at. It happened at Mentor126.ai. It's happening right now at a company you know.
The failure mode is always the same: the cost of maintaining the wiki exceeds the perceived benefit of having it. Updating a wiki page is boring. Nobody's quarterly goals include "keep the wiki current." And unlike code, there's no CI pipeline that breaks when a wiki page drifts from reality.
This is Karpathy's core insight and it's the right one: LLMs change the maintenance math. The bookkeeping that humans abandon every wiki over is exactly the kind of work LLMs are good at. Cross-referencing. Summarizing. Flagging inconsistencies. Filing new information into the right place.
But "LLM-maintained wiki" is still not the same thing as a canonical layer. A wiki is a collection of pages. A canonical layer has structure: typed entries, declared dependencies, explicit supersession, lint operations that run on a schedule. The LLM handles the bookkeeping, but the schema constrains the bookkeeping. Without the schema, you just have a wiki that an AI updates. That's a marginal improvement over a wiki that a human doesn't update.
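The difference is concrete: a wiki page accepts anything, while a canonical entry has to pass a schema check before any bookkeeping, human or LLM, touches it. A minimal lint sketch; the field names here stand in for a five-field template and are hypothetical, not the actual GroundTruth OS schema:

```python
# Required fields for a canonical entry. Illustrative names only.
REQUIRED_FIELDS = ("id", "decision", "rationale", "affects", "status")

def lint_entry(entry: dict) -> list[str]:
    """Return schema violations for one entry; empty list means valid."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS
                if f not in entry]
    if entry.get("status") not in (None, "active", "superseded"):
        problems.append(f"unknown status: {entry['status']!r}")
    if not isinstance(entry.get("affects", []), list):
        problems.append("'affects' must be a list of touched components")
    return problems

wiki_page = {"decision": "Use weighted averages"}  # free text, no structure
entry = {"id": 47, "decision": "Use weighted averages",
         "rationale": "design-partner feedback",
         "affects": ["profile_assignment"], "status": "active"}

# The typed entry passes; the wiki-style page fails on four fields.
valid = lint_entry(entry)        # []
invalid = lint_entry(wiki_page)  # four "missing field" violations
```

An LLM can fill, cross-reference, and re-lint entries like these on a schedule; it can't do any of that against free-form pages, which is why the schema, not the model, is what separates a canonical layer from an AI-updated wiki.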
Where this leaves us
Six months into building GroundTruth OS, I still lose sleep over the canonical layer. The architecture is right. The three layers work. The lint operation catches stale decisions. The five-field decision record captures enough structure to be useful.
But conflict resolution is still semi-manual. Implication mapping is still incomplete. The system surfaces problems faster than it used to, but a human still has to fix most of them. The goal is to get the system to a point where it proposes resolutions, not just flags conflicts. That's the next milestone.
If you're building anything in this space, whether it's a personal LLM wiki or a business context layer, the canonical layer is where your architecture will be tested. The ingest pipeline is plumbing. The query interface is a solved problem. The schema is a config file.
The canonical layer is where your decisions live, evolve, conflict, and compound. Getting it right is the hard part. Everything else is bookkeeping.