
tobi/qmd — The architecture that only git remembers

We pointed Glaux at a well-structured TypeScript repo built by an experienced engineer. It found architectural knowledge that isn't written down anywhere — not in the README, not in the types, not in any documentation.

#case-study #research

The test

Here's a question worth asking: in a small, clean, well-maintained codebase — the kind built by a developer who knows exactly what they're doing — does graph intelligence tell you something you don't already know?

tobi/qmd is a local-first knowledge store built in TypeScript. It indexes documents, runs embeddings via a local LLM, and exposes search through both a CLI and an MCP server. The repo has 451 tracked files and 364 commits, with the core runtime in about 15 TypeScript source files and a substantial finetune/training corpus alongside it.

This is the kind of repo where the author could explain every design decision from memory. The code is clean. The architecture is intentional. If graph intelligence can't find something new here, it's probably not finding anything new anywhere.

It found something.

The coupling nobody documented

src/qmd.ts and src/store.ts have the strongest co-change signal in the entire repository: a weight of 4.3. For context, the average co-change weight across all edges in this repo is 0.51, and the median is 0.25. The weight on this pair is roughly 8x the average and 17x the median. By the graph's measure, they're the most tightly coupled files in the codebase.
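To make "co-change weight" concrete, here is a minimal sketch of how such a weight could be derived from commit history. The weighting rule (each commit splits its contribution across the files it touches) is an assumption chosen for illustration, as is the toy commit list; the case study does not say how Glaux actually scores edges. The last lines just reproduce the ratio arithmetic from the reported figures.

```typescript
// Hypothetical input: each commit is reduced to the list of files it touched.
type Commit = string[];

// Build an undirected co-change graph keyed by "fileA|fileB" (sorted).
// ASSUMPTION: a commit touching n files adds 1 / (n - 1) to each pair in it,
// so sweeping commits count for less. This is not Glaux's documented formula.
function coChangeWeights(commits: Commit[]): Map<string, number> {
  const weights = new Map<string, number>();
  for (const files of commits) {
    const unique = Array.from(new Set(files)).sort();
    if (unique.length < 2) continue; // single-file commits create no pairs
    const share = 1 / (unique.length - 1);
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]}|${unique[j]}`;
        weights.set(key, (weights.get(key) ?? 0) + share);
      }
    }
  }
  return weights;
}

// Toy commits, made up for illustration (not qmd's real history).
const toyCommits: Commit[] = [
  ["src/store.ts", "src/qmd.ts"],
  ["src/store.ts", "src/qmd.ts", "src/llm.ts"],
  ["src/llm.ts"],
];
const toyWeights = coChangeWeights(toyCommits);
console.log(toyWeights.get("src/qmd.ts|src/store.ts")); // 1.5

// Ratios computed from the figures reported in this case study.
const strongest = 4.3;
const meanWeight = 0.51;
const medianWeight = 0.25;
console.log((strongest / meanWeight).toFixed(1)); // "8.4"
console.log((strongest / medianWeight).toFixed(1)); // "17.2"
```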

But neither file imports the other. No function call. No type dependency. No re-export. As far as every static analysis tool, linter, and IDE is concerned, they're unrelated files.
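A hidden coupling of this kind is simply a co-change edge with no matching import edge. The sketch below shows that check under stated assumptions: the edge shapes, the `hiddenCouplings` helper, and the cutoff of 1.0 are all hypothetical, while the three weights and the import pairs are the ones reported in this case study.

```typescript
type Edge = { a: string; b: string };
type WeightedEdge = Edge & { weight: number };

// Order-independent key so that (a, b) and (b, a) compare equal.
const pairKey = (a: string, b: string): string => [a, b].sort().join("|");

// Co-change edges at or above `minWeight` that have no import edge in
// either direction, strongest first.
function hiddenCouplings(
  coChange: WeightedEdge[],
  imports: Edge[],
  minWeight: number,
): WeightedEdge[] {
  const imported = new Set(imports.map((e) => pairKey(e.a, e.b)));
  return coChange
    .filter((e) => e.weight >= minWeight && !imported.has(pairKey(e.a, e.b)))
    .sort((x, y) => y.weight - x.weight);
}

// Weights reported in this case study.
const reportedCoChange: WeightedEdge[] = [
  { a: "mcp/server.ts", b: "qmd.ts", weight: 1.49 },
  { a: "qmd.ts", b: "store.ts", weight: 4.3 },
  { a: "mcp/server.ts", b: "store.ts", weight: 1.54 },
];
// Import pairs reported in this case study (none of them joins the pairs above).
const reportedImports: Edge[] = [
  { a: "store.ts", b: "collections.ts" },
  { a: "qmd.ts", b: "collections.ts" },
  { a: "qmd.ts", b: "db.ts" },
];

// The cutoff of 1.0 is an arbitrary illustrative threshold.
const hiddenPairs = hiddenCouplings(reportedCoChange, reportedImports, 1.0);
console.log(hiddenPairs.map((e) => `${e.a} <-> ${e.b} (${e.weight})`));
```

With these inputs all three pairs come back, with qmd.ts and store.ts first.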

They're coupled through a shared data contract — a protocol for how documents are stored, queried, and indexed. When the storage layer changes its query format or index structure, the CLI entry point has to adapt. Not because one calls the other, but because they both speak the same language. That language isn't written down in types or interfaces. It lives in the developer's head.

Here's what happens when an agent doesn't know this: it gets a task to update the store's search API — maybe changing how queries are parsed or how results are ranked. It modifies store.ts, runs the tests that import store.ts, they pass. It opens a PR. But qmd.ts — the CLI that speaks the same protocol — wasn't touched. The CLI now sends queries in a format the store no longer expects. Users who run qmd search "..." from the terminal get silent failures or wrong results. The tests didn't catch it because the tests for qmd.ts don't import store.ts either.

The scenario is illustrative, but the failure mode is real. This is how regressions reach production, and it's exactly the class of bug that graph intelligence is designed to prevent.

What the graph found

When Glaux ran community detection on the co-change graph, two primary clusters emerged:

  • Core cluster (79 files): the finetune data and training scripts — files that change together during model experiments
  • Runtime cluster (19 files): the actual qmd runtime — store.ts, llm.ts, qmd.ts, mcp/server.ts

This separation is architecturally meaningful. The finetune work and the runtime code live in the same repo but evolve on completely different schedules. An agent working on training data shouldn't be looking at store.ts for context, and vice versa. Community detection surfaces this boundary from pure history — no labels, no config, no directory conventions needed.
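To show how clusters can fall out of history alone, here is a deliberately crude stand-in for community detection: drop weak co-change edges, then take connected components with union-find. This is not Glaux's algorithm, which the case study does not describe. The finetune file names, the llm.ts edges, the weights on them, and the 0.5 cutoff are all made up for illustration; only the three runtime weights come from this case study.

```typescript
type CoChangeEdge = { a: string; b: string; weight: number };

// Connected components of the co-change graph after dropping edges below
// `minWeight`. A simple substitute for real community detection.
function thresholdClusters(
  files: string[],
  edgeList: CoChangeEdge[],
  minWeight: number,
): string[][] {
  const parent = new Map<string, string>();
  const find = (f: string): string => {
    if (!parent.has(f)) parent.set(f, f); // lazily register unseen files
    let root = f;
    while (parent.get(root) !== root) root = parent.get(root) as string;
    return root;
  };

  for (const e of edgeList) {
    if (e.weight < minWeight) continue; // weak edges do not merge clusters
    const ra = find(e.a);
    const rb = find(e.b);
    if (ra !== rb) parent.set(ra, rb);
  }

  const groups = new Map<string, string[]>();
  for (const f of files) {
    const root = find(f);
    groups.set(root, [...(groups.get(root) ?? []), f]);
  }
  return Array.from(groups.values());
}

const sampleFiles = [
  "store.ts", "qmd.ts", "mcp/server.ts", "llm.ts",
  "finetune/train.py", "finetune/data.jsonl", // hypothetical file names
];
const sampleEdges: CoChangeEdge[] = [
  { a: "store.ts", b: "qmd.ts", weight: 4.3 },         // reported
  { a: "mcp/server.ts", b: "store.ts", weight: 1.54 }, // reported
  { a: "mcp/server.ts", b: "qmd.ts", weight: 1.49 },   // reported
  { a: "llm.ts", b: "store.ts", weight: 1.0 },                        // hypothetical
  { a: "finetune/train.py", b: "finetune/data.jsonl", weight: 2.0 },  // hypothetical
  { a: "finetune/train.py", b: "llm.ts", weight: 0.1 },               // hypothetical weak link
];

const sampleClusters = thresholdClusters(sampleFiles, sampleEdges, 0.5);
console.log(sampleClusters);
```

The weak 0.1 link between the training script and llm.ts falls under the cutoff, so the runtime files and the finetune files land in two separate groups.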

The center of gravity

src/store.ts sits at the top of the PageRank ranking — more files co-change with it than any other file. It handles document storage, embedding generation, hybrid search, and index management. When store.ts changes, the ripple touches index.ts, llm.ts, cli/qmd.ts, mcp/server.ts, maintenance.ts, and cli/formatter.ts.

None of this is surprising to someone who reads the code. What is surprising is the blast radius: a single change to the store can require coordinated changes across six files. An agent that modifies store.ts in isolation is working against the grain of how this codebase actually evolves.
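For readers who want to see what a PageRank ranking over a co-change graph involves, here is a minimal power-iteration sketch on a weighted, undirected graph. The damping factor of 0.85, the iteration count, and the function itself are assumptions for illustration, not Glaux's implementation. The input uses only the three edge weights reported in this case study, so it is a tiny slice of the real graph.

```typescript
type WEdge = { a: string; b: string; weight: number };

// Weighted PageRank on an undirected graph via power iteration.
// Each node passes its rank to neighbors in proportion to edge weight.
function pageRank(
  edgeList: WEdge[],
  damping = 0.85,
  iterations = 100,
): Map<string, number> {
  // Symmetric adjacency: node -> (neighbor -> weight).
  const adj = new Map<string, Map<string, number>>();
  const link = (from: string, to: string, w: number): void => {
    if (!adj.has(from)) adj.set(from, new Map<string, number>());
    const row = adj.get(from) as Map<string, number>;
    row.set(to, (row.get(to) ?? 0) + w);
  };
  for (const e of edgeList) {
    link(e.a, e.b, e.weight);
    link(e.b, e.a, e.weight);
  }

  const nodes = Array.from(adj.keys());
  const n = nodes.length;
  let rank = new Map<string, number>();
  nodes.forEach((v) => rank.set(v, 1 / n));

  for (let it = 0; it < iterations; it++) {
    const next = new Map<string, number>();
    nodes.forEach((v) => next.set(v, (1 - damping) / n));
    for (const u of nodes) {
      const row = adj.get(u) as Map<string, number>;
      let strength = 0; // total edge weight leaving u
      row.forEach((w) => { strength += w; });
      row.forEach((w, v) => {
        const flow = damping * (rank.get(u) as number) * (w / strength);
        next.set(v, (next.get(v) as number) + flow);
      });
    }
    rank = next;
  }
  return rank;
}

// The three co-change weights reported in this case study.
const ranks = pageRank([
  { a: "store.ts", b: "qmd.ts", weight: 4.3 },
  { a: "mcp/server.ts", b: "store.ts", weight: 1.54 },
  { a: "mcp/server.ts", b: "qmd.ts", weight: 1.49 },
]);
const ranked = Array.from(ranks.entries()).sort((x, y) => y[1] - x[1]);
console.log(ranked.map(([file]) => file)); // store.ts, qmd.ts, mcp/server.ts
```

Even on this three-edge slice store.ts comes out on top, though only narrowly ahead of qmd.ts; the full graph is what gives it a clear lead.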

The dormant dependencies

Four import edges exist in code but don't show up in co-change patterns:

Source | Target | What it means
store.ts | collections.ts | Stable interface; hasn't needed changes when store changes. If this starts co-changing, a foundational API is shifting.
cli/qmd.ts | collections.ts | Same; the collections contract is mature
cli/qmd.ts | db.ts | Database layer is settled infrastructure
test-preload.ts | llm.ts | Test setup imports but rarely changes with the LLM

These are good dormant dependencies. They represent boundaries where the interface is mature enough that changes don't ripple. The actionable insight: a dormant-to-active transition — when one of these pairs suddenly starts co-changing — is an early warning signal that something foundational is shifting.
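The dormant-to-active check can be sketched as a comparison between two snapshots of the co-change graph. Everything here is hypothetical apart from the four import edges, which come from the table above: the snapshot maps, the 0.8 weight, the helper names, and the 0.25 threshold (borrowed from the reported median weight) are illustrative choices.

```typescript
type ImportEdge = { source: string; target: string };

// Order-independent key for a file pair.
const coKey = (a: string, b: string): string => [a, b].sort().join("|");

// Import edges whose co-change weight is below `threshold`.
// A pair missing from the map counts as weight 0.
function dormantImports(
  imports: ImportEdge[],
  coChange: Map<string, number>,
  threshold: number,
): ImportEdge[] {
  return imports.filter(
    (e) => (coChange.get(coKey(e.source, e.target)) ?? 0) < threshold,
  );
}

// Imports that were dormant in `before` but are no longer dormant in `after`.
function awakenedImports(
  imports: ImportEdge[],
  before: Map<string, number>,
  after: Map<string, number>,
  threshold: number,
): ImportEdge[] {
  const stillDormant = new Set(
    dormantImports(imports, after, threshold).map((e) => coKey(e.source, e.target)),
  );
  return dormantImports(imports, before, threshold).filter(
    (e) => !stillDormant.has(coKey(e.source, e.target)),
  );
}

// The four import edges from the table above.
const tableImports: ImportEdge[] = [
  { source: "store.ts", target: "collections.ts" },
  { source: "cli/qmd.ts", target: "collections.ts" },
  { source: "cli/qmd.ts", target: "db.ts" },
  { source: "test-preload.ts", target: "llm.ts" },
];

// Hypothetical snapshots: no co-change on these pairs at first, then
// store.ts and collections.ts start changing together.
const snapshotBefore = new Map<string, number>();
const snapshotAfter = new Map<string, number>([
  [coKey("store.ts", "collections.ts"), 0.8],
]);

console.log(dormantImports(tableImports, snapshotBefore, 0.25).length); // 4
console.log(awakenedImports(tableImports, snapshotBefore, snapshotAfter, 0.25));
// one edge: store.ts -> collections.ts
```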

The MCP mirror

src/mcp/server.ts co-changes with both store.ts (weight 1.54) and qmd.ts (weight 1.49) but imports neither directly — it goes through index.ts. The MCP server is a second surface that mirrors the CLI, and changes to either surface tend to happen together. An agent adding a new MCP tool should check whether a corresponding CLI command needs updating.

What an agent should know

If you're a coding agent about to edit this repo, here's what the graph tells you — and what no other tool will:

  1. Don't touch store.ts alone. Check qmd.ts, index.ts, llm.ts, and mcp/server.ts for ripple effects.
  2. Finetune work is isolated. Changes to training scripts and data files won't affect the runtime.
  3. collections.ts is stable. If your change requires modifying it, you're shifting a foundational interface — proceed carefully.
  4. The MCP server and CLI are coupled mirrors. Changes to one usually require changes to the other.
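The first and fourth rules above amount to a pre-edit lookup: given the files an agent plans to change, list their co-change partners that are not in the change set. Here is a minimal sketch of that lookup. The `missingPartners` helper and the 1.0 cutoff are hypothetical, and the graph contains only the three weights reported in this case study.

```typescript
type CoEdge = { a: string; b: string; weight: number };
type Partner = { file: string; weight: number };

// For a planned change set, return co-change partners (weight >= minWeight)
// that the change set does not touch, strongest first.
function missingPartners(
  changed: string[],
  coChange: CoEdge[],
  minWeight: number,
): Partner[] {
  const touched = new Set(changed);
  const best = new Map<string, number>(); // partner -> strongest linking weight
  for (const e of coChange) {
    if (e.weight < minWeight) continue;
    const directions: [string, string][] = [[e.a, e.b], [e.b, e.a]];
    for (const [from, to] of directions) {
      if (touched.has(from) && !touched.has(to)) {
        best.set(to, Math.max(best.get(to) ?? 0, e.weight));
      }
    }
  }
  return Array.from(best.entries())
    .map(([file, weight]) => ({ file, weight }))
    .sort((x, y) => y.weight - x.weight);
}

// Weights reported in this case study.
const knownCoChange: CoEdge[] = [
  { a: "qmd.ts", b: "store.ts", weight: 4.3 },
  { a: "mcp/server.ts", b: "store.ts", weight: 1.54 },
  { a: "mcp/server.ts", b: "qmd.ts", weight: 1.49 },
];

// An agent plans to edit store.ts only.
const warnings = missingPartners(["store.ts"], knownCoChange, 1.0);
console.log(warnings);
// [ { file: "qmd.ts", weight: 4.3 }, { file: "mcp/server.ts", weight: 1.54 } ]
```

An agent could run a check like this before opening a PR and treat a non-empty result as a prompt to review the listed files, not as proof that they must change.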

None of this is in a README. None of it is in the type system. None of it is in any documentation.

All of it is in the git history.

The answer

So — does graph intelligence tell you something you don't already know about a well-structured codebase?

Yes. Even in a clean repo maintained by an experienced engineer, there's architectural knowledge that only lives in how files change together over time. The coupling between qmd.ts and store.ts is the strongest signal in the repo, and it is real, significant, and undocumented. Because neither file imports the other, static analysis alone would miss it. The co-change graph found it in under a second.

If that's true in a 451-file repo with clear structure and intentional design, imagine what's hiding in a 10-year enterprise monorepo with 50 contributors and no architect. That's the codebase where agents need this context most — and where it's hardest for any human to hold in their head.

This case study is based on live data from the Glaux explorer.