Investigative Discipline
In one line: Investigate every symptom against current production code before declaring it stale, transient, or unrelated.
Do this: When a test fails on master, run git log on both the prod file and the test file; if the prod file changed after the test was last touched, assume the test is right and prod regressed until you prove otherwise.
What: When a symptom appears (failing test, error log, user-reported bug), the default first move is to investigate the symptom against current production code before classifying it. Three rules govern the investigation:
- A failing test is REAL until proven stale. A test that fails on master is at least as likely to be detecting a real production regression as it is to have gone stale after an uncoordinated code change. By default, investigate the prod code change in the touched area before declaring the test stale.
- Misattributed-cause is technical debt. When a defensive guard is added with a comment that guesses at the cause (rather than documenting a cause that was actually verified), future debuggers will waste time chasing the wrong hypothesis. Comments must distinguish "verified" from "hypothesized" and be updated when the actual cause is later identified.
- Convergent design intuition is a quality signal. When two engineers (or an engineer and an AI agent) independently arrive at the same design from different angles, the design is likely right. The convergence is a positive correctness signal and worth calling out, both as confidence that the implementation is on track and as documentation of why the design was chosen.
Why: Three distinct failure modes are addressed by these rules.
The "test is stale" mis-classification is the most common: a test fails on master, the developer runs git stash and observes the failure existed before their change, then declares "pre-existing — therefore stale" and moves on. The "pre-existing" half of that inference is correct; the "therefore stale" half is the wrong leap. A pre-existing failing test could be a pre-existing real bug — and frequently is. The shortcut bypasses the step that would have caught the regression.
The "misattributed cause" failure happens when a developer (or AI agent) hits a symptom they don't have time to fully diagnose. They write a defensive workaround, then write a comment guessing at the cause. The workaround works (because it covers the real cause too, even though the guess was wrong). Months later, another developer encounters a similar symptom, reads the comment, and chases the wrong hypothesis for hours before realizing the original attribution was speculation. The cost compounds across the codebase as defensive guards accumulate.
The "convergent design intuition" rule is positive rather than defensive: AI-assisted development runs separate threads of reasoning — the human's product intuition, the AI's pattern-match against prior implementations, and the test suite's empirical signal. When they independently agree on a design, that agreement is correctness information. Teams that treat convergence as noise miss a free quality check.
Evidence: Each rule has a concrete failure it prevents:
- Stale-misclassification: a failing test gets tagged "stale" after a git stash shows the failure is pre-existing, but "pre-existing" is not "wrong." A canary test that was the bug's only detector gets silenced, and the bug ships. The rule forces a prod-vs-test git log comparison before any stale verdict.
- Misattributed cause: a defensive guard ships with a comment guessing at the cause. The guard works (it covers the real cause too), the guess is wrong, and a later debugger chases the wrong hypothesis for hours. The rule forces comments to mark causes verified vs. hypothesized.
- Convergent intuition: when the human and the AI reach the same design from different angles, recording that convergence in the commit or design doc is a free correctness signal.
These failures are reproducible: apply the stale shortcut to a real canary failure, then run the prod-vs-test git log comparison. The comparison surfaces the regression that the shortcut would have buried.
How: Three operational mechanisms encode these rules.
For "failing test is real until proven stale," the bug-fix lifecycle (Section 3.2) requires investigating the production code change in the touched area before declaring a test stale. Specifically: run git log -- <prod-file> and git log -- <test-file> and compare. If the prod file changed after the test was last touched, the prior is that the test is right and the prod regressed. Run the prod code path manually (REPL, integration test, live invocation) and compare what it does with what the test asserts. Only declare the test stale when (a) the prod change was intentional AND (b) the test wasn't updated in the same commit that should have updated it — and that combination is itself an R2-discipline debt entry, not a "no-op" closure.
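The git log comparison above can be sketched in a few lines of Python. This is a minimal illustration, not part of the methodology's tooling: the function names `last_commit_time` and `triage_prior` and the returned labels are hypothetical, and the sketch assumes git is on the PATH and that committer timestamps are an acceptable proxy for "changed after". It only sets the prior; it does not replace running the prod code path manually.

```python
import subprocess
from typing import Optional


def last_commit_time(path: str, repo: str = ".") -> Optional[int]:
    """Unix timestamp of the most recent commit touching `path`, or None if none."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%ct", "--", path],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    return int(out) if out else None


def triage_prior(prod_time: Optional[int], test_time: Optional[int]) -> str:
    """Return the starting assumption for a failing test (labels are illustrative)."""
    if prod_time is None or test_time is None:
        # No history for one of the files: the log comparison cannot help.
        return "unknown"
    if prod_time > test_time:
        # Prod changed after the test was last touched: assume the test is
        # right and prod regressed until proven otherwise.
        return "suspect-prod-regression"
    # The ordering gives no signal; the test is still real until proven stale.
    return "no-ordering-signal"
```

A call might look like `triage_prior(last_commit_time("src/billing.py"), last_commit_time("tests/test_billing.py"))`, where both paths are placeholders. Keeping the decision in a pure function separate from the git call makes the rule itself easy to unit-test.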
For "misattributed-cause is technical debt," code comments on defensive guards must distinguish verified from hypothesized causes. The format is structural: "Verified <date> via <method>: <cause>" for known causes, and "Hypothesized <date>: <cause> (untested; defensive guard works regardless)" for guesses. When the actual cause is later identified, the comment is updated and any cross-references in memory entries are corrected. Memory entries that documented disproven hypotheses lead with the corrected diagnosis and retain the original framing at the bottom for historical fidelity.
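As an illustration of the comment format, the sketch below builds and classifies guard comments. It is a hedged example under stated assumptions: the helper names `guard_comment` and `classify` are hypothetical, and the ISO date format (YYYY-MM-DD) is an assumption, since the text specifies only a <date> placeholder.

```python
import re
from datetime import date
from typing import Optional

# Patterns for the two comment shapes; the ISO date format is an assumption.
VERIFIED = re.compile(r"^Verified (\d{4}-\d{2}-\d{2}) via (.+?): (.+)$")
HYPOTHESIZED = re.compile(r"^Hypothesized (\d{4}-\d{2}-\d{2}): (.+)$")


def guard_comment(cause: str, when: date, method: Optional[str] = None) -> str:
    """Format a guard comment; supplying `method` marks the cause as verified."""
    if method:
        return f"Verified {when.isoformat()} via {method}: {cause}"
    return (
        f"Hypothesized {when.isoformat()}: {cause} "
        "(untested; defensive guard works regardless)"
    )


def classify(comment: str) -> str:
    """Classify a comment line as 'verified', 'hypothesized', or 'unmarked'."""
    text = comment.lstrip("# ").strip()  # drop a leading comment marker
    if VERIFIED.match(text):
        return "verified"
    if HYPOTHESIZED.match(text):
        return "hypothesized"
    return "unmarked"
```

A check like `classify` could be run over guard comments in review to flag "unmarked" ones, so that a guess is never left looking like a diagnosis.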
For "convergent design intuition," the methodology recognizes the convergence as a positive quality signal, not noise. When the AI agent and human reach the same design from different angles, that agreement is documented in the commit message or design doc as a correctness signal. This builds team confidence in agentic execution and creates a feedback loop: human intuitions that converge with AI reasoning are observably correct more often, which calibrates the team's trust in subsequent agentic work.
For a documentation-modernization-pass project shape that exercises all three rules at scale, see skill:s4u-doc-excellence.