The Self-Maintaining Framework: What AI-Native Operations Actually Look Like

Every morning at 5 AM Pacific, three agents of mine have already drafted their proposed updates to my framework. A fourth is finishing its work on the public website. By the time I sit down with coffee, the day's pull requests are queued for review.

I built BiModal Design a couple of years ago as an open-source framework for designing interfaces that work across the full Agent Capability Spectrum. The methodology lives in a GitHub repository. The public face lives at bimodal.design. Both surfaces document a field that moves weekly. New agent protocols ship. New evaluation benchmarks emerge. Vision agents grow new behaviors. The framework's claim to being a definitive standard depends on keeping pace, and so does the site.

Done manually, this is a full-time job. I already have a very demanding job.

I am one person. I don't have time for this. But I don't want the project to die.

The Conventional View of AI-Native

When most teams describe themselves as AI-native, they mean their product uses AI. They have a copilot, a chatbot, a recommendation engine, a summarization feature. The AI is in the product. The work of running the company (maintaining the documentation, evolving the methodology, propagating updates across surfaces) still falls on people.

This is +AI thinking applied to operations. The AI is added. The work persists.

The better question is what it would look like if the operations themselves were AI-native. Not the product surface, but the infrastructure underneath. Not the user-facing intelligence, but the maintenance, research, and propagation work that keeps a body of intellectual property current.

The Architecture

To work around my chronic lack of time, I built a closed-loop, multi-agent pipeline that maintains both the framework and its public surface. It has two halves: an upstream fleet that keeps the framework itself current and clean, and a downstream agent that propagates relevant changes to the public website. A governance file at each layer defines what the agents can and cannot touch. A single human reviewer sits at every merge gate.

The upstream fleet runs on the framework repository. Three agents.

The first is a Strategic Research and Innovation Lead. It scans the industry daily for advancements in agent protocols, vision agent behaviors, and autonomous web navigation. It tracks updates to evaluation frameworks like WebArena and VisualWebArena and their successors. It does gap analysis between the current state of the BiModal Design framework and emerging trends. Then it drafts pull requests proposing concrete updates to the methodology, README, and tutorial sections. Its job is not to summarize news. It is to synthesize findings into actionable design principles that preserve the framework's standards-based ethos. When the field moves, this agent proposes how the framework moves with it.

The second is Sentinel, a security agent. Its job is to keep the framework codebase itself safe. Dependency vulnerabilities. Supply chain integrity. Anything that puts the canonical artifact at risk gets flagged or fixed.

The third is a performance agent. It handles repo health: performance, git hygiene, organizational best practices. The unglamorous infrastructure that keeps the framework codebase clean enough to be a credible reference.

Together these three operate continuously on the canonical source. Every change they propose routes through a pull request. I review. I merge or I reject. Nothing lands in main without a human gate.

The downstream agent runs on the public website repository. I call it Echo, because its job is to faithfully echo upstream changes to the public surface.

Every morning at 5 AM Pacific, Echo clones the upstream framework repo, reads a cursor file containing the last commit it processed, and pulls every commit since. It classifies each one. Framework-substantive changes (new principles, terminology refinements, expanded examples) get propagated. Internal changes (CI config, refactors, repo hygiene) get ignored. For substantive changes, Echo identifies the corresponding touchable page on the website, applies the change while preserving the existing JSX scaffolding and Tailwind patterns, runs the build, and opens a pull request. Each PR includes the upstream commit SHA, the affected file, and an explicit confirmation that defined terminology was preserved verbatim.

If nothing substantive happened upstream, Echo updates the cursor silently and opens no PR. Silence is the correct state when nothing has changed.
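The cursor-and-classify loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Echo's actual implementation: the Commit record, the plan_run helper, and the INTERNAL_PREFIXES and INTERNAL_FILES lists are all hypothetical, and sorting commits by file path is a crude stand-in for the judgment the agent applies when it decides whether a change is framework-substantive.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical rules: paths treated as internal-only (CI config, repo hygiene).
INTERNAL_PREFIXES = (".github/", "scripts/", "tests/")
INTERNAL_FILES = frozenset({".gitignore", "package-lock.json"})


@dataclass(frozen=True)
class Commit:
    sha: str
    message: str
    paths: Tuple[str, ...]


def is_substantive(commit: Commit) -> bool:
    """True if the commit touches at least one path outside the internal set."""
    return any(
        not path.startswith(INTERNAL_PREFIXES) and path not in INTERNAL_FILES
        for path in commit.paths
    )


def commits_since(history: List[Commit], cursor_sha: Optional[str]) -> List[Commit]:
    """Commits after the cursor, given a history ordered oldest first."""
    if cursor_sha is None:
        return list(history)
    shas = [c.sha for c in history]
    if cursor_sha not in shas:
        raise ValueError(f"cursor {cursor_sha!r} not found in history")
    return history[shas.index(cursor_sha) + 1:]


def plan_run(
    history: List[Commit], cursor_sha: Optional[str]
) -> Tuple[List[Commit], Optional[str]]:
    """Return (commits to propagate, new cursor value)."""
    pending = commits_since(history, cursor_sha)
    to_sync = [c for c in pending if is_substantive(c)]
    # The cursor advances past every commit seen, substantive or not,
    # so an all-internal day moves the cursor and proposes nothing.
    new_cursor = pending[-1].sha if pending else cursor_sha
    return to_sync, new_cursor
```

In this sketch an empty to_sync list is the "silent" case: the cursor still advances, but there is nothing to open a pull request for.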

The Governance Layer

What makes any of this work is not the agents. It is the file that constrains them.

Before Echo ran for the first time, I committed a file called AGENTS.md to the website repository. It defines three things explicitly. Which directories an agent may modify. Which directories it must never touch (marketing copy, design tokens, components, brand assets, navigation, build configuration). And how to handle cross-repo syncs (reference the upstream commit SHA in every PR, mirror defined terminology verbatim, escalate via issues when a change has no clear home in the website).

This file is the policy. The agent is the propagator. The policy is human-controlled. The agent does the work.
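One way to picture the policy side is as a deny-by-default path check that runs against every file an agent proposes to change. The sketch below is an assumption-laden illustration: AGENTS.md is a prose file, its real contents are not reproduced here, and every directory name, file name, and function name in the code is hypothetical.

```python
from pathlib import PurePosixPath
from typing import List

# Hypothetical zones; the real lists live in AGENTS.md and are not shown here.
ALLOWED_DIRS = ("src/pages/framework", "content/docs")
FORBIDDEN_DIRS = ("src/components", "src/marketing", "src/tokens", "public/brand")
FORBIDDEN_FILES = ("package.json", "tailwind.config.js")


def _within(path: str, directory: str) -> bool:
    """True if path is the directory itself or sits somewhere beneath it."""
    p, d = PurePosixPath(path), PurePosixPath(directory)
    return p == d or d in p.parents


def may_modify(path: str) -> bool:
    """Deny by default: forbidden zones win, then the path must be in an allowed zone."""
    if ".." in PurePosixPath(path).parts:
        return False  # refuse paths that try to climb out of an allowed zone
    if path in FORBIDDEN_FILES:
        return False
    if any(_within(path, d) for d in FORBIDDEN_DIRS):
        return False
    return any(_within(path, d) for d in ALLOWED_DIRS)


def violations(changed_paths: List[str]) -> List[str]:
    """Paths in a proposed change that the policy does not permit."""
    return [p for p in changed_paths if not may_modify(p)]
```

The design choice worth noting is the default: a path that appears in neither list is rejected, which matches the instruction to escalate through an issue when a change has no clear home.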

Before Echo's first scheduled run, I checked twice whether it could articulate its constraints accurately in its own words. The first check, before AGENTS.md existed, returned a generic summary of the site as "promotional, documentation, or educational." The second check, after AGENTS.md was committed, returned a precise articulation of the framework relationship, the touchable and off-limits zones, and the cross-repo sync rules, with specific examples of the proper-noun terminology that must mirror upstream verbatim.

The governance layer was being absorbed, not just read. That is the load-bearing distinction.

The Discipline at Every Gate

Auto-merge is rejected at every layer.

The Research agent's framework proposals require my review. Sentinel and the performance agent require my review. Echo's website syncs require my review. CI validates lint, typecheck, and build, but CI cannot validate interpretive judgment, which is the load-bearing skill across the entire pipeline. Did the agent pick the right target page? Did it paraphrase a defined term? Did it sync something tonally off for a marketing surface?

These questions are not failures of automation. They are the work itself. They are why a human is at every gate.

Four weeks from now, a scheduled review will fire to evaluate whether the agents have produced enough clean PRs to justify narrow auto-merge eligibility for the most trivial cases. Single file changed. Under ten lines. No terminology lock terms touched. Trust earned through evidence, not granted by default.
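Those three criteria are mechanical enough to write down as a predicate. The following is a minimal sketch, assuming "lines" means added plus removed lines in the diff and that the locked terms are available as a plain list. The FileChange record, the function names, and the contents of LOCKED_TERMS are hypothetical placeholders, not the actual terminology lock.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical placeholder for the locked terminology list.
LOCKED_TERMS = ("Agent Capability Spectrum",)


@dataclass(frozen=True)
class FileChange:
    path: str
    added: Tuple[str, ...]    # lines added by the pull request
    removed: Tuple[str, ...]  # lines removed by the pull request


def touches_locked_term(change: FileChange) -> bool:
    """True if any added or removed line mentions a locked term (case-insensitive)."""
    lines = change.added + change.removed
    return any(term.lower() in line.lower() for line in lines for term in LOCKED_TERMS)


def auto_merge_eligible(changes: List[FileChange], max_lines: int = 10) -> bool:
    """Single file, strictly fewer than max_lines changed, no locked term touched."""
    if len(changes) != 1:
        return False
    change = changes[0]
    if len(change.added) + len(change.removed) >= max_lines:
        return False
    return not touches_locked_term(change)
```

A predicate like this only decides eligibility. Whether eligible pull requests should actually merge without review is the question the four-week evidence is meant to answer.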

What This Means for Design Leadership

The pattern is AI agent as faithful echo of human-authored policy. Not autonomous decision-maker. Not creative author. A disciplined propagator constrained by a governance file the human controls and a cursor mechanism that prevents drift beyond the visible state.

The human's role compresses. It does not disappear. It concentrates on the part where judgment matters: defining the policy, classifying ambiguous cases, deciding when terminology should evolve, reviewing whether each proposed change preserves the framework's intent. The boring propagation work, the research scanning, the dependency vigilance, the build verification, the SHA tracking, all of it lives with the agents.

For design leaders, this is the operating model question. Not "Should I use AI in my product?" Not "Should my team use AI tools?" But "What does my organization actually do that requires interpretive judgment, and what does it do that requires faithful propagation? Are those separated cleanly, or are humans doing both?"

Most design organizations have humans doing both. Researchers manually tracking industry changes. PMs manually propagating spec updates across docs. Designers manually keeping component libraries in sync with design tokens. Documentation writers manually syncing reference material across surfaces. The pattern is everywhere. The infrastructure to do it differently is now available.

What This Looks Like When You Mean It

AI-native operations are not a marketing claim. They are an architecture decision. They require defining what humans uniquely do, building infrastructure that lets agents do the rest, codifying the boundaries between the two in files the humans control, and routing every consequential output through a human review gate while the trust is being established.

The pipeline I just described is not large. It is a handful of scheduled agents, a governance file at each layer, a cursor mechanism, and a daily ritual of reviewing pull requests. It fits on a single page if you draw it. It runs while I sleep.

But it does the work of a small team. It keeps a framework current in a field that changes faster than any one person could track. It keeps a public website accurate to a canonical source that itself evolves weekly. It does so within boundaries I set, and it routes every decision through me.

This is what AI-native looks like when it is not a slogan.
