Interview With Jeremy McEntire About The Organizational Physics Of Multi-Agent AI

By Amit Chowdhry • Mar 19, 2026

Jeremy McEntire is a software engineering leader, researcher, and author who focuses on the intersection of AI systems, organizational theory, and software architecture. Pulse 2.0 interviewed Jeremy McEntire to learn more about the organizational physics of multi-agent AI.

Jeremy McEntire’s Background

Could you tell me more about your background? McEntire said:

“I’ve been in engineering for about 25 years and leading teams for roughly the last 15. I spent several years at Twilio managing teams responsible for systems where failure wasn’t an option, and I’m currently Head of Engineering at Wander. There’s a kind of work in this industry that often goes unnoticed — the databases, the networking stacks, the infrastructure that everything else takes as a given. Those systems simply must work, and those stakes attract a certain kind of person. That’s where I’ve spent most of my career.”

“I studied pure mathematics, and I think that shaped how I approach problems more than any particular job did. I’ve never been satisfied with conventional explanations for why organizations fail. The standard answers made sense in some contexts but were wanting in others, suggesting there was more to the story than was being told. I wanted to understand what was underneath.”

“What I found is that the failures are structural. Any system that coordinates through compressed representations — whether that’s a corporate hierarchy compressing quarterly results into a dashboard, or an AI agent compressing a task specification into a plan — will experience drift between the representation and reality. That drift isn’t a bug. It’s a mathematical consequence of compression under selection pressure. The same information-theoretic constraints that cause a seven-layer management chain to lose signal cause a multi-agent AI pipeline to lose specification intent. The substrate is irrelevant.”

“I’ve written extensively about this — about the structural forces that constrain organizations, about software engineering as a coordination problem rather than a coding problem, about the formal mathematics of how information degrades through hierarchies. The research program spans 39 papers and an academic monograph called Structural Compression Theory, which develops the unified framework across organizational theory, neural network geometry, and multi-agent coordination. The multi-agent AI study is one thread in that larger body of work.”

Inspiration For The Study

What inspired you to conduct this study on agentic AI systems and their impact on organizational dynamics? McEntire shared:

“Two things were happening at once. As Head of Engineering, I was responsible for guiding the engineering organization through the AI transition — figuring out what works, what doesn’t, and how to adopt these tools without destroying the things that were already working. That’s a practical problem and I needed practical answers.”

“At the same time, I’d spent years developing a formal framework for why organizations fail structurally — information-theoretic constraints on coordination that have nothing to do with the people involved. I wanted to know: if I structured AI agents like a corporate org chart, would the same structural failures emerge with no humans in the system? If they did, that would confirm the theory is about the physics of coordination, not about human psychology.”

“Both questions pointed in the same direction. Run the experiment. So I built a multi-agent coding swarm, equipped it with six explicit mechanisms designed to prevent the specific dysfunctions my theory predicted, and watched what happened. That study confirmed the structural predictions, but the broader research program has been as much exploration as confirmation — asking “what happens when” without a clear expectation and following the data wherever it led. Some of the most valuable findings came from exactly those moments, where the result was nothing like what I would have guessed.”

Experimental Setup

Can you explain the experimental setup and how you compared single-agent, hierarchical multi-agent, stigmergic multi-agent, and pipeline architectures on the same software engineering tasks? McEntire explained:

“The core experiment was a controlled comparison. I gave four coordination architectures the same task — build a 7-service microservices backend — using the same AI model, the same $50 budget, and a pre-registered scoring rubric. The only variable was how the agents were organized.”

“The single-agent architecture was just one instance of the model working alone with no coordination layer. The hierarchical architecture had a coordinator that decomposed the task and delegated to workers. The stigmergic architecture ran eight agents concurrently, coordinating through shared artifacts rather than direct communication. The gated pipeline replicated a corporate software organization — diagnose, decompose, architect, review, implement, test, verify — with specialized agents at each stage and gate-keeping review between stages.”

“The pipeline is where it gets interesting. I equipped it with six explicit countermeasures, each targeting a specific failure mode predicted by organizational theory — mechanisms to prevent bikeshedding, force reconsideration after fixes, escalate disagreements, scope reviews to prevent creep, and detect oscillation. The system was designed to not fail in the ways I predicted it would fail.”

“I also ran two earlier studies — a pipeline swarm on a complex backend task and a simpler chess engine task — that provided the initial evidence and calibration data for the formal analysis. But the controlled architecture comparison is the centerpiece because it isolates the variable: same model, same task, same budget, different coordination topology, dramatically different outcomes.”

Single-Agent Systems Results

Your study found that single-agent systems completed all 28 tasks while multi-agent architectures struggled—what factors do you believe drove this performance gap? McEntire noted:

“The single agent succeeded because it had nothing to coordinate with. That’s the finding — not that multi-agent systems are poorly implemented, but that coordination itself has a cost, and that cost is structural.”

“Every coordination link between agents introduces overhead: status broadcasts, task routing, conflict resolution, retransmission when signals degrade. My analysis showed that this overhead scales with the number of communication links in the topology, not with the number of agents or their capability. A fully connected architecture where every agent talks to every other agent sees governance overhead grow from about 25% at three agents to nearly 70% at fifteen. A hub-and-spoke architecture grows at roughly half a percent per agent. The topology determines the cost.”
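The scaling McEntire describes follows from simple link counting. The sketch below is an editorial illustration, not the study's model: the function names and the idea of counting links are ours, and the overhead percentages in the interview would additionally depend on a per-link cost constant that is not given here.

```python
# Illustrative sketch (not the study's model): coordination cost tracks the
# number of communication links in the topology, not the number of agents.

def links_full_mesh(n: int) -> int:
    """Fully connected: every agent talks to every other, n*(n-1)/2 links."""
    return n * (n - 1) // 2

def links_hub_and_spoke(n: int) -> int:
    """One coordinator with n-1 workers: one link per worker."""
    return n - 1

for n in (3, 8, 15):
    # At 15 agents: 105 mesh links versus 14 hub-and-spoke links.
    print(f"{n:>2} agents: mesh={links_full_mesh(n):>3}  hub={links_hub_and_spoke(n):>2}")
```

The mesh grows quadratically while the hub grows linearly, which is why the interview's fully connected architecture sees overhead climb steeply with agent count while hub-and-spoke grows by a roughly constant increment per agent.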

“The deeper problem is information loss. Each time a specification passes through a coordination stage — decomposition, assignment, execution, integration, verification — it loses fidelity. The Data Processing Inequality guarantees this: no downstream processing can recover information lost upstream. The code quality gap between architectures follows directly from how much specification intent survives the coordination pipeline.”

“What makes this a structural claim rather than an engineering claim is what happens as models improve. Better models are absolutely better — they’re faster, more capable, they need fewer tokens to accomplish the same implementation work. But the coordination overhead doesn’t shrink with them. Status broadcasts, conflict resolution, task routing — those are functions of the topology, not the capability of the agents doing the work. As implementation gets cheaper, coordination becomes a larger fraction of the total cost. We’ve known this since The Mythical Man-Month — adding people to a late project makes it later, because coordination costs grow faster than productive capacity. What the study shows is that this is as true for AI systems as it is for engineering organizations.”

AI Swarms Recreating Middle Management Dysfunction

The report suggests that agentic AI “swarms” recreate middle management dysfunction—what specific behaviors or patterns in the experiment resembled real-world organizational management issues? McEntire pointed out:

“The most striking thing wasn’t that the dysfunction appeared — it was how specific the parallels were.”

“The pipeline architecture produced bikeshedding: four code review rejections with zero factual basis. Sixty-nine subjective objections across those rejections, not a single verifiable error. The system had a mechanism specifically designed to prevent this — a factual versus subjective classification that was supposed to separate verifiable bugs from style preferences. The mechanism detected the problem. It didn’t prevent it.”

“It produced governance conflicts: two escalation events 28 seconds apart, one rejecting a component and the other force-approving the same component. The governance hierarchy designed to resolve disagreements produced a new one.”

“It produced verification theater: all nine verification stages reported “tests passed: zero out of zero.” The system certified correctness without testing anything.”

“There’s an irony running through all four architectures that I think is the real finding. The hierarchical architecture’s coordinator — whose entire purpose was to decompose and delegate — rationally refused to delegate, recognizing that delegation would introduce interface risk. The stigmergic architecture — designed for emergent coordination through shared artifacts — produced incompatible interfaces at every service boundary. The pipeline — loaded with six anti-dysfunction countermeasures — failed in exactly the ways those countermeasures were designed to prevent. Every architecture’s failure was a failure of its defining feature. The dysfunction didn’t come from some overlooked corner of the design. It came from the center.”

Question About The Compute Budget

The pipeline architecture consumed the entire $50 compute budget without producing deployable code—what does this reveal about how multi-agent systems allocate resources and make decisions? McEntire described:

“The pipeline that scored 0/28 spent $1.90 across five planning stages and never wrote a line of implementation code. It didn’t exhaust the $50 budget — it exhausted its own capacity to make progress. All of its resources went to planning, and the planning never converged on a plan.”

“What I think is most important here is that the system didn’t make poor decisions. Each planning stage was doing something that looked productive — decomposing the problem, analyzing interfaces, evaluating architectural tradeoffs. At no point did the system do something obviously wasteful. Every step was locally rational while being globally ruinous.”

“Anyone who’s spent time in a large organization recognizes this. The meeting that leads to the follow-up meeting that leads to the working group that produces a document that requires a review cycle — every step makes sense in isolation. The overhead compounds while the deliverable recedes.”

“The takeaway isn’t that organizations are bad or that AI is bad. It’s that either one, poorly orchestrated and left to its own devices, will exhibit remarkably similar dysfunctions as emergent behavior. What people don’t understand about the structural failures of their own organizations will bite them in the world of AI coordination. And what they do understand translates — there’s an allegorical equivalent in many cases.”

References To Various Theories

Your research references theories such as Crawford-Sobel signal degradation theory, Goodhart’s Law, and the Data Processing Inequality—how do these frameworks help explain the challenges seen in multi-agent AI systems? McEntire clarified:

“These three frameworks are doing the same thing from different angles — they’re describing why coordination degrades information, and why the degradation is structural rather than accidental.”

“Crawford-Sobel is about what happens to a message when the person delivering it has even slightly different priorities than the person receiving it. It doesn’t take malice. A project manager whose review depends on velocity and an engineer whose review depends on reliability will describe the same situation differently — not because either one is lying, but because they’re each compressing reality through a slightly different lens. Crawford and Sobel proved mathematically that as that difference grows, the number of meaningfully distinct things you can communicate shrinks. At a surprisingly low threshold, the communication collapses entirely — the receiver can’t extract any useful signal from the sender’s message. In a hierarchy with multiple layers, every layer introduces a new lens. The information doesn’t slowly degrade. It hits a wall.”

“Goodhart’s Law is the one most people have felt even if they don’t know the name. When a measure becomes a target, it ceases to be a good measure. The swarm’s scoring system used seven proxy metrics for code quality, and none of them measured whether the code actually did what it was supposed to do. So the system optimized the proxies. Review stages optimized for finding objections — that’s what the review metric rewarded. The agents weren’t failing. They were succeeding at the wrong thing. It’s the difference between keeping your eye on the ball and reading the field. “Keep your eye on the ball” isn’t bad advice when you’re new to the sport, but if your scope never extends beyond it, you’re not playing in the big leagues.”

“The Data Processing Inequality is the one that makes the problem feel permanent. It says: you can’t unscramble the egg. Every time information passes through a processing stage, it can lose fidelity but it can never gain it back. No downstream stage can recover what an upstream stage discarded. In a multi-agent pipeline, every coordination stage — decomposition, assignment, review, integration — is a processing stage. The specification gets lossy at each one. By the end of a six-stage pipeline, my analysis showed the hierarchical architecture had lost about 74% of the original specification intent. The single agent, with no coordination stages, retained 86%. You can’t fix information loss by adding more stages that lose information.”

“What makes these three dangerous together is that they compound. Signals get distorted through preference lenses, the distorted signals get optimized against proxies that don’t measure what matters, and none of the lost information can be recovered downstream. That compound effect is what I call dysmemic pressure — a selection force that emerges in any system coordinating through compressed representations. It operates identically in a corporate hierarchy and in a multi-agent AI pipeline. The math doesn’t know the difference.”

Implications Of The Findings

What implications do your findings have for enterprises that are rapidly adopting agentic AI technologies across their organizations? McEntire elaborated:

“The main implication is one that organizational leaders should already recognize: naive solutions to complex problems generally backfire. Expect the same with AI.”

“There’s a natural temptation to look at multi-agent AI and think in terms of the org chart — this agent does architecture, this one does implementation, this one does review, and a coordinator manages them. That impulse comes from decades of organizational design. But if your org chart produces dysfunction — and the research strongly suggests it does, structurally, inevitably — then replicating it in AI doesn’t just automate your organization. It also automates its dysfunctions. Understanding those failure patterns and preparing for them accordingly will determine whether the investment compounds or collapses.”

“The enterprises that will deploy this well are the ones that understand why their organizations behave the way they do. That’s a strange thing to say — organizational acumen as a technical competency — but I think it’s true. Choosing a coordination topology for your AI system is an organizational design decision. The leaders who understand information loss through hierarchies, who recognize Goodhart dynamics in their own review processes, who know why adding a governance layer feels productive but often isn’t — those leaders will build better AI systems than engineers who treat coordination as a solved problem.”

“This extends to how you bring your team along. I lead an engineering organization that is in the middle of this transition right now, and one of the things I’ve learned is that you cannot mandate AI adoption while simultaneously increasing workload pressure. When people are under pressure, they go to what they know and trust. They’ll write more code by hand and stay up later before they’ll experiment with a tool they don’t yet understand. You have to give them space to try, to fail, to develop their own workflow. The ones who find it will become dramatically more effective — not because AI replaced them, but because it removed the busy work that was limiting what they could accomplish.”

“That’s the framing I think enterprises are getting wrong most often. AI doesn’t replace 60% of your workforce. It removes 60% of the friction that was limiting each person’s capacity. You can lay people off and capture that as cost savings, or you can keep them and invest the increased capacity as a compounding advantage. One of those strategies gives your competitor a head start measured in people times AI. The other compounds.”

Best Practices

Based on your research, what best practices or safeguards should companies consider before deploying multi-agent AI systems at scale? McEntire suggested:

“The specific best practices follow from the research, but they’re not unique to AI — they’re the same principles that distinguish well-run organizations from poorly-run ones, applied to a new substrate.”

“Default to simplicity. The single agent scored 28/28. Every coordination link you add introduces overhead with no guaranteed return. If you must use multiple agents, choose topologies that minimize communication links. This is the same principle behind keeping reporting chains flat and teams small. Focus coordination where it creates value, not where it creates the appearance of rigor.”

“Replace subjective evaluation with mechanical verification. The dysfunction in my experiment lived almost entirely in the evaluation stages — the agents that reviewed and judged, not the ones that built. Wherever possible, replace opinion-based gates with testable contracts. Define what correct looks like before implementation begins, derive test suites from those definitions, and let the implementation either satisfy them or not. Tests have no preferences, no bias, no strategic incentive to distort the signal.”
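A testable contract of the kind McEntire describes can be sketched in a few lines. Everything below is hypothetical editorial illustration: the `slugify` task, the `contract_slugify` gate, and all names are ours, not from the study or from McEntire's Pact framework.

```python
# Minimal sketch of an opinion-free review gate: define what correct looks
# like before implementation, then pass or fail mechanically. All names
# here are hypothetical examples, not part of any real framework.

def contract_slugify(fn) -> list[str]:
    """Return a list of failures; an empty list means the contract holds."""
    cases = {"Hello World": "hello-world", "  a  b ": "a-b", "": ""}
    return [f"{inp!r} -> {fn(inp)!r}, expected {want!r}"
            for inp, want in cases.items() if fn(inp) != want]

def slugify(s: str) -> str:
    """A candidate implementation; the contract does not care how it works."""
    return "-".join(s.lower().split())

assert contract_slugify(slugify) == []  # mechanical gate: no style opinions
```

The gate has no preferences to express: an implementation either satisfies the pre-agreed cases or it does not, which is exactly the property that removes the reviewer-style bikeshedding the experiment observed.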

“Constrain the output, not the process. My research across multiple AI architectures consistently shows that AI produces better results when given structural guardrails on what the output must satisfy and freedom in how it gets there. This is equally true of engineers. Define the boundaries, verify the result, and let go of the path between them.”

“Measure what matters and be honest about what you’re measuring. If you don’t explicitly track how much of your compute — or your organization’s time — goes to coordination versus implementation, you won’t see the problem until the budget is spent or the quarter is over. The pipeline that produced no code looked productive at every stage. The only thing that revealed the failure was measuring output, not activity.”

“Beyond the specifics, I think there’s a broader transition that most companies haven’t internalized yet. Engineers will struggle with letting go of the code — learning that their value is in the architecture, the constraints, the judgment about what matters, not in the keystrokes. Organizations will struggle with valuing work that’s invisible — the infrastructure, the guardrails, the operational maturity that prevents failures no one ever sees. And CFOs will look at AI tooling costs and see a $2,000 monthly line item without asking what 1.6 times each engineer is actually worth.”

“The temptation will be to capture AI’s value through cost reduction — smaller teams, fewer engineers, lower payroll. But technology and difficulty to develop are no longer moats. First to market is no longer a moat. The moat is velocity. The differentiator is how efficiently you can shepherd AI toward quality outcomes and how effectively you deploy human-AI hybrid resources to get the best of both. AI is electricity. The question isn’t whether your company can benefit from it. It’s how quickly you can retool to leverage it — and whether you’re investing the dividend or spending it.”

Additional Thoughts 

Any other topics you would like to discuss? McEntire concluded:

“This study gets attention because the result is vivid — you can watch AI agents bikeshed and burn budget and it’s immediately recognizable. But it’s one finding in a much larger body of work, and not necessarily the most interesting one.”

“The multi-agent coordination research alone spans eleven papers, and some of the most valuable findings were ones where my intuition was wrong. I expected that adaptive guidance — tailoring instructions to each agent based on what it was working on — would outperform fixed protocols. It didn’t. It was actively harmful. Every adaptive strategy performed worse than no coordination at all. I expected that giving an AI agent the actual neural activation state of another agent — essentially transplanting its internal representation — would be the gold standard for coordination. It wasn’t. Choosing the right 15 tokens of natural language to prime the receiving agent captured 98.8% of the achievable benefit. The full activation transfer, with thousands of dimensions of internal state, added almost nothing over that. The lesson across nine experiments was consistent: every mechanism that adds information to the receiver’s context either contributes nothing or contributes interference. The only mechanisms that work operate by removing — clearing the context, resetting the processing mode, then delivering the minimum viable signal. Less is more, and it’s not close.”

“The broader thesis is that compression under selection pressure produces both dysfunction and creativity through the same mechanism. The same information-theoretic forces that cause an organization to drift from reality cause a neural network to hallucinate — and cause a jazz musician to improvise. Dysfunction and creativity aren’t opposites. They’re the same physics operating under different selection regimes. That’s the unified framework — Structural Compression Theory — and the multi-agent study is one empirical test of it.”

“The activation geometry research has its own implications. I’ve shown that in high-dimensional neural network representations, every direction in the space carries every concept — a property I’ve proven is geometric, not learned, and holds with probability approaching one as the space gets large. The practical consequence is that surgical editing of individual concepts in a neural network is geometrically limited. You can’t cleanly remove one capability without disturbing others. That has significant implications for AI alignment and safety that I’m continuing to develop.”

“On the practical side, I’ve been building open source tools that address these problems directly. Pact is a contract-first framework for multi-agent software engineering — it replaces the subjective evaluation that caused the dysfunction in my experiment with mechanical test verification. Kindex is a persistent knowledge graph for AI-assisted workflows. Signet is a cryptographic agent authorization stack. The goal is to give developers infrastructure for working with AI that has structural quality guarantees built in, not bolted on.”

“The thing I keep coming back to is that engineering hasn’t fundamentally changed. The problems are the same problems. The constraints are the same constraints. The physics is the same physics. What’s changed is who — or what — is at the keyboard. And that means the value of understanding the physics has never been higher. The engineers and leaders who understand why systems fail structurally — whether those systems are organizations or AI pipelines or neural networks — are the ones who will build what comes next.”