I've been running a multi-agent software development pipeline for a while now and I've reached the same conclusion: it's a distributed systems problem.
My approach has been more pragmatic than theoretical: I break work into sequential stages (plan, design, code) with verification gates. Each gate has deterministic checks (compile, lint, etc.) and an agentic reviewer for qualitative assessment.
Collectively, this looks like a distributed system. The artifacts reflect the shared state.
The author's point about external validation converting misinterpretations into detectable failures is exactly what I've found empirically. You can't make the agent reliable on its own, but you can make the protocol reliable by checking at every boundary.
The deterministic gates provide a hard floor of guarantees. The agentic gates provide soft probabilistic assertions.
I wrote up the data and the framework I use: https://michael.roth.rocks/research/trust-topology/
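As a rough sketch of such a gate (all helper names here are illustrative assumptions, nothing is from the linked write-up): deterministic checks run first as the hard floor, and the agentic reviewer only runs on artifacts that pass them.

```python
# Sketch of a verification gate: deterministic checks first (hard floor),
# then an optional agentic reviewer (soft, probabilistic signal).
# The check bodies are toy stand-ins for real compile/lint steps.

def deterministic_gate(artifact: str) -> list[str]:
    """Hard checks: collect failure messages; an empty list means pass."""
    failures = []
    if "TODO" in artifact:                  # stand-in for a real compile step
        failures.append("compile: unresolved TODO")
    if artifact != artifact.strip():        # stand-in for a real lint step
        failures.append("lint: leading/trailing whitespace")
    return failures

def run_gate(artifact: str, agentic_review=None) -> tuple[bool, list[str]]:
    """Deterministic checks gate first; the reviewer only sees clean artifacts."""
    failures = deterministic_gate(artifact)
    if failures:
        return False, failures              # hard floor: reject outright
    if agentic_review is not None:
        if not agentic_review(artifact):    # in practice an LLM call
            return False, ["review: qualitative concerns raised"]
    return True, []
```

The ordering is the point: the cheap, deterministic floor filters before any probabilistic judgment is spent.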
While the analogy may be somewhat intuitive, the set of problems distributed computing brings in is not the same as multi-agent collaboration's. For one, the former needs a consensus mechanism to work in adversarial settings. Version control and timestamping should be enough to guarantee integrity for agents collaborating on source code.
The fundamental assumption of distributed systems, that multiple machines fail independently, communicate over unreliable networks, and have no shared clock, is what forces you to solve consensus, Byzantine faults, ordering, consistency vs. availability, and exactly-once delivery.
However, AI agents don't share these problems in the classical sense. Building agents is about context attention, relevance, and information density inside a single ordered buffer. The distributed part is creating an orchestrator that manages these things. At noetive.io we're currently working on the context-relevance part with our contextual broker Semantik.
Curious idea (found your HN post), but I can't figure out what the use case is.
This might be useful in the context of this topic: https://jido.run
Conway’s law still applies.
Good architecture, actor models, and collaboration patterns do not emerge magically from “more agents”.
Maybe what’s missing is the architect’s role.
The architect’s role is what is left for us as developers, when putting out lines of code no longer matters.
The architect role is interesting because in practice that's what the "orchestrator" agent ends up being — but it hits the same limits as a human architect who's never on the ground floor. The agents that work best in my experience are the ones scoped tightly to a single concern (run this test suite, lint this file) rather than collaborating on shared state. Basically the microservices lesson all over again: shared-nothing works, shared-everything doesn't.
The thing that TFA doesn't seem to go into is that these mathematical results apply to human agents in exactly the same way as they do to AI agents, and nevertheless we have massive codebases like Linux. If people can figure out how to do it, then there's no math that can help you prove that AIs can't.
I've yet to see a human process that used a large number of cheap junior developers, however precisely architected, to create high-quality software.
If that could have been achieved, it would have been very profitable, too. There's no shortage of cheap, motivated interns and offshore devs, and the executive class prefers to rely on disposable resources even when it costs more overall.
The net result was always the opposite, though: one or two juniors on a leash could be productive, but more than that always caused more problems than it solved.
I'm seeing the same problem with agents. Multi-agent orchestration seems like a scam to manufacture demand for tokens.
I'm in absolute agreement that the AI coordination problem exists today when the AI is at junior level. I'm just saying that the mathematical argument is silly to apply to arbitrary future AIs, if and when they reach human capability. Because while coordination problems have not been mathematically solved, the world economy is a case in point that it is possible to coordinate human-level agents to achieve large scale projects at generally sufficient quality levels.
So to be clear, I'm not advising anyone to change their current token consumption habit. I'm just saying that it's silly to apply math to prove the impossibility of something we can literally see around us. It's like a mathematical proof that water isn't really wet.
Humans can also count the number of Rs in strawberry, but good luck proving that mathematically
LLMs don't see words. They see tokens, which is why previously they had a hard time counting the r's.
You can certainly prove that mathematically, and if you give that proof to an LLM, it will give you the correct answer. That's a prompting technique, by the way, to improve the accuracy of an LLM's results.
Although more recent LLMs have built-in approaches to answering this question.
Gemma4 for example:
-----
The user is asking for the number of 'r's in the word "strawberry".
* S
* T
* *R* (1)
* A
* W
* B
* E
* *R* (2)
* *R* (3)
* Y
There are 3 'r's.
-----
GPT-OSS-120b example.
-----
The user asks: "How many r's in strawberry." This is a straightforward counting problem: count the letter 'r' in the word "strawberry". The word "strawberry" contains letters: s t r a w b e r r y. Count r's: there are three 'r's? Let's check: s(1) t(2) r(3) a(4) w(5) b(6) e(7) r(8) r(9) y(10). Actually the word is "strawberry". Let's write out: s, t, r, a, w, b, e, r, r, y. So there are three r's: at positions 3, 8, 9. Yes, three r's. So answer: 3.
But we may also note that "r" appears three times.
Thus answer: 3.
We can provide a short answer.
Thus final: There are three r's in "strawberry".
-----
Doubt if you can make a dumb model smart by feeding it proofs
https://www.promptingguide.ai/techniques/knowledge
Sounds like a great way to fill up the context before you even start.
Yes, what's your point? That is literally what it does: it adds relevant knowledge to the prompt before generating a response, in order to ground it more effectively.
I doubt it is possible to mathematically prove much inside a black box of billions of interconnected weights. But at least in the narrow case of the strawberry problem, it seems likely that LLM inference could reliably recognize that sort of problem as the type that would benefit from a letter-counting tool call as part of the response.
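As a toy illustration, the tool side of such a call is trivial; the hard part is the model reliably deciding to route to it:

```python
def count_letter(word: str, letter: str) -> int:
    """Deterministic letter counter an agent could call instead of
    reasoning over tokens (case-insensitive)."""
    return word.lower().count(letter.lower())

print(count_letter("strawberry", "r"))  # 3
```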
To be honest, humans often have no overview of an application either. We navigate up and down the namespace, building the "overview" as we go. I see nothing that prevents an agent from moving up and down that namespace, writing its assumptions into the codebase, and requesting feedback from other agents working on different sets of files.
"Nothing" prevents it other than the fact that an agent doesn't really have memory and has a pretty limited context, hallucinates information, mistakes metadata with data, and so on.
The path forward is always one that starts from the assumption that it will go wrong in all those different ways, and then builds from there.
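One concrete shape for "assume it will go wrong, then build from there" is a validate-and-retry boundary around every agent call; `call_agent` and `validate` below are hypothetical stand-ins, not a real API:

```python
def checked_call(call_agent, validate, prompt: str, max_attempts: int = 3):
    """Treat the agent as unreliable: validate every output at the boundary,
    retry with the failure fed back, and surface the last error rather than
    trusting the model."""
    last_error = None
    for _ in range(max_attempts):
        output = call_agent(prompt)
        ok, error = validate(output)   # deterministic check at the boundary
        if ok:
            return output
        last_error = error
        prompt = f"{prompt}\n\nPrevious attempt failed validation: {error}"
    raise RuntimeError(f"agent failed after {max_attempts} attempts: {last_error}")
```

The validator here is whatever deterministic check fits the task (schema check, compile, test run); the wrapper only assumes failures are the normal case.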
So one hub architect agent for the overview, which generates tokens for the spoke agents and receives architectural problem reports from them?
I run a small team of AI agents building a product together. One agent acts as supervisor — reviews PRs, resolves conflicts, keeps everyone aligned. It works at this scale (3-4 agents) because the supervisor can hold the full context. But I can already see the bottleneck — the supervisor becomes the single point of coordination, same as a human tech lead. The distributed systems framing makes sense. What I'm not sure about is whether the answer is a new formal language, or just better tooling around the patterns human teams already use (code review, specs, tests).
Doesn't this whole argument fall apart if we consider iteration over time? Sure, the initial implementation might be uncoordinated, but once the subagents have implemented it, what stops the main agent from reviewing the code and sorting out any inconsistencies, ultimately arriving at a solution faster than it could if it wrote it by itself?
I'd wager that a "main agent" is really just a bunch of subagents in a sequential trench coat.
In the end, in both cases, it's a back-and-forth with an LLM, and every request has its own lifecycle. So it's unfortunately at least a networked-systems problem. I think your point works with an infinite context window and one-shotting the whole repo every time... Maybe quantum LLM models will enable that.
Right, but what you're describing is a consensus protocol: it's called two-phase commit. The point of the article is just that we should really be analysing these high-level plans in distributed-algorithms terms, because there are fundamental limitations that you can't overcome.
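For readers who haven't seen it: two-phase commit in this setting means asking every subagent to vote on a proposed change, then applying it only if all vote yes. A minimal sketch (modeling participants as plain Python objects is my assumption, not the article's):

```python
def two_phase_commit(participants, proposal) -> bool:
    """Phase 1 (prepare): every participant votes on the proposal.
    Phase 2 (commit/abort): apply only if every vote was yes."""
    votes = [p.prepare(proposal) for p in participants]   # phase 1
    if all(votes):
        for p in participants:
            p.commit(proposal)                            # phase 2: commit
        return True
    for p in participants:
        p.abort(proposal)                                 # phase 2: abort
    return False

class Participant:
    """Toy participant: votes yes unless the proposal conflicts with its state."""
    def __init__(self, conflicts=()):
        self.conflicts = set(conflicts)
        self.applied = []
    def prepare(self, proposal):
        return proposal not in self.conflicts
    def commit(self, proposal):
        self.applied.append(proposal)
    def abort(self, proposal):
        pass
```

The classical caveat carries over: the coordinator is a single point of failure, which is exactly the supervisor bottleneck discussed elsewhere in the thread.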
It’s not a solution, but it’s why humans have developed the obvious approach of “build one thing, then everyone can see that one thing and agree what needs to happen next” (i.e. the space of possible solutions is reduced by creating one thing, and then the next set of choices is reduced by the original choice).
This might be obvious to everyone, but it’s a nice way for me to view it (sort of restating the non-waterfall (agile?) approach to specification discovery).
I.e. waterfall design without coding is too under-specified, hence the “agile waterfall” of using code iteratively to find an exact specification.
After you point this out, it is obviously right!
Makes sense. Coordination between multiple agents feels like the real challenge rather than just building them.
Well, it starts with an agreed list. I don't agree next-gen models will be smarter. I would argue there's been no real improvement in models in the last couple of years, just improvements in stability and the (agentic) tools around them.
No shit, Sherlock.
I tried my hand at coding with multiple agents at the same time recently. I had to add related logic to 4 different repos. Basically an action would traverse all of them, one by one, carrying some data. I decided to implement the change in all of them at the same time with 4 Claude Code instances and it worked the first time.
It's crazy how good coding agents have become. Sometimes I barely even need to read the code because it's so reliable and I've developed a kind of sense for when I can trust it.
It boggles my mind how accurate it is when you give it the full necessary context. It's more accurate than any living being could possibly be. It's like it's pulling the optimal code directly from the fabric of the universe.
It's kind of scary to think that there might be AI as capable as this applied to things besides next token prediction... Such AI could probably exert an extreme degree of control over society and over individual minds.
I understand why people think we live in a simulation. It feels like the capability is there.
Please stop. I can't take HN seriously any more with comments like this here.