The first thing I do when I see a paper that claims transformers fundamentally can't do X or Y is to look at the models under test:
> To evaluate generalizability, we conducted tests of GPT-5 (41), Claude Opus 4.1 (42), and Gemini 2.5 Pro (43) from 2025 September
The problem with empirical negative results on LLMs is that they can't rule out that the alleged deficiencies disappear with increased scale and the right fine-tuning. It's like saying my dog has trouble with subject-verb agreement, so meat brains are "fundamentally limited in their capacity for grammar".
I can accept that current LLMs (even the latest generation) might exhibit cognitive gaps similar to those we see in humans with deficient executive function, but I can't accept these gaps as evidence of fundamental limits of the transformer architecture. LLMs are universal function approximators. Executive function is a function. Yes, yes, it's well known that transformers have a circuit complexity limit set by layer count and whatever. That limit applies to a single forward pass, and it disappears once you allow for autoregression. Nobody cares about the limits of AI inside a single forward pass.
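Here is a toy sketch of the forward-pass point. It is an analogy, not a model of a real transformer: `DEPTH`, `step`, `forward_pass`, and the state machine itself are all made up for illustration. The only assumption is that one "pass" can do a fixed number of sequential steps, and that feeding the output back in lets you chain passes.

```python
DEPTH = 4  # hypothetical cap on sequential steps inside one "forward pass"


def step(state: int, bit: int) -> int:
    """One sequential update of an arbitrary toy state machine."""
    return (state * 3 + bit) % 7


def reference(bits: list[int]) -> int:
    """Ground truth: run the state machine over every input bit."""
    state = 0
    for b in bits:
        state = step(state, b)
    return state


def forward_pass(state: int, bits: list[int]) -> tuple[int, list[int]]:
    """Consume at most DEPTH bits; return the new state and the leftover bits."""
    for b in bits[:DEPTH]:
        state = step(state, b)
    return state, bits[DEPTH:]


def single_pass(bits: list[int]) -> int:
    """One bounded pass: everything past the first DEPTH bits is ignored."""
    state, _ = forward_pass(0, bits)
    return state


def autoregressive(bits: list[int]) -> tuple[int, int]:
    """Feed each pass's output state back in until the input is used up."""
    state, remaining, passes = 0, list(bits), 0
    while remaining:
        state, remaining = forward_pass(state, remaining)
        passes += 1
    return state, passes


bits = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
print(reference(bits), single_pass(bits), autoregressive(bits))
```

In this toy setup the single pass gets the answer wrong for any input longer than `DEPTH`, while the looped version matches the reference at the cost of more passes. That says nothing about which tasks real models can or can't do; it only shows why a per-pass depth bound is not a bound on the whole generation loop.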
I have high confidence that with the right sort of training, executive function gaps in LLMs can be addressed. I'm not convinced that the problem is the architecture per se.
This is a nice study, but I don't think it's actually a good argument.