Carson’s experience matches mine: AI is good at analysis and boilerplate, but not good at the kind of critical thinking necessary for good designs. If it were human, I would say that it jumps to solutions too quickly, rather than stepping back to consider the big picture and how everything should fit together to make a cohesive whole.
It’s not human, of course, and I think this problem actually relates to the fact that LLMs don’t have a world model. They don’t study and think through a design in the way that humans do. They don’t form a mental model of how everything fits together and how that design can be tweaked to most elegantly support a change.
I suspect that this is a fundamental limitation of LLMs, and that design will remain a weak point until some sort of bespoke design AI is bolted onto the side. In the meantime, we’ve got a lot of people producing a lot of code very quickly, and I think the debt in that code is going to be a millstone around our necks for a long time to come.
One partial mitigation is to ask it to use plan mode -- and then very carefully review the plan before allowing it to execute.
At that point I would rather just write the plan myself
It's just because not enough people had this very specific problem before.
This article will be part of the next model's training set, and the model will probably be able to solve this problem, despite not understanding anything about the world, or studying, or thinking.
Exactly. LLMs are good at "code inpainting": define clear structures and goals, and they will fill in the boilerplate. But that doesn't work for reasoning and abstraction, so they fail to synthesise and propose novel views. That's integral to the way they're designed and trained: to do a kind of "averaging", which limits their capacity to explore novel designs.
It's a good write up, but it's lacking some details, the most important one is: which Claude model was used?
The second issue is: what tooling and prompting approach were used?
(To be clear, I have no problem with the premise of the write-up. But without details like these, it's sort of like saying "I had a bad board on my deck, and my tape measure wasn't able to help me remove the nails. What a bad tape measure.")
Interesting read! Creating tests is highlighted as something Claude did well, but it strikes me that all the weaker rejected solutions could have been avoided if it were really good at designing intelligent tests for itself. For example, the first solution “was very specific to the reported bug and wouldn’t have fixed the general case” and the third suggestion “prevented the perfectly valid use of as conversion expressions in go commands as well”. I imagine both of these cases could have been noticed and avoided by the agent if it had planned out adequate tests ahead of time.
This is kind of what coding with LLMs feels like: you gradually add guard rails that sit outside its context and are automated, to get the results you want out of it. Static typing, quick compilation, not having nulls, and lints are a great start (I would also argue for managed side effects and functional programming, but to each their own).
It gets pretty far toward the solution on its own, and quickly, but then you spend time adjacent to the problem, building out its cage while iterating through the remainder of the solution.
hello all, this is an article I wrote up on my interaction with an agent, Claude, while fixing a bug in the hyperscript parser
it was a rather mundane bug, but I thought the interaction was interesting and worth analyzing to show where AI is very strong and where it is not as strong
Always exciting to see a former professor on the front page, and always an enjoyable read, Mr. Gross!
I disagree with the trope that AI brings about "the slow dulling of our intellects". I am old enough to have made a career change: I was a developer in the Apple ecosystem, confident with Objective-C and the native system libraries in iOS and macOS, and then I changed direction to a very different software stack, working in cloud services as a data engineer with heavy use of Clojure. I have personal projects from the former world that I occasionally return to, often a decade or more later. I immediately see what I've forgotten, but soon after, with engagement, I see how quickly it comes back. Extended use of AI has exactly this footprint for me. Even "use it or lose it" is wrong; "use it when you need to" is honestly closer, because the brain is plastic. Some AI fears are warranted; this isn't one of them.
In all my side projects, instead of thinking about architecture or design decisions, I just ask it what I want the end effect to be. "I want this button to do a thing". You're saying this is good for my brain?
Are you suggesting it's closer to the idea that you can regain strength faster after losing it (as in bodybuilding after extended time off)? Gaining something from scratch requires a lot of effort and experimentation; regaining it, less so?
AI makes the case for htmx: we don't have to think about the spaghetti code, because AI does it for us /s
Maybe slightly unrelated, but the new htmx homepage (https://four.htmx.org/) feels a little ironic: it seems to be built with tailwindcss and a full JS-ecosystem Astro build system. It also has the ‘vibey’, ‘hypey’ landing-page design that’s hard to describe but that you’ll find on any web framework’s site, rather than dropping you straight into the docs like the old site did.
Compared to the original simple HTML site, it’s really surprising to see from the grugbrain.dev author!
:) I let a younger person on the core team create the new website, for something different
It is using Astro; we are scaling down the use of Tailwind (I wanted to give it a try, but it didn't really click with me).
I don't mind someone doing something kind of fun with the website and trying something new out. I know some people don't like it, but some people do. All good.
That’s fair! It definitely looks good and modern! I just wonder whether it compromises first impressions of the project in some way.
Isn't it obvious that some websites will become unreadable without serious machine assistance, while classic HTML web standards have a fallback path that lets a human read them?
Clear text with minimal markup has many desirable properties, IMHO.
The author admits that the logic of the language and the design of the parser are idiosyncratic. Even the solution the author likes is an extension of an existing hacky trap door. He could be more open-minded about the solutions the AI proposed; in fact, I think AI could potentially rearchitect this in a more structured, sustainable, and legible way.
Many developer criticisms of AI coders could easily be directed at 95%+ of human developers. Much coding is monkey see, monkey do, and keep trying until it does what we want it to do. AI can certainly do that cheaper and faster, and that trial-and-error style is really why automated testing became such an important software discipline, with or without AI.
Yeah, no. The AI was unable to come up with a good solution whereas the human was. Point human.
Maybe fair. I think my point was the author emphasizes how strange the software is. The further you are from the training data, the less well a model will perform. I haven't looked at the project, but it seems like it could maybe be written more conventionally. Or maybe not! In which case AI is bad at creativity and thinking outside the training data and that's a genuine insight.