LLMs are the eternal September for software, in that the sort of people who couldn’t make it through a bootcamp can now be “programming thought leaders”. There’s no longer a reliable way to filter signal from noise.
Those 3000 early adopters who are bookmarking a trivial markdown file largely overlap with the sort of people who breathlessly announce that “the last six months of model development have changed everything!”, while simultaneously exhibiting little understanding of what has actually changed.
There’s utility in these tools, but 99% of the content creators in AI are one intellectual step above banging rocks together, and their judgement of progress is not to be trusted.
Sometimes I just bookmark things because I think to myself “Maybe I’ll try this out, when I have time” which then likely never happens.
So I wouldn’t put any stock in 3k stars at all.
> Sometimes I just bookmark things because I think to myself “Maybe I’ll try this out, when I have time” which then likely never happens.
For me that’s 100% of the time. I only bookmark or star things I don’t use (but could be interesting). The things I do use, I just remember. If they used to be a bookmark or star, I remove it at that point.
Is Andrej Karpathy the guy who 'couldn't make it through a [coding] bootcamp' in this description?
Andrej Karpathy named the pitfalls but didn't make the Markdown file
I agree.
I'm sure I'll piss off a lot of people with this one, but I don't care any more. I'm calling it what it is.
LLMs empower people who lack the domain knowledge or experience to tell whether the output actually solves the problem. I have seen multiple colleagues deliver a lot of stuff that looks fancy but doesn't actually solve the prescribed problem at all. It's mostly just furniture around the problem. And the retort when I have to evaluate what they have done is "but it's so powerful". I stopped listening. It's a pure faith argument without any critical reasoning. It's the new "but it's got electrolytes!".
The second major problem is corrupting reasoning outright. I see people approaching LLMs as an exploratory process and letting the LLM guide the reasoning. That doesn't really work. If you have a defined problem, it is very difficult to keep an LLM inside the rails. I believe that a lot of "success" with LLMs is because the users have little interest in purity or the problem they are supposed to be solving and are quite happy to deliver anything if it is demonstrable to someone else. That would suggest they are doing it to be conspicuous.
So we have a unique combination of self-imposed intellectual dishonesty, mixed with irrational faith which is ultimately self-aggrandizing. Just what society needs in difficult times: more of that! :(
> LLMs are the eternal September for software, in that the sort of people who couldn’t make it through a bootcamp can now be “programming thought leaders”
The democratization of programming (derogatory)
And vendor locking
Free compilers were the democratization of programming.
This is the banalization of software creation by removing knowledge as a requirement. That's not a good thing.
You wouldn't call the removal of a car's brakes the democratization of speed, would you?
The Markdown file looks like it’s written for people who either haven’t discovered Plan mode, or who can’t be bothered to read a generated plan before running with it.
>> LLMs are the eternal September for software
>Cliche.
Too early to tell, so let's wait and see before we brush that off.
>> in that the sort of people who couldn’t make it through a bootcamp can now be “programming thought leaders”
>Snobbery.
Reality, and actually a selling point of AI tools. I often see ads for building apps without any programming knowledge.
>> the sort of people who breathlessly announce
> Snobbery / Cliche.
Reality
>> There’s no longer a reliable way to filter signal from noise.
> Cliche.
Reality. Or can you distinguish a well-programmed app from unaudited BS?
>> There’s utility in these tools, but 99% of the content creators in AI are one intellectual step above banging rocks together
>Cliche / Snobbery.
99% is too high; maybe 50%.
>> their judgement of progress is not to be trusted
> Tell me, timr, how much judgement is there in snotty gatekeeping and strings of cliches?
We have many security issues in software coded by people who have experience in coding; how much do you trust software ordered by people who can't judge whether the program they get is secure or full of security flaws? Don't forget these LLMs are trained on pre-existing faulty code.
With AI, it feels like deterministic outcomes are not valued as experience taught us it should.
The absence of means to measure outcomes of these prompt documents makes me feel like the profession is regressing further into cargo culting.
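Concretely, measuring one of these documents would look something like the sketch below: run the same task with and without the file prepended and compare pass rates against a checkable criterion. Everything here is an illustrative assumption, including the one-line stand-in for the prompt doc.

```python
# A hedged sketch: treat the prompt document as a treatment in an A/B trial.
# `ask` stands in for any LLM call; the prompt doc is reduced to one line.
from typing import Callable

PROMPT_DOC = "Think before coding.\n\n"  # stand-in for the 65 lines of markdown

def pass_rate(ask: Callable[[str], str], task: str,
              check: Callable[[str], bool], n: int = 20) -> float:
    """Fraction of n completions that satisfy the checker."""
    return sum(check(ask(task)) for _ in range(n)) / n

def compare(ask: Callable[[str], str], task: str,
            check: Callable[[str], bool]) -> None:
    base = pass_rate(ask, task, check)
    treated = pass_rate(lambda t: ask(PROMPT_DOC + t), task, check)
    print(f"without doc: {base:.0%}   with doc: {treated:.0%}")
```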
That might be because AI is being pushed largely by leaders that do not have the experience you’re referring to. Determinism is grossly undervalued - look at how low the knowledge of formal verification is, let alone its deployment in the real world!
It's particularly puzzling because until a few months ago the unmistakable consensus at the fuzzy borderlands between development and operations was:
1. Reproducibility
2. Chain of custody/SBOM
3. Verification of artifacts of CI
All three of which are not simply difficult but in fact by nature impossible when using an LLM.
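For a sense of what item 3 means in practice, here's a tiny sketch of artifact verification (path and digest below are placeholders). It works only because hashing is a deterministic function of the input bytes, which is precisely the invariant an LLM's prompt-to-output mapping lacks.

```python
# Verifying a CI artifact against a digest recorded at build time.
# Path and expected digest are illustrative placeholders.
import hashlib
import pathlib
import sys

artifact = pathlib.Path("dist/app.tar.gz")  # hypothetical artifact
expected = "0123abcd..."                    # digest recorded by the build (placeholder)

digest = hashlib.sha256(artifact.read_bytes()).hexdigest()
sys.exit(0 if digest == expected else 1)    # any drift in the input bytes fails the check
```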
That is because nothing in the world is deterministic, they are just all varying degrees of probability.
This rings hollow to me.
When my code compiles in the evening, it also compiles the next morning. When my code stops compiling, usually I can track the issue in the way my build changed.
Sure, my laptop may die while I'm working and so the second compilation may not end because of that, but that's not really comparable to an LLM giving me three different answers when given the same prompt three times. Saying that nothing is deterministic buries the distinction between these two behaviours.
Deterministic tools are something the developer community has worked very hard for in the past, and it's sad to see a new tool offering none of that.
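A toy illustration of the distinction, assuming the `openai` Python client and an illustrative model name:

```python
# The hash of the same source is identical every run; three samples of the
# same prompt at non-zero temperature can all differ.
import hashlib
from openai import OpenAI

source = "int main(void) { return 0; }"
print(hashlib.sha256(source.encode()).hexdigest())  # same digest tonight and tomorrow morning

client = OpenAI()  # assumes OPENAI_API_KEY is set
answers = {
    client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": "Name one sorting algorithm."}],
        temperature=1.0,
    ).choices[0].message.content
    for _ in range(3)
}
print(f"{len(answers)} distinct answers from 3 identical prompts")
```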
That is called a deepity: a statement which sounds profound but is ultimately trivial and meaningless.
https://rationalwiki.org/wiki/Deepity
Determinism concerns itself with the predictability of the future from past and present states. If nothing were deterministic, you wouldn’t be able to set your clock or plan when to sow and when to harvest. You wouldn’t be able to drive a car or rest a glass on a table. You wouldn’t be able to type the exact same code today and tomorrow and trust it to compile identically. The only reason you can debug code is determinism: you can make a prediction of what should happen, and by inspecting what did happen you can deduce what went wrong several steps before.
surely, 4,000 developers can’t be wrong
Apparently almost half of all the websites on the internet run on WordPress, so it's entirely possible for developers to be wrong at scale.
Why is that a bad thing?
WordPress as a CMS is fine, but 90% of websites (i.e. the bit that lands in your browser) don't need the complexity of runtime generation and pointlessly run an application with a huge attack surface that's relatively easy to compromise. If sites used WordPress as a backend tool with a static site generator to bake the content, there'd be far fewer compromised websites.
WordPress's popularity mostly means adding a huge amount of complexity, runtime cost, and security risk for every visitor, for the sole benefit of a content manager being able to add a page more easily or configure a form without needing a developer. That is optimizing the least important part of the system.
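A rough sketch of that architecture, assuming a hypothetical site URL and using WordPress's standard REST route (pagination and error handling omitted):

```python
# "WordPress as backend, static frontend": pull posts over the REST API
# and bake them to plain HTML files.
import pathlib
import requests

SITE = "https://example.com"  # hypothetical WordPress install
out = pathlib.Path("public")
out.mkdir(exist_ok=True)

for post in requests.get(f"{SITE}/wp-json/wp/v2/posts", timeout=30).json():
    html = (
        f"<!doctype html><title>{post['title']['rendered']}</title>"
        f"<h1>{post['title']['rendered']}</h1>{post['content']['rendered']}"
    )
    (out / f"{post['slug']}.html").write_text(html, encoding="utf-8")
# Visitors get files from a static host; the WordPress instance never faces the public.
```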
and there is a crap-ton of "apps" that repackage the entire world^W^W excuse me, Chromium, hog RAM, and destroy any semblance of native feel - all to write "production-ready" [sic] cross-platform code in JavaScript, a language more absurd than C++[0] but so easy to start with.
[0]: https://jsdate.wtf
I know some people that would also benefit from these 65 lines of markdown. Even without using AI.
This! It feels like most of these markdown files floating around can bear the label _“stuff you struggled to have a sloppy coworker understand”_.
Possibly because the people creating them _are_ the sloppy coworkers and they’re now experiencing (by using an AI tool) a reflection of what it’s like to work with themselves.
Even if this is complete nonsense, I choose to believe it :’)
Line one would be a good start
>Think Before Coding
My probably incorrect, uninformed hunch is that users convinced of how AI should act actually end up nerfing its capabilities with their prompts. Essentially dumbing it down to their level, losing out on the wisdom it's gained through training.
I am with you on this one.
I experienced, back in the GPT-3 and 3.5 days, that the existence of any system message, even a one-word one, changed the output drastically for the worse. I haven't verified this behaviour with recent models.
Since then I don't impose any system prompt on users of my Telegram bot. This is so unusual and wild compared to what others do that very few actually appreciate it. I'm happy I don't need to make a living from this project, so I can keep it ideologically clean: users control the system prompt, temperature, and top_p, and get a selection of the top barebones LLMs.
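For what it's worth, a minimal sketch of that design, assuming an OpenAI-compatible API and an illustrative model name: nothing goes into the request that the user didn't choose.

```python
# The request carries only what the user chose: no system prompt unless
# they supplied one, and their own sampling knobs. Defaults are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def ask(user_text: str, *, system: str | None = None,
        model: str = "gpt-4o-mini", temperature: float = 1.0,
        top_p: float = 1.0) -> str:
    messages = []
    if system is not None:  # present only if the user provided one
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_text})
    reply = client.chat.completions.create(
        model=model, messages=messages, temperature=temperature, top_p=top_p
    )
    return reply.choices[0].message.content
```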
So we've reached a point where the quality of a piece of software is judged by its GitHub stars.
The exact same thing happened with xClaw, where people were going "look at this app that got thousands of stars on GitHub in only a few days!".
How is that different from the follower/like counts on the usual social networks?
Given how much good it did to give power to strangers based on those counts, it's hard not to think that we're going in the completely wrong direction.
You know, it's good old prompt/context engineering. To be fair, markdowns actually can be useful because of LLMs' (Transformers') gullible/susceptible nature... At least that's what I discovered developing a prompting framework.
Of course it's hilarious that a single markdown got 4000 stars, but it looks like just another example of how people chase a buzzing X post in the tech space.
If exactly this markdown had been written by some Joe from the internet, no one would have noticed it. So these stars exist not because of the quality or utility of the text.
That's just how it is in the LLM world. We've just gotten started. Once upon a time, the SOTA prompting technique was "think step by step".
All good advice in general. Could add others, like the X-Y problem, etc.
This feels like a handbook for a senior engineer becoming a first level manager talking to junior devs. Which is exactly what it should be.
However, this will go horribly wrong if junior devs are thus “promoted” to eng managers without having cut their teeth on real projects first. And that’s likely to happen. A lot.
Bro science is rampant in the AI world. Every new model that comes out is the best there ever was, every trick you can think of is the one that makes all the other users unsophisticated, "bro, you are still writing prompts as text? You have to put them into images so the AI can understand them visually as well as textually".
It isn't strange that this is the case, because you'd be equally hard pressed to compare developers at different companies. Great to have you on the team Paul, but wouldn't it be better if we had Harry instead? What if we just tell you to think before you code, would that make a difference?
(surprised smiley) - wait, this is for real? Can I feed my whiteboard scribbles in as a prompt?
That would be game changing!
Maybe I'm just really lucky but reading those instructions it's basically how I find Claude Code behaves. That repo with 4k stars is only 2 weeks old as well, so it's obviously not from a much less competent model.
Same here, I find most of these skills/prompts a bit redundant. Some people argue that by including these in the conversation, one is doing latent space management of sorts, bringing the model closer to where one wants it to be.
I wonder what will happen with new LLMs that contain all of these in their training data.
The next inevitable step is LLM alchemy. People will be writing crazy-ass prompts full of incomprehensible text which somehow make the system work better than straight human-text prompts.
> But the original repository has almost 4,000 stars, and surely, 4,000 developers can’t be wrong?
This is such negative messaging!
Let's check star history: https://www.star-history.com/#forrestchang/andrej-karpathy-s...
1. Between Jan 27th and Feb 3rd, stars grew quickly to 3K; the project was released at that time.
2. People star it to stay on top of NEW changes; they wanted to learn more about what's coming - but it didn't come. That doesn't mean people are dumb.
3. If OP synthesized the Markdown into a single line, "Think before coding", why did he go through publishing a VS Code extension? Why not just share the learnings and tell the world, "Add 'Think before coding' before your prompt, and please try it for yourself!"?
PS: no, I haven't starred this project; I didn't know about it. But I disagree with the author's "assumptions" about stars and correlating them with some kind of insight revelation.
I just packaged the extension for the fun of it! And I do want people to try it for themselves; that is the point. As for people not being dumb: surely many people are not dumb; many people are very smart indeed. But that does not prove there are no dumb or gullible people!
Thanks for responding and sharing your perspective.
What I would say is that you could have left some of the negativity and judgement out of your post about 4k devs starring something because it looks simple; they might have had different intentions for starring.
Here is another great example of 65K "not wrong" developers: https://github.com/kelseyhightower/nocode - there is no code. Released 9 years ago, long before AI was a trend, it got 65K stars! That doesn't mean the devs are "not wrong"; it means people are curious and save things "just in case" to showcase somewhere.
Nocode is obviously a banter repo, and people starred it because it made them laugh.
I wonder if someone could explore creating a standalone product out of that markdown, just for the fun of it.
Perhaps a cool wall art sticker? "In this house we don't assume. We don't hide confusion. We surface tradeoffs."
Let's start a company and raise money from investors!!
These hopeful incantations are a kind of cargo cult… But when applied to programming, the wild thing is that the cult natives actually built the airports and the airplanes but they don’t know what makes them fly and where the cargo comes from.
There will come a day soon where “hello world” will be typed by sentient hands for the last time.
I found that "Make no mistakes, or you go to jail" improves claude-code's performance by about 43%
These days I genuinely can't tell if articles are satire or not.