What exhausts me isn’t “falling behind.” It’s watching the profession collectively decide that the solution to uncertainty is to pile abstraction on top of abstraction until no one can explain what’s actually happening anymore.
This agentic arms race by C-suite know-nothings feels less like leverage and more like denial. We took a stochastic text generator, noticed it lies confidently, wipes entire databases and hard drives, and responded by wrapping it in managers, sub-agents, memories, tools, permissions, workflows, and orchestration layers so we don’t have to look directly at the fact that it still doesn’t understand anything.
Now we’re expected to maintain a mental model not just of our system, but of a swarm of half-reliable interns talking to each other in a language that isn’t executable, reproducible, or stable.
Work now feels duller than dishwater, enough to have forced me to career pivot for 2026.
I think AI-assisted programming may be having the opposite effect, at least for me.
I'm now incentivized to use fewer abstractions.
Why do we code with React? It's because synchronizing state between a UI and a data model is difficult and it's easy to make mistakes, so it's worth paying the React complexity/page-weight tax in order for a "better developer experience" that allows us to build working, reliable software with less typing of code into a text editor.
If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
How often have you dropped in a big complex library like Moment.js just because you needed to convert a time from one format to another, and it would take too long to hand-write that one feature (and add tests for it to make sure it's robust)? With an LLM that's a single prompt and a couple of minutes of wait.
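To make the Moment.js point concrete, here is a minimal sketch of the kind of single-purpose helper that could replace a library import. Everything in it is an assumption for illustration: the function name `formatUtcTimestamp`, the output format, and the decision to accept only UTC timestamps ending in `Z`. It deliberately does no time zone conversion, which is the genuinely hard part.

```typescript
// Hypothetical helper: reformat "2025-12-31T23:59:00Z" as "31 Dec 2025, 23:59 UTC".
// ASSUMPTION: input is always UTC (ends in "Z"); no time zone conversion is attempted.
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Left-pad a number to two digits, e.g. 7 -> "07".
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

function formatUtcTimestamp(iso: string): string {
  // Accept only the one shape this helper claims to support.
  if (!/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$/.test(iso)) {
    throw new Error("Unsupported timestamp format: " + iso);
  }
  const d = new Date(iso);
  // Date yields NaN for values it cannot parse.
  if (isNaN(d.getTime())) {
    throw new Error("Invalid date: " + iso);
  }
  return pad2(d.getUTCDate()) + " " + MONTHS[d.getUTCMonth()] + " " +
         d.getUTCFullYear() + ", " +
         pad2(d.getUTCHours()) + ":" + pad2(d.getUTCMinutes()) + " UTC";
}

console.log(formatUtcTimestamp("2025-12-31T23:59:00Z")); // prints "31 Dec 2025, 23:59 UTC"
```

The narrow input contract is the design choice here: a helper like this stays small only as long as it refuses everything outside the one format it was written for.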
Using LLMs to build black box abstraction layers is a choice. We can choose to have them build FEWER abstraction layers for us instead.
> If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
I've had plenty of junior devs justify massive code bases of random scripts and 100+ line functions with the same logic. There's a reason senior devs almost always push back on this when it's encountered.
Everything hinges on that "if". But you're baking a tautology into your reasoning: "if LLMs can do everything we need them to, we can use LLMs for everything we need".
The reason we stop junior devs from going down this path is because experience teaches us that things will break and when they do, it will incur a world of pain.
So "LLM as abstraction" might be a possible future, but it assumes LLMs are significantly more capable than a junior dev at managing a growing mess of complex code.
This is clearly not the case with simplistic LLM usage today. "Ah! But you need agents and memory and context management, etc!" But all of these are abstractions. This is what I believe the parent comment is really pointing out.
If AI could do what we originally hoped it could, which is follow simple instructions to solve complex tasks, we'd be great, and I would agree with your argument. But we are very clearly not in that world. Especially since Karpathy can't even keep up with the sophisticated machinery necessary to properly orchestrate these tools. All of the people decrying "you're not doing it right!" are emphatically proving that LLMs cannot perform these tasks at the level we need them to.
I'm saying that a key component of the dependency calculation has changed.
It used to be that one of the most influential facts affecting your decision to add a new library was the cost of writing the subset of code that you needed yourself. If writing that code and the accompanying tests represented more than an hour of work, a library was usually a better investment.
If the code and tests take a few minutes those calculations can look very different.
Making these decisions effectively and responsibly is one of the key characteristics of a senior engineer, which is why it's so interesting that all of those years of intuition are being disrupted.
The code we are producing remains the same. The difference is that a senior developer may have written that function + tests in several hours, at a cost of thousands of dollars. Now that same senior developer can produce exactly the same code at a time cost of less than $100.
> The reason we stop junior devs from going down this path is because experience teaches us that things will break and when they do, it will incur a world of pain.
Hyperbole. It's also very often a "world of pain" with a lot of senior code.
> "LLM as abstraction" might be a possible future, but it assumes LLMs are significantly more capable than a junior dev at managing a growing mess of complex code.
Ignoring for a second that they already are, it doesn’t matter, because the cost of rewriting the mess drops by an order of magnitude with each frontier model release. You won’t need good code because you’ll be throwing everything away all the time.
> things will break and when they do, it will incur a world of pain
How much of this is still true, and how much is exaggerated, in today's environment where the cost of making things is near 0?
I think “Evolution” would say that the cost of producing is near 0 so the possibility of creating what we want is high. The cost of trying again is low so mistakes and pain aren’t super high. For really high stakes situations (which most situations are not) bring the expert human in the loop until the expert better than that human is AI.
Look at the dependency hell that is modern development: just how wide the openings are for supply chain attacks, and seemingly every other week we get a new RCE.
I'd rather have 100 loosely coupled scripts peer reviewed by half a dozen LLM agents.
I'm incentivised to use abstractions that are harder to learn, but execute faster or more safely once compiled. E.g. more Rust, Lean.
> If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
LLMs benefit from abstractions the same way as we do.
LLMs currently copy our approaches to solving problems and copy all the problems those approaches bring.
Letting LLMs skip all the abstractions is about as likely to succeed as genetic programming is efficient.
For example, writing more vanilla JS instead of React, you're just reinventing the necessary abstractions more verbosely and with a higher risk of duplicate code or mismatching abstractions.
In a recent interview, Bret Weinstein, a former professor of evolutionary biology, proposed that one property of evolution that makes the story of one species evolving into another more likely is that it's not just random permutations of single genes; it's also permutations to counter variables encoded as telomeres and possibly microsatellites.
Bret compares this to flipping random bits in a program to make it work better vs. tweaking variables randomly in a high-level language. Mutating parameters at a high-level for something that already works is more likely to result in something else that works than mutating parameters at a low level.
So I believe LLMs benefit from high abstractions, like us.
We just need good ones; and good ones for us might not be the same as good ones for LLMs.
> For example, writing more vanilla JS instead of React, you're just reinventing the necessary abstractions more verbosely and with a higher risk of duplicate code or mismatching abstractions.
Right, but I'm also getting pages that load faster and don't require a build step, making them more convenient to hack on. I'm enjoying that trade-off a lot.
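For a sense of what that trade-off looks like in practice, here is a rough sketch of the sort of state-sync helper you end up hand-writing without React. `createStore` and its `get`/`set`/`subscribe` methods are hypothetical names, not any library's API, and the render callback writes to a string rather than the DOM so the sketch runs outside a browser.

```typescript
// Hypothetical minimal store: the "necessary abstraction" you reinvent
// when keeping UI and data in sync without a framework.
type Listener<T> = (state: T) => void;

function createStore<T>(initial: T) {
  let state = initial;
  const listeners: Listener<T>[] = [];
  return {
    get: (): T => state,
    set(next: T): void {
      state = next;
      // Notify every subscriber so each view can re-render.
      listeners.forEach((fn) => fn(state));
    },
    subscribe(fn: Listener<T>): () => void {
      listeners.push(fn);
      fn(state); // render once with the current state
      // Return an unsubscribe function.
      return () => {
        const i = listeners.indexOf(fn);
        if (i >= 0) listeners.splice(i, 1);
      };
    },
  };
}

// Usage: "render" into a string as a stand-in for updating the DOM.
const counter = createStore({ count: 0 });
let rendered = "";
const unsubscribe = counter.subscribe((s) => { rendered = "Count: " + s.count; });

counter.set({ count: counter.get().count + 1 });
console.log(rendered); // prints "Count: 1"

unsubscribe();
counter.set({ count: 5 });
console.log(rendered); // still prints "Count: 1": the view no longer updates
```

This is both sides of the argument in about thirty lines: no build step and no dependency, but also no batching, no diffing, and nothing to stop a second developer from writing a slightly different store next to it.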
I'd rather have LLMs build on top of proven, battle-tested production libraries than keep writing their own from scratch. You're going to fill up context with all of its re-invented wheels when it already knows how to use common options.
Not to mention that testing things like this is hard. And why waste time (and context and complexity) for humans and LLMs trying to do something hard like state syncing when you can focus on something else?
For smol things like left-pad, sure but the two examples given (moment and react) solve really hard problems. If I were reviewing a PR where someone tried to re-implement time zone handling in JS, that’s not making it through review.
In JS, the DOM and time zones are some of the most messed up foundations you’re building on top of ime. (The DOM is amazing for documents but not designed for web apps.)
I think we really need to be careful about adding dependencies that we’re maintaining ourselves, especially when you factor in employee churn and existing options. Unless it’s the differentiator for the business you’re building, my advice to engineers is to strongly consider other options and have a case for why they don’t fit.
AI can play into the engineering blind spot of building it ourselves because it’s fun. But engineering as a discipline requires restraint.
...is a loaded question, with a complex and nuanced answer. Especially when you continue:
> it's worth paying the React complexity/page-weight tax
All right; then why do we code in React when a smaller alternative, such as Preact, exists, which solves the same problem, but for a much lower page-weight tax?
Why do we code in React when a mechanism to synchronize data with tiny UI fragments through signals exists, as exemplified by Solid?
Why do people use React to code things where data doesn't even change, or changes so little that to sync it with the UI does not present any challenge whatsoever, such as blogs or landing pages?
I don't think the question 'why do we code with React?' has a simple and satisfactory answer anymore. I am sure marketing and educational practices play a large role in it.
My cynical answer is that most web developers who learned their craft in the last decade learned frontend React-first, and a lot of them genuinely don't have experience working without it.
Which means hiring for a React team is easier. Which means learning React makes you more employable.
Our industry wants disruption, speed, delivery! Automatic code generation does that wonderfully.
If we wanted safety, stability, performance, and polish, the impact of LLMs would be more limited. They have a tendency to pile up code on top of code.
I think the new tech is just accelerating an already existing problem. Most tech products are already rotting; take a look at Windows or iOS.
I wonder what it will take to reach a significant turning point in this mentality.
It's not something that suddenly changed. "I'll generate some code" is as nondeterministic as "I'll look for a library that does it", "I'll assign John to code this feature", or "I'll outsource this code to a consulting company". Even if you write yourself, you're pretty nondeterministic in your results - you're not going to write exactly the same code to solve a problem, even if you explicitly try.
Contrary to code generation, all the other examples share one thing, which is their main advantage: the alignment between your objective and their actions. With a good enough incentive, they may as well be deterministic.
When you order home delivery, you don’t care about by who and how. Only the end result matters. And we’ve ensured that reliability is good enough that failures are accidents, not common occurrence.
Code generation is not reliable enough to have the same quasi deterministic label.
It's wild that management would be willing to accept it.
I think that for some people it is harder to reason about determinism because it is similar to correctness, and correctness can, in many scenarios, be something you trade off - for example, in relation to scaling and speed you will often trade off correctness.
If you do not think clearly about the difference between determinism and other similar properties like (real-time) correctness which you might be willing to trade off, you might think that trading off determinism is just more of the same.
Note: I'm against trading off determinism, but I am willing to think there might be a reason to trade it off, just I worry that people are not actually thinking through what it is they're trading when they do it.
Determinism requires formality (enactment of rules) and some kind of omniscience about the system. Both are hard to acquire. I’ve seen people trying hard not to read any kind of manual and failing to reason logically even when given hints about the solution to a problem.
I think those that are most successful at creating maintainable code with AI are those that spend more time upfront limiting the nondeterminism aspect using design and context.
There has always been a laissez-faire subset of programmers who thrive on living in the debugger, getting occasional dopamine hits every time they remove any footgun they previously placed.
I cannot count the times that I've had essentially this conversation:
"If x happens, then y, and z, it will crash here."
"What are the odds of that happening?"
"If you can even ask that question, the probability that it will occur at a customer site somewhere sometime approaches one."
It's completely crazy. I've had variants on the conversation from hardware designers, too. One time, I was asked to torture a UART, since we had shipped a broken one. (I normally build stuff, but I am your go-to whitebox tester, because I home in on things that look suspicious rather than shying away from them.) When I was asked the inevitable "Could that really happen in a customer system?" after creating a synthetic scenario where the UART and DMA together failed, my response was:
"I don't know. You have two choices. Either fix it where the test passes, or prove that no customer could ever inadvertently recreate the test conditions."
My dad worked in the auto industry and they came across a defect in an engine control computer where they were able to give it something like 10 million to one odds of triggering.
They then turned the thing on, it ran for several seconds, encountered the error, and crashed.
Oh, that's right, the CPU can do millions of things a second.
Something I keep in the back of my mind when thinking about the odds in programming. You need to do extra leg work to make sure that you're measuring things in a way that's practical.
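The arithmetic behind that story is worth sketching. Assuming each check is independent and the computer gets one million chances per second (an assumed rate, not a figure from the anecdote), 10-million-to-one odds give an expected time to first failure of about ten seconds.

```typescript
// Probability of at least one failure in `trials` independent attempts,
// each failing with probability `p`: 1 - (1 - p)^trials.
function probAtLeastOneFailure(p: number, trials: number): number {
  return 1 - Math.pow(1 - p, trials);
}

// Expected seconds until the first failure at a given attempt rate.
function expectedSecondsToFailure(p: number, trialsPerSecond: number): number {
  return 1 / (p * trialsPerSecond);
}

const failureOdds = 1e-7;     // "10 million to one" per attempt
const trialsPerSecond = 1e6;  // ASSUMPTION: one million attempts per second

const expectedSeconds = expectedSecondsToFailure(failureOdds, trialsPerSecond);
const probAfterTenSeconds = probAtLeastOneFailure(failureOdds, trialsPerSecond * 10);

console.log(expectedSeconds.toFixed(1));      // prints "10.0"
console.log(probAfterTenSeconds.toFixed(3));  // prints "0.632"
```

So under these assumptions the "rare" defect shows up within ten seconds roughly 63% of the time. The odds only sound small until you multiply them by the rate at which the machine rolls the dice.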
Delving a bit deeper... I've been wondering if the problem's related to the rise in H1B workers and contractors. These programmers have an extra incentive to avoid pushing back on c-suite/skip level decisions - staying out of in-office politics reduces the risk of deportation. I think companies with a higher % of engineers working with that incentive have a higher risk of losing market share in the long-term.
My work is better than it has been for decades. Now I can finally think and experiment instead of wasting my time on coding nitty-gritty detail, impossible to abstract. Last autumn was the game changer, basically Codex and later Opus 4.5; the latter is good with any decent scaffolding.
I have to admit, LLMs do save a lot of typing and associated syntax errors. If you know what you want and can spot and fix mistakes made by the LLM then they can be pretty useful. I don’t think it’s wise to use them for development if you are not knowledgeable enough in the domain and language to recognize errors or dead ends in the generated code though.
Could we all just agree to stop using the term "abstraction". It's meaningless and confusing. It's cover for a multitude of sins, because it really could mean anything at all. Don't lay all the blame on the c-suite; they are what they are, and have their own view. Don't moan about the latest egregious excess of some llm. If it works for you, use it; if it doesn't, don't. But, stop whinging.
The system is designed to do exactly that. This is called ‘productivity increase’ and is deflationary in large dosages. Deflation sounds good until you understand where it’s coming from.
This is true, though the people that actually push the field forward do know enough about every level of abstraction to get the job done. Making something (very important) horrible just to rush to market can be a pretty big progress blocker.
Jensen is someone I trust to understand the business side and some of those lower technical layers, so I'm not too concerned.
> It’s watching the profession collectively decide that the solution to uncertainty is to pile abstraction on top of abstraction until no one can explain what’s actually happening anymore.
The ubiquitous adoption of LLMs for generating code is mostly a sign of bad abstraction or the absence of abstraction, not the excess of abstraction.
And choosing/making the right abstraction is kind of the name of the game, right? So it's not abstraction per se that's a problem.
Wow - can we coin "Slopbrain" for people who are so far gone into AI eventualism that they can no longer function? Like "cooked" but "slopped" or something. Good grief lol. Talk about getting lost in the sauce...
WSJ has been writing increasingly about "AI Psychosis" (here's their most recent piece [0]).
I'm increasingly seeing that this is the real threat of AI. I've personally known people who have started to strain relationships with friends and family because they sincerely believe they are evolving into something new. While not as dramatic, the normalization of the use of "AI as therapist" is equally concerning. I know tons of people that rely on LLMs to guide them in difficult family decisions, career decisions, etc on an almost daily basis. If I'm honest, I myself have had times where I've leaned into this too much. I've also had times where AI starts telling me how clever I am, but thankfully a lifetime of low self worth signals warning flags in my brain when I hear this stuff! For most people, there is real temptation to buy into the praise.
Seeing Karpathy claim he can't keep up was shocking. It also immediately raises the question to anyone with a clear head: "Wait, if even Karpathy cannot use these tools effectively... just what is so useful about AI?" Isn't the entire point of AI that I can merely describe my problem and have a solution in a fraction of the time?
The fact that so many true believers in AI seem to forever be just a few more tricks away from really unleashing this power, starts to make it feel very much like magical thinking on a huge scale.
The real danger of AI is that we're entering into an era of mass hallucination across multiple fields and areas of human activity.
> I've personally known people who have started to strain relationships with friends and family because they sincerely believe they are evolving into something new.
Cryptoboys did it first, please recognize their innovation ty
They aren't addressing my comment (which is obviously an overreaction to the tweet), he's asking you why we should appeal to authority rather than evaluate whether Karpathy is completely overreacting and in way too deep.
(Not the grandparent poster, but answering for myself)
1. Name calling is cheap and does not inform or persuade.
2. Karpathy has a history of informing me of interesting things, is widely known for his talent for making complex ideas grokkable, and has contributed to the field of AI for more than a decade, very notably as a first class communicator of ideas to the public.
Note that (2) is appealing to Karpathy's credibility due to his previous work, and NOT to authority. I have a positive prior on his ideas due to my exposure to his previous ideas, and particularly his ability to explain things, and not due to who he is, or what job he has.
> There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and ...
This sounds unbearable. It doesn't sound like software development, it sounds like spending a thousand hours tinkering with your vim config. It reminds me of the insane patchwork of sprawl you often get in DevOps - but now brought to your local machine.
I honestly don't see the upside, or how it's supposed to make any programmer worth their weight in salt 10x better.
As far as I can tell as a heavy coding agent user: you don’t need to know any of this and that’s a testament to how good code agent TUIs have become. All I do to be productive with a coding agent is tell it to break a problem down into tasks, store it inside beads, and then make sure each step is approved by me. I also add in a TDD requirement where it needs to build tests that fail then eventually pass.
Everything else I’ve used has been over engineered and far less impactful. What I just said above is already what many of us do anyway.
This sounds like my complete and utter nightmare. No art or finesse in building the thing - only an exercise in torturing language to someone who at a fundamental level doesn't understand a thing.
I'm not viewing AI tooling as an extinction of the art of programming, only illuminating how telling an AI how to create programs isn't in the same universe as programming, where the technical skill to do such a thing is on par with punching in how long my microwave should nuke my popcorn.
Not really at all like this, more like being a tech lead for a team of savants who simultaneously are great at parts of software engineering, and limited at others. Though that latter category is slimmer than a year ago…
The point is, you can get lots of quality work out of this team if you learn to manage them well.
If that sounds like a “complete and utter nightmare”, then don’t use AI. Hopefully you can keep up without it in the long run.
I can't see the original post because my browser settings break Twitter (I also haven't liked much of Karpathy's output), but I agree. I call this style of software development 'meeting-based programming,' because that seems to be the mental model that the designers of the tools are pursuing. This probably explains, in part, why c-suite/MBA airheads are so excited about the tools: meetings are how they think and work.
I suppose LLMs/chatbots and 'agents' are just the latest phase of a trend that the internet has been encouraging for decades: the elimination of mental privacy. I don't mean 'privacy' in an everyday sense -- i.e. things I keep to myself and don't share. I mean 'privacy' in a more basic sense: private experience -- sitting by oneself; having a mental space that doesn't include anybody else or invite the outside world; simply spending time with one's own thoughts.
The internet encourages us to direct our thoughts and questions outward: look things up; find out what others have said; go to wikipedia; etc. This is, I think, horribly corrosive to the very essence of being a thinking, sentient being. It's also unsurprising, I guess. Humans are social animals. We're going to find ourselves easily seduced by anything that lets us replace private experience with social experience. I suppose it was only a matter of time until someone did this with programming tools, too.
> ... or how it's supposed to make any programmer worth their weight in salt 10x better.
It doesn't. The only people I've seen claim such speedups are either not generally fluent in programming or stand to benefit financially from reinforcing this meme.
For every conspicuous vibecoding influencer there are a bunch of experienced software engineers using them to get things done. The newest generation of models are actually pretty decent at following instructions and using existing code as a template. Building line-of-business apps is much quicker with Claude Code because once you've nicely scaffolded everything you can just tell it to build stuff and it'll do so the same way you would have in a fraction of the time. You can also use it to research alternatives to architectural approaches and tooling that you come up with so that you don't paint yourself into a corner by having not heard about some semi-niche tool that fits your use case perfectly.
Of course I wouldn't use an LLM to #yolo some Next.js monstrosity with a flavor-of-the-week ORM and random Tailwind. I have, however, had it build numerous parts of my apps after telling it all about the mise targets and tests and architecture of the code that I came up with up front. In a way it vindicates my approach to software engineering because it's able to use the tools available to it to (reasonably) ensure correctness before it says it's done.
I am a professional engineer with around 10 years of experience and I use AI to work about 5x faster on a site I personally maintain (~100 DAU, so not huge, but also not nothing). I don’t work in AI so I get no financial benefit by “reinforcing this meme”.
Oh, well if it can generate some simple code for your personal website, surely it can also be the "next level of abstraction" for the entirety of software engineering.
Well, I don’t really think it’s “simple”. The code uses React, nodejs, realtime events pushed via SSE, infra pushed via Terraform, postgres, blob store on S3, emails sent with SES… sure, it’s not the next Google, but it’s a bit above, like, a personal blog.
And in any case, you are moving goalposts. OP said he had never seen anyone serious claim that they got productivity gains from AI. When I claim that, you say “well it’s not the next level of abstraction for all SWE”. Obviously - I never claimed that?
I admit to pangs of this, but it's really never made any sense because the implication is that the profession is now magically closed off to newcomers.
Imagine someone in the 90s saying "if you don't master the web NOW you will be forever behind!" and yet 20 years later kids who weren't even born then are building web apps and frameworks.
Waiting for it to all shake out and "mastering" it then is still a strategy. The only thing you'll sacrifice is an AI funding lottery ticket.
Finally a voice of reason. The tools will just get better and easier to use. I use LLMs now, but I'm not going to dump a bunch of time learning the new hotness. I'll let other people do that and pickup the useful pieces later.
Unless you're gunning for a top position as a vibe coder, this whole concept of "falling behind" is just pure FOMO.
Eh, for myself as a middle-aged software engineer, it feels a little like the last chopper out of Saigon. I feel less and less confident that I can make as good a living in software for the next decade as I have for the last couple. Or if I want to. The job is changing so fast right now, and I’m not sure I like it. When I worked in big tech, I preferred being an IC over an EM or tech lead because I like writing code. Now it feels increasingly like you can’t be an IC in that way anymore. You’re now coding through others, either humans or AI.
Sure, I can write code manually, but in my case I’m working full time on my own SaaS and I am absolutely faster and more effective with AI. It’s not even close. And the gains are so extreme that I can’t justify writing beautiful hand-crafted artisanal code anymore. It turns out that code that’s “good enough” will do, and that’s all I can afford right now.
But long-term, I don’t know that I want to do that work, especially for some corporation. It feels like the difference between being a master furniture craftsman, and then going to work in an IKEA factory.
I am a software developer and mainly a programmer for decades now. I love programming. I love to be "one" with the computer. I will never give this joy up. If I need to sell shoes at daytime, I will program real computer programs in the evenings. If it won't be possible with modern machinery anymore, I will take my Commodore 64. I am a free man.
My company takes between Christmas and New Years off. I took a week before that off too. I have not used AI in that time. The slower pace of life is amazing. But when I get back to coding it will be back to running at 180%. It’s the new norm. However I’ve decided to take longer “no computer” breaks in my day. I have to adapt but I need to defend my “take it slow” times and find some analogue hobbies. The shift is real and you can’t wind it back.
I’ve been taking my son for stroller walks more often over Christmas. I bring a headset for listening to music, podcasts, audiobooks, tech talks. “Be effective.” But I end up just walking and thinking, realising this is “free time”.
It sounds ridiculous and is easy to say, but spending time walking and thinking will improve your decisions and priorities in a way that no productivity hack will.
I only actually did slow down for a while because I had to for the well-being of my family. Sure feels important to not always be on top of everyone else’s business.
As an Opus user, I genuinely don’t understand how someone can work for weeks or months without regularly opening an IDE. The output almost always fails.
I repeatedly rewrite prompts, restate the same constraints, and write detailed acceptance criteria, yet still end up with broken or non-functional code. It's very frustrating, to say the least. Yesterday alone I spent about $200 on generations that now require significant manual rewrites just to make them work.
At that point, the gains are questionable. My biggest success is having the model take on the first design in my app, and I take it from there. But those hundreds, if not thousands, of lines of code it generates are so messy that it's insanely painful to refactor the mess afterwards.
Sometimes I have a similar file or related files. I copy their names and say use them as reference.
Code quality improves by 10 times if you do so. Even providing an example from the framework's getting-started guide works great for a new project.
Yeah, the pain of cleaning up a small mess is great too. I had some failing tests and type errors, and I thought I would fix them later using only AI prompts. As the project grew, the failing TypeScript issues grew too. At some point it was 5000+ type issues and a countless number of failing unit tests. Then more and more. I tried to fix it with AI, since fixing it the old way was not possible. Then I discarded the whole project when it was around 500k lines of code.
My trick is to explicitly role-play that we’re doing a spike. This gets all of the models to ignore all of the details they normally get hung up on. Once I have the basics in place, I can tell it to fix details.
It’s _always_ easier to add more code than it is to fix broken code.
- Heavy usage of plan mode. Tell AI something like "make at least 20 searches to online documentation", support every claim with a reference, etc. Tell AI "make a task for every little thing you'll implement"
- Have the AI write tests, particularly the more expensive ones like integration and end-to-end, so you have an easy way to verify functionality.
- Set up Claude Code GHA to automatically review PRs. Give the review feedback to the agent that implemented it, either via copy-pasting or tell the agent "fetch review comments and fix them".
> make at least 20 searches to online documentation
Lol sometimes I have to spend two turns convincing Claude to use its goddamn search and look up the damn doc instead of trying to shoot from the hip for the fifth time. ChatGPT at least has forced search mode.
I've found that telling it to specifically do N searches works consistently. I do really wish Claude Code had a "deep research" mode similar to 'normal' Claude.
This is what an AGENTS.md - https://agents.md/ (or CLAUDE.md) file is for. Put common constraints to correct model mistakes/issues with respect to the codebase, e.g. in a “code style” section.
Why would you spend $200 a day on Opus if you can pay that for a month via the highest tier Claude Max subscription? Are you using the API in some special way?
The $200/month plan doesn't have limits either - they have an overage fee you can pay now in Claude Code so once you've expended your rate limited token allowance you can keep on working and pay for the extra tokens out of an additional cash reserve you've set up.
> The $200/month plan doesn't have limits either... once you've expended your rate limited token allowance... pay for the extra tokens out of an additional cash reserve you've set up
You're absolutely right! Limited token allowance for $200/month is actually unlimited tokens when paying for extra from a cash reserve which is also unlimited, of course.
I think you may have misunderstood something here.
When paying for Claude Max even at $200/month there are limits - you have a limit to the number of tokens you can use per five hour period, and if you run out of that you may have to wait an hour for the reset.
You COULD instead use an API key and avoid that limit and reset, but that would end up costing you significantly more since the $200/month plan represents such a big discount on API costs.
As-of a few weeks ago there's a third option: pay for the $200/month plan but allow it to charge you extra for tokens when you reach those limits. That gives you the discount but means your work isn't interrupted.
Thank you for the explanation, but I did fully understand that is what you were saying.
What I don't fully understand is how you can characterize that as "not limited" with a straight face; then again, I can't see your face so maybe you weren't straight faced as you wrote it in the first place.
Hopefully you could see my well meaning smile with the "absolutely right" opening, but apparently that's no longer common so I can understand your confusion as https://absolutelyright.lol/ indicates Opus 4.5 has had it RLHF'd away.
> OpenAI's sales and marketing expenses increased to _$2 billion_ in the first half of 2025.
Looks like AI companies spend enough on marketing budgets to create the illusion that AI makes development better.
Let's wait one more year, and perhaps everyone who didn't fall victim to these "slimming pills” for developers' brains will be glad about the choice they made.
Well. I was a sceptic for a long time, but a friend recently convinced me to try Claude Code and showed me around. I revived an open source project I regularly get back to: I code for a bit, have to wrestle with toil and dependency updates, and lose the joy before I really get a lot done, so I stop again.
With Claude, all it took to fix all of that drudge was a single sentence. In the last two weeks, I implemented several big features, fixed long standing issues and did migrations to new major versions of library dependencies that I wouldn’t have tackled at all on my own—I do this for fun after all, and updating Zod isn’t fun. Claude just does it for me, while I focus on high-level feature descriptions.
I’m still validating and tweaking my workflow, but if I can keep up that pace and transfer it to other projects, I just got several times more effective.
This sounds to me like a lack of resource management, as tasks that junior developers might perform don't match your skills, and are thus boring.
As a creator of an open-source platform myself, I find trusting a semi-random word generator in front of users unreliable.
Moreover, I believe it creates a bad habit. I've seen developers forget how to read documentation and instead trust AI, and of course, as a result AI makes mistakes that are hard to debug or provokes security issues that are easy to overlook.
I know this sounds like a luddite talking, but I'm still not convinced that AI in its current state can be reliable in any way. However, because of engineers like you, AI is learning to make better choices, and that might change in the future.
That’s a totally fair take IMHO, and I’m very much conflicted on several ends on this topic. For example, would I want my juniors to use an agent? No; not even the mid levels, probably. As you say, it’s easy to form bad habits, and you need a good intuition for architecture and complexity, otherwise you end up with broken, unmaintainable messes. But if you have that, it’s like magic.
Let's wait one more year, and perhaps everyone who didn't fall victim to these "slimming pills” for developers' brains will be glad about the choice they made.
AI is only getting better at consuming energy and wasting people's time communicating with this T9. However, if talented engineers continue to use it, it might eventually provide more accurate replies as a result.
Answering your question, no matter how much I personally degrade or improve, I will not be able to produce anything even remotely comparable in terms of negative impact that AI brings to humanity these days.
> strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering
Sounds fever dreamish. Thank you sincerely (not) for creating it!
Does it bother any of you that you now have to pay money in order to do your job? I mean AI model subscriptions. Somehow it feels wrong for me to pay for tools that are trying to replace me.
IDEs used to be extremely expensive back in the 1990s. IDEs such as Microsoft Visual Studio and IBM's VisualAge for Java were quite expensive subscriptions, as I recall. Subsequently, open source IDEs like Eclipse and VisualStudio seem to have become the norm.
For the longest time, the joy of creation in programming came from solving hard problems. The pursuit of a challenge meant something. Now, that pursuit seems to be short-circuited by an animated being racing ahead under a different set of incentives. I see a tsunami at the beach, and I’m not sure whether I can run fast enough.
Not to mention many companies speedrunning systems of strange and/or perverse incentives with AI adoption.
That being said, Welch’s grape juice hasn’t put Napa valley out of business. Human taste is still the subjective filter that LLMs can only imitate, not replace.
I view LLM assisted coding (on the sliding scale from vibe coding to fancy auto complete) similar to how Ableton and other DAW software have empowered good musicians that might not have made it otherwise due to lack of connections or money, but the music industry hasn’t collapsed completely.
Yep, DAWs aren’t the comparison. People are not thinking deeply about what is going on. There is a big war going on to eradicate taste and make it systematic, to the immense benefit of the few.
And there it is again, the "powerful alien tool" that was just "handed to us".
No decades of research and massive allocation of resources over the last few years as well as very intentional decision making by tech leadership to develop this specific technology.
Nope, it just mysteriously dropped from the sky one day.
The point is that all that research mostly doesn’t help in mastering the tool. Unlike traditional tools, it doesn’t come with an instruction manual. It’s like an alien tool just handed to us in exactly that sense.
It’s written in the title of the post: “Andrej Karpathy”. He’s fairly well known in AI circles; he was head of Autopilot at Tesla and co-founded OpenAI. If you’re curious to learn more about him, the Wikipedia page has a short summary: https://en.wikipedia.org/wiki/Andrej_Karpathy
I have never felt this much ahead as a programmer. So many developers I see, including at my workplace, are blindly prompting models hoping to solve their problem and failing every step of the way. The people who truly understand what is happening are still in the ruling class, and their skills are not going to be irrelevant anytime soon.
His "youtube course" already exists, and it's absolutely transformational.
He's working on a more formal educational framework/service of some kind, which will presumably not be free, but what he's already posted is some of the most effective CS pedagogy I've ever encountered (and personally benefited from.)
I've worked really hard over the last year at working out how to use these things, and it has more than paid off.
But I think if I had started learning today instead of a year ago, I'd get up to speed in more like 6 months instead of a year. A lot of stuff I learned a year ago is not really necessary anymore, but furthermore, there's just a lot more information out there about how to use these from people who have been learning it on their own.
I just don't think people who have ignored it up until now are really that far behind.
The thing that always trips me up is the lack of isolation/sandboxing that all of the AI programming tools provide.
I want to orchestrate a workforce of agents, but they can't be trusted not to run amok.
Does anyone have a better way to do this other than spinning up a cloud VM to run goose or claude or whatever poorly isolated agent tool?
I have seen Claude disable its sandbox. Here is the most recent example from a couple of weeks ago while debugging Rust:
"The panic is due to sandbox restrictions, not code errors.
Let me try again with the sandbox disabled:"
I have since added a sandbox around my ~/dev/ folder using sandbox-exec in macOS. It is a pain to configure properly but at least I know where sandbox is controlled.
That refers to the sandbox "escape hatch" [1], running a command without a sandbox is a separate approval so you get another prompt even if that command has been pre-approved. Their system prompt [2] is too vague about what kinds of failures the sandbox can cause, in my experience the agent always jumps straight to disabling the sandbox if a command fails. Probably best to disable the escape hatch and deal with failures manually.
Obviously people perceive value there, but on the surface it does seem odd.
"These things are more destructive than your average toddler, so you need to have a fence in place kind of like that one in Jurassic Park, except you need to make sure it absolutely positively cannot be shut off, but all this effort is worthwhile, because, kind of like civets, some of the artifacts they shit out while they are running amok appear to have some value."
The collective shrug I get from our security people at work is shocking. I attend pretty serious meetings about genAI implementations, and when I ask for points of view on security, given that things as crazy as “adversarial poetry” are real, I just get shrugs. I get the feeling they don’t want to be the ones to say “no, don’t bring genAI to our clients” but also won’t dare say “yes, our clients’ data is safe with integrated genAI”.
I’m convinced much of this is all noise - people seem to be focusing on the wrong unit of analysis. Producing software and lots of it has never been a problem - coming up with the right projects and producing a vertically differentiated product to what already exists is.
Being a nondeterministic tool, the output for a given input can vary. Rather than having a solid plan of, "if I provide this input, then that will happen", it's more like, "if I do something like this, I can expect something like that, probably, and if not, then try again until it works, I suppose".
What are the productivity gains? Obviously, it must vary. The quality of the tool output varies based on numerous criteria, including what programming language is being used and what problem is trying to be solved. The fact that person A gets a 10x productivity increase on their project does not mean that person B will also get a 10x productivity increase on their project, no matter how well they use the tool.
But again, tool usage itself is variable. Person A themselves might get a 10x boost one time, and 8x another time, and 4x another time, and 2x another time.
Non determinism does not imply non correctness. You can have the LLM do 10 different outputs, but maybe all 10 are valid solutions. Some might be more optimal in certain situations, and some might appeal to different people aesthetically.
Nondeterminism indeed does not imply non-correctness.
All ten outputs might be valid. All ten will almost certainly be different -- though even that is not guaranteed.
The OP referred to the notion of there being no manual; we have to figure out how to use the tool ourselves.
A traditional programming tool manual would explain that you can provide input X and expect output Y. Do this, and that will happen. It is not so clear-cut with AI tools, because they are -- by default, in popular configurations -- nondeterministic.
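To make the "try again until it works" workflow from upthread concrete, here is a minimal Python sketch. Everything in it is hypothetical: `flaky_generate` is a stand-in for a nondeterministic tool (not a real model call), and `is_valid` plays the role of a test suite. It only illustrates that several different outputs can all be acceptable, and that the check, not the generator, is what gives you a predictable result.

```python
import random


def flaky_generate(rng: random.Random) -> str:
    """Hypothetical stand-in for a nondeterministic tool.

    Returns one of several candidate implementations at random.
    """
    candidates = [
        "def add(a, b): return a + b",  # valid
        "def add(a, b): return b + a",  # valid, but textually different
        "def add(a, b): return a - b",  # invalid
    ]
    return rng.choice(candidates)


def is_valid(source: str) -> bool:
    """Tiny 'test suite': run the candidate and check its behaviour."""
    namespace: dict = {}
    exec(source, namespace)
    add = namespace["add"]
    return add(2, 3) == 5 and add(-1, 1) == 0


def generate_until_valid(rng: random.Random, max_attempts: int = 50):
    """Retry the generator until a candidate passes the checks."""
    for attempt in range(1, max_attempts + 1):
        candidate = flaky_generate(rng)
        if is_valid(candidate):
            return candidate, attempt
    raise RuntimeError(f"no valid candidate after {max_attempts} attempts")


if __name__ == "__main__":
    source, attempts = generate_until_valid(random.Random())
    print(f"accepted after {attempts} attempt(s): {source}")
```

Which of the two valid candidates you get, and after how many attempts, can differ from run to run; the only thing the loop guarantees is that whatever comes back passed the checks.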
Why would one opt to use an LLM-based AI tool as a compiler? It seems that would be extraordinarily complex over traditional compilers, but for what benefit?
Non determinism of AI feels like a compiler which will on same input code spit out different executable on every run. Fixing bugs will become more like a ritual to satisfy whims of the machine spirit.
But how different? Compilers do, in fact, spit out different binaries with each run. There are timestamps and other subtle details embedded in them (esp compiler version and linking) that make the same source result in a different binary. "That's different"; "that's not the same thing!" I see you thinking. As long as the AI prompt "make me a login screen" results in a login screen appropriate for the rest of the code, and not "rm -rf ~/", does it matter if the indeterminism produces a login page with a Google login page before the email login button or after?
I for one am not using AI, will not touch that steaming pile of manure with a 10 yard stick, and I couldn't care less about the so called magnitude 9 earthquake. When this bubble finally bursts into nothingness, I'll be still here practicing my craft and providing real value for my clients.
I'm using it less and less now, since the sheen has worn off and I've been able to more accurately judge its capabilities. It's like an intern at everything it does and unfortunately I'm expected to produce better code than that.
I'm very confused, are you or are you not an LLM run account?
A couple weeks ago, under a freshly made account "llmslave", you said it's already replacing devs and the field is cooked, and anyone who doesn't see that lacks the skills to adopt AI [1]
I pointed out that given your name and low quality comments, you were likely an LLM run account. As SOON as I made that comment, you abandoned the account and have now made a duplicate llmslave2 account, with a different opinion
An agent like Claude Code? Maybe a few weeks ago. I use AI autocomplete and ask Claude to explain basic stuff outside my wheelhouse, generate throwaway bash scripts, etc. And I have Claude review code I'm unsure of / rubber ducky debugging, but that's about it.
> This is from the man who has no finished open source projects
To be fair, which open source project can really claim that it is "finished", and what does "finished" even mean?
The only projects that I can truly call "finished" are those that I have laid to rest because they have been superseded by newer technologies, not because they have achieved completeness, because there is always more to do.
Is there someone already mastering “agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering” ?
Why, the other rats in front of you in the race, of course!
As the pithy, if cheesy, expression goes: read not the times; read the eternities. People who spend so much time frantically chasing superficial ephemera like this are people without any sense of life's purpose. They're cogs in some hellish consumerist machine.
If you want to chase the mob off the cliff, go ahead. Insanity and stupidity aren't sound life strategies, though. They're a sign you have lost the plot.
The "bubble" is in the financial investment, not in the technology. AI won't disappear after the bubble bursts, just like the web didn't disappear after 2000. If anything, bursting the financial bubble will most likely encourage researchers to experiment more, trying a larger range of cheaper approaches, and do more fundamental engineering rather than just scaling.
AI is here to stay, and the only thing that can stop it at this stage is a Butlerian jihad.
I maintain that the web today is not what people thought it would be in 1998. The tech has its uses, it's just not what the snake oil sellers are making it out to be. And talking about a Butlerian jihad is borderline snake oil selling.
Man, this is giving me a cognitive dissonance compared to my experiences.
Actually, even the post itself reads like a cognitive dissonance with a dash of the usual "if it's not working for you then you are using it wrong" defence.
I feel exactly like Karpathy here. I have some work to do, and I know exactly what I need to do, and I'm able to explain it to AI, and the AI seems to understand me (I'm lately using Opus 4.5). I wrote down a roadmap, it should take me a few weeks of coding. It feels like with a proper workflow with AI agents, this work should be doable in one or two days. Yet, I know by now that it's not going to be nearly that fast. I'll be lucky if I finish 30% faster than if I just code the entire damn thing myself. The thing is, I am a huge AI optimist, I'm not one of the AI skeptics, not even close. Karpathy is not an AI skeptic. We just both feel this sense of possibility, and the fact that we can't make AI help us more is frustrating. That's all. There's no telling anyone else "it's on you if you can't make it work for you". I think Karpathy figured out by now, and at least I did, that the number of AI skeptics by now far outnumbers the number of AI optimists, and it has become something akin to a political conviction. It's quite futile to try and change someone's mind about whether AI is good, bad, overhyped, underused, etc. People picked their side and that's that.
I think you articulated perfectly why it's a bubble and why execs are so eager to push it everywhere. It's so alluring, it constantly feels like we're on the verge of something great. No wonder so many people have their brains fried by it.
We're 10 months into agentic coding. Claude Code came out in March. I don't understand how you can be so unimaginative about what this might look like in 5 years, even with slow progress.
It might be genuinely useful in 5 years, my issue is how it's being marketed now. We're 6 months into "AI will be writing 90% of code in three months" among other ridiculous statements.
Agreed. It is very similar to gambling in how it tricks the human mind. I am sure some of this AI technology will prove to be useful, but the breakthrough has been just around the corner since soon after ChatGPT was released.
I sort of agree. If anything I feel like they've gotten a bit worse, but the advances in the tooling around them (e.g. Claude Code) have masked that slightly.
I think they are useful as an augmentation, but largely valueless for directly outputting code. Who knows if that will change. It's still made me more productive as a dev despite not oneshotting entire files. It's just not industry-changing, at least yet.
If I can reassure you: if your project is complex enough and involves heavy data manipulation, a 30% improvement using Opus/Gemini 3/codex 5.2 seems like a good result. I think on complex tasks, Opus 4.5 improves my output by around 20-25%.
And since it's way, way less wrong than sonnet4, it might also improve my whole team's velocity.
I won't lie, AI coding has been a net negative for the 'lazy devs' on my team who don't delve into their own generated code (by 'lazy devs' here I mean the subset of devs who do the work but often don't bother to truly understand the logic behind what they used/did. They are very good coworkers, add value and are not really lazy, but I don't see another term for that).
I think with better processes and training they could be. It is just that right now we do not train them and put them through scrum and other horrible processes. Median developers are bad due to bad management.
I think of it this way. If you dropped Einstein with a time machine two thousand years ago, people would think he is some crazy guy doing scribbles in the sand.
No one would ever know how smart he is. The same is with people and advanced AGI like Gemini 3 Pro or ChatGPT 5.2 Pro.
We are just dumber than them.
I also like to think that Einstein would be smart enough to explain things from a common point of understanding if you did drop him 2000 years in the past (assuming he also possesses the scientific knowledge humanity accrued in that 2000 year gap). So, your analogy doesn't really make a lot of sense here. I also doubt he'd be able to prove his theories with the technology of the past but that's a different matter.
If we did have AGI models, they would be able to solve our hardest problems (assuming a generous definition of AGI) even if we didn't immediately understand exactly how they got there. We already have a lot of complex systems that most people don't fully understand but can certainly verify the quality of. The whole "too smart for people to understand that they're too smart" is just a tired trope.
You think they have “advanced AGI” and are worried about keeping up with the software industry? There would be nothing to keep up with at that point.
To use an analogy, it would be like spending all your time before a battle making sure your knife is sharp when your opponent has a tank.
I have been using Copilot, Cursor, then CC for a little more than a year now. I have written code with teams using these tools and I am writing mostly for myself now. My observations have been the following:
1) These tools obviously improved significantly over the past 12 months. They can churn out code that makes sense in the context of the codebase, meaning there is more grounding to the codebase they are working on as opposed to codebases they have been trained on.
2) On the surface they are pretty good at solving known problems. You are not going to make them write well-optimized renderer or an RL algorithm but they can write run-of-the-mill business logic better _and_ faster than I can-- if you optimize for both speed of production and quality.
3) Out of the box, their personality is to just solve the problem in front of them as quickly as possible and move on. This leads them to make suboptimal decisions (e.g. solving a deadlock by sleeping for 2 seconds, CC Opus 4.5 just last night). This personality can be altered with appropriate guidance. For example, a shortcut I use is to append "idiomatic" to my request-- "come up with an idiomatic solution" or "is that the most idiomatic solution we can think of." Similarly when writing tests or reviewing tests I use "intent of the function under test" which makes the model output better solution or code.
4) These models, esp. Opus 4.5 and GPT 5.2, are remarkable bug hunters. I can point at a symptom and they come away with the bug. I then ask them to explain to me why the bug happens and I follow the code to see if it's true. I have not come across a bad bug, yet. They can find deadlocks and starvations; you then have to guide them to a good fix (see #3).
5) Code quality is not sufficient to create product quality, but it is often necessary to sustain it. Sustainability window is shorter nowadays. Therefore, more than ever, quality of the code matters. I can see Claude Code slowly degrading in quality every single day--and I use it every single day for many hours. As much as it pains me to say this, compared to Opencode, Amp, and Toad I can feel the "slop" in Claude Code. I would love to study the codebases of these tools over time to measure their quality--I know it's possible for all but Claude Code.
6) I used to worry I don't have a good mental model of the software I build. Much like journaling, I think there is something to be said about the process of writing/making actually gives you a very precise mental model. However, I have been trying to let that go and use the model as a tool to query and develop the mental model post facto. It's not the same but I think it is going to be the new norm. We need tooling in this space.
7) Despite your own experiences with these tools it is imperative that they be in your toolbox. If you have abstained from them thus far, perhaps best way to get them incorporated is by starting to use them for attending to your toil.
8) You can still handcraft code. There is too much fun, beauty and pleasure in it to deny yourself. Just don't expect this to be your job. This is your passion.
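To illustrate point 3, here is a sketch of what an "idiomatic" deadlock fix can look like compared with sleeping for 2 seconds. This is not the code from that session; the `Account` class and `transfer` function are made up for the example. The technique shown is consistent lock ordering: every thread acquires the two locks in the same global order, so two opposite transfers can never each hold one lock while waiting for the other.

```python
import threading


class Account:
    """Hypothetical account with its own lock, for illustration only."""

    def __init__(self, account_id: int, balance: int) -> None:
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move money between accounts without risking a lock-order deadlock."""
    if src is dst:
        return  # nothing to do, and locking the same Lock twice would hang
    # Acquire locks in a fixed global order (by account id), regardless of
    # transfer direction. A sleep only makes the deadlock rarer; a consistent
    # order removes the circular wait that causes it.
    first, second = sorted((src, dst), key=lambda acct: acct.account_id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def run_demo(iterations: int = 1000):
    """Run opposite transfers concurrently and return the final balances."""
    a = Account(1, 1000)
    b = Account(2, 1000)

    def worker(src: Account, dst: Account) -> None:
        for _ in range(iterations):
            transfer(src, dst, 1)

    threads = [
        threading.Thread(target=worker, args=(a, b)),
        threading.Thread(target=worker, args=(b, a)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return a.balance, b.balance


if __name__ == "__main__":
    print(run_demo())
```

Because both workers move the same total amount in opposite directions, the balances end where they started, and the run finishes without any timing tricks.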
> Despite your own experiences with these tools it is imperative that they be in your toolbox.
Why is it imperative? Whenever I read comments like this I just think the author is cynically drumming up hype because of the looming AI bubble collapse.
I don't usually post something like this, but this is so fucking stupid. I'm prepared to stand by that. Let's see in a few years if I'm right.
"AI" is literally models trained to make you think it's intelligent. That's it. It's like the ultimate "algorithm" or addiction machine. It's trained to make you think it's amazing and magical and therefore you think it's amazing and magical.
This could apply if we looked at questions in vacuum - someone had a conversation and was judging the models based on that. But some of us just use it for work and get good results daily. "Intelligent" is irrelevant; it's "useful". It doesn't matter what feelings I have about it if it saves me 2h of typing from time to time.
To me, as just another kinda old (I’m 49) swe, the biggest benefit of using an LLM tool is it saves a shit ton of typing. I know what I want and I know when it’s right, just saving me from typing it all out is worth $20 bucks a month.
It’s trained to (lossy) compress large amounts of data. The system prompts have leaked and it’s just instructed to be helpful, right? I don’t entirely disagree with your sentiment, though. It’s brute force.
> There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering.
"I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year and a failure to claim the boost feels decidedly like skill issue. There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering. Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession. Roll up your sleeves to not fall behind."
I have been telling everybody I know over the Christmas break that I have been coding from around 10-36 years of age, as a career and always in my spare time as a hobby. I have a lacklustre computer science knowledge and never worked at the scale of FANG etc but am still rather confident in my understanding of code and the tech scene in general. I've been telling people I haven't "coded" for almost 6 months now, I only interface with agentic setups and only open my IDE to make copy and config changes.
I understand we are all in different camps for a multitude of reasons;
- The jouissance of rote coding and abstraction
- The tree of knowledge specifically in programming, and which branches and nodes we each currently sit at in our understanding
- Technical paradigms that humans may have argued about have now shifted to obvious answers for agentic harnesses (think something like TDD; I for one barely used that as a style because I've mostly worked in startups building apps and found it not worth the cost of my labour, but agentic harness loops absolutely excel at it)
- The geography and size of the markets we work in
- The complexity of the subject matter / domain expertise
- The cost prohibitive nature of token based programming (not everyone can afford it, and the big fish seemingly have quite the advantage going forth)
- Agentic coding has proven it can build UIs very easily, and depending on experience, it can build a great many things easily. It excels when it has feedback loops such as linting or simple JavaScript errors, which are observability problems in my opinion. Once it can do full-stack observability (APM, system, network), its ability to reason about and correct problems on the fly for any complex system seems, from my purview, almost too easy.
- At the human nature level, some individuals prefer to think in 0's and 1's, some in words, some in between, and so on. What type of communication do agentic setups prefer?
With some of that above intuition that is easily up for debate, I've decided to lean 100% into agentic coding, I think it will be absolutely everywhere and obviously with humans in the loop but I don't think humans will need to review the pull requests. I am personally treating it as an existential threat to my career after having seen enough of what it's capable of. (with some imagination and a bit of a gambling spirit, as us mere mortals surely can't predict the future)
With my gambit, I'm not choosing to exit the tech scene and am instead optimistically investing my mental prowess into figuring out where "humans in the loop" will be positioned. Currently I'm looking into CI level tooling, the known quantities being code quality and all the various forms of software testing paradigms. In my mind, the emerging evals will keep evolving and will do a lot more than test our ideas of model intelligence and chatbot responses.
---
A more practical rant: If you are building a recommendation engine for A and B, the engine could have X amount of modules that each return a score, which when all combined make up the final decision between A and B. Forgive me, but let's just use dating as an example. A product manager would say we need a new module to calculate relevance between A and B based off their food preferences. An agentic harness can easily code that module and create the tests for it. The product manager could ask an LLM to make a list of 1000 reasons why two people might be suitable for dating. The agent could easily go away and code and test all those modules, and probably maintain technical consistency, but drift from the company's philosophical business model. I am looking into building "semantic linting" for codebases: how can the agent maintain the code so it aligns with the company's business model? And if for whatever reason those 1000 modules need to be refactored, how can the agent maintain the code so it stays aligned with that business model? Essentially I'm trying to make a feedback loop between the company's needs and the code itself, to stop the agent and the business from drifting in either direction, and to allow automatic feedback loops for the agent to fix the drift. In short, I think there will be new tools invented that we humans will be mastering, as per Karpathy's point.
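For anyone who wants the "modules that each return a score" idea spelled out, here is a rough Python sketch. All names (`ScoreModule`, `food_preference_score`, the weights, the profile fields) are invented for illustration; nothing here is a real product's design. Each module scores a pair of profiles in the range 0 to 1, and the engine combines them as a weighted average.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# A profile is just a mapping from a trait name to a set of values.
Profile = Dict[str, Set[str]]


@dataclass(frozen=True)
class ScoreModule:
    """One pluggable relevance signal (hypothetical structure)."""

    name: str
    weight: float
    score: Callable[[Profile, Profile], float]  # expected to return 0.0..1.0


def jaccard(left: Set[str], right: Set[str]) -> float:
    """Overlap of two sets: size of intersection over size of union."""
    if not left and not right:
        return 0.0
    return len(left & right) / len(left | right)


def food_preference_score(a: Profile, b: Profile) -> float:
    return jaccard(a.get("foods", set()), b.get("foods", set()))


def hobby_score(a: Profile, b: Profile) -> float:
    return jaccard(a.get("hobbies", set()), b.get("hobbies", set()))


def combined_score(modules: List[ScoreModule], a: Profile, b: Profile) -> float:
    """Weighted average of every module's score for the pair (a, b)."""
    total_weight = sum(module.weight for module in modules)
    if total_weight == 0:
        return 0.0
    weighted = sum(module.weight * module.score(a, b) for module in modules)
    return weighted / total_weight


MODULES = [
    ScoreModule("food_preferences", 1.0, food_preference_score),
    ScoreModule("hobbies", 1.0, hobby_score),
]

if __name__ == "__main__":
    alice = {"foods": {"sushi", "pizza"}, "hobbies": {"hiking"}}
    bob = {"foods": {"pizza", "tacos"}, "hobbies": {"hiking"}}
    print(round(combined_score(MODULES, alice, bob), 3))
```

Adding the 1000th module is mechanically trivial in a structure like this, which is exactly the problem described above: nothing in the code says whether a given module still fits what the business is actually trying to do.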
What exhausts me isn’t “falling behind.” It’s watching the profession collectively decide that the solution to uncertainty is to pile abstraction on top of abstraction until no one can explain what’s actually happening anymore.
This agentic arms race by C-suite know-nothings feels less like leverage and more like denial. We took a stochastic text generator, noticed it lies confidently, wipes entire databases and harddrives, and responded by wrapping it in managers, sub-agents, memories, tools, permissions, workflows, and orchestration layers so we don’t have to look directly at the fact that it still doesn’t understand anything.
Now we’re expected to maintain a mental model not just of our system, but of a swarm of half-reliable interns talking to each other in a language that isn’t executable, reproducible, or stable.
Work now feels duller than dishwater, enough to have forced me to career pivot for 2026.
I think AI-assisted programming may be having the opposite effect, at least for me.
I'm now incentivized to use less abstractions.
Why do we code with React? It's because synchronizing state between a UI and a data model is difficult and it's easy to make mistakes, so it's worth paying the React complexity/page-weight tax in order for a "better developer experience" that allows us to build working, reliable software with less typing of code into a text editor.
If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
How often have you dropped in a big complex library like Moment.js just because you needed to convert a time from one format to another, and it would take too long to hand-write that one feature (and add tests for it to make sure it's robust)? With an LLM that's a single prompt and a couple of minutes of wait.
Using LLMs to build black box abstraction layers is a choice. We can choose to have them build LESS abstraction layers for us instead.
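A rough illustration of the kind of single feature in question, assuming all that is needed is an ISO 8601 timestamp rendered as `YYYY-MM-DD HH:mm` in UTC. `formatUtc` and `pad2` are hypothetical names, and the sketch deliberately does no time zone conversion beyond UTC.

```typescript
// Left-pad a number to two digits without relying on newer string helpers.
function pad2(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// Hypothetical helper: render an ISO 8601 timestamp as "YYYY-MM-DD HH:mm" in UTC.
// Assumes four-digit years; anything the built-in Date cannot parse is rejected.
function formatUtc(iso: string): string {
  const d = new Date(iso);
  if (isNaN(d.getTime())) {
    throw new Error("Invalid timestamp: " + iso);
  }
  return (
    d.getUTCFullYear() + "-" + pad2(d.getUTCMonth() + 1) + "-" + pad2(d.getUTCDate()) +
    " " + pad2(d.getUTCHours()) + ":" + pad2(d.getUTCMinutes())
  );
}
```

The trade-off is that the moment the requirement grows to arbitrary time zones or locales, this stops being a few lines.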
> If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
I've had plenty of junior devs justify massive code bases of random scripts and 100+ line functions with the same logic. There's a reason senior devs almost always push back on this when it's encountered.
Everything hinges on that "if". But you're baking a tautology into your reasoning: "if LLMs can do everything we need them to, we can use LLMs for everything we need".
The reason we stop junior devs from going down this path is because experience teaches us that things will break and when they do, it will incur a world of pain.
So "LLM as abstraction" might be a possible future, but it assumes LLMs are significantly more capable than a junior dev at managing a growing mess of complex code.
This is clearly not the case with simplistic LLM usage today. "Ah! But you need agents and memory and context management, etc!" But all of these are abstractions. This is what I believe the parent comment is really pointing out.
If AI could do what we originally hoped it could, that is, follow simple instructions to solve complex tasks, we'd be great, and I would agree with your argument. But we are very clearly not in that world. Especially since Karpathy can't even keep up with the sophisticated machinery necessary to properly orchestrate these tools. All of the people decrying "you're not doing it right!" are emphatically proving that LLMs cannot perform these tasks at the level we need them to.
I'm not arguing for using LLMs as an abstraction.
I'm saying that a key component of the dependency calculation has changed.
It used to be that one of the most influential facts affecting your decision to add a new library was the cost of writing the subset of code that you needed yourself. If writing that code and the accompanying tests represented more than an hour of work, a library was usually a better investment.
If the code and tests take a few minutes those calculations can look very different.
Making these decisions effectively and responsibly is one of the key characteristics of a senior engineer, which is why it's so interesting that all of those years of intuition are being disrupted.
The code we are producing remains the same. The difference is that a senior developer may have written that function + tests in several hours, at a cost of thousands of dollars. Now that same senior developer can produce exactly the same code at a time cost of less than $100.
> The reason we stop junior devs from going down this path is because experience teaches us that things will break and when they do, it will incur a world of pain.
Hyperbole. It's also very often a "world of pain" with a lot of senior code.
> All of the people decrying "you're not doing it right!" are emphatically proving that LLMs cannot perform these tasks at the level we need them to.
the people are telling you “you are not doing it right!” - that’s it, there is nothing to interpret in addition to this basic sentence
> "LLM as abstraction" might be a possible future, but it assumes LLMs are significantly more capable than a junior dev at managing a growing mess of complex code.
Ignoring for a second that they actually already are, it doesn’t matter because the cost of rewriting the mess drops by an order of magnitude with each frontier model release. You won’t need good code because you’ll be throwing everything away all the time.
I've yet to understand this argument. If you replace a brown turd with a yellowish turd, it'll still be a turd.
> things will break and when they do, it will incur a world of pain
How much of this is still true, and how much is exaggerated, in today's environment where the cost of making things is near 0?
I think “Evolution” would say that the cost of producing is near 0, so the possibility of creating what we want is high. The cost of trying again is low, so mistakes and pain aren’t super high. For really high-stakes situations (which most situations are not), bring the expert human into the loop until the expert better than that human is an AI.
I'm sorry, but I don't agree.
Look at the dependency hell that is modern development: just how wide the openings are for supply chain attacks, and seemingly every other week we get a new RCE.
I'd rather have 100 loosely coupled scripts peer reviewed by half a dozen LLM agents.
Why would I want to maintain in perpetuity random snippets when a library exists? How is that an improvement?
Right there with you.
I'm instructing my agents to do old-school, boring form POSTs, SSR templates, and vanilla JS / CSS.
I previously shifted away from this to abstractions because typing all the boilerplate was tedious.
But now that I'm not typing, the tedious but simple approach is great for the agent writing the code, and great for the people doing code reviews.
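For anyone who has not written this style in a while, the "boring" approach is roughly the following. It is a minimal sketch with no framework: `escapeHtml`, `parseFormBody`, and `renderCommentPage` are hypothetical names, the `/comments` route is made up, and wiring these into an actual HTTP server is left out.

```typescript
// Escape user-supplied text before interpolating it into an HTML template.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Decode one urlencoded component ("+" means space in form bodies).
function decodeComponent(s: string): string {
  return decodeURIComponent(s.replace(/\+/g, " "));
}

// Parse an application/x-www-form-urlencoded body, as sent by a plain form POST.
function parseFormBody(body: string): Record<string, string> {
  const fields: Record<string, string> = {};
  for (const pair of body.split("&")) {
    if (pair === "") continue;
    const eq = pair.indexOf("=");
    const rawKey = eq === -1 ? pair : pair.slice(0, eq);
    const rawValue = eq === -1 ? "" : pair.slice(eq + 1);
    fields[decodeComponent(rawKey)] = decodeComponent(rawValue);
  }
  return fields;
}

// Server-side template: the whole page is rendered as a string on every request.
function renderCommentPage(comments: string[]): string {
  const items = comments.map((c) => `<li>${escapeHtml(c)}</li>`).join("");
  return `<!doctype html>
<title>Comments</title>
<ul>${items}</ul>
<form method="post" action="/comments">
  <input name="comment" required>
  <button>Post</button>
</form>`;
}
```

There is no client-side state to synchronize here: the POST handler parses the body, stores the comment, and renders the page again, which is also what makes the diff easy to review.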
> I'm now incentivized to use less abstractions.
I'm incentivised to use abstractions that are harder to learn, but execute faster or more safely once compiled. E.g. more Rust, Lean.
> If an LLM is typing that code - and it can maintain a test suite that shows everything works correctly - maybe we don't need that abstraction after all.
LLMs benefit from abstractions the same way as we do.
LLMs currently copy our approaches to solving problems and copy all the problems those approaches bring.
Letting LLMs skip all the abstractions is about as likely to succeed as genetic programming is efficient.
For example, writing more vanilla JS instead of React, you're just reinventing the necessary abstractions more verbosely and with a higher risk of duplicate code or mismatching abstractions.
In a recent interview with Bret Weinstein, a former professor of evolutionary biology, he proposed that one property of evolution that makes the story of one species evolving into another more likely is that it's not just random permutations of single genes; it's also permutations to counter variables encoded as telomeres and possibly microsatellites.
https://podcasts.happyscribe.com/the-joe-rogan-experience/24...
Bret compares this to flipping random bits in a program to make it work better vs. tweaking variables randomly in a high-level language. Mutating parameters at a high-level for something that already works is more likely to result in something else that works than mutating parameters at a low level.
So I believe LLMs benefit from high abstractions, like us.
We just need good ones; and good ones for us might not be the same as good ones for LLMs.
> For example, writing more vanilla JS instead of React, you're just reinventing the necessary abstractions more verbosely and with a higher risk of duplicate code or mismatching abstractions.
Right, but I'm also getting pages that load faster and don't require a build step, making them more convenient to hack on. I'm enjoying that trade-off a lot.
For Moment, you can use `date-fns` and tree-shake.
I'd rather have LLMs build on top of proven, battle-tested production libraries than keep writing their own from scratch. You're going to fill up context with all of its re-invented wheels when it already knows how to use common options.
Not to mention that testing things like this is hard. And why waste time (and context and complexity) for humans and LLMs trying to do something hard like state syncing when you can focus on something else?
Every dependency carries a cost. You are effectively outsourcing part of the future maintenance of your project to an external team.
This can often be a very solid bet, but it can also occasionally backfire if the library you chose falls out of date and is no longer maintained.
For this reason I lean towards fewer dependencies, and have a high bar for when a dependency is worth adding to a project.
I prefer a dozen well vetted dependencies to hundreds of smaller ones that each solve a problem that I could have solved effectively without them.
For smol things like left-pad, sure but the two examples given (moment and react) solve really hard problems. If I were reviewing a PR where someone tried to re-implement time zone handling in JS, that’s not making it through review.
In JS, the DOM and time zones are some of the most messed up foundations you’re building on top of ime. (The DOM is amazing for documents but not designed for web apps.)
I think we really need to be careful about adding dependencies that we’re maintaining ourselves, especially when you factor in employee churn and existing options. Unless it’s the differentiator for the business you’re building, my advice to engineers is to strongly consider other options and have a case for why they don’t fit.
AI can play into the engineering blind spot of building it ourselves because it’s fun. But engineering as a discipline requires restraint.
> Why do we code with React?
...is a loaded question, with a complex and nuanced answer. Especially when you continue:
> it's worth paying the React complexity/page-weight tax
All right; then why do we code in React when a smaller alternative, such as Preact, exists, which solves the same problem, but for a much lower page-weight tax?
Why do we code in React when a mechanism to synchronize data with tiny UI fragments through signals exists, as exemplified by Solid?
Why do people use React to code things where data doesn't even change, or changes so little that to sync it with the UI does not present any challenge whatsoever, such as blogs or landing pages?
I don't think the question 'why do we code with React?' has a simple and satisfactory answer anymore. I am sure marketing and educational practices play a large role in it.
Yeah, I share all of those questions.
My cynical answer is that most web developers who learned their craft in the last decade learned frontend React-first, and a lot of them genuinely don't have experience working without it.
Which means hiring for a React team is easier. Which means learning React makes you more employable.
I'd rather use React than a bespoke solution created by an ephemeral agent, and I'd rather self-trepanate than use React
Our industry wants disruption, speed, delivery! Automatic code generation does that wonderfully.
If we wanted safety, stability, performance, and polish, the impact of LLMs would be more limited. They have a tendency to pile up code on top of code.
I think the new tech is just accelerating an already existing problem. Most tech products are already rotting; take a look at Windows or iOS.
I wonder what it will take for a significant turning point in this mentality.
disruption is a code word for deregulation, and deregulation is bad for everyone except execs and investors
It’s wild that programmers are willing to accept less determinism.
It's not something that suddenly changed. "I'll generate some code" is as nondeterministic as "I'll look for a library that does it", "I'll assign John to code this feature", or "I'll outsource this code to a consulting company". Even if you write yourself, you're pretty nondeterministic in your results - you're not going to write exactly the same code to solve a problem, even if you explicitly try.
Unlike code generation, all the other examples share one main advantage: alignment between your objective and their actions. With a good enough incentive, they may as well be deterministic.
When you order home delivery, you don’t care who delivers it or how. Only the end result matters. And we’ve ensured that reliability is good enough that failures are accidents, not a common occurrence.
Code generation is not reliable enough to have the same quasi deterministic label.
It's wild that management would be willing to accept it.
I think that for some people it is harder to reason about determinism because it is similar to correctness, and correctness can, in many scenarios be something you trade off - for example in relation to scaling and speed you will often trade off correctness.
If you do not think clearly about the difference between determinism and similar properties like (real-time) correctness, which you might be willing to trade off, you might think that trading off determinism is just more of the same.
Note: I'm against trading off determinism, but I am willing to think there might be a reason to trade it off, just I worry that people are not actually thinking through what it is they're trading when they do it.
Management is used to nondeterminism, because that’s what their employees always have been.
Determinism requires formality (enactment of rules) and some kind of omniscience about the system. Both are hard to acquire. I’ve seen people trying hard not to read any kind of manual and failing to reason logically even when given hints about the solution to a problem.
I think those that are most successful at creating maintainable code with AI are those that spend more time upfront limiting the nondeterminism aspect using design and context.
Mortgages don't pay for themselves.
This gets repeated all the time, but it’s total nonsense. The output of an LLM is fixed just as the output of a human is.
> It’s wild that programmers are willing to accept less determinism.
It's wild that you think programmers is some kind of caste that makes any decisions.
You can have the best of both worlds if you use structured/constrained generation.
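For readers unfamiliar with the term: constrained generation restricts each decoding step to the tokens a schema or grammar allows, so the output is always well-formed even though the choice among the allowed tokens is still the model's. A toy sketch of one such step, with a hypothetical `pickConstrained` function and made-up scores; real implementations work on model logits and a grammar, not a hand-written list.

```typescript
// Toy constrained decoding step: return the highest-scoring token among
// those the (hypothetical) grammar allows at this position.
function pickConstrained(scores: Record<string, number>, allowed: string[]): string {
  let best: string | null = null;
  let bestScore = -Infinity;
  for (const token of allowed) {
    const s = scores[token];
    if (s === undefined) continue; // token has no score, skip it
    if (best === null || s > bestScore) {
      best = token;
      bestScore = s;
    }
  }
  if (best === null) {
    throw new Error("No allowed token has a score");
  }
  return best;
}
```

If the model prefers `maybe` but the schema only permits `true` or `false`, this step returns whichever of the two scores higher, never `maybe`.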
There has always been a laissez-faire subset of programmers who thrive on living in the debugger, getting occasional dopamine hits every time they remove any footgun they previously placed.
I cannot count the times that I've had essentially this conversation:
"If x happens, then y, and z, it will crash here."
"What are the odds of that happening?"
"If you can even ask that question, the probability that it will occur at a customer site somewhere sometime approaches one."
It's completely crazy. I've had variants on the conversation from hardware designers, too. One time, I was asked to torture a UART, since we had shipped a broken one. (I normally build stuff, but I am your go-to whitebox tester, because I hone in on things that look suspicious rather than shying away from them.) When I was asked the inevitable "Could that really happen in a customer system?" after creating a synthetic scenario where the UART and DMA together failed, my response was:
"I don't know. You have two choices. Either fix it where the test passes, or prove that no customer could ever inadvertently recreate the test conditions."
He fixed it, but not without a lot of grumbling.
My dad worked in the auto industry and they came across a defect in an engine control computer where they were able to give it something like 10 million to one odds of triggering.
They then turned the thing on, it ran for several seconds, encountered the error, and crashed.
Oh, that's right, the CPU can do millions of things a second.
Something I keep in the back of my mind when thinking about the odds in programming. You need to do extra leg work to make sure that you're measuring things in a way that's practical.
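The arithmetic behind that anecdote, as a small sketch. The one-in-ten-million odds are from the story; the rate of one million chances per second is an assumed round number, and `probabilityOfAtLeastOne` and `expectedSecondsToFirstFailure` are hypothetical helpers that treat every operation as an independent trial.

```typescript
// Probability of at least one failure in `trials` independent attempts,
// each failing with probability p: 1 - (1 - p)^trials.
// log1p/expm1 keep the result accurate when p is tiny.
function probabilityOfAtLeastOne(p: number, trials: number): number {
  return -Math.expm1(trials * Math.log1p(-p));
}

// Expected wait until the first failure: on average 1/p trials are needed,
// and the system performs `trialsPerSecond` of them every second.
function expectedSecondsToFirstFailure(p: number, trialsPerSecond: number): number {
  return 1 / (p * trialsPerSecond);
}

const oddsPerOperation = 1e-7;   // "10 million to one", from the anecdote
const operationsPerSecond = 1e6; // assumed rate, not from the anecdote

const expectedWait = expectedSecondsToFirstFailure(oddsPerOperation, operationsPerSecond);
const chanceInTenSeconds = probabilityOfAtLeastOne(oddsPerOperation, 10 * operationsPerSecond);
```

With those assumed numbers the expected wait is about 10 seconds, and the chance of hitting the defect within the first 10 seconds is roughly 63%. Odds quoted per operation say very little until they are multiplied by how many operations actually run.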
I mean we've had to cope with users for ages, this is not that different.
The good ones don't accept. Sadly there's just many more idiots out there trying to make a quick buck
Delving a bit deeper... I've been wondering if the problem's related to the rise in H1B workers and contractors. These programmers have an extra incentive to avoid pushing back on c-suite/skip level decisions - staying out of in-office politics reduces the risk of deportation. I think companies with a higher % of engineers working with that incentive have a higher risk of losing market share in the long-term.
My work is better than it has been for decades. Now I can finally think and experiment instead of wasting my time on coding nitty-gritty detail, impossible to abstract. Last autumn was the game changer, basically Codex and later Opus 4.5; the latter is good with any decent scaffolding.
I have to admit, LLMs do save a lot of typing and the associated syntax errors. If you know what you want and can spot and fix mistakes made by the LLM then they can be pretty useful. I don’t think it’s wise to use them for development if you are not knowledgeable enough in the domain and language to recognize errors or dead ends in the generated code though.
Could we all just agree to stop using the term "abstraction"? It's meaningless and confusing. It's cover for a multitude of sins, because it really could mean anything at all. Don't lay all the blame on the c-suite; they are what they are, and have their own view. Don't moan about the latest egregious excess of some LLM. If it works for you, use it; if it doesn't, don't. But stop whinging.
What are you pivoting to?
I'm also interested in hearing this.
For me, I'm planning to ride out this industry for another couple years building cash until I can't stand it, then pivot to driving a city bus.
> then pivot to driving a city bus.
You seem to be counting on Waymo not obsoleting that occupation. ;)
Gardening and plumbing. Driving buses will be solved.
What are you pivoting to?
Don't forget you are expected to deliver x10 for the same pay, "because you have the AI now".
The system is designed to do exactly that. This is called ‘productivity increase’ and is deflationary in large doses. Deflation sounds good until you understand where it’s coming from.
So you're washing dishes now?
Every technical person has been complaining about this for the entire history of computer programming
Unless you’re writing literal memory instructions then you’re operating on between 4 and 10 levels of abstraction already as an engineer
It has never been tractable for humans to program a series of switches without an incredible number of abstractions
The vast majority of programmers never understood how computers work to begin with
This is true, though the people that actually push the field forward do know enough about every level of abstraction to get the job done. Making something (very important) horrible just to rush to market can be a pretty big progress blocker.
Jensen is someone I trust to understand the business side and some of those lower technical layers, so I'm not too concerned.
> It’s watching the profession collectively decide that the solution to uncertainty is to pile abstraction on top of abstraction until no one can explain what’s actually happening anymore.
The ubiquitous adoption of LLMs for generating code is mostly a sign of bad abstraction or the absence of abstraction, not the excess of abstraction.
And choosing/making the right abstraction is kind of the name of the game, right? So it's not abstraction per se that's a problem.
Wow - can we coin "Slopbrain" for people who are so far gone into AI eventualism that they can no longer function? Like "cooked" but "slopped" or something. Good grief lol. Talk about getting lost in the sauce...
WSJ has been writing increasingly about "AI Psychosis" (here's their most recent piece [0]).
I'm increasingly seeing that this is the real threat of AI. I've personally known people who have started to strain relationships with friends and family because they sincerely believe they are evolving into something new. While not as dramatic, the normalization of the use of "AI as therapist" is equally concerning. I know tons of people that rely on LLMs to guide them in difficult family decisions, career decisions, etc on an almost daily basis. If I'm honest, I myself have had times where I've leaned into this too much. I've also had times where AI starts telling me how clever I am, but thankfully a lifetime of low self worth signals warning flags in my brain when I hear this stuff! For most people, there is real temptation to buy into the praise.
Seeing Karpathy claim he can't keep up was shocking. It also immediately raises the question to anyone with a clear head: "Wait, if even Karpathy cannot use these tools effectively... just what is so useful about AI?" Isn't the entire point of AI that I can merely describe my problem and have a solution in a fraction of the time?
The fact that so many true believers in AI seem to forever be just a few more tricks away from really unleashing this power starts to make it feel very much like magical thinking on a huge scale.
The real danger of AI is that we're entering into an era of mass hallucination across multiple fields and areas of human activity.
0. https://www.wsj.com/tech/ai/ai-chatbot-psychosis-link-1abf9d...
> I've personally known people who have started to strain relationships with friends and family because they sincerely believe they are evolving into something new.
Cryptoboys did it first, please recognize their innovation ty
Cyberpunk was right!
I feel Karpathy is smart enough to deserve a less dismissive response than this.
A mix of "too clever by half" and "never meet your heroes".
You think we should appeal to authority rather than address the ideas on their own merits?
How is saying the author has “slopbrain” “addressing the idea on its own merits”? It’s just name calling.
They aren't addressing my comment (which is obviously an overreaction to the tweet); they're asking you why we should appeal to authority rather than evaluate whether Karpathy is completely overreacting and in way too deep.
Why do you feel that way?
(Not the grandparent poster, but answering for myself)
1. Name calling is cheap and does not inform or persuade.
2. Karpathy has a history of informing me of interesting things, is widely known for his talent for making complex ideas grokkable, and has contributed to the field of AI for more than a decade, very notably as a first class communicator of ideas to the public.
Note that (2) is appealing to Karpathy's credibility due to his previous work, and NOT to authority. I have a positive prior on his ideas due to my exposure to his previous ideas, and particularly his ability to explain things, and not due to who he is, or what job he has.
Twitter folks call this LLM or AI Psychosis.
> There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and ...
This sounds unbearable. It doesn't sound like software development, it sounds like spending a thousand hours tinkering with your vim config. It reminds me of the insane patchwork of sprawl you often get in DevOps - but now brought to your local machine.
I honestly don't see the upside, or how it's supposed to make any programmer worth their weight in salt 10x better.
As far as I can tell as a heavy coding agent user: you don’t need to know any of this and that’s a testament to how good code agent TUIs have become. All I do to be productive with a coding agent is tell it to break a problem down into tasks, store it inside beads, and then make sure each step is approved by me. I also add in a TDD requirement where it needs to build tests that fail then eventually pass.
Everything else I’ve used has been over engineered and far less impactful. What I just said above is already what many of us do anyway.
This sounds like my complete and utter nightmare. No art or finesse in building the thing - only an exercise in torturing language to someone who at a fundamental level doesn't understand a thing.
Nothing stopping you from hand sculpting software like we did in the before times.
Mass production however won’t stop, it’s barely started literally a couple months ago and it’s the slowest and worst it’ll ever be.
I'm not viewing AI tooling as an extinction of the art of programming, only illuminating how telling an AI how to create programs isn't in the same universe as programming, where the technical skill to do such a thing is on par with punching in how long my microwave should nuke my popcorn.
Not really at all like this, more like being a tech lead for a team of savants who simultaneously are great at parts of software engineering, and limited at others. Though that latter category is slimmer than a year ago…
The point is, you can get lots of quality work out of this team if you learn to manage them well.
If that sounds like a “complete and utter nightmare”, then don’t use AI. Hopefully you can keep up without it in the long run.
> This sounds unbearable.
I can't see the original post because my browser settings break Twitter (I also haven't liked much of Karpathy's output), but I agree. I call this style of software development 'meeting-based programming,' because that seems to be the mental model that the designers of the tools are pursuing. This probably explains, in part, why c-suite/MBA airheads are so excited about the tools: meetings are how they think and work.
I suppose LLMs/chatbots and 'agents' are just the latest phase of a trend that the internet has been encouraging for decades: the elimination of mental privacy. I don't mean 'privacy' in an everyday sense -- i.e. things I keep to myself and don't share. I mean 'privacy' in a more basic sense: private experience -- sitting by oneself; having a mental space that doesn't include anybody else or invite the outside world; simply spending time with one's own thoughts.
The internet encourages us to direct our thoughts and questions outward: look things up; find out what others have said; go to wikipedia; etc. This is, I think, horribly corrosive to the very essence of being a thinking, sentient being. It's also unsurprising, I guess. Humans are social animals. We're going to find ourselves easily seduced by anything that lets us replace private experience with social experience. I suppose it was only a matter of time until someone did this with programming tools, too.
> ... or how it's supposed to make any programmer worth their weight in salt 10x better.
It doesn't. The only people I've seen claim such speedups are either not generally fluent in programming or stand to benefit financially from reinforcing this meme.
For every conspicuous vibecoding influencer there are a bunch of experienced software engineers using them to get things done. The newest generation of models are actually pretty decent at following instructions and using existing code as a template. Building line-of-business apps is much quicker with Claude Code because once you've nicely scaffolded everything you can just tell it to build stuff and it'll do so the same way you would have in a fraction of the time. You can also use it to research alternatives to architectural approaches and tooling that you come up with so that you don't paint yourself into a corner by having not heard about some semi-niche tool that fits your use case perfectly.
Of course I wouldn't use an LLM to #yolo some Next.js monstrosity with a flavor-of-the-week ORM and random Tailwind. I have, however, had it build numerous parts of my apps after telling it all about the mise targets and tests and architecture of the code that I came up with up front. In a way it vindicates my approach to software engineering because it's able to use the tools available to it to (reasonably) ensure correctness before it says it's done.
I am a professional engineer with around 10 years of experience and I use AI to work about 5x faster on a site I personally maintain (~100 DAU, so not huge, but also not nothing). I don’t work in AI so I get no financial benefit by “reinforcing this meme”.
Oh, well if it can generate some simple code for your personal website, surely it can also be the "next level of abstraction" for the entirety of software engineering.
Well, I don’t really think it’s “simple”. The code uses React, nodejs, realtime events pushed via SSE, infra pushed via Terraform, postgres, blob store on S3, emails sent with SES… sure, it’s not the next Google, but it’s a bit above, like, a personal blog.
And in any case, you are moving goalposts. OP said he had never seen anyone serious claim that they got productivity gains from AI. When I claim that, you say “well it’s not the next level of abstraction for all SWE”. Obviously - I never claimed that?
Our ops guy has thrown together several buggy dashboards using AI tools. They're passable but impossible to maintain.
I admit to pangs of this, but it's really never made any sense because the implication is that the profession is now magically closed off to newcomers.
Imagine someone in the 90s saying "if you don't master the web NOW you will be forever behind!" and yet 20 years later kids who weren't even born then are building web apps and frameworks.
Waiting for it to all shake out and "mastering" it then is still a strategy. The only thing you'll sacrifice is an AI funding lottery ticket.
Finally a voice of reason. The tools will just get better and easier to use. I use LLMs now, but I'm not going to dump a bunch of time learning the new hotness. I'll let other people do that and pickup the useful pieces later.
Unless you're gunning for a top position as a vibe coder, this whole concept of "falling behind" is just pure FOMO.
Eh, for myself as a middle-aged software engineer, it feels a little like the last chopper out of Saigon. I feel less and less confident that I can make as good a living in software for the next decade as I have for the last couple. Or if I want to. The job is changing so fast right now, and I’m not sure I like it. When I worked in big tech, I preferred being an IC over an EM or tech lead because I like writing code. Now it feels increasingly like you can’t be an IC in that way anymore. You’re now coding through others, either humans or AI.
Sure, I can write code manually, but in my case I’m working full time on my own SaaS and I am absolutely faster and more effective with AI. It’s not even close. And the gains are so extreme that I can’t justify writing beautiful hand-crafted artisanal code anymore. It turns out that code that’s “good enough” will do, and that’s all I can afford right now.
But long-term, I don’t know that I want to do that work, especially for some corporation. It feels like the difference between being a master furniture craftsman, and then going to work in an IKEA factory.
I am a software developer and mainly a programmer for decades now. I love programming. I love to be "once" with the computer. I will never give this joy up. If I need to sell shoes at daytime, I will program real computer programs in the evenings. If it won't be possible with modern machinery anymore, I will take my Commodore 64. I am a free man.
Edit: Corrected since/for. :-)
you mean "one" not "once" right?
(for decades)
('since' takes time_point - 'for' takes time_duration)
My company takes between Christmas and New Years off. I took a week before that off too. I have not used AI in that time. The slower pace of life is amazing. But when I get back to coding it will be back to running at 180%. It’s the new norm. However I’ve decided to take longer “no computer” breaks in my day. I have to adapt but I need to defend my “take it slow” times and find some analogue hobbies. The shift is real and you can’t wind it back.
I’ve been taking my son for stroller walks more often over Christmas. I bring a headset for listening to music, podcasts, audiobooks, tech talks. “Be effective.” But I end up just walking and thinking, realising this is “free time”.
It sounds ridiculous and easy to say, but spending time walking and thinking will improve your decisions and priorities in a way that no productivity hack will.
I only actually did slow down for a while because I had to for the well-being of my family. Sure feels important to not always be on top of everyone else’s business.
As an Opus user, I genuinely don’t understand how someone can work for weeks or months without regularly opening an IDE. The output almost always fails.
I repeatedly rewrite prompts, restate the same constraints, and write detailed acceptance criteria, yet still end up with broken or non-functional code. It's very frustrating, to say the least. Yesterday alone I spent about $200 on generations that now require significant manual rewrites just to make them work.
At that point, the gains are questionable. My biggest success is having the model take over the first design in my app and I take it from there, but those hundreds if not thousands of lines of code it generates are so messy, it's insanely painful to refactor the mess afterwards
Sometimes I have a similar file or related files. I copy their names and say to use them as reference. Code quality improves by 10 times if you do so. Even providing an example from the framework's getting started guide works great for a new project.
Yeah, the pain of cleaning up a small mess is great too. I had some failing tests and type errors, and I thought I would fix them later using only AI prompts. As the codebase grew, the failing TypeScript issues grew too. At some point there were 5000+ type issues and a countless number of failing unit tests. Then more and more. I tried to fix them with AI, since fixing them the old way was not possible. Then I discarded the whole project when it was around 500k lines of code.
My trick is to explicitly role-play that we’re doing a spike. This gets all of the models to ignore all of the details they normally get hung up on. Once I have the basics in place, I can tell it to fix details.
It’s _always_ easier to add more code than it is to fix broken code.
What does your software creation workflow look like? Do you have a design phase?
I hardly ever open an IDE anymore.
I use Claude Code and Cursor. What I do:
- use statically typed languages: TypeScript, Go, Rust, Python w/ types
- Set up linters. For TS I have a bunch of custom lint rules (authored by AI) for common feedback that I've given. (https://github.com/shepherdjerred/monorepo/tree/main/package...)
- For Cursor, lots of feedback on my desired style. https://github.com/shepherdjerred/scout-for-lol/tree/main/.c...
- Heavy usage of plan mode. Tell AI something like "make at least 20 searches to online documentation", support every claim with a reference, etc. Tell AI "make a task for every little thing you'll implement"
- Have the AI write tests, particularly the more expensive ones like integration and end-to-end, so you have an easy way to verify functionality.
- Set up Claude Code GHA to automatically review PRs. Give the review feedback to the agent that implemented it, either via copy-pasting or by telling the agent "fetch review comments and fix them".
Some examples of what I've made:
- Many features for https://scout-for-lol.com/, a League of Legends bot for Discord
- A program to generate TypeScript types for Helm charts (https://github.com/shepherdjerred/homelab/tree/main/src/helm...)
- A program to summarize all of the dependency updates for my Homelab (https://github.com/shepherdjerred/homelab/tree/main/src/deps...)
- A program to manage multiple instances of CLI agents like Claude Code (https://github.com/shepherdjerred/monorepo/tree/main/package...)
- A Discord AI bot in the style of my friends (https://github.com/shepherdjerred/monorepo/tree/main/package...)
> make at least 20 searches to online documentation
Lol sometimes I have to spend two turns convincing Claude to use its goddamn search and look up the damn doc instead of trying to shoot from the hip for the fifth time. ChatGPT at least has forced search mode.
I've found that telling it to specifically do N searches works consistently. I do really wish Claude Code had a "deep research" mode similar to 'normal' Claude.
This is what an AGENTS.md - https://agents.md/ (or CLAUDE.md) file is for. Put common constraints to correct model mistakes/issues with respect to the codebase, e.g. in a “code style” section.
Why would you spend $200 a day on Opus if you can pay that for a month via the highest tier Claude Max subscription? Are you using the API in some special way?
At a guess an Enterprise API account. Pay per token but no limits.
It’s very easy to spend $100s per dev per day.
The $200/month plan doesn't have limits either - they have an overage fee you can pay now in Claude Code so once you've expended your rate limited token allowance you can keep on working and pay for the extra tokens out of an additional cash reserve you've set up.
> The $200/month plan doesn't have limits either... once you've expended your rate limited token allowance... pay for the extra tokens out of an additional cash reserve you've set up
You're absolutely right! Limited token allowance for $200/month is actually unlimited tokens when paying for extra from a cash reserve which is also unlimited, of course.
I think you may have misunderstood something here.
When paying for Claude Max even at $200/month there are limits - you have a limit to the number of tokens you can use per five hour period, and if you run out of that you may have to wait an hour for the reset.
You COULD instead use an API key and avoid that limit and reset, but that would end up costing you significantly more since the $200/month plan represents such a big discount on API costs.
As of a few weeks ago there's a third option: pay for the $200/month plan but allow it to charge you extra for tokens when you reach those limits. That gives you the discount but means your work isn't interrupted.
Extra Usage for Paid Claude Plans: https://support.claude.com/en/articles/12429409-extra-usage-...
Thank you for the explanation, but I did fully understand that is what you were saying.
What I don't fully understand is how you can characterize that as "not limited" with a straight face; then again, I can't see your face so maybe you weren't straight faced as you wrote it in the first place.
Hopefully you could see my well meaning smile with the "absolutely right" opening, but apparently that's no longer common so I can understand your confusion as https://absolutelyright.lol/ indicates Opus 4.5 has had it RLHF'd away.
I’ve had decent results from it. What programming language are you using?
https://xcancel.com/karpathy/status/2004607146781278521
> OpenAI's sales and marketing expenses increased to _$2 billion_ in the first half of 2025.
Looks like AI companies spend enough on marketing budgets to create the illusion that AI makes development better.
Let's wait one more year, and perhaps everyone who didn't fall victim to these "slimming pills" for developers' brains will be glad about the choice they made.
Well. I was a sceptic for a long time, but a friend recently convinced me to try Claude Code and showed me around. I revived an open source project I regularly get back to, code for a bit, have to wrestle with toil and dependency updates, and lose the joy before I really get a lot done, so I stop again.
With Claude, all it took to fix all of that drudge was a single sentence. In the last two weeks, I implemented several big features, fixed long standing issues and did migrations to new major versions of library dependencies that I wouldn’t have tackled at all on my own—I do this for fun after all, and updating Zod isn’t fun. Claude just does it for me, while I focus on high-level feature descriptions.
I’m still validating and tweaking my workflow, but if I can keep up that pace and transfer it to other projects, I just got several times more effective.
This sounds to me like a lack of resource management, as tasks that junior developers might perform don't match your skills, and are thus boring.
As a creator of an open-source platform myself, I find trusting a semi-random word generator in front of users unreliable.
Moreover, I believe it creates a bad habit. I've seen developers forget how to read documentation and instead trust AI, and of course, as a result AI makes mistakes that are hard to debug or provokes security issues that are easy to overlook.
I know this sounds like a luddite talking, but I'm still not convinced that AI in its current state can be reliable in any way. However, because of engineers like you, AI is learning to make better choices, and that might change in the future.
That’s a totally fair take IMHO, and I’m very much conflicted on several ends on this topic. For example, would I want my juniors to use an agent? No; not even the mid levels, probably. As you say, it’s easy to form bad habits, and you need a good intuition for architecture and complexity, otherwise you end up with broken, unmaintainable messes. But if you have that, it’s like magic.
> Let's wait one more year, and perhaps everyone who didn't fall victim to these "slimming pills" for developers' brains will be glad about the choice they made.
In that year, AI will get better. Will you?
AI is only getting better at consuming energy and wasting people's time communicating with this T9. However, if talented engineers continue to use it, it might eventually provide more accurate replies as a result.
Answering your question, no matter how much I personally degrade or improve, I will not be able to produce anything even remotely comparable in terms of negative impact that AI brings to humanity these days.
> strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering
Sounds fever dreamish. Thank you sincerely (not) for creating it!
Honestly surprised at this take by him. For one, feels like exaggeration. For two, are these tools really that hard to use?
Does it bother any of you that you now have to pay money in order to do your job? I mean AI model subscriptions. Somehow it feels wrong for me to pay for tools that are trying to replace me.
Your employer is not paying for these things?
IDEs used to be extremely expensive back in the 1990s. IDEs such as Microsoft Visual Studio and IBM's VisualAge for Java were quite expensive subscriptions, as I recall. Subsequently, open source IDEs like Eclipse and Visual Studio seem to have become the norm.
Visual Studio has never been open source, though some of the underlying build tools and compilers are.
Visual Studio Code is a different thing... and claims to be open source, but by intent and approach really is closer to source available.
Between subscription software and subscription AI and the rising prices of computer hardware, the idea of a "personal computer" is quickly dying.
Not for me.
paying to train* and fund the research for the tools to replace us
For the longest time, the joy of creation in programming came from solving hard problems. The pursuit of a challenge meant something. Now, that pursuit seems to be short-circuited by an animated being racing ahead under a different set of incentives. I see a tsunami at the beach, and I’m not sure whether I can run fast enough.
I see it more like playing a text adventure game. You give it commands and sometimes it works, and sometimes the results are unexpected.
Not to mention many companies speedrunning systems of strange and/or perverse incentives with AI adoption.
That being said, Welch’s grape juice hasn’t put Napa valley out of business. Human taste is still the subjective filter that LLMs can only imitate, not replace.
I view LLM assisted coding (on the sliding scale from vibe coding to fancy auto complete) similar to how Ableton and other DAW software have empowered good musicians that might not have made it otherwise due to lack of connections or money, but the music industry hasn’t collapsed completely.
In the music world, I would say that, rather than DAWs, LLM-assisted coding is more like LLM-assisted music creation.
Yep, DAWs aren’t the comparison. People are not thinking deeply about what is going on: there is a big war ongoing to eradicate taste and make it systematic, to the immense benefit of the few.
> I can run fast enough.
Can you do some code reviews while you're running?
(Inception scene) here a minute is seven hours
And there it is again, the "powerful alien tool" that was just "handed to us".
No decades of research and massive allocation of resources over the last few years as well as very intentional decision making by tech leadership to develop this specific technology.
Nope, it just mysteriously dropped from the sky one day.
The point is that all that research mostly doesn’t help in mastering the tool. Unlike traditional tools, it doesn’t come with an instruction manual. It’s like an alien tool just handed to us in exactly that sense.
Do you know who the author is?
It’s written in the title of the post: “Andrej Karpathy”. He’s fairly well known in AI circles; he was head of Autopilot at Tesla and co-founded OpenAI. If you’re curious to learn more about him, the Wikipedia page has a short summary: https://en.wikipedia.org/wiki/Andrej_Karpathy
It is even worse coming from him.
Yes, and I'm disappointed he seems to have joined the AI mysticism crowd.
I have never felt this much ahead as a programmer. So many developers I see, including at my workplace, are blindly prompting models hoping to solve their problem and failing every step of the way. The people who truly understand what is happening are still in the ruling class, and their skills are not going to be irrelevant anytime soon.
Not sure what you mean by blindly prompting models
100% - I can't believe there are smart people in this conversation that don't see this.
If you don't understand AWS you can't vibe code a Terraform codebase that creates a complex infrastructure, etc.
Countdown to his youtube course explaining it all for beginners commences...
His "youtube course" already exists, and it's absolutely transformational.
He's working on a more formal educational framework/service of some kind, which will presumably not be free, but what he's already posted is some of the most effective CS pedagogy I've ever encountered (and personally benefited from.)
> Clearly some powerful alien tool was handed around except it comes with no manual
Using tools before their manual exists is the oldest human trick, not the newest.
I love that Agile and Scrum are still unmentioned. Can we stick a fork in them yet?
Don’t you do retrospectives with your coding agents?
No, no, no.
We need to have a scrum with 3 agents each from the top 4 AI vendors, with each agent adhering to instructions given by a different programmer.
It's kind of like Robot Wars, except the damage is less physical and more costly.
The person saying this has a financial interest in saying so.
If Karpathy feels behind, imagine how we regular folks feel
I've worked really hard over the last year at working out how to use these things, and it has more than paid off.
But I think if I had started learning today instead of a year ago, I'd get up to speed in more like 6 months instead of a year. A lot of stuff I learned a year ago is not really necessary anymore, but furthermore, there's just a lot more information out there about how to use these from people who have been learning it on their own.
I just don't think people who have ignored it up until now are really that far behind.
The thing that always trips me up is the lack of isolation/sandboxing that all of the AI programming tools provide. I want to orchestrate a workforce of agents, but they can't be trusted not to run amok.
Does anyone have a better way to do this other than spinning up a cloud VM to run goose or claude or whatever poorly isolated agent tool?
I have seen Claude disable its sandbox. Here is the most recent example from a couple of weeks ago while debugging Rust: "The panic is due to sandbox restrictions, not code errors. Let me try again with the sandbox disabled:"
I have since added a sandbox around my ~/dev/ folder using sandbox-exec in macOS. It is a pain to configure properly but at least I know where the sandbox is controlled.
That refers to the sandbox "escape hatch" [1]. Running a command without a sandbox is a separate approval, so you get another prompt even if that command has been pre-approved. Their system prompt [2] is too vague about what kinds of failures the sandbox can cause; in my experience the agent always jumps straight to disabling the sandbox if a command fails. Probably best to disable the escape hatch and deal with failures manually.
[1] https://code.claude.com/docs/en/sandboxing#configure-sandbox...
[2] https://github.com/Piebald-AI/claude-code-system-prompts/blo...
I'm working on a solution [0] for this. My current approach is:
1. Create a new Git worktree
2. Create a Docker container w/ bind mount
3. Provide an interface for easily switching between your active worktrees/containers.
For credentials, I have an HTTP/HTTPS mitm [1] that runs on the host with creds, so there are zero secrets in the container.
The end goal is to be able to manage, say, 5-10 Claude instances at a time. I want something like Claude Code for Web, but self-hosted.
[0]: https://github.com/shepherdjerred/monorepo/tree/main/package...
[1]: https://github.com/shepherdjerred/monorepo/pull/156
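The worktree-plus-container steps above can be sketched roughly as follows. This is only an illustration of the idea, not the linked project's implementation: the function names, the `/workspace` mount point, and the image name are hypothetical, and the code only builds the `git worktree add` and `docker run` command lines instead of executing them, so it can be inspected before anything touches the host.

```python
from pathlib import PurePosixPath


def worktree_command(repo: PurePosixPath, branch: str, root: PurePosixPath) -> list[str]:
    """Build the `git worktree add` call that gives one agent its own checkout."""
    target = root / branch
    return ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(target)]


def container_command(worktree: PurePosixPath, image: str) -> list[str]:
    """Build a `docker run` call that bind-mounts only this worktree.

    The /workspace mount point and the image are assumptions for illustration.
    No credentials are passed in, matching the "zero secrets in the container" goal.
    """
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{worktree}:/workspace",
        "-w", "/workspace",
        image,
    ]


def plan(repo: PurePosixPath, branches: list[str], root: PurePosixPath, image: str) -> dict[str, list[list[str]]]:
    """Map each agent branch to the two commands that would set it up."""
    return {
        branch: [
            worktree_command(repo, branch, root),
            container_command(root / branch, image),
        ]
        for branch in branches
    }


if __name__ == "__main__":
    commands = plan(
        PurePosixPath("/home/me/project"),
        ["agent-1", "agent-2"],
        PurePosixPath("/home/me/worktrees"),
        "my-agent-image:latest",  # hypothetical image name
    )
    for branch, steps in commands.items():
        for step in steps:
            print(branch, "->", " ".join(step))
```

Keeping command construction separate from execution makes it easy to review exactly what each agent will be able to see before running it with something like `subprocess.run`.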
If they cannot be trusted, why would you use them in the first place?
For the same reason you'd build a fire.
Obviously people perceive value there, but on the surface it does seem odd.
"These things are more destructive than your average toddler, so you need to have a fence in place kind of like that one in Jurassic Park, except you need to make sure it absolutely positively cannot be shut off, but all this effort is worthwhile, because, kind of like civets, some of the artifacts they shit out while they are running amok appear to have some value."
It’s shocking, the collective shrug I get from our security people at work. I attend pretty serious meetings about genAI implementations, and when I ask about points of view around security, given that things as crazy as “adversarial poetry” are real, I just get shrugs. I get the feeling they don’t want to be the ones to say “no, don’t bring genai to our clients” but also won’t dare say “yes, our client’s data is safe with integrated genai”.
Love the mix of metaphors.
I run them inside a sandbox https://github.com/ashishb/amazing-sandbox
Mind you he is in the industry, and founding a company whose success depends on this stuff.
He meant to post that from his alt account 'regularcoderguy'
I’m convinced much of this is all noise - people seem to be focusing on the wrong unit of analysis. Producing software and lots of it has never been a problem - coming up with the right projects and producing a vertically differentiated product to what already exists is.
That's true. The noise is being generated by people who are directly or indirectly incentivized to talk about it.
> coming up with the right projects and producing a vertically differentiated product to what already exists is.
Agreed but not all engineers are involved with this aspect of the business and the concern applies to them.
Being a nondeterministic tool, the output for a given input can vary. Rather than having a solid plan of, "if I provide this input, then that will happen", it's more like, "if I do something like this, I can expect something like that, probably, and if not, then try again until it works, I suppose".
What are the productivity gains? Obviously, it must vary. The quality of the tool output varies based on numerous criteria, including what programming language is being used and what problem is trying to be solved. The fact that person A gets a 10x productivity increase on their project does not mean that person B will also get a 10x productivity increase on their project, no matter how well they use the tool.
But again, tool usage itself is variable. Person A themselves might get a 10x boost one time, and 8x another time, and 4x another time, and 2x another time.
Non determinism does not imply non correctness. You can have the LLM do 10 different outputs, but maybe all 10 are valid solutions. Some might be more optimal in certain situations, and some might appeal to different people aesthetically.
Nondeterminism indeed does not imply non-correctness.
All ten outputs might be valid. All ten will almost certainly be different -- though even that is not guaranteed.
The OP referred to the notion of there being no manual; we have to figure out how to use the tool ourselves.
A traditional programming tool manual would explain that you can provide input X and expect output Y. Do this, and that will happen. It is not so clear-cut with AI tools, because they are -- by default, in popular configurations -- nondeterministic.
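To make the "nondeterministic but still correct" point concrete, here is a toy sketch. It does not call any real model: `generate` is a hypothetical stand-in that randomly picks one of several hand-written sorting functions, and `is_correct` plays the role of the test suite that decides whether an output is acceptable.

```python
import random
from typing import Callable

Sorter = Callable[[list[int]], list[int]]


def insertion_sort(xs: list[int]) -> list[int]:
    out: list[int] = []
    for x in xs:
        i = len(out)
        # Walk left until the element before position i is not larger than x.
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def merge_sort(xs: list[int]) -> list[int]:
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged: list[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]


def builtin_sort(xs: list[int]) -> list[int]:
    return sorted(xs)


CANDIDATES: list[Sorter] = [insertion_sort, merge_sort, builtin_sort]


def generate(rng: random.Random) -> Sorter:
    """Stand-in for a nondeterministic tool: same request, varying answer."""
    return rng.choice(CANDIDATES)


def is_correct(sorter: Sorter, rng: random.Random, trials: int = 50) -> bool:
    """Check a candidate against random inputs, like a small test suite."""
    for _ in range(trials):
        data = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        if sorter(list(data)) != sorted(data):
            return False
    return True


if __name__ == "__main__":
    rng = random.Random()
    for attempt in range(5):
        candidate = generate(rng)
        print(attempt, candidate.__name__, is_correct(candidate, rng))
```

Which implementation you get varies from run to run, but every one passes the same check. What this sketch leaves out is the "no manual" part of the problem: with a real tool you do not know the candidate set in advance, so the verification step carries all of the weight.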
We are one functional output guarantee away from them being optimizing compilers.
Of course, we maybe never get there :)
Why would one opt to use an LLM-based AI tool as a compiler? It seems that would be extraordinarily complex over traditional compilers, but for what benefit?
It would be, in its ideal state, a compiler from a vague problem to a concrete and robust implementation.
A star trek replicator for software.
Obviously we are nowhere near that, and we may never arrive. But this is the big bet.
>A star trek replicator for software
That's a very interesting way to put it.
The nondeterminism of AI feels like a compiler which will, on the same input code, spit out a different executable on every run. Fixing bugs will become more like a ritual to satisfy the whims of the machine spirit.
But how different? Compilers do, in fact, spit out different binaries with each run. There are timestamps and other subtle details embedded in them (esp compiler version and linking) that make the same source result in a different binary. "That's different"; "that's not the same thing!" I see you thinking. As long as the AI prompt "make me a login screen" results in a login screen appropriate for the rest of the code, and not "rm -rf ~/", does it matter if the indeterminism produces a login page with a Google login page before the email login button or after?
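A toy model of the timestamp point, as a sketch only: `toy_compile` is a hypothetical stand-in for a compiler that prepends a build timestamp to otherwise deterministic output. Real toolchains handle this differently (the reproducible-builds effort, for instance, pins the timestamp through the `SOURCE_DATE_EPOCH` environment variable), but the shape of the argument is the same: the bytes differ while the part that matters does not.

```python
import hashlib

HEADER_LEN = 8  # bytes reserved for the embedded build timestamp (toy format)


def toy_compile(source: str, build_time: int) -> bytes:
    """Pretend compiler: timestamp header plus output that depends only on the source."""
    header = build_time.to_bytes(HEADER_LEN, "big")
    body = hashlib.sha256(source.encode("utf-8")).digest()
    return header + body


def digest(binary: bytes) -> str:
    return hashlib.sha256(binary).hexdigest()


def strip_metadata(binary: bytes) -> bytes:
    """Drop the timestamp header so only the behavior-relevant part is compared."""
    return binary[HEADER_LEN:]


if __name__ == "__main__":
    first = toy_compile("print('login screen')", build_time=1_700_000_000)
    second = toy_compile("print('login screen')", build_time=1_700_000_060)
    print("byte-identical:", digest(first) == digest(second))
    print("same after stripping metadata:", strip_metadata(first) == strip_metadata(second))
```

The open question in the thread is whether LLM output differences are closer to the header (incidental) or to the body (behavior), and that is something only tests against the generated code can settle.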
Also interesting is the possibility that a 10x boost for person A might still be slower than person B not using AI.
I for one am not using AI, will not touch that steaming pile of manure with a 10 yard stick, and I couldn't care less about the so called magnitude 9 earthquake. When this bubble finally bursts into nothingness, I'll be still here practicing my craft and providing real value for my clients.
I'm using it less and less now, since the sheen has worn off and I've been able to more accurately judge its capabilities. It's like an intern at everything it does and unfortunately I'm expected to produce better code than that.
I'm very confused, are you or are you not an LLM run account?
A couple weeks ago, under a freshly made account "llmslave", you said it's already replacing devs and the field is cooked, and anyone who doesn't see that lacks the skills to adopt AI [1]
I pointed out that given your name and low quality comments, you were likely an LLM run account. As SOON as I made that comment, you abandoned the account and have now made a duplicate llmslave2 account, with a different opinion
Are you doing an experiment or something?
[1] https://news.ycombinator.com/item?id=46291504#46292968
No, I'm just a fan account. No affiliation with the OG llmslave, I just thought the name and concept was funny.
When was the last time you used it?
An agent like Claude code? Maybe a few weeks ago. I use ai autocomplete and ask Claude to explain basic stuff outside my wheelhouse, generate throwaway bash scripts, etc. And I have Claude review code I'm unsure of / rubber ducky debugging, but that's about it.
i know how he feels :/
Let go of your AI gods and embrace the abyss. We've survived for decades without them and will survive in spite of them.
This is from the man who has no finished open source projects and who recommended camera-only FSD to Tesla, which he also did not finish.
The actually productive programmers, who wrote the stack that powers the economy before and after 2023 need not listen to these cheap commercials.
> who recommended camera-only FSD to Tesla
That's a bummer if true. Is there a reliable source that lays that decision at Karpathy's feet?
> This is from the man who has no finished open source projects
To be fair, which open source project can really claim that it is "finished", and what does "finished" even mean?
The only projects that I can truly call "finished" are those that I have laid to rest because they have been superseded by newer technologies, not because they have achieved completeness, because there is always more to do.
> not because they have achieved completeness, because there is always more to do.
this is because SWEs love bloat and any good idea eventually needs to balloon into some ever-growing monstrosity :)
Then replace "finished" with "production software".
> To be fair, which open source project can really claim that it is "finished", and what does "finished" even mean?
https://github.com/left-pad
Behind who?
Is there someone already mastering “agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering” ?
And do they have a blog?
> Behind who[m]?
Why, the other rats in front of you in the race, of course!
As the pithy, if cheesy, expression goes, read not the times; read the eternities. People who spend so much time frantically chasing superficial ephemera like this are people without any sense of life's purpose. They're cogs in some hellish consumerist machine.
If you want to chase the mob off the cliff, go ahead. Insanity and stupidity aren't sound life strategies, though. They're a sign you have lost the plot.
Yeah. OR. You just ignore the bullshit until the bubble bursts. Then we'll see what's left and it will not be what the majority think.
The "bubble" is in the financial investment, not in the technology. AI won't disappear after the bubble bursts, just like the web didn't disappear after 2000. If anything, bursting the financial bubble will most likely encourage researchers to experiment more, trying a larger range of cheaper approaches, and do more fundamental engineering rather than just scaling.
AI is here to stay, and the only thing that can stop it at this stage is a Butlerian jihad.
AI has been here long before LLMs… also I dislike the people seemingly tying the two terms together as one.
I maintain, the web today is not what people thought it would be in 1998. The tech has its uses, it's just not what snake oil sellers are making it out to be. And talking about Butlerian jihad is borderline snake oil selling.
Interesting. What particular 1998 claims do you have in mind that were not (at least approximately) fulfilled?
Not even Butlerian Jihad will stop the current progress at this point.
There seems to be a lot of churn, like how js was. We can just wait and see what the react of llms ends up being.
Man, this is giving me a cognitive dissonance compared to my experiences.
Actually, even the post itself reads like a cognitive dissonance with a dash of the usual "if it's not working for you then you are using it wrong" defence.
I feel exactly like Karpathy here. I have some work to do, and I know exactly what I need to do, and I'm able to explain it to AI, and the AI seems to understand me (I'm lately using Opus 4.5). I wrote down a roadmap, it should take me a few weeks of coding. It feels like with a proper workflow with AI agents, this work should be doable in one or two days. Yet, I know by now that it's not going to be nearly that fast. I'll be lucky if I finish 30% faster than if I just code the entire damn thing myself. The thing is, I am a huge AI optimist, I'm not one of the AI skeptics, not even close. Karpathy is not an AI skeptic. We just both feel this sense of possibility, and the fact that we can't make AI help us more is frustrating. That's all. There's no telling anyone else "it's on you if you can't make it work for you". I think Karpathy figured out by now, and at least I did, that the number of AI skeptics by now far outnumbers the number of AI optimists, and it has become something akin to a political conviction. It's quite futile to try and change someone's mind about whether AI is good, bad, overhyped, underused, etc. People picked their side and that's that.
I think you articulated perfectly why it's a bubble and why execs are so eager to push it everywhere. It's so alluring, it constantly feels like we're on the verge of something great. No wonder so many people have their brains fried by it.
We're 10 months into agentic coding. Claude Code came out in March. I don't understand how you can be so unimaginative about what this might look like in 5 years, even with slow progress.
It might be genuinely useful in 5 years, my issue is how it's being marketed now. We're 6 months into "AI will be writing 90% of code in three months" among other ridiculous statements.
Agreed. It is very similar to gambling in how it tricks the human mind. I am sure some of this AI technology will prove to be useful but the breakthrough has been just around the corner since soon after ChatGPT was released.
I don't mean to be inflammatory but I am not at all convinced that LLMs will be useful for software development in 5 years!
I think LLMs are very well marketed but I don't think they're very good at writing code and I don't think they've gotten better at it!
I sort of agree. If anything I feel like they've gotten a bit worse, but the advances in the tooling around them (e.g. Claude Code) have masked that slightly.
I think they are useful as an augmentation, but largely valueless for directly outputting code. Who knows if that will change. It's still made me more productive as a dev despite not oneshotting entire files. It's just not industry-changing, at least yet.
If I can reassure you, if your project is complex enough and involves heavy data manipulation, a 30% improvement using Opus/Gemini 3/codex 5.2 seems like a good result. I think on complex tasks, Opus 4.5 improves my output by around 20-25%.
And since it's way, way less wrong than sonnet4, it might also improve my whole team's velocity.
I won't lie, AI coding has been a net negative for the 'lazy devs' on my team who don't delve into their own generated code (by 'lazy devs' here I mean the subset of devs who do the work but often don't bother to truly understand the logic behind what they used/did. They are very good coworkers, add value and are not really lazy, but I don't see another term for that).
“We just both feel this sense of possibility, and the fact that we can't make AI help us more is frustrating”
The mirage is alluring.
The real mirage is the utility of median developers
I think with better processes and training they could be. It is just that right now we do not train them and put them through scrum and other horrible processes. Median developers are bad due to bad management.
give them better incentives
I think of it this way. If you dropped Einstein with a time machine two thousand years ago, people would think he is some crazy guy doing scribbles in the sand. No one would ever know how smart he is. The same is with people and advanced AGI like Gemini 3 Pro or ChatGPT 5.2 Pro. We are just dumber than them.
> We are just dumber than them.
you are, for sure.
Why do you think the models are AGI?
I also like to think that Einstein would be smart enough to explain things from a common point of understanding if you did drop him 2000 years in the past (assuming he also possesses the scientific knowledge humanity accrued in that 2000 year gap). So, your analogy doesn't really make a lot of sense here. I also doubt he'd be able to prove his theories with the technology of the past but that's a different matter.
If we did have AGI models, they would be able to solve our hardest problems (assuming a generous definition of AGI) even if we didn't immediately understand exactly how they got there. We already have a lot of complex systems that most people don't fully understand but can certainly verify the quality of. The whole "too smart for people to understand that they're too smart" is just a tired trope.
You are certainly dumber than them if you think they are AGI. These models are smart and getting smarter, but they are not AGI.
You think they have “advanced AGI” and are worried about keeping up with the software industry? There would be nothing to keep up with at that point.
To use an analogy, it would be like spending all your time before a battle making sure your knife is sharp when your opponent has a tank.
I have been using Copilot, Cursor, then CC for a little more than a year now. I have written code with teams using these tools and I am writing mostly for myself now. My observations have been the following:
1) These tools obviously improved significantly over the past 12 months. They can churn out code that makes sense in the context of the codebase, meaning there is more grounding to the codebase they are working on as opposed to codebases they have been trained on.
2) On the surface they are pretty good at solving known problems. You are not going to make them write a well-optimized renderer or an RL algorithm, but they can write run-of-the-mill business logic better _and_ faster than I can-- if you optimize for both speed of production and quality.
3) Out of the box, their personality is to just solve the problem in front of them as quickly as possible and move on. This leads them to make suboptimal decisions (e.g. solving a deadlock by sleeping for 2 seconds, CC Opus 4.5 just last night). This personality can be altered with appropriate guidance. For example, a shortcut I use is to append "idiomatic" to my request-- "come up with an idiomatic solution" or "is that the most idiomatic solution we can think of." Similarly, when writing or reviewing tests I use "intent of the function under test", which makes the model output a better solution or better code.
4) These models, esp. Opus 4.5 and GPT 5.2, are remarkable bug hunters. I can point at a symptom and they come away with the bug. I then ask them to explain to me why the bug happens and I follow the code to see if it's true. I have not come across a bad bug, yet. They can find deadlocks and starvations; you then have to guide them to a good fix (see #3).
5) Code quality is not sufficient to create product quality, but it is often necessary to sustain it. The sustainability window is shorter nowadays. Therefore, more than ever, quality of the code matters. I can see Claude Code slowly degrading in quality every single day--and I use it every single day for many hours. As much as it pains me to say this, compared to Opencode, Amp, and Toad I can feel the "slop" in Claude Code. I would love to study the codebases of these tools over time to measure their quality--I know it's possible for all but Claude Code.
6) I used to worry I don't have a good mental model of the software I build. Much like journaling, I think there is something to be said for how the process of writing/making gives you a very precise mental model. However, I have been trying to let that go and use the model as a tool to query and develop the mental model post facto. It's not the same but I think it is going to be the new norm. We need tooling in this space.
7) Despite your own experiences with these tools it is imperative that they be in your toolbox. If you have abstained from them thus far, perhaps the best way to get them incorporated is by starting to use them for attending to your toil.
8) You can still handcraft code. There is too much fun, beauty and pleasure in it to deny yourself doing it. Don't expect this to be your job. This is your passion.
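To illustrate point 3, here is a minimal Python sketch of what an "idiomatic" deadlock fix might look like compared with sleeping for 2 seconds. It is not the code from the session mentioned above; the `Account` class, `transfer` function, and `run_demo` helper are hypothetical names invented for the example. It assumes the deadlock came from two threads taking the same pair of locks in opposite order, which is only one of several possible causes.

```python
import threading


class Account:
    """Hypothetical shared resource guarded by its own lock."""

    def __init__(self, account_id: int, balance: int) -> None:
        self.account_id = account_id
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src: Account, dst: Account, amount: int) -> None:
    """Move money between accounts without risking a lock-order deadlock."""
    if src is dst:
        return  # Nothing to do, and locking the same Lock twice would hang.
    # Acquire locks in one fixed global order (by account_id) so that two
    # opposite transfers can never each hold one lock while waiting on the other.
    first, second = sorted((src, dst), key=lambda acct: acct.account_id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def run_demo(rounds: int = 1000) -> tuple[int, int]:
    """Run opposite transfers concurrently and return the final balances."""
    a, b = Account(1, 100), Account(2, 100)

    def a_to_b() -> None:
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a() -> None:
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

A sleep only makes the bad interleaving less likely; a fixed acquisition order removes it, which is the kind of fix the "idiomatic" nudge is meant to steer toward.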
> Despite your own experiences with these tools it is imperative that they be in your toolbox.
Why is it imperative? Whenever I read comments like this I just think the author is cynically drumming up hype because of the looming AI bubble collapse.
I don't usually post something like this, but this is so fucking stupid. I'm prepared to stand by that. Let's see in a few years if I'm right.
"AI" is literally models trained to make you think it's intelligent. That's it. It's like the ultimate "algorithm" or addiction machine. It's trained to make you think it's amazing and magical and therefore you think it's amazing and magical.
This could apply if we looked at questions in vacuum - someone had a conversation and was judging the models based on that. But some of us just use it for work and get good results daily. "Intelligent" is irrelevant; it's "useful". It doesn't matter what feelings I have about it if it saves me 2h of typing from time to time.
To me, as just another kinda old (I’m 49) swe, the biggest benefit of using an LLM tool is it saves a shit ton of typing. I know what I want and I know when it’s right, just saving me from typing it all out is worth $20 bucks a month.
The system prompt may vary but:
"It's trained to make you think it's amazing and magical and therefore you think it's amazing and magical."
is the dark pattern underlying the entire LLM hype cycle IMO.
It’s trained to (lossy) compress large amounts of data. The system prompts have leaked and it’s just instructed to be helpful, right? I don’t entirely disagree with your sentiment, though. It’s brute force.
> There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering.
Slop-oriented programming
"I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year and a failure to claim the boost feels decidedly like skill issue. There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering. Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession. Roll up your sleeves to not fall behind."
I have been telling everybody I know over the Christmas break that I have been coding from around 10-36 years of age, as a career and always in my spare time as a hobby. I have a lacklustre computer science knowledge and never worked at the scale of FANG etc but am still rather confident in my understanding of code and the tech scene in general. I've been telling people I haven't "coded" for almost 6 months now, I only interface with agentic setups and only open my IDE to make copy and config changes.
I understand we are all in different camps for a multitude of reasons:
- The jouissance of rote coding and abstraction
- The tree of knowledge specifically in programming, and which branches and nodes we each currently sit at in our understanding
- Technical paradigms that humans may have argued about have now shifted to obvious answers for agentic harnesses (think something like TDD; I for one barely used that as a style because I've mostly worked in startups building apps and found the cost of my labour not worth it, but agentic harness loops absolutely excel at it)
- The geography and size of the markets we work in
- The complexity of the subject matter / domain expertise
- The cost-prohibitive nature of token-based programming (not everyone can afford it, and the big fish seemingly have quite the advantage going forth)
- Agentic coding has proven it can build UIs very easily and, depending on experience, it can build a great many things easily. It excels when it has feedback loops such as linting or simple JavaScript errors, which are observability problems in my opinion. Once it can do full-stack observability (APM, system, network), its ability to reason about and correct problems on the fly for any complex system seems almost too easy from my purview.
- At the human nature level, some individuals prefer to think in 0's and 1's, some in words, some in between, and so on. What type of communication do agentic setups prefer?
With some of that above intuition that is easily up for debate, I've decided to lean 100% into agentic coding, I think it will be absolutely everywhere and obviously with humans in the loop but I don't think humans will need to review the pull requests. I am personally treating it as an existential threat to my career after having seen enough of what it's capable of. (with some imagination and a bit of a gambling spirit, as us mere mortals surely can't predict the future)
With my gambit, I'm not choosing to exit the tech scene and am instead optimistically investing my mental prowess into figuring out where "humans in the loop" will be positioned. Currently I'm looking into CI-level tooling, the known parts being code quality and all the various forms of software testing paradigms. The emerging evals, in my mind, will keep evolving and will do a lot more than test our ideas of model intelligence and chatbot responses.
---
A more practical rant: If you are building a recommendation engine for A and B, the engine could have X number of modules that each return a score, which when all combined make up the final decision between A and B. Forgive me, but let's just use dating as an example. A product manager would say we need a new module to calculate relevance between A and B based off their food preferences. An agentic harness can easily code that module and create the tests for it. The product manager could ask an LLM to make a list of 1000 reasons why two people might be suitable for dating. The agent could easily go away and code and test all those modules and probably maintain technical consistency, but drift from the company's philosophical business model. I am looking into building "semantic linting" for codebases: how can the agent maintain the code so it aligns with the company's business model? And if for whatever reason those 1000 modules need to be refactored, how can the agent maintain the code so it still aligns with the company's business model? Essentially I'm trying to make a feedback loop between the company's needs and the code itself, to stop the agent and the business from drifting in either direction, and to allow automatic feedback loops for the agent to fix the drift. In short, I think there will be new tools invented that us humans will be mastering, as per Karpathy's point.
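For anyone who wants the module idea in concrete form, here is a rough Python sketch of that kind of engine. Everything in it is an assumption made for illustration: the `Profile` shape, the `jaccard` helper, the `food_preferences` and `shared_hobbies` modules, and the choice to combine scores as a weighted average (the description above only says the scores are "combined").

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical profile shape: attribute name -> set of values.
Profile = dict[str, set[str]]
ScoreFn = Callable[[Profile, Profile], float]


def jaccard(x: set[str], y: set[str]) -> float:
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not x and not y:
        return 0.0
    return len(x & y) / len(x | y)


def food_preferences(a: Profile, b: Profile) -> float:
    """One scoring module: relevance based on shared food preferences."""
    return jaccard(a.get("foods", set()), b.get("foods", set()))


def shared_hobbies(a: Profile, b: Profile) -> float:
    """Another scoring module: relevance based on shared hobbies."""
    return jaccard(a.get("hobbies", set()), b.get("hobbies", set()))


@dataclass(frozen=True)
class WeightedModule:
    name: str
    score: ScoreFn
    weight: float  # How much the business cares about this signal.


def combined_score(a: Profile, b: Profile, modules: list[WeightedModule]) -> float:
    """Weighted average of all module scores (an assumed combination rule)."""
    total_weight = sum(m.weight for m in modules)
    if total_weight == 0:
        return 0.0
    return sum(m.weight * m.score(a, b) for m in modules) / total_weight


MODULES = [
    WeightedModule("food_preferences", food_preferences, weight=1.0),
    WeightedModule("shared_hobbies", shared_hobbies, weight=1.0),
]

alice: Profile = {"foods": {"sushi", "pizza"}, "hobbies": {"hiking"}}
bob: Profile = {"foods": {"pizza", "tacos"}, "hobbies": {"hiking"}}
match = combined_score(alice, bob, MODULES)
```

In a layout like this, the weights and the module list are where the business model actually lives, so a "semantic lint" would presumably check those against the company's stated priorities rather than the scoring code itself.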