The comment that points out that this week-long experiment produced nothing more than a non-functional wrapper for Servo (an existing Rust browser) should be at the top:
https://news.ycombinator.com/item?id=46649046
The blog [0] is worded rather conservatively, but on Twitter [2] the claim is pretty obvious and the hype effect is achieved [1][3]
CEO stated "We built a browser with GPT-5.2 in Cursor"
instead of
"by dividing agents into planners and workers we managed to get them busy for weeks creating thousands of commits to the main branch, resolving merge conflicts along the way. The repo is 1M+ lines of code but the code does not work (yet)"
[0] https://cursor.com/blog/scaling-agents
[1] https://x.com/kimmonismus/status/2011776630440558799
[2] https://x.com/mntruell/status/2011562190286045552
[3] https://www.reddit.com/r/singularity/comments/1qd541a/ceo_of...
Even then, "resolving merge conflicts along the way" doesn't mean anything, as there are two trivial merge strategies that are always guaranteed to work ('ours' and 'theirs').
So, AI agent battle royale
Haha. True, CI success was not part of PR accept criteria at any point.
If you view the PRs, they bundle multiple fixes together, at least according to the commit messages. The next hurdle will be to guardrail agents so that they only implement one task and don't cheat by modifying the CI pipeline
If I had a nickel for every time I've seen a human dev disable/xfail/remove a failing test "because it's wrong" and then proceed to break production, I would have several nickels, which is not much, but does suggest that deleting failing tests, like many behaviors, is not LLM-specific.
> but does suggest that deleting failing tests, like many behaviors, is not LLM-specific.
True, but it is shocking how often Claude suggests just disabling or removing tests.
That's not guaranteed to work. Other parts of the codebase that didn't conflict could depend on the discarded code.
The point is that the merge conflict was resolved, regardless of whether there was a working product at the end. Which there apparently isn’t.
Well they did mention the code doesn't work.
Where did Cursor say that?
So clearly someone, at some point, managed to run this, surely? That's where the screenshots come from? I just don't understand how, given the code is riddled with errors.
Somebody managed to get it to compile: https://x.com/CanadaHonk/status/2011612084719796272
But apparently "some pages take a literal minute to load"
> to be clear those 2 hours were fixing compile errors and bugs, not compile time
Seems like "I had to do the last mile myself", not "autonomous coding" which was Cursor's claim here.
Maybe they just asked an AI to create an image of a rendered webpage?
The link [0] implies that the browser worked. Can you help me understand what's "conservative" about that?
I'm eager to find out if this was actually successfully compiled at one point (otherwise how did they get the screenshots?), so I'm running `cargo check` for each of the last 100 commits to see if anything works. Will update here with the results once it's ready.
Edit: As mentioned, I ran `cargo check` on all the last 100 commits, and it seems every single one of them failed in some way: https://gist.github.com/embedding-shapes/f5d096dd10be44ff82b...
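For anyone wanting to reproduce, the loop is roughly this shape (a sketch, not necessarily the exact script; assumes a clean worktree):

```sh
# Record whether `cargo check` succeeds on each of the last 100 commits.
branch=$(git rev-parse --abbrev-ref HEAD)
for sha in $(git rev-list -n 100 HEAD); do
  git checkout -q "$sha"
  if cargo check -q >/dev/null 2>&1; then
    echo "$sha OK"
  else
    echo "$sha FAIL"
  fi
done
git checkout -q "$branch"   # return to where we started
```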
Should compile now: https://news.ycombinator.com/item?id=46650998
> Yeah, seems the latest commit does let `cargo check` run successfully. I'm gonna write an update blog post once they've made their statement, because I'm guessing they're about to say something.
> Something fishy is happening in their `git log`; it doesn't seem like it was the agents who "autonomously" actually made things compile in the end. Notice the git username and email addresses switching around; even a commit made inside an EC2 instance managed to get in there: https://gist.github.com/embedding-shapes/d09225180ea3236f180...
Gonna need to look into it more closely when I have time, but it seems they manually patched it up in the end, so the original claim still doesn't stand :/
I wouldn't be surprised if any of the screenshots are fake (as in, not made the way it's claimed); in my experience, Occam's razor tends to lead that way when extraordinary claims are made regarding LLMs.
If you look at the original Cursor post, they say they are currently running similar experiments, for instance, this Excel clone:
https://github.com/wilson-anysphere/formula
The Actions overview is impressive: there have been 160,469 workflow runs, of which 247 succeeded. The workflows are now failing because they have exceeded their spending limit. Of course, the agents couldn't care less.
The latest commit now builds and runs (at least on my Mac). It's tragically broken and the code is…dunno…something. 3M lines of something.
I couldn't make it render the Apple page that was in the Cursor promo. Maybe they used some other build.
Yeah, seems the latest commit does let `cargo check` run successfully. I'm gonna write an update blog post once they've made their statement, because I'm guessing they're about to say something.
Something fishy is happening in their `git log`; it doesn't seem like it was the agents who "autonomously" actually made things compile in the end. Notice the git username and email addresses switching around; even a commit made inside an EC2 instance managed to get in there: https://gist.github.com/embedding-shapes/d09225180ea3236f180...
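If you want to eyeball that yourself, something like this surfaces the distinct author identities in recent history:

```sh
# Count commits per author name/email pair over the last 500 commits;
# sudden identity switches stand out in the output.
git log -n 500 --format='%an <%ae>' | sort | uniq -c | sort -rn
```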
Noticed that as well - I think it was “manual”
I really doubt this marketing approach is effective. Isn't this just shooting themselves in the foot? My actual experience with Cursor has been: their design is excellent and the UX is great—it handles frontend work reasonably well. But as soon as you go deeper, it becomes very prone to serious bugs. While the addition of Claude's new models has helped somewhat, the results are still not as good as Google's Antigravity (despite its poor UX and numerous bugs). What's worse, even with this much-hyped Claude model, you can easily blow through the $20 subscription limit in just a few days. Maybe they're betting on models becoming 10x better and 10x cheaper, but that seems unlikely to happen anytime soon.
I think the original post was just headline bait. There is such a fast news cycle around AI that many people would take "Thousands of AI agents collaborate to make a web browser" at face value.
At least I now have something to link to when this inevitably gets mentioned in some off-hand HN comment about how "now AI agents can build whole browsers from scratch".
It's a great post, I will use it for the same. Thank you.
The CEO said
> It's 3M+ lines of code across thousands of files. The rendering engine is from-scratch in Rust with HTML parsing, CSS cascade, layout, text shaping, paint, and a custom JS VM.
"From scratch" sounds very impressive. "custom JS VM" is as well. So let's take a look at the dependencies [1], where we find
- html5ever
- cssparser
- rquickjs
That's just Servo [2], a Rust-based browser initially built by Mozilla (and now maintained by Igalia [3]), but with extra steps. So this supposed "from scratch" browser is just calling out to code written by humans. And after all that it doesn't even compile! It's just plain slop.
[1] - https://github.com/wilsonzlin/fastrender/blob/main/Cargo.tom...
[2] - https://github.com/servo/servo
[3] - https://blogs.igalia.com/mrego/servo-2025-stats/
Yeah, it's
- Servo's HTML parser
- Servo's CSS parser
- QuickJS for JS
- selectors for CSS selector matching
- resvg for SVG rendering
- egui, wgpu, and tiny-skia for rendering
- tungstenite for WebSocket support
And all of that still adds up to 3M+ lines!
Also selectors and taffy.
It's also using weirdly old versions of some dependencies (e.g. wgpu 0.17 from June 2023, when the latest is 28, released in December 2025)
That's because the AI just edits the version-management files (package.json, Cargo.toml, etc.) directly instead of using the build tool (npm add, cargo add), so it hallucinates a random old version found in its training set. I explicitly have to tell the AI to use the build tool whenever I use it.
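A minimal illustration of the difference (wgpu here just as an example crate):

```sh
# Hand-editing the manifest invites whatever stale pin the model remembers
# from its training data, e.g. `wgpu = "0.17"` in Cargo.toml.
# Asking the build tool resolves against the live registry instead:
cargo add wgpu       # writes the latest compatible version to Cargo.toml
npm install lodash   # same idea for package.json ("npm add" is an alias)
```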
> The JS engine used a custom JS VM being developed in vendor/ecma-rs as part of the browser, which is a copy of my personal JS parser project vendored to make it easier to commit to.
https://news.ycombinator.com/item?id=46650998
It looks like there are two JS backends: quickjs and vm-js (vendor/ecma-rs/vm-js), based on a brief skim of the code. There is some logic to select between the two. I have no idea if either or both of them work.
Honestly, as soon as I saw a browser in Rust I assumed it had just reproduced the Servo source code in part, or utilised its libraries.
I thought they'd plagiarise, not import. Importing servo's code would make it obvious because it's so easy to look at their dependencies file. And yet ... they did. I really think they thought no one would check?
> And yet ... they did. I really think they thought no one would check?
I doubt even they checked, given they say they just let the agents run autonomously.
You know, a good test would be to tell it to write a browser using a custom programming language, or at least some language for which there are no web browsers written.
Write a browser without any access to the internet is what I'd have attempted if I were running this experiment. Just seed it with a bunch of local HTML, CSS and JS files from the various testing suites that exist.
To be fair, even if "from scratch" means "download and build Chromium", that's still nontrivial to accomplish. And with how complicated a modern browser is, you can get into Ship of Theseus philosophy pretty fast.
I wouldn't particularly care what code the agents copied; the bigger indictment is that the code doesn't work.
So really, they failed to meet the bar of "download and build Chromium" and there's no point in talking about the code at all.
Dear god please let AI get forever stuck at this point because it would be so funny
These are stories that exist solely to sell shovels, and they'll lead some uninformed CEO to lay off actual humans.
I haven’t studied the project that this is a comment on, but: The article notices that something that compiles, runs, and renders a trivial HTML page might be a good starting point, and I would certainly agree with that when it’s humans writing the code. But is it the only way? Instead of maintaining “builds and runs” as a constant and varying what it does, can it make sense to have “a decent-sized subset of browser functionality” as a constant and varying the “builds and runs” bit? (Admittedly, that bit does not seem to be converging here, but I’m curious in more general terms.)
In theory you could generate a bunch of code that seems mostly correct and then gradually tweak it until it's closer and closer to compiling/working, but that seems ill-suited to how current AI agents work (or even how people work). AI agents are prone to make very local fixes without an understanding of wider context, where those local fixes break a lot of assumptions in other pieces of code.
It can be very hard to determine if an isolated patch that goes from one broken state to a different broken state is on net an improvement. Even if you were to count compile errors and attempt to minimize them, some compile errors can demonstrate fatal flaws in the design while others are minor syntax issues. It's much easier to say that broken tests are very bad and should be avoided completely, as then it's easier to ensure that no patch makes things worse than it was before.
> generate a bunch of code that seems mostly correct and then gradually tweak it until it's closer and closer to compiling/working
The diffusion model of software engineering
...What use is code if it doesn't build and run? What other way is there to build a browser that doesn't involve 'build and run'?
Writing junk in a text file isn't the hard part.
Obviously, it has to eventually build and run if there's to be any point to it, but is it necessary that every, or even any, step along the way builds and runs? I imagine some sort of iterative set-up where one component generates code, more or less "intelligently", and others check it against the C, HTML, JavaScript, CSS and what-have-you specs, and the whole thing iterates until all the checking components are happy. The components can't be completely separate, of course; they'd have to be more or less intermingled or convergence would be very slow (like when lcamtuf had his fuzzer generate a JPEG out of an empty file). But isn't that basically what (large) neural networks are: tangled messes of interconnected functions that do things in ways too complicated for anyone to bother figuring out?
If this is what makes the AI bubble pop I'll laugh so hard.
The amount of negativity in the original post was astounding.
People were making all sorts of statements like:
- “I cloned it and there were loads of compiler warnings”
- “the commit build success rate was a joke”
- “it used 3rd party libs”
- “it is AI slop”
What they all seem to be just glossing over is how the project unfolded: without human intervention, using computers, in an exceptionally accelerated time frame, working 24hr/day.
If you are hung up on commit build quality, or code quality, you are completely missing the point, and I fear for your job prospects. These things will get better; they will get safer as the workflows get tuned; they will scale well beyond any of us.
Don’t look at where the tech is. Look where it’s going.
As mentioned elsewhere (I'm the author of this blog post), I'm a heavy LLM user myself, use it every day as a tool, and get lots of benefit from it. It's not a "hit post" on using LLM tools for development; it's a post about Cursor making grand claims without being able to back them up.
No one is hung up on the quality, but there is a ground truth of whether something "compiles" or "doesn't". No one is gonna claim a software project was successful if the end artifact doesn't compile.
I think for the point of the article, it appeared to, at some point, render homepages for select well known sites. I certainly did not expect this to be a serious browser, with any reliability or legs. I don’t think that is dishonest.
> I certainly did not expect this to be a serious browser, with any reliability or legs.
Me neither, and I note so twice in the submitted article. But I also didn't expect a project that for the last 100+ commits couldn't reliably be built, and therefore couldn't be tested and tried out.
> What they all seem to be just glossing over is how the project unfolded: without human intervention, using computers, in an exceptionally accelerated time frame, working 24hr/day.
Correct, but Gas Town [1] already happened and, what's more, _actually worked_, so this experiment is both useless (because it doesn't demonstrate working software) _and_ derivative (because we've already seen that you can set up a project where, with spend similar to that of a single developer, you churn out more code than any human could read in a week).
[1]: https://github.com/steveyegge/gastown
It is hard to look at where it is going when there are so many lies about where the tech is today. There are extraordinary claims made on Twitter all the time about the technology, but when you look into things, it's all just smoke and mirrors; the claims misrepresent the reality.
I wonder who they actually tried to impress with that? People who understand and appreciate the difficulty of building a browser from scratch would surely want to understand what you (or your agent) did in enough depth that they'd notice if you didn't actually do it.
Key phrase "They never actually claim this browser is working and functional " This is what most AI "successes" turn out to be when you apply even a modicum of scrutiny.
In my personal experience, Codex and Claude Code are definitively capable tools when used in certain ways.
What Cursor did with their blog post seems intentionally and outright misleading, since I'm not able to even run the thing. With Codex/Claude Code it's relatively easy to download it and try it for yourself.
"definitively capable tools when used in certain ways". This sounds like "if it doesn't work for you is because you don't use in the right way" imo.
Reminds me of SAP/Salesforce.
Yes, many tools work like that, especially professional tools.
You think you can just fire up Ableton, Cubase or whatever and make music as great as an artist who's done it for a long time? No, it requires practice and understanding. Every tool works like this; different difficulties, different skill levels, but all of them have it in some way.
But here it's the company making the tool that is holding the tool, claiming that "[they] built a browser" when, if TFA's assertions are correct, they did not "build a browser" by any reasonable interpretation of those words.
(I grant that you're speaking from your experience, about different tools, two replies up, but this claim is just paper-rock-scissorable through these various AI tools. "Oh, this tool's authors are just hype, but this tool works totes-mc-oates…". Fool me once, and all.)
Yes, and apparently in a horrible way, because they've obviously failed to produce a functioning browser. But since I'm the author of TFA, I guess I'm kind of biased in this discussion.
Codex was sold to me as a tool that can help me program. I tried it, evaluated it, found it helpful, and continued using it. Based on my experience, it definitively helps with some tasks. Apparently it does not work for others, for some not at all. I know the tool works for me, so if I take the claim that it doesn't work for others at face value, what am I left to believe? That the tool doesn't actually work, even though my own experience and usage of it says otherwise?
Codex is still an "AI success", regardless of whether it could build an entire browser by itself, from scratch, or whatever. It helps as it is today; I wouldn't need it to get better to continue using it.
But even with this perspective, which I'd say is "nuanced" (others would probably claim "AI zealot"), I'm trying to see if what Cursor claims is actually true, that they managed to build a browser in that way. When it doesn't seem true, I call it out. I still disagree with "This is what most AI "successes" turn out to be when you apply even a modicum of scrutiny", and I'm claiming what Cursor is doing here is different.
FWIW IMHO Windsurf is better than Cursor. Claude Code is better than both for many tasks, but not all.
Not even the Ableton marketing team is telling me I can just fire up Ableton and make great music and if I can't do that I must be a brainwashed doomer.
The argument isn't about what OpenAI/Anthropic are selling their users; what I said was:
> are definitively capable tools when used in certain ways
Which I received pushback on. My reply is to that pushback, defending what I said, not what others told you.
Edit: Besides the point, but Ableton (and others) constantly tell people how to learn how to use the tool, so they use it the right way. There is a whole industry of people (teachers) who specialize in specific software/hardware and teach others "how to hold the tool correctly".
or the iPhone...
> if it doesn't work for you is because you don't use in the right way
That's an almost universal truth: you need to learn how to use any non-trivial tool.
> Codex and Claude Code are definitively capable tools when used in certain ways.
They can definitely make some things better and let you do some things faster, but all the efficiency is gonna get sucked up by companies trying to drop more slop.
No you see you just need to prompt it to implement functional and working code. You're just inexperienced and holding it wrong
$200/month tool (real cost could be $1000/month), but you have to babysit it.
Out of curiosity, what is the most difficult thing about building a browser?
The very long task list.
Browsers contain several high-complexity pieces, each of which could take a while to build on its own, interconnected through reasonably verbose APIs that need to be implemented, or at least stubbed out, for code not to crash. There is also the difficulty of matching existing implementations quirk for quirk.
I guess the complexity is on par with operating systems, but with an added compatibility problem: to be useful, it doesn't just have to load sites intended to be compatible with it, it has to handle sites people actually use on the internet, and those are both a moving target and tend to use lots of high-complexity features that you have to build, or at least stub out, before the site will even work.
Cursor's CEO got grilled on HN for a good reason.
> company claims they "built a browser" from scratch
> looks inside
> completely useless and busted
30 billion dollar VS Code fork, everyone. When do we start looking at these people for what they are: snake oil salesmen?
The slop laundered the FOSS Servo code into a broken mess and called it a browser. EFF right off.
This is why AI skeptics exist. We’re now at the point where you can make entirely unsubstantiated claims about AI capability, and even many folks on HN will accept it with a complete lack of discernment. The hype is out of control.
> folks on HN will accept it with a complete lack of discernment
Well, I'm a heavy LLM user, I "believe" LLMs help me a lot with some tasks, but I'm also a developer with decades of experience, so I'm not gonna claim they'll help non-programmers build software, or whatever. They're tools, not solutions in themselves.
But even us "folks on HN" who generally keep up with where the ecosystem is going, have a limit I suppose. You need to substantiate what you're saying, and if you're saying you've managed to create a browser, better let others verify that somehow.
Take a look at this thread regarding the original claim: https://news.ycombinator.com/item?id=46624541
The top comment is indeed baseless hype without a hint of skepticism.
The second top comment is my own (skeptical) comment, with 20 points at this moment. Thanks to those 20 people, I felt compelled to write the blog post in this submission and ask a bit more clearly "what is going on?", since apparently there are at least 20 of us wondering about this.
There are also clearly a lot of other skeptical people in that submission. Also, simonw (from that top comment) told me themselves "it's not clear that what they built even runs": https://bsky.app/profile/simonwillison.net/post/3mckgw4mxoc2...
> The top comment is indeed baseless hype without a hint of skepticism.
and he wonders why people call him a shill
accepting everything some shit company tells you as gospel is not the default position of a "researcher"
he better hope he's on the right side of history here, as otherwise he will have burnt his reputation
I certainly don’t think Simon is a shill. He’s obviously a highly talented person, who in my opinion just doesn’t exercise appropriate discernment in some cases.
Edit: Of course, this isn’t a trait unique to Simon either. Everybody has blind spots, and it’s reasonable to be excited when new tech is released. On an unrelated note, my intent is to push back against some of the people here who try to shut down skepticism. Obviously, this doesn’t describe Simon, but I’ve seen others here who try to silence skeptical voices. This comes across as highly controlling and insecure.
See comment here: https://news.ycombinator.com/item?id=46646777#46650837
I do not think you are reacting to what I said in good faith.
> he better hope he's on the right side of history here, as otherwise he will have burnt his reputation
That's something I've actually given quite a lot of thought to. My reputation and credibility matter a great deal to me. If it turns out this entire LLM thing was an over-hyped scam, I'll take a very big hit to that reputation, and I'll deserve it.
(If AI rises up and tries to kill or enslave us all I'll be too busy fighting back to care.)
As usual, I was careful with my words:
> This project from Cursor is the second attempt I've seen at this now!
I used the word "attempt" very deliberately, to avoid suggesting that either of these two projects had achieved the goal.
I don't see how you can get to "baseless hype without a hint of skepticism" there unless you've already decided to take anything I say in bad faith.
Are you telling me AI bros are lying about their products? No way that ever happened…
Lesson 1:
Always take any pronouncement from an AI company (heavily dependent on VC money and public sentiment on AI) with a heavy grain of salt.
hype over reality
I'm building an AI startup myself and I know that world, and it's full of hypesters and hucksters, unfortunately. Also, social media communication + low attention spans + AI slop communication are a blight upon today's engineering culture.