Honestly? I think if we as a society could trust our leaders (government and industry) to not be total dirtbags the resistance to AI would be much lower.
In my experience, it is often the other way around. Enthusiasts are tasked with trying to open minds that seem very closed on the subject. Most serious users of these tools recognize the shortcomings and also can make well-educated guesses on the short term future. It's the anti crowd who get hellbent on this ridiculously unfounded "robots are just parrots and can't ever replace real programmers" shtick.
Maybe if AI evangelists would stop lying about what AI can do then people would hate it less.
But lying and hype is baked into the DNA of AI booster culture. At this point it can be safely assumed anything short of right-here-right-now proof is pure unfettered horseshit when coming from anyone and everyone promoting the value of AI.
Something that bothers me here is that Anthropic claimed in their blog post that the Linux kernel could boot on x86 - is this not actually true then? They just made that part up?
It seemed pretty unambiguous to me from the blog post that they were saying the kernel could boot on all three arches, but clearly that's not true unless they did some serious hand-waving with kernel config options. Looking closer in the repo they only show a claimed Linux boot for RISC-V, so...
> Someone got it working on Compiler Explorer and remarked that the assembly output “reminds me of the quality of an undergraduate’s compiler assignment”. Which, to be fair, is both harsh and not entirely wrong when you look at the register spilling patterns.
This is what I've noticed about most LLM-generated code: it's about the quality of an undergrad, and I think there's a good reason for this - most of the code it's been trained on is of undergrad quality. Stack Overflow questions, a lot of undergrad open source projects; there are some professional-quality open source projects (e.g. SQLite) but they are outweighed by the mass of other code. Also, things like SQLite don't compare to things like Oracle or SQL Server, which are proprietary.
> Where CCC Succeeds
> Correctness: Compiled every C file in the kernel (0 errors)
I don't think that follows. It's entirely possible that the compiler produced garbage assembly for a bunch of the kernel code that would make it totally not work even if it did link. (The SQLite code passing its self tests doesn't convince me otherwise, because the Linux kernel uses way more advanced/low-level/uncommon features than SQLite does.)
I agree. A lack of errors is not an indicator of correct compilation. Piping something to /dev/null won't produce any errors either, so there is nothing we can conclude from it. The fact that it compiles SQLite correctly does provide some evidence that their compiler at least implements enough of the C semantics involved in SQLite.
It's really cool to see how slow unoptimised C is. You get so used to seeing C easily beat any other language in performance that you assume it's really just intrinsic to the language. The benchmark shows a SQLite3 unoptimised build 12x slower for CCC, 20x for optimised build. That's enormous!
I'm not dissing CCC here, rather I'm impressed with how much speed is squeezed out by GCC out of what is assumed to be already an intrinsically fast language.
I mean you can always make things slower. There are lots of non-optimizing or low-optimizing compilers that are _MUCH_ faster than this. TCC is probably the most famous example, but hardly the only alternative C compiler with performance somewhere between -O1 and -O2 in GCC. By comparison, as I understand it, CCC has performance worse than -O0, which is honestly a bit surprising to me, since -O0 should not be a hard target to achieve. As I understand it, at -O0 C is basically just macro-expanded into assembly with a bit of order-of-operations handling thrown in. I don't believe it even does register allocation.
The speed of C is still largely intrinsic to the language.
The primitives are directly related to the actual silicon. A function call is actually going to turn into a call instruction (or get inlined). The order of bytes in your struct is how they exist in memory, etc. A pointer being dereferenced is a load/store.
The converse holds as well. Interpreted languages are slow because this association with the hardware doesn't hold.
When you have a poopy compiler that does lots of register shuffling then you lose this association.
Specifically, the constant spilling in those specific functions that caused the 1000x slowdown makes the C code look a lot more like Python code (where every variable is several dereferences away).
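As a rough illustration of that mapping (the assembly in the comments is what a typical x86-64 compiler tends to emit; exact output of course varies, and the example itself is made up):

#include <stdio.h>

struct point { int x; int y; };     /* laid out as 8 contiguous bytes */

int get_y(const struct point *p)    /* p arrives in a register (rdi)  */
{
    return p->y;                    /* one load:  mov eax, [rdi+4]    */
}

void store(int *p, int v)
{
    *p = v;                         /* one store: mov [rdi], esi      */
}

int main(void)
{
    struct point q = { 3, 4 };
    int y = get_y(&q);
    store(&y, y + 1);
    printf("%d\n", y);              /* prints 5 */
    return 0;
}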
Vibe coding is entertainment. Nothing wrong about entertainment, but when totally clueless people connect to their bank account, or control their devices with vibe coded programs, someone will be entertained for sure.
Large language models and small language models are very strong for solving problems, when the problem is narrow enough.
They are above human average for solving almost any narrow problem, independent of time, but when time is a factor, let's say less than a minute, they are better than experts.
An OS kernel is exactly the kind of problem that everyone prefers to be solved as correctly as possible, even if arriving at the solution takes longer.
The author mentions the stability and correctness of CCC, but these are properties of Rust and not of vibe coding. Still an impressive feat by Claude Code, though.
Ironically, if they had first populated the repo with objects, functions and methods with just todo! bodies, made sure the architecture compiles and is sane, and only then let the agent fill the bodies with implementations, most features would work correctly.
I am writing a program to do exactly that for Rust, but even then, how would the user/programmer know beforehand how many architectural details to specify using todo!, to be sure that the problem the agent tries to solve is narrow enough? That's impossible to know! If the problem is not narrow enough, then the implementation is gonna be a mess.
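To sketch the same idea in C terms (the above is about Rust's todo!(), but the workflow is language-agnostic; all names below are made up): pin the architecture down as stubs that compile and link, then let the agent replace one TODO body at a time. Running this skeleton just reports the first unimplemented stub and aborts.

#include <stdio.h>
#include <stdlib.h>

#define TODO() do { fprintf(stderr, "TODO: %s\n", __func__); abort(); } while (0)

struct token { int kind; const char *text; };
struct ast   { int dummy; };

/* The agreed architecture: these signatures are the contract and must keep
   compiling; the agent's only job is to replace TODO() bodies one by one. */
static struct token *lex(const char *src, size_t *count)
{
    (void)src; (void)count;
    TODO();                     /* agent: tokenize src, set *count */
    return NULL;
}

static struct ast *parse(struct token *toks, size_t count)
{
    (void)toks; (void)count;
    TODO();                     /* agent: build the tree */
    return NULL;
}

static void emit(struct ast *tree, FILE *out)
{
    (void)tree; (void)out;
    TODO();                     /* agent: write assembly to out */
}

int main(void)
{
    size_t n = 0;
    struct token *toks = lex("int main(void) { return 0; }", &n);
    emit(parse(toks, n), stdout);
    return 0;
}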
I think one of the issues is that the register allocation algorithm -- alongside the SSA generation -- is not enough.
Generally after the SSA pass, you convert all of it into a register transfer language (RTL) and then do a register allocation pass. And in GCC's case it is even more extreme -- you have GIMPLE in the middle that does more aggressive optimization, similar to rustc's MIR. CCC doesn't have all that, and for register allocation you could at least do a simple linear scan just as the usual JIT compiler would (and from my understanding, something CCC should be able to do at little cost), but most of the "hard part" of a compiler today is actually optimization -- the frontend is mostly a solved problem if you accept some hacks, unlike me, who is still looking for an elegant academic solution to the typedef problem.
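For reference, linear scan itself is small enough to sketch. This is a toy version (Poletto/Sarkar style, with a made-up 3-register machine; nothing here is taken from CCC or GCC):

#include <stdio.h>
#include <string.h>

#define NREGS 3                       /* toy machine with 3 registers */

typedef struct {
    const char *name;
    int start, end;                   /* live range, in instruction indices */
    int reg;                          /* assigned register, or -1 = spilled */
} Interval;

static Interval *active[NREGS];       /* register-holding intervals, sorted by end */
static int nactive;
static int free_regs[NREGS];
static int nfree;

static void expire_old(int start)     /* return registers of intervals that ended */
{
    int i = 0;
    while (i < nactive && active[i]->end < start)
        free_regs[nfree++] = active[i++]->reg;
    memmove(active, active + i, (size_t)(nactive - i) * sizeof active[0]);
    nactive -= i;
}

static void add_active(Interval *iv)  /* insert, keeping the list sorted by end */
{
    int i = nactive++;
    for (; i > 0 && active[i - 1]->end > iv->end; i--)
        active[i] = active[i - 1];
    active[i] = iv;
}

static void linear_scan(Interval *iv, int n)   /* iv[] must be sorted by start */
{
    for (int i = 0; i < n; i++) {
        expire_old(iv[i].start);
        if (nfree > 0) {                       /* a register is free: take it  */
            iv[i].reg = free_regs[--nfree];
            add_active(&iv[i]);
        } else {                               /* spill whichever ends last    */
            Interval *victim = active[nactive - 1];
            if (victim->end > iv[i].end) {
                iv[i].reg = victim->reg;       /* steal the victim's register  */
                victim->reg = -1;              /* victim lives on the stack    */
                nactive--;
                add_active(&iv[i]);
            } else {
                iv[i].reg = -1;                /* current value stays spilled  */
            }
        }
    }
}

int main(void)
{
    for (int r = 0; r < NREGS; r++) free_regs[nfree++] = r;
    Interval iv[] = { {"a",0,8,-1}, {"b",1,3,-1}, {"c",2,9,-1},
                      {"d",4,6,-1}, {"e",5,7,-1} };
    linear_scan(iv, 5);
    for (int i = 0; i < 5; i++) {
        if (iv[i].reg < 0) printf("%s: spilled\n", iv[i].name);
        else               printf("%s: r%d\n", iv[i].name, iv[i].reg);
    }
    return 0;
}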
Note that the LLVM approach to IR is probably a bit more sane than the GCC one. GCC has ~3 completely different IRs at different stages in the pipeline, while LLVM mostly has only canonical IR form for passing data around through the optimization passes (and individual passes will sometimes make their own temporary IR locally to make a specific analysis easier).
If stevefan1999's referring to a nasty frontend issue, it might be due to the fact that a name introduced by a typedef and an identical identifier can mingle in the same scope, which makes parsing pretty nasty – e.g. (example from source at end):
typedef int AA;

void foo()
{
    AA AA;              /* OK - define variable AA of type AA */
    int BB = AA * 2;    /* OK - AA is just a variable name here */
}

void bar()
{
    int aa = sizeof(AA), AA, bb = sizeof(AA);
}
In your example, bar is actually trivial: since both the type AA and the variable AA are ints, both aa and bb end up as 4 no matter how you parse it. AA would have to be typedef'd to something other than int for it to matter.
Lexing and parsing C is simple, except that typedefs technically make it non-context-free. See https://en.wikipedia.org/wiki/Lexer_hack When handwriting a parser, it's no big deal, but it's often a stumbling block for parser generators or other formal approaches. Though, I recall there's a PEG-based parser for C99/C11 floating around that was supposed to be compliant. But I'm having trouble finding a link, and maybe it was using something like LPeg, which has features beyond pure PEG that help with context-dependent parsing.
Clang's solution (presented at the end of the Wikipedia article you linked) seems much better - just use a single lexical token kind for both types and variables.
Then only the parser needs to be context-sensitive, for the "A * B;" construct, which is either a no-op multiplication (if A is a variable) or a declaration of a variable B of pointer type (if A is a type).
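To make the ambiguity concrete (a toy example, not from any real compiler): the same tokens "A * B;" flip meaning depending on what A names at that point, which is exactly the context the parser has to track.

#include <stdio.h>

typedef struct { double x, y; } A;

int with_typedef(void)
{
    A *B = 0;            /* declaration: B is a pointer to struct A        */
    return B == 0;       /* 1 */
}

int without_typedef(void)
{
    int A = 6, B = 7;    /* the file-scope typedef A is shadowed by an int */
    A * B;               /* now an expression statement: multiply, discard */
                         /* (a compiler may warn that it has no effect)    */
    return A * B;        /* 42 */
}

int main(void)
{
    printf("%d %d\n", with_typedef(), without_typedef());
    return 0;
}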
The 158,000x slowdown on SQLite is the number that matters here, not whether it can parse C correctly. Parsing is the solved problem — every CS undergrad writes a recursive descent parser. The interesting (and hard) parts of a compiler are register allocation, instruction selection, and optimization passes, and those are exactly where this falls apart.
That said, I think the framing of "CCC vs GCC" is wrong. GCC has had thousands of engineer-years poured into it. The actually impressive thing is that an LLM produced a compiler at all that handles enough of C to compile non-trivial programs. Even a terrible one. Five years ago that would've been unthinkable.
The goalpost everyone should be watching isn't "can it match GCC" — it's whether the next iteration closes that 158,000x gap to, say, 100x. If it does, that tells you something real about the trajectory.
The part of the article about the 158,000x slowdown doesn't really make sense to me.
It says that a nested query does a large number of iterations through the SQLite bytecode evaluator. And it claims that each iteration is 4x slower, with an additional 2-3x penalty from "cache pressure". (There seems to be no explanation of where those numbers came from. Given that the blog post is largely AI-generated, I don't know whether I can trust them not to be hallucinated.)
But making each iteration 12x slower should only make the whole program 12x slower, not 158,000x slower.
Such a huge slowdown strongly suggests that CCC's generated code is doing something asymptotically slower than GCC's generated code, which in turn suggests a miscompilation.
I notice that the test script doesn't seem to perform any kind of correctness testing on the compiled code, other than not crashing. I would find this much more interesting if it tried to run SQLite's extensive test suite.
"Ironically, among the four stages, the compiler (translation to assembly) is the most approachable one for an AI to build. It is mostly about pattern matching and rule application: take C constructs and map them to assembly patterns.
The assembler is harder than it looks. It needs to know the exact binary encoding of every instruction for the target architecture. x86-64 alone has thousands of instruction variants with complex encoding rules (REX prefixes, ModR/M bytes, SIB bytes, displacement sizes). Getting even one bit wrong means the CPU will do something completely unexpected.
The linker is arguably the hardest. It has to handle relocations, symbol resolution across multiple object files, different section types, position-independent code, thread-local storage, dynamic linking and format-specific details of ELF binaries. The Linux kernel linker script alone is hundreds of lines of layout directives that the linker must get exactly right."
I worked on compilers, assemblers and linkers and this is almost exactly backwards
Exactly this. Linker is threading given blocks together with fixups for position-independent code - this can be called rule application. Assembler is pattern matching.
This explanation confused me too:
> Each individual iteration: around 4x slower (register spilling)
> Cache pressure: around 2-3x additional penalty (instructions do not fit in L1/L2 cache)
> Combined over a billion iterations: 158,000x total slowdown
If each iteration is X percent slower, then a billion iterations will also be X percent slower. I wonder what is actually going on.
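Running the post's own numbers makes the gap explicit:

    ~4x (spilling) * ~3x (cache)              ~= 12x per iteration
    12x per iteration over ~1e9 iterations    =  still ~12x overall
    158,000x observed / ~12x explained        ~= 13,000x left unaccounted for

so whatever is going on, the CCC build has to be doing vastly more work (something asymptotically worse, or a miscompilation), not just executing each iteration more slowly.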
Claude one-shot a basic x86 assembler + linker for me. Missing lots of instructions, yes, but that is a matter of filling in tables of data mechanically.
Supporting linker scripts is marginally harder, but having manually written compilers before, my experience is the exact opposite of yours.
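To give a flavour of that "table of data": here's a toy encoder for a single instruction form, 64-bit register-to-register ADD. Illustrative only; a real assembler needs REX.R/REX.B for r8-r15, the other ModR/M modes, SIB bytes, immediates, and so on.

#include <stdio.h>

enum reg { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI };   /* low 8 GPRs only */

/* ADD r/m64, r64 is opcode 0x01; with mod=11 the ModR/M byte just packs
   the two register numbers. */
static int encode_add_rr(unsigned char *out, enum reg dst, enum reg src)
{
    out[0] = 0x48;                                      /* REX.W: 64-bit operands */
    out[1] = 0x01;                                      /* ADD r/m64, r64         */
    out[2] = (unsigned char)(0xC0 | (src << 3) | dst);  /* ModR/M: 11 reg rm      */
    return 3;
}

int main(void)
{
    unsigned char buf[3];
    int n = encode_add_rr(buf, RAX, RBX);               /* add rax, rbx */
    for (int i = 0; i < n; i++)
        printf("%02x ", buf[i]);
    printf("\n");                                       /* prints: 48 01 d8 */
    return 0;
}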
CCC was and is a marketing stunt for a new model launch. Impressive, but it still suffers from the same 80:20 rule. Those 20% are the optimizations, and we all know where the devil lies in "let me write my own language".
As a neutral observation: it’s remarkable how quickly we as humans adjust expectations.
Imagine five years ago saying that you could have a general-purpose AI write a C compiler that can handle the Linux kernel, by itself, from scratch, for $20k, by writing a simple English prompt.
That would have been completely unbelievable! Absurd! No one would take it seriously.
Can someone explain to me, what’s the big deal about this?
The AI model was trained on lots of code and spat out something similar to gcc. Why is this revolutionary?
It's a marketing gimmick. Cursor did the same recently when they claimed to have created a working browser, but it was basically just a bunch of open source software glued together into something barely functional for a PR stunt.
Yeah, it's pretty amazing it can do this. The problem is the gaslighting by the companies making this - "see, we can create compilers, we won't need programmers", programmers - "this is crap, are you insane?", classic gaslighting.
They should have gone one step further and also optimized for query performance (without editing the source code).
I have *cough* AI-generated an x86-to-x86 compiler (it takes x86 in, replaces arbitrary instructions with functions, and spits x86 out). At first it was horrible, but after letting it work for 2 more days it got down to only a 50% to 60% slowdown even when every memory read instruction was replaced.
Now that's when people should get scared. But it's also reasonable to assume that CCC will look closer to GCC at that point, maybe influenced by other compilers as well. Tell it to write an ARM compiler and it will never succeed (probably; maybe it can use an intermediary and shove it into LLVM and it'll work, but at that point it is no longer a "C" compiler).
> CCC compiled every single C source file in the Linux 6.9 kernel without a single compiler error (0 errors, 96 warnings). This is genuinely impressive for a compiler built entirely by an AI.
It would be interesting to compare the source code used by CCC to other projects. I have a slight suspicion that CCC stole a lot of code from other projects.
The prospect of going the last mile to fix the remaining problems reminds me of the old joke:
"The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."
Seeing that Claude can code a compiler doesn't help anyone if it's not coded efficiently, because getting it to be efficient is the hardest part, and it will be interesting seeing how long it will take to make it efficient. No one is gonna use some compiler that makes binaries run 700x longer.
I'm surprised that this wasn't possible before with just a bigger context size.
It seems like if Anthropic released a super cool and useful _free_ utility (like a compiler, for example) that was better than existing counterparts or solved a problem that hadn’t been solved before[0] and just casually said “Here is this awesome thing that you should use every day. By the way our language model made this.” it would be incredible advertising for them.
But they instead made a blog post about how it would cost you twenty thousand dollars to recreate a piece of software that they do not, with a straight face, actually recommend that you use in any capacity beyond as a toy.
[0] I am categorically not talking about anything AI related or anything that is directly a part of their sales funnel. I am talking about a piece of software that just efficiently does something useful. GCC is an example, Everything by voidtools is an example, Wireshark is an example, etc. Claude is not an example.
You know, it sure does add some additional perspective to the original Anthropic marketing materia... ahem, I mean article, to learn that the CCC-compiled runtime for SQLite could potentially run up to 158,000 times slower than a GCC-compiled one...
Nevertheless, the victories continue to be closer to home.
Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?
I am pretty sure everybody agrees that this result is somewhere between slop code that barely works and the pinnacle of AI-assisted compiler technology. But discussions should not be held from the extreme points. Instead, I am looking for a realistic estimation from the HN community about where to place these results in a human context. Since I have no experience with compilers, I would welcome any of your opinions.
> Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?
I offered to do it, but without a deadline (I work f/time for money), only a cost estimation based on how many hours I think it should take me: https://news.ycombinator.com/item?id=46909310
The poster I responded to had claimed that it was not possible to produce a compiler capable of compiling a bootable Linux kernel within the $20k cost, nor for double that ($40k).
I offered to do it for $40k, but no takers. I initially offered to do it for $20k, but the poster kept evading, so I settled on asking for the amount he offered.
Did Anthropic release the scaffolding, harnesses, prompts, etc. they used to build their compiler? That would be an even cooler flex to be able to go and say "Here, if you still doubt, run this and build your own! And show us what else you can build using these techniques."
But gcc is part of its training data so of course it spit out an autocomplete of a working compiler
/s
This is actually a nice case study in why agentic LLMs do kind of think. It's by no means the same code or compiler. It had to figure out lots and lots of problems along the way to get to the point of tests passing.
> But gcc is part of its training data so of course it spit out an autocomplete of a working compiler /s
Why the sarcasm tag? It is almost certainly trained on several compiler codebases, plus probably dozens of small "toy" C compilers created as hobby / school projects.
It's an interesting benchmark not because the LLM did something novel, but because it evidently stayed focused and maintained consistency long enough for a project of this complexity.
I think this is a great example of both points of view in the ongoing debate.
Pro-LLM coding agents: look! a working compiler built in a few hours by an agent! this is amazing!
Anti-LLM coding agents: it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.
Pro: Sure, but we can get the agent to fix that.
Anti: Can you, though? We've seen that the more complex the code base, the worse the agents do. Fixing complex issues in a compiler seems like something the agents will struggle with. Also, if they could fix it, why haven't they?
Pro: Sure, maybe now, but the next generation will fix it.
Anti: Maybe. While the last few generations have been getting better and better, we're still not seeing them deal with this kind of complexity better.
Pro: Yeah, but look at it! This is amazing! A whole compiler in just a few hours! How many millions of hours were spent getting GCC to this state? It's not fair to compare them like this!
Anti: Anthropic said they made a working compiler that could compile the Linux kernel. GCC is what we normally compile the Linux kernel with. The comparison was invited. It turned out (for whatever reason) that CCC failed to compile the Linux kernel when GCC could. Once again, the hype of AI doesn't match the reality.
Pro: but it's only been a few years since we started using LLMs, and a year or so since agents. This is only the beginning!
Anti: this is all true, and yes, this is interesting. But there are so many other questions around this tech. Let's not rush into it and mess everything up.
I'm reminded, once again, of the recent "vibe coded" OCaml fiasco[1].
The PR author had zero understanding why their entirely LLM-generated contribution was viewed so suspiciously.
The article validates a significant point: it is one thing to have passing tests and be able to produce output that resembles correctness - however it's something entirely different for that output to be good and maintainable.
[1] https://github.com/ocaml/ocaml/pull/14369
> however it's something entirely different for that output to be good and maintainable
People aren't prompting LLMs to write good, maintainable code though. They're assuming that, because we've made a collective assumption that good, maintainable code is the goal, it must also be the goal of an LLM. That isn't true. LLMs don't care about our goals. They are solving problems in a probabilistic way based on the content of their training data, context, and prompting. Presumably if you take all the code in the world and throw it in a mixer, what comes out is not our Platonic ideal of the best possible code, but actually something more like a Lovecraftian horror that happens to get the right output. This is quite positive because it shows that with better prompting+context+training we might actually be able to guide an LLM to know what good and bad looks like (based on the fact that we know). The future is looking great.
However, we also need to be aware that 'good, maintainable code' is often not what we think is the ideal output of a developer. In businesses everywhere the goal is 'whatever works right now, and to hell with maintainability'. When a business is 3 months from failing spending time to write good code that you can continue to work on in 10 years feels like wasted effort. So really, for most code that's written, it doesn't actually need to be good or maintainable. It just needs to work. And if you look at the code that a lot of businesses are running, it doesn't. LLMs are a step forward in just getting stuff to work in the first place.
If we can move to 'bug free' using AI, at the unit level, then AI is useful. Above individual units of code, like logic, architecture, security, etc things still have to come from the developer because AI can't have the context of a complete application yet. When that's ready then we can tackle 'tech debt free' because almost all tech debt lives at that higher level. I don't think we'll get there for a long time.
> People aren't prompting LLMs to write good, maintainable code though.
Then they're not using the tools correctly. LLMs are capable of producing good clean code, but they need to be carefully instructed as to how.
I recently used Gemini to build my first Android app, and I have zero experience with Kotlin or most of the libraries (but I have done many years of enterprise Java in my career). When I started I first had a long discussion with the AI about how we should set up dependency injection, Material3 UI components, model-view architecture, Firebase, logging, etc and made a big Markdown file with a detailed architecture description. Then I let the agent mode implement the plan over several steps and with a lot of tweaking along the way. I've been quite happy with the result, the app works like a charm and the code is neatly structured and easy to jump into whenever I need to make changes. Finishing a project like this in a couple of dozen hours (especially being a complete newbie to the stack) simply would not have been possible 2-3 years ago.
Damn... "AI has a very deep understanding of how this code works. Please challenge me on this." this person is something else. Just... wow.
The AI legal analysis seemed to be the nail in the coffin.
Adding AI-generated comments is IMHO one of the rudest uses of AI.
I just read that whole thread and I think the author made the mistake of submitting a 13k loc PR, but other than that - while he gets downvoted to hell on every comment - he's actually acting professionally and politely.
I wouldn't call this a fiasco; it reads to me more like being able to create huge amounts of code - whether the end result works well or not - breaks the traditional model of open source. Small contributions can be verified, and the merit-vs-maintenance-effort can at least be assessed somewhat more realistically.
I have no dog in the "vibe coding sucks" vs "vibe coding rocks" fight and I'm reading that thread as an outsider. I cannot help but find the PR author's attitude absolutely okay, while the compiler folks are very defensive. I do agree with them that submitting a huge PR without prior discussion cannot be the way forward. But that's almost orthogonal to the question of whether AI-generated code is or is not of value.
If I were the author, I would probably take my 13k loc proof-of-concept implementation and chop it down into bite-size steps that are easy to digest, and try to get them integrated into the compiler successively, while being totally upfront about what the final goal is. You'd need to be ready to accept criticism and requests for change, but it should not be too hard to have your AI of choice incorporate these into your code base.
I think the author's main mistake was not the vibe coding; it was dreaming up his own personal ideal of a huge feature, and then going ahead and single-handedly implementing the whole thing without involving anyone from the actual compiler project. You cannot blame the maintainers for not being crazy about accepting such a huge blob.
This to me sounds a lot like the SpaceX conversation:
- Ohh look it can [write a small function / do a small rocket hop] but it can't [write a compiler / get to orbit]!
- Ohh look it can [write a toy compiler / get to orbit] but it can't [compile linux / be reusable]
- Ohh look it can [compile linux / get reusable orbital rocket] but it can't [build a compiler that rivals GCC / turn the rockets around fast enough]
- <Denial despite the insane rate of progress>
There's no reason to keep building this compiler just to prove this point. But I bet it would catch up real fast to GCC with a fraction of the resources if it was guided by a few compiler engineers in the loop.
We're going to see a lot of disruption come from AI assisted development.
All these people that built GCC and evolved the language did not have the end result in their training set. They invented it. They extrapolated from earlier experiences and knowledge; LLMs only ever accidentally stumble into "between unknown manifolds" when the temperature is high enough, they interpolate with noise (in so many senses). The people building GCC together did not only solve a technical problem. They solved a social one, agreeing on what they wanted to build, for what and why. LLMs are merely copying these decisions.
There are two questions which can be asked of both. The first one is "can these technologies achieve their goals?", which is what you seem to be debating. The other question is "is a successful outcome of these technologies desirable at all?". One is making us pollute space faster than ever, as if we did not fuck the rest enough. The other will make a few very rich people even richer and probably everyone else poorer.
Interesting that people call this "progress" :)
> the insane rate of progress
Yeah but the speed of progress can never catch the speed of a moving goalpost!
What about the hype? If you claim your LLM generated compiler is functionally on par with GCC I’d expect it to match your claim.
I still won’t use it until it also matches all the non-functional requirements, but you’re free to go and recompile all the software you use with it.
> Yeah but the speed of progress can never catch the speed of a moving goalpost!
How do you like those coast-to-coast self drives since the end of 2017?
You can be wrong on every step of your approximation and still be right in the aggregate. E.g. order of magnitude estimate, where every step is wrong but mistakes cancel out.
Human crews on Mars is just as far fetched as it ever was. Maybe even farther due to Starlink trying to achieve Kessler syndrome by 2050.
> This to me sounds a lot like the SpaceX conversation
The problem is that it is absolutely indiscernible from the Theranos conversation as well…
If Anthropic stopped lying about the current capability of their models (like “it compiles the Linux kernel” here, but it's far from the first time they've done that), maybe neutral people would give them the benefit of the doubt.
For the one grifter who happened to succeed at delivering his grandiose promises (Elon), how many grifters will fail?
> Pro: Sure, maybe now, but the next generation will fix it.
Do we need a c2 wiki page for "sufficiently smart LLM" like we do for https://wiki.c2.com/?SufficientlySmartCompiler ?
Exactly. This flawed argument by which everything will be fixed by future models drives me crazy every time.
Just a couple more trillion and 6 more months!
I read a YouTube comment recently on a pro-AI video. It was:
"The source code of gcc is available online"
As an Anti, my argument is "if AI will be good in the future, then come back in the future".
As a pro, my argument is "it's good enough now to make me incredibly productive, and it's only going to keep getting better because of advancements in compute".
I don't think this is how pro and anti conversation goes.
I think the pro would tell you that if GCC developers could leverage Opus 4.6, they'd be more productive.
The anti would tell you that it doesn't help with productivity, it makes us less versed in the code base.
I think the CCC project was just a demonstration of what Opus can do now autonomously. 99.9% of software projects out there aren't building something as complex as a compiler that can build Linux.
And not to mention that a C compiler is something we have literally 50 years worth of code for. I still seriously doubt the ability of LLMs to tackle truly new problems.
It seems that the cause of the difference in opinion is that the anti camp is looking at the current state, while the pro camp is looking at the slope and projecting it into the future.
Two completely valid perspectives.
Unless you need a correctly compiled Linux kernel. In that case one gets exhausting real quick.
> It's not fair to compare them like this!
As someone who leans pro in this debate, I don't think I would make that statement. I would say the results are exactly as we expect.
Also, a highly verifiable task like this is well suited to LLMs, and I expect within the next ~2 years AI tools will produce a better compiler than gcc.
Don't forget that gcc is in the training set.
That's what always puts me off: when AI replaces artists, SO and FOSS projects, it can only feed into itself and deteriorate.
The same two years as in "full self driving available in 2 years"?
Right.
These are different technologies with different rates of demonstrated growth. They have very little to do with each other.
Well let's check again in two years then.
> Pro-LLM coding agents: look! a working compiler built in a few hours by an agent! this is amazing!
> Anti-LLM coding agents: it's not a working compiler, though. And it doesn't matter how few hours it took, because it doesn't work. It's useless.
Pro-LLM: Read the freaking article, it's not that long. The compiler made a mistake in an area where only two compilers exist that are up to the task: the Linux kernel.
Anti-LLM: isn’t all this intelligence supposed to give us something better than what we already have?
Me: Top 0.02%[1] human-level intelligence? Sure. But we aren't there yet.
[1] There are around 8k programming languages that are used (or were used) in practice (that is, they were deemed better than existing ones in some aspects) and there are around 50 million programmers.
Pretty much. It's missing a tiny detail though. One side is demanding we keep giving hundreds of billions to them and at the same time promising the other side's unemployment.
And no-one ever stops and thinks about what it means to give up so much control.
Maybe one of those companies will come out on top. The others produce garbage in comparison. Capital loves a single throat to choke and doesn't gently pluralise. So of course you buy the best service. And it really can generate any code, get it working, bug free. People unlearn coding on this level. And some day, poof, Microsoft is coming around and having some tiny problem that it can generate a working Office clone. Or whatever, it's just an example.
This technology will never be used to set anyone free. Never.
The entity that owns the generator owns the effective means of production, even if everyone else can type prompts.
The same technology could, in a different political and economic universe, widen human autonomy. But that universe would need strong commons, enforced interoperability, and a cultural refusal to outsource understanding.
And why is this different from abstractions that came before? There are people out there understanding what compilers are doing. They understand the model from top to bottom. Tools like compilers extended human agency while preserving a path to mastery. AI code generation offers capability while dissolving the ladder behind you.
We are not merely abstracting labor. We are abstracting comprehension itself. And once comprehension becomes optional, it rapidly becomes rare. Once it becomes rare, it becomes political. And once it becomes political, it will not be distributed generously.
Nah bro, it makes them productive. Get with the program. Amazing. Fantastic. Of course it resonates with idiots, because they can't think beyond the vicinity of their own greed. We are doomed, no one gives two cents. Idiocracy is here and it's not Costco.
Sorry! Of course.
What an amazing tech. And look, the CEOs are promising us a good future! Maybe we can cool the datacenters with Brawndo. Let me ask chat if that is a good idea.
I don't feel that I see this anywhere but if so, I guess I'm in a third camp.
I am "pro" in the sense that I believe that LLM's are making traditional programming obsolete. In fact there isn't any doubt in my mind.
However, I am "anti" in the sense that I am not excited or happy about it at all! And I certainly don't encourage anyone to throw money at accelerating that process.
> I believe that LLMs are making traditional programming obsolete. In fact there isn't any doubt in my mind.
Is this what AI psychosis looks like? How can anyone that is a half decent programmer actually believe that English + non-deterministic code generator will replace "traditional" programming?
> One side is demanding we keep giving hundreds of billions to them and at the same time promising the other side's unemployment.
That's a valid take. The problem is that there are, at this time, so many valid takes that it's hard to determine which are more valid/accurate than the other.
FWIW, I think this is more insightful than most of the takes I've seen, which basically amount to "side-1: we're moving to a higher level of abstraction" and "side-2: it's not higher abstraction, just less deterministic codegen".
I'm on the "higher level of abstraction" side, but that seems to be very much at odds with however Anthropic is defining it. Abstraction is supposed to give you better high-level clarity at the expense of low-level detail. These $20,000 burning, Gas Town-style orchestration matrices do anything but simplify high level concerns. In fact, they seem committed building extremely complex, low-level harnesses of testing and validation and looping cycles around agents upon agents to avoid actually trying to deal with whatever specific problem they are trying to solve.
How do you solve a problem you refuse to define explicitly? We end up with these Goodhart's Law solutions: they hit all of the required goals and declare victory, but completely fail by every reasonable metric that matters. Which I guess is an approach you take when you are selling agents by the token, but I don't see why anyone else is enamored with this approach.
You’re copping downvotes for this, but you’re not wrong.
“It will get better, and then we will use it to make many of you unemployed”
Colour-me-shocked that swathes of this industry might have an issue with that.
Two thoughts here:
First, remember when we had LLMs run optimisation passes last year? Alphaevolve doing square packing, and optimising ML kernels? The "anti" crowd was like "well, of course it can automatically optimise some code, that's easy". And things like "wake me up when it does hard tasks". Now, suddenly when they do hard tasks, we're back at "haha, but it's unoptimised and slow, laaame".
Second, if you could take 100 juniors, 100 mid level devs and 100 senior devs, lock them in a room for 2 weeks, how many working solutions that could boot up linux in 2 different arches, and almost boot in the third arch would you get? And could you have the same devs now do it in zig?
The thing that keeps coming up is that the "anti" crowd is fighting their own demons, and have kinda lost the plot along the way. Every "debate" is about promises, CEOs, billions, and so on. Meanwhile, at every step of the way these things become better and better. And incredibly useful in the right hands. I find it's best to just ignore the identity folks, and keep on being amazed at the progress. The haters will just find the next goalpost and the next fight with invisible entities. To paraphrase - those who can, do; those who can't, find things to nitpick.
You're heavily implying that because it can do this task, it can do any task at this difficulty or lower. Wrong. This thing isn't a human at the level of writing a compiler, and shouldn't be compared to one
Codex frustratingly failed at refactoring my tests for me the other day, despite me trying many, many prompts of increasing specificity. A task a junior could've done
Am I saying "haha, it couldn't do a junior-level task, so therefore anything harder is out of reach"? No, of course not. Again, it's not a human. The comparison is irrelevant.
Calculators are superhuman at arithmetic. Not much else, though. I predict this will be superhuman at some tasks (already is) and we'll be better at others
First off, AlphaEvolve isn't an LLM. No more than a human is a kidney.
Second, it depends. If you let them "pretrain" for writing a C compiler for however long it takes, I could see a smaller team doing it in a week or two. Keep in mind LLMs pretrain on all OSS, including GCC.
> Meanwhile, at every step of the way these things become better and better.
Will they? Or do they just ingest more data and compute?[1] Again, time will tell. But to me this seems more like speed-running into an Idiocracy scenario than a revolution.
I think this will turn out to be another driverless-car situation, where the last 1% needs 99% of the time. And while it might happen eventually, it's going to take an extremely long time.
[1] Because we don't have many more compute jumps left, nor will future data be as clean as it is now.
> First off, AlphaEvolve isn't an LLM.
It's a system based on LLMs.
> AlphaEvolve, an evolutionary coding agent powered by large language models for general-purpose algorithm discovery and optimization. AlphaEvolve pairs the creative problem-solving capabilities of our Gemini models with automated evaluators that verify answers, and uses an evolutionary framework to improve upon the most promising ideas.
> AlphaEvolve leverages an ensemble of state-of-the-art large language models: our fastest and most efficient model, Gemini Flash, maximizes the breadth of ideas explored, while our most powerful model, Gemini Pro, provides critical depth with insightful suggestions. Together, these models propose computer programs that implement algorithmic solutions as code.
It’s more like a concept car vs a production line model. The capabilities it has were fine tuned for a specific scenario and are not yet available to the general public.
I think LLMs as a technology are very cool and I'm frankly amazed at what they can do. What I'm 'anti' about is pushing the entire economy all-in on LLM tech. The accelerationist take of 'just keep going as fast as possible and it will work out, trust me bro' is the most unhinged, dangerous shit I've ever heard, and unfortunately it seems to be the default worldview of those in charge of the money. I'm not sure where all the AI tools will end up, but I am willing to bet big that the average person is not going to be better off 10 years from now. The direction the world is going scares the shit out of me, and the usage of AI by bad actors is not helping assuage that fear.
Honestly? I think if we as a society could trust our leaders (government and industry) to not be total dirtbags the resistance to AI would be much lower.
In my experience, it is often the other way around. Enthusiasts are tasked with trying to open minds that seem very closed on the subject. Most serious users of these tools recognize the shortcomings and also can make well-educated guesses on the short term future. It's the anti crowd who get hellbent on this ridiculously unfounded "robots are just parrots and can't ever replace real programmers" shtick.
Maybe if AI evangelists would stop lying about what AI can do then people would hate it less.
But lying and hype is baked into the DNA of AI booster culture. At this point it can be safely assumed anything short of right-here-right-now proof is pure unfettered horseshit when coming from anyone and everyone promoting the value of AI.
Are you trying to demonstrate a textbook example of straw man argument?
Something that bothers me here is that Anthropic claimed in their blog post that the Linux kernel could boot on x86 - is this not actually true then? They just made that part up?
It seemed pretty unambiguous to me from the blog post that they were saying the kernel could boot on all three arches, but clearly that's not true unless they did some serious hand-waving with kernel config options. Looking closer at the repo, they only show a claimed Linux boot for RISC-V, so...
[0]: https://www.anthropic.com/engineering/building-c-compiler - "build a bootable Linux 6.9 on x86, ARM, and RISC-V."
[1]: https://github.com/anthropics/claudes-c-compiler/blob/main/B... - only shows a test of RISC-V
> Someone got it working on Compiler Explorer and remarked that the assembly output “reminds me of the quality of an undergraduate’s compiler assignment”. Which, to be fair, is both harsh and not entirely wrong when you look at the register spilling patterns.
This is what I've noticed about most LLM-generated code: it's about the quality of an undergrad's, and I think there's a good reason for this - most of the code it's been trained on is of undergrad quality. Stack Overflow questions, a lot of undergrad open source projects; there are some professional-quality open source projects (e.g. SQLite) but they are outweighed by the mass of other code. Also, things like SQLite don't compare to things like Oracle or SQL Server, which are proprietary.
> the build failed at the linker stage
> The compiler did its job fine
> Where CCC Succeeds
> Correctness: Compiled every C file in the kernel (0 errors)
I don't think that follows. It's entirely possible that the compiler produced garbage assembly for a bunch of the kernel code that would make it totally not work even if it did link. (The SQLite code passing its self tests doesn't convince me otherwise, because the Linux kernel uses way more advanced/low-level/uncommon features than SQLite does.)
I agree. Lack of errors is not an indicator of correct compilation. Piping something to /dev/null won't produce any errors either, so there is nothing we can conclude from it. The fact that it compiles SQLite correctly does provide some evidence that their compiler at least implements enough of the C semantics involved in SQLite.
It's really cool to see how slow unoptimised C is. You get so used to seeing C easily beat any other language in performance that you assume it's really just intrinsic to the language. The benchmark shows CCC's SQLite3 build is 12x slower than GCC's unoptimised build, and 20x slower than the optimised one. That's enormous!
I'm not dissing CCC here, rather I'm impressed with how much speed is squeezed out by GCC out of what is assumed to be already an intrinsically fast language.
I mean, you can always make things slower. There are lots of non-optimizing or low-optimizing compilers that are _MUCH_ faster than this. TCC is probably the most famous example, but it's hardly the only alternative C compiler with performance somewhere between -O1 and -O2 in GCC. By comparison, as I understand it, CCC has performance worse than -O0, which is honestly a bit surprising to me, since -O0 should not be a hard target to hit. As I understand it, at -O0 C is basically just macro-expanded into assembly with a bit of order of operations thrown in. I don't believe it even does register allocation.
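To make that concrete, here is roughly what that "everything goes through memory" style of codegen looks like, sketched in Rust emitting AT&T-syntax text. Illustrative only; this is not how GCC, TCC or CCC are actually structured:

    // Rough sketch of "-O0 style" codegen for `c = a + b` on x86-64: every local
    // lives in a stack slot, every use is a load, every definition is a store,
    // and one scratch register (%eax) is reused per operation. No register allocation.
    use std::collections::HashMap;

    struct Frame {
        // Hypothetical byte offsets of locals below %rbp, e.g. a -> -4, b -> -8, c -> -12.
        slots: HashMap<String, i32>,
    }

    fn emit_add(frame: &Frame, dst: &str, lhs: &str, rhs: &str) -> Vec<String> {
        vec![
            format!("movl {}(%rbp), %eax", frame.slots[lhs]), // load lhs from memory
            format!("addl {}(%rbp), %eax", frame.slots[rhs]), // add rhs straight from memory
            format!("movl %eax, {}(%rbp)", frame.slots[dst]), // store the result back
        ]
    }

    fn main() {
        let frame = Frame {
            slots: HashMap::from([
                ("a".to_string(), -4),
                ("b".to_string(), -8),
                ("c".to_string(), -12),
            ]),
        };
        // Three memory accesses for a single addition; an optimizing compiler would
        // keep a, b and c in registers across the surrounding code instead.
        for line in emit_add(&frame, "c", "a", "b") {
            println!("    {line}");
        }
    }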
The speed of C is still largely intrinsic to the language.
The primitives are directly related to the actual silicon. A function call is actually going to turn into a call instruction (or get inlined). The order of bytes in your struct is how they exist in memory, etc. A pointer being dereferenced is a load/store.
The converse holds as well. Interpreted languages are slow because this association with the hardware isn't the case.
When you have a poopy compiler that does lots of register shuffling, then you lose this association.
Specifically, the constant spilling in those specific functions that were the 1000x slowdown makes the C code look a lot more like Python code (where every variable is several dereferences away).
Vibe coding is entertainment. Nothing wrong with entertainment, but when totally clueless people connect vibe-coded programs to their bank accounts, or control their devices with them, someone will be entertained for sure.
Large language models and small language models are very strong for solving problems, when the problem is narrow enough.
They are above the human average for solving almost any narrow problem when time is no constraint, and when time is a factor, say less than a minute, they are better than experts.
An OS kernel is exactly the kind of problem that everyone prefers to be solved as correctly as possible, even if arriving at the solution takes longer.
The author mentions stability and correctness of CCC; these are properties of Rust, not of vibe coding. Still an impressive feat by Claude Code, though.
Ironically, if they had first populated the repo with types, functions and methods with just todo!() bodies, made sure the architecture compiles and is sane, and only then let the agent fill in the bodies with implementations, most features would probably have worked correctly.
I am writing a program to do exactly that for Rust, but even then, how would the user/programmer know beforehand how many architectural details to specify using todo!() to be sure that the problem the agent tries to solve is narrow enough? That's impossible to know! If the problem is not narrow enough, then the implementation is going to be a mess.
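For concreteness, here is a rough sketch of what such a pre-seeded skeleton might look like for a tiny compiler front end. The names and layout are purely illustrative (not from CCC, and not from my tool); the point is that the signatures compile and every body is a narrow todo!() for the agent to fill:

    // Purely illustrative skeleton: a human pins down the module layout, types and
    // signatures so `cargo check` passes, and every body is left as todo!() for the
    // agent to fill in one narrow piece at a time.

    pub struct Token {
        pub kind: TokenKind,
        pub text: String,
    }

    pub enum TokenKind {
        Ident,
        Number,
        Punct,
        Keyword,
    }

    pub enum Expr {
        Number(i64),
        Ident(String),
        Binary { op: char, lhs: Box<Expr>, rhs: Box<Expr> },
    }

    pub fn lex(source: &str) -> Vec<Token> {
        todo!("tokenize the input")
    }

    pub fn parse(tokens: &[Token]) -> Result<Expr, String> {
        todo!("recursive-descent parser over the token stream")
    }

    pub fn emit(expr: &Expr) -> String {
        todo!("lower the AST to assembly text")
    }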
I think one of the issues is that the register allocation algorithm, alongside the SSA generation, is not enough.
Generally, after the SSA pass you convert everything into a register transfer language (RTL) and then do the register allocation pass. GCC is even more extreme: it has GIMPLE in the middle, which does more aggressive optimization, similar to rustc's MIR. CCC doesn't have all that. For register allocation you can do a simple linear scan, just as the usual JIT compiler would (and, from my understanding, something CCC should be able to do at little cost), but most of the "hard part" of a compiler today is actually optimization; the frontend is mostly a solved problem if you accept some hacks (unlike me, who is still looking for an elegant academic solution to the typedef problem).
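For reference, the linear scan mentioned above really is a small algorithm. A rough sketch in Rust of the classic Poletto/Sarkar formulation with a spill-the-furthest-end heuristic; the types are made up and nothing here is taken from CCC:

    use std::collections::HashMap;

    struct Interval { vreg: usize, start: usize, end: usize }

    #[derive(Debug)]
    enum Location { Reg(usize), StackSlot(usize) }

    // Walk live intervals in order of start point, expire finished intervals,
    // and spill the interval with the furthest end when registers run out.
    fn linear_scan(mut intervals: Vec<Interval>, num_regs: usize) -> HashMap<usize, Location> {
        intervals.sort_by_key(|iv| iv.start);
        let mut free: Vec<usize> = (0..num_regs).rev().collect();
        let mut active: Vec<(Interval, usize)> = Vec::new(); // (live interval, physical reg)
        let mut loc: HashMap<usize, Location> = HashMap::new();
        let mut next_slot = 0;

        for iv in intervals {
            // Expire intervals that end before this one starts; their registers free up.
            active.retain(|(a, r)| {
                if a.end < iv.start { free.push(*r); false } else { true }
            });

            if let Some(reg) = free.pop() {
                loc.insert(iv.vreg, Location::Reg(reg));
                active.push((iv, reg));
            } else {
                // No register free: spill whichever live interval ends furthest away.
                let mut idx = 0;
                for i in 1..active.len() {
                    if active[i].0.end > active[idx].0.end { idx = i; }
                }
                if active[idx].0.end > iv.end {
                    // The old interval lives longer: it loses its register to the new one.
                    let (spilled, reg) = active.swap_remove(idx);
                    loc.insert(spilled.vreg, Location::StackSlot(next_slot));
                    loc.insert(iv.vreg, Location::Reg(reg));
                    active.push((iv, reg));
                } else {
                    loc.insert(iv.vreg, Location::StackSlot(next_slot));
                }
                next_slot += 1;
            }
        }
        loc
    }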
Note that the LLVM approach to IR is probably a bit saner than the GCC one. GCC has ~3 completely different IRs at different stages in the pipeline, while LLVM mostly has only one canonical IR form for passing data around between the optimization passes (and individual passes will sometimes build their own temporary IR locally to make a specific analysis easier).
What is the typedef problem?
If stevefan1999's referring to a nasty frontend issue, it might be due to the fact that a name introduced by a typedef and an identical identifier can mingle in the same scope, which makes parsing pretty nasty – e.g. (example from source at end):
https://eli.thegreenplace.net/2011/05/02/the-context-sensiti...
I don't know off the top of my head whether there's a parser framework that makes this parse "straightforward" to express.
In your example, bar is actually trivial: since both the type AA and the variable AA are ints, both aa and bb end up as 4 no matter how you parse it. AA has to be typedef'd to something other than int.
Lexing and parsing C is mostly simple, except that typedefs technically make the grammar non-context-free. See https://en.wikipedia.org/wiki/Lexer_hack
When handwriting a parser it's no big deal, but it's often a stumbling block for parser generators or other formal approaches. Though I recall there's a PEG-based parser for C99/C11 floating around that was supposed to be compliant. But I'm having trouble finding a link, and maybe it was using something like LPeg, which has features beyond pure PEG that help with context-dependent parsing.
Clang's solution (presented at the end of the Wikipedia article you linked) seems much better: just use a single lexical token kind for both type names and variables.
Then only the parser needs to be context-sensitive, for the `A * B;` construct, which is either a no-op multiplication (if A is a variable) or a declaration of B as a pointer to A (if A is a type).
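A toy sketch of what that context sensitivity amounts to in code, purely illustrative (a real C front end tracks typedef names per scope and across full declarators):

    use std::collections::HashSet;

    // The parser records names introduced by typedef and consults them when it
    // sees `ident * ident ;`.
    struct ParserCtx {
        typedef_names: HashSet<String>,
    }

    enum Stmt {
        // `A * B;` where A is a typedef name: declares B as pointer-to-A.
        DeclarePointer { ty: String, name: String },
        // `A * B;` where A is an ordinary identifier: an expression statement.
        MultiplyExpr { lhs: String, rhs: String },
    }

    impl ParserCtx {
        // Called when the parser finishes a `typedef ... NAME;` declaration.
        fn record_typedef(&mut self, name: &str) {
            self.typedef_names.insert(name.to_string());
        }

        // Disambiguate `a * b ;` using what we know about `a`.
        fn classify_star_stmt(&self, a: &str, b: &str) -> Stmt {
            if self.typedef_names.contains(a) {
                Stmt::DeclarePointer { ty: a.to_string(), name: b.to_string() }
            } else {
                Stmt::MultiplyExpr { lhs: a.to_string(), rhs: b.to_string() }
            }
        }
    }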
The 158,000x slowdown on SQLite is the number that matters here, not whether it can parse C correctly. Parsing is the solved problem — every CS undergrad writes a recursive descent parser. The interesting (and hard) parts of a compiler are register allocation, instruction selection, and optimization passes, and those are exactly where this falls apart.
That said, I think the framing of "CCC vs GCC" is wrong. GCC has had thousands of engineer-years poured into it. The actually impressive thing is that an LLM produced a compiler at all that handles enough of C to compile non-trivial programs. Even a terrible one. Five years ago that would've been unthinkable.
The goalpost everyone should be watching isn't "can it match GCC" — it's whether the next iteration closes that 158,000x gap to, say, 100x. If it does, that tells you something real about the trajectory.
The part of the article about the 158,000x slowdown doesn't really make sense to me.
It says that a nested query does a large number of iterations through the SQLite bytecode evaluator. And it claims that each iteration is 4x slower, with an additional 2-3x penalty from "cache pressure". (There seems to be no explanation of where those numbers came from. Given that the blog post is largely AI-generated, I don't know whether I can trust them not to be hallucinated.)
But making each iteration 12x slower should only make the whole program 12x slower, not 158,000x slower.
Such a huge slowdown strongly suggests that CCC's generated code is doing something asymptotically slower than GCC's generated code, which in turn suggests a miscompilation.
I notice that the test script doesn't seem to perform any kind of correctness testing on the compiled code, other than not crashing. I would find this much more interesting if it tried to run SQLite's extensive test suite.
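Even a crude differential check would say more than "it didn't crash". Something along these lines, where the binary paths and the SQL are placeholders rather than anything from the article's test script:

    // Sketch of a differential check: run the same SQL through a GCC-built and a
    // CCC-built sqlite3 shell and compare the output.
    use std::process::Command;

    fn run_sql(binary: &str, sql: &str) -> String {
        let out = Command::new(binary)
            .arg(":memory:") // throwaway in-memory database
            .arg(sql)
            .output()
            .expect("failed to run sqlite3 binary");
        String::from_utf8_lossy(&out.stdout).into_owned()
    }

    fn main() {
        let sql = "CREATE TABLE t(x); INSERT INTO t VALUES (1),(2),(3); SELECT sum(x) FROM t;";
        let reference = run_sql("./sqlite3-gcc", sql);
        let candidate = run_sql("./sqlite3-ccc", sql);
        assert_eq!(reference, candidate, "CCC-built sqlite3 disagrees with GCC-built one");
        println!("outputs match: {}", reference.trim());
    }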
This thing has likely all of GCC, clang and any other open source C compiler in its training set.
It could have spat out GCC source code verbatim and matched its performance.
It's kind of a failure that it didn't just spit out GCC, isn't it?
If I had GCC and was asked for a C compiler I would just provide GCC..
"Ironically, among the four stages, the compiler (translation to assembly) is the most approachable one for an AI to build. It is mostly about pattern matching and rule application: take C constructs and map them to assembly patterns.
The assembler is harder than it looks. It needs to know the exact binary encoding of every instruction for the target architecture. x86-64 alone has thousands of instruction variants with complex encoding rules (REX prefixes, ModR/M bytes, SIB bytes, displacement sizes). Getting even one bit wrong means the CPU will do something completely unexpected.
The linker is arguably the hardest. It has to handle relocations, symbol resolution across multiple object files, different section types, position-independent code, thread-local storage, dynamic linking and format-specific details of ELF binaries. The Linux kernel linker script alone is hundreds of lines of layout directives that the linker must get exactly right."
I've worked on compilers, assemblers and linkers, and this is almost exactly backwards.
Exactly this. A linker is threading given blocks together with fixups for position-independent code; that can be called rule application. An assembler is pattern matching.
This explanation confused me too:
If each iteration is X percent slower, then a billion iterations will also be X percent slower. I wonder what is actually going on.
Claude one-shot a basic x86 assembler + linker for me. Missing lots of instructions, yes, but that is a matter of filling in tables of data mechanically.
Supporting linker scripts is marginally harder, but having manually written compilers before, my experience is the exact opposite of yours.
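To illustrate the "filling in tables of data" point, here is a toy sketch with a couple of hand-checked x86-64 encodings, just enough to emit an exit(42) stub. The real work in a production assembler is the full ModR/M, SIB and prefix machinery plus relocations; none of that is here:

    // Toy instruction table: a few fixed x86-64 encodings. Deliberately minimal;
    // this is not a real assembler.

    fn reg_num(reg: &str) -> u8 {
        match reg {
            "rax" => 0, "rcx" => 1, "rdx" => 2, "rbx" => 3,
            "rsp" => 4, "rbp" => 5, "rsi" => 6, "rdi" => 7,
            _ => panic!("register not in this toy table: {reg}"),
        }
    }

    // mov <r64>, imm32  =>  REX.W  C7 /0  imm32 (sign-extended by the CPU)
    fn mov_r64_imm32(reg: &str, imm: i32) -> Vec<u8> {
        let mut out = vec![0x48, 0xC7, 0xC0 + reg_num(reg)];
        out.extend_from_slice(&imm.to_le_bytes());
        out
    }

    fn syscall() -> Vec<u8> { vec![0x0F, 0x05] }

    fn main() {
        // exit(42) on Linux: rax = 60 (SYS_exit), rdi = 42, syscall.
        let mut code = Vec::new();
        code.extend(mov_r64_imm32("rax", 60));
        code.extend(mov_r64_imm32("rdi", 42));
        code.extend(syscall());
        for b in &code {
            print!("{b:02x} ");
        }
        println!();
    }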
I am inclined to agree with you... but, did CC produce a working linker as well as a working compiler?
I thought it was just the compiler that Anthropic produced.
Why would the correct output of a C compiler not work with a standard linker?
CCC was and is a marketing stunt for a new model launch. Impressive, but it still suffers from the same 80:20 rule. Those 20% are the optimizations, and we all know where the devil is in "let me write my own language" projects.
As a neutral observation: it’s remarkable how quickly we as humans adjust expectations.
Imagine five years ago saying that you could have a general-purpose AI write a C compiler that can handle the Linux kernel, by itself, from scratch, for $20k, by writing a simple English prompt.
That would have been completely unbelievable! Absurd! No one would take it seriously.
And now look at where we are.
You're right. It's been pretty incredible. It's also frustrating as hell, though, when people extrapolate from this progress.
Just because we're here doesn't mean we're getting to AGI or software developers begging for jobs at Starbucks
"The miracle is not that the bear can dance well, it's that the bear can dance at all."
- Old Russian proverb.
Can someone explain to me what's the big deal about this? The AI model was trained on lots of code and spat out something similar to GCC. Why is this revolutionary?
It's HN, everything AI-adjacent is revolutionary, especially if it's also Anthropic-adjacent.
It's a marketing gimmick. Cursor did the same recently when they claimed to have created a working browser, but it was basically just a bunch of open source software glued together into something barely functional for a PR stunt.
If someone told you 5 years ago that a computer generated a working C compiler, would you think it was a big deal or not?
Yeah, it's pretty amazing it can do this. The problem is the gaslighting by the companies making this. Companies: "See, we can create compilers, we won't need programmers." Programmers: "This is crap, are you insane?" Classic gaslighting.
They should have gone one step further and also optimized for query performance (without editing the source code).
I have, cough, AI-generated an x86-to-x86 compiler (it takes x86 in, replaces arbitrary instructions with function calls, and spits x86 out). At first it was horrible, but after letting it work for 2 more days it got down to only a 50% to 60% slowdown even with every memory-read instruction replaced.
Now that's when people should get scared. But it's also reasonable to assume that CCC will look closer to GCC at that point, maybe influenced by other compilers as well. Tell it to write an ARM compiler and it will never succeed (probably; maybe it can use an intermediary representation and shove it into LLVM and it'll work, but at that point it is no longer a "C" compiler).
> CCC compiled every single C source file in the Linux 6.9 kernel without a single compiler error (0 errors, 96 warnings). This is genuinely impressive for a compiler built entirely by an AI.
It would be interesting to compare the source code used by CCC to other projects. I have a slight suspicion that CCC stole a lot of code from other projects.
The prospect of going the last mile to fix the remaining problems reminds me of the old joke:
"The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."
I’ve always heard/repeated it as: “The first 90% is easy, it’s the second 90% that gets you. No one’s willing to talk about the third 90%.”
What does the smallest (simplest in terms of complexity / lines of code) C-compiler that can compile and run SQLite look like?
Perhaps that would be a more telling benchmark to evaluate the Claude compiler against.
Not as simple as it could be but I doubt anyone will manage to beat Fabrice Bellard: https://www.bellard.org/tcc/
Seeing that Claude can code a compiler doesn't help anyone if the output isn't efficient, because getting it to be efficient is the hardest part, and it will be interesting to see how long it takes to get there. No one is going to use a compiler that makes binaries run 700x slower.
I'm surprised that this wasn't possible before with just a bigger context size.
I wonder how well an LLM would do for a new CPU architecture for which no C compiler exists yet, just assembler.
It seems like if Anthropic released a super cool and useful _free_ utility (like a compiler, for example) that was better than existing counterparts or solved a problem that hadn’t been solved before[0] and just casually said “Here is this awesome thing that you should use every day. By the way our language model made this.” it would be incredible advertising for them.
But they instead made a blog post about how it would cost you twenty thousand dollars to recreate a piece of software that they do not, with a straight face, actually recommend that you use in any capacity beyond as a toy.
[0] I am categorically not talking about anything AI related or anything that is directly a part of their sales funnel. I am talking about a piece of software that just efficiently does something useful. GCC is an example, Everything by voidtools is an example, Wireshark is an example, etc. Claude is not an example.
I wonder how much more it would take Anthropic to make CCC on par with, or even better than, GCC.
I had no idea that SQLite performance was in fact compiler-dependent. The more you know!
You know, it sure does add some additional perspective to the original Anthropic marketing materia... ahem, I mean article, to learn that the CCC-compiled runtime for SQLite could potentially run up to 158,000 times slower than a GCC-compiled one...
Nevertheless, the victories continue to be closer to home.
It might be interesting to feed this report in and see what the coding agent swarm can improve on.
Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?
I am pretty sure everybody agrees that this result is somewhere between slop code that barely works and the pinnacle of AI-assisted compiler technology. But discussions should not be held only from the extremes. Instead, I am looking for a realistic estimate from the HN community about where to place these results in a human context. Since I have no experience with compilers, I would welcome any of your opinions.
> Honest question: would a normal CS student, junior, senior, or expert software developer be able to build this kind of project, and in what amount of time?
I offered to do it, but without a deadline (I work f/time for money), only a cost estimation based on how many hours I think it should take me: https://news.ycombinator.com/item?id=46909310
The poster I responded to had claimed that it was not possible to produce a compiler capable of compiling a bootable Linux kernel within the $20k cost, nor for double that ($40k).
I offered to do it for $40k, but no takers. I initially offered to do it for $20k, but the poster kept evading, so I settled on asking for the amount he offered.
Did Anthropic release the scaffolding, harnesses, prompts, etc. they used to build their compiler? That would be an even cooler flex to be able to go and say "Here, if you still doubt, run this and build your own! And show us what else you can build using these techniques."
That would still require someone else to burn $20,000 to try it themselves.
mehh
But gcc is part of it's training data so of course it spit out an autocomplete of a working compiler
/s
This is actually a nice case study in why agentic LLMs do kind of think. It's by no means the same code or compiler. It had to figure out lots and lots of problems along the way to get to the point of tests passing.
> But gcc is part of it's training data so of course it spit out an autocomplete of a working compiler /s
Why the sarcasm tag? It is almost certainly trained on several compiler codebases, plus probably dozens of small "toy" C compilers created as hobby / school projects.
It's an interesting benchmark not because the LLM did something novel, but because it evidently stayed focused and maintained consistency long enough for a project of this complexity.
Since Claude Code can browse the web, is it fair to think of it as “rewriting and simplifying a compiler originally written in C++ into Rust”?
In the original post, Anthropic did point out that Claude Code did not have access to the internet.
Presumably it had access to GCC (and LLVM/Clang) sources in its training data? All of which are hosted or mirrored on GitHub.
And all of which are in an entirely different language, and which use pretty different architectures to this compiler.