> On that note: GCC doesn't provide a nice library to give access to its internals (unlike LLVM). So we have to use libgccjit which, despite what the "jit" part of its name implies ("just in time", meaning compiling sub-parts of the code on the fly, only when needed, for performance reasons, as is common in scripting languages like JavaScript), can also be used as "aot" ("ahead of time", meaning you compile everything at once, which allows spending more time on optimization).
Is libgccjit not “a nice library to give access to its internals?”
To use an illustrative (but inevitably flawed) metaphor: Using libgccjit for this is a bit like networking two computers via the MIDI protocol.
The MIDI protocol is pretty good for what it is designed for, and you can make it work for actual real networking, but the connections will be clunky, unergonomic, and will be missing useful features that you really want in a networking protocol.
Oh come on, SLIP over MIDI is tried and true.
I could be wrong, but my surface level understanding is that it's more of a library version of the external API of GCC than one that gives access to the internals.
libgccjit is much higher level than what's documented in the "GCC Internals" manual.
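For anyone curious what the "aot" use mentioned above looks like in practice, here is a minimal sketch against libgccjit's public C API (the function being built and the output filename are invented for illustration; this is not code from rustc_codegen_gcc):

```c
/* Minimal AOT sketch with libgccjit: build a function that returns 42
 * and write a real object file instead of JITing it into memory.
 * Compile with: gcc demo.c -lgccjit (assumes libgccjit headers/libs installed). */
#include <libgccjit.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    gcc_jit_context *ctxt = gcc_jit_context_acquire();
    if (!ctxt) {
        fprintf(stderr, "failed to acquire a libgccjit context\n");
        return EXIT_FAILURE;
    }

    /* Equivalent of:  int the_answer(void) { return 42; }  */
    gcc_jit_type *int_type =
        gcc_jit_context_get_type(ctxt, GCC_JIT_TYPE_INT);
    gcc_jit_function *fn =
        gcc_jit_context_new_function(ctxt, NULL, GCC_JIT_FUNCTION_EXPORTED,
                                     int_type, "the_answer", 0, NULL, 0);
    gcc_jit_block *block = gcc_jit_function_new_block(fn, "entry");
    gcc_jit_block_end_with_return(
        block, NULL,
        gcc_jit_context_new_rvalue_from_int(ctxt, int_type, 42));

    /* The "aot" part: instead of gcc_jit_context_compile() and an
     * in-process code pointer, ask for an artifact on disk. */
    gcc_jit_context_compile_to_file(ctxt, GCC_JIT_OUTPUT_KIND_OBJECT_FILE,
                                    "the_answer.o");

    gcc_jit_context_release(ctxt);
    return EXIT_SUCCESS;
}
```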
If the author reads this...
I'd be very interested if the author could provide a post with a more in-depth view of the passes, as suggested!
> Little side-note: If enough people are interested by this topic, I can write a (much) longer explanation of these passes.
Yes, please!
When I studied compiler theory, a large part of the compilation involved a lexical analyser (e.g. `flex`) and a syntax analyser (e.g. `bison`) that would produce an internal representation of the input code (the AST), which was then used to generate the compiled files.
It seems that the terminology has evolved, as we now speak more broadly of frontends and backends.
So, I'm wondering: are Bison and Flex (or equivalent tools) still in use by modern compilers? Or are the lexers and parsers built directly into GCC, LLVM, ...?
The other answers are great, but let me just add that C++ cannot be parsed with conventional LL/LALR/LR parsers, because the syntax is ambiguous and requires disambiguation via type checking (i.e., there may be multiple parse trees but at most one will type check).
There was some research on parsing C++ with GLR but I don't think it ever made it into production compilers.
Other, more sane languages with unambiguous grammars may still choose to hand-write their parsers for all the reasons mentioned in the sibling comments. However, I would note that, even when using a parsing library, almost every compiler in existence will use its own AST, and not reuse the parse tree generated by the parser library. That's something you would only ever do in a compiler class.
Also I wouldn't say that frontend/backend is an evolution of previous terminology; it's just that parsing is not considered an "interesting" problem by most of the community, so the focus has moved elsewhere (to everything from AST design through optimization and code generation).
Note that depending on what parsing lib you use, it may produce nodes of your own custom AST type
Personally I love the (Rust) combo of logos for lexing, chumsky for parsing, and ariadne for error reporting. Chumsky has options for error recovery and good performance, ariadne is gorgeous (there is another alternative for Rust, miette, both are good).
The only thing chumsky is lacking is incremental parsing. There is a chumsky-inspired library for incremental parsing called incpa though
If you want something more conservative for error reporting, annotate-snippets is finally at parity with rustc's current custom renderer and will soon become the default for both rustc and cargo.
GLR C++ parsers were for a short time in use on production code at Mozilla, in refactoring tools: Oink (and its fork, pork). Not quite sure what ended that, but I don't think it was any issue with parsing.
I disagree. It is interesting; that is why there are many languages out there without an LSP.
This was in the olden days when your language's type system would maybe look like C's if you were serious and be even less of a thing when you were not.
The hard part about compiling Rust is not really parsing, it's the type system, including parts like borrow checking, generics, trait solving (which is Turing-complete itself), name resolution, drop checking, and of course all of these features interact in fun and often surprising ways. Also macros. Also all the "magic" types in the StdLib that require special compiler support.
This is why e.g. `rustc` has several different intermediate representations. You no longer have "the" AST; you have token trees, HIR, THIR, and MIR, and then that's lowered to LLVM or Cranelift or libgccjit. Important parts of the type system are handled at each stage.
Not sure about GCC, but in general there has been a big move away from using parser generators like flex/bison/ANTLR/etc, and towards using handwritten recursive descent parsers. Clang (which is the C/C++ frontend for LLVM) does this, and so does rustc.
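For readers who have only seen generated parsers, a handwritten recursive-descent parser is essentially one function per grammar rule calling the others. A toy sketch for an invented grammar (nothing to do with Clang's or rustc's real parsers):

```c
/* Toy recursive-descent parser/evaluator for:
 *   expr := term (('+'|'-') term)*
 *   term := digit
 * One function per grammar rule; the input cursor is shared state. */
#include <ctype.h>
#include <stdio.h>

static const char *src;   /* input cursor */

static int parse_term(void) {
    if (isdigit((unsigned char)*src))
        return *src++ - '0';
    fprintf(stderr, "expected a digit at '%s'\n", src);
    return 0;
}

static int parse_expr(void) {
    int value = parse_term();
    while (*src == '+' || *src == '-') {
        char op = *src++;
        int rhs = parse_term();
        value = (op == '+') ? value + rhs : value - rhs;
    }
    return value;
}

int main(void) {
    src = "1+2-3+9";
    printf("%d\n", parse_expr());   /* prints 9 */
    return 0;
}
```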
I don't know a single mainstream language that uses parser generators. Python used to, and even they have moved.
AFAIK the reason is solely error messages: the customization available with handwritten parsers is just way better for the user.
I believe that GCC also moved to a handwritten parser, at least for C++, a couple of decades ago.
Table-driven parsers with custom per-statement tokenizers are still common in surviving Fortran compilers, with the exception of flang-new in LLVM. I used a custom parser combinator library there, inspired by a prototype in Haskell's Parsec, to implement a recursive descent algorithm with backtracking on failure. I'm still happy with the results, especially with the fact that it's all very strongly typed and coupled with the parse tree definition.
Not really. Here’s a comparison of different languages: https://notes.eatonphil.com/parser-generators-vs-handwritten...
Most roll their own for three reasons: performance, context, and error handling. Bison/Menhir et al. are easy to write a grammar and get started with, but in exchange you get less flexibility overall. It becomes difficult to handle context-sensitive parts, do error recovery, and give the user meaningful errors that describe exactly what’s wrong. Usually if there’s a small syntax error we want to try to tell the user how to fix it instead of just producing “Syntax error”, and that requires being able to fix the input and keep parsing.
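To make the error-recovery point concrete, here is a toy sketch of the usual "panic mode" trick available to a handwritten parser (invented grammar and messages, not from any compiler mentioned in this thread): emit a specific diagnostic, skip ahead to a synchronization token such as ';', and keep parsing so later errors still surface.

```c
/* Panic-mode recovery sketch: on a syntax error, print a targeted message,
 * resynchronize at the next ';', and continue so one typo doesn't hide
 * every later diagnostic. */
#include <stdio.h>

static const char *cursor;
static int error_count;

static void syntax_error(const char *expected, const char *hint) {
    error_count++;
    fprintf(stderr, "error: expected %s before '%c'\n",
            expected, *cursor ? *cursor : '$');
    if (hint)
        fprintf(stderr, "  help: %s\n", hint);
    while (*cursor && *cursor != ';')   /* skip to the sync token */
        cursor++;
    if (*cursor == ';')
        cursor++;
}

/* Toy statement grammar:  stmt := 'x' '=' digit ';'  */
static void parse_stmt(void) {
    if (*cursor == 'x') cursor++;
    else { syntax_error("'x'", NULL); return; }
    if (*cursor == '=') cursor++;
    else { syntax_error("'='", "assignments are written 'x = <digit>;'"); return; }
    if (*cursor >= '0' && *cursor <= '9') cursor++;
    else { syntax_error("a digit", NULL); return; }
    if (*cursor == ';') cursor++;
    else syntax_error("';'", "did you forget a semicolon?");
}

int main(void) {
    cursor = "x=1;x2;x=3;";   /* second statement is malformed */
    while (*cursor)
        parse_stmt();
    fprintf(stderr, "%d error(s), but every statement was checked\n",
            error_count);
    return error_count != 0;
}
```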
Menhir has a new mode where the parser is driven externally; this allows your code to drive the entire thing, which requires a lot more machinery than fire-and-forget but also affords you more flexibility.
If you're parsing a new language that you're trying to define, I do recommend using a parser generator to check your grammar, even if your "real" parser is handwritten for good reasons. A parser generator will insist on your grammar being unambiguous, or at least tell you where it is ambiguous. Without this sanity check, your unconstrained handwritten parser is almost guaranteed to not actually parse the language you think it parses.
"Frontend" as used by mainstream compilers is slightly broader than just lexing/parsing.
In typical modern compilers "frontend" is basically everything involving analyzing the source language and producing a compiler-internal IR, so lexing, parsing, semantic analysis and type checking, etc. And "backend" means everything involving producing machine code from the IR, so optimization and instruction selection.
In the context of Rust, rustc is the frontend (and it is already a very big and complicated Rust program, much more complicated than just a Rust lexer/parser would be), and then LLVM (typically bundled with rustc though some distros package them separately) is the backend (and is another very big and complicated C++ program).
I would just like to encourage all Rust devs to distribute binaries. No matter what compiler you choose, or what Rust version, users shouldn't have to build from source. I mostly see this with small projects to be fair.
I find it shocking that 20 years after LLVM was created, gcc still hasn't moved towards modularization of codegen.
It is a political, not a technical, decision. Essentially the same as the Linux kernel not encouraging the use of out-of-tree kernel modules. https://gcc.gnu.org/legacy-ml/gcc/2000-01/msg00572.html
And it shows how silly the idea is. gcc still sees plenty of forks from vendors who don't upstream, and llvm sees a lot more commercial participation. Unfortunately the Linux kernel equivalent doesn't exist.
It's also nakedly hypocritical behaviour on Stallman's part. Hoping (whether in vain or not) that GCC being Too Big to Fork ( https://news.ycombinator.com/item?id=6810259 ) will keep people from having access to the AST interface really isn't substantially different from saying "why do you need source code, can't you just disassemble the binary hahaha".
There are several open BSDs.
AFAIK there's no evidence to suggest that permissive vs. copyleft license is the reason for the relative lack of success of the BSDs vs. Linux.
I wouldn't call Linux's stance silly. A working OS requires drivers for the hardware it will run on, and having all the drivers in the kernel is a big reason we are able to use Linux everywhere we can today. Likewise, if they had used a more permissive license, we wouldn't have the Linux we do today. Compare the hardware supported by Linux vs the BSDs to see why these things are important.
Linux's position is more like "your out-of-tree code is not our problem". Linus didn't go out of his way to make out-of-tree modules more difficult to write.
LLVM wasn't the first modularization of codegen, see Amsterdam Compiler Kit for prior art, among others.
The GCC approach is on purpose; plus, even if they wanted to change, who would take on the effort to make the existing C, C++, Objective-C, Objective-C++, Fortran, Modula-2, Algol 68, Ada, D, and Go frontends adopt the new architecture?
Even clang, with all the LLVM modularization, is going to take a couple of years to move from plain LLVM IR to an MLIR dialect for C-based languages: https://github.com/llvm/clangir
Isn't that very much intentional on the part of GCC?
Somewhat. Stallman claims to have tried to make it modular,[0] but also that he wants to avoid "misuse of [the] front ends".[1]
The idea is that you should link the front and back ends, to prevent out-of-process GPL runarounds. But because of that, the mingling of the front and back ends ended up winning out over attempts to stay modular.
[0]: https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00...
[1]: https://lists.gnu.org/archive/html/emacs-devel/2015-01/msg00...
>> The idea is that you should link the front and back ends, to prevent out-of-process GPL runarounds.
Valid points, but that's also the reason people wanting a more modular compiler created LLVM under a different license - the ultimate GPL runaround. OTOH, now we have two big and useful compilers!
When gcc was built, most compilers were proprietary. Stallman wanted a free compiler and to keep it free. The GPL license is more restrictive, but its philosophy is clear. At the end of the day the code's writer can choose if and how people are allowed to use it. You don't have to use it; you can use something else or build your own. And maybe, just maybe, Linux is thriving while Windows is dying because in the Linux ecosystem everybody works together and shares, while in the Windows world everybody chips in to pay for Satya Nadella's next yacht.
> At the end of the day the code's writer can choose if and how people are allowed to use it.
If it's free software then I can modify and use it as I please. What's limited is redistributing the modified code (and, for the Affero GPL, offering a service to users over a network).
https://www.gnu.org/philosophy/free-sw.en.html#fs-definition
That sounds like Stallman wants proprietary OSS ;)
If you're going to make it hard for anyone anywhere to integrate with your open source tooling for fear of commercial projects abusing them and not ever sharing their changes, why even use the GPL license?
This is a big part of why I’ve always eschewed GPL.
Good lord, Stallman is such a zealot and hypocrite. It's not open vs. closed, it's mine vs. yours, and he's openly declaring that he's nerfing software in order to prevent people from using it in a way he doesn't like. And he refuses to talk about it in public, because normal people hate that shit, which he frames as "misunderstanding" him.
--- From the post:
I let this drop back in March -- please forgive me.
> Maybe that's the issue for GCC, but for Emacs the issue is to get detailed
> info out of GCC, which is a different problem. My understanding is that
> you're opposed to GCC providing this useful info because that info would
> need to be complete enough to be usable as input to a proprietary
> compiler backend.

My hope is that we can work out a kind of "detailed output" that is enough for what Emacs wants, but not enough for misuse of GCC front ends. I don't want to discuss the details on the list, because I think that would mean 50 messages of misunderstanding and tangents for each message that makes progress. Instead, is there anyone here who would like to work on this in detail?
He should just re-license GCC to close whatever perceived loophole, instead of actively making GCC more difficult to work with (for everyone!). RMS has done so much good, but he's so far from an ideal figure.
How in the world would you relicense GCC?
Not anymore. Modularization is somewhat tangential, but for a while Stallman did actively oppose rearchitecting GCC to better support non-free plugins and front-ends. But Stallman lost that battle years ago. AFAIU, the current state of GCC is the result of intentional technical choices (certain kinds of decoupling are not as beneficial as people might think--Rust has often been stymied by lack of features in LLVM, i.e. de facto (semantic?) coupling), works in progress (decoupling ongoing), or lack of time or wherewithal to commit to certain major changes (decoupling too onerous).
Personally, I think when you are making bad technical decisions in service of legal goals (making it harder to circumvent the GPL), that's a sure sign that you made a wrong turn somewhere.
Why? When your goal is to have free software, having non-free software with better architecture won't suit you.
I would describe this more as "trying to prevent others from having non-free software if they wish to", which is a lot more questionable imo.
Some in the Free Software community do not believe that making it harder to collaborate will reduce the amount of software created. For them, you are going to get the software either way and the choice is just “free” or not. And they imagine that permissively licensed code bases get “taken”, and so copyleft licenses result in more code for “the community”.
I happen to believe that barriers to collaboration result in less software for everybody. I look at Clang and GCC and come away thinking that Clang is the better model because it results in more innovation and more software that I can enjoy. Others wonder why I am so naive and say that collaborating on Clang is only for corporate shills and apologists.
You can have whatever opinion you want. I do not care about the politics. I just want more Open Source software. I mean, so do the other guys, I imagine, but they don’t always seem to fact check their theories. We disagree about which model results in more software I can use.
I am maybe part of the crowd you describe, but I don't disagree so much with you.
I just think that:
> I happen to believe that barriers to collaboration result in less software for everybody.
is not a bad thing. There is absolutely no lack of supply for software. The "market" is flooded with software and most of it is shit. https://en.wikipedia.org/wiki/Sturgeon%27s_law
This argument has been had thousands of times across thousands of forums and mailing lists in the preceding decades and we're unlikely to settle it here on the N + 1th iteration, but the short version of my own argument is that the entire point of Free Software is to allow end users to modify the software in the ways it serves them best. That's how it got started in the first place (see the origin story about Stallman and the Printer).
Stallman's insistence that gcc needed to be deliberately made worse to keep evil things from happening ran completely counter to his own supposed raison d'etre. Which you could maybe defend if it had actually worked, but it didn't: it just made everyone pack up and leave for LLVM instead, which easily could've been predicted and reduced gcc's leverage over the software ecosystem. So it was user-hostile, anti-freedom behavior for no benefit.
> the entire point of Free Software is to allow end users to modify the software in the ways it serves them best
Yes?
> completely counter to his own supposed raison d'etre
I can't follow your argument. You said yourself that his point is the freedom of the *end user*, not the compiler vendor. He has no leverage on the random middle man between him and the end user other than adjusting his release conditions (a.k.a. the license).
I'm speaking here as an end user of gcc, who might want e.g. to make a nice code formatting plugin which has to parse the AST to work properly. For a long time, Stallman's demand was that gcc's codebase be as difficult, impenetrable, and non-modular as possible, to prevent companies from bolting a closed-source frontend to the backend, and he specifically opposed exporting the AST, which makes a whole bunch of useful programming tools difficult or impossible.
Whatever his motivations were, I don't see a practical difference between "making the code deliberately bad to prevent a user from modifying it" and something like Tivoization enforced by code signing. Either way, I as a gcc user can't modify the code if I find it unfit for purpose.
I have no idea what you think "gcc's leverage" would be if it were a useless GPL'd core whose only actively updated front and back ends are proprietary. Turning gcc into Android would be no victory for software freedom.
Yes, the law made a wrong turn when it comes to people controlling the software on the devices they own. Free Software is an ingenious hack which often needs patching to deal with specific cases.
It is intentional, to prevent non-free projects from building on top of gcc components.
I am not familiar enough with gcc to know how it impacts out-of-tree free projects or internal development.
The decision was taken a long time ago; it may be worth revisiting.
Over the years, several frontends for languages that were out-of-tree for a long time have been integrated. So both working in-tree and outside the tree are definitely possible.
I don't necessarily like the focus on Rust, but if it happens, then we need to have support in the free compiler!
Why not? Like, what about the technology or ecosystem do you disagree with?
Not parent, but I share the ambivalence (at best) or outright negativity (at worst) toward the focus on Rust. It is a question of preference on my part, I don’t like the language and I do not want to see it continue to propagate through the software I use and want to control/edit/customize. This is particularly true of having Rust become entrenched in the depths of the open-source software I use on my personal and work machines. For me, Rust is just another dependency to add to a system and it also pulls along another compiler and the accompanying LLVM. I’m not going to learn a language that I disagree with strongly on multiple levels, so the less Rust in my open source the more control I retain over my software. So for me the less entrenched Rust remains the more ability I keep to work on the software I use.
That said, if Rust is going to continue entrenching itself in the open source software that is widely in use, it should at least be able to be compiled by the mainline GPL compiler used and utilized by the open source community. Permissive licenses are useful and appreciated in some contexts, but the GPL’d character of the Linux stack’s core is worth fighting to hold onto.
It’s not Rust in open source I have a problem with, it is Rust being added to existing software that I use that I don’t want. A piece of software, open source, written in Rust is equivalent to proprietary software from my perspective. I’ll use it, but I will always prefer software I can control/edit/hack on as the key portions of my stack.
> I don’t like the language and I do not want to see it continue to propagate through the software I use and want to control/edit/customize.
This is how I feel about C/C++; I find Rust a lot easier to reason about, modify, and test, so I'm always happy to see that something I'm interested in is written in Rust (or, to a far lesser extent, golang).
> So for me the less entrenched Rust remains the more ability I keep to work on the software I use.
For me, the more entrenched Rust becomes the more ability I gain to work on the software I use.
> if Rust is going to continue entrenching itself in the open source software that is widely in use, it should at least be able to be compiled by the mainline GPL compiler used and utilized by the open source community
I don't see why this ideological point should have any impact on whether a language is used or not. Clang/LLVM are also open-source, and I see no reason why GCC is better for these purposes than those. Unless you somehow think that using Clang/LLVM could lead to Rust becoming closed-source (or requiring closed-source tools), which is almost impossible to imagine, the benefits of using LLVM outweigh the drawbacks dramatically.
> A piece of software, open source, written in Rust is equivalent to proprietary software from my perspective.
This just sounds like 'not invented here syndrome'. Your refusal to learn new things does not reflect badly on Rust as a technology or on projects adopting it, it reflects on you. If you don't want to learn new things then that's fine, but don't portray your refusal to learn it as being somehow a negative for Rust.
> I will always prefer software I can control/edit/hack on as the key portions of my stack
You can control/edit/hack on Rust code, you just don't want to.
To be blunt, you're coming across as an old fogey who's set in his ways and doesn't want to learn anything new and doesn't want anything to change. "Everything was fine in my day, why is there all this new fangled stuff?" That's all fine, of course, you don't need to change or learn new things, but I don't understand the mindset of someone who wouldn't want to.
>> I don’t like the language and I do not want to see it continue to propagate through the software I use and want to control/edit/customize.
> This is how I feel about C/C++; I find Rust a lot easier to reason about, modify, and test, so I'm always happy to see that something I'm interested in is written in Rust (or, to a far lesser extent, golang).
You have to do better than "NO U" on this. The comparison to C/C++ is silly, because there is no way you're going to avoid C/C++ being woven throughout your entire existence for decades to come.
> I don't see why this ideological point should have any impact on whether a language is used or not. Clang/LLVM are also open-source, and I see no reason why GCC is better for these purposes than those.
I hope you don't expect people to debate about your sight and your imagination. You know why people choose the GPL, and you know why people are repulsed by the GPL. Playing dumb is disrespectful.
> don't portray your refusal to learn it as being somehow a negative for Rust.
But your sight, however, we should be discussing?
edit: I really, really like Rust, and I find it annoying that the clearest, most respectful arguments in this little subthread are from the people who just don't like Rust. The most annoying thing is that when they admit that they just don't like it, they're criticized for not making up reasons not to like it. They made it very clear that their main objection to its inclusion in Linux is licensing and integration issues, not taste. The response is name calling. I'm surprised they weren't flagkilled.
> edit: I really, really like Rust, and I find it annoying that the clearest, most respectful arguments in this little subthread are from the people who just don't like Rust.
Keywords right there. People who don’t-like-Rust are the most coddled anti-PL group. To the extent that they can just say: I really need to speak my mind here that I just don’t like it. End of story.
I don’t think anyone else feels entitled to complain about exactly nothing. I complain about languages. In the appropriate context. When it is relevant or germane to the topic.
A “genius” Rust program running on a supercomputer solving cancer would either get a golf-clap (“I don’t like Rust, but”) or cries that this means that the contagion is irreversibly spreading to their local supercomputer cluster.
One thing is people who work on projects where they would have to be burdened by at least (even if they don’t write it themselves) building Rust. That’s practical complaining, if that makes sense. Here people are whining about it entrenching itself in muh OSS.
We are on a fairly technical thread, and coming here I expect to see interesting technical arguments and counter-arguments.
You started your comment with "I don't like the language". I can't find any technical or even legal-like argumentation (there is zero legal encumbrance on using Rust AFAIK).
Your entire comment is more or less "I dislike Rust".
Question to you: what is the ideal imagined outcome of your comment? Do you believe that the Rust community will collectively disband and apologize for rubbing you the wrong way? Do you expect the Linux kernel to undo their decision to stop flagging Rust as an experiment in its code base?
Genuine question: imagine you had all the power to change something here; what would you change right away? And, much more interestingly: why?
If you respond, can we stick to technical argumentation? "I don't like X" is not informative for any future reader. Maybe expand on your multiple levels of disagreement with Rust?
> I disagree with strongly on multiple levels
Fair enough, but what are those disagreements? I was fully in the camp of not liking it, just because it was shoved down every project's throat. I used it, and it turns out it's fantastic once you get used to the syntax, and it replaced almost all other languages for me.
I just want to know if there are any actual pain points beyond syntax preference.
Edit: I partially agree with the compiler argument, but it's open source, and one of the main reasons the language is so fantastic IS the compiler, so I can stomach installing rustc and cargo.
> A piece of software, open source, written in Rust is equivalent to proprietary software from my perspective.
Unlike a project's license, this situation is entirely in your control. Rust is just a programming language like any other. It's pretty trivial to pick up any programming language well enough to be productive in a couple hours. If you need to hack on a project, you go learn whatever environment it uses, accomplish what you need to do, and move on. I've done this with Python, Bash, CMake, C++, JavaScript, CSS, ASM, Perl, weird domain-specific languages, the list goes on. It's fine to like some languages more than others (I'd be thrilled if C++ vanished from the universe), but please drop the drama queen stuff. You look really silly.
It's pretty disappointing when people like him try to block new technology just because they don't want to learn any more... but there's absolutely no way anyone is going to be productive in Rust in "a couple of hours".
Just to be clear, it is not a case of I don’t want to learn anymore. That’s actually pretty far from the case. As an example, and sticking to programming languages, I am currently putting Koka and Eff through their paces and learning a decent amount about the incorporation of algebraic effects into languages at scale; I’m also working my way through Idris 2’s adoption of Quantitative Type Theory. I genuinely enjoy learning, and particularly enjoy learning in the comp sci field.
But, that doesn’t have any bearing on my lack of desire to learn Rust. Several other comments basically demand I justify that dislike, and I may reply, but there is nothing wrong with not liking a language for personal or professional use. I have not taken any action to block Rust’s adoption in projects I use nor do I think I would succeed if I did try. I have occasionally bemoaned the inclusion of Rust in projects I use on forums, but even that isn’t taken well (my original comment as an example).
There's nothing wrong with disliking something. It's more that your dislike alone is not going to convince anyone else. Supporting arguments might either result in one or more of 1) people agreeing with you, or 2) you learning something that helps address your concern, or 3) Rust being improved to address your concern.
Productivity is incremental. In a couple of hours, you could figure out enough to clone a repository of a project you care about, build it successfully, and make a trivial change (e.g. improve an error message, or add an alias to a command-line argument). That doesn't mean you know enough to start using Rust for your next project.
LLVM is also free
Rustc (+ LLVM) already is a free compiler.
Almost the only thing I don't like about Rust is that a bunch of people actively looking to subvert software freedom have set up shop around it. If everything was licensed correctly and designed to resist control by special interests, I'd be a lot happier with having committed to it.
The language itself I find wonderful, and I suspect that it will get significantly better. Being GPL-hostile, centralized without proper namespacing, and having a Microsoft dependency through Github registration is aggravating. When it all goes bad, all the people silencing everyone complaining about it will play dumb.
If there's anything I would want rewritten in something like Rust, it would be an OS kernel.
> actively looking to subvert software freedom
Never attribute to malice that which can be adequately explained by apathy. We have, unfortunately, reached a point where most people writing new software default to permissive and don't sufficiently care about copyleft. I wish we hadn't, but we have. This is not unique to Rust.
Ironically, we're better off when existing projects migrate to Rust, because they'll keep their licenses, while rewrites do what most new software does, and default to permissive.
Personally, I'm happy every time I see a new crate using the GPL.
> GPL-hostile
Rust is not GPL-hostile. LLVM was the available tool that spawned a renaissance of new languages; GCC wasn't. The compiler uses a permissive license; I personally wish it were GPL, but it isn't. But there's nothing at all wrong with writing GPLed software in Rust, and people do.
> having a Microsoft dependency through Github registration is aggravating
This one bugs a lot of us, and it is being worked on.
> GPL-hostile
Not sure if it is particularly hostile. There are several GPL crates like Slint.
> Microsoft dependency through Github registration is aggravating
This one is concerning.