Hejlsberg mentioned the ability to quickly provide accurate type information to LLMs as one of the reasons for rewriting tsc into Go:
https://youtu.be/10qowKUW82U?t=3186
But isn't TypeScript already a typed language to begin with?
This is about the speed with which the compiler can advise an LLM that a particular thing checks or doesn't check. TypeScript is much slower than Go.
okay so basically the faster compiling means a tighter feedback loop for the LLM to -know- if the code compiles or not etc? interesting
is go faster than rust?
> is go faster than rust
No.
They rewrote in go because go is similar enough to typescript, while being faster than typescript.
Source: https://github.com/microsoft/typescript-go/discussions/411
Go’s compiler is WAY faster than Rust’s. As far as speed of the actual program, Rust will generally be faster.
Go or Rust compiler speeds won't have any effect here. The program in this context is the TypeScript compiler.
cargo check is WAY faster than go build
Working with both, I can say this is a big no: Go builds are as fast if not faster. Usually Go dependencies compile much faster, because Go projects don't pull in as many dependencies as Rust projects do.
In Rust you only need to compile your dependencies once. After that it's just your app because dependencies don't change.
that is also the case in Go…?
Go has a very simple type system that is easy to typecheck on every token.
TypeScript has a type system that is complex enough, you can literally implement wasm inside it (and then use that to run e.g. Doom: https://socket.dev/blog/typescript-types-running-doom)
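A tiny illustration of that type-level computational power (a hypothetical sketch, not from the linked post): recursive tuple types can encode arithmetic that `tsc` evaluates entirely during type checking.

```typescript
// Build a tuple of length N at the type level by recursive accumulation.
type BuildTuple<N extends number, T extends unknown[] = []> =
  T["length"] extends N ? T : BuildTuple<N, [...T, unknown]>;

// Add two numbers by concatenating tuples and reading the combined length.
type Add<A extends number, B extends number> =
  [...BuildTuple<A>, ...BuildTuple<B>]["length"];

// This assignment only type-checks because the checker computed 2 + 3 = 5.
const sum: Add<2, 3> = 5;
console.log(sum); // prints 5
```

Types like these are what make TypeScript's checker expressive enough (and expensive enough) to host things like a wasm interpreter.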
> is go faster than rust?
Depends on how you write the Go or Rust code. The most optimal Rust re-write of the TypeScript compiler would very likely be faster than the most optimal version in Go. However they didn't want to do a re-write, they wanted to port the existing compiler codebase written in TS. Go like TS (ultimately the JS runtime) also has GC which makes a 1-to-1 port much easier.
No. Ignore the other comments.
The reason for this decision is that they wanted a near 1:1 port of the typescript code to go, keeping design and structure almost identical.
You can’t do that in rust as easily because of all the cyclical references and indirection involved.
A rust port would be a rewrite. This is merely a migration.
maybe rust is faster. but writing fast rust code (as compared to fast go code) is nearly impossible. borderline unusable language, both to humans and to LLMs (unless we get super-intelligence, maybe it can write good rust easily)
Also worth checking out MultiLSPy, effectively a python wrapper around multiple LSPs: https://github.com/microsoft/multilspy
Used in multiple similar publications, including "Guiding Language Models of Code with Global Context using Monitors" (https://arxiv.org/abs/2306.10763), which uses static analysis beyond the type system to filter out e.g. invalid variable names, invalid control flow etc.
Yes this work is super cool too! Note that LSPs can not guarantee resolving the necessary types that we use to ensure the prefix property, which we leverage to avoid backtracking and generation loops.
I think TypeScript is uniquely positioned to be the optimal language for LLMs. Tons of training data (benefiting from all the JS examples as well) plus the structure of types for LLMs to follow and tools to enforce.
Those who agree might be interested in "Introducing TypeChat" by Anders Hejlsberg + others (2023) [1]
[1]: https://microsoft.github.io/TypeChat/blog/introducing-typech...
Wish this project had more traction. Typechat with type checking could generate lots of synthetic data for model training too
Completely agree. Even with the basic LLMs in the $20/month Cursor plan, I can work 10x faster on TypeScript codebases than I could otherwise, while for Python that multiple feels more like 2-3x. The autocompletions are especially impressive when there is a well-organized type system.
Also in response to adjacent commenters - many mission-critical TS codebases will disable the use of an explicit "any" with eslint - https://typescript-eslint.io/rules/no-explicit-any/.
It’s better sure but as a power TS user it still sucks at generating better code, and consistently fucks up with generics (or doesn’t use them) or simple types sometimes.
LLMs work well with any static analysis tool. I frequently instruct Claude to use stuff like “go vet” and “deadcode” when it goes on a tear and writes a bunch of broken trash and declares mission accomplished.
> LLMs work well with any static analysis tool.
tsc error messages are so bad that every time my LLM sees one of those "SomeType is not assignable to SomeLongAssTypeDontEvenTryToUnderstandWhatsGoingOnHere<<<<>>>>>>>>>>>>>>>>>>>>" it just gives up and casts to any. goes for python too.
ha, that's always been my biggest gripe with ts
I can’t be the only one who hopes this was a joke.
God help us…
what do you dislike about it?
TypeScript is arguably one of the weaker statically typed languages, with how it allows `any` to quietly violate the type checked assumptions. It makes it harder to do a lot of the basic typing mistakes in JS, but it doesn't prevent them by any means, especially if you have to interface with (typeless) JS code.
So for these reasons alone I would be against using TS as a lingua franca for LLM codegen (as is GP I assume). As another commenter mentioned, LLMs have a tendency to throw their hands^Hlogits up when presented with complex TS type errors and just resort to using `any` to get it to compile (and probably hiding bugs).
And that doesn't even touch the issues with the JS/TS ecosystem and runtimes more broadly.
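A minimal sketch of that failure mode (function and field names are hypothetical): a single `any` at a typeless-JS boundary quietly poisons every declared type downstream, with no compile error.

```typescript
// A JS-boundary function: the return type is declared but never checked,
// because `any` is assignable to anything.
function loadConfig(raw: any): { retries: number } {
  return raw; // tsc accepts this without verifying the shape
}

// Wrong shape passed in -- still no compile error anywhere.
const cfg = loadConfig({ retries: "three" });

// cfg.retries is *typed* as number but *holds* a string at runtime:
console.log(typeof cfg.retries); // prints "string"
```

This is exactly the escape hatch an LLM reaches for when it "casts to any": the checker goes silent and the bug survives to runtime.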
tsc can be configured to avoid implicit use of any ("noImplicitAny": true) and ESLint can be set up to avoid explicit use of any. Typeless JS code is also a thing of the past.
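For reference, a minimal sketch of those settings (`strict` and `noImplicitAny` are real compiler options; the file itself is illustrative):

```json
// tsconfig.json (JSONC): "strict" already implies "noImplicitAny",
// but it can also be enabled on its own.
{
  "compilerOptions": {
    "strict": true,
    "noImplicitAny": true
  }
}
```

On the lint side, setting typescript-eslint's `@typescript-eslint/no-explicit-any` rule to `"error"` closes the explicit escape hatch as well.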
But the devil is in the details - some libraries are typed quite crappily, some have unnecessarily complex types, and the code that the LLMs were trained on is probably not the best in the world
There are languages that constrain types a lot more tightly than TypeScript, e.g. Kotlin, Rust, and Haskell. The more constrained the types, the more correct the program could be.
The program won’t be “more” correct. What would that even mean? Writing correct programs might be easier (or not) with more “constrained” (ill defined) typing.
Yep, and Rust famously goes beyond this by modelling memory ownership at compile time.
In fact, the more behaviour we can model at compile time the better when it comes to LLMs - there's some cool ideas here like transpiling Rust into languages for formal verification. See https://github.com/formal-land/coq-of-rust as an example.
Formal verification was one of those things that was previously so annoying to do that it rarely made it past academic use cases or extremely important libraries, but I think LLMs take the tedium out of it. Perhaps formal verification will have a "test driven development" type of moment in the sun thanks to this.
Can LLMs properly code in Rust yet? There is way more TypeScript code out there compared to Rust, and I doubt structured output can alleviate this.
my experience - yes! but. It's more of an edit-compile-fix loop than writing (seemingly) correct code on the first try. This might be a feature.
There is occasional difficulty with Rust syntax, but you often see the same sort of logic errors / getting lost that an LLM would have on any other codebase -- the compiler helps catch many of these.
> This might be a feature.
I think so as well. Rust's errors are some of the most helpful and easy to understand (once you grok the core concepts), and the generate (maybe constrained) - check - fix loop seems to benefit from this. In my testing it is better than Python (long-ass traces that you have to manually trim for the LLM).
They can, yes.
We (.txt, the outlines people) had a brief thread about this paper on twitter if you're interested: https://x.com/dottxtai/status/1922322194379551128
Been using Devin for a few months now, for Typescript and Python.
I've never seen it check in uncompilable code; watching the Devin console, I can see it building and running the code to ensure commits aren't complete garbage. When it has checked in compilable but slightly wrong code, lint and tests running automatically in CI (it doesn't always run them before checking in) trigger it to push a fix on its own.
Feedback loops are nice, but they can be expensive and time-consuming (oh, look at me complaining that it takes Devin a whopping 15 minutes to complete a task), so I can definitely see the value in type constraints.
is Devin worth the money? Would it be a big jump in productivity migrating from cursor to Devin?
Really cool results!
That this research comes out of universities, and not large AI labs, makes me think those labs believe that larger models are still the way to go.
thank you!
+1 this seems like healthy development
The code can be found here: https://github.com/eth-sri/type-constrained-code-generation
The correct way to do this is with finite model theory but we're not there yet.
we really need LLMs trained on ASTs instead of tokens. is there any research on this?
ASTrust: Towards More Trustworthy and Interpretable LLMs for Code through Syntax-Grounded Explanations
https://arxiv.org/abs/2407.08983
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding
https://arxiv.org/abs/2401.03003
CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation
https://arxiv.org/abs/2405.02355
The downside is that you need to properly preprocess the code, have less non-code training data, and cannot adapt easily to new programming languages
Do LLMs really understand the code at this stage?
The vibe code society would benefit way more if libraries hosted their docs in a way that's easy to copy and paste into an LLM.
Karpathy nerdbaited me on this last week! I'm almost done with aidocs and aidd.
The aidocs server: keeps track of generated llm-friendly docs for any github repo.
The aidocs daemon (aidd) is resident, and can watch a repo, find imports in a number of languages, request the docs from aidocs, serve them up in mcp, and/or put them into a directory in your repo. Planning on generating docs for a codebase and incremental docs creation later.
I could use a couple beta testers -- lmk if you're interested. macos for now, although the daemon is written in go and should be portable.
many docs now include llms.txt https://llmstxt.org/
I saw that but it doesn't work for me. I use Gemini 2.5 Pro Preview right now, and it cannot fetch content from links. What I am looking for is a large text file with public API class, function, etc. signatures, plain text documentation and code examples.
https://ai-sdk.dev/llms.txt
Depends on the library, I guess. I spent ~12 hours today vibe coding with LiveKit, and their /llms.txt is https://docs.livekit.io/llms.txt
thanks for the feedback. let us see how we can organize this better for compat with diff LLMs.
We published a similar paper for MoonBit: Explore the Design of an AI-Friendly Programming Language https://conf.researchr.org/details/icse-2024/llm4code-2024-p...
Would it be better if we moved the feedback loops into the RL stage of LLM training?
Are there any related works?
we were thinking about doing exactly this, the closest current work is probably the amazing "Learning Formal Mathematics from Intrinsic Motivation" by Poesia et al. (they use constraints to increase the likelihood of generating correct theorems/proofs during RL)
https://arxiv.org/abs/2407.00695
Yes, RL works well in fields where answers can be verified to some degree. That's why AlphaGo succeeded, and it should also work for code generation and math.
The general idea seems very promising, I had been hoping someone would do something like this since seeing JSON schema structured outputs for LLMs.
Need to dig in a bit more on the implementation, but I was surprised that the paper didn't mention hooking into existing language service/server. There's more than types that an LLM could leverage from existing language tooling. Auto imports is a good example, it is handy for the human developer to keep a linear writing flow, something a LLM needs even more.
the problem with LSPs is that they don't guarantee generating a type annotation that we can use for constraints, i.e. we can not ensure the prefix property using LSPs. so we had to roll our own :)
Pulling in more features to help the system is definitely worth looking into!
Honestly it's already working great in Cursor. Even adapting one type structure to another is quickly handled.
This was an obvious next step. Most current products can only restrict the token prediction to valid JSON or a specific JSON schema at best. There's no reason that this should be the only grammar available for constrained output mode.
The real challenge will be to make this detect and switch languages automatically. For example, a snippet of code could include a LaTeX formula in a comment and SQL in a string literal. There are many more examples, such as regex inside a shell script, and so on.
The obvious next step after that is back-tracking. It's possible to emit a token that is valid, but then allows no further completions that are valid. In other words, the model can paint itself into a corner. To my knowledge, no current online LLM service uses any kind of backtracking, they run in append ("forwards") mode only.
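The corner-painting problem is exactly what a prefix-closed constraint avoids. A toy sketch (all names hypothetical, with balanced parentheses of bounded depth standing in for a real grammar): at each step, mask the vocabulary down to tokens that keep the output completable, so an append-only greedy loop can never reach a dead end.

```typescript
// "Can this prefix still be extended to a valid string?" -- balanced parens, max depth 2.
const canComplete = (s: string): boolean => {
  let depth = 0;
  for (const ch of s) {
    if (ch === "(") depth++;
    else if (ch === ")") depth--;
    if (depth < 0 || depth > 2) return false; // unbalanced or too deep
  }
  return true; // some suffix of ")"s can always finish it
};

function isBalanced(s: string): boolean {
  let depth = 0;
  for (const ch of s) depth += ch === "(" ? 1 : -1;
  return depth === 0;
}

// Toy constrained decoder: pickToken stands in for the LLM's sampler.
function constrainedDecode(
  pickToken: (allowed: string[]) => string,
  maxLen: number,
): string {
  const vocab = ["(", ")", "<eos>"];
  let out = "";
  for (let i = 0; i < maxLen; i++) {
    const allowed = vocab.filter((t) =>
      t === "<eos>"
        ? isBalanced(out) // may only stop on a complete string
        : canComplete(out + t),
    );
    // Never fires: every allowed step preserves completability (prefix property).
    if (allowed.length === 0) throw new Error("painted into a corner");
    const tok = pickToken(allowed);
    if (tok === "<eos>") break;
    out += tok;
  }
  return out;
}

// A "model" that always prefers "(" would corner itself unconstrained;
// under the mask it is forced to close before exceeding depth 2.
const result = constrainedDecode((allowed) => allowed[0], 8);
console.log(result);
```

The masking alone prevents locally invalid tokens; the prefix property is the extra guarantee that every masked-in token still has a valid continuation, which is what makes backtracking unnecessary.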
SRLCG: Self-Rectified Large-Scale Code Generation with Multidimensional Chain-of-Thought and Dynamic Backtracking
https://arxiv.org/abs/2504.00532
IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking
https://arxiv.org/abs/2410.07295
ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation
https://arxiv.org/abs/2411.07112v1
Another one: SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking https://arxiv.org/abs/2306.05426
There was also an hn thread: https://news.ycombinator.com/item?id=36425375
I believe Microsoft introduced a framework that did this sort of backtracking that you're suggesting. I'm not sure how much traction it got.
re detecting and switching language: you could run several constraint systems in parallel and switch as soon as one of them rejects the input and another accepts it
re backtracking: a core part of this paper is ensuring a prefix property, i.e. there is always a legitimate completion, so the model cannot "corner" itself!
more research is needed on which languages and language features allow this prefix property to be ensured
The backtracking idea is interesting; could diffusion maybe help? At some point it turns into SAT solving.
SAT solving, I guess, because types encode proofs?
nice. the speed of AI development is accelerating so fast