I managed to get GPT-5-Codex-Mini to draw me a pelican. It's not a very good one! https://static.simonwillison.net/static/2025/codex-hacking-m...
For comparison, here's GPT-5-Codex (not mini) https://static.simonwillison.net/static/2025/codex-hacking-d... and full GPT-5: https://static.simonwillison.net/static/2025/codex-hacking-g...
I had quite a fun time getting those pelicans though... since GPT-5 Codex Mini isn't officially available via API yet I instead had OpenAI's Codex CLI tool extend itself (in Rust) to add a "codex prompt ..." tool which uses their existing custom auth scheme and backend API, then used that to generate the pelicans. Full details here: https://simonwillison.net/2025/Nov/9/gpt-5-codex-mini/
GPT-5 and GPT-5-Codex already aren't clever enough for anything interesting.
What's your definition of interesting?
I'm halfway through writing a TypeScript-to-native-code translator (via .NET) that compiles a large enough subset of current code, with a lot of help from GPT-5 and Codex CLI. It has completely blown me away.
I'd like to give a concrete example that stood out (from, by now, dozens). I wanted d.ts files for the .NET standard libraries. One immediately obvious problem: .NET allows classes/interfaces to be redefined when the generic type arity differs. For example, there can be a SomeClass<int> and a SomeClass<int, int> that are completely separate. TypeScript, of course, doesn't allow this - you could have a single declaration with all type parameters, but it would obviously be a mess.
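To make the clash concrete (my illustration, not the commenter's code - the `List` names here are made up), TypeScript requires every declaration of a generic type to use identical type parameters, so .NET-style arity overloading has no direct equivalent:

```typescript
// A single generic declaration is fine.
interface List<T> { items: T[] }

// Uncommenting the next line is a compile error (TS2428:
// "All declarations of 'List' must have identical type parameters"):
// interface List<K, V> { pairs: Array<[K, V]> }

const nums: List<number> = { items: [1, 2, 3] };
```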
I was stuck with (quite ugly):

  const users = new List_1<User>(...);

instead of

  const users = new List<User>(...);

So GPT comes up with this:

  declare const __unspecified: unique symbol;
  type __ = typeof __unspecified;

  // Arity-anchored delegates exist elsewhere:
  // import("internal/System").Action_0
  // import("internal/System").Action_1<T1>
  // import("internal/System").Action_2<T1, T2>
  // import("internal/System").Action_3<T1, T2, T3>
  // ... up to 17

  export type Action<
    T1 = __, T2 = __, T3 = __, // ... continue through T17 = __
  > =
    [T1] extends [__] ? import("internal/System").Action_0 :
    [T2] extends [__] ? import("internal/System").Action_1<T1> :
    [T3] extends [__] ? import("internal/System").Action_2<T1, T2> :
    /* remaining arms follow the same pattern ... */
    import("internal/System").Action_3<T1, T2, T3>;

This lets me write:

  const a: Action<number> = (n) => {}; // OK (void)
  const f: Func<number, string> = (s) => 20; // OK (string -> number)

A human could come up with this, of course. But doing it at scale (there are many such problems that crop up) would take a lot of effort. By the way, I'm using Claude for the grunt work (because it's faster), but GPT-5 is doing all the architecture/thinking/planning/review.

Not correct. Are you using high reasoning? I am using Codex every day.
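For readers who want to try the trick in isolation, here is a minimal self-contained sketch of the same sentinel-default dispatch, with local aliases standing in for the `import("internal/System")` delegate types (those belong to the commenter's project and aren't shown in the thread):

```typescript
// A unique symbol acts as an "argument not supplied" sentinel.
declare const __unspecified: unique symbol;
type __ = typeof __unspecified;

// Stand-ins for the arity-anchored delegate types.
type Action_0 = () => void;
type Action_1<T1> = (a: T1) => void;
type Action_2<T1, T2> = (a: T1, b: T2) => void;

// Omitted type arguments collapse onto the sentinel default; the
// conditional chain then selects the delegate with the matching arity.
type Action<T1 = __, T2 = __> =
  [T1] extends [__] ? Action_0 :
  [T2] extends [__] ? Action_1<T1> :
  Action_2<T1, T2>;

const a0: Action = () => {};                                   // Action_0
const a1: Action<number> = (n) => { console.log(n); };         // Action_1<number>
const a2: Action<number, string> = (n, s) => { console.log(n, s); }; // Action_2<number, string>
```

The `[T1] extends [__]` tuple wrapping matters: it prevents the conditional type from distributing over unions, so a union type argument still dispatches to a single arity.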
What sort of stuff are you using it for?
All "AI" providers are cutting corners in their models right now because the subsidized cost is unsustainable.
Grok's latest update made it far worse than the version right after the Grok-4 release. It makes outright mistakes now. Copilot has cut corners long ago. Google "AI" was always horrible.
The whole "AI" experiment was an outrageously expensive IP laundering parlor trick that is meeting economic realities now.
Charging developers $200/month for Claude Code and getting to a billion in ARR sounds like a pretty great business to be in to me, especially with this growth rate:
> Claude Code is reportedly close to generating $1 billion in annualized revenue, up from about $400 million in July.
https://techcrunch.com/2025/11/04/anthropic-expects-b2b-dema...
Relative to its competitors, Anthropic seems to have a higher share of professional users paying premium subscriptions, which is probably more sustainable in the long term.
So Misanthropic claims that 416,666.66 software developers have bought their expensive $200 subscription when there are 4.4 million software developers in the US.
That sounds reasonable given that 10% of software developers are talkers that need someone to output something that looks like a deliverable.
We were however talking profits here, not revenue.
Presumably their "$1bn ARR from Claude Code" number isn't just the $200/month subscribers, they have $20/month and $100/month plans too, both of which their internal analytics could be crediting to Claude Code based on API usage patterns.
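The plan-mix point can be sketched as a back-of-envelope calculation. The subscriber counts below are invented purely to show that a blend of tiers can reach ~$1B ARR without anywhere near 417k people on the $200/month plan:

```typescript
// Hypothetical plan mix; every subscriber count here is made up.
const plans = [
  { monthly: 20, subscribers: 1_500_000 },
  { monthly: 100, subscribers: 300_000 },
  { monthly: 200, subscribers: 150_000 },
];

// ARR = sum over plans of (monthly price * 12 * subscribers).
const arr = plans.reduce((sum, p) => sum + p.monthly * 12 * p.subscribers, 0);
console.log(arr); // 1,080,000,000 with these made-up numbers
```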
That $1bn number was in a paywalled Information article which was then re-reported by TechCrunch, so the actual source of the number isn't clear. I'm assuming someone leaked it to The Information; they appear to have some very useful sources.
I doubt this is just US developers - they've boasted about how successful they are in Europe recently too:
> Businesses across Europe are trusting Claude with their most important work. As a result, EMEA has become our fastest-growing region, with a run-rate revenue that has grown more than 9x in the past year.
https://www.anthropic.com/news/new-offices-in-paris-and-muni...
That's a very long-winded way of saying "it was subsidized so it could capture a large market segment, and now that's stopping", which is what SV companies have done since checks notes forever.
An LLM would have generated four pages on this topic in order to increase the token count!
LLMs are advertised for serious applications. I don't recall that CPUs generally hallucinate except for the FDIV bug. Or that AirBnB rents you apartments that don't exist in 30% of all cases. Or that Uber cars drive into a river during 20% of all rides.
Are we talking about economics, or about hallucinations?
"CPUs don't hallucinate" would be a reasonable argument if CPUs were an alternative to LLMs, which they aren't, so I'm not really sure what argument you're making there.
Seems like you're saying "a calculator makes fewer mistakes than an accountant", which is true, but I still pay an accountant to do my taxes, and not a calculator.
I was obviously responding to your "SV companies have been doing that forever". You have introduced the general topic.
I don't see how CPU bugs have anything to do with subsidizing a product to capture market share, can you elaborate?
Certainly!
Thinking ...
- The user is asking about the connection between CPU bugs and price dumping in order to capture market share.
- The user appears to miss the original thread starter that mentions cutting corners in models after the subsidy phase is over.
- The mentions of CPUs, AirBnB and Uber appear to be examples where certain quality standards were kept even after the subsidy phase.
Generating response ...
if you don't want hallucinations:
- set temp to 0
- be more specific
But I'd argue that if your LLM isn't hallucinating, then it's useless
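For what "set temp to 0" means in practice, here is the relevant request shape (following the OpenAI chat completions API; the model name and prompt are illustrative). Worth noting: temperature 0 reduces run-to-run variance, but it does not by itself prevent hallucinations - the model can be deterministically wrong:

```typescript
// Pinning sampling parameters for reproducibility.
const request = {
  model: "gpt-4o-mini", // illustrative model name
  temperature: 0,       // near-greedy decoding: minimal sampling randomness
  seed: 42,             // best-effort determinism where the API supports it
  messages: [{ role: "user", content: "Summarize RFC 2119 in one sentence." }],
};
console.log(JSON.stringify(request.temperature)); // 0
```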
The Chinese open source models don't have this problem -- and they're state of the art!
Not saying that you are completely wrong, but you could try to rephrase this to make a better conversation.
I agree that many new model versions are worse than the previous. But it is also related to base rules of the model - they try to please you and manipulate you to like them, way too much.
If any OpenAI devs are reading this comment section: is it possible for us to get API access at runable.com?
Looks like a leak: https://platform.openai.com/docs/models does not list it, and codex-mini-latest says that it's based on 4o. I wonder if it will be faster than codex; gpt-5-nano and -mini are still very slow for me on API, surprisingly so.
They announced it on Twitter yesterday: https://x.com/OpenAIDevs/status/1986861734619947305 and https://x.com/OpenAIDevs/status/1986861736041853368
> GPT-5-Codex-Mini allows roughly 4x more usage than GPT-5-Codex, at a slight capability tradeoff due to the more compact model.
> Available in the CLI and IDE extension when you sign in with ChatGPT, with API support coming soon.
I noticed the same thing with -mini. It can be even slower than the full-fat version. I'm guessing their infra for it is very cost-optimized to help them offer it at such a low price.