Lots of downtime on CC the last few days. They also pushed a bad release to CC that kept showing 'No response from API · Retrying in .... check your network'. 'claude install stable' fixed that. How I was not on stable I have no idea. I can only guess how many tokens I was billed for sending requests that CC got 'No response from API' for. I have a feeling the big AI vendors won't have the loyalty that IDEs and other dev tools generate. They really need to work on trust now to avoid people hopping later.
529s as a forcing function for taking a walk
/rc has improved my health
API Error: 529 Overloaded. This is a server-side issue, usually temporary — try again in a moment. If it persists, check https://status.claude.com.
we may be coming back online
Per their status page, the main product now has one 9 of uptime.
Hopefully still in the "tens" digit!
Good news! We have 6 9s of reliability!
Bad news! You need 128 bit floating point to see them.
88.888% uptime
Auspicious uptime!
They have been screwing around in the back-end for a while. They are doing something and not being transparent: re-routing the models, degrading performance and intelligence. All week post-fable ban the quality was garbage, the last day and a half it improved, the last half an hour it degraded, and now it's down.
I noticed it in real time, unfortunately. Perhaps the token bonfire I've been feeding all day is to blame...
Better buy more capacity from SpaceX.
Interestingly not Opus 4.5 apparently, which is still available via API.
Perhaps Claude approved the wrong compute allocation plan
How can you switch vsclaude to opus 4.5?
Sonnet 5 here we come
This mess sounds like a beginning to skynet :)
This happens all the time. Nothing to see here
It has been like that all of last week
Maybe fable coming back
Seen this yesterday as well.
My speculation is that we'll get fable back tomorrow. Usually we see messy devops stuff or maybe nerfing (not much these days) around the time they're releasing new models.
I have two sessions going. It’s my fault. Sorry guys.
Rookie numbers
dang et al,
"Service X is down" is not "news". Kinda feels like points-farming.
Anyone caring that X works is gonna know it's down - they can't work! And probably why they are at HN ;)
HN as an "is service X down?" detector is less reliable than trying the actual service :)
If HN is gonna keep letting people post "X is down" no worries. But it seems worth a flag.
Times like this remind me that despite GLM and Codex and other models being hyped up as Claude Opus 4.8 replacements, I still would not trust them with my most important work. For example, right now I'm working on a huge refactoring project, and even Opus has struggled with it after several days. I cannot even imagine how GLM, Codex, or other models would handle this. So the only option for me is to wait until this outage is over.
And it's not like open models are cheap to run even as alternatives. For example, with my $100/mo subscription for Claude Code, I often burn more than $100 a day several times a week. But if I were to use the API of GLM, it would be about $300.
Since you cannot imagine how they'd perform, isn't this the perfect opportunity to test your assumption?
yeah and let's not forget codex and glm have subscriptions too, with even more usage per dollar
but they also burn more tokens per task, so in the end, Claude comes out as the more efficient one, despite giving you fewer tokens.
You've got it backwards. Opus is the token/money burning one https://deepswe.datacurve.ai/
GPT 5.5 uses a third of the Opus 4.8 tokens for the same task and scores higher. GLM 5.2 was worse in quality but used half the tokens. 5.3 has not been tested yet but will be higher.
It depends what you’re doing. I have no problem using an open source model to build a static website or other simple tasks. And of course, open source is catching up and the list of things you can trust it with grows every month.
Not to mention the way the proprietary models patronize you if they think you’re up to no good. The other day I was trying to use Claude to transcribe a song on YouTube to sheet music; the link was broken and it had not been available for purchase for years. Of course I first had to prove to Claude that I was not just being cheap and trying to get around paying the $5 for the download, which I would have had no problem with. “I am going to be up front with you -- my research raises some red flags with your initial premise that the sheet music is no longer available…”
I think in the next 5-10 years inference costs will come down substantially and the open source models will get so advanced that only someone in an extremely niche, cutting edge field will need a frontier model. Everyone else will have a dedicated box somewhere on their network that runs their LLM of choice. No tokens, none of your data getting sent to a third party, no arguing with it over whether or not a link is actually broken or if you’re just trying to be cheap.
I think OpenAI and Anthropic realize this which is the reason they’re in a rush to go public.
z.ai has subscription-based plans if you want to use GLM through them. (Same for Codex, although I'm not going to pretend GPT-5.5 is as strong a model as Opus.)
I would give GLM a try. I'm shocked at how well it's been able to handle some things I've thrown at it.
A lot of the perception of open source models being garbage comes from the fact that they're still being run with the same piss-poor sampling algorithms that OpenAI/Anthropic force on their users, i.e. top-p and top-k.
These lead to a small accumulation of sampling errors, which makes it all but inevitable that open source models will shit the bed by the 200K token mark or even sooner.
If you set your opencode to use a good sampling algorithm, such as min_p or top-n sigma (llama.cpp supports both), you'll find that at least for long-running tasks, your model gets a lot better.
It won't make GLM as good as Opus 4.8, but it will stop the feeling of "brain damage" from running open source models at the edge of their context windows.
And yes, there is an upcoming (hopefully NeurIPS) paper titled "Long Context Generation is a Sampling Problem" for more details about this. Give it two months and it'll be on arXiv one way or another.
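For anyone who hasn't looked at what these two samplers actually do, here is a minimal sketch in plain Python. It only illustrates the general idea as I understand it (min_p keeps tokens whose probability is at least some fraction of the top token's probability, top-n sigma keeps tokens whose logit is within n standard deviations of the top logit). It is not llama.cpp's code, and the function names, toy logits, and thresholds are all made up for the example.

```python
import math
import random

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def min_p_filter(logits, min_p=0.1):
    """Return token ids whose probability is >= min_p * (top probability)."""
    probs = softmax(logits)
    cutoff = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= cutoff]

def top_n_sigma_filter(logits, n=1.0):
    """Return token ids whose logit is within n std devs of the max logit."""
    mean = sum(logits) / len(logits)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    cutoff = max(logits) - n * sigma
    return [i for i, x in enumerate(logits) if x >= cutoff]

def sample(logits, keep, rng=random):
    """Renormalise over the kept token ids and draw one of them."""
    probs = softmax([logits[i] for i in keep])
    return rng.choices(keep, weights=probs, k=1)[0]

# Hypothetical toy logits: two plausible tokens and a tail of junk.
logits = [5.0, 4.5, 0.0, -1.0, -1.5, -2.0]

kept_min_p = min_p_filter(logits, min_p=0.1)
kept_sigma = top_n_sigma_filter(logits, n=1.0)
token = sample(logits, kept_min_p, rng=random.Random(0))
```

On these toy logits both filters keep only the two plausible tokens and drop the tail. The cutoff in each case scales with how confident the model is at that step, unlike a fixed top-k count.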
Fix Fable pls.