I don't know if any of this is true, but as a user of Azure every day this would explain so much.
The Azure UI feels like a janky mess, barely being held together. The documentation is obviously entirely written by AI and is constantly out of date or wrong. They offer such a huge volume of services it's nearly impossible to figure out what service you actually want/need without consultants, and when you finally get the services up who knows if they actually work as advertised.
I'm honestly shocked anything manages to stay working at all.
We migrated some services to AKS because upper management thought it was a good deal to get so many credits, and now pods are randomly crashing and database nodes have random spikes in disk latency. What ran reliably on GCP became quite unpredictable.
What are we reading here? These are extraordinary statements, and they come with apparent credibility. They sound reasonable. Is this a whistleblower or an ex-employee with a grudge? It appears to be the former. Is it? They’ve put their name to some clear and worrying statements.
> On January 7, 2025… I sent a more concise executive summary to the CEO. … When those communications produced no acknowledgment, I took the customary step of writing to the Board through the corporate secretary.
Why is that customary? I have not come across it, and though I have seen situations of some concern in the past, I previously had little experience with US corporate norms. What is normal here for such a level of concern?
Moreover, why is this public rather than a court case for wrongful termination?
Is Azure really this unreliable? There are concrete numbers in this blog. For those who use Azure, does it match your external experience?
In my experience Azure is full of consistency issues and race conditions. It's enough of an issue that when I was talking about new OpenAI models becoming available via Bedrock on AWS, and how convenient that was since I wouldn't have to deal with Azure, my colleague in enterprise architecture went on an unprompted rant about these exact issues. It's not the first time something like this has happened and I've experienced these issues firsthand, so yes. I'd say reliability is a critical issue for Azure and it hasn't gotten better each time I've gone back to check.
I recall seeing some pretty damning reports from a security pentester who was able to escape from a container on Azure and found the management controller for the service was years old with known critical unpatched vulnerabilities. Always been a bit sceptical of them since then.
Yes it is that unreliable. Even when given free credits, I would rather pay for the offerings from Amazon/Google.
He is, I think, Swiss; perhaps it's a cultural difference?
Azure is when you have a different version of the same product/API in each region.
The CEO is accountable to the board. If they are derelict in their obligations to the company, that's where you need to raise a stink so they can fix it.
Well, yeah, that’s what a board does, but I think the issue is whether it is customary to go to the board directly in this situation. The answer is a resounding NO. Very odd, but cool idea and approach.
A former colleague of mine has to work with Azure day to day, and everything explained in this article makes a lot of sense given the massive rants I hear from them about the platform.
12 years ago I had to choose whether to specialize in AWS, GCP or Azure, and from my very brief foray with Azure I could see it was an absolute mess of broken, slow and click-ops methodology. This article confirms my suspicions at that time, and my colleague's experience.
The post is so dramatized and clearly written by someone with a grudge such that it really detracts from any point that is trying to be made, if there is any.
From another former Az eng now elsewhere still working on big systems: the post gets way, way more boring when you realize that a title like "Principal Group Manager" is just an M2, and Principal in general is the L6 (maybe even L5) Google equivalent. Similarly, a Sev2 is hardly notable for anyone actually working on the foundational infra. There are certainly problems in Azure, but it's huge and rough edges are to be expected. It mostly marches on. IMO maturity is realizing this and working within the system to improve it rather than trying to lay out all the dirty laundry to an Internet audience that will undoubtedly lap it up and happily cry Microslop.
Last thing, the final part 6 comes off as really childish: risks to national security and sending letters to the board, really? Azure is still chugging along apparently despite everything being mentioned. People come in all the time crying that everything is broken and needs to be scrapped and rewritten but it's hardly ever true.
AWS and Google Cloud are both huge and are significantly better in UX/DX. My only experience with Azure was that it barely worked and provided very little in the way of information about why it didn't. I only have negative impressions of Azure, whereas at least with GC and AWS I can say my experiences are mixed.
I think he did kind of point at the lack of seniority in the org, so I'm not sure he was trying to exaggerate with the titles.
I'm really struck that they have such junior people in charge of key systems like that.
> People come in all the time crying that everything is broken and needs to be scrapped and rewritten but it's hardly ever true.
Or… you’ve just normalised the deviation.
One of the few reliable barometers of an organisation (or their products) is the wtf/day exclaimed by new hires.
After about three or four weeks everyone adapts, learns what they can and can’t criticise without fallout, and settles into the mud to wallow with everyone else that has become accustomed to the filth.
As an Azure user I can tell you that it’s blindingly obvious even from the outside that the engineering quality is rock bottom. Throwing features over the fence as fast as possible to catch up to AWS was clearly the only priority for over a decade and has resulted in a giant ball of mud that now they can’t change because published APIs and offered products must continue to have support for years. Those rushed decisions have painted Azure into a corner.
You may puff your chest out, and even take legitimate pride in building the second largest public cloud in the world, but please don’t fool yourself that the quality of this edifice is anything other than rickety and falling apart at the seams.
Remind me: can I use IPv6 safely yet? Does it still break Postgres in other networks? Can azcopy actually move files yet, like every other bulk copy tool ever made by man? Can I upgrade a VM in-place to a new SKU without deleting and recreating it to work around your internal Hyper-V cluster API limitations? Premium SSDv2 disks for boot disks… when? Etc…
You may list excuses for these quality gaps, but these kinds of things just weren’t an issue anywhere else I’ve worked as far back as twenty years ago! Heck, I built a natively “all IPv6” VMware ESXi cluster over a decade ago!
He might sound like he has a grudge but you sound like you’re personally invested. Shill?
> The post is so dramatized and clearly written by someone with a grudge such that it really detracts from any point that is trying to be made, if there is any
I guessed that from the title on the main HN page. Glad to see it confirmed.
This reads pretty badly, and I believe it was that bad. I worked on (and was at least partly responsible for) systems that do the same thing he described. It took constant force of will, fighting, escalation, etc. to hold the line and maintain some basic level of stability and engineering practice.
And I've worked other places that had problems similar to the core problems described, not quite as severe, and not at the same scale, but bad enough to doom them (IMO) to a death loop they won't recover from.
It's a nice read. Thank you for sharing this.
> Microsoft, meanwhile, conducted major layoffs—approximately 15,000 roles across waves in May and July 2025 —most likely to compensate for the immediate losses to CoreWeave ahead of the next earnings calls.
This is what people should know when seeing massive layoffs due to AI.
> The direct corollary is that any successful compromise of the host can give an attacker access to the complete memory of every VM running on that node. Keeping the host secure is therefore critical.
> In that context, hosting a web service that is directly reachable from any guest VM and running it on the secure host side created a significantly larger attack surface than I expected.
That is quite scary.
Scary is the understatement of the day. I can't imagine the environment where someone thinks that architecture is a good idea.
"For fiscal 2025, Microsoft CEO Satya Nadella earned total pay of $96.5 million, up 22% from a year earlier." -CNBC.com
and
"I also see I have 2 instances of Outlook, and neither of those are working." -Artemis II astronaut
> 2 instances of Outlook
That's 2 too many.
They should have used the third outlook they didn't know about... Outlook, Outlook (new), and the well-hidden Outlook (classic) that actually works.
That Outlook was part of the ablative Outlook armor that's supposed to burn off on reentry.
The first couple of paragraphs felt like a parody of a guy who goes to a diner and gets upset the waitress doesn’t address him as Dr.
It didn’t get any better.
His writing style is fairly over the top (he is Swiss, and I have seen this before, but not most of the time), but most of the technical content seems true to me.
I had the misfortune of having to use Azure back in 2018 and was appalled at the lack of quality and the slowness. I was in GitHub forums, helping other customers suffering from lack of basic functionality and incredible prices with abysmal performance. This article explains a lot, honestly.
Google’s Cloud feels like the best engineered one, though the lack of proper human support is worrying there compared to AWS.
Title: How Microsoft Vaporized a Trillion Dollars
What a fascinating view into how the sausage is made.
This is an insanely blunt look into some serious issues with Microsoft.
Any complex system - and these cloud systems must be immensely complex - accumulates cruft and bloat and bugs until the entire thing starts to look like an old hotel that hasn’t been renovated in 30 years.
What an epic takedown.
Microsoft should have promoted this guy instead of laying him off.
Did Microsoft really lose OpenAI as a customer?
A former Azure Core engineer’s 6-part account of the technical and leadership decisions that eroded trust in Azure.
Why do you speak about yourself in the third person?
Also, after this:
https://news.ycombinator.com/item?id=20341022
You continued to work at Microsoft and now there is this takedown?
I'm no friend of MS (to put it very mildly) but it seems to me your story is a bit inconsistent, as is the 7-year break between postings.
I downvoted this comment for sounding like a summarizing LLM, not adding anything substantial beyond the title of the post, before realizing you were the poster and author.
What's your assessment of AWS and GCP? Do you think it's likely they suffer from some of the same issues (e.g., the manual access of what should be highly secure, private systems, the instability, the lack of security)?