It has been pretty rough. Their own numbers report just a single `9` for Actions in Feb 2026 with 98% uptime. But that said -- I don't get the 90% number.
Anecdotally, it seems believable that 1 in 50 times (2%) in Feb that Actions barfed. Which is not very nice, but it wasn't at 1 in 10 times (10%).
It looks like the aggregate stats are more of a venn diagram than an average. So if 1/N services are down, the aggregate is considered down. I don't think this is an accurate way to calculate this. It should be weighted or in some way show partial outages. This belief is derived from the Google SRE book, in particular chapters 3 (embracing risk) and 4 (service level objectives)
If you're using all services, then any partial outage is essentially a full outage.
Of course, you can massage the numbers to make it look nicer in the way you described but the conservative approach is better for the customers.
If you insist, one could create this metric for selected services only to "better reflect users".
That being said, even when looking at the split uptimes, you'd have to do a very skewed weighting to achieve a number with more than one 9.
I mean I think it's useful. It answers the question, "what percentage of the time can I rely on every part of GitHub to work correctly?". The answer seems to be roughly 90% of the time.
Nobody cares about every part of GitHub working correctly. I mean, ok, their SREs are supposed to, but tabling the question of whether that's true: if tomorrow they announced a distributed no-op service with 100% downtime, you should not have the intuition that the overall availability of the platform is now worse.
An aggregate number like that doesn’t seem to be a reasonable measure. Should OpenAI models being unavailable in CoPilot because OpenAI has an outage be considered GitHub “downtime”?
I don't think that's a fair comparison. Google Maps, Google Calendar, Google Drive, Google Search, Google Chrome, Google Ads, etc. are all clearly completely different products which have very little to do each other, they're just made by the same company called Google.
GitHub is a different situation. There's one "thing" users interact with, github.com, and it does a bunch of related things. Git operations, web hooks, the GitHub API (and thus their CLI tool), issues, pull requests, Actions; it's all part of the one product users think of as "GitHub", even if they happen to be implemented as different services which can fail separately.
Completely agree, it makes it worse actually as Github's secondary functions so to speak are things we implicitely rely on.
When I merge to master I expect a deploy to follow. This goes through git, webhooks and actions. Especially the latter two can fail silently if you haven't invested time in observation tools.
If maps is down I notice it and immediately can pivot. No such option with Github.
From the point of view of an individual developer, it may be "fraction of tasks affected by downtime" - which would lie between the average and the aggregate, as many tasks use multiple (but not all) features.
But if you take the point of view of a customer, it might not matter as much 'which' part is broken. To use a bad analogy, if my car is in the shop 10% of the time, it's not much comfort if each individual component is only broken 0.1% of the time.
> But if you take the point of view of a customer, it might not matter as much 'which' part is broken. To use a bad analogy, if my car is in the shop 10% of the time, it's not much comfort if each individual component is only broken 0.1% of the time.
Not to go too out of my way to defend GH's uptime because it's obviously pretty patchy, but I think this is a bad analogy. Most customers won't have a hard reliability on every user-facing gh feature. Or to put it another way there's only going to be a tiny fraction of users who actually experienced something like the 90% uptime reported by the site. Most people are in practice are probably experienceing something like 97-98%.
Sorry, by 'customer' I meant to say something like a large corporate customer - you're buying the whole package, and across your org, you're likely to be a little affected by even minor outages of niche services.
But yeah, totally agree that at the individual level, the observed reliability is between 90% and 99%, and probably toward the upper end of that range.
A better analogy is if one bulb in the right rear brake light group is burnt out. Technically the car is broken. But realistically you will be able to do all the things you want to do unless the thing you want to do is measure that all the bulbs in your brake lights are working.
These are two pages telling two different things, albeit with the same stats. The information is presented by OP in a way to show the results of the Microsoft acquisition.
It’s biaised to show this without the dates at which features were introduced. A lot of the downtimes in the breakdown are GitHub Actions, which launched in August 2019; so yeah what a surprise there was no Actions downtime before because Actions didn’t exist.
You'd think they'd do all the testing elsewhere and use a much shorter window of time to implement Azure after testing. I don't think this fully explains over 6 years of poor uptime.
I got Claude to make me the exact same graph a few weeks ago! I had hypothesized that we'd see a sharp drop off, instead what I found (as this project also shows) is a rather messy average trend of outages that has been going on for some time.
The graph being all nice before the Microsoft acquisition is a fun narrative, until you realize that some products (like actions, announced on October 16th, 2018) didn't exist and therefore had no outages. Easy to correct for by setting up start dates, but not done here. For the rest that did exist (API requests, Git ops, pages, etc) I figured they could just as easily be explained with GitHub improving their observability.
It feels like they launched actions and it quickly turned out to be an operations and availability nightmare. Since then, they've been firefighting and now the problems have spread to previously stable things like issues and PRs
Github actions needs to go away. Git, in the linux mantra, is a tool written to do one job very well. Productizing it, bolting shit onto the sides of it, and making it more than it should be was/is a giant mistake.
The whole "just because we could doesn't mean we should" quote applies here.
The same philosophy would suggest that running some other command immediately following a particular (successful) git command is fine; it is composing relatively simple programs into a greater system. Other than the common security pitfalls of the former, said philosophy has no issue with using (for example) Jenkins instead of Actions.
I'm not a GitHub apologist, but that graph isn't at scale, at all. It's massively zoomed in, with a lower band of 99.5%. It makes it look far worse than it is.
If you plotted it from zero, then a horrible service and a great service would be indistinguishable. Their SLA for enterprise customers is 99.9%. The low end of that chart is 5x that amount downtime. It is a reasonable scale for the range people are concerned about and it looks bad because it is bad.
> If you started the y-axis at zero, you wouldn't see much of anything.
That's... kind of my point.
As a reliability engineer, I'm disappointed in GitHub's 99.5% availability periods, especially as they impact paying customers. On the other hand, most users are non-paying users, and a 99.5% availability for a free service seems to me to be a reasonable tradeoff relative to the potential cost of improving reliability for them.
I think the unicorn is only for web pages. Things like git api services might be broken independently (and often are!) and they might show up on the status page after some time.
I feel like by now GitHub has a worse downtime record than my self hosted services on my single server where I frequently experiment, stop services or reboot.
It's ok because we're still paying for it. QoS degradation is worth it. No need to have 99.999% then you can have 90.84% and still people to pay for it.
It is ridiculous how company owned by Microsoft, making non sense money on Azure, is let to die like this. That's have to be a soft of plan or something. So sad to watch it.
I'm convinced one of my org's repos is just haunted now. It doesn't matter what the status page says. I'll get a unicorn about twice a day. Once you have 8000 commits, 15k issues, and two competing project boards, things seem to get pretty bad. Fresh repos run crazy fast by comparison.
Nearly every time Github has an outage, Azure is having issues also.
Actually the last 4-5 outages from Github, Our Azure environments have issues (that they rarely post on the status page) and lo and behold I'll notice that Github is also having the same problem.
I can only assume most of this is from the Azure migration path. Such an abysmal platform to be on. I loathe it.
Looks like there's an internal service health bulletin:
Impact Statement: Starting at 19:53 UTC on 31 Mar 2026, some customers using the Key Vault service in the East US region may experience issues accessing Key Vaults. This may directly impact performing operations on the control plane or data plane for Key Vault or for supported scenarios where Key Vault is integrated with other Azure services.
Honestly all of the key vault functions are offline for us in that region. Just another day in paradise.
Also the fact that the azure status page remains green is normal. Just assume it's statically green unless enough people notice.
GitHub is 100x the size today with 100x the product surface area. Pre-Microsoft GitHub was just a git host. Now, whether GitHub should have become what it is today is a fair question but to say “GitHub” is less stable today vs. 10 years ago ignores the significant changes. Also, much of these incidents are limited to products that are unreliable by nature, e.g: CoPilot depends on OpenAI and OpenAI has outages. The entire LLM API industry expects some requests to fail.
GitHub’s reliability could stand to be improved but without narrowing down to products these sort of comparisons are meaningless.
And even just that aspect of the service is now extremely unreliable. If outages in the LLM side can cause that to break, that would indicate some serious architectural problems.
interesting to see the correlation between outages and major feature launches — the big ones almost always coincide with infrastructure changes rather than random failures. Would be curious to overlay this with GitHub's engineering blog posts about what was happening behind the scenes.
I will chime in that Jira and Bitbucket have drastically improved performance and reliability over this same time period. It actually feels snappy and they seem to listen to feedback.
When I say that Microsoft writes very bad code some people get offended. For example for Azure Event Hubs they have almost no documentation and Java libraries that mostly do not run.
I mean I'm as annoyed as the next person about the outages but I'm not sure correlating with the Microsoft acquisition tells the whole story? GitHub usage has been growing massively I'd imagine?
Nearly all the variance is from Actions, a product that didn’t exist beforehand.
It’s despicable to see everyone punching down on GitHub. Even under Microsoft they’ve continued to provide an invaluable and free service to open source developers .
And now , while vibe coders smother them to death, we ridicule them . Shameful , really
I was with you until your comment about vibe coders. Microsoft paid for and brought this vibe coding hell upon themselves. GitHub Copilot, investment in/partnership with OpenAI, and everything else they’ve done to enshitify software and the internet.
If it brings them down, they’ve only themselves to blame. More likely it’ll just hasten the end of free public repos, which will be a shame, but we’ll find other ways to share code that aren’t reliant on one semi-benevolent megacorp.
I’m grateful for GitHub and their support for open source, but they’re not getting any sympathy for the AI mess they’re generating (and they’re contributing more to the mess than many other organisations, due to their size, position and product strategy).
They’re a big enough corporation that we can have nuanced feelings about them. Simultaneously grateful for one part of what they do, and unsympathetic for the consequences of a different part of what they do.
Is the pre-2018 data actually accurate? There seem to have been a number of outages before then: https://hn.algolia.com/?dateEnd=1545696000&dateRange=custom&...
Maybe that's just the date when they started tracking uptime using this sytem?
Data comes from the official status page. It may be more a marketing/communication page than an observability page (especially before selling)
Even better IMO is this status page: https://mrshu.github.io/github-statuses/
"The Missing GitHub Status Page" with overall aggregate percentages. Currently at 90.84% over the last 90 days. It was at 90.00% a couple days ago.
It has been pretty rough. Their own numbers report just a single `9` for Actions in Feb 2026 with 98% uptime. But that said -- I don't get the 90% number.
Anecdotally, it seems believable that 1 in 50 times (2%) in Feb that Actions barfed. Which is not very nice, but it wasn't at 1 in 10 times (10%).
It looks like the aggregate stats are more of a venn diagram than an average. So if 1/N services are down, the aggregate is considered down. I don't think this is an accurate way to calculate this. It should be weighted or in some way show partial outages. This belief is derived from the Google SRE book, in particular chapters 3 (embracing risk) and 4 (service level objectives)
https://sre.google/sre-book/embracing-risk/
https://sre.google/sre-book/service-level-objectives/
That's how you count uptime. You system is not up if it keeps failing when the user does some thing.
The problem here is the specification of what the system is. It's a bit unfair to call GH a single service, but it's how Microsoft sells it.
> That's how you count uptime.
It's not how I and many others calculate uptime. There is not uniformity, especially when you look at contracts.
If you're using all services, then any partial outage is essentially a full outage. Of course, you can massage the numbers to make it look nicer in the way you described but the conservative approach is better for the customers. If you insist, one could create this metric for selected services only to "better reflect users".
That being said, even when looking at the split uptimes, you'd have to do a very skewed weighting to achieve a number with more than one 9.
> That being said, even when looking at the split uptimes, you'd have to do a very skewed weighting to achieve a number with more than one 9.
It's definitely bad no matter how it you slice the pie
I mean I think it's useful. It answers the question, "what percentage of the time can I rely on every part of GitHub to work correctly?". The answer seems to be roughly 90% of the time.
Nobody cares about every part of GitHub working correctly. I mean, ok, their SREs are supposed to, but tabling the question of whether that's true: if tomorrow they announced a distributed no-op service with 100% downtime, you should not have the intuition that the overall availability of the platform is now worse.
I don't use half of the services, the answer is not straight forward
https://mrshu.github.io/github-statuses/
In a nutshell, why would the consumer care (for the SLO) care about how the vendor sliced the solution into microservices?
It will depend on the contract.
When I was at IBM, they didn't meet their SLOs for Watson and customers got a refund for that portion of their spend
An aggregate number like that doesn’t seem to be a reasonable measure. Should OpenAI models being unavailable in CoPilot because OpenAI has an outage be considered GitHub “downtime”?
As long as they brand it as a part of GitHub by calling it "GitHub Copilot" and integrate it into the GitHub UI, I think it's fair game.
What is Google's uptime (including every single little thing with Google in the name)?
I don't think that's a fair comparison. Google Maps, Google Calendar, Google Drive, Google Search, Google Chrome, Google Ads, etc. are all clearly completely different products which have very little to do each other, they're just made by the same company called Google.
GitHub is a different situation. There's one "thing" users interact with, github.com, and it does a bunch of related things. Git operations, web hooks, the GitHub API (and thus their CLI tool), issues, pull requests, Actions; it's all part of the one product users think of as "GitHub", even if they happen to be implemented as different services which can fail separately.
Completely agree, it makes it worse actually as Github's secondary functions so to speak are things we implicitely rely on.
When I merge to master I expect a deploy to follow. This goes through git, webhooks and actions. Especially the latter two can fail silently if you haven't invested time in observation tools.
If maps is down I notice it and immediately can pivot. No such option with Github.
I think reasonable people can disagree on this.
From the point of view of an individual developer, it may be "fraction of tasks affected by downtime" - which would lie between the average and the aggregate, as many tasks use multiple (but not all) features.
But if you take the point of view of a customer, it might not matter as much 'which' part is broken. To use a bad analogy, if my car is in the shop 10% of the time, it's not much comfort if each individual component is only broken 0.1% of the time.
> But if you take the point of view of a customer, it might not matter as much 'which' part is broken. To use a bad analogy, if my car is in the shop 10% of the time, it's not much comfort if each individual component is only broken 0.1% of the time.
Not to go too out of my way to defend GH's uptime because it's obviously pretty patchy, but I think this is a bad analogy. Most customers won't have a hard reliability on every user-facing gh feature. Or to put it another way there's only going to be a tiny fraction of users who actually experienced something like the 90% uptime reported by the site. Most people are in practice are probably experienceing something like 97-98%.
Sorry, by 'customer' I meant to say something like a large corporate customer - you're buying the whole package, and across your org, you're likely to be a little affected by even minor outages of niche services.
But yeah, totally agree that at the individual level, the observed reliability is between 90% and 99%, and probably toward the upper end of that range.
A better analogy is if one bulb in the right rear brake light group is burnt out. Technically the car is broken. But realistically you will be able to do all the things you want to do unless the thing you want to do is measure that all the bulbs in your brake lights are working.
Or if your kettle is not working the house is considered not working?
These are two pages telling two different things, albeit with the same stats. The information is presented by OP in a way to show the results of the Microsoft acquisition.
It’s biaised to show this without the dates at which features were introduced. A lot of the downtimes in the breakdown are GitHub Actions, which launched in August 2019; so yeah what a surprise there was no Actions downtime before because Actions didn’t exist.
You can click on "Breakdown" and then on "Actions" to hide it.
Check the breakdown page. Like yes the magnitude is reduced obviously for individual services. But they all show the same trend.
Even worse, those features show "100% uptime" pre-existence on the breakdowns page too.
FWIW if people are looking for a reason why, here's why I think it's happening: https://thenewstack.io/github-will-prioritize-migrating-to-a...
It's absolutely this. Our Azure outages correlate heavily with Github outages. It's almost a meme for us at this point.
You'd think they'd do all the testing elsewhere and use a much shorter window of time to implement Azure after testing. I don't think this fully explains over 6 years of poor uptime.
It certainly explains the issues _now_, IMO.
The fact that even they struggle with github actions is a real testimate to the fact that nobody wants to host their own CD workers.
I got Claude to make me the exact same graph a few weeks ago! I had hypothesized that we'd see a sharp drop off, instead what I found (as this project also shows) is a rather messy average trend of outages that has been going on for some time.
The graph being all nice before the Microsoft acquisition is a fun narrative, until you realize that some products (like actions, announced on October 16th, 2018) didn't exist and therefore had no outages. Easy to correct for by setting up start dates, but not done here. For the rest that did exist (API requests, Git ops, pages, etc) I figured they could just as easily be explained with GitHub improving their observability.
It feels like they launched actions and it quickly turned out to be an operations and availability nightmare. Since then, they've been firefighting and now the problems have spread to previously stable things like issues and PRs
Github actions needs to go away. Git, in the linux mantra, is a tool written to do one job very well. Productizing it, bolting shit onto the sides of it, and making it more than it should be was/is a giant mistake.
The whole "just because we could doesn't mean we should" quote applies here.
But GitHub actions is not Git?
The same philosophy would suggest that running some other command immediately following a particular (successful) git command is fine; it is composing relatively simple programs into a greater system. Other than the common security pitfalls of the former, said philosophy has no issue with using (for example) Jenkins instead of Actions.
PR merging broken right now https://www.githubstatus.com/incidents/ml7wplmxbt5l
I'm not a GitHub apologist, but that graph isn't at scale, at all. It's massively zoomed in, with a lower band of 99.5%. It makes it look far worse than it is.
If you plotted it from zero, then a horrible service and a great service would be indistinguishable. Their SLA for enterprise customers is 99.9%. The low end of that chart is 5x that amount downtime. It is a reasonable scale for the range people are concerned about and it looks bad because it is bad.
It's an uptime chart and shouldn't need to show much more than the 99% range.
If you started the y-axis at zero, you wouldn't see much of anything. Logarithmic scale would still be a bit much imo.
> If you started the y-axis at zero, you wouldn't see much of anything.
That's... kind of my point.
As a reliability engineer, I'm disappointed in GitHub's 99.5% availability periods, especially as they impact paying customers. On the other hand, most users are non-paying users, and a 99.5% availability for a free service seems to me to be a reasonable tradeoff relative to the potential cost of improving reliability for them.
Historical *
https://www.merriam-webster.com/grammar/everything-youve-eve...
I remember a lot of unicorn pages back in the days. Maybe the status page was just not updated that regularly back then?
I think the unicorn is only for web pages. Things like git api services might be broken independently (and often are!) and they might show up on the status page after some time.
I feel like by now GitHub has a worse downtime record than my self hosted services on my single server where I frequently experiment, stop services or reboot.
It's ok because we're still paying for it. QoS degradation is worth it. No need to have 99.999% then you can have 90.84% and still people to pay for it.
Those electricity savings can better used to fuel the token bonfire
It does have a worse downtime record than my tiny VPS that has a recurrent packet routing problem and keeps going offline. Measurably so.
It’s actually great to see a living example of how sensitive users* are to what to a lay person would look like a small amount of downtime.
The fact that we’re all talking about it, and not at all surprised, is a great example we can take when making the case for more 9’s of reliability.
* well, very technical power users.
The biggest spikes are Github Actions, starting November 2019. They didn't go GA until November 13, 2019: https://siliconangle.com/2019/11/13/github-universe-announce...
Unsolicited feedback ... changing the y-axis to be hours (not % uptime) might be more intuitive for folks to understand.
The data is there, you just have to hover over each data point.
It could even be both % and offline hours per year. To me the percentage is simpler to understand.
It is ridiculous how company owned by Microsoft, making non sense money on Azure, is let to die like this. That's have to be a soft of plan or something. So sad to watch it.
I'm convinced one of my org's repos is just haunted now. It doesn't matter what the status page says. I'll get a unicorn about twice a day. Once you have 8000 commits, 15k issues, and two competing project boards, things seem to get pretty bad. Fresh repos run crazy fast by comparison.
It could also be that they have more customers / clients now, or offer more capabilities.
Nearly every time Github has an outage, Azure is having issues also.
Actually the last 4-5 outages from Github, Our Azure environments have issues (that they rarely post on the status page) and lo and behold I'll notice that Github is also having the same problem.
I can only assume most of this is from the Azure migration path. Such an abysmal platform to be on. I loathe it.
Looks like there's an internal service health bulletin:
Impact Statement: Starting at 19:53 UTC on 31 Mar 2026, some customers using the Key Vault service in the East US region may experience issues accessing Key Vaults. This may directly impact performing operations on the control plane or data plane for Key Vault or for supported scenarios where Key Vault is integrated with other Azure services.
Honestly all of the key vault functions are offline for us in that region. Just another day in paradise.
Also the fact that the azure status page remains green is normal. Just assume it's statically green unless enough people notice.
Do we have metrics for the uptime of other major services? Would be interesting to see if this is just a GitHub problem or industry-wide.
Bitbucket Cloud incident history: https://bitbucket.status.atlassian.com/history
Though I will be the first to say I don't fully trust it based on the flakey git clone errors we see in CI.
GitHub is 100x the size today with 100x the product surface area. Pre-Microsoft GitHub was just a git host. Now, whether GitHub should have become what it is today is a fair question but to say “GitHub” is less stable today vs. 10 years ago ignores the significant changes. Also, much of these incidents are limited to products that are unreliable by nature, e.g: CoPilot depends on OpenAI and OpenAI has outages. The entire LLM API industry expects some requests to fail.
GitHub’s reliability could stand to be improved but without narrowing down to products these sort of comparisons are meaningless.
> Pre-Microsoft GitHub was just a git host.
And even just that aspect of the service is now extremely unreliable. If outages in the LLM side can cause that to break, that would indicate some serious architectural problems.
The article provides a way to do just that - click breakdown then you can deselect any product areas.
Just the Git operations show way more instability post acquisition.
How much of the downtime is due to all the AI code being committed?
Programming is a solved problem, btw.
This at least makes me feel like I am not going crazy when I say "Github used to be much more reliable before Microsoft bought them"
The significance of the changeover would be much more impactful if the chart showed a longer history.
interesting to see the correlation between outages and major feature launches — the big ones almost always coincide with infrastructure changes rather than random failures. Would be curious to overlay this with GitHub's engineering blog posts about what was happening behind the scenes.
I wonder if they got moved to Azure in 2019?
I will chime in that Jira and Bitbucket have drastically improved performance and reliability over this same time period. It actually feels snappy and they seem to listen to feedback.
When I say that Microsoft writes very bad code some people get offended. For example for Azure Event Hubs they have almost no documentation and Java libraries that mostly do not run.
I mean I'm as annoyed as the next person about the outages but I'm not sure correlating with the Microsoft acquisition tells the whole story? GitHub usage has been growing massively I'd imagine?
I guess "centralizing everything" to GitHub was never a good idea and called it 6 years ago. [0]
Looking at this now, you might as well self host and you would still get better uptime than GitHub.
[0] https://news.ycombinator.com/item?id=22867803
That's pretty stark.
hot take: I would accept ads under every PR comment in GitHub if we could get back to 3 or 4 nines of reliability.
Nearly all the variance is from Actions, a product that didn’t exist beforehand.
It’s despicable to see everyone punching down on GitHub. Even under Microsoft they’ve continued to provide an invaluable and free service to open source developers .
And now , while vibe coders smother them to death, we ridicule them . Shameful , really
I was with you until your comment about vibe coders. Microsoft paid for and brought this vibe coding hell upon themselves. GitHub Copilot, investment in/partnership with OpenAI, and everything else they’ve done to enshitify software and the internet.
If it brings them down, they’ve only themselves to blame. More likely it’ll just hasten the end of free public repos, which will be a shame, but we’ll find other ways to share code that aren’t reliant on one semi-benevolent megacorp.
The smothering would happen with or without Copilot. This just sounds like an excuse to be ungrateful .
I hope GitHub shuts down free tier , maybe developers will finally be grateful .
I’m grateful for GitHub and their support for open source, but they’re not getting any sympathy for the AI mess they’re generating (and they’re contributing more to the mess than many other organisations, due to their size, position and product strategy).
They’re a big enough corporation that we can have nuanced feelings about them. Simultaneously grateful for one part of what they do, and unsympathetic for the consequences of a different part of what they do.