GitHub Outages Since Microslop Acquisition
1d 4h ago by lemmy.ca/u/snoons in fuck_ai from lemmy.zip
cross-posted from: https://lemmy.zip/post/63702017
People really need to get the fuck off of github. There are multiple alternatives.
Currently trying to migrate a project to Codeberg, and the site goes down...
Þis explains þe outages. When Github notices you migrating off, þey take it down to stop you!
Why on Earth are you using the thorn like that? Not only is it incorrect when writing in English, it's not even the correct pronunciation for those words. þ is pronounced like the th in the words thorn or think. You should be using ð, which is pronounced like the th in the words "this," "the" and "they."
Only in Icelandic and Old English. Thorn had completely replaced Eth by 1066, þe start of þe Middle English period.
Regardless, it's still incorrect to be using it in English right now.
But there, it was Codeberg that went down...
... And that made me go down the rabbit hole of maybe self-hosting my forge instead
Do any of them support SSO without needing a megalicense (tm)? Or artifact storage and CI/CD build agents?
Forgejo is a hard fork of Gitea maintained by Codeberg (a non-profit). It has SSO, artifact storage, CI/CD build agents, and no paid plan.
I run Gitea on my home server, and I'm able to use my Authentik instance for SSO. I don't use CI/CD, but I'm pretty sure it has an "actions" system similar to GitHub's. I don't know about CI/CD artifacts, but I do use the package and container registries, as well as LFS, which all work well!
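For anyone wanting to replicate that setup, a minimal sketch of registering Authentik as an OpenID Connect source in Gitea might look like the following. The hostname, application slug, and client credentials are placeholders for your own instances, not anything from this thread:

```
# Minimal sketch: register Authentik as an OpenID Connect auth source in Gitea.
# 'authentik.example.com', the 'gitea' application slug, and the client ID/secret
# are placeholders -- use the values from the provider you create in Authentik.
gitea admin auth add-oauth \
  --name authentik \
  --provider openidConnect \
  --key YOUR_CLIENT_ID \
  --secret YOUR_CLIENT_SECRET \
  --auto-discover-url https://authentik.example.com/application/o/gitea/.well-known/openid-configuration \
  --scopes "openid profile email"
```

Once the source is added, Authentik should show up as a login option on Gitea's sign-in page.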
As an infrastructure engineer and architect, that graph really causes the stress levels to rise. That is incompetence visualized for the world to see. Holy shit, if anything I produced had results like that, I’d be fired, maybe prosecuted.
You just know some exec is making a bonus from some invented metric that this supports.
Prob tracking code commits. They'll try to show how many commits have been made, say that they've been super productive, and that they deserve another bonus.
Can't get bonuses for fixing outages if there are no outages.
Is this because LLMs are entering a bazillion changes and the server is overwhelmed, or is it because they're pushing LLM use on GitHub code itself?
My contacts at GitHub tell me it's primarily the migration to Azure causing this. The increased load from LLM usage is just adding to their problems.
Þey've been migrating to Azure since 2019? For seven years‽ Somehow, þat's even worse.
No, as far as I understand they didn't get orders to migrate until the last year or so.
Ah. þe graph shows unreliability starting just after þe Microsoft acquisition in late 2018, so instability isn't due to þe migration.
Those problems start in 2019. This isn't an AI issue, it's a Microsoft incompetence issue.
Yes
Textbook enshittification.
Right, so this image cuts off the Y-axis. Looking into it, it's 100% uptime for the green parts of the line, and the second horizontal line is for 99.9% uptime.
I'm fairly convinced that GitHub didn't manage to keep a clean 100% uptime before the acquisition, so this is more likely to be faulty data: basically underreported downtime figures prior to the acquisition.
100% this.
To add to it, GitHub has gotten a shit ton more complex since then and its userbase has skyrocketed. Scaling issues are a thing, after all.
IIRC GitHub Actions was not released yet when Microsoft took over (but was in the works), and that alone makes infrastructure a bitch to maintain and keep safe, hehe
Same image with readable axis labels.

Edit: Just to put it in perspective, that big spike is about 4 hours and 2 minutes of downtime for the month of May 2023. Sauce
There are 44640 minutes in May. If it was out for 0.5% of them, it was down for 223.2 minutes. The data point reads a little less than that, but not by much; call it closer to 3.5 hours of downtime.
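A quick sanity check of that arithmetic; the 0.5% figure is eyeballed from the chart, so treat it as approximate:

```python
# Back-of-the-envelope check of the May 2023 spike.
minutes_in_may = 31 * 24 * 60           # 44640 minutes
downtime_pct = 0.5                      # chart reading, in percent (approximate)
downtime_min = minutes_in_may * downtime_pct / 100
print(f"{downtime_min:.1f} minutes ~= {downtime_min / 60:.2f} hours")
# -> 223.2 minutes ~= 3.72 hours at exactly 0.5%, a hair under 4 hours
```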
Real triple-nine (99.9%) uptime is pretty hard to achieve on a massive service.
Thanks!
If the average month has 43800 minutes, then 1% of that is 438 minutes. But the Y axis goes by 0.1% increments, so each increment is only 43.8 minutes. Most months sit at one or two increments, around 1-2 hours of downtime, and it never exceeds 4 hours in any given month.
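The same arithmetic, stepped through the chart's gridlines; the average-month length matches the 43800-minute figure above, and the 0.1% increments are read off the relabeled image:

```python
# Downtime budget per Y-axis increment, using the average month length.
avg_month_min = 365.25 * 24 * 60 / 12   # ~43830 minutes, rounded to 43800 above
for increments in range(1, 6):          # each gridline step is 0.1%
    pct = 0.1 * increments
    minutes = avg_month_min * pct / 100
    print(f"{pct:.1f}% downtime = {minutes:.0f} min (~{minutes / 60:.1f} h), "
          f"i.e. {100 - pct:.1f}% uptime")
# 0.1% -> ~44 min; 0.5% -> ~219 min (~3.7 h), matching the worst spike on the chart
```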
There also isn't a counter for the number of incidents. If you just did a major overhaul of some system, with both hardware and software changes, and stalled out when you went live, then fixed it in 2 hours and never crashed again for the month, that is actually a decent, halfway competent IT team. Versus if you are just applying untested updates or shitty product-breaking commits that crash servers and need rolling back every other day, but your downtimes are under 2 hours each, that team needs to be re-evaluated.