The fact of this obvious LLM slop being at the top of this discussion is incredibly insidious. The "facts" it mentions are made up. Has this vapid style finally become so normalized that nobody is seeing it anymore?
I didn't even notice it until you pointed it out, but I checked that account's comment history and it uses em dashes. Also, "the database history itself is the active distribution vector" is just semantic nonsense.
I still have a basic assumption that if something I'm reading doesn't make much sense to me, I probably just don't understand it. Over the last few years I've had to get used to the new assumption that it's because I'm reading LLM output.
I've also always used em-dashes, it's not a very reliable indicator. That style is a dead giveaway, though. Some of its comments seem to be written by a human, but several definitely aren't.
I've been spending less and less time here, the moderation is obviously overwhelmed and is losing the battle.
That user, epicprogrammer's comment history suggests alignment with the Musk/Thiel/Anduril/DoW/anti-Anthropic crowd who are incessantly trying to damage Wikipedia's reputation to push a "Grokipedia" where they can define the narrative.
I wouldn't be surprised if that group were the origin of this attack too.
Yes, but we did that over the last 15 years. We just never realized that's what we were seeing.
It only clicked for me a few weeks ago, in one thread or another here when I realized that no one could ever do what Google did once: Cloudflare and other antibot technologies have closed off traditional search-as-the-result-of-web-crawling permanently. It's not that no one will do it because they think there's no money in it, or that no one will do it because the upfront costs are gigantic... literally it can no longer be done.
There are still a few options. I recently had the idea of doing search engine queries on 9 search engines.
Mojeek is a good independent search engine. It isn't the best, but for the Hacker News comment analysis I was doing, it was the only one that worked.
Brave exists too.
I know the situation is very critical/dire tho, but there is still some chance. Albeit quite small.
Mojeek, IIRC, has been operated by one single guy for 15 years.
It is true that they have a particularly robust, distributed backup system that can/has come in handy, but FWIW the timing matters to them. English Wikipedia receives ~2 edits per second, or 172,800 per day. Many of them are surely minor and/or automated, but still: six days of that is 1,036,800 lost edits, which is a lot!
Filesystem & database snapshots are very cheap to make, you can make them every 15 minutes. You can expire old snapshots (or collapse the deltas between them) depending on the storage requirements.
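A minimal sketch of the "expire old snapshots" idea, assuming a made-up retention policy: keep every snapshot from the last day, then only the newest snapshot per calendar day, and drop anything older than 30 days. The function name and both windows are hypothetical and not taken from any real snapshot tool or from how Wikimedia operates.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(timestamps, now,
                      dense_window=timedelta(days=1),
                      max_age=timedelta(days=30)):
    """Return the snapshot timestamps to retain, oldest first.

    Assumed policy: keep everything younger than dense_window, keep the
    newest snapshot of each older calendar day, drop anything past max_age.
    """
    keep = []
    seen_days = set()
    for ts in sorted(timestamps, reverse=True):  # newest first
        age = now - ts
        if age > max_age:
            continue  # expired outright
        if age <= dense_window or ts.date() not in seen_days:
            keep.append(ts)
        seen_days.add(ts.date())  # this day is now represented
    return sorted(keep)


# Three days of 15-minute snapshots, ending at "now".
now = datetime(2025, 1, 10, 12, 0)
stamps = [now - timedelta(minutes=15 * i) for i in range(3 * 96 + 1)]
kept = snapshots_to_keep(stamps, now)
```

Thinning by dropping whole snapshots is only one option; collapsing the deltas between neighbours, as the comment mentions, depends on the storage layer and is not shown here.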
In fact, as long as the malware is just doing deletes, you can just merge the two "timelines" by restoring the snapshot and then replaying all the edits but ignoring the deletes. Lost deletes really aren't much of a problem!
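A rough sketch of that merge, assuming the edit log can be reduced to a list of (page, action, text) entries. The `LogEntry` type and the action names are hypothetical, and this is not MediaWiki's actual schema. It also only holds under the comment's premise that the malware does nothing but delete; malicious edits would need filtering too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    page: str
    action: str  # assumed to be "edit" or "delete"
    text: str = ""


def replay_without_deletes(snapshot, log):
    """Start from the restored snapshot and reapply every edit, skipping deletes."""
    pages = dict(snapshot)  # copy so the snapshot itself stays untouched
    for entry in log:
        if entry.action == "edit":
            pages[entry.page] = entry.text
        # "delete" entries are deliberately ignored
    return pages


snapshot = {"Cat": "v1", "Dog": "v1"}
log = [
    LogEntry("Cat", "edit", "v2"),
    LogEntry("Dog", "delete"),
    LogEntry("Emu", "edit", "v1"),
]
merged = replay_without_deletes(snapshot, log)
```

Legitimate deletions made in the same window are lost as well, which is the trade-off the comment accepts.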
I've never understood why client-side execution is so heavy in modern web pages. Theoretically, the costs to execute it are marginal, but in practice, if I'm browsing a web page from a battery-powered device, all that compute power draining the battery not only affects how long I can use the device between charges, but is also adding wear to the battery, so I'll have to replace it sooner. Also, a lot of web pages are downright slow, because my phone can only perform 10s of billions of operations per second, which isn't enough to responsively arrange text and images (which are composited by dedicated hardware acceleration) through all of the client-side bloat on many modern web pages. If there was that much bloat on the server side, the web server would run out of resources with even moderate usage.
There's also a lot of client-side authentication, even with financial transactions, e.g. with iOS and Android locally verifying a user's password, or worse yet a PIN or biometric information, then sending approval to the server. Granted, authentication of any kind is optional for credit card transactions in the US, so all the rest is security theater, but if it did matter, it would be the worst way to do it.
In the early 2010’s I worked for a company whose primary income was subscriptions to site protection services - one of which included cleaning up malware-infected Wordpress installations. I worked on the team that did this job.
This exact type of database-stored executable javascript was one of the most annoying types of infections to clean up.
Ok, so there are tons of mediawiki installations all over the internet. What do these operators do? Set their wikis to read-only mode, hang tight, and wait for a security patch?
There is nothing to do, the incident was not caused by a vulnerability in mediawiki.
Basically someone who had permissions to alter site js, accidentally added malicious js. The main solution is to be very careful about giving user accounts permission to edit js.
[There are of course other hardening things that maybe should be done based on lessons learned]
Well, admins (or anybody other than the developers / deployment pipeline) having permissions to alter the JS sounds like a significant vulnerability. Maybe it wasn't in the early 2000s, but unencrypted HTTP was also normal then.
> Well, admins (or anybody other than the developers / deployment pipeline) having permissions to alter the JS sounds like a significant vulnerability.
It's a common feature of CMS'es and "tag management systems." Its presence is a massive PITA to developers even _besides_ the security, but PMs _love them_, in my experience.
There are already tools and techniques to validate served JS is as-intended, and these techniques could be beefed up by adding browser checks. I've been surprised these haven't been widely adopted given the spate of recent JS-poisoning attacks.
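One existing technique of this kind is Subresource Integrity, where the page pins a hash of the script it expects. The comment does not name a specific tool, so treating SRI as the example is an assumption, and whether it fits how MediaWiki serves Common.js is a separate question. A small sketch of computing and checking an SRI-style value; the helper names are hypothetical:

```python
import base64
import hashlib
import hmac


def sri_hash(script_bytes: bytes) -> str:
    """Return an SRI-style integrity string: 'sha384-' plus the base64 digest."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def is_as_intended(served: bytes, expected_integrity: str) -> bool:
    """Check that the served script matches the integrity value pinned at review time."""
    return hmac.compare_digest(sri_hash(served), expected_integrity)


reviewed = b"console.log('hello');\n"
pinned = sri_hash(reviewed)           # recorded when the script was reviewed
tampered = reviewed + b"evil();\n"    # what a worm-modified copy might look like
```

Here `is_as_intended(reviewed, pinned)` holds and `is_as_intended(tampered, pinned)` does not. The hard part in practice is deciding who may update the pinned value.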
Too much app logic in the client side (Javascript) has always been an attack vector. The more that can reasonably be server side, the more that can't be seen.
The amount of javascript is really beside the point here. The problem is that privileged users can easily edit the code without strong 2FA, allowing automatic propagation.
If they required 2FA every time you wanted to modify JS then it couldn't propagate automatically. Just requiring 2FA when you first log in wouldn't help, of course.
> Hitting MediaWiki:Common.js is the absolute nightmare scenario for MediaWiki deployments because that script gets executed by literally every single visitor
...except for us security wonks who have js turned off by default, don't enable it without good reason, disable it ASAP, and take a dim view of websites that require it.
Not too many years ago this behavior was the domain of Luddites and schizophrenics. Today it has become a useful tool in the toolbox of reasonable self-defense for anybody with UID 0.
Perhaps the WMF should re-evaluate just how specialsnowflake they think their UI is and see if, maybe just maybe, they can get by without js. Just a thought.
It warms my heart that there's basically a 0% chance that they ever approach this camp's viewpoint based on the Herculean effort it took to switch over to a slightly more modern frontend a few years back. I'm glad you don't think of yourself as a Luddite, but I think you're vastly overstating how open people are to a purely-static web.
Also, FWIW: Wikipedia is "specialsnowflake". If it isn't, that's merely because it was so specialsnowflake that there's now a healthy ecosystem of sites that copied their features! It's far, far more capable than a simple blog, especially when you get into editing it.
The Wikipedia community takes a cavalier attitude towards security. Any user with "interface administrator" status can change global JavaScript or CSS for all users on a given Wiki with no review. They added mandatory 2FA only a few years ago...
Prior to this, any admin had that ability until it was taken away due to English Wikipedia admins reverting Wikimedia changes to site presentation (Mediaviewer).
But that's not all. Most "power users" and admins install "user scripts", which are unsandboxed JavaScript/CSS gadgets that can completely change the operation of the site. Those user scripts are often maintained by long abandoned user accounts with no 2 factor authentication.
Based on the fact user scripts are globally disabled now I'm guessing this was a vector.
The Wikimedia foundation knows this is a security nightmare. I've certainly complained about this when I was an editor.
But most editors that use the website are not professional developers and view attempts to lock down scripting as a power grab by the Wikimedia Foundation.
Wow. This worm is fascinating. It seems to do the following:
- Inject itself into the MediaWiki:Common.js page to persist globally, and into the User:Common.js page to do the same as a fallback
- Uses jQuery to hide UI elements that would reveal the infection
- Vandalizes 20 random articles with a 5000px wide image and another XSS script from basemetrika.ru
- If an admin is infected, it will use the Special:Nuke page to delete 3 random articles from the global namespace, AND use the Special:Random with action=delete to delete another 20 random articles
EDIT! The Special:Nuke is really weird. It gets a default list of articles to nuke from the search field, which could be any group of articles, and rubber-stamps nuking them. It does this three times in a row.
> Vandalizes 20 random articles with a 5000px wide image and another XSS script from basemetrika.ru
Note that while this looks like it's trying to trigger an XSS, what it's doing is ineffective, so basemetrika.ru would never get loaded (even ignoring that the domain doesn't exist).
As someone on the Wikipediocracy forums pointed out, basemetrika.ru does not exist. I get an NXDomain response trying to resolve it. The plot thickens.
I registered it about 40 minutes ago, but it seems the DNS has been cached by everyone as a result of the wikipedia hack & not even the NS is propagating. Can't get an SSL certificate.
Did you know… ukraine still lets Russian gas transit through ukraine territory? Making ukraine the largest sponsor of terrorism against ukraine?
Did you know, when the war started, ukraine was letting Russia make around $1 billion PER DAY for like a year before reducing that amount? You didn’t know that. But hey, protesting by not letting some one buy .ru will certainly do damage to Putin!
reg.ru, the most popular registrar, sells .ru domains for $1.65, very little of which goes to the national registry. What is their profit on this domain, a couple of cents?
You have helped to bring peace by approximately zero nanoseconds, while doing absolutely nothing about western countries still buying massive amounts of natural resources from Putin. Tax income on their exports make the primary source of income for the federal budget, which directly funds the military.
Good virtue signaling, though. I'm completely disillusioned with the West, this is nothing new.
I wouldn't be surprised either. But the original formatting of the worm makes me think it was human written, or maybe AI assisted, but not 100% AI. It has a lot of unusual stylistic choices that I don't believe an AI would intentionally output.
I would. AI designed software in general does not include novel ideas. And this is the kind of novel software AI is not great at, because there's not much training data.
Of course it's very possible someone wrote it with AI help. But almost no chance it was designed by AI.
Also, I’m surprised an XSS attack like this hasn’t yet been actually used to harvest credentials like passwords through browser autofill[0].
It seems like the worm code/the replicated code only really attacks stuff on site. But leaking credentials (and obviously people reuse passwords across sites) could be sooo much worse.
I completely understand marking the software that controls drinking water as critical infrastructure- but at some point a state based cyber attack that just wipes wikipedia off the net is deeply damaging to our modern society’s ability to agree on common facts …
Just now thought “if Wikipedia vanished, what would it mean?” … and it’s not on the level of safe drinking water, but it is a level.
That someone would need to restore some backups, and in the meantime, use mirrors.
Seriously, not that big of a deal. I don't know how many copies of Wikipedia are lying around but considering that archives are free to download, I guess a lot. And if you count text-only versions of the English Wikipedia without history and talk pages, it is literally everywhere as it is a common dataset for natural language processing tasks. It is likely to be the most resilient piece of data of that scale in existence today.
The only difficulty in the worst case scenario would be rebuilding a new central location and restarting the machinery with trusted admins, editors, etc... Any of the tech giants could probably make a Wikipedia replacement in days, with all data restored, but it won't be Wikipedia.
What you're suggesting is literally impossible. There are plenty of mirrors and random people that download the thing in its entirety. The entire planet would have to be nuked for that to be possible.
Not the GP, and I don't believe in the existence of "common facts" in general, but Wikipedia is indeed a good place to figure out what other people might agree as common facts...
There are so many mirrors anyway, and it's trivial to get a local copy. What is much more concerning is government censorship and age verification/digital id laws where what articles you read becomes part of your government record the police sees when they pull you over.
> but at some point a state based cyber attack that just wipes wikipedia off the net is deeply damaging to our modern society’s ability to agree on common facts
Haven't we hit that point already with bad faith (and potentially government-run) coordinated editing and voting campaigns, as both Wales and Sanger have been pointing out for a while now?
"The incident appears to have been a cross-site scripting hack. The origin of rhe malicious scripts was a userpage on the Russian Wikipedia. The script contained Russian language text.
During the shutdown, users monitoring [https://meta.wikimedia.org/wiki/special:RecentChanges Recent changes page on Meta] could view WMF operators manually reverting what appeared to be a worm propagated in common.js
Hopefully this means they won't have to do a database rollback, i.e. no lost edits. "
Interesting to note how trivial it is today to fake something as coming "from the Russians".
A theory on phab: "Some investigation was made in Russian Wikipedia discord chat, maybe it will be useful.
1. In 2023, vandal attacks was made against two Russian-language alternative wiki projects, Wikireality and Cyclopedia. Here https://wikireality.ru/wiki/РАОрг is an article about organisators of these attacks.
I remember someone mass-defacing the ruwiki almost exactly a year ago (March 3 2025) with some immature insults towards certain ruwiki admins. If I'm not mistaken it was a similar method.
And they probably used mind-control to make the admin run random userscripts on his privileged account as well, the capabilities of russian hackers are scary.
Interesting angle. Everyone has already pointed out that there are backups basically everywhere, and from an information standpoint, shaving off a day (or whatever) of edits just to get to a known-good point is effectively zero cost. But I wonder what the cost is of the potentially bad data getting baked into those models, and if anyone really cares enough to scrap it.
We should be using federated organizational architectures when appropriate.
For Wikipedia, consider a central read-only aggregated mirror that delegates the editorial function to specialized communities. Common, suggested tooling (software and processes) could be maintained centrally but each community might be improved with more independence. This separation of concerns may be a better fit for knowledge collection and archival.
Note: I edited to stress central mirroring of static content with delegation of editorial function to contributing organizations. I'm expressly not endorsing technical "dynamic" federation approaches.
> Also the language that has made me millions over my career with no degree.
Well done.
> Also the language that allows people to be up and running in seconds (with or without AI).
People getting up and running without any opportunity to be taught about security concerns (even those as simple as the risks of inadequate input verification), especially considering the infamous inconsistency in PHP's APIs which can lead to significant foot-guns, is both a blessing and a curse… Essentially a pre-cursor to some of the crap that is starting to be published now via vibe-coding with little understanding.
PHP is a fine language. It started my career. That said, it has a lot of baggage that can let you shoot yourself in the foot. Modern PHP is pretty awesome though.
Yep, that's the sad truth - a language's popularity often has nothing to do with its security properties. People will happily keep churning out insecure junk as long as it makes them millions, botnet and data compromises be damned.
Try not to take criticisms of tools personally. Phillips head screws are shit for a great many applications, while simultaneously being involved in billions of dollars of economic activity, and being a driver that everyone has available.
I've not used PHP in anger in well over a decade, but if the general environment out there is anything like it was back then there are likely a lot of people, mostly on cheap shared hosting arrangements, running PHP versions older than that and for the most part knowing no better.
That isn't the fault of the language of course, but a valid reason for some of the “ick” reaction some get when it is mentioned.
It is unfortunate that Wikipedia is under attack. It seems as if there are more malicious actors now than, say, 5 years ago.
This may be unrelated but I also noticed more attacks on e. g. libgen, Anna's archive and what not. I am not at all saying this is similar to Wikipedia as such, mind you, but it really seems as if there are more actors active now who target people's freedom (e. g. freedom of choice of access to any kind of information; age restriction aka age "verification" taps into this too).
They have no incentive to improve the site, because they’re a for-profit entity.
Despite the constant screeching for donations, the entire site is owned by a company with shareholders. All the “donations” go to them. They already met their funding needs for the next century a long time ago, this is all profit.
Another reason to make disabling JS the default on all websites, and websites should offer a service without JS, especially those implemented in obsolete garbage tech. If it's not an XSS from a famous website, it will be an exploit from a sketchy website.
Yeah, but the purpose of an encyclopedia like Wikipedia (a tertiary source) is to relatively neutrally summarize the consensus of those who spend the time and effort to analyze and interpret the primary sources (and thus produce secondary sources), or if necessary to cite other tertiary summaries of those.
In a discussion forum like HN, pointing to primary sources is the most reliable input to the other readers' research on/synthesis of their own secondary interpretation of what may be going on. Pointing to other secondary interpretations/analyses is also useful, but not without including the primary source so that others can - with apologies to the phrase currently misused by the US right wing - truly do their own research.
If you spend any time on Wikipedia, you'll find that secondary sources from an existing list are always preferred. The mandate from the link in GP (https://en.wikipedia.org/wiki/Wikipedia:No_original_research) extends, or at least is interpreted to extend, to actively punishing editors who attempt to analyze or interpret primary sources.
I mean sure, but that's never going to happen, so complaining about it is just shaking your fist at the sky. The only way it will change is if the economics of the web change. Maybe that is the economics of developer time (it being easier/faster/more resilient and thus cheaper to do native dev), or maybe it is that dynamic scripting leads to such extreme vulnerabilities that ease of deployment/development/consumer usage change the macroeconomics of web deployment enough to shift the scales to local.
But if there's one thing I've learned over the years as a technologist, it's this: the "best technology" is not often the "technology that wins".
Engineering is not done in a vacuum. Indeed, my personal definition of engineering is that it is "constraint-based applied science". Yes, some of those constraints are "VC buxx" wanting to see a return on investment, but even the OSS world has its own set of constraints - often overlapping. Time, labor, existing infrastructure, domain knowledge.
Imagine if wikipedia was a native app, what this vuln would have caused. I for one prefer using stuff in the browser where at least it's sandboxed. Also, there's nothing stopping you from disabling JS in your browser.
This is basically a weaponized, highly destructive version of the old MySpace Samy worm. Hitting MediaWiki:Common.js is the absolute nightmare scenario for MediaWiki deployments because that script gets executed by literally every single visitor and editor across the entire site, creating a massive, instant propagation loop. The fact that it specifically targets admins and then uses jQuery to blind them by hiding the UI elements while it silently triggers Special:Nuke in the background is incredibly insidious. It really exposes the foundational danger of legacy web architectures that still allow executable JavaScript to be stored and served directly from user-editable namespaces. Cleaning this up is going to be an absolute forensic nightmare for the Wikimedia team since the database history itself is the active distribution vector.
> Cleaning this up is going to be an absolute forensic nightmare for the Wikimedia team since the database history itself is the active distribution vector.
Well, the worm didn't get root -- so if Wikimedia takes snapshots or made a recent backup, probably not so much of a nightmare? Then the diffs can tell a fairly detailed forensic story, including indicators of motive.
Snapshotting is a very low-overhead operation, so you can make them very frequently and then expire them after some time.
Even if they reset to several days ago and lose, say, thousands of edits, even tens of thousands of minor edits, they're still in a pretty good place. Losing a few days of edits is less-than-ideal but very tolerable for Wikipedia as a whole
At $work we're hosting business knowledge databases. Interestingly enough, if you need to revert a day or two of edits, you're better off doing it ASAP rather than postponing and mulling it over. Especially if you can keep a dump or an export around.
People usually remember what they changed yesterday and have uploaded files and such still around. It's not great, but quite possible. Maybe you need to pull a few content articles out from the broken state if they ask. No huge deal.
If you decide to roll back after a week or so, editors get really annoyed, because now they are usually forced to backtrack and reconcile the state of the knowledge base. Maybe you need a current and a rolled-back system, it may have regulatory implications, and it's a huge pain in the neck.
Nah, you can snapshot every 15 minutes. The snapshot interval depends on the frequency of changes and their capacity, but it's up to them how to allocate these capacities... but it's definitely doable and there are real reasons for doing so. You can collapse deltas between snapshots after some time to make them last longer. I'd be surprised if they don't do that.
As an aside, snapshotting would have prevented a good deal of horror stories shared by people who give AI access to the FS. Well, as long as you don't give it root.......
>Nah, you can snapshot every 15 minutes.
obviously you can. but, what is the actual snapshot frequency? like, what is the timestamp of the last known good snapshot? that is what matters.
in any case, the comment you are replying to is a hypothetical, which correctly points out that even a day or two of lost edits is fine (not ideal, but fine). your reply doesnt engage with their comment at all.
> the comment you are replying to is a hypothetical, which correctly points out that even a day or two of lost edits is fine (not ideal, but fine). your reply doesnt engage with their comment at all.
I did engage, by pointing out that it wasn't relevant nor a realistic scenario for a competent sysadmin. (Did you read the OP?) That's a /you/ problem if you rely on infrequent backups, especially for a service with so much flux.
> what is the actual snapshot frequency? like, what is the timestamp of the last known good snapshot?
? Why would I know what their internal operations are?
>I did engage, by pointing out that it wasn't relevant nor a realistic scenario for a competent sysadmin.
>Why would I know what their internal operations are?
i mean... you must, right? you know that once-a-day snapshots is not relevant to this specific incident. you know that their sysadmins are apparently competent. i just assumed you must have some sort of insider information to be so confident.
I think you are misreading my comments and made a bad assumption. The reason I'm confident is because this has been my bread and butter for a decade.
>The reason I'm confident is because this has been my bread and butter for a decade.
my decade of dealing with incompetent sysadmins and broken backups (if they even exist) has given me the opposite of confidence.
but im glad you have had a different experience
Nowadays I refuse to do any serious work that isn't in source control anywhere besides my NAS that takes copy-on-write snapshots every 15 minutes. It has saved my butt more times than I can count.
Yeah same here. Earlier I had a sync error that corrupted my .git, somehow. no problem; I go back 15 minutes and copy the working version.
Feels good to pat oneself on the back. Mine is sore, though. My E&O/cyber insurance likes me.
The problem isn't the granularity of the backup. Since the worm silently nukes pages, it's virtually impossible to reconcile the state before the attack with the current state, so you have to forfeit any changes made since then and ask the contributors to do the legwork of reapplying the correct changes.
Why would nuked pages matter? Snapshots capture everything and are not part of wikimedia software.
Could you point to where you found the details of the exploit? It’s not in the linked page. Really interested. Especially the part about modifying it and the other users propagating it?
See the public phab ticket: https://phabricator.wikimedia.org/T419143
In short, a Wikimedia Foundation account was doing some sort of test which involved loading a large number of user scripts. They decided to just start loading random user scripts, instead of creating some just for this test.
The user who ran this test is a Staff Security Engineer at WMF, and naturally they decided to do this test under their highly-privileged Wikimedia Foundation staff account, which has permissions to edit the global CSS and JS that runs on every page.
One of those random scripts was a 2 year old malicious script from ruwiki. This script injects itself in the global Javascript on every page, and then in the userscripts of any user that runs into it, so it started spreading and doing damage really fast. This triggered tons of alerts, until the decision was made to turn the Wiki read-only.
Didn't realise this was some historic evil script and not some active attacker who could change tack at any moment.
That makes the fix pretty easy. Write a regex to detect the evil script, and revert every page to a historic version without the script.
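A rough sketch of that idea, assuming each page's history is available as a list of revision texts ordered oldest to newest. The `MALICIOUS` pattern and the sample strings are made up for illustration; a real cleanup would match on the actual payload and go through the wiki's own revert tooling.

```python
import re
from typing import Optional

# Hypothetical signature, chosen only for this example.
MALICIOUS = re.compile(r"basemetrika\.ru", re.IGNORECASE)


def last_clean_revision(revisions: list[str]) -> Optional[int]:
    """Return the index of the newest revision that does not match the
    signature, or None if every revision matches.

    `revisions` is ordered oldest to newest.
    """
    for index in range(len(revisions) - 1, -1, -1):
        if not MALICIOUS.search(revisions[index]):
            return index
    return None


# Made-up page history: the last revision carries the injected reference.
history = [
    "console.log('site gadget v1');",
    "console.log('site gadget v2');",
    "console.log('site gadget v2'); /* injected */ load('//basemetrika.ru/');",
]
clean_index = last_clean_revision(history)
```

Pages where the function returns `None` have no clean revision to fall back on and would need manual review.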
The fact of this obvious LLM slop being at the top of this discussion is incredibly insidious. The "facts" it mentions are made up. Has this vapid style finally become so normalized that nobody is seeing it anymore?
I didn't even notice it until you pointed it out, but I checked that account's comment history and it uses em dashes. Also, "the database history itself is the active distribution vector" is just semantic nonsense.
I still have a basic assumption that if something I'm reading doesn't make much sense to me, I probably just don't understand it. Over the last few years I've had to get used to the new assumption that it's because I'm reading LLM output.
I've also always used em-dashes, it's not a very reliable indicator. That style is a dead giveaway, though. Some of its comments seem to be written by a human, but several definitely aren't.
I've been spending less and less time here, the moderation is obviously overwhelmed and is losing the battle.
https://aphyr.com/posts/389-the-future-of-forums-is-lies-i-g...
That user, epicprogrammer's comment history suggests alignment with the Musk/Thiel/Anduril/DoW/anti-Anthropic crowd who are incessantly trying to damage Wikipedia's reputation to push a "Grokipedia" where they can define the narrative.
I wouldn't be surprised if that group were the origin of this attack too.
Perhaps we're at last watching the internet die.
Yes, but we did that over the last 15 years. We just never realized that's what we were seeing.
It only clicked for me a few weeks ago, in one thread or another here when I realized that no one could ever do what Google did once: Cloudflare and other antibot technologies have closed off traditional search-as-the-result-of-web-crawling permanently. It's not that no one will do it because they think there's no money in it, or that no one will do it because the upfront costs are gigantic... literally it can no longer be done.
The internet died.
There are still a few options. I recently had the idea of doing search engine queries on 9 search engines.
Mojeek is a good independent search engine. It isn't the best, but for the Hacker News comment analysis I was doing, it was the only one that worked.
Brave exists too.
I know the situation is very critical/dire tho, but there is still some chance. Albeit quite small.
Mojeek IIRC, is operated by one single guy for 15 years.
I just checked a wiki, and the "MediaWiki:Common.js" page there was read-only, even for wikisysop users.
>Cleaning this up
Find the first instance and reset to the backup before then. An hour, a day, a week? Doesn't matter that much in this case.
It is true that they have a particularly robust, distributed backup system that can/has come in handy, but FWIW the timing matters to them. English Wikipedia receives ~2 edits per second, or 172,800 per day. Many of them are surely minor and/or automated, but still: 1,036,800 lost edits (six days' worth at that rate) is a lot!
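The arithmetic behind those figures can be checked quickly. This assumes the ~2 edits per second rate quoted in the comment, which is an approximation:

```python
EDITS_PER_SECOND = 2             # approximate rate quoted in the comment
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

edits_per_day = EDITS_PER_SECOND * SECONDS_PER_DAY

# How many days of edits the 1,036,800 figure corresponds to at that rate.
days_for_quoted_total = 1_036_800 / edits_per_day

# For comparison, a full week at the same rate.
edits_per_week = edits_per_day * 7

print(edits_per_day, days_for_quoted_total, edits_per_week)
```

At that rate the 1,036,800 figure works out to six days of edits; a full seven-day rollback would be 1,209,600.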
Filesystem & database snapshots are very cheap to make, you can make them every 15 minutes. You can expire old snapshots (or collapse the deltas between them) depending on the storage requirements.
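As a sketch of what such an expiry policy could look like: keep every snapshot from the last day, thin older ones to the first snapshot of each calendar day, and drop anything past a maximum age. The function name and the one-day / thirty-day windows are assumptions for illustration, not anything Wikimedia is known to use.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(
    snapshots: list[datetime],
    now: datetime,
    dense_window: timedelta = timedelta(days=1),
    max_age: timedelta = timedelta(days=30),
) -> list[datetime]:
    """Return the snapshot timestamps to retain, oldest first."""
    keep = []
    seen_days = set()
    for taken_at in sorted(snapshots):
        age = now - taken_at
        if age > max_age:
            continue  # too old: expire
        if age <= dense_window:
            keep.append(taken_at)  # recent: keep every snapshot
        elif taken_at.date() not in seen_days:
            seen_days.add(taken_at.date())
            keep.append(taken_at)  # older: keep the first one per day
    return keep


now = datetime(2024, 1, 10, 12, 0)
kept = snapshots_to_keep(
    [
        datetime(2023, 11, 1, 0, 0),    # older than max_age, expired
        datetime(2024, 1, 8, 0, 0),     # first snapshot of its day, kept
        datetime(2024, 1, 8, 0, 15),    # same day as the previous one, thinned
        datetime(2024, 1, 10, 11, 45),  # inside the dense window, kept
        datetime(2024, 1, 10, 12, 0),   # inside the dense window, kept
    ],
    now,
)
```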
Are they really lost though? I think they should not be lost; they could be stored in a separate database additionally.
In fact, as long as the malware is just doing deletes, you can just merge the two "timelines" by restoring the snapshot and then replaying all the edits but ignoring the deletes. Lost deletes really aren't much of a problem!
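A toy version of that merge, assuming the post-snapshot activity is available as an ordered log of edit and delete entries. `LogEntry` and `replay_without_deletes` are hypothetical names, and pages are modelled as a plain title-to-text mapping.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    action: str      # "edit" or "delete"
    title: str
    text: str = ""   # new page text, used for edits only


def replay_without_deletes(
    snapshot: dict[str, str], log: list[LogEntry]
) -> dict[str, str]:
    """Start from the snapshot and reapply later edits, skipping deletes."""
    pages = dict(snapshot)  # copy so the snapshot itself is untouched
    for entry in log:
        if entry.action == "edit":
            pages[entry.title] = entry.text
        # "delete" entries are deliberately ignored
    return pages


snapshot = {"A": "a1", "B": "b1"}
log = [
    LogEntry("edit", "A", "a2"),
    LogEntry("delete", "B"),
    LogEntry("edit", "C", "c1"),
]
merged = replay_without_deletes(snapshot, log)
```

This only holds if the malware's writes are all deletes. The worm described elsewhere in the thread also vandalizes articles, so a real replay would need to filter those edits out as well.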
I've never understood why client-side execution is so heavy in modern web pages. Theoretically, the costs to execute it are marginal, but in practice, if I'm browsing a web page from a battery-powered device, all that compute power draining the battery not only affects how long I can use the device between charges, but is also adding wear to the battery, so I'll have to replace it sooner. Also, a lot of web pages are downright slow, because my phone can only perform 10s of billions of operations per second, which isn't enough to responsively arrange text and images (which are composited by dedicated hardware acceleration) through all of the client-side bloat on many modern web pages. If there was that much bloat on the server side, the web server would run out of resources with even moderate usage.
There's also a lot of client-side authentication, even with financial transactions, e.g. with iOS and Android locally verifying a user's password, or worse yet a PIN or biometric information, then sending approval to the server. Granted, authentication of any kind is optional for credit card transactions in the US, so all the rest is security theater, but if it did matter, it would be the worst way to do it.
There's thousands of copies of the whole wikipedia in sql form though, IIRC it's just like 47GB.
In the early 2010’s I worked for a company whose primary income was subscriptions to site protection services - one of which included cleaning up malware-infected Wordpress installations. I worked on the team that did this job.
This exact type of database-stored executable javascript was one of the most annoying types of infections to clean up.
Ok, so there are tons of mediawiki installations all over the internet. What do these operators do? Set their wikis to read-only mode, hang tight, and wait for a security patch?
Also, does this worm have a name?
There is nothing to do, the incident was not caused by a vulnerability in mediawiki.
Basically someone who had permissions to alter site js, accidentally added malicious js. The main solution is to be very careful about giving user accounts permission to edit js.
[There are of course other hardening things that maybe should be done based on lessons learned]
Well, admins (or anybody other than the developers / deployment pipeline) having permissions to alter the JS sounds like a significant vulnerability. Maybe it wasn't in the early 2000s, but unencrypted HTTP was also normal then.
> Well, admins (or anybody other than the developers / deployment pipeline) having permissions to alter the JS sounds like a significant vulnerability.
It's a common feature of CMS'es and "tag management systems." Its presence is a massive PITA to developers even _besides_ the security, but PMs _love them_, in my experience.
There are already tools and techniques to validate served JS is as-intended, and these techniques could be beefed up by adding browser checks. I've been surprised these haven't been widely adopted given the spate of recent JS-poisoning attacks.
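The comment doesn't name a specific tool. One existing browser mechanism in this family is Subresource Integrity (SRI), where a page pins the expected hash of a script in an `integrity` attribute and the browser refuses to run the script if the hash differs. A minimal sketch of computing and checking such a value, with `sri_hash` and `is_unmodified` as made-up helper names:

```python
import base64
import hashlib


def sri_hash(script_bytes: bytes) -> str:
    """Return an SRI-style integrity value: 'sha384-' plus the base64 digest."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def is_unmodified(script_bytes: bytes, expected: str) -> bool:
    """Check served bytes against a previously recorded integrity value."""
    return sri_hash(script_bytes) == expected


reviewed = b"console.log('reviewed gadget');"
pinned = sri_hash(reviewed)

tampered = reviewed + b" /* injected */"
```

Pinning only helps when the hash is stored somewhere the script's editors cannot also change, which is the hard part for scripts kept in editable wiki pages.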
Too much app logic in the client side (Javascript) has always been an attack vector. The more that can reasonably be server side, the more that can't be seen.
The amount of javascript is really beside the point here. The problem is that privileged users can easily edit the code without strong 2FA, allowing automatic propagation.
How does 2FA prevent this here?
If they required 2FA every time you wanted to modify JS then it couldn't propagate automatically. Just requiring 2FA when you first log in wouldn't help, of course.
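To make the distinction concrete, here is a toy model of per-edit approval: every JS save consumes a one-time approval that is only issued after a fresh second-factor check, so a script riding on an existing login session has nothing to reuse. `JsEditGate` and its methods are hypothetical and do not reflect how MediaWiki works.

```python
import secrets
from typing import Optional


class JsEditGate:
    """Hypothetical gate requiring a fresh 2FA approval for each JS edit."""

    def __init__(self) -> None:
        self._approvals: set[str] = set()

    def approve_after_2fa(self, second_factor_ok: bool) -> Optional[str]:
        """Issue a single-use approval only if the second factor passed."""
        if not second_factor_ok:
            return None
        token = secrets.token_urlsafe(16)
        self._approvals.add(token)
        return token

    def save_js(self, token: Optional[str]) -> bool:
        """Allow the save only with an unused approval, then consume it."""
        if token is None or token not in self._approvals:
            return False
        self._approvals.remove(token)
        return True


gate = JsEditGate()
approval = gate.approve_after_2fa(second_factor_ok=True)
first_save = gate.save_js(approval)    # allowed: fresh approval
second_save = gate.save_js(approval)   # refused: approval already used
no_2fa_save = gate.save_js(gate.approve_after_2fa(second_factor_ok=False))
```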
> Hitting MediaWiki:Common.js is the absolute nightmare scenario for MediaWiki deployments because that script gets executed by literally every single visitor
...except for us security wonks who have js turned off by default, don't enable it without good reason, disable it ASAP, and take a dim view of websites that require it.
Not too many years ago this behavior was the domain of Luddites and schizophrenics. Today it has become a useful tool in the toolbox of reasonable self-defense for anybody with UID 0.
Perhaps the WMF should re-evaluate just how specialsnowflake they think their UI is and see if, maybe just maybe, they can get by without js. Just a thought.
It warms my heart that there's basically a 0% chance that they ever approach this camp's viewpoint based on the Herculean effort it took to switch over to a slightly more modern frontend a few years back. I'm glad you don't think of yourself as a Luddite, but I think you're vastly overstating how open people are to a purely-static web.
Also, FWIW: Wikipedia is "specialsnowflake". If it isn't, that's merely because it was so specialsnowflake that there's now a healthy ecosystem of sites that copied their features! It's far, far more capable than a simple blog, especially when you get into editing it.
This was only a matter of time.
The Wikipedia community takes a cavalier attitude towards security. Any user with "interface administrator" status can change global JavaScript or CSS for all users on a given Wiki with no review. They added mandatory 2FA only a few years ago...
Prior to this, any admin had that ability until it was taken away due to English Wikipedia admins reverting Wikimedia changes to site presentation (Mediaviewer).
But that's not all. Most "power users" and admins install "user scripts", which are unsandboxed JavaScript/CSS gadgets that can completely change the operation of the site. Those user scripts are often maintained by long abandoned user accounts with no 2 factor authentication.
Based on the fact user scripts are globally disabled now I'm guessing this was a vector.
The Wikimedia foundation knows this is a security nightmare. I've certainly complained about this when I was an editor.
But most editors that use the website are not professional developers and view attempts to lock down scripting as a power grab by the Wikimedia Foundation.
Maybe somewhat unrelated, but I'm reminded of the fact that people have deleted the main page on a few occasions: https://en.wikipedia.org/wiki/Wikipedia:Don%27t_delete_the_m...
Most admins on Wikipedia are incompetent.
Most admins on Wikipedia are competent in areas outside of webdev and security.
Wow. This worm is fascinating. It seems to do the following:
- Inject itself into the MediaWiki:Common.js page to persist globally, and into the User:Common.js page to do the same as a fallback
- Uses jQuery to hide UI elements that would reveal the infection
- Vandalizes 20 random articles with a 5000px wide image and another XSS script from basemetrika.ru
- If an admin is infected, it will use the Special:Nuke page to delete 3 random articles from the global namespace, AND use the Special:Random with action=delete to delete another 20 random articles
EDIT! The Special:Nuke is really weird. It gets a default list of articles to nuke from the search field, which could be any group of articles, and rubber-stamps nuking them. It does this three times in a row.
> Vandalizes 20 random articles with a 5000px wide image and another XSS script from basemetrika.ru
Note that while this looks like it's trying to trigger an XSS, what it's doing is ineffective, so basemetrika.ru would never get loaded (even ignoring that the domain doesn't exist).
As someone on the Wikipediocracy forums pointed out, basemetrika.ru does not exist. I get an NXDomain response trying to resolve it. The plot thickens.
Yeah, basemetrika.ru is free now. Should we occupy it? ;)
I registered it about 40 minutes ago, but it seems the DNS has been cached by everyone as a result of the wikipedia hack & not even the NS is propagating. Can't get an SSL certificate.
It means giving money to the Russian government, so no.
If anyone from the Russian government is reading this, get the fuck out of Ukraine. Thank you.
Well done, it's finally over
Did you know… Ukraine still lets Russian gas transit through Ukrainian territory? Making Ukraine the largest sponsor of terrorism against Ukraine? Did you know, when the war started, Ukraine was letting Russia make around $1 billion PER DAY for like a year before reducing that amount? You didn’t know that. But hey, protesting by not letting someone buy a .ru will certainly do damage to Putin!
You must be fun at parties
reg.ru, the most popular registrar, sells .ru domains for $1.65, very little of which goes to the national registry. What is their profit on this domain, a couple of cents?
You have helped to bring peace by approximately zero nanoseconds, while doing absolutely nothing about western countries still buying massive amounts of natural resources from Putin. Tax income on their exports make the primary source of income for the federal budget, which directly funds the military.
Good virtue signaling, though. I'm completely disillusioned with the West, this is nothing new.
Namecheap won’t sell it which is great because it made me pause and wonder whether it's legal for an American to send Russians money for a TLD.
I'm half-tempted to try and claim it myself for fun and profit, but I think I'll leave it for someone else.
What should we put there, anyway?
A JavaScript call to window.alert to pause the JavaScript VM.
Go old school and have the script inject the "how did this get here im not good with computers" cat onto random pages
I'd log requests and echo them back in the page
The antinuke
Wouldn't be surprised if elaborate worms like this are AI-designed
I wouldn't be surprised either. But the original formatting of the worm makes me think it was human written, or maybe AI assisted, but not 100% AI. It has a lot of unusual stylistic choices that I don't believe an AI would intentionally output.
I would. AI designed software in general does not include novel ideas. And this is the kind of novel software AI is not great at, because there's not much training data.
Of course it's very possible someone wrote it with AI help. But almost no chance it was designed by AI.
Additional context:
https://wikipediocracy.com/forum/viewtopic.php?f=8&t=14555
https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(techni...
https://old.reddit.com/r/wikipedia/comments/1rllcdg/megathre...
Apparent JS worm payload: https://ru.wikipedia.org/w/index.php?title=%D0%A3%D1%87%D0%B...
Nice to see jQuery still getting used :)
Wikipediocracy link gives "not authorized".
works for me
Woah this looks like an old school XSS worm https://meta.wikimedia.org/wiki/Special:RecentChanges?hidebo...
I’ve always thought the fact that MediaWiki sometimes lets editors embed JavaScript could be dangerous.
Also, I’m surprised an XSS attack like this hasn’t yet actually been used to harvest credentials like passwords through browser autofill[0].
It seems like the worm code/the replicated code only really attacks stuff on site. But leaking credentials (and obviously people reuse passwords across sites) could be sooo much worse.
[0] https://varun.ch/posts/autofill/
Chrome doesnt actually autofill before you interact. It only displays what it would fill in at the same location visually.
but any interaction is good for Chrome, like dismissing a cookie banner
Time to add 2FA...
I completely understand marking the software that controls drinking water as critical infrastructure- but at some point a state based cyber attack that just wipes wikipedia off the net is deeply damaging to our modern society’s ability to agree on common facts …
Just now thought “if Wikipedia vanished what would it mean …” and it’s not on the level of safe drinking water, but it is a level.
> if Wikipedia vanished what would it mean …
That someone would need to restore some backups, and in the meantime, use mirrors.
Seriously, not that big of a deal. I don't know how many copies of Wikipedia are lying around but considering that archives are free to download, I guess a lot. And if you count text-only versions of the English Wikipedia without history and talk pages, it is literally everywhere as it is a common dataset for natural language processing tasks. It is likely to be the most resilient piece of data of that scale in existence today.
The only difficulty in the worst case scenario would be rebuilding a new central location and restarting the machinery with trusted admins, editors, etc... Any of the tech giants could probably make a Wikipedia replacement in days, with all data restored, but it won't be Wikipedia.
What you're suggesting is literally impossible. There are plenty of mirrors and random people that download the thing in its entirety. The entire planet would have to be nuked for that to be possible.
All persistent data should have backup.
It's not a high bar.
If you're using wikipedia to "agree on common facts" I think you might have bigger problems...
Not the GP, and I don't believe in the existence of "common facts" in general, but Wikipedia is indeed a good place to figure out what other people might agree as common facts...
There are so many mirrors anyway and trivial to get a local copy? What is much more concerning is government censorship and age verification/digital id laws where what articles you read becomes part of your government record the police sees when they pull you over.
> but at some point a state based cyber attack that just wipes wikipedia off the net is deeply damaging to our modern society’s ability to agree on common facts
Haven't we hit that point already with bad faith (and potentially government-run) coordinated editing and voting campaigns, as both Wales and Sanger have been pointing out for a while now?
See, for example,
* Sanger: https://en.wikipedia.org/wiki/User:Larry_Sanger/Nine_Theses
* Wales: https://en.wikipedia.org/wiki/Talk:Gaza_genocide/Archive_22#...
* PirateWires: https://www.piratewires.com/p/how-wikipedia-is-becoming-a-ma...
A comment from my wiki-editor friend:
Interesting to note how trivial it is today to fake something as coming "from the Russians".
A theory on phab: "Some investigation was made in Russian Wikipedia discord chat, maybe it will be useful.
1. In 2023, vandal attacks was made against two Russian-language alternative wiki projects, Wikireality and Cyclopedia. Here https://wikireality.ru/wiki/РАОрг is an article about organisators of these attacks.
2. In 2024, ruwiki user Ololoshka562 created a page https://ru.wikipedia.org/wiki/user:Ololoshka562/test.js containing script used in these attacks. It was inactive next 1.5 years.
3. Today, sbassett massively loaded other users' scripts into his global.js on meta, maybe for testing global API limits: https://meta.wikimedia.org/wiki/Special:Contributions/SBasse... . In one edit, he loaded Ololoshka's script: https://meta.wikimedia.org/w/index.php?diff=prev&oldid=30167... and run it."
I remember someone mass-defacing the ruwiki almost exactly a year ago (March 3 2025) with some immature insults towards certain ruwiki admins. If I'm not mistaken it was a similar method.
I’m not saying that this is related to Wikipedia ditching archive.is but timing in combination with Russian messages is at least…weird.
And they probably used mind-control to make the admin run random userscripts on his privileged account as well, the capabilities of Russian hackers are scary.
/s
It is just another human acting human again.
I wonder if any poisoned data made it into LLM training data pipelines?
Interesting angle. Everyone has already pointed out that there are backups basically everywhere, and from an information standpoint, shaving off a day (or whatever) of edits just to get to a known-good point is effectively zero cost. But I wonder what the cost is of the potentially bad data getting baked into those models, and if anyone really cares enough to scrap it.
We should be using federated organizational architectures when appropriate.
For Wikipedia, consider a central read-only aggregated mirror that delegates the editorial function to specialized communities. Common, suggested tooling (software and processes) could be maintained centrally but each community might be improved with more independence. This separation of concerns may be a better fit for knowledge collection and archival.
Note: I edited to stress central mirroring of static content with delegation of editorial function to contributing organizations. I'm expressly not endorsing technical "dynamic" federation approaches.
Exactly. Wikipedia should be used on ipfs
Here before someone says that it's because MediaWiki is written in PHP.
PHP is the language where "return flase" causes it to return true.
https://danielc7.medium.com/remote-code-execution-gaining-do...
Also the language that runs half of the web.
Also the language that has made me millions over my career with no degree.
Also the language that allows people to be up and running in seconds (with or without AI).
I could go on.
> Also the language that has made me millions over my career with no degree.
Well done.
> Also the language that allows people to be up and running in seconds (with or without AI).
People getting up and running without any opportunity to be taught about security concerns (even those as simple as the risks of inadequate input verification), especially considering the infamous inconsistency in PHP's APIs which can lead to significant foot-guns, is both a blessing and a curse… Essentially a pre-cursor to some of the crap that is starting to be published now via vibe-coding with little understanding.
PHP is a fine language. It started my career. That said, it has a lot of baggage that can let you shoot yourself in the foot. Modern PHP is pretty awesome though.
Pretty sure we've seen people coding in essentially every other programming language also shoot themselves in the foot.
Every language has foot-guns of some sort. The difference is how easy it is to accidentally pull the trigger.
PHP makes it easy.
The language is not what makes you nor the product. You could've written the same thing in RoR, PHP was just first and it's why it still exists
PHP performance is significantly better than Ruby on Rails, which I think plays a part in its continued popularity.
> Also the language that runs half of the web.
The bottom half.
;)
I use it on the backends of my stuff.
Works great, but, like any tool, usage matters.
People who use tools badly, get bad results.
I've always found the "Fishtank Graph" to be relevant: https://w3techs.com/technologies/history_overview/programmin...
People who use tools badly inflict bad results on other people, quite often far more so than they do so on themselves.
Yep, that's the sad truth - a language's popularity often has nothing to do with its security properties. People will happily keep churning out insecure junk as long as it makes them millions, botnet and data compromises be damned.
Try not to take criticisms of tools personally. Phillips head screws are shit for a great many applications, while simultaneously being involved in billions of dollars of economic activity, and being a driver that everyone has available.
PHP is insanely great, and very fast. The hate has no clout.
Perl still runs the other half?
FWIW this was fixed in 2020
I've not used PHP in anger in well over a decade, but if the general environment out there is anything like it was back then there are likely a lot of people, mostly on cheap shared hosting arrangements, running PHP versions older than that and for the most part knowing no better.
That isn't the fault of the language of course, but a valid reason for some of the “ick” reaction some get when it is mentioned.
Except that in a contemporary PHP that doesn't work any more.
This means game over, the script stops there.
It is unfortunate that Wikipedia is under attack. It seems as if there are more malicious actors now than, say, 5 years ago.
This may be unrelated but I also noticed more attacks on e. g. libgen, Anna's archive and what not. I am not at all saying this is similar to Wikipedia as such, mind you, but it really seems as if there are more actors active now who target people's freedom now (e. g. freedom of choice of access to any kind of information; age restriction aka age "verification" taps into this too).
Looking forward to the postmortem...
I can edit it
It's reassuring to know Wikipedia has these kinds of security mechanisms in place.
They have no incentive to improve the site, because they’re a for-profit entity.
Despite the constant screeching for donations, the entire site is owned by a company with shareholders. All the “donations” go to them. They already met their funding needs for the next century a long time ago, this is all profit.
Another reason to make the default disabling JS on all websites, and the website should offer a service without JS, especially those implemented in obsolete garbage tech. If it's not an XSS from a famous website, it will be an exploit from a sketchy website.
"Закрываем проект" is Russian for "Closing the project"
GOD am I thankful to my old self for disabling js by default. And sticking with it.
edit: lol downvoted with no counterpoint, is it hitting a nerve?
How do they know? Has this been published in a Reliable Source?
This is the official Wikimedia Foundation status page for the whole of Wikipedia, so it's a reliable primary source.
Actually, usage of primary sources is kinda complicated [0], generally Wikipedia prefers secondary and tertiary sources.
[0] https://en.wikipedia.org/wiki/Wikipedia:No_original_research...
Yeah, but the purpose of an encyclopedia like Wikipedia (a tertiary source) is to relatively neutrally summarize the consensus of those who spend the time and effort to analyze and interpret the primary sources (and thus produce secondary sources), or if necessary to cite other tertiary summaries of those.
In a discussion forum like HN, pointing to primary sources is the most reliable input to the other readers' research on/synthesis of their own secondary interpretation of what may be going on. Pointing to other secondary interpretations/analyses is also useful, but not without including the primary source so that others can - with apologies to the phrase currently misused by the US right wing - truly do their own research.
If you spend any time on Wikipedia, you'll find that secondary sources from an existing list are always preferred. The mandate from the link in GP (https://en.wikipedia.org/wiki/Wikipedia:No_original_research) extends, or at least is interpreted to mean to extend to, actively punishing editors who attempt to analyze or interpret primary sources.
My original post was a joke about this.
Long past time to eliminate JavaScript from existence
This.
Actually fuck the whole dynamic web. Just give us hypertext again and build native apps.
Edit: perhaps I shouldn't say this on a VC-driven SaaS wankfest forum...
You may be interested in https://geminiprotocol.net/
I mean sure, but that's never going to happen, so complaining about it is just shaking your fist at the sky. The only way it will change is if the economics of the web change. Maybe that is the economics of developer time (it being easier/fast/more resilient and thus cheaper to do native dev), or maybe it is that dynamic scripting leads to such extreme vulnerabilities that ease of deployment/development/consumer usage change the macroeconomics of web deployment enough to shift the scales to local.
But if there's one thing I've learned over the years as a technologist, it's this: the "best technology" is not often the "technology that wins".
Engineering is not done in a vacuum. Indeed, my personal definition of engineering is that it is "constraint-based applied science". Yes, some of those constraints are "VC buxx" wanting to see a return on investment, but even the OSS world has its own set of constraints - often overlapping. Time, labor, existing infrastructure, domain knowledge.
Imagine if wikipedia was a native app, what this vuln would have caused. I for one prefer using stuff in the browser where at least it's sandboxed. Also, there's nothing stopping you from disabling JS in your browser.