It's interesting that, being unable to find a legal route to dig up dirt on archive.is, they're going the route of CSAM allegations.
I first heard of this technique in a discussion on Lowendtalk, where a hoster described how pressure campaigns were orchestrated.
The host used to host VMs for a customer that was not well liked but otherwise within the bounds of free speech in the US (I guess something on the order of KF/SaSu/SF), so a given user would upload CSAM on the forum, then report the same CSAM to the hoster. They used the same IP address for their entire operation. When the host and the customer compared notes, they'd find out about these details.
Honestly, at the time I thought the story was bunk. In the age of residential proxies and VPNs and whatnot, surely whoever did this wouldn't just upload said CSAM from their own IP. But one possible explanation is that the forum blocked datacenter IPs wholesale and the person orchestrating the campaign wasn't willing to risk the legal fallout of uploading CSAM through some regular citizen's infected device.
In this case, I assume law enforcement just sets up a website with said CSAM, gets archive.is to crawl it, and then pressures DNS providers about it.
It’s the digital equivalent of a dirty cop planting a gun after shooting a suspect. Of course it happens. Three letter agencies probably do things like this all the time. Half of their legitimate work is probably illegal to begin with.
That would work, but it is a very risky technique. For the mere mortal in your example, this means possible jail time just to get some site closed down.
For law enforcement personnel, it would at the very least mean the end of a career if caught (and possibly jail time too).
The current federal government in the USA actively encourages federal agents to use illegal and unethical methods, and promised them protection and immunity.
You could use something that is legal in one country, and illegal in another country, for example, an anime-style drawing of a young girl, or a textual description.
You are naive about cops, at least in the US, and what they will or will not do and what consequence they may or may not face.
Indeed, if you're paying attention to local news in Massachusetts, you might be shocked, or not, that cops from Canton, Boston, and the Massachusetts state police, and the county District Attorney, and judges, are all complicit in railroading a woman who was dating a cop who was likely killed by another cop. The web of deceit is so thick, it can't have been just for this one case. It must be long-standing and pervasive and there must be many victims. It's also unlikely that Massachusetts is the worst place in the US in this respect.
Can you provide a link at least? I am not sure what railroading a person involves.
I don't think I am naive, just imagine the repercussions of the headline "FBI collected thousands of child rape photos for blackmail" or "Cop work computer was found filled with child porn"
Anything linked to pedophilia in the US and elsewhere is dealt with without mercy, and it will continue that way due to parental fears.
> I don't think I am naive, just imagine the repercussions of the headline "FBI collected thousands of child rape photos for blackmail"
What were the repercussions of this: "FBI ran website sharing thousands of child porn images" (https://www.usatoday.com/story/news/2016/01/21/fbi-ran-websi...)
How about "The President was close friends with a known child sexual predator and his entire government spends a significant amount of time covering up their connections because it seems fairly obvious the president fucked teenagers and then fomented a coup and put literal criminals and felons in his cabinet so no one would hold him accountable while destroying the nation's economy and starting wars nobody can even understand"
I've spent enough time on Telegram to see this used many times to get groups banned. CSAM shitstorm, content gets flagged, the group gets banned (or at least becomes unavailable for some time).
> KF/SaSu/SF
SaSu: Sanctioned Suicide [1]
But I don't know what KF and SF are supposed to stand for.
[1] https://en.wikipedia.org/wiki/Sanctioned_Suicide
KF is almost certainly KiwiFarms, an infamous gossip forum where terminally online mentally ill people come together to make fun of other terminally online mentally ill people, with a large number of doxxing and harassment accusations thrown at it. I think harassment is against the site rules, but doxxing isn't. The site, being what it is, got itself some serious enemies, including people with enough influence in the IT space to nearly get the entire site pulled off the web.
SF is probably Stormfront, an infamous neo-nazi website. Not an "anyone right of center is a nazi" kind of neo-nazi - actual self-proclaimed neo-nazis, complete with swastikas, Holocaust denial and calls for racial segregation. Even more hated and scrutinized than KiwiFarms, and under pressure by multiple governments and many more activist groups, over things like neo-nazi hate speech and ties with real life hate groups.
It would be a damn shame if archive.is fell under the same kind of scrutiny as those. I have an impression, completely unfounded, that the archive.is crew knew things were heading that way, and worked with that in mind for a long time now. But that doesn't guarantee they'll endure. Just gives them a fighting chance.
> I assume law enforcement just sets up a website with said CSAM
Sentences like this make me sincerely believe that not everyone has a soul.
wait...! do you mean the commenter or the people in law enforcement?
Cocaine is a hell of a drug
I doubt they’d have to. If the site truly doesn’t remove CSAM automatically I’ve no doubt plenty of it would end up there organically. You wouldn’t have to upload any anywhere, you’d only need to know some URLs to look for which presumably any major law enforcement agency would.
they removed it promptly.
remember: god kills a kitten every time you comment/assume something without reading it...
I read the whole thing; you just didn’t understand my comment. That’s my fault because I left out one word, “automatically”. Fixed it.
The person to whom I was replying thought that perhaps someone wanting to stop Archive was uploading CSAM and getting them to crawl it. I was pointing out that they didn’t have to do the first step. The internet apparently has lots of that stuff, so they merely had to have a list of URLs (which law enforcement could easily provide) and check Archive for them.
Archive doesn’t do this automatically apparently, as some platforms do, so there’s probably plenty of it there.
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".
https://news.ycombinator.com/newsguidelines.html
It's the same technique that people on Reddit use to take down subreddits that don't agree with the carefully curated "hive mind".
I don’t know why you are downvoted, this is absolutely what happened semi-frequently until Reddit was finally forced to crack down on it. The same thing happened on Twitter/X for a while where bots would mass reply to targeted users with gore and CSAM.
I've been seeing something similar on some youtube videos, endless unflagged comments advocating hatred and violence, completely unrelated to the video topic or channel.
It's unlikely law enforcement would take the risk to handle CSAM just to make a case against a Russian pirate, jeopardizing their careers and freedom, when the copyright case is pretty strong already.
These are the doings of one of the myriad freelance "intellectual rights enforcement agents", who are paid on success and employed by some large media organization. Another possibility is that a single aggrieved individual who found themselves doxed, or their criminal conviction archived, etc., took action after failing to enforce their so-called "right to be forgotten".
Unfortunately, archive.is's operating model is uniquely vulnerable to such false flag attacks.
It’s grimly hilarious that anyone in 2025 believes the police wouldn’t do something because that thing is unethical and against their own standards.
> handle CSAM
They wouldn’t “handle” it, they’d have some third party do their dirty work.
The FBI has a large archive of CSAM used for content ID:
https://cybernews.com/editorial/war-on-child-exploitation/
Of course in a pinch it could also be used for other things like pretext.
This is probably the realm of intelligence agencies, who have less accountability and many reasons to eliminate public archives (primarily perception management).
I don't know anything about Adguard, but good on the team for doing the extra digging instead of just going along with the claim. Even better that they're sharing what they've found with everyone else.
Yes, kudos. The pressure can probably be put down to an arrogant trend one can observe: the editing of history.
The number of forces seemingly actively trying to kill the internet of old is disconcerting.
Chat control, DNS as arbiter of what's allowed, walled gardens, etc.
Don’t forget Cloudflare
The wording and tone of the emails sent to Adguard read just like phishing emails with a hint of political SMS spam. Glad to see the people there thinking critically and acting rationally despite such language.
A few weeks ago I noticed DNS4EU couldn’t resolve archive.is and assumed it was just a configuration mistake. I emailed them about it, and after a couple of days or weeks (not really sure) the domain started resolving again. Given AdGuard’s recent report about suspicious pressure on DNS providers to block Archive.today, I’m starting to wonder if DNS4EU’s temporary block was actually related to the same campaign.
The wording in that follow-up email is so emotive it reads more like a Tweet than formal contact from a federal organisation.
That in itself is quite shocking really.
I still can't wrap my head around why a DNS provider is required to block websites, especially one that is not associated with an ISP or used as the default on any device. Oversimplifying this, it's a glorified hash map, so whoever wants to take down the illegal content should just deal with the website owner?
Presumably they have failed to do the latter and are just reaching at this point.
https://archive.is/Nirff
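To make the "glorified hash map" point above concrete, here is a minimal Python sketch of a resolver as a lookup table with a blocklist in front of it. Everything in it is hypothetical: `ToyResolver`, the `example.org` record and the `192.0.2.10` address (taken from a documentation-only range) are made up for illustration, and a real resolver does recursive lookups and caching rather than reading a dict. The only point is that a DNS-level block removes one answer from one table and leaves the site itself untouched.

```python
from typing import Optional


class ToyResolver:
    """A toy name-to-address table with a blocklist. Not a real DNS server."""

    def __init__(self, records: dict[str, str], blocked: Optional[set[str]] = None):
        # Store names in lowercase so lookups are case-insensitive.
        self.records = {name.lower(): addr for name, addr in records.items()}
        self.blocked = {name.lower() for name in (blocked or set())}

    def resolve(self, name: str) -> Optional[str]:
        # Normalise case and drop a trailing root dot ("example.org.").
        key = name.lower().rstrip(".")
        if key in self.blocked:
            # A blocking resolver simply pretends it has no answer.
            return None
        return self.records.get(key)


# Hypothetical data: 192.0.2.0/24 is reserved for documentation.
records = {"example.org": "192.0.2.10"}

open_resolver = ToyResolver(records)
filtering_resolver = ToyResolver(records, blocked={"example.org"})

print(open_resolver.resolve("example.org"))       # 192.0.2.10
print(filtering_resolver.resolve("example.org"))  # None
```

In this sketch a client that asks `open_resolver` instead still gets the address, which is roughly why a block at this layer does nothing about the content itself.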
This just shows that LCEN, DMCA, etc. are poorly crafted laws. They are ineffective at stopping the abuse they claim to end (like copyright infringement), but they do give large organizations a cudgel to protect their own IP.
I think they’re well crafted laws because I think that’s their intended purpose.
“The purpose of a system is what it does.”
...or their goal is simply not what they advertise.
So they're pressuring a DNS resolver to block a specific website? That seems like an incredibly slippery slope.
What stops them from forcing Chrome to block the website, or LetsEncrypt to not issue any more certificates for the domain, or Microsoft and Apple to add them to their firewalls? Hell, can they go after the infrastructure software developers and say, force nginx to add a check and refuse to serve the domain?
Then what happens when a fake report is sent to an open source project without budget for lawyers?
The FBI investigation might be a coincidence. Unsurprisingly, archive.today is attacked with CSAM uploads+reports all the time, you can find occasional mentions of this in their blog from 3 and 9 years ago, and I bet there was a ton of this in between.
I speculate, and the conspiracy theorist in me believes, something of a compromising nature has been archived and they want that data inaccessible, but at the same time, pointing out what they want hidden would shine a light on it.
It is even more interesting the US government is coming after archive.today at the same time, or maybe that is just a coincidence, and this is just a tech-savvy philanderer trying to hide something from his wife.
If we're speculating, there is another reason to censor an archiving site: if you recently committed a well-documented genocide and want the evidence erased. Given the systematic removal of such content from social media, it would not be surprising if this was related.
I used the site several times to archive some page or send it to someone who cannot access the site directly. I never archived anything illegal and never stumbled upon illegal things there. So I don't know why they want to arrest the owner.
Also the site is pretty advanced, it can handle complicated sites and even social networks.
> But because it can also be used to bypass paywalls
How? Does the site pay for subscription for every newspaper?
> Unfortunately, we couldn’t dig any deeper about who exactly is behind WAAD.
That's a red flag. Why would an NGO doing work for the public hide its founder(s) and information about itself? Using NGOs to suggest/promote/lobby certain decisions is a well known trick in authoritarian countries to pretend the idea is coming from "the people", not from the government. I hope nobody falls for such tricks today.
Furthermore, there seems to be no way to donate money to them. That's an even redder flag.
Also, France doesn't have a good reputation when it comes to observing the rule of law. For example, they arrested Russian agent^w entrepreneur Durov, owner of Telegram, claiming they had a lot of evidence of his involvement in drug trafficking, fraud and money laundering [1], but a year later let him go (supposedly after he did what they wanted). France also bars popular unwanted candidates from elections. Both these cases strongly resemble what Russia does.
[1] https://en.wikipedia.org/wiki/Arrest_and_indictment_of_Pavel...
France possibly found a way to pressure Durov into cooperating, preempting similar actions by Russia. Classic intelligence methods to get someone to come over to the other side.
Perhaps the DGSE also got to plug a cable into the Telegram infrastructure, which would be a huge plus for them and the West in general, not least because of the war. You could say France has pwnd Durov.
If I'm not mistaken some significant arrest was made shortly after they captured Durov, in the case of this child exploitation stuff.
>> But because it can also be used to bypass paywalls
> How? Does the site pay for subscription for every newspaper?
Someone with a subscription logs into the site, then archives it. Archive.is uses the current user's session and can therefore see the paywalled content.
Do they have such an option? I don't see it on the site, and the browser extension seems to send only the URL [1] to the server. Can you provide more information?
[1] https://github.com/JNavas2/Archive-Page/blob/main/Firefox/ba...
Does it still leak your IP, e.g. if the page rendered by the site you're archiving includes it? You'd think they'd create a simple filter to redact that out.
Finally someone does some digging
Archive.is doesn't work on all sites to bypass the paywall. Media companies that are truly concerned about this should modify their paywall configuration.
I've seen some theories, or maybe more like guesses, as to how the paywall bypass works. I don't think anyone (or at least no one posting in places like this) seems to know.
One I saw suggested they have a set of subscriptions to the paywalled sites and some minimal custom work to hide the signed-in account used, which seems plausible. That would make the most likely defense catching the account used and banning it, which would be a right pain.
Different question, but what are realistic use cases of archive.today that could be interesting for an average person?
Have you ever wanted to reread an old article/blog from a long-dead website? Needed to compare the old TOS with the latest TOS? Been wondering what that video on your playlist that got removed from YouTube was about? Things like that are fairly average. It can also be helpful for dispute resolution and holding public parties to account.
On behalf of all Indians I apologize. An Indian who read the email you received would be able to quickly guess that the email originated from India. We are the global hub of scam call centers. We also write emails just like that text. Run the text of that email through an LLM. Ask the LLM: How likely is it that this email is a scam from India? Answer: 80%
You don't need to apologise for something you didn't do just because it happened in your country.
archive.is is frequently used to bypass paywalls, I wonder if this is motivated by that somehow
DOGE is busy replacing official US govt websites, and does not want anyone bringing up the past.
If they were any good at it, they'd have blocked the Internet Archive via robots.txt. For some inexplicable reason, IA responds to that by wiping out past, present, and future archivals of that site. They haven't taken that easy step, so I doubt they'd go the further, more involved step of focusing on this smaller actor.
IA also blocks some content in Russia, for example this [1] says: "This URL has been excluded from the Wayback Machine in your region.". I was sincerely surprised to learn that while not paying much attention to US copyright law, they have high respect for messages from Vladimir.
(In case someone is curious what that article is about: it is a fictional comparison of the lives of a fictional character in Springfield, USA and Chusovoy, USSR in the 80s, and I cannot even understand why it was banned in Russia.)
[1] https://web.archive.org/web/20250418160713/https://habr.com/...
The Wayback Machine has ignored robots.txt for a few years at this point. The only way to get them to stop scraping or remove content is by asking them directly.
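For anyone who hasn't seen what the robots.txt approach discussed above looks like, here is a small Python sketch using the standard library's `urllib.robotparser`. The `ia_archiver` user-agent token is, as far as I know, the one the Wayback Machine historically honored, but treat that and the `example.org` URLs as assumptions. And per the comment above, the crawler may no longer respect such a rule at all, so this only shows how a compliant crawler would read the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one archiving crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network fetch is needed here.
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if a crawler that obeys robots.txt may fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(may_fetch("ia_archiver", "https://example.org/article"))   # False
print(may_fetch("SomeOtherBot", "https://example.org/article"))  # True
```

Note that this is purely advisory: nothing in the file stops a crawler that chooses to ignore it.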
That's most likely the reason pressure is being put on them. Big media companies successfully shut down 12ft.io, which was used to bypass paywalls, and forced the BPC (Bypass Paywalls Chrome) browser extension off the Mozilla Extension store, then GitLab, then GitHub. Now the dev is hosting it on a Russian GitHub clone, presumably making it untouchable.
Since archive[.]today is using some very obscure hosting methods with multiple international mirrors, it makes it incredibly difficult for law enforcement to go after.
What obscure methods are they using?
I guess it might fall under a bulletproof hosting type of setup. [1] There have been many people investigating to try and figure out who is actually behind archive[.]today, how they're continuously able to bypass the paywalls of paid sites, and how they continue operating such large infrastructure with no apparent income source.
There was quite a good article posted here on HN about someone trying to figure out those questions, but I can't seem to find it.
[1] https://en.wikipedia.org/wiki/Bulletproof_hosting
Isn't it just a question of pretending to be a search bot? Sites will allow Googlebot to bypass the paywall so stuff gets indexed.
100%. It's like Lenin said, you look for the person who will benefit… and, uh, uh, you know… You know, you'll uh, uh—well, you know what I'm trying to say…
I’m not sure what you’re referencing, but the principle goes way back to the Romans: Cui bono? [0]
0: https://en.wikipedia.org/wiki/Cui_bono%3F
They're referencing this: https://m.youtube.com/watch?v=HlZhPuDYqbU