I live in a country where the selection of available books, especially in English, is very limited. Buying online from foreign markets comes with a long list of administrative hurdles and limits.
If it were not for Anna's Archive and Z-Library, I would've never been able to read the books that shaped who I am today, or keep my passion for learning alive.
Thanks, AA and ZLib! (Also, thank you to the authors whose books and knowledge I consumed without being able to pay them back.)
https://SourceLibrary.org has about 16,000 rare books translated and 50,000 archived. That's more tokens than English Wikipedia and about 0.75 petabytes.
It seems like bounties for new sources of training data would be useful to the big model builders. I follow a guy who hoards vast quantities of old analog media of all kinds, a lot of it local. Bounties could be a way for him to cash in. But I'm not sure if it's an appreciating asset or if they'll find it anyway and it'll lose its value.
Who is behind Anna's Archive? There are a lot of English speakers involved in the team and forums! Anyway, as long as buying isn't owning, no issues here.
Anyone afraid of being laid off at Google right now? Perhaps this is a backup :)
I think if you get caught exfiltrating data they'll sue you for much more than $200K.
Copy the data onto an extra-large-capacity microSD card and hide it in your Rubik's Cube; nobody will suspect a thing.
If your money is in private crypto or offshore you have nothing to worry about.
Except perhaps jail time.
Lying about your assets to avoid paying a lawful fine is criminal. Just because they can’t see your money doesn’t mean they can’t prove that you have it, and can’t jail you for hiding it to get out of paying a fine.
I wonder how long it will be before they offer bounties for internet scrapes.
Cloudflare captchas have made the internet unusable for me, and I'm sure it will only get worse over time. I'd much rather just browse (or even torrent) a copy of archive.is or similar. The latter would be much better for privacy, and hey, I run ad blockers anyway.
https://x.com/CloudflareDev/status/2031488099725754821
Well, there is this little conflict of interest
https://xcancel.com/CloudflareDev/status/2031488099725754821
The US should just find a way to quietly share literature access with the Russians, rather than letting piracy be promoted and facilitated for US consumers as freedom-fighter "archiving".
Between all the piracy, and all the AI training and the purchase/visitor-circumventing AI services, the practice of writing and publishing genuinely good work is being wiped out.
We're killing the goose that lays the eggs, for selfish gain.
This ship has sailed for academic publications, and academics define that term very liberally. The shadow libraries started off as a way for scholars in ex-Soviet countries in particular (but also India, SE Asia, etc.) to access literature that simply wasn’t available in their country. But the shadow libraries proved so successful and convenient that researchers in all countries are using them now, even if they have access to official subscription services. I use AA several times a day and so do the researchers around me in my office; at conferences, if the presenter mentions an interesting publication, the whole room immediately opens AA on their laptop, etc.
Even if projects like AA didn’t have nation-state-level support, academics would find a way to keep as much of it as possible going. After all, we’re the ones who do all the hard work of scanning from our institutional libraries stuff that doesn’t exist anywhere in digital form.
Possibly, but this act of governmental self-harm is useful to The People. We live in a world where if your valuation is ~1T you can more or less just do what you like. And the work of The People is stolen from you and laundered.
In such a world, isn't it useful that governments are stupid enough to give adversaries reasons to undermine it? When the government props up a corporate tyranny domestically, and racketeering, should we make a temporary alliance with all its enemies?
(E.g., the provision to AI companies of all corporate secrets and competitive practices via prompts, eventually to be used against their capital interests and their labour interests.)
So AA is a front for OpenAI?
How did you come to that conclusion?
The bounty would be a bit higher with OpenAI money behind it.
Some more interesting bounties they offer: https://software.annas-archive.gl/AnnaArchivist/annas-archiv...
> Purchase all Library of Congress MARC datasets — $3,000 bounty
> English Wikipedia pages about relevant institutions — up to $100 per new page
> Internet Archive Digital Lending — $5000 per 1 million pdf files
> Text version of our full library — $20,000
...
Piracy / copyright predictions?
The current situation feels untenable with renting. So many regular people I know have learned about VPNs, NAS, etc.
It was never sustainable, just regulatory capture by large IP owners.
Spotify, Netflix, Amazon, etc. provided OK value for a while, but now that enshittification is biting, piracy is due a massive comeback.
Hopefully the guillotines. Look up how much the authors and artists who create the actual work get paid.
Curious as to how you would approach this. I have no experience in this area; is anyone on this forum willing to share their expertise?
One of my hopes is that when the AI bubble bursts, some brave person will sneak out a copy of the last frontier model.
Not worried about that; you will only have to wait 3-6 months to get a Chinese model that's just as good.
That’s misunderstanding why these models are behind. A large part of why they’re behind is that they aren’t able to do the reinforcement learning post-training steps that take a pre-trained model and turn it into a frontier model like GPT 5 or Opus. Instead they do their best to recreate these models using distillation.
Fundamentally, you can never distill your way to being the teacher, so these approaches will not advance the frontier.
Chinese companies giving away expensive models for free is a symptom of the AI bubble, too. It's not a law of nature that they'll always be able to scrounge up the money for yet another training run.
Shaping the tool that does the thinking is quite valuable when you're in the business of changing how people think - I think we can expect propaganda agencies to be subsidizing model creation forever.
This doesn't strike me as a symptom of a bubble, except insofar as the bubble pushes the competitors' models forward and thus they need to invest more to stay competitive.
All the models have to respect their local laws and, most of all, pressure from users and employees.
They all carry political weight, because the humans behind them defend their interests and promote certain social values.
https://pastebin.com/hjhvsBFg
This answer from Claude is so biased that it is ridiculous
I think it's a deliberate business strategy of commoditization of their complement. China acts like an entire bloc, not single companies, and they want to monetize hardware.
If it's a bubble, why do you care about frontier models?
Prediction markets can solve this.