I agree with the thrust of this article, that norms and what we perceive as good or desirable extend considerably beyond the minimum established by law.
But a point that was not made strongly, which highlights this even more, is that this goes in every direction.
If this kind of reimplementation is legal, then I can take any permissive OSS and rebuild it as proprietary. I can take any proprietary software and rebuild it as permissive. I can take any proprietary software and rebuild it as my own proprietary software.
Either the law needs to catch up and prevent this kind of behavior, or we're going to enter an effectively post-copyright world with respect to software. Which ISN'T GOOD, because that will disincentivize any sort of open license at all, and companies will start protecting/obfuscating their APIs like trade secrets.
Someone should put this to the test. Take the recently leaked Minecraft source code and have Copilot build an exact replica in another programming language and then publish it as open source. See if Microsoft believes AI is copyright infringement or not.
As described, this would not be the same thing. If the AI is looking at the source and effectively porting it, that is likely infringement. The idea instead should be "implement Minecraft from scratch" but with behavior, graphics, etc. identical. Note that you'll need to have an AI generate assets or something since you can't just reuse textures and models.
AI models have already looked at the source of GPL software and contain it in their dataset. Adding the Minecraft source to the mix wouldn't seem much different. Of course art assets and trademarks would have to be replaced. But an AI "clean room" implementation has yet to be legally tested.
The big question is: if copyrighted material was used in the training material, is the LLM's output copyright infringement when it resembles the training material? In your example, you are taking the copyrighted material and giving it to the LLM as input and instructing the LLM to process it. Regardless of where the legal cards fall, this is a much less ambiguous scenario.
Wow, it feels like this argument rewired my brain.
When I first read about the chardet situation, I was conflicted but largely sided on the legal permissibility side of things. Uncomfortably I couldn't really fault the vibers; I guess I'm just liberal at heart.
The argument from the commons has really invoked my belief in the inherent morality of a public good. Something being "impermissible" sounds bad until you realize that otherwise the arrow of public knowledge suddenly points backwards.
Seeing this example play out in real life has had retroactive effects on my previously BSD-aligned brain. Even though the argument itself may have been presented before, I now understand the morals that a GPL license text underpins better.
The really interesting question to me is if this transcends copyright and unravels the whole concept of intellectual property. Because all of it is premised on an assumption that creativity is "hard". But LLMs are not just writing software, they are rapidly being engineered to operate completely generally as knowledge creation engines: solving math proofs, designing drugs, etc.
So: once it's not "hard" any more, does IP even make sense at all? Why grant monopoly rights to something that required little to no investment in the first place? Even with vestigial IP law - let's say, patents: they just become an input parameter, and the AI needs to work around the patents like any other constraint.
> So: once it's not "hard" any more, does IP even make sense at all? Why grant monopoly rights to something that required little to no investment in the first place? Even with vestigial IP law - let's say, patents: they just become an input parameter, and the AI needs to work around the patents like any other constraint.
I think it still does: IIRC, the current legal situation is that AI output does not qualify for IP protections (at least not without substantial later human modification). IP protections are solely reserved for human work.
And I'm fine with that: if a person put in the work, they should have protections so their stuff can't be ripped off for free by all the wealthy major corporations that find some use for it. Otherwise: who cares about the LLMs.
I think you have a rather idealized model of IP in mind. In practice, IP law tends to be an expensive weapon the wealthy major corporations use against the little guy. Deep enough pockets and a big enough war chest of broad patents will drain the little guy every time.
If you think about creative outcomes as n-dimensional 'volumes', AI expressions can cover more than humans can in many domains. These are precisely artistic styles, music styles, etc., and tbh not everyone can be a Mozart, but maybe a lot more people can be Mozart lite with AI. This raises the question of how much of creativity is appreciated as a shared experience.
It might unravel intellectual property, just not in a fair way. When capitalism started, public land was enclosed to create private property. Despite this being in many cases a quite unfair process, we still respect this arrangement.
With AI, a similar process is happening - publicly available information becomes enclosed by the model owners. We will probably get a "vestigial" intellectual property in the form of model ownership, and everyone will pay a rent to use it. In fact, companies might start to gatekeep all the information to only their own LLM flavor, which you will be required to use to get to the information. For example, product documentation and datasheets will be only available by talking to their AI.
Don't worry. The courts have consistently sided with huge companies on copyright. In the US. In Europe. Doesn't matter.
Company incorporates GPL code in their product? Never once have courts decided to uphold copyright. HP did that many times. Microsoft got caught doing it. And yet the GPL was never applied to their products. Every time there was an excuse. An inconsistent excuse.
Schoolkid downloads a movie? 30,000 USD per infraction PLUS armed police officer goes in and enforces removal of any movies.
Or take the very subject here. AI training WAS NOT considered fair use when OpenAI violated copyright to train. Same with Anthropic, Google, Microsoft, ... They incorporated Harry Potter and the Linux kernel in ChatGPT, in the model itself. Undeniable. Literally. So even if you accept that it's changed now, OpenAI should still be forced to redistribute the training set, code, and everything needed to run the model for everything they did up to 2020. Needless to say ... courts refused to apply that.
So just apply "the law", right. Courts' judgement of using AI to "remove GPL"? Approved. Using AI to "make the next Disney-style movie"? SEND IN THE ARMY! Whether one or the other violates the law according to rational people? Whatever excuse to avoid that discussion is good enough.
I think the missing thing here is that the license violation already happened. Most of the big models trained on data in a manner that violated terms of service. We'll need a court case but I think it's extremely reasonable to consider any model trained on GPL code to be infected with open licensing requirements.
You might wish that were true, but there are very strong arguments it's not. Training on copyleft licensed code is not a license violation. Any more than a person reading it is. In copyright terms, it's such an extreme transformative use that copyright no longer applies. It's fair use.
But agreed that we're waiting for a court case to confirm that. Although really, the main questions for any court cases are not going to be around the principle of fair use itself or whether training is transformative enough (it obviously is), but rather on the specifics:
1) Was any copyrighted material acquired legally (not applicable here), and
2) Is the LLM always providing a unique expression (e.g. not regurgitating books or libraries verbatim)
And in this particular case, they confirmed that the new implementation is 98.7% unique.
> Training on copyleft licensed code is not a license violation. Any more than a person reading it is.
Some might hold that we've granted persons certain exemptions, on account of them being persons. We do not have to grant machines the same.
> In copyright terms, it's such an extreme transformative use that copyright no longer applies.
Has the model really performed an extreme transformation if it is able to produce the training data near-verbatim? Sure, it can also produce extremely transformed versions, but is that really relevant if it holds within it enough information for a (near-)verbatim reproduction?
I agree there has to be a court case about it. I think the current argument, however, is that it is transformative, and therefore falls under fair use.
Yeah, a finding that training is transformative would be pretty significant, and the precedent of thumbnail creation being deemed transformative would likely steer us towards such a finding. Transformative is always a hard thing to bank on because it is such a nebulous, judgement-based call. There are excellent examples of how precise and gritty this can get in audio sampling.
Didn't know about thumbnails being fair use. In that case, I just don't see an argument that genAI training on source code is less transformative than thumbnails.
> He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch.
From GPL2:
> The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable.
Is a project's test suite not considered part of its source code?
If the test suite is part of this library's source code, and Claude was fed the test suite or interface definition files, is the output not considered a work based on the library under the terms of LGPL 2.1?
Legally, using the tests to help create the reimplementation is fine.
However, it seems possible you can't redistribute the same tests under the MIT license. So the reimplementation MIT distribution could need to be source code only, not source code plus tests. Or, the tests can be distributed in parallel but still under LGPL, not MIT. It doesn't really matter since compiled software won't be including the tests anyways.
In Google vs Oracle, the appeals court ruled that APIs fall under copyright (the contrary was thought before), and the Supreme Court left that question undecided. However, the Supreme Court ruled that, in that specific case, fair use applied, because of interoperability concerns. That's the important part of this case: fair use is never automatic, it is assessed case by case.
Regarding chardet, I'm not sure "I wanted to circumvent the license" is a good way to argue fair use.
I believe it is a narrow view of the situation. If we take a look into the history, into the reasons for inventing GPL, we'll see that it was an attempt to fight copyrights with copyrights. The very name 'copyleft' is trying to convey the idea.
What AI is eroding is copyright. You can not only re-implement a GPL program, but also reverse engineer and re-implement a closed source program. People have demonstrated it already; there were stories here on HN about it.
AI is eroding copyright, so there may no longer be a need for the GPL. GNU should stop and rethink its stance, chuck away the GPL as the main tool to fight evil software corporations and embrace LLM as the main weapon.
LLMs - to date - seem to require massive capital expenditures to have the highest quality ones, which is a monumental shift in power towards mega corporations and away from the world of open source where you could do innovative work on your own computer running Linux or FreeBSD or some other open OS.
I don't think that's an exciting idea for the Free Software Foundation.
Perhaps with time we'll be able to run local ones that are 'good enough', but we're not there yet.
There's also an ethical/moral question that these things have been trained on millions of hours of people's volunteer work and the benefits of that are going to accrue to the mega corporations.
Edit: I guess the conclusion I come to is that LLMs are good for 'getting things done', but the context in which they are operating is one where the balance of power is heavily tilted towards capital, and open source is perhaps less interesting to participate in if the machines are just going to slurp it up and people don't have to respect the license or even acknowledge your work.
>Perhaps with time we'll be able to run local ones that are 'good enough', but we're not there yet.
Right now, we can get local models that you can run on consumer hardware, that match capabilities of state of the art models from two years ago. The improvements to model architecture may or may not maintain the same pace in the future, but we will get a local equivalent to Opus 4.6 or whatever other benchmark of "good enough" you have, in the foreseeable future.
> LLMs - to date - seem to require massive capital expenditures to have the highest quality ones, which is a monumental shift in power towards mega corporations and away from the world of open source
Yeah, a bit of a conundrum. But I don't think that fighting for copyright now can bring any benefits for FOSS. GNU should bring Stallman back and see whether he can come up with any new ideas and a new strategy. Alternatively they could try without Stallman. But the point is: they should stop and think again. Maybe they will find a way forward, maybe they won't, but it means that either they could continue their fight for freedom meaningfully, or they could just stop fighting and find some other things to do. Both options are better than fighting for copyright.
> There's also an ethical/moral question that these things have been trained on millions of hours of people's volunteer work and the benefits of that are going to accrue to the mega corporations.
I want to clarify this statement a bit. LLMs relying on the work of others is not against the GNU philosophy as I understand it: algorithms have to be free. Nothing wrong with training LLMs on them or on programs implementing them. Nothing wrong with using these LLMs to write new (free) programs. What is wrong is corporations reaping all the benefits now and locking down new algorithms later.
I think it is important, because copyright is deemed to be an ethical thing by many (I think for most people it is just a deduction: abiding the law is ethical, therefore copyright is ethical), but not for GNU.
IMO the primary significant trend in AI. Doesn't get talked about nearly enough. Means the AI is working, I guess.
>GNU should bring Stallman back ... Alternatively they could try without Stallman.
Leave Britney alone >:(
>copyright is deemed to be an ethical thing by many (I think for most people it is just a deduction: abiding the law is ethical, therefore copyright is ethical)
I've busted out "intellectual property is a crime against humanity" at layfolk to see if that shortcuts through that entire little politico-philosophical minefield. They emote the requisite mild shock when such things as crimes against humanity are mentioned; as well as at someone making such a radical statement which seems to come from no familiar species of echo chamber; and then a moment later they begin to very much look like they see where I'm coming from.
How do you even argue such a thing? I've had no such luck, I've met many people who seem to view copyright and a person owning their ideas and work as a sort of inherent moral.
Not saying this gets through to people, but copyright is purely about the legal ability to restrict what other people do. Whereas property rights are about not allowing others to restrict what you do (e.g. by taking your stuff).
> LLMs - to date - seem to require massive capital expenditures to have the highest quality ones, which is a monumental shift in power towards mega corporations and away from the world of open source where you could do innovative work on your own computer running Linux or FreeBSD or some other open OS.
When the FSF and GPL were created, I don't think this was really a consideration. They were perfectly happy with requiring Big Iron Unix or an esoteric Lisp Machine to use the software - they just wanted to have the ability to customize and distribute fixes and enhancements to it.
> LLMs - to date - seem to require massive capital expenditures to have the highest quality ones
There are near-SOTA LLMs available under permissive licenses. Even running them doesn't require prohibitive expenses on hardware unless you insist on realtime use.
How close are we to good enough and who's working on that? I would be interested in supporting that work; to my mind, many of the real objections to LLMs are diminished if we can make them small and cheap enough to run in the home (and, perhaps, trained with distributed shared resources, although the training problem is the harder one).
Is massive capital expenditure not also required to enforce the GPL? If some company steals your GPLed code and doesn't follow the license, you will have to sue them and somebody will have to pay the lawyers.
> Is massive capital expenditure not also required to enforce the GPL?
It's nowhere near the order of magnitude of the kind of spending they're sinking into LLMs. The FSF and other groups were reasonably successful at enforcing the GPL, operating on a budget thousands of times smaller than that of AI companies.
Right, but LLM companies are building frontier models with frontier talent while trying to soak up demand with a loss leader strategy, on top of an historic infrastructure build-out.
Being able to cost-efficiently run frontier models is, I think, not a high-priced endeavor for an org (compared to an individual).
IMO the proposition is a little fishy, but it's not totally without merit and IMO deserves investigation. If we are all worried about our jobs, even via building custom for-sale software, there is likely something there that may obviate the need at least for end user applications. Again, I'm deeply skeptical, but it is interesting.
> Being able to cost-efficiently run frontier models is, I think, not a high-priced endeavor for an org
Running a proprietary model would make you subject to whatever ToS the LLM companies choose on a particular day, including what you can produce with it, which circles back to the raison d'etre for the GPL and GNU.
Until all software copyright is dead and buried, there is no need for copyleft to change tack. Otherwise the rising tide may rise high enough to drown the GPL, but not proprietary software.
Open source is easier to counterfeit/license-launder/re-implement using LLMs because source code is much lower-hanging fruit, and is understood by more people than closed-source assembly.
> There's also an ethical/moral question that these things have been trained on millions of hours of people's volunteer work and the benefits of that are going to accrue to the mega corporations.
This was already the case and it just got worse, not better.
At a certain point, I think we had reached a kind of equilibrium where some corporations were decent open source citizens. They understood that they could open source things like infrastructure or libraries and keep their 'crown jewels' closed. And while Stallman types might not have been happy with that, it seemed to work out for people.
Now they've just hoovered up all the free stuff into machines that can mix it up enough to spit it out in a way that doesn't even require attribution, and you have to pay to use their machine.
AI essentially gatekeeps all of open source for companies to pluck from to their hearts' content. And individual contributors using these tools and freely mixing the output with their own - usually minor - contributions are another step of whitewashing, because they're definitely not going to own up to writing only 5% of the stuff they got paid for.
Before we had RedHat and Ubuntu, who at least were contributing back, now we have Microsoft, Anthropic and OpenAI who are racing to lock the barn door around their new captive sheep. It's just a massive IP laundromat.
Copyleft is a mirror of copyright, not a way to fight copyright. It grants rights to the consumer where copyright grants rights to the creator. Importantly, it gives the end-user the right to modify the software running on their devices.
Unfortunately, there are cases where you simply can't just "re-implement" something. E.g., because doing so requires access to restricted tools, keys, or proprietary specifications.
"So, I looked for a way to stop that from happening. The method I came up with is called “copyleft.” It's called copyleft because it's sort of like taking copyright and flipping it over. [Laughter] Legally, copyleft works based on copyright. We use the existing copyright law, but we use it to achieve a very different goal."
"very different goal" isn't the same as "fundamentally destroying copyright"
the very different goal includes keeping public code public, making sure it is properly attributed, preventing companies from just "seizing" it, motivating others to make their code public too, etc.
and even if his goals were not like that, it wouldn't make a difference, as this is what many people try to achieve by using such licenses
this kind of AI usage is very much not in line with these goals,
and in general, software cloning becoming way cheaper isn't sufficient to fix many of the issues the FOSS movement tried to fix, especially not when looking at the current ecosystem most people are interacting with (i.e. phones)
---
("seizing"): As in the typical MS embrace, extend and extinguish strategy of first embracing the code, then giving it proprietary but available extensions/changes/bug fixes/security patches, and then making them no longer available if you don't pay them/play by their rules.
---
Though in the end, using AI as a "fancy complicated" photocopier for code removes copyright just as much as using a photocopier for code would. It doesn't matter if you use the photocopier blindfolded and never looked at the thing you copied.
That’s not a rebuttal of the OP’s point. None of that says anything about fighting copyright. It literally says he flipped it, which is what the OP said when they said it’s a mirror.
> AI is eroding copyright, so there may no longer be a need for the GPL. GNU should stop and rethink its stance, chuck away the GPL as the main tool to fight evil software corporations and embrace LLM as the main weapon.
Is this LLM thing freely available or is it owned and controlled by these companies? Are we going to rent the tools to fight "evil software corporations"?
There already are LLMs with open weights that are better at code than state of the art closed source models from a year ago. For now, most people may have to rent the hardware to run those models, since it's too expensive for most people to own something that can run inference on one trillion parameters, but I wouldn't consider LLMs to be controlled by "evil software corporations" at this point.
With the release of GLM-5, I would say that they are pretty much almost as good. Basically 90% as good as Opus 4.6 on most tasks for 20% of inference cost, and open weights.
I agree with almost all of that, except the part about GNU changing their stance. I think GNU should stay true and consistent, if for no other reason than to not make many of their supporters who aren't on board with AI feel betrayed and have GNUs legacy soured. If the cause of LLMs conquering proprietary software needs an organization to champion it, let that be a new organization, not GNU.
Its purpose is "if you run the software you should be able to inspect and modify that software, and to share those modifications with your peers", not to explicitly resist copyright. Yes, copyright is bad in that it often prevents one from doing that, but it is not the purpose of the GPL to dismantle copyright.
Reducing it to "well you can clone the proprietary software you're forced to use by LLM" is really missing the soul of the GPL.
Just because something is copyleft doesn't mean the person who gave you the binary you're using has to supply you with the code they used to build it. That's what the GPL does.
> we'll see that it was an attempt to fight copyrights with copyrights
it's not that simple
yes, the GPL's origins have the idea of "everyone should be able to use"
but it is also about attribution to the original author
and making sure people can't just de-facto "seize public goods"
this kind of AI usage is removing attribution and is often seizing public goods in a way far worse than most companies which just ignored the license did
so today there is more need than ever in the last few decades for GPL-like licenses
That's naive. Copyright doesn't just apply to software. There already have been countless lawsuits about copying music long before the term "open source" was invented. No, changing the lyrics a bit doesn't circumvent copyright. Nor does translating a Stephen King novel to German and switching the names of the places and characters.
A court ordered the first Nosferatu movie to be destroyed because it had too many similarities to Dracula. Despite the fact that the movie makes rather large deviations from the original.
If Claude was indeed asked to reimplement the existing codebase, just in Rust and a bit optimized, that could well be a copyright violation. Just like rephrasing A Song of Ice and Fire a bit, and switching to a different language, doesn't remove its copyright.
> Just like rephrasing A Song of Ice and Fire a bit, and switching to a different language, doesn't remove its copyright.
There is some precedent for this, e.g. Alchemised is a recent best seller that had just enough changed from its Harry Potter fan fiction source in order to avoid copyright infringement: https://en.wikipedia.org/wiki/Alchemised
(I avoided the term “remove copyright” here because the new work is still under copyright, just not Harry Potter - related copyright.)
Claude was asked to implement a public API, not an entire codebase. The definition of a public API is largely functional; even in an unusually complex case like the Java standard facilities (which are unusually creative even in the structure and organization of the API itself) the reimplementation by Google was found to be fair use.
> Claude was asked to implement a public API, not an entire codebase.
Allegedly. There have been several people who doubted this story. So how to find out who is right? Well, just let Claude compare the sources. Coincidentally, Claude Opus 4.6 doesn't just score 75.6% on SWE-bench Verified but also 90.2% on BigLaw Bench.
It's like our copyright lawyer is conveniently also a developer. And possibly identical to the AI that carried out the rewrite/reimplementation in question in the first place.
So not only are we moving goalposts here, but we've decided the GNU team should join the other team? I don't understand how GNU would see mass model LLM training as anything but the most flagrant violations of their ethos. LLM labs, in their view, would be among the most evil software corporations to have ever existed.
While I personally agree with you, Richard Stallman (the creator of the GPL) does not. He has always advocated in favor of strong copyright protection, because the foundation of the GPL is the monopoly power granted by copyright. The problem that the GPL is intended to solve is proprietary software.
Generative models (AI) are not really eroding copyright. They are calling its bluff. The very notion of intellectual property depends on a property line: some arbitrary boundary where the property begins and ends. Generative models blur that line, making it impractical to distinguish which property belongs to whom.
Ironically, these models are made by giant monopolistic corporations whose wealth is quite literally a market valuation (stock price) of their copyrights! If generative models ever become good enough to reimplement CUDA, what value will NVIDIA have left?
The reality is that generative models are nowhere near good enough to actually call the bluff. Copyright is still the winning hand, and that is likely to continue, particularly while IP holders are the primary authors of law.
---
This whole situation is missing the forest for the trees. Intellectual Property is bullshit. A system predicated on monopoly power can only result in consolidated wealth driving the consolidation of power; which is precisely what has happened. The words "starving artist" ring every bit as familiar today as any time in history. Copyright has utterly failed the very goals it was explicitly written with.
It isn't the GPL that needs changing. So long as a system of copyright rules the land, copyleft is the best way to participate. What we really need is a cohesive political movement against monopoly power; one that isn't conveniently ignorant of copyright as its most significant source.
> Blanchard's account is that he never looked at the existing source code directly. He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch
This feels sort of like saying "I just blindly threw paint at that canvas on the wall and it came out in the shape of Mickey Mouse, and so it can't be copyright infringement because it was created without the use of my knowledge of Mickey Mouse"
Blanchard is, of course, familiar with the source code, he's been its maintainer for years. The premise is that he prompted Claude to reimplement it, without using his own knowledge of it to direct or steer.
> Blanchard is, of course, familiar with the source code, he's been its maintainer for years.
I would argue it's irrelevant if they looked or didn't look at the code. As well as whether he was or wasn't familiar with it.
What matters is that they fed the original code into a tool which they set up to make a copy of it. How that tool works doesn't really matter. Neither does it make a difference if you obfuscate that it's a copy.
If I blindfold myself when making copies of books with a book scanner + printer I'm still engaging in copyright infringement.
If AI is a tool, that should hold.
If it isn't "just" a tool, then it did engage in copyright infringement (as it created the new output side by side with the original), in the same way an employee might do so at the command of their boss. That still makes the boss/company liable for copyright infringement. And in general, just because you weren't the one who created an infringing product doesn't mean you aren't more or less as liable for distributing it as if you had created it yourself.
>that they fed the original code into a tool which they set up to make a copy of it
Well, no. They fed the spec (test cases, etc) into a tool which made a new program matching the spec. This is not a copy of the original code.
But also this feels like arguing over the color of the iceberg while the titanic sinks. If you have a tool that can make code to spec, what is the value in source code anymore? Even if your app is closed-source, you can just tell claude to write new code that does the same thing.
Everyone writes as if he just fed the spec and tests to Claude Code. Ignoring for now that the tests are under LGPL as well, the commit history shows that this has been done with two weeks of steering Claude Code towards the desired output. At every one of these interactions, the maintainer used his deep knowledge of the chardet codebase to steer Claude.
Blanchard fed the spec to the tool, and Anthropic fed the code to the tool, so Blanchard didn't do anything wrong, and Anthropic didn't do anything wrong. Nothing to see here.
Copyright protects even very abstract aspects of human creative expression, not just the specific form in which it is originally expressed. If you translate a book into another language, or turn it into a silent movie, none of the actual text may survive, but the story itself remains covered by the original copyright.
So when you clone the behavior of a program like chardet without referencing the original source code except by executing it to make sure your clone produces exactly the same output, you may still be infringing its copyright if that output reflects creative choices made in the design of chardet that aren't fully determined by the functional purpose of the program.
What does derivative mean here? Because IMO it means that the existing work was used as input. So if you used an LLM and it was trained on the existing work, that's a derivative work. If you rot13 encode something as input, so you can't personally read it, and then a device applies rot13 to it again and outputs the result, that's a derivative work.
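To make the rot13 point concrete, here is a minimal sketch in Python. The sample string is made up for illustration; the only library call is the standard `codecs.encode` with the `rot_13` codec. It shows that applying rot13 twice returns the exact input, so the intermediate unreadable form changes nothing about what comes out the other end.

```python
import codecs


def rot13(text: str) -> str:
    """Rotate each ASCII letter by 13 places; other characters pass through."""
    return codecs.encode(text, "rot_13")


# Hypothetical stand-in for "the existing work".
original = "def detect(data): ..."

scrambled = rot13(original)   # unreadable to a casual human reader
restored = rot13(scrambled)   # the "device" applies rot13 a second time

print(scrambled)
print(restored == original)
```

Whether that makes the output "derivative" in the legal sense is the commenter's argument, not something the code settles; it only shows the round trip is lossless.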
In order for it to be creatively derivative you would need to copy the structure, logic, organization, and sequence of operations not just reimplement the functionality. It is pretty clear in this case that wasn't done.
As a cynical person I assume all the frontier LLMs were trained on datasets that include every open source project, but as a thought experiment, if an LLM was trained on a dataset that included every open source project _except_ chardet, do you think said LLM would still be able to easily implement something very similar?
Of course, the problem with this interpretation is that all modern LLMs are derivatives from huge amounts of text under completely different licenses, including "All rights reserved", and therefore can not be used for any purpose.
I'm not sure how you square the circle of "it's alright to use the LLM to write code, unless the code is a rewrite of an open source project to change its license".
> Of course, the problem with this interpretation is that all modern LLMs are derivatives from huge amounts of text under completely different licenses, including "All rights reserved", and therefore can not be used for any purpose.
> I'm not sure how you square the circle of "it's alright to use the LLM to write code
You seem like you're on the cusp of stating the obvious correct conclusion: it isn't.
LLMs do not encode nor encrypt their training data. The fact they can recite training data is a defect not a default. You can understand this more simply by calculating the model size as an inverse of a fantasy compression algorithm that is 50% better than SOTA. You'll find you'd still be missing 80-90% of the training data even if it were as much of a stochastic parrot as you may be implying. The outputs of AI are not derivative just because they saw training data including the original library.
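The back-of-the-envelope calculation described above could look roughly like this. Every number below is a hypothetical placeholder chosen only to show the shape of the arithmetic, not a measurement of any real model, corpus, or compressor.

```python
def max_recoverable_fraction(model_bytes: float,
                             corpus_bytes: float,
                             compressed_ratio: float) -> float:
    """Upper bound on the share of a corpus a model could hold if it were
    nothing but a compressed archive of that corpus.

    compressed_ratio = compressed size / original size (smaller is better).
    """
    compressed_corpus_bytes = corpus_bytes * compressed_ratio
    return min(1.0, model_bytes / compressed_corpus_bytes)


# All values are ASSUMED for illustration only.
corpus_bytes = 10 * 10**12         # hypothetical 10 TB of training text
model_bytes = 140 * 10**9          # hypothetical 140 GB of weights
sota_ratio = 0.20                  # hypothetical best real-world text compression
fantasy_ratio = sota_ratio * 0.5   # "50% better than SOTA"

fraction = max_recoverable_fraction(model_bytes, corpus_bytes, fantasy_ratio)
print(f"at most {fraction:.0%} of the corpus could fit")
```

With different assumed sizes the bound moves accordingly; the point of the sketch is only that the bound is the model size divided by the compressed corpus size.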
Then onto prompting: 'He fed only the API and (his) test suite to Claude'
This is Google v Oracle all over again - are APIs copyrightable?
> This is Google v Oracle all over again - are APIs copyrightable?
Yes this is the best way to ask the question. If I take a public facing API and reimplement everything, whether it's by human or machine, it should be sufficient. After all, that's what Google did, and it's not like their engineers never read a single line of the Java source code. Even in "clean room" implementations, a human might still have remembered or recalled a previous implementation of some function they had encountered before.
I find the "compression" argument not very strong, both because copyright still applies to (very) lossy codecs (e.g. your 16 kbps Opus file of Thriller infringes, even if the original 192 kHz/32-bit WAV file was 12,000 kbps), and because copyright still applies to transformed derivative works (a tiny MIDI file of Thriller might still be enough for Jackson's label to get you).
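For reference, the bitrate figure in that example can be checked with a few lines of Python. The channel count is an assumption (stereo); the comment does not state it.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


# 192 kHz / 32-bit, ASSUMING two channels (stereo).
wav_kbps = pcm_bitrate_kbps(192_000, 32, 2)
opus_kbps = 16

# How many times smaller the lossy stream is than the uncompressed one.
ratio = wav_kbps / opus_kbps

print(wav_kbps, ratio)
```

Under the stereo assumption the uncompressed stream is about 12,000 kbps, as the comment says, and the lossy file carries well under 1% of those bits while still being recognizably the same recording.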
If you pirate a movie and reencode it, does that apply as well? You can still watch the movie and it is “obviously” the same movie, even though the bytes are completely different. Here you can use the program and it is, to the user, also the same.
So, let's say that rather than actually touching any copyrighted material, a human merely tells an AI about how to go onto the internet and find copyrighted material, download it, and ingest it for training. The AI, fully autonomously, does so, and after training itself on the material deletes it so no human ever downloads, consumes, or shares it.
If we are saying AI is "more than a tool", which seems to be the way courts are leaning since they've ruled AI output without direct human involvement is not copyrightable[0], then the above seems like it would be entirely legal.
Someone would likely get prosecuted if they instructed an AI agent to run, say, a pump and dump scheme...
Even if the final output doesn't have copyright protection it might still be a copyright violation. I think it could be reasonable to have a work that itself violates copyright when distributed even if it does not have copyright itself.
I just don't see how it's relevant whether he did look or didn't. In my opinion, it's not just legally valid to make a re-implementation of something if you've seen the code as long as it doesn't copy expressive elements. I think it's also ethically fine as well to use source code as a reference for re-implementing something as long as it doesn't turn into an exact translation.
Right. The alternative is that we reward Dan for his 14 years of volunteer maintenance of a project... by banning him from working on anything similar under a different license for the rest of his life.
It's actually not legally fine, or at least it's extremely dangerous. Projects that re-implement APIs presented by extremely litigious companies specifically do not allow people who, for instance, have seen the proprietary source code to then work on the project.
I don't think fear of legal action makes it illegal.
If I know it is legal to make a turn at a red light, and I know a court will uphold that I was in the right, but a police officer will fine me regardless and I would need to pursue some legal remedy, then I'm unlikely to do it, regardless of whether it is legal, because it is expensive, if not in money then in time.
Copyright lawsuits are notoriously expensive and long, so even if a court would eventually deem it fine, why take the chance?
That's my point. It's dangerous and there are sharks in the water. That sounds like you're not going to have a good time if you do the described approach to someone who might assert you're infringing.
Ignoring the legal or ethical concerns. Let’s say we live in a world where the cost of copying code is so close to zero that it’s indistinguishable from a world without copyright.
Anything you put out can and will be used by whatever giant company wants to use it with no attribution whatsoever.
Doesn’t that massively reduce the incentive to release the source of anything ever?
No, because (most) people don't work on OSS for vanity, they do it to help other people, whether it's individuals or groups of individuals, i.e. corporations.
It's the same question as, if an AI can generate "art", or photographers can capture a scene better than any (realistic) painter, then will people still create art? Obviously yes, and we see it of course after Stable Diffusion was released three years ago, people are still creating.
I don’t know what a world without copyright does to corporate sponsored open source. It certainly reduces it because there are many corporate sponsored projects that monetize through dual licensing. My guess is in a world where you can’t even guarantee attribution, it’s much harder to convince your boss to let you open source a project in the first place.
So ignoring people who are being paid by corporations directly to work on open source, in my experience the vast majority of contributors expect to be able to monetize their work eventually in a way that requires attribution. And out of the small number who don’t expect a monetary return of any kind, a still smaller number don’t expect recognition.
If this weren’t the case you’d see a much larger amount of anonymous contributions. There are people who anonymously donate to charity. The vast majority want some kind of recognition.
Obviously we still see art, but if you greatly reduce the monetary benefit of producing art, you’ll see a lot less of it. This is especially true of non-trivial open source software that, unlike static artwork, requires continual maintenance.
Yes, and it reduces the incentives to release binaries too. Such a world will be populated by almost entirely SaaS, which can still compete on freedom.
>This feels sort of like saying "I just blindly threw paint at that canvas on the wall and it came out in the shape of Mickey Mouse, and so it can't be copyright infringement because it was created without the use of my knowledge of Mickey Mouse"
IANAL, but that analogy wouldn't work because Mickey Mouse is a trademark, so it doesn't matter how it is created.
Oracle had its day in court with Google over the Java APIs. Reimplementing APIs can be done without copyright infringement, but Oracle must have tried to find real infringement during discovery.
In this case, we could theoretically prove that the new chardet is a clean reimplementation. Blanchard can provide all of the prompts necessary to re-implement again, and for the cost of the tokens anyone can reproduce the results.
If you only stick to the API and ignore the implementation, it is not Mickey Mouse any more but a rodent. If it was just a clone it wouldn't be 50x as fast. Nevertheless, APIs apparently can be copyrightable. I generally disagree with this; it's how PC compatibles took off, giving consumers better options.
The article is poorly written. Blanchard was a chardet maintainer for years. Of course he had looked at its code!
What he claimed, and what was interesting, was that Claude didn't look at the code, only the API and the test suite. The new implementation is all Claude. And the implementation is different enough to be considered original: completely different structure and design, and hey, a 48x improvement in performance! It's just API-compatible with the original, which, as per the Google v. Oracle 2021 decision, is to be considered fair use.
What if we said that generative AI output is simply not copyrightable. Anything an AI spits out would automatically be public domain, except in cases where the output directly infringes the rights of an existing work.
This would make it so relicensing with AI rewrites is essentially impossible unless your goal is to transition the work to be truly public domain.
I think this also helps somewhat with the ethical quandary of these models being trained on public data while contributing nothing of value back to the public, and disincentivize the production of slop for profit.
> No Copyright Protection for AI-Assisted Creations: Thaler v. Perlmutter
> A recent key judicial development on this topic occurred when the U.S. Supreme Court declined to review the case of Thaler v. Perlmutter on March 2, 2026, effectively upholding lower court rulings that AI-generated works lacking human authorship are not eligible for copyright protection under U.S. law
> > A recent key judicial development on this topic occurred when the U.S. Supreme Court declined to review the case of Thaler v. Perlmutter on March 2, 2026, effectively upholding lower court rulings that AI-generated works lacking human authorship are not eligible for copyright protection under U.S. law
Was this an AI summary? Those words were not in the article.
The courts said Thaler could not have copyright because he refused to list himself as an author.
In the corporate world, we've started using reimplementation as a way to access tooling that security won't authorize.
Sec has a deny by default policy. Eng has a use-more-AI policy. Any code written in-house is accepted by default. You can see where this is going.
We've been using AI to reimplement tooling that security won't approve. The incentives conspired in the worst outcome, yet here we are. If you want a different outcome, you need to create different incentives.
If Blanchard is claiming not to have been substantively involved in the creation of the new implementation of chardet (i.e. "Claude did it"), then the new implementation is machine generated, and in the USA cannot be copyrighted and thus cannot be licensed.
If he is claiming to have been somehow substantively "enough" involved to make the code copyrightable, then his own familiarity with the previous LGPL implementation makes the new one almost certainly a derivative of the original.
>then his own familiarity with the previous LGPL implementation makes the new one almost certainly a derivative of the original.
The "clean room rewrite" is just an extreme way to have a bulletproof shield against litigation. Not doing it that way doesn't automatically make all new code he writes derivative solely because he saw how the code worked previously.
If the clean room re-write was done entirely by Claude, then the result cannot be copyrighted in the USA, and thus there is no license at all.
And if he was in fact more involved (which he appears to deny), then it's a bit weak to say that someone with huge familiarity with chardet could choose to reimplement chardet without the result being derivative.
There's a difference between "I've read some LGPL code once, maybe I could do something similar" and "I've been reading this LGPL code for 12 years and now I'm going to do exactly the same thing".
This is only worth arguing about because software has value. Putting this in context of a world where the cost of writing code is trending to 0, there are two obvious futures:
1. The cost continues to trend to 0, and _all_ software loses value and becomes immediately replaceable. In this world, proprietary, copyleft and permissive licenses do not matter, as I can simply have my AI reimplement whatever I want and not distribute it at all.
2. The coding cost reduction is all some temporary mirage, to be ended soon by drying VC money/rising inference costs, regulatory barriers, etc. In that world we should be reimplementing everything we can as copyleft while the inferencing is good.
There’s another option. The cost of copying existing software trends to 0, but the cost of writing new software stays far enough above 0 that it is still relatively expensive.
There was a recent ruling that LLM output is inherently public domain (presumably unless it infringes some existing copyright). In which case it's not possible to use them to "reimplement everything we can as copyleft".
it's more complicated, the ruling was that AI can't be an author and the thing in question is (de-facto) public domain because it has no author, given the "dev"'s claim that it was fully built by AI
but AI assisted code has an author, and claiming it's AI assisted even if it is fully AI-built is trivial (if you don't make it public that you didn't do anything)
also some countries have laws which treat it like a tool in the sense that the one who used it is the author by default AFAIK
There will always be cost though. Even if perfect code is getting one-shotted out, that is constantly maintained and adapted to changing conditions and technology, it simply can't stay at 0 forever because one day the power is surely going to go out!
More and more I am drawn to these kinds of ideas lately, perhaps as a kind of ethical sidestep, but still:
It's not going to solve any general issue here, but the one thing these freaks need that can't be generated by their models is energy, tons of it. So, the one thing I can do as an individual and in my (digital) community is work to be, in a word, self-sustainable. And depending on my company I guess, if I was a CEO I would hope I was wise enough to be thinking on the same lines.
Everyone is making beautiful mountains from paper and wire. I will just be happy to make a small dollhouse of stone, I think it will be worth it. How can we not see at least some small level of hubris otherwise?
The article is proceeding from the premise that a reimplementation is legal (but evil). To help my understanding of your comment, do you mean:
1. An LLM recreating a piece of software violates its copyright and is illegal, in which case LLM output can never be legally used because someone somewhere probably has a copyright on some portion of any software that an LLM could write.
2. You read my example as "copying a project without distributing it", vs. "having an LLM write the same functionality just for me"
Surprised they don't mention Google LLC v. Oracle America, Inc. Seems a bit myopic to condone the general legality while arguing "you can only use it how I like it".
It also doesn't talk about the far more interesting philosophical question. Does what Blanchard did cover ALL implementations from Claude? What if anyone did exactly what he did, feeding it the test cases and saying "re-implement from scratch"? Ostensibly one would expect the results to be largely similar (technically, under the right conditions, deterministically similar).
could you then fork the project under your own name and a commercial license? when you use an LLM like this, to basically do what anyone else could ask it to do how do you attach any license to it? Is it first come first serve?
If an agent is acting mostly on its own it feels like if you found a copy of Harry Potter in the fictional library of Babel, you didn't write it, just found it amongst the infinite library, but if you found it first could you block everyone else that stumbles on a near-identical copy elsewhere in the library? or does each found copy represent a "Re-implementation" that could be individually copyrighted?
It seems that this chap didn't go and implement a new library, he reimplemented an existing one and became sole-controller of it. i.e. he seems to have taken its reputation, brand whatever you call it away from the contributors and entirely to himself. Their work of establishing it as a well known solution is no longer recognised.
So of course we feel that something wrong has happened even if it's not easy to put one's finger on it.
It should be noted that the Rust community is also guilty of something similar. That is, porting old GPL programs, typically written in C, to Rust and relicensing them as MIT.
Well, the license change sounds pretty strange, but to be honest if I were to use this software I would use it without adhering to the MIT. It's machine-created content which is not, in general, copyrightable. You can assert whatever license you want on such content, but I am not going to adhere to it. For example, I declare you may use the following under the Elastic License
without discussing copyright, I don't believe any of this is copied. Which I think should be the argument that actually matters.
I downloaded both 6.0 and 7.0 and based on only a light comparison of a few key files, nothing would suggest to me that 7.0 was copied from 6.0, especially for a 41x faster implementation.
It is a lot more organized and readable in my amateur opinion, and the code is about 1/10th the size.
There's a Japanese version of that page, written in classical text writing direction, in columns. Which is cool. Makes me wonder, though - how readable is it with so many English loanwords which should be rotated sideways to fit into columns?
Total digression but yeah, that layout is stupid and the way those words are dropped in using Romaji makes no sense. That's not how Japanese people lay out pages on the web. In fact I don't think I've ever seen a Japanese web page laid out like a book like this, and in general I'd expect the English proper nouns and words that don't have obvious translations to get transliterated into Katakana. Smells like automatic conversion added by someone not really familiar with common practices for presenting Japanese on the web.
He also has a Korean vertical layout that lays out Latin-character words the same way. Is this common in Korea when vertical layout is used? The author seems to be Korean.
You can't put a copyright and MIT license on something you generated with AI. It is derived from the work of many unknown, uncredited authors.
Think about it; the license says that copies of the work must be reproduced with the copyright notice and licensing clauses intact. Why would anyone obey that, knowing it came from AI?
Countless instances of such licenses were ignored in the training data.
When learning is sufficiently atomized and recombined, creations cease to be "derived from" in a legal sense.
A lego sculpture is copyrighted. Lego blocks are not. The threshold between blocks and sculpture is not well-defined, but if an AI isn't prompted specifically to attempt to mimic an existing work, its output will be safely on the non-copyrighted side of things.
A derivative work is separately copyrightable, but redistribution needs permission from the original author too. Since that usually won't be granted or would be uneconomical, the derivative work can't usually be redistributed.
AI-produced material is inherently not copyrightable, but not because it's a derivative work.
Token prediction is a form of "learning" that is reinforced by the goal of reproducing the correct next token of the work, rather than acquiring ideas and concepts. For instance, given the prefix "Four score and seven years", the weights are adjusted until "ago" is correctly predicted, which is a fancy way of saying that it was stored in the model in a lossy way. The model "learned" that "ago" follows "four score and seven years" exactly the way your hard drive "learns" the audio and video frames of a movie when you download a .mp4 file.
I dispute the idea that token sequences reproduced from the model are not derived works.
I predict, no pun intended, that a time is coming when the idea that it's not a derived work will be challenged in mainstream law.
The slop merchants are getting a free ride for the time being.
This article is setting up a bit of a moving target. Legal vs legitimate is at least only a single vague question to be defined, but then the target changes to “socially legitimate”, defined only indirectly by way of example, like aggressive tax avoidance as “antisocial”. While I tend to agree with that characterization, my agreement is predicated on a layering of other principles.
The fundamental problem is that once you take something outside the realm of law, and of rule of law in its many facets as the legitimizing principle, you have to go a whole lot further to be coherent and consistent.
You can’t just leave things floating in a few ambiguous things you don’t like and feel “off” to you in some way- not if you’re trying to bring some clarity to your own thoughts, much less others. You don’t have to land on a conclusion either. By all means chew over things, but once you try to settle, things fall apart if you haven’t done the harder work of replacing the framework of law with that of another conceptual structure.
You need to at least be asking “to what ends? What purpose is served by the rule?” Otherwise you’re stuck in a place where half the time you end up arguing backwards, in ways that put purpose in the service of rules: the rule is maintained with justifications pulled in from ever further afield whenever it is questioned and edge cases are reached. If you’re asking, essentially, “is the spirit of the rule still there?”, you’ve got to stop and fill in what that spirit is, or you, or people who want to control you or have an agenda, will sweep in with their own language and fill the void to their own ends.
> If source code can now be generated from a specification, the specification is where the essential intellectual content of a GPL project resides. Blanchard's own claim—that he worked only from the test suite and API without reading the source—is, paradoxically, an argument for protecting that test suite and API specification under copyleft terms.
This is an interesting reversal in itself. If you make the specification protected under copyright, then the whole practice of clean room implementations is invalid.
Broadly speaking, the “freedom of users” is often protected by competition from competing alternatives. The GNU command line tools were replacements for system utilities. Linux was a replacement for other Unix kernels. People chose to install them instead of proprietary alternatives. Was it due to ideology or lower cost or more features? All of the above. Different users have different motivations.
Copyleft could be seen as an attempt to give Free Software an edge in this competition for users, to counter the increased resources that proprietary systems can often draw on. I think success has been mixed. Sure, Linux won on the server. Open source won for libraries downloaded by language-specific package managers. But there’s a long tail of GPL apps that are not really all that appealing, compared to all the proprietary apps available from app stores.
But if reimplementing software is easy, there’s just going to be a lot more competition from both proprietary and open source software. Software that you can download for free that has better features and is more user-friendly is going to have an advantage.
With coding agents, it’s likely that you’ll be able to modify apps to your own needs more easily, too. Perhaps plugin systems and an AI that can write plugins for you will become the norm?
"Antirez closes his careful legal analysis as though it settles the matter. Ronacher acknowledges that “there is an obvious moral question here, but that isn't necessarily what I'm interested in.” Both pieces treat legal permissibility as a proxy for social legitimacy. "
This whole article is just complaining that other people didn't have the discussion he wanted.
Ronacher even acknowledged that it's a different discussion, and not one they were trying to have at the moment.
If you want to have it, have it. Don't blast others for not having it for you.
Having this discussion involves blasting others for not considering it. Consider the rest of the paragraph you quoted:
> But law only says what conduct it will not prevent—it does not certify that conduct as right. Aggressive tax minimization that never crosses into illegality may still be widely regarded as antisocial. A pharmaceutical company that legally acquires a patent on a long-generic drug and raises the price a hundredfold has not done something legal and therefore fine. Legality is a necessary condition; it is not a sufficient one.
It's clear that we're entering a new era of copyright _expectations_ (whether we get new _legislation_ is different), but for now realise this: the people like me who like copyleft can do this too. We can take software we like, point an agent at it, and tell it to make a new version with the AGPL3.0-or-later badge on the front.
no, it isn't. The point of the GPL is to grant users of the software four basic freedoms (run, study, modify and redistribute). There's no restriction to distribution per se, other than disallowing the removal of these freedoms to other users.
IMHO, the API and Test Suite, particularly the latter, define the contract of the functional definition of the software. It almost doesn't matter what that definition looks like so long as it conforms to the contract.
There was an issue where Google did something similar with the JVM, and ultimately it came down to whether or not Oracle owned the copyright to the header files containing the API. It went all the way to the US Supreme Court, and they ruled in Google's favour, finding that the API wasn't the implementation, and that the amount of shared code was so minimal as to be irrelevant.
They didn't anticipate that in less than half a decade we'd have technology that could _rapidly_ reimplement software given a strong functional definition and contract enforcing test suite.
Why are people even having problems with sharing their changes to begin with? Just publishing it somewhere does not seem too expensive. The risk of accidentally including stuff that is not supposed to become public? Or are people regularly completely changing codebases and do not want to make the effort freely available, maybe especially to competitors? I would have assumed that the common case is adding a missing feature here, tweaking something there, if you turn the entire thing on its head, why not have your own alternative solution from scratch?
i've been following this for a while.. and the trend for copyright (of any form - books code pictures music whatever) being laundered by reinventing the "same" thing in-some-way.. is kind-of clear.
But what happens with the new things? Has the era of software-making (or creating things at large) finished, and from now on everything will be re-(gurgitated|implemented|polished) old stuff?
Or all goes back to proprietary everything.. Babylon-tower style, noone talks to noone?
edit: another view - is open-source from now on only for resume-building? "see-what-i've-built" style
Not a lawyer, but my understanding is: In theory, copyright only protects the creative expression of source code; this is the point of the "clean room" dance, that you're keeping only the functional behavior (not protected by copyright). Patents are, of course, an entirely different can of worms. So using an LLM to strip all of the "creative expression" out of source code but create the same functionality feels like it could be equivalent enough.
I like the article's point of legal vs. legitimate here, though; copyright is actually something of a strange animal to use to protect source code, it was just the most convenient pre-existing framework to shove it in.
which is the actual relevant part: they didn't do that dance AFAIK
AI is a tool, they set it up to make a non-verbatim copy of a program.
Then they fed it the original software (AFAIK).
Which makes it a side by side copy, as in the original source was used as reference to create the new program. That tends to be seen as a derived work even if very different.
IMHO They would have to:
1. create a specification of the software _without looking at the source code_, i.e. by behavior observation (and an interface description). I.e. you give the AI access to running the program, but not to looking into the insides of it. I really don't think they did it as even with AI it's a huge pain as you normally can't just brute force all combinations of inputs and instead need to have a scientific model=>test=>refine loop (which AI can do, but can take long and get stuck, so you want it human assisted, and the human can't have inside knowledge about the program).
2. then generate a new program from specification, And only from it. No git history, no original source code access, no program access, no shared AI state or anything like that.
Also for the extra mile of legal risk avoidance do both human assisted and use unrelated 3rd parties without inside knowledge for both steps.
While this does majorly cut the cost of a clean room approach, it still isn't cost free. And it is still a legal minefield if done by a single person, especially if they have enough familiarity to potentially remember specific pieces of code verbatim.
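The two-step separation described above (derive a spec purely from observed behavior, then build and check a new program against that spec alone) can be sketched as follows. This is a toy illustration, not chardet's API or anyone's actual process: `reference_detect`, `candidate_detect`, `record_spec` and `conforms` are all hypothetical names, and the "detector" is deliberately trivial.

```python
from typing import Callable, Iterable

# A behavioral spec: observed (input, output) pairs, with no source code attached.
Spec = list[tuple[bytes, str]]


def record_spec(reference: Callable[[bytes], str], inputs: Iterable[bytes]) -> Spec:
    """Step 1: treat the reference as a black box and record only its behavior."""
    return [(data, reference(data)) for data in inputs]


def conforms(candidate: Callable[[bytes], str], spec: Spec) -> bool:
    """Step 2: check a new implementation against the recorded spec alone."""
    return all(candidate(data) == expected for data, expected in spec)


# Hypothetical stand-in for the original program (only ever executed, never read).
def reference_detect(data: bytes) -> str:
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        return "unknown"


# Hypothetical independent implementation written from the spec.
def candidate_detect(data: bytes) -> str:
    return "ascii" if all(byte < 128 for byte in data) else "unknown"


spec = record_spec(reference_detect, [b"hello", b"\xff\xfe", b""])
ok = conforms(candidate_detect, spec)
print(ok)
```

The sketch only shows the information flow. It says nothing about who may write the candidate or what they may have seen before, and that is where the legal risk discussed above actually sits.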
My understanding is they did do the dance. From the article: "He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch."
One could still make the argument that using the test suite was a critical contributing factor, but it is not a part of the resulting library. So in my uninformed opinion, it seems to me like the clean room argument does apply.
Well sure they didn't do the dance, but you don't have to do the dance. The reason to do it is that it's a good defense in a lawsuit. Like you say, all of this is a legal minefield.
So my understanding was that the original code was specifically not fed into Claude. But was almost certainly part of its training data, which complicates things, but if that's fair use then it's not relevant? If training's not fair use and taints the output, then new-chardet is a derivative of a lot of things, not just old-chardet...
This is all new legal ground. I'm not sure if anyone will go to court over chardet, though, but something that's an actual money-maker or an FSF flagship project like readline, on the other hand, well that's a lot more likely.
> When GNU reimplemented the UNIX userspace, the vector ran from proprietary to free. Stallman was using the limits of copyright law to turn proprietary software into free software. […] The vector in the chardet case runs the other way.
That’s just your subjective opinion, with which many other people would disagree. I bet Armin Ronacher would agree that an MIT licensed library is even freer than an LGPL licensed library. To them, the vector is running from free to freer.
If you decide to improve it in any way to fit your needs you can merely tell your own AI to re-implement it with your changes. Then it's proprietary to you.
I feel like the licenses that suffer the most aren't the GPL, but the ones like SSPL. If your code can be re-implemented easily and legally by AWS using an LLM, why risk publishing it?
It does feel like open source is about to change. My hunch is that commercial open source (beyond the consultation model) risks disappearing. Though I'd be happy to be proven wrong.
See also "A Declaration of the Independence of Cyberspace" (https://www.eff.org/cyberspace-independence), and what a goofy, naive, misguided disaster that early internet optimism turned into.
No, AI does not mean the end of either copyright or copyleft, it means that the laws need to catch up. And they should, and they will.
I don't think this part is correct: "If you distribute modified code, or offer it as a networked service, you must make the source available under the same terms."
One of the things that irks me about this whole thing is, if it’s so clean room and distinct, why make the changes to the existing project? Why not make an entirely new library?
The answer to that, I think, is that the authors wanted to squat an existing successful project and gain a platform from it. Hence we have news cycle discussing it.
Nobody cares about a new library using AI, but squash an existing one with this stuff, and you get attention. It’s the reputation, the GitHub stars, whatever
I mean, Blanchard was the longtime maintainer of chardet already, and had wanted to relicense it for years. So I think that complicates your picture of "squatting an existing successful project".
Honestly it's a weird test case for this sort of thing. I don't think you'd see an equivalent in most open source projects.
Imagine if the author has his way, and when we have AI write software, it becomes legally under the license of some other sufficiently similar piece of software. Which may or may not be proprietary. "I see you have generated a todo app very similar to Todoist. So they now own it." That does not seem like a good path either for open source software or for opening up the benefits of AI generated software.
That's a non-sequitur. chardet v7 is GPL-derived work (currently in clear violation of the GPL). If xe wanted it to be a different thing xe should've published as such. Simple as.
What if someone doesn't declare that it has been reimplemented using an LLM? Isn't it enough to simply declare that you have reimplemented the software without using an LLM? Good luck proving that in court...
One thing is certain, however: copyleft licenses will disappear: If I can't control the redistribution of my code (through a GPL or similar license), I choose to develop it in closed source.
If the model wasn't trained on copyleft, if he didn't use a copyleft test suite and if he wasn't the maintainer for years. Clearly the intent here is copyright infringement.
If you have software, your test suite should be your own test suite: you do dev with a test suite and then release as MIT without releasing one. Depending on the test suite, it may break clean room rules, especially for TDD codebases.
I think what is happening is the collapse of the “greater good”. Open source is dependent upon providing information for the greater good and general benefit of its readers. However, now that no one is reading anything, its purpose is the greater good of the most clever or most convincing or richest harvester.
> Ronacher notes this as an irony and moves on. But the irony cuts deeper than he lets on. Next.js is MIT licensed. Cloudflare's vinext did not violate any license—it did exactly what Ronacher calls a contribution to the culture of openness, applied to a permissively licensed codebase. Vercel's reaction had nothing to do with license infringement; it was purely competitive and territorial. The implicit position is: reimplementing GPL software as MIT is a victory for sharing, but having our own MIT software reimplemented by a competitor is cause for outrage. This is what the claim that permissive licensing is “more share-friendly” than copyleft looks like in practice. The spirit of sharing, it turns out, runs in one direction only: outward from oneself.
This argument makes no sense. Are they arguing that because Vercel, specifically, had this attitude, this is an attitude necessitated by AI, reimplementation, and those who are in favor of it towards more permissive licenses? That certainly doesn't seem to be an accurate way to summarize what antirez or Ronacher believe. In fact, under the legal and ethical frameworks (respectively) that those two put forward, Vercel has no right to claim that position and no way to enforce it, so it seems very strange to me to even assert that this sort of thing would be the practical result of AI reimplementations. This seems to just be pointing towards the hypocrisy of one particular company, and assuming that this would be the inevitable, universal attitude and result when there's no evidence to think so.
It's ironic, because antirez literally addresses this specific argument. They completely miss the fact that a lot of his blog post is not actually just about legal but also about ethical matters. Specifically, the idea he puts forward is that yes, corporations can do these kinds of rewrites now, but they always had the resources and manpower to do so anyway. What's different now is that individuals can do these kinds of rewrites when they never had the ability to do so before, and the vector of such a rewrite can be from permissive to copyleft, or even from decompiled proprietary code to permissive or copyleft. The fact that it hasn't gone that way so far is more a result of most people really hating copyleft and finding it annoying, and of it losing traction and developer mind share for decades, not of this tactic being unusable in that direction. I think that's actually one of the big points he's trying to make with his GNU comparison. It's not just that if it was legal for GNU to do it, then it's legal for you to do it with AI. And it's not even just the fundamental libertarian ethical axiom (that I agree with for the most part) that it should remain legal to do such a rewrite in either direction, because in terms of the fundamental axioms that we enforce with violence in our society, there should be a level playing field where we look at the action itself and not just whether we like or dislike the consequences. It's specifically that if GNU did it once with the ability to rewrite things, it can be done again, even in the same direction, now even more easily using AI.
> They completely miss the fact that a lot of his blog post is not actually just about legal but also about ethical matters.
Honestly I was confused about the summarization of my blog post into just a legal matter as well. I hope my blog post will be able to flash at least a short time in the HN front page so that the actual arguments it contains will get a bit more exposure.
I'm failing to see what in the quoted text you took to be about AI rewrites specifically? It just reads as a slightly catty aside about the social reaction of rewrites in general (by implying the one example is generalizable.)
Actually I think the last 20 years of the Internet demonstrates that copyright is more important than ever, because unless it's enforced, people with more capital than the copyright owner will simply steal creative works and profit from them.
The idea that "information wants to be free" was always a lie, meant to transfer value from creators to platform owners. The result of that has been disastrous, and it's long past time to push the pendulum in the other direction.
IMO the core idea of copyright isn't nonsense, but I do think the current implementation (70+ years after death) is egregiously overpowered. I've always thought the current laws were too deeply entrenched to ever change, but I'm tentatively optimistic AI will shock the system hard enough to trigger actual reform.
I think AI is very much eroding the legitimacy of copyright - at least for software, which has long been questioned since it's more like math than creative expression.
I think the industry will realize that it made a huge mistake by leaning on copyright for protection rather than on patents.
Probably a wiser approach is to consider different times require different measures (in general!).
I did not study in detail if copyright "has always been nonsense", but I do agree that nowadays some of the copyright regulations are nonsense (for example the very long duration of life + 70 years)
I think we're even going one step too far. AI itself is a gray area: how can they guarantee it was trained legally, or that what they're doing is even legal, and how can they assert that the input training data didn't contain any copyrighted data?
Google already spent billions of dollars and decades of lawyer hours proving it out as fair use. The legal challenges we see now are the dying convulsions of an already broken system of publishers and IP hoarders using every resource at their disposal to manipulate authors and creators and the public into thinking that there's any legitimacy or value underlying modern copyright law.
AI will destroy the current paradigm, completely and utterly, and there's nothing they can do to stop it. It's unclear if they can even slow it, and that's a good thing.
We will be forced to legislate a modern, digital oriented copyright system that's fair and compatible with AI. If producing any software becomes a matter of asking a machine to produce it - if things like AI native operating systems come about, where apps and media are generated on demand, with protocols as backbone, and each device is just generating its own scaffolding around the protocols - then nearly none of modern licensing, copyright, software patents, or IP conventions make any sense whatsoever.
You can't have horse and buggy traffic conventions for airplanes. We're moving in to a whole new paradigm, and maybe we can get legislation that actually benefits society and individuals, instead of propping up massive corporations and making lawyers rich.
Google has carved out some very specific rulings that have nothing to do with modern AI. These systems are just a really slow/lossy git clone; current law has no trouble with it, it's broadly illegal.
If corporations are allowed to launder someone else's work as their own, people will simply stop working and just start endlessly remixing, a la popular music.
I agree with the thrust of this article, that norms and what we perceive as good or desirable extend considerably beyond the minimum established by law.
But a point that was not made strongly, which highlights this even more, is that this goes in every direction.
If this kind of reimplementation is legal, then I can take any permissive OSS and rebuild it as proprietary. I can take any proprietary software and rebuild it as permissive. I can take any proprietary software and rebuild it as my own proprietary software.
Either the law needs to catch up and prevent this kind of behavior, or we're going to enter an effectively post-copyright world with respect to software. Which ISN'T GOOD, because that will disincentivize any sort of open license at all, and companies will start protecting/obfuscating their APIs like trade secrets.
Someone should put this to the test. Take the recently leaked Minecraft source code and have Copilot build an exact replica in another programming language and then publish it as open source. See if Microsoft believes AI is copyright infringement or not.
As described, this would not be the same thing. If the AI is looking at the source and effectively porting it, that is likely infringement. The idea instead should be "implement Minecraft from scratch" but with behavior, graphics, etc. identical. Note that you'll need to have an AI generate assets or something since you can't just reuse textures and models.
AI models have already looked at the source of GPL software and contain it in their dataset. Adding the minecraft source to the mix wouldn't seem much different. Of course art assets and trade marks would have to be replaced. But an AI "clean room" implementation has yet to be legally tested.
I’ve often thought that the key to fighting this is through this exact method. Turn the tool against them
The big question is: if copyrighted material was used in the training material, is the LLM's output copyright infringement when it resembles the training material? In your example, you are taking the copyrighted material and giving it to the LLM as input and instructing the LLM to process it. Regardless of where the legal cards fall, this is a much less ambiguous scenario.
You will probably run into design patents.
Wow, it feels like this argument rewired my brain.
When I first read about the chardet situation, I was conflicted but largely sided on the legal permissibility side of things. Uncomfortably I couldn't really fault the vibers; I guess I'm just liberal at heart.
The argument from the commons has really invoked my belief in the inherent morality of a public good. Something being "impermissible" sounds bad until you realize that otherwise the arrow of public knowledge suddenly points backwards.
Seeing this example play out in real life has had retroactive effects on my previously BSD-aligned brain. Even though the argument itself may have been presented before, I now understand the morals that a GPL license text underpins better.
The really interesting question to me is if this transcends copyright and unravels the whole concept of intellectual property. Because all of it is premised on an assumption that creativity is "hard". But LLMs are not just writing software, they are rapidly being engineered to operate completely generally as knowledge creation engines: solving math proofs, designing drugs, etc.
So: once it's not "hard" any more, does IP even make sense at all? Why grant monopoly rights to something that required little to no investment in the first place? Even with vestigial IP law - let's say, patents: it just becomes an input parameter that the AI needs to work around the patents like any other constraints.
> So: once it's not "hard" any more, does IP even make sense at all? Why grant monopoly rights to something that required little to no investment in the first place? Even with vestigial IP law - let's say, patents: it just becomes an input parameter that the AI needs to work around the patents like any other constraints.
I think it still does: IIRC, the current legal situation is AI-output does not qualify for IP protections (at least not without substantial later human modification). IP protections are solely reserved for human work.
And I'm fine with that: if a person put in the work, they should have protections so their stuff can't be ripped off for free by all the wealthy major corporations that find some use for it. Otherwise: who cares about the LLMs.
I think you have a rather idealized model of IP in mind. In practice, IP law tends to be an expensive weapon the wealthy major corporations use against the little guy. Deep enough pockets and a big enough warchest of broad patents will drain the little guy every time.
If you think about creative outcomes as n-dimensional 'volumes', AI expressions can cover more than humans in many domains. These are precisely artistic styles, music styles etc. and tbh not everyone can be a Mozart, but maybe a lot more people, with AI, can be Mozart lite. This begs the question of how much of creativity is appreciated as a shared experience.
Nothing changes for drug patents regardless of whether an LLM was used in the discovery process.
It might unravel intellectual property, just not in a fair way. When capitalism started, public land was enclosed to create private property. Despite this being in many cases a quite unfair process, we still respect this arrangement.
With AI, a similar process is happening - publicly available information becomes enclosed by the model owners. We will probably get a "vestigial" intellectual property in the form of model ownership, and everyone will pay a rent to use it. In fact, companies might start to gatekeep all the information to only their own LLM flavor, which you will be required to use to get to the information. For example, product documentation and datasheets will be only available by talking to their AI.
Don't worry. The courts have consistently sided with huge companies on copyright. In the US. In Europe. Doesn't matter.
Company incorporates GPL code in their product? Never once have courts decided to uphold copyright. HP did that many times. Microsoft got caught doing it. And yet the GPL was never applied to their products. Every time there was an excuse. An inconsistent excuse.
Schoolkid downloads a movie? 30,000 USD per infraction PLUS armed police officer goes in and enforces removal of any movies.
Or take the very subject here. AI training WAS NOT considered fair use when OpenAI violated copyright to train. Same with Anthropic, Google, Microsoft, ... They incorporated Harry Potter and the Linux kernel in ChatGPT, in the model itself. Undeniable. Literally. So even if you accept that it's changed now, OpenAI should still be forced to redistribute the training set, code, and everything needed to run the model for everything they did up to 2020. Needless to say ... courts refused to apply that.
So just apply "the law", right. Courts' judgement of using AI to "remove GPL"? Approved. Using AI to "make the next Disney-style movie"? SEND IN THE ARMY! Whether one or the other violates the law according to rational people? Whatever excuse to avoid that discussion is good enough.
I think the missing thing here is that the license violation already happened. Most of the big models trained on data in a manner that violated terms of service. We'll need a court case but I think it's extremely reasonable to consider any model trained on GPL code to be infected with open licensing requirements.
You might wish that were true, but there are very strong arguments it's not. Training on copyleft licensed code is not a license violation. Any more than a person reading it is. In copyright terms, it's such an extreme transformative use that copyright no longer applies. It's fair use.
But agreed that we're waiting for a court case to confirm that. Although really, the main questions for any court cases are not going to be around the principle of fair use itself or whether training is transformative enough (it obviously is), but rather on the specifics:
1) Was any copyrighted material acquired legally (not applicable here), and
2) Is the LLM always providing a unique expression (e.g. not regurgitating books or libraries verbatim)
And in this particular case, they confirmed that the new implementation is 98.7% unique.
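The thread doesn't say how that 98.7% figure was computed, so the following is only a sketch of one naive way a "uniqueness" number could be produced: compare the two sources line by line with Python's standard `difflib` and report the share of new lines that also appear, in order, in the old code. The function name `shared_line_fraction` and the two sample snippets are made up for illustration. A line-level metric like this says nothing about structural or algorithmic similarity, which is what a court would more likely care about.

```python
import difflib


def shared_line_fraction(old_src: str, new_src: str) -> float:
    # Fraction of non-blank lines in new_src that match lines in old_src,
    # in order, after stripping surrounding whitespace.
    old_lines = [line.strip() for line in old_src.splitlines() if line.strip()]
    new_lines = [line.strip() for line in new_src.splitlines() if line.strip()]
    if not new_lines:
        return 0.0
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(new_lines)


# Made-up sample sources, purely for illustration.
old_code = "def f(x):\n    return x + 1\n\nprint(f(1))\n"
new_code = "def g(y):\n    result = y + 1\n    return result\n\nprint(f(1))\n"

shared = shared_line_fraction(old_code, new_code)
print(f"shared: {shared:.1%}, unique: {1 - shared:.1%}")
```

Renaming a variable on every line would drive this number toward 100% unique without changing the program at all, which is one reason a single percentage is weak evidence either way.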
> Training on copyleft licensed code is not a license violation. Any more than a person reading it is.
Some might hold that we've granted persons certain exemptions, on account of them being persons. We do not have to grant machines the same.
> In copyright terms, it's such an extreme transformative use that copyright no longer applies.
Has the model really performed an extreme transformation if it is able to produce the training data near-verbatim? Sure, it can also produce extremely transformed versions, but is that really relevant if it holds within it enough information for a (near-)verbatim reproduction?
I agree there has to be a court case about it. I think the current argument, however, is that it is transformative, and therefore falls under fair use.
Yea, a finding that training is transformative would be pretty significant, and it's likely that the precedent of thumbnail creation being deemed transformative would steer us towards such a finding. Transformative is always a hard thing to bank on because it is such a nebulous and judgement based call. There are excellent examples of how precise and gritty this can get in audio sampling.
Didn't know about thumbnails being fair use. In that case, I just don't see an argument that genAI training on source code is less transformative than thumbnails.
From the article:
> He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch.
From GPL2:
> The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable.
Is a project's test suite not considered part of its source code?
If the test suite is part of this library's source code, and Claude was fed the test suite or interface definition files, is the output not considered a work based on the library under the terms of LGPL 2.1?
It's transformative, so no.
Legally, using the tests to help create the reimplementation is fine.
However, it seems possible you can't redistribute the same tests under the MIT license. So the reimplementation MIT distribution could need to be source code only, not source code plus tests. Or, the tests can be distributed in parallel but still under LGPL, not MIT. It doesn't really matter since compiled software won't be including the tests anyways.
> It's transformative, so no.
I'm not following your logic there, and I don't see any mention of "transformative" in the license. Can you explain what you mean?
Google v. Oracle ruled that use of APIs is fair game, and it could be argued that test cases are strictly a use of APIs and not implementation.
Google vs Oracle ruled that APIs fall under copyright (the contrary was thought before). However, it was ruled that, in that specific case, fair use applied, because of interoperability concerns. That's the important part of this case: fair use is never automatic, it is assessed case by case.
Regarding chardet, I'm not sure "I wanted to circumvent the license" is a good way to argue fair use.
I believe it is a narrow view of the situation. If we take a look into the history, into the reasons for inventing GPL, we'll see that it was an attempt to fight copyrights with copyrights. The very name 'copyleft' is trying to convey the idea.
What AI is eroding is copyright. You can not only re-implement a GPL program, but also reverse engineer and re-implement a closed source program; people have demonstrated it already, there were stories here on HN about it.
AI is eroding copyright, so there may no longer be a need for the GPL. GNU should stop and rethink its stance, chuck away the GPL as the main tool to fight evil software corporations and embrace LLM as the main weapon.
> LLM as the main weapon
LLM's - to date - seem to require massive capital expenditures to have the highest quality ones, which is a monumental shift in power towards mega corporations and away from the world of open source where you could do innovative work on your own computer running Linux or FreeBSD or some other open OS.
I don't think that's an exciting idea for the Free Software Foundation.
Perhaps with time we'll be able to run local ones that are 'good enough', but we're not there yet.
There's also an ethical/moral question that these things have been trained on millions of hours of people's volunteer work and the benefits of that are going to accrue to the mega corporations.
Edit: I guess the conclusion I come to is that LLM's are good for 'getting things done', but the context in which they are operating is one where the balance of power is heavily tilted towards capital, and open source is perhaps less interesting to participate in if the machines are just going to slurp it up and people don't have to respect the license or even acknowledge your work.
>Perhaps with time we'll be able to run local ones that are 'good enough', but we're not there yet.
Right now, we can get local models that you can run on consumer hardware, that match capabilities of state of the art models from two years ago. The improvements to model architecture may or may not maintain the same pace in the future, but we will get a local equivalent to Opus 4.6 or whatever other benchmark of "good enough" you have, in the foreseeable future.
> LLM's - to date - seem to require massive capital expenditures to have the highest quality ones, which is a monumental shift in power towards mega corporations and away from the world of open source
Yeah, a bit of a conundrum. But I don't think that fighting for copyright now can bring any benefits for FOSS. GNU should bring Stallman back and see whether he can come up with any new ideas and a new strategy. Alternatively they could try without Stallman. But the point is: they should stop and think again. Maybe they will find a way forward, maybe they won't, but it means that either they could continue their fight for freedom meaningfully, or they could just stop fighting and find some other things to do. Both options are better than fighting for copyright.
> There's also an ethical/moral question that these things have been trained on millions of hours of people's volunteer work and the benefits of that are going to accrue to the mega corporations.
I want to clarify this statement a bit. LLMs relying on the work of others is not against the GNU philosophy as I understand it: algorithms have to be free. Nothing wrong with training LLMs on them or on programs implementing them. Nothing wrong with using these LLMs to write new (free) programs. What is wrong is corporations reaping all the benefits now and locking down new algorithms later.
I think it is important, because copyright is deemed to be an ethical thing by many (I think for most people it is just a deduction: abiding the law is ethical, therefore copyright is ethical), but not for GNU.
>Yeah, a bit of a conundrum.
IMO the primary significant trend in AI. Doesn't get talked about nearly enough. Means the AI is working, I guess.
>GNU should bring Stallman back ... Alternatively they could try without Stallman.
Leave Britney alone >:(
>copyright is deemed to be an ethical thing by many (I think for most people it is just a deduction: abiding the law is ethical, therefore copyright is ethical)
I've busted out "intellectual property is a crime against humanity" at layfolk to see if that shortcuts through that entire little politico-philosophical minefield. They emote the requisite mild shock when such things as crimes against humanity are mentioned; as well as at someone making such a radical statement which seems to come from no familiar species of echo chamber; and then a moment later they begin to very much look like they see where I'm coming from.
How do you even argue such a thing? I've had no such luck, I've met many people who seem to view copyright and a person owning their ideas and work as a sort of inherent moral.
Not saying this gets through to people, but copyright is purely about the legal ability to restrict what other people do. Whereas property rights are about not allowing others to restrict what you do (e.g. by taking your stuff).
> LLM's - to date - seem to require massive capital expenditures to have the highest quality ones, which is a monumental shift in power towards mega corporations and away from the world of open source where you could do innovative work on your own computer running Linux or FreeBSD or some other open OS.
When the FSF and GPL were created, I don't think this was really a consideration. They were perfectly happy with requiring Big Iron Unix or an esoteric Lisp Machine to use the software - they just wanted to have the ability to customize and distribute fixes and enhancements to it.
> LLM's - to date - seem to require massive capital expenditures to have the highest quality ones
There are near-SOTA LLM's available under permissive licenses. Even running them doesn't require prohibitive expenses on hardware unless you insist on realtime use.
> running them doesn't require prohibitive expenses on hardware
What async tasks could a local LLM accomplish on Intel 11th gen CPU with 32GB RAM?
Maybe a good open source idea is to "seti at home" style crowd-source training, assuming that's possible.
How close are we to good enough and who's working on that? I would be interested in supporting that work; to my mind, many of the real objections to LLMs are diminished if we can make them small and cheap enough to run in the home (and, perhaps, trained with distributed shared resources, although the training problem is the harder one).
Is massive capital expenditure not also required to enforce the GPL? If some company steals your GPLed code and doesn't follow the license, you will have to sue them and somebody will have to pay the lawyers.
> Is massive capital expenditure not also required to enforce the GPL?
It's nowhere near the order of magnitude of the kind of spending they're sinking into LLM's. The FSF and other groups were reasonably successful at enforcing the GPL, operating on a budget 1000's of times smaller than that of AI companies.
Right but LLM companies are building frontier models with frontier talent while trying to soak up demand with a loss leader strategy, on top of an historic infrastructure build out.
Being able to cost-efficiently run frontier models is, I think, not a high-priced endeavor for an org (compared to an individual).
IMO the proposition is a little fishy, but it's not totally without merit and imo deserves investigation. If we are all worried about our jobs, even via building custom for-sale software, there is likely something there that may obviate the need at least for end user applications. Again, I'm deeply skeptical, but it is interesting.
> Being able to cost-efficiently run frontier models is, I think, not a high-priced endeavor for an org
Running a proprietary model would make you subject to whatever ToS the LLM companies choose on a particular day, and to limits on what you can produce with them, which circles back to the raison d'etre for the GPL and GNU.
Until all software copyright is dead and buried, there is no need for copyleft to change tack. Otherwise the rising tide may rise high enough to drown GPL, but not proprietary software.
Open source is easier to counterfeit/license-launder/re-implement using LLMs because source code is much lower-hanging fruit, and is understood by more people than closed-source assembly.
> There's also an ethical/moral question that these things have been trained on millions of hours of people's volunteer work and the benefits of that are going to accrue to the mega corporations.
This was already the case and it just got worse, not better.
At a certain point, I think we had reached a kind of equilibrium where some corporations were decent open source citizens. They understood that they could open source things like infrastructure or libraries and keep their 'crown jewels' closed. And while Stallman types might not have been happy with that, it seemed to work out for people.
Now they've just hoovered up all the free stuff into machines that can mix it up enough to spit it out in a way that doesn't even require attribution, and you have to pay to use their machine.
AI essentially gatekeeps all of open source for companies to pluck from to their heart's content. And individual contributors using these tools and freely mixing the output with their own (usually minor) contributions are another step of whitewashing, because they're definitely not going to own up to writing only 5% of the stuff they got paid for.
Before we had RedHat and Ubuntu, who at least were contributing back, now we have Microsoft, Anthropic and OpenAI who are racing to lock the barn door around their new captive sheep. It's just a massive IP laundromat.
Copyleft is a mirror of copyright, not a way to fight copyright. It grants rights to the consumer where copyright grants rights to the creator. Importantly, it gives the end-user the right to modify the software running on their devices.
Unfortunately, there are cases where you simply can't just "re-implement" something. E.g., because doing so requires access to restricted tools, keys, or proprietary specifications.
These are words of Stallman:
"So, I looked for a way to stop that from happening. The method I came up with is called “copyleft.” It's called copyleft because it's sort of like taking copyright and flipping it over. [Laughter] Legally, copyleft works based on copyright. We use the existing copyright law, but we use it to achieve a very different goal."
https://writings.hongminhee.org/2026/03/legal-vs-legitimate/
> flipping it over.
i.e. mirroring it
> use it to achieve a very different goal."
"very different goal" isn't the same as "fundamentally destroying copyright"
the very different goals include keeping public code public, ensuring proper attribution, preventing companies from just "sizing" it, motivating others to make their code public too, etc.
and even if his goals were not like that, it wouldn't make a difference, as this is what many people try to achieve by using such licenses
this kind of AI usage is very much not in line with these goals,
and in general, making software cloning much cheaper isn't sufficient to fix many of the issues the FOSS movement tried to fix, especially not when looking at the current ecosystem most people interact with (i.e. phones)
---
("sizing"): As in the typical MS embrace, extend and extinguish strategy of first embracing the code then giving it proprietary but available extensions/changes/bug fixes/security patches to then make them no longer available if you don't pay them/play by their rules.
---
Though in the end, using AI as a "fancy complicated" photocopier for code removes copyright exactly as much as using a photocopier for code would. It doesn't matter if you use the photocopier blindfolded and never look at the thing you copied.
> We use the existing copyright law, but we use it to achieve a very different goal.
For the right goal, he should have called it "rightcopy".
That’s not a rebuttal of the OP’s point. None of that says anything about fighting copyright. It literally says he flipped it, which is what the OP said when they said it’s a mirror.
> It grants rights to the consumer where copyright grants rights to the creator.
It also grants one major right/feature to the creator, the ability to spread their work while keeping it as open as they intend.
> chuck away the GPL as the main tool to fight evil software corporations and embrace LLM as the main weapon.
LLMs are one of the primary manifestations of 'evil software corporations' currently.
> AI is eroding copyright, so there may no longer be a need for the GPL. GNU should stop and rethink its stance, chuck away the GPL as the main tool to fight evil software corporations and embrace LLM as the main weapon.
Is this LLM thing freely available or is it owned and controlled by these companies? Are we going to rent the tools to fight "evil software corporations"?
There already are LLMs with open weights that are better at code than state-of-the-art closed-source models from a year ago. For now, most people may have to rent the hardware to run those models, since it's too expensive for most people to own something that can run inference on one trillion parameters, but I wouldn't consider LLMs to be controlled by "evil software corporations" at this point.
Open models do exist. They’re nowhere near as good as frontier models, but they’re getting better all the time.
It’s probably only a matter of time before open models are as good as Claude code is today.
With the release of GLM-5, I would say that they are pretty much almost as good. Basically 90% as good as Opus 4.6 on most tasks for 20% of inference cost, and open weights.
easy, we ask Claude to write an open-source freely-available version of Claude with equal or better capabilities.
I agree with almost all of that, except the part about GNU changing their stance. I think GNU should stay true and consistent, if for no other reason than to not make many of their supporters who aren't on board with AI feel betrayed and have GNU's legacy soured. If the cause of LLMs conquering proprietary software needs an organization to champion it, let that be a new organization, not GNU.
Its purpose is "if you run the software you should be able to inspect and modify that software, and to share those modifications with your peers", not to explicitly resist copyright. Yes, copyright is bad in that it often prevents one from doing that, but it is not the purpose of the GPL to dismantle copyright.
Reducing it to "well you can clone the proprietary software you're forced to use by LLM" is really missing the soul of the GPL.
If not for copyright, you could always do that and copyleft wouldn't be needed.
Just because something is copyleft doesn't mean the person who gave you the binary you're using has to supply you with the code they used to build it. That's what the GPL does.
> we'll see that it was an attempt to fight copyrights with copyrights
it's not that simple
yes, the GPL's origins include the idea of "everyone should be able to use"
but it is also about attribution of the original author
and making sure people can't just de-facto "size public goods"
this kind of AI usage removes attribution and is often sizing public goods in a way far worse than most companies that just ignored the license did
so today there is more need than ever in the last few decades for GPL-like licenses
You've said "size" twice in comments, did you mean "seize"?
That's naive. Copyright doesn't just apply to software. There already have been countless lawsuits about copying music long before the term "open source" was invented. No, changing the lyrics a bit doesn't circumvent copyright. Nor does translating a Stephen King novel to German and switching the names of the places and characters.
A court ordered the first Nosferatu movie to be destroyed because it had too many similarities to Dracula. Despite the fact that the movie makes rather large deviations from the original.
If Claude was indeed asked to reimplement the existing codebase, just in Rust and a bit optimized, that could well be a copyright violation. Just like rephrasing A Song of Ice and Fire a bit, and switching to a different language, doesn't remove its copyright.
> Just like rephrasing A Song of Ice and Fire a bit, and switching to a different language, doesn't remove its copyright.
There is some precedent for this, e.g. Alchemised is a recent best seller that had just enough changed from its Harry Potter fan fiction source in order to avoid copyright infringement: https://en.wikipedia.org/wiki/Alchemised
(I avoided the term “remove copyright” here because the new work is still under copyright, just not Harry Potter - related copyright.)
That's apparently a different story with different plot, so that's not comparable.
Claude was asked to implement a public API, not an entire codebase. The definition of a public API is largely functional; even in an unusually complex case like the Java standard facilities (which are unusually creative even in the structure and organization of the API itself) the reimplementation by Google was found to be fair use.
> Claude was asked to implement a public API, not an entire codebase.
Allegedly. There have been several people who doubted this story. So how to find out who is right? Well, just let Claude compare the sources. Coincidentally, Claude Opus 4.6 doesn't just score 75.6% on SWE-bench Verified but also 90.2% on BigLaw Bench.
It's like our copyright lawyer is conveniently also a developer. And possibly identical to the AI that carried out the rewrite/reimplementation in question in the first place.
> What AI are eroding is copyright.
At the moment it's people that are eroding copyright. E.g. in this case someone did something.
"AI" didn't have a brain, woke up and suddenly decided to do it.
Realistically nothing to do with AI. Having a gun doesn't mean you randomly shoot.
> AI is eroding copyright
Unless it is IP of the same big corpos that consumed all content available. Good luck with eroding them.
So not only are we moving goalposts here, but we've decided the GNU team should join the other team? I don't understand how GNU would see mass model LLM training as anything but the most flagrant violations of their ethos. LLM labs, in their view, would be among the most evil software corporations to have ever existed.
While I personally agree with you, Richard Stallman (the creator of the GPL) does not. He has always advocated in favor of strong copyright protection, because the foundation of the GPL is the monopoly power granted by copyright. The problem that the GPL is intended to solve is proprietary software.
Generative models (AI) are not really eroding copyright. They are calling its bluff. The very notion of intellectual property depends on a property line: some arbitrary boundary where the property begins and ends. Generative models blur that line, making it impractical to distinguish which property belongs to whom.
Ironically, these models are made by giant monopolistic corporations whose wealth is quite literally a market valuation (stock price) of their copyrights! If generative models ever become good enough to reimplement CUDA, what value will NVIDIA have left?
The reality is that generative models are nowhere near good enough to actually call the bluff. Copyright is still the winning hand, and that is likely to continue, particularly while IP holders are the primary authors of law.
---
This whole situation is missing the forest for the trees. Intellectual Property is bullshit. A system predicated on monopoly power can only result in consolidated wealth driving the consolidation of power; which is precisely what has happened. The words "starving artist" ring every bit as familiar today as any time in history. Copyright has utterly failed the very goals it was explicitly written with.
It isn't the GPL that needs changing. So long as a system of copyright rules the land, copyleft is the best way to participate. What we really need is a cohesive political movement against monopoly power; one that isn't conveniently ignorant of copyright as its most significant source.
Right, anything that can be copied instantly for free cannot be realistically owned.
> Blanchard's account is that he never looked at the existing source code directly. He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch
This feels sort of like saying "I just blindly threw paint at that canvas on the wall and it came out in the shape of Mickey Mouse, and so it can't be copyright infringement because it was created without the use of my knowledge of Mickey Mouse"
Blanchard is, of course, familiar with the source code, he's been its maintainer for years. The premise is that he prompted Claude to reimplement it, without using his own knowledge of it to direct or steer.
> Blanchard is, of course, familiar with the source code, he's been its maintainer for years.
I would argue it's irrelevant whether they looked or didn't look at the code, as well as whether he was or wasn't familiar with it.
What matters is that they fed the original code into a tool which they set up to make a copy of it. How that tool works doesn't really matter. Neither does it make a difference if you obfuscate that it's a copy.
If I blindfold myself when making copies of books with a book scanner + printer I'm still engaging in copyright infringement.
If AI is a tool, that should hold.
If it isn't "just" a tool, then it did engage in copyright infringement (as it created the new output side by side with the original) in the same way an employee might do so on command of their boss. Which still makes the boss/company liable for copyright infringement and in general just because you weren't the one who created an infringing product doesn't mean you aren't more or less as liable of distributing it, as if you had done so.
> that they fed the original code into a tool which they set up to make a copy of it
Well, no. They fed the spec (test cases, etc) into a tool which made a new program matching the spec. This is not a copy of the original code.
But also this feels like arguing over the color of the iceberg while the titanic sinks. If you have a tool that can make code to spec, what is the value in source code anymore? Even if your app is closed-source, you can just tell claude to write new code that does the same thing.
Everyone writes as if he just fed the spec and tests to Claude Code. Ignoring for now that the tests are under LGPL as well, the commit history shows that this has been done with two weeks of steering Claude Code towards the desired output. At every one of these interactions, the maintainer used his deep knowledge of the chardet codebase to steer Claude.
Blanchard fed the spec to the tool, and Anthropic fed the code to the tool, so Blanchard didn't do anything wrong, and Anthropic didn't do anything wrong. Nothing to see here.
> Blanchard fed the spec to the tool,
Yes...
> and Anthropic fed the code to the tool,
Presumably, as part of the massive amount of open-source code that must have been fed in to train their model.
> so Blanchard didn't do anything wrong, and Anthropic didn't do anything wrong. Nothing to see here.
This is meant as irony, right?
if the actual text of the code isn't the same or obviously derivative, copyright doesn't apply at all.
Copyright protects even very abstract aspects of human creative expression, not just the specific form in which it is originally expressed. If you translate a book into another language, or turn it into a silent movie, none of the actual text may survive, but the story itself remains covered by the original copyright.
So when you clone the behavior of a program like chardet without referencing the original source code except by executing it to make sure your clone produces exactly the same output, you may still be infringing its copyright if that output reflects creative choices made in the design of chardet that aren't fully determined by the functional purpose of the program.
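To make "executing it to make sure your clone produces exactly the same output" concrete, here is a minimal differential-testing sketch. Everything in it is hypothetical: `reference_detect` and `candidate_detect` are toy stand-ins written for this illustration, not chardet's API, and nothing here reflects how the actual rewrite was validated.

```python
from typing import Callable, Iterable, List, Tuple

Detector = Callable[[bytes], str]


def differential_test(
    reference: Detector, candidate: Detector, inputs: Iterable[bytes]
) -> List[Tuple[bytes, str, str]]:
    """Run both implementations on every input and collect any disagreements."""
    mismatches = []
    for data in inputs:
        expected = reference(data)
        actual = candidate(data)
        if expected != actual:
            mismatches.append((data, expected, actual))
    return mismatches


# Hypothetical "original": decides by attempting an ASCII decode.
def reference_detect(data: bytes) -> str:
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        return "unknown"


# Hypothetical "clone": different code, written only to match observed behavior.
def candidate_detect(data: bytes) -> str:
    return "ascii" if all(b < 0x80 for b in data) else "unknown"


samples = [b"hello", b"", "caf\u00e9".encode("utf-8"), bytes([0xFF, 0xFE])]
mismatches = differential_test(reference_detect, candidate_detect, samples)
print(mismatches)
```

The two functions share no text, yet the harness treats the original purely as an oracle for outputs. Whether those outputs themselves carry protectable creative choices is the legal question the comment above raises; the code cannot answer it.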
What does derivative mean here? Because IMO it means that the existing work was used as input. So if you used a LLM and it was trained on the existing work, that's a derivative work. If you rot13 encode something as input, so you can't personally read it, and then a device decides to rot13 on it again and output it, that's a derivative work.
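For anyone unfamiliar with the rot13 example, a small sketch of the round trip using Python's standard `codecs` module. The `original` string is a made-up stand-in for a source file, not real chardet code.

```python
import codecs

# Hypothetical stand-in for a line of copyrighted source code.
original = "def detect(data): return 'utf-8'"

# First pass: letters are rotated 13 places, so a human can't casually read it.
scrambled = codecs.encode(original, "rot_13")

# Second pass: ROT13 is its own inverse, so the exact original text comes back.
restored = codecs.encode(scrambled, "rot_13")

print(scrambled)
print(restored == original)
```

The point of the thought experiment is that the person in the middle never reads the plaintext, yet the output is character-for-character the input.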
In order for it to be creatively derivative you would need to copy the structure, logic, organization, and sequence of operations not just reimplement the functionality. It is pretty clear in this case that wasn't done.
It's not clear at all.
As a cynical person I assume all the frontier LLMs were trained on datasets that include every open source project, but as a thought experiment, if an LLM was trained on a dataset that included every open source project _except_ chardet, do you think said LLM would still be able to easily implement something very similar?
There is no doubt in my mind that it could still do it.
Of course, the problem with this interpretation is that all modern LLMs are derivatives from huge amounts of text under completely different licenses, including "All rights reserved", and therefore can not be used for any purpose.
I'm not sure how you square the circle of "it's alright to use the LLM to write code, unless the code is a rewrite of an open source project to change its license".
> Of course, the problem with this interpretation is that all modern LLMs are derivatives from huge amounts of text under completely different licenses, including "All rights reserved", and therefore can not be used for any purpose.
> I'm not sure how you square the circle of "it's alright to use the LLM to write code
You seem like you're on the cusp of stating the obvious correct conclusion: it isn't.
> Because IMO it means that the existing work was used as input
That's your opinion (since you said "IMO"), not the actual legal definition.
LLMs do not encode nor encrypt their training data. The fact they can recite training data is a defect not a default. You can understand this more simply by calculating the model size as an inverse of a fantasy compression algorithm that is 50% better than SOTA. You'll find you'd still be missing 80-90% of the training data even if it were as much of a stochastic parrot as you may be implying. The outputs of AI are not derivative just because they saw training data including the original library.
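The back-of-the-envelope calculation described above can be sketched like this. All three numbers are placeholders picked for round arithmetic, not measurements of any real model or dataset, and "50% better than SOTA" is read here as a compression ratio 1.5 times some assumed baseline.

```python
def max_recoverable_fraction(
    model_bytes: float, corpus_bytes: float, compression_ratio: float
) -> float:
    """Upper bound on the share of the corpus that could be stored verbatim
    if the weights were nothing but a compressed archive of the training data."""
    return min(1.0, model_bytes * compression_ratio / corpus_bytes)


# Placeholder inputs (assumptions, not real figures):
MODEL_BYTES = 10**12            # size of the weights
CORPUS_BYTES = 50 * 10**12      # size of the training text
ASSUMED_SOTA_RATIO = 5.0        # baseline text compression ratio
FANTASY_RATIO = 1.5 * ASSUMED_SOTA_RATIO  # "50% better than SOTA"

stored = max_recoverable_fraction(MODEL_BYTES, CORPUS_BYTES, FANTASY_RATIO)
missing = 1.0 - stored
print(f"could store at most {stored:.0%}, missing at least {missing:.0%}")
```

Whether the result lands in the 80-90% range depends entirely on the real sizes you plug in, which are not given in this thread.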
Then onto prompting: 'He fed only the API and (his) test suite to Claude'
This is Google v Oracle all over again - are APIs copyrightable?
> This is Google v Oracle all over again - are APIs copyrightable?
Yes this is the best way to ask the question. If I take a public facing API and reimplement everything, whether it's by human or machine, it should be sufficient. After all, that's what Google did, and it's not like their engineers never read a single line of the Java source code. Even in "clean room" implementations, a human might still have remembered or recalled a previous implementation of some function they had encountered before.
I find the "compression" argument not very strong, both because copyright still applies to (very) lossy codecs (e.g. your 16 kbps Opus file of Thriller infringes, even if the original 192 kHz/32-bit WAV file was roughly 12,000 kbps), and because copyright still applies to transformed derivative works (a tiny MIDI file of Thriller might still be enough for Jackson's label to get you)
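For reference, the WAV figure is just uncompressed PCM arithmetic. A quick sketch, assuming a stereo file (the channel count is not stated in the comment):

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# 192 kHz, 32-bit samples; two channels is an assumption.
wav_kbps = pcm_bitrate_kbps(192_000, 32, 2)
opus_kbps = 16

print(wav_kbps)               # 12288.0
print(wav_kbps / opus_kbps)   # 768.0
```

So under that assumption the lossy file keeps roughly 1/768 of the original bitrate and is still treated as the same work.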
See also: https://monolith.sourceforge.net/, which seeks to ask the question:
> But how far away from direct and explicit representations do we have to go before copyright no longer applies?
If you pirate a movie and reencode it, does that apply as well? You can still watch the movie and it is “obviously” the same movie, even though the bytes are completely different. Here you can use the program and it is, to the user, also the same.
> If it isn't "just" a tool, then it did engage in copyright infringement
Copyright infringement is a thing humans do. It's not a human.
Just like how the photos taken by a monkey with a camera have no copyright. Human law binds humans.
Correct. The human who shares the copy is the one who engages in copyright infringement.
So, let's say that rather than actually touching any copyrighted material, a human merely tells an AI about how to go onto the internet and find copyrighted material, download it, and ingest it for training. The AI, fully autonomously, does so, and after training itself on the material deletes it so no human ever downloads, consumes, or shares it.
If we are saying AI is "more than a tool", which seems to be the way courts are leaning since they've ruled AI output without direct human involvement is not copyrightable[0], then the above seems like it would be entirely legal.
[0] https://www.copyright.gov/newsnet/2025/1060.html
Someone would likely get prosecuted if they instructed an AI agent to run, say, a pump-and-dump scheme...
Even if the final output doesn't have copyright protection, it might still be a copyright violation. I think it could be reasonable to have a work that violates copyright when distributed even if it does not have copyright itself.
I just don't see how it's relevant whether he did look or didn't. In my opinion, it's not just legally valid to make a re-implementation of something if you've seen the code as long as it doesn't copy expressive elements. I think it's also ethically fine as well to use source code as a reference for re-implementing something as long as it doesn't turn into an exact translation.
Right. The alternative is that we reward Dan for his 14 years of volunteer maintenance of a project... by banning him from working on anything similar under a different license for the rest of his life.
It's actually not legally fine, or at least it's extremely dangerous. Projects that re-implement APIs presented by extremely litigious companies specifically do not allow people who, for instance, have seen the proprietary source code to then work on the project.
I don't think fear of legal action makes it illegal.
Say I know it is legal to make a turn at a red light, and I know a court will uphold that I was in the right, but a police officer will fine me regardless and I would need to pursue some legal remedy. I'm unlikely to make the turn, regardless of whether it is legal, because contesting the fine is expensive, if not in money then in time.
In the case of copyright lawsuits they are notoriously expensive and long so even if a court would eventually deem it fine, why take the chance.
That's my point. It's dangerous and there are sharks in the water. That sounds like you're not going to have a good time if you do the described approach to someone who might assert you're infringing.
My understanding is that that is a maximalist position for the avoidance of risk, and is sufficient but probably not necessary.
Ignoring the legal or ethical concerns. Let’s say we live in a world where the cost of copying code is so close to zero that it’s indistinguishable from a world without copyright.
Anything you put out can and will be used by whatever giant company wants to use it with no attribution whatsoever.
Doesn’t that massively reduce the incentive to release the source of anything ever?
No, because (most) people don't work on OSS for vanity, they do it to help other people, whether it's individuals or groups of individuals, i.e. corporations.
It's the same question as, if an AI can generate "art", or photographers can capture a scene better than any (realistic) painter, then will people still create art? Obviously yes, and we see it of course after Stable Diffusion was released three years ago, people are still creating.
I don’t know what a world without copyright does to corporate sponsored open source. It certainly reduces it because there are many corporate sponsored projects that monetize through dual licensing. My guess is in a world where you can’t even guarantee attribution, it’s much harder to convince your boss to let you open source a project in the first place.
So ignoring people who are being paid by corporations directly to work on open source, in my experience the vast majority of contributors expect to be able to monetize their work eventually in a way that requires attribution. And out of the small number who don’t expect a monetary return of any kind, a still smaller number don’t expect recognition.
If this weren’t the case you’d see a much larger amount of anonymous contributions. There are people who anonymously donate to charity. The vast majority want some kind of recognition.
Obviously we still see art, but if you greatly reduce the monetary benefit of producing art, you’ll see a lot less of it. This is especially true of non-trivial open source software, which unlike static artwork requires continual maintenance.
Most commercial software that I've used has the model of a legal moat around a pretty crappy database schema.
The non IP protection has largely been in the effort involved in replicating an application's behavior and that effort is dropping precipitously.
You must not have used much commercial software outside of crappy business SaaS.
Yes, and it reduces the incentives to release binaries too. Such a world will be populated by almost entirely SaaS, which can still compete on freedom.
>This feels sort of like saying "I just blindly threw paint at that canvas on the wall and it came out in the shape of Mickey Mouse, and so it can't be copyright infringement because it was created without the use of my knowledge of Mickey Mouse"
IANAL, but that analogy wouldn't work because Mickey Mouse is a trademark, so it doesn't matter how it is created.
Oracle had its day in court with Google over the Java APIs. Reimplementing APIs can be done without copyright infringement, but Oracle must have tried to find real infringement during discovery.
In this case, we could theoretically prove that the new chardet is a clean reimplementation. Blanchard can provide all of the prompts necessary to re-implement again, and for the cost of the tokens anyone can reproduce the results.
Can anyone find the actual quote where Blanchard said this?
My understanding was that his claim was that Claude was not looking at the existing source code while writing it.
Conveniently ignoring the likelihood that Claude had been trained on the freely accessible source code.
Does he have access to Claude's training data? How can he claim Claude wasn't trained on the original code?
Isn't this a red herring? An API definition is fair use under Google v. Oracle, but the test suite is definitely copyrightable code!
If you only stick to the API and ignore the implementation, it is not Mickey Mouse any more but a rodent. If it was just a clone it wouldn't be 50x as fast. Nevertheless, APIs apparently can be copyrightable. I generally disagree with this; it's how PC compatibles took off, giving consumers better options.
Wait what, didn't Oracle lose the case against Google? Have I been living in an alternate reality where API compatibility is fair use?
> This feels sort of like saying "I just blindly threw paint at that canvas on the wall and
> He fed only the API and the test suite to Claude and asked it
Difference being Claude looked; so not blind. The equivalent is more like I blindly took a photo of it and then used that to...
Technically did look.
The article is poorly written. Blanchard was a chardet maintainer for years. Of course he had looked at its code!
What he claimed, and what was interesting, was that Claude didn't look at the code, only the API and the test suite. The new implementation is all Claude. And the implementation is different enough to be considered original: completely different structure, design, and hey, a 48x improvement in performance! It's just API-compatible with the original, which as per the 2021 Google v. Oracle decision is to be considered fair use.
did he claim that Claude wasn't trained on the original? Or just that he didn't personally provide Claude with a copy?
I reckon the latter; how would he know what was in Claude's training data?
> What he claimed, and what was interesting, was that Claude didn't look at the code
Who opened the PR? Who co-authored the commits? It's clearly on Github.
> Blanchard was a chardet maintainer for years. Of course he had looked at its code!
So there you have it. If he looked, he co-authored then there's that.
If I put my signature on a Picasso painting, it doesn't make me a co-author of said painting.
Blanchard is very clear that he didn't write a single line of code. He isn't an author, he isn't a co-author.
Signing a GitHub commit doesn't change that.
> Blanchard is very clear that he didn't write a single line of code
He used Claude to write it. Difference? The fact that I write on the notepad vs printed it out = I didn't do it?
> Signing a GitHub commit doesn't change that.
That's the equivalent of me saying I didn't kill anyone. The fingerprints on the knife don't change that.
I'll take a commit authored by someone else and then git amend the author to myself, did I write that commit then? By your logic I did apparently.
What if we said that generative AI output is simply not copyrightable. Anything an AI spits out would automatically be public domain, except in cases where the output directly infringes the rights of an existing work.
This would make it so relicensing with AI rewrites is essentially impossible unless your goal is to transition the work to be truly public domain.
I think this also helps somewhat with the ethical quandary of these models being trained on public data while contributing nothing of value back to the public, and disincentivize the production of slop for profit.
We did in fact say so.
https://www.carltonfields.com/insights/publications/2025/no-...
> No Copyright Protection for AI-Assisted Creations: Thaler v. Perlmutter
> A recent key judicial development on this topic occurred when the U.S. Supreme Court declined to review the case of Thaler v. Perlmutter on March 2, 2026, effectively upholding lower court rulings that AI-generated works lacking human authorship are not eligible for copyright protection under U.S. law
> > A recent key judicial development on this topic occurred when the U.S. Supreme Court declined to review the case of Thaler v. Perlmutter on March 2, 2026, effectively upholding lower court rulings that AI-generated works lacking human authorship are not eligible for copyright protection under U.S. law
Was this an AI summary? Those words were not in the article.
The courts said Thaler could not have copyright because he refused to list himself as an author.
> This would make it so relicensing with AI rewrites is essentially impossible unless your goal is to transition the work to be truly public domain.
That's not true at all. Anyone could follow these steps:
1. Have the LLM rewrite GPL code.
2. Do not publish that public domain code. You have no obligation to.
3. Make a few tweaks to that code.
4. Publish a compiled binary/use your code to host a service under a proprietary license of your choice.
In the corporate world, we've started using reimplementation as a way to access tooling that security won't authorize.
Sec has a deny by default policy. Eng has a use-more-AI policy. Any code written in-house is accepted by default. You can see where this is going.
We've been using AI to reimplement tooling that security won't approve. The incentives conspired in the worst outcome, yet here we are. If you want a different outcome, you need to create different incentives.
If Blanchard is claiming not to have been substantively involved in the creation of the new implementation of chardet (i.e. "Claude did it"), then the new implementation is machine generated, and in the USA cannot be copyrighted and thus cannot be licensed.
If he is claiming to have been somehow substantively "enough" involved to make the code copyrightable, then his own familiarity with the previous LGPL implementation makes the new one almost certainly a derivative of the original.
>then his own familiarity with the previous LGPL implementation makes the new one almost certainly a derivative of the original.
The "clean room rewrite" is just an extreme way to have a bulletproof shield against litigation. Not doing it that way doesn't automatically make all new code he writes derivative solely because he saw how the code worked previously.
If the clean room re-write was done entirely by Claude, then the result cannot be copyrighted in the USA, and thus there is no license at all.
And if he was in fact more involved (which he appears to deny), then it's a bit weak to say that someone with huge familiarity with chardet could choose to reimplement chardet without the result being derivative.
So if I read any LGPL code in my life, I can never think about working on something similar in my life?
There's a difference between "I've read some LGPL code once, maybe I could do something similar" and "I've been reading this LGPL code for 12 years and now I'm going to do exactly the same thing".
This is only worth arguing about because software has value. Putting this in context of a world where the cost of writing code is trending to 0, there are two obvious futures:
1. The cost continues to trend to 0, and _all_ software loses value and becomes immediately replaceable. In this world, proprietary, copyleft and permissive licenses do not matter, as I can simply have my AI reimplement whatever I want and not distribute it at all.
2. The coding cost reduction is all some temporary mirage, to be ended soon by drying VC money/rising inference costs, regulatory barriers, etc. In that world we should be reimplementing everything we can as copyleft while the inferencing is good.
There’s another option. The cost of copying existing software trends to 0, but the cost of writing new software stays far enough above 0 that it is still relatively expensive.
There was a recent ruling that LLM output is inherently public domain (presumably unless it infringes some existing copyright). In which case it's not possible to use them to "reimplement everything we can as copyleft".
it's more complicated, the ruling was that AI can't be an author and the thing in question is (de-facto) public domain because it has no author, given the "dev" claimed it was fully built by AI
but AI assisted code has an author and claiming it's AI assisted even if it is fully AI built is trivial (if you don't make it public that you didn't do anything)
also some countries have laws which treat it like a tool in the sense that the one who used it is the author by default AFAIK
There will always be cost though. Even if perfect code is getting one-shotted out, that is constantly maintained and adapted to changing conditions and technology, it simply can't stay at 0 forever because one day the power is surely going to go out!
More and more I am drawn to these kinds of ideas lately, perhaps as a kind of ethical sidestep, but still:
- https://wiki.xxiivv.com/site/permacomputing.html
- https://permacomputing.net/
It's not going to solve any general issue here, but the one thing these freaks need that can't be generated by their models is energy, tons of it. So, the one thing I can do as an individual and in my (digital) community is work to be, in a word, self-sustainable. And depending on my company I guess, if I was a CEO I would hope I was wise enough to be thinking on the same lines.
Everyone is making beautiful mountains from paper and wire. I will just be happy to make a small dollhouse of stone, I think it will be worth it. How can we see not just at least some small-level of hubris otherwise?
The value of software has never been tied to the cost of writing it. Even if you don't distribute it, you're still breaking the law.
The article is proceeding from the premise that a reimplementation is legal (but evil). To help my understanding of your comment, do you mean:
1. An LLM recreating a piece of software violates its copyright and is illegal, in which case LLM output can never be legally used because someone somewhere probably has a copyright on some portion of any software that an LLM could write.
2. You read my example as "copying a project without distributing it", vs. "having an LLM write the same functionality just for me"
Surprised they don't mention Google LLC v. Oracle America, Inc. Seems a bit myopic to condone the general legality while arguing "you can only use it how I like it".
It also doesn't talk about the far more interesting philosophical question. Does what Blanchard did cover ALL implementations from Claude? What if anyone did exactly what he did: feed it the test cases and say "re-implement from scratch"? Ostensibly one would expect the results to be largely similar (technically, under the right conditions, deterministically similar).
could you then fork the project under your own name and a commercial license? when you use an LLM like this, to basically do what anyone else could ask it to do how do you attach any license to it? Is it first come first serve?
If an agent is acting mostly on its own it feels like if you found a copy of Harry Potter in the fictional library of Babel, you didn't write it, just found it amongst the infinite library, but if you found it first could you block everyone else that stumbles on a near-identical copy elsewhere in the library? or does each found copy represent a "Re-implementation" that could be individually copyrighted?
It seems that this chap didn't go and implement a new library, he reimplemented an existing one and became sole-controller of it. i.e. he seems to have taken its reputation, brand whatever you call it away from the contributors and entirely to himself. Their work of establishing it as a well known solution is no longer recognised.
So of course we feel that something wrong has happened even if it's not easy to put one's finger on it.
It should be noted that the Rust community is also guilty of something similar. That is, porting old GPL programs, typically written in C, to Rust and relicensing them as MIT.
> porting old GPL programs, typically written in C, to Rust and relicensing them as MIT
Everything for memory safety.
Well, the license change sounds pretty strange, but to be honest if I were to use this software I would use it without adhering to the MIT. It's machine-created content which is not, in general, copyrightable. You can assert whatever license you want on such content, but I am not going to adhere to it. For example, I declare you may use the following under the Elastic License
without discussing copyright, I don't believe any of this is copied. Which I think should be the argument that actually matters.
I downloaded both 6.0 and 7.0 and based on only a light comparison of a few key files, nothing would suggest to me that 7.0 was copied from 6.0, especially for a 41x faster implementation. It is a lot more organized and readable in my amateur opinion, and the code is about 1/10th the size.
There's a Japanese version of that page, written in classical text writing direction, in columns. Which is cool. Makes me wonder, though - how readable is it with so many English loanwords which should be rotated sideways to fit into columns?
Total digression but yeah, that layout is stupid and the way those words are dropped in using Romaji makes no sense. That's not how Japanese people lay out pages on the web. In fact I don't think I've ever seen a Japanese web page laid out like a book like this, and in general I'd expect the English proper nouns and words that don't have obvious translations to get transliterated into Katakana. Smells like automatic conversion added by someone not really familiar with common practices for presenting Japanese on the web.
He also has a Korean vertical layout that lays out Latin-character words the same way. Is this common in Korea when vertical layout is used? The author seems to be Korean.
Looks like Wikipedia has an example of Traditional Chinese vertical layout with the Latin letters rotated as in TFA's layout (https://en.wikipedia.org/wiki/Horizontal_and_vertical_writin...)
You can't put a copyright and MIT license on something you generated with AI. It is derived from the work of many unknown, uncredited authors.
Think about it; the license says that copies of the work must be reproduced with the copyright notice and licensing clauses intact. Why would anyone obey that, knowing it came from AI?
Countless instances of such licenses were ignored in the training data.
When learning is sufficiently atomized and recombined, creations cease to be "derived from" in a legal sense.
A lego sculpture is copyrighted. Lego blocks are not. The threshold between blocks and sculpture is not well-defined, but if an AI isn't prompted specifically to attempt to mimic an existing work, its output will be safely on the non-copyrighted side of things.
A derivative work is separately copyrightable, but redistribution needs permission from the original author too. Since that usually won't be granted or would be uneconomical, the derivative work can't usually be redistributed.
AI-produced material is inherently not copyrightable, but not because it's a derivative work.
Token prediction is a form of "learning" that is reinforced by the goal of reproducing the correct next token of the work, rather than acquiring ideas and concepts. For instance, given the prefix "Four score and seven years", the weights are adjusted until "ago" is correctly predicted, which is a fancy way of saying that it was stored in the model in a lossy way. The model "learned" that "ago" follows "four score and seven years" exactly the way your hard drive "learns" the audio and video frames of a movie when you download a .mp4 file.
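To make the "stored in a lossy way" claim concrete, here is a toy sketch of next-token prediction by pure counting. It is not how an LLM is trained (there are no weights or gradients here); `train`, `predict`, and the five-token context window are names and choices assumed only for this illustration. It shows the degenerate case the comment describes, where predicting the next token is the same thing as looking up what was seen.

```python
from collections import Counter, defaultdict

def train(text, context_size=5):
    """Count which token follows each run of `context_size` tokens."""
    tokens = text.lower().split()
    table = defaultdict(Counter)
    for i in range(len(tokens) - context_size):
        context = tuple(tokens[i:i + context_size])
        table[context][tokens[i + context_size]] += 1
    return table

def predict(table, prefix, context_size=5):
    """Return the most frequent follower of the last `context_size` tokens, or None."""
    context = tuple(prefix.lower().split()[-context_size:])
    followers = table.get(context)  # .get avoids creating new entries
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = ("Four score and seven years ago our fathers brought forth "
          "on this continent a new nation")
model = train(corpus)
print(predict(model, "Four score and seven years"))  # prints "ago"
```

With a single training text, the table is just a reshuffled copy of that text. Whether a large model trained on billions of documents is closer to this lookup table or to something that generalizes is exactly what the commenters here disagree about.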
I dispute the idea that token sequences reproduced from the model are not derived works.
I predict, no pun intended, that a time is coming when the idea that it's not a derived work will be challenged in mainstream law.
The slop merchants are getting a free ride for the time being.
Courts have already ruled that AI-generated work belongs to the public domain. So, even the MIT license does not apply.
This article is setting up a bit of a moving target. Legal vs legitimate is at least only a single vague question to be defined, but then the target changes to “socially legitimate”, defined only indirectly by way of example, like aggressive tax avoidance as “antisocial”. And while I tend to agree with that characterization, my agreement is predicated on a layering of other principles.
The fundamental problem is that once you take something outside the realm of law and rule of law in its many facets as the legitimizing principle, you have to go a whole lot further to be coherent and consistent.
You can’t just leave things floating in a few ambiguous things you don’t like and feel “off” to you in some way- not if you’re trying to bring some clarity to your own thoughts, much less others. You don’t have to land on a conclusion either. By all means chew over things, but once you try to settle, things fall apart if you haven’t done the harder work of replacing the framework of law with that of another conceptual structure.
You need to at least be asking “to what ends? What purpose is served by the rule?” Otherwise you’re stuck in things where half the time you end up arguing backwards in ways that put purpose serving rules, the maintenance of the rule with justifications ever further afield pulled in when the rule is questioned and edge cases reached. If you’re asking, essentially, “is the spirit of the rule still there?” You’ve got to stop and fill in what that spirit is or you or people that want to control you or have an agenda will sweep in with their own language and fill the void to their own ends.
> If source code can now be generated from a specification, the specification is where the essential intellectual content of a GPL project resides. Blanchard's own claim—that he worked only from the test suite and API without reading the source—is, paradoxically, an argument for protecting that test suite and API specification under copyleft terms.
This is an interesting reversal in itself. If you make the specification protected under copyright, then the whole practice of clean room implementations is invalid.
Broadly speaking, the “freedom of users” is often protected by competition from competing alternatives. The GNU command line tools were replacements for system utilities. Linux was a replacement for other Unix kernels. People chose to install them instead of proprietary alternatives. Was it due to ideology or lower cost or more features? All of the above. Different users have different motivations.
Copyleft could be seen as an attempt to give Free Software an edge in this competition for users, to counter the increased resources that proprietary systems can often draw on. I think success has been mixed. Sure, Linux won on the server. Open source won for libraries downloaded by language-specific package managers. But there’s a long tail of GPL apps that are not really all that appealing, compared to all the proprietary apps available from app stores.
But if reimplementing software is easy, there’s just going to be a lot more competition from both proprietary and open source software. Software that you can download for free that has better features and is more user-friendly is going to have an advantage.
With coding agents, it’s likely that you’ll be able to modify apps to your own needs more easily, too. Perhaps plugin systems and an AI that can write plugins for you will become the norm?
> Was it due to ideology or lower cost or more features?
It was due to access.
"Antirez closes his careful legal analysis as though it settles the matter. Ronacher acknowledges that “there is an obvious moral question here, but that isn't necessarily what I'm interested in.” Both pieces treat legal permissibility as a proxy for social legitimacy. "
This whole article is just complaining that other people didn't have the discussion he wanted.
Ronacher even acknowledged that it's a different discussion, and not one they were trying to have at the moment.
If you want to have it, have it. Don't blast others for not having it for you.
Having this discussion involves blasting others for not considering it. Consider the rest of the paragraph you quoted:
> But law only says what conduct it will not prevent—it does not certify that conduct as right. Aggressive tax minimization that never crosses into illegality may still be widely regarded as antisocial. A pharmaceutical company that legally acquires a patent on a long-generic drug and raises the price a hundredfold has not done something legal and therefore fine. Legality is a necessary condition; it is not a sufficient one.
If the discussion inherently cannot be had without blasting innocent bystanders, I don't think it's a discussion worth having.
It might even be morally abhorrent to have such a discussion in the first place!
It's clear that we're entering a new era of copyright _expectations_ (whether we get new _legislation_ is different), but for now realise this: the people like me who like copyleft can do this too. We can take software we like, point an agent at it, and tell it to make a new version with the AGPL3.0-or-later badge on the front.
But the LLM contributions would likely be ruled public domain, so AGPL may not be enforceable on these.
The point of GPL is to restrict distribution. If there’s already an MIT version, it’s useless.
> The point of GPL is to restrict distribution.
no, it isn't. The point of the GPL is to grant users of the software four basic freedoms (run, study, modify and redistribute). There's no restriction to distribution per se, other than disallowing the removal of these freedoms to other users.
but the point of an EULA is to restrict distribution, so AGPL3 can help there.
IMHO, the API and Test Suite, particularly the latter, define the contract of the functional definition of the software. It almost doesn't matter what that definition looks like so long as it conforms to the contract.
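As a rough sketch of what "the test suite defines the contract" means in practice: below, a small table of input/output pairs plays the role of the test suite, and two structurally different implementations both satisfy it. Everything here (`CONTRACT`, `detect_bom_a`, `detect_bom_b`, `satisfies`) is hypothetical and made up for illustration; it is not chardet's API or its tests.

```python
# Hypothetical contract: (input bytes, expected label) pairs.
CONTRACT = [
    (b"\xef\xbb\xbfhello", "utf-8-sig"),
    (b"\xff\xfeh\x00", "utf-16-le"),
    (b"\xfe\xff\x00h", "utf-16-be"),
    (b"plain ascii", None),
]

def detect_bom_a(data):
    """Implementation A: a chain of explicit prefix checks."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    return None

_BOMS = {
    b"\xef\xbb\xbf": "utf-8-sig",
    b"\xff\xfe": "utf-16-le",
    b"\xfe\xff": "utf-16-be",
}

def detect_bom_b(data):
    """Implementation B: a lookup table scanned in a loop."""
    for bom, name in _BOMS.items():
        if data[:len(bom)] == bom:
            return name
    return None

def satisfies(impl, contract=CONTRACT):
    """True if `impl` returns the expected label for every contract case."""
    return all(impl(data) == expected for data, expected in contract)

print(satisfies(detect_bom_a), satisfies(detect_bom_b))  # prints "True True"
```

From the contract's point of view the two functions are interchangeable, which is the sense in which the implementation "almost doesn't matter". The contract is also only as strong as its cases: anything it does not test (UTF-32 byte order marks, for instance) is left undefined.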
There was an issue where Google did something similar with the JVM, and ultimately it came down to whether or not Oracle owned the copyright to the header files containing the API. It went all the way to the US Supreme Court, and they ruled in Google's favour, finding that the API wasn't the implementation, and that the amount of shared code was so minimal as to be irrelevant.
They didn't anticipate that in less than half a decade we'd have technology that could _rapidly_ reimplement software given a strong functional definition and contract enforcing test suite.
Why are people even having problems with sharing their changes to begin with? Just publishing it somewhere does not seem too expensive. The risk of accidentally including stuff that is not supposed to become public? Or are people regularly completely changing codebases and do not want to make the effort freely available, maybe especially to competitors? I would have assumed that the common case is adding a missing feature here, tweaking something there, if you turn the entire thing on its head, why not have your own alternative solution from scratch?
i've been following this for a while.. and the trend for copyright (of any form - books code pictures music whatever) being laundered by reinventing the "same" thing in-some-way.. is kind-of clear.
But what happens with the new things? Has the era of software-making (or creating things at large) finished, and from now on everything will be re-(gurgitated|implemented|polished) old stuff?
Or all goes back to proprietary everything.. Babylon-tower style, no one talks to no one?
edit: another view - is open-source from now on only for resume-building? "see-what-i've-built" style
Not a lawyer, but my understanding is: In theory, copyright only protects the creative expression of source code; this is the point of the "clean room" dance, that you're keeping only the functional behavior (not protected by copyright). Patents are, of course, an entirely different can of worms. So using an LLM to strip all of the "creative expression" out of source code but create the same functionality feels like it could be equivalent enough.
I like the article's point of legal vs. legitimate here, though; copyright is actually something of a strange animal to use to protect source code, it was just the most convenient pre-existing framework to shove it in.
> this is the point of the "clean room" dance
which is the actual relevant part: they didn't do that dance AFAIK
AI is a tool, they set it up to make a non-verbatim copy of a program.
Then they feed it the original software (AFAIK).
Which makes it a side by side copy, as in the original source was used as reference to create the new program. Which tends to be seen as a derived work even if very different.
IMHO They would have to:
1. create a specification of the software _without looking at the source code_, i.e. by behavior observation (and an interface description). I.e. you give the AI access to running the program, but not to looking into the insides of it. I really don't think they did it as even with AI it's a huge pain as you normally can't just brute force all combinations of inputs and instead need to have a scientific model=>test=>refine loop (which AI can do, but can take long and get stuck, so you want it human assisted, and the human can't have inside knowledge about the program).
2. then generate a new program from specification, And only from it. No git history, no original source code access, no program access, no shared AI state or anything like that.
Also for the extra mile of legal risk avoidance do both human assisted and use unrelated 3rd parties without inside knowledge for both steps.
While this does majorly cut the cost of a clean room approach, it still isn't cost free. And it still is a legal minefield if done by a single person, especially if they have enough familiarity to potentially remember specific pieces of code verbatim.
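A minimal sketch of the two steps above, with all names (`legacy_program`, `record_spec`, `conforms`, `new_program`) and the probe inputs assumed for illustration. Step 1 only calls the original as a black box and writes down observed behavior; step 2 is checked against that written spec and nothing else. In a real clean room the two steps would be done by separate parties with no shared state; they share a file here only for brevity.

```python
import json

def legacy_program(text):
    """Stand-in for the original. Step 1 may call it but never read its body."""
    return " ".join(text.split()).title()

def record_spec(black_box, probes):
    """Step 1: observe behavior only, and serialize it as the specification."""
    return json.dumps([{"input": p, "output": black_box(p)} for p in probes])

def conforms(candidate, spec_json):
    """Step 2 check: compare a candidate against the recorded spec alone."""
    cases = json.loads(spec_json)
    return all(candidate(case["input"]) == case["output"] for case in cases)

def new_program(text):
    """Independent implementation, written to satisfy the spec only."""
    words = [w for w in text.split(" ") if w]
    return " ".join(w[:1].upper() + w[1:].lower() for w in words)

spec = record_spec(legacy_program, ["hello   world", "  MIXED case  ", ""])
print(conforms(new_program, spec))  # prints "True"
```

Three probes are nowhere near a specification. The two functions above already disagree on inputs the probes never exercise, such as tabs or apostrophes, which is the model, test, refine loop the comment says is the expensive part.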
> Then they feed it the original software (AFAIK).
My understanding is they did do the dance. From the article: "He fed only the API and the test suite to Claude and asked it to reimplement the library from scratch."
One could still make the argument that using the test suite was a critical contributing factor, but it is not a part of the resulting library. So in my uninformed opinion, it seems to me like the clean room argument does apply.
Well sure they didn't do the dance, but you don't have to do the dance. The reason to do it is that it's a good defense in a lawsuit. Like you say, all of this is a legal minefield.
So my understanding was that the original code was specifically not fed into Claude. But was almost certainly part of its training data, which complicates things, but if that's fair use then it's not relevant? If training's not fair use and taints the output, then new-chardet is a derivative of a lot of things, not just old-chardet...
This is all new legal ground. I'm not sure if anyone will go to court over chardet, though, but something that's an actual money-maker or an FSF flagship project like readline, on the other hand, well that's a lot more likely.
> When GNU reimplemented the UNIX userspace, the vector ran from proprietary to free. Stallman was using the limits of copyright law to turn proprietary software into free software. […] The vector in the chardet case runs the other way.
That’s just your subjective opinion, with which many other people would disagree. I bet Armin Ronacher would agree that an MIT licensed library is even freer than an LGPL licensed library. To them, the vector is running from free to freer.
Why does anyone need his new library? They can do what he did and make their own.
I'm glad we can fork things at a point and thumb our noses at those who wish to cash in on other's work.
Why would I make my own? The new library is released under MIT license and faster than the old one.
If you decide to improve it in any way to fit your needs you can merely tell your own AI to re-implement it with your changes. Then it's proprietary to you.
I feel like the licenses that suffer the most aren't the GPL, but the ones like SSPL. If your code can be re-implemented easily and legally by AWS using an LLM, why risk publishing it?
It does feel like open source is about to change. My hunch is that commercial open source (beyond the consultation model) risks disappearing. Though I'd be happy to be proven wrong.
LGPL is dead, long live the AI rewrites of your barely open source code
Buried in here: Mark Pilgrim suddenly reappearing after his sudden disappearance years ago! Has he been up to anything since then?
See also "A Declaration of the Independence of Cyberspace" (https://www.eff.org/cyberspace-independence), and what a goofy, naive, misguided disaster that early internet optimism turned into.
No, AI does not mean the end of either copyright or copyleft, it means that the laws need to catch up. And they should, and they will.
I don't think this part is correct: "If you distribute modified code, or offer it as a networked service, you must make the source available under the same terms."
That's what something like AGPL does.
One of the things that irks me about this whole thing is, if it’s so clean room and distinct, why make the changes to the existing project? Why not make an entirely new library?
The answer to that, I think, is that the authors wanted to squat an existing successful project and gain a platform from it. Hence we have news cycle discussing it.
Nobody cares about a new library using AI, but squash an existing one with this stuff, and you get attention. It’s the reputation, the GitHub stars, whatever
I mean, Blanchard was the longtime maintainer of chardet already, and had wanted to relicense it for years. So I think that complicates your picture of "squatting an existing successful project".
Honestly it's a weird test case for this sort of thing. I don't think you'd see an equivalent in most open source projects.
I agree. But you can't copyright goodwill and reputation. Trademark does provide some protection there, right?
Easy solution for now:
Add something like this to NEW GPL/BSD/MIT licenses:
'you are forbidden from reimplementing it with AI'
or just:
'all clones, reimplementations with AI etc must still be GPL'
I'm less concerned about AI eroding copyleft and more excited about AI eroding copyright.
A lot of untagged IANAL takes here today.
Someone be brave, and do this to ZFS. Poke the Oracle bear!
Perhaps software patents may play an even bigger role in the future.
Or, hopefully, even less of a role.
Imagine if the author has his way, and when we have AI write software, it becomes legally under the license of some other sufficiently similar piece of software. Which may or may not be proprietary. "I see you have generated a todo app very similar to Todoist. So they now own it." That does not seem like a good path either for open source software or for opening up the benefits of AI generated software.
That's a non-sequitur. chardet v7 is GPL-derived work (currently in clear violation of the GPL). If xe wanted it to be a different thing xe should've published as such. Simple as.
What if someone doesn't declare that it has been reimplemented using an LLM? Isn't it enough to simply declare that you have reimplemented the software without using an LLM? Good luck proving that in court...
One thing is certain, however: copyleft licenses will disappear: If I can't control the redistribution of my code (through a GPL or similar license), I choose to develop it in closed source.
Arguably, the GPL has always been the wrong choice if you want to authoritatively control redistribution.
If the model wasn't trained on copyleft, if he didn't use a copyleft test suite and if he wasn't the maintainer for years. Clearly the intent here is copyright infringement.
If you have software, your test suite should be your test suite: you develop with a test suite and then release under MIT without releasing one. Depending on the test suite, using it may break clean room rules, especially for TDD codebases.
I think what is happening is the collapse of the “greater good”. Open source is dependent upon providing information for the greater good and general benefit of its readers. However now that no one is reading anything, its purpose is for the greater good of the most clever or most convincing or richest harvester.
shall we now have to think about the tradeoffs in adopting
- proprietary
- free
- slop-licensed
software?
We should just use LLMs to free more software and HW. Make it work against the system.
> Ronacher notes this as an irony and moves on. But the irony cuts deeper than he lets on. Next.js is MIT licensed. Cloudflare's vinext did not violate any license—it did exactly what Ronacher calls a contribution to the culture of openness, applied to a permissively licensed codebase. Vercel's reaction had nothing to do with license infringement; it was purely competitive and territorial. The implicit position is: reimplementing GPL software as MIT is a victory for sharing, but having our own MIT software reimplemented by a competitor is cause for outrage. This is what the claim that permissive licensing is “more share-friendly” than copyleft looks like in practice. The spirit of sharing, it turns out, runs in one direction only: outward from oneself.
This argument makes no sense. Are they arguing that because Vercel, specifically, had this attitude, this is an attitude necessitated by AI, reimplementation, and those who are in favor of it towards more permissive licenses? That certainly doesn't seem to be an accurate way to summarize what antirez or Ronacher believe. In fact, under the legal and ethical frameworks (respectively) that those two put forward, Vercel has no right to claim that position and no way to enforce it, so it seems very strange to me to even assert that this sort of thing would be the practical result of AI reimplementations. This seems to just be pointing towards the hypocrisy of one particular company, and assuming that this would be the inevitable, universal attitude and result when there's no evidence to think so.
It's ironic, because antirez actually literally addresses this specific argument. They completely miss the fact that a lot of his blog post is not actually just about legal but also about ethical matters. Specifically, the idea he puts forward is that yes, corporations can do these kinds of rewrites now, but they always had the resources and manpower to do so anyway. What's different now is that individuals can do this kind of rewrite when they never had the ability to do so before, and the vector of such a rewrite can be from permissive to copyleft, or even from decompiled proprietary code to permissive or copyleft. The fact that it hasn't gone that way so far is more a result of most people really hating copyleft and finding it annoying, and of copyleft losing traction and developer mind share for decades, not that this tactic can't be used that way. I think that's actually one of the big points he's trying to make with his GNU comparison. It's not just that if it was legal for GNU to do it, then it's legal for you to do with AI. It's not even just the fundamental libertarian ethical axiom (that I agree with for the most part) that it should remain legal to do such a rewrite in either direction, because in terms of the fundamental axioms that we enforce with violence in our society, there should be a level playing field where we look at the action itself and not just whether we like or dislike the consequences. It's specifically the fact that if GNU did it once with the ability to rewrite things, it can be done again, even in the same direction, and now even more easily using AI.
> They completely miss the fact that a lot of his blog post is not actually just about legal but also about ethical matters.
Honestly I was confused about the summarization of my blog post into just a legal matter as well. I hope my blog post will be able to flash at least a short time in the HN front page so that the actual arguments it contains will get a bit more exposure.
I'm failing to see what in the quoted text you took to be about AI rewrites specifically? It just reads as a slightly catty aside about the social reaction of rewrites in general (by implying the one example is generalizable.)
Perhaps we should finally admit that copyright has always been nonsense, and abolish this ridiculous measure once and for all
Actually I think the last 20 years of the Internet demonstrates that copyright is more important than ever, because unless it's enforced, people with more capital than the copyright owner will simply steal creative works and profit from them.
The idea that "information wants to be free" was always a lie, meant to transfer value from creators to platform owners. The result of that has been disastrous, and it's long past time to push the pendulum in the other direction.
IMO the core idea of copyright isn't nonsense, but I do think the current implementation (70+ years after death) is egregiously overpowered. I've always thought the current laws were too deeply entrenched to ever change, but I'm tentatively optimistic AI will shock the system hard enough to trigger actual reform.
I think AI is very much eroding the legitimacy of copyright - at least for software, which has long been questioned since it's more like math than creative expression.
I think the industry will realize that it made a huge mistake by leaning on copyright for protection rather than on patents.
Probably a wiser approach is to consider different times require different measures (in general!).
I did not study in detail if copyright "has always been nonsense", but I do agree that nowadays some of the copyright regulations are nonsense (for example the very long duration of life + 70 years)
I think we're going one step too far even, AI itself is a gray area and how can they guarantee it was trained legally or if it's even legal what they're doing and how can they assert that the input training data didn't contain any copyrighted data.
Google already spent billions of dollars and decades of lawyer hours proving it out as fair use. The legal challenges we see now are the dying convulsions of an already broken system of publishers and IP hoarders using every resource at their disposal to manipulate authors and creators and the public into thinking that there's any legitimacy or value underlying modern copyright law.
AI will destroy the current paradigm, completely and utterly, and there's nothing they can do to stop it. It's unclear if they can even slow it, and that's a good thing.
We will be forced to legislate a modern, digital oriented copyright system that's fair and compatible with AI. If producing any software becomes a matter of asking a machine to produce it - if things like AI native operating systems come about, where apps and media are generated on demand, with protocols as backbone, and each device is just generating its own scaffolding around the protocols - then nearly none of modern licensing, copyright, software patents, or IP conventions make any sense whatsoever.
You can't have horse and buggy traffic conventions for airplanes. We're moving in to a whole new paradigm, and maybe we can get legislation that actually benefits society and individuals, instead of propping up massive corporations and making lawyers rich.
Google has cut out some very specific rulings that have nothing to do with modern AI. These systems are just a really slow/lossy git clone, current law has no trouble with it, it's broadly illegal.
If corporations are allowed to launder someone else's work as their own, people will simply stop working and just start endlessly remixing a la popular music.