Astro - Hacker News

50 comments

kgeist 43 minutes ago
Binaries are copyrightable in both the US and the EU, and they are not technically produced by a human either, they're produced by a computer program. I honestly don't understand why this isn't extended to AI-generated code. Isn't it the same thing? One could argue that compilers merely transform source code into binaries "as is," while AI models have some "knowledge" baked in that they extract and paste as code. But there are compilers that also generate binaries by selecting ready-to-use binary patches authored by compiler developers and combining them into a program. One could also argue that, in the case of compilers, at least the input source code is authored by a human. But why can't we treat prompts as "source code in natural language" too? Where is the line between authorship and non-authorship, and how is the line defined? "Your prompt was too basic to constitute authorship" doesn't sound like an objective criterion.
Maybe for lawyers, AI is some kind of magical thing on its own. But having successfully created a working inference engine for Qwen3, and seeing how the core loop is just ~50 lines of very simple matrix multiplication code, I can't see LLMs as anything more than pretty simple interpreters that process "neural network bytecode," which can output code from pre-existing templates just like some compilers. And I'm not sure how this is different from transpilers or autogenerated code (like server generators based on an OpenAPI schema)
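As a rough illustration of the "core loop is just matrix multiplication" point, here is a toy sketch of a greedy decoding loop. It is not Qwen3 and not the commenter's engine: it omits attention, the KV cache, and tokenization, and the names `generate`, `embed`, `w_hidden`, and `w_out` as well as the 3-token weights are assumptions made up for this example.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(logits):
    """Turn raw scores into probabilities, shifted by the max for stability."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(embed, w_hidden, w_out, first_token, steps):
    """Greedy decoding: embed the last token, apply two matmuls, take the argmax."""
    tokens = [first_token]
    for _ in range(steps):
        x = embed[tokens[-1]]
        hidden = [max(0.0, h) for h in matvec(w_hidden, x)]  # ReLU nonlinearity
        probs = softmax(matvec(w_out, hidden))
        tokens.append(max(range(len(probs)), key=probs.__getitem__))
    return tokens

# Hypothetical 3-token "model" whose weights encode "next token = current + 1 (mod 3)".
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shift = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(generate(identity, identity, shift, 0, 4))  # [0, 1, 2, 0, 1]
```

Everything the toy "knows" lives in the weight tables; the loop itself only multiplies and picks the largest score, which is the sense in which the comment calls the engine an interpreter.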
Sure, if an LLM was trained on GPL code, it's possible it may output GPL-licensed code verbatim, but that's a different matter from the question of whether AI-generated code is copyrightable in principle.
Interestingly, I found an opinion here [0] that binaries technically shouldn't be copyrightable, and currently they are because:
```
  the copyright office listened to software publishers, and they wanted binaries protected by copyright so they could sell them that way
```
[0] https://freesoftwaremagazine.com/articles/what_if_copyright_...
- wahern 14 minutes ago
  
  That linked opinion overstates the case. In the real world, two different programs performing any non-trivial but functionally identical task will look substantially dissimilar in their source code, and that dissimilarity will carry over to the compiled binary, meaning what was expressive (if anything) is largely preserved. To the extent two different programs do end up with identical code, then that aspect was likely primarily functional and non-copyrightable, or at least the expressive character didn't carry over to the binary. Ordering and naming of APIs in source code can be expressive, and that indeed is often lost (literally or at least the expressive character) during the compilation process, but there are other expressive aspects to software programming that will be preserved and protected in the binary form.
  IMO, your intuition regarding AI is right--it's not a magic copyright laundering machine, and AFAIU courts have very quickly agreed that infringement is occurring. But in copyright law establishing infringement (or the possibility of infringement) is the easy, straightforward part. Copyright infringement liability is a much more complex question. Transformative uses in particular are a Fair Use, and Fair Use is technically treated as an affirmative defense to infringement.[1] If something is Fair Use, infringement is effectively presumed. But Fair Uses are typically very fact-intensive questions, and unlike the case with search engines I'm not sure we'll get to the point where there's a well-defined fence protecting "AI".
  [1] There's a scholarly pedantic debate about whether Fair Use is properly a "defense", rather than "exception" to infringement, but it walks and talks like a defense in the sense that the defendant has the burden of proving Fair Use after the plaintiff has established infringement. There's a similarly pedantic (though slightly more substantive) debate in criminal law regarding affirmative defenses. But the very term "affirmative defense" was coined to recognize and avoid these pedantic debates.
FeepingCreature 2 hours ago

> So as of today, the Copyright system does not have a way for the output of a non-human produced set of files to contain the grant of permissions which the OpenBSD project needs to perform combination and redistribution.
This seems extremely confused. The copyright system does not have a way to grant these permissions because the material is not covered under copyright! You can distribute it at will, not due to any sort of legal grant but simply because you have the ability and the law says nothing to stop you.
- plorg an hour ago
  
  This all relies, as the article points out, on everyone looking directly at code that both looks like and works like the only extant codebase for EXT4 and nonetheless concluding that in fact the computer conjured it from the aether. If I wrote a program that zipped up the Linux kernel source, unzipped it, and grepped -v for comments it would not then be magically transformed into unattributable public domain software.
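  The hypothetical pipeline in this comment can be sketched in a few lines. This is an illustration only, not anything from the article: the helper names and the `//` comment style are assumptions, and a one-file dictionary stands in for the kernel source.

```python
import io
import zipfile

def zip_roundtrip(files):
    """Pack {name: text} into an in-memory zip archive and unpack it again."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    with zipfile.ZipFile(buffer) as archive:
        return {name: archive.read(name).decode("utf-8") for name in archive.namelist()}

def drop_comment_lines(text):
    """Rough stand-in for `grep -v`: drop lines that start with // (assumed comment style)."""
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("//")]
    return "\n".join(kept)

source = {"fs.c": "// original author's header\nint x = 1;\n// a note\nint y = 2;\n"}
restored = zip_roundtrip(source)
print(restored == source)                   # True: the round trip changes nothing
print(drop_comment_lines(restored["fs.c"]))
```

  The output is the input with some lines missing, which is the point of the analogy: a purely mechanical transformation adds nothing of its own to the code that passes through it.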
- jagged-chisel 2 hours ago
  
  Eh … the argument will likely be that things created by Thing at the behest of Author are owned by the Author. It’ll take a few cases going through the courts, or an Act of Congress, to solidify this stuff.
  - wongarsu an hour ago
    
    Just like we settled on photographers having copyright on the works created by their camera. The same arguments seem to apply.
    The US Copyright Office has published a piece that argues otherwise, but a) unless they pass regulation their opinion doesn't really matter, and b) there is way too much money resting on the assumption code can be copyrighted despite AI involvement.
    
    fragmede an hour ago
    
    It's not settled. The monkey selfie copyright dispute ruled that a monkey that pressed the button to take a selfie does not and cannot own the copyright to that photo, and neither does the photographer whose camera it was. How that extends to AI-generated code is for the courts to decide, but there are some parallels to that case.
    https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
    
    wongarsu an hour ago
    
    But with the monkey there are two levels of separation from the artist: the human makes the creative decision to hand the camera to a monkey, who presses the trigger, and the camera makes the picture. Compared to the single layer of separation of a photographer choosing framing and camera parameters, pressing the trigger and the camera taking the picture. Or the zero levels of separation when the artist paints the picture.
    A programmer writing code would be like the painter, and the programmer writing a prompt for Claude looks a lot like the photographer. The prompt is the creative work that makes it copyrightable, just like the artistic choices of the photographer make the photo copyrightable
    You could argue that the prompt is more like a technical description than a creative work. But then the same should probably be true of the code itself, and consequently copyright should not apply to code at all
    The copyright office's argument is that the AI is more like a freelancer than like a machine such as a camera. Which you might equate to the monkey, who's also a bit freelancer-like. But I have my doubts that holds up in court. Monkeys are a lot more sentient than AIs.
    
    KallDrexx an hour ago
    
    The copyright office is pretty clear on this if you read: https://www.copyright.gov/ai/Copyright-and-Artificial-Intell....
    There is case law establishing that commissioning a work from another entity doesn't by itself give you co-authorship; the entity doing the work and making creative decisions is the entity that gets copyright.
    In order for you to have co-authorship of the commissioned work you have to be involved and pretty much giving instruction-level detail to the real author. The opinion shows in many cases that it's not the case with how LLM prompts work.
    The monkey selfie case is also relevant because it solidifies that non-persons cannot claim copyright; that means the LLM cannot claim copyright, and therefore it does not have copyright that can be passed on to the LLM operator.
    
    michaelmrose an hour ago
    
    The law is whatever it needs to be to satisfy monied interests, with the degree of acceptable adaptation being a function of the unity of those interests and the political ascendancy of those in favor.
    Overwhelmingly this is in favor of treating AI as a tool like Photoshop.
    Even those against AI disagree on different matters and will overwhelmingly want a cut not a different interpretation.
    
    charcircuit an hour ago
    
    This filesystem driver was made by a human using AI, not a monkey.
  - HappySweeney 2 hours ago
    
    Haven't there already been a few cases, each of which found that mechanically-produced works are not copyrightable?
    
    senko an hour ago
    
    no
- themafia an hour ago
  
  Just because you can distribute something doesn't mean you aren't violating someone else's copyright. You cannot assume that just because a language model popped out some code for you that it is clear of any other claims.
  This is just lazy copyright whitewashing.
LeFantome 2 hours ago

The article is largely about the copyright concerns of LLM generated code that was almost certainly trained on the GPL original.
Also, it is essentially an ext2 filesystem as it does not support journaling.
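For readers wondering how a journal shows up on disk: a volume records it as a feature flag in the superblock. The sketch below reads that flag from a raw image. It says nothing about how the driver in the article behaves; the offsets and constants are quoted from memory of the ext4 on-disk layout documentation and should be checked against it, and `has_journal` and `make_image` are hypothetical helpers for this example.

```python
import struct

SUPERBLOCK_OFFSET = 1024      # the superblock starts 1024 bytes into the volume
MAGIC_OFFSET = 0x38           # s_magic, little-endian 16-bit
FEATURE_COMPAT_OFFSET = 0x5C  # s_feature_compat, little-endian 32-bit
EXT_MAGIC = 0xEF53            # shared by ext2, ext3 and ext4
COMPAT_HAS_JOURNAL = 0x0004   # set when the volume carries a journal

def has_journal(image):
    """Return True if the image's superblock sets the has_journal compat flag."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    if len(sb) < FEATURE_COMPAT_OFFSET + 4:
        raise ValueError("image too small to hold a superblock")
    (magic,) = struct.unpack_from("<H", sb, MAGIC_OFFSET)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    (compat,) = struct.unpack_from("<I", sb, FEATURE_COMPAT_OFFSET)
    return bool(compat & COMPAT_HAS_JOURNAL)

def make_image(compat_flags):
    """Build a synthetic 2 KiB image with only the magic and compat fields filled in."""
    image = bytearray(2048)
    struct.pack_into("<H", image, SUPERBLOCK_OFFSET + MAGIC_OFFSET, EXT_MAGIC)
    struct.pack_into("<I", image, SUPERBLOCK_OFFSET + FEATURE_COMPAT_OFFSET, compat_flags)
    return bytes(image)

print(has_journal(make_image(COMPAT_HAS_JOURNAL)))  # True
print(has_journal(make_image(0)))                   # False
```

Note that the magic number does not distinguish ext2 from ext4; the feature flags do, which is why a driver that ignores the journal gets compared to ext2.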
ethin an hour ago

> Lacking Copyright (or similarily a Public Domain declaration by a human), we don't receive sufficient rights grants which would permit us to include it into the aggregate body of source code, without that aggregate body becoming less free than it is now.
Can someone explain this to me? I was under the impression that if a work of authorship was not copyrightable because it was AI generated and not authored by a human, it was in the public domain and therefore you could do whatever you wanted with it. Normal copyright restrictions would not apply here.
- Joel_Mckay 23 minutes ago
  
  Data obtained through theft of service or piracy from the web, along with "AI" users' content, is used in the model training sets, and when codified its statistical salience is significant if popular content is present.
  For example, when an LLM does a vector search, there is a high probability of pirated content bleed-through and isomorphic plagiarism in the high dimensional vector space results. Thus, often when you coincidentally type in "name a cartoon mouse", there is a higher probability Disney's "Mickey Mouse" will pop out in the output rather than "Mighty Mouse". Note Trademarks never expire if the fees are paid, and Disney can still technically sue anyone that messes with their mouse.
  Much like with em dashes "--", telling the current set of models to stop using them inappropriately often fails. Also, activation capping is used to improve the model's behavioral vector, and has nothing to do with the Anthropic CEO developing political ethics.
  LLMs are useful for context search, but can't function properly without constantly stealing from actual humans. Thus, they will often violate copyright, trademarks, and patents. In a commercial context it is legally irrelevant how the output has misappropriated IP, and you can bet your wallet the lawyers won't care either. No, IP is not public domain for a long time (17 to 78 years) regardless of people's delusions, even if some kid in a place like India (no software patents) thinks it is.
  This channel offers several simplified explanations of the work being done with models, and Anthropic posts detailed research papers on its website.
  https://www.youtube.com/watch?v=YDdKiQNw80c
  https://www.youtube.com/watch?v=Xx4Tpsk_fnM
  https://www.youtube.com/watch?v=JAcwtV_bFp4
  Many YC bots are poisoning discourse -- so this thread will likely get negative karma. Some LLM users seem to develop emotional or delusional relationships with the algorithms. The internet is already >52% generated nonsense and growing. =3
joshstrange an hour ago

> Who is the copyright holder in this case? It clearly draws heavily from an existing work, and it's clear the human offering the patch didn't do it. It's not the AI, because only persons can own copyright. Is it the set of people whose work was represented in the training corpus? Was the it the set of people who wrote ext4 and whose work was in the training corpus? The company who own the AI who wrote the code? Someone else?
I don't love this take. Specifically:
> it's clear the human offering the patch didn't do it
I find it hard to believe that there wasn't a good bit of "blood, sweat, and tears" invested by a human directing the LLM to make this happen. Yes, LLMs can spit out full projects in 1 prompt but that's not what happened here. From his blog the work on this spanned 5 months at least. And while he probably wasn't working on it exclusively during that time, I find it hard to believe it was him sending "continue" periodically to an LLM.
Anyone who has built something large or complicated with LLM assistance knows that it takes more than just asking the LLM to accomplish your end goal, saying "it's clear the human offering the patch didn't do it" is insulting.
I've done a number of things with the help of LLMs, in all but the most contrived of cases it required knowledge, input from me, and careful guidance to accomplish. Multiple plans, multiple rollbacks, the knowledge of when we needed to step back and when to push forward. The LLM didn't bring that to the table. It brought the ability to crank out code to test a theory, to implement a plan only after we had gone 10+ rounds, or to function as grep++ or google++.
LLMs are tools, they aren't a magic "Make me ext4 for OpenBSD"-button (or at least they sure as hell aren't that today, or 5 months ago when this was started).
g0xA52A2A 2 hours ago

Wow, that thread just kept going. Whilst the LWN article covered most of the "highlights", I think this reply from Theo is pretty succinct on the topic at large [1].
[1] https://marc.info/?l=openbsd-tech&m=177425035627562&w=2
- bt1a 2 hours ago
  
  > Lacking Copyright (or similarily a Public Domain declaration by a human), we don't receive sufficient rights grants which would permit us to include it into the aggregate body of source code, without that aggregate body becoming less free than it is now.
  That's awesome lmao
  - raggi an hour ago
    
    that's not a statement from a lawyer, and it's confused. there is one true thing in there which is that at least under US considerations the LLM output may not be copyrightable due to insufficient human involvement, but the rest of the implications are poorly extrapolated.
    there are lots of portions of code today, prior to AI authorship, that are already not copyrightable due to the way they are produced. the existence of such code does not decimate the copyright of an overall collective work.
LeFantome 2 hours ago

Vibe coding and OpenBSD. The perfect combination.
- croes 2 hours ago
  
  Vibe coding and file systems are even better
  - himata4113 2 hours ago
    
    trying to load with linux ext4 hmm doesn't load, but it works with my version!
    Must be a bug in the linux kernel, let me git clone and build an out-of-tree module...
  - LeFantome 2 hours ago
    
    Kent Overstreet has already blazed that trail.
  - api an hour ago
    
    It's clearly an experiment.
- whalesalad an hour ago
  
  I vibe-configured an Edgerouter 4 as a hot-drop box that would establish a secure tunnel and create a fake WAN for some servers that had to be temporarily pulled from service but remain operational in someone's home garage. I overnight shipped it to them with two of the ports labeled; they plugged in home internet on one port, the rack on the other port, and it secure tunneled to a Linode VPS to get a public IP, circumventing all the Verizon home internet crap. I used OpenBSD. Claude did most of the work.
cachius an hour ago

I'd like to see it AFL fuzzed and compared to the original. It took 2 hours to find the first bug ten years ago, in 2016.
Discussion then https://news.ycombinator.com/item?id=11469535
Mirror of the slides https://events.static.linuxfound.org/sites/events/files/slid...
throwatdem12311 2 hours ago

Can someone just copyright wash Windows already.
- wongarsu 2 hours ago
  
  The Windows 2000 and Windows XP sources are readily available and must have made it into the training data. But most software has dropped XP support. You really need at least some of the Win 8 and Win 10 APIs to claim compatibility with modern software, and I doubt Claude has seen those from the inside.
- greyface- 2 hours ago
  
  ReactOS did this without any need for an LLM.
  - ziml77 an hour ago
    
    No they didn't. It would be copyright washing if someone contributed to ReactOS who remembered large portions of the Windows code and wrote the ReactOS implementations based on that.
longislandguido an hour ago

~20 years ago, the Linux camp accused OpenBSD of importing GPL'd code (a wireless driver IIRC) and cried foul. The code was removed.
Fast forward to 2026, Theo says no to vibe-coded slop, prove to me your magic oracle LLM didn't ingest gobs of GPL code before spitting out an answer.
People are big mad of course, but you want me to believe Theo is the bad guy here for playing it conservatively?
- ksherlock an hour ago
  
  The history is a bit backwards but the point is good. OpenBSD Atheros wireless code was imported into Linux, the BSD attributions were removed, and it was re-declared as GPL. That was later changed back.
ptidhomme 42 minutes ago

I liked this reply in the thread:
> There's another issue surrounding developer skill atrophy or stunting that I find particularly concerning on an existential level.
> If we allow people to use LLMs to write code for a given project/platform, experience in that platform will potentially atrophy or under develop as contributors increasingly rely on out sourcing their applicable skills and decisions to "AI".
> Even if you believe out sourcing the minutia of coding is a net positive, the "enshitification" principal in general should give you pause; as soon as the net developer skill for a project has degraded to a point of reliance, even somewhat, I think we can be confident those AI tools will NOT get less expensive.
> I'd rather be independently less productive, than dependent on some MegaCorp(TM)'s good will to rent us back access to our brains at a fair price.
- achaean
https://marc.info/?l=openbsd-tech&m=177430829313972&w=2
nurettin 2 hours ago

It is amusing to see that the only concern seems to be about a confusion around licensing, not the validity or maintainability of the code itself.
- tolciho an hour ago
  Eh, well, if your guns are trained on the "copyright" portion of the ship and you can sink it from there, no need to waste ammo or time trying to figure out if code bits are as explosive as the copyright bits are. Probably the code is just as sinkable, e.g. here's a recent response to some other AI slop:
```
  I didn't look closely at most of the code but one thing that caught my eye, pid is not safe for tempfile name generation, another user of the system can easily generate files that conflict with this. Functions like mktemp and mkstemp are there for a reason. Some of the other "safety" checks make no sense. If the LLM code generator is coming up with things which any competent unix sysadmin (let alone programmer) can tell are obviously wrong, it doesn't bode well for the rest.
```
  https://marc.info/?l=openbsd-ports&m=177460682403496&w=2
  The next AI winter can't come soon enough…
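  The tempfile criticism quoted above can be illustrated with a short sketch. It is not the code under review in that message: `predictable_name` and `safe_tempfile` are hypothetical helpers, and Python's `tempfile.mkstemp` stands in for the C library function of the same name.

```python
import os
import tempfile

def predictable_name(directory, prefix="job"):
    """The criticised pattern: the path is guessable by anyone who can see the pid."""
    return os.path.join(directory, f"{prefix}.{os.getpid()}")

def safe_tempfile(directory, prefix="job."):
    """mkstemp picks a random name and creates the file itself, failing if it exists."""
    fd, path = tempfile.mkstemp(prefix=prefix, dir=directory)
    return fd, path

with tempfile.TemporaryDirectory() as workdir:
    # Same process, same answer every time: another user can pre-create this path.
    print(predictable_name(workdir) == predictable_name(workdir))  # True
    fd_a, path_a = safe_tempfile(workdir)
    fd_b, path_b = safe_tempfile(workdir)
    print(path_a != path_b)  # True: each call gets its own freshly created file
    os.close(fd_a)
    os.close(fd_b)
```

  The difference is that `mkstemp` returns an already-open descriptor for a file it created, so there is no window between choosing the name and opening it.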
- kvuj 2 hours ago
  
  How is that different than a human writing the code? Whether an AI or a human wrote it, I would expect the same bar of validity/maintainability.
  - nurettin 2 hours ago
    
    To me, SOTA is just bad at DRY, KISS, succinct, well-architected, top-down, easy-to-test code and has to be constantly steered to come close. Even the article suggests that. YMMV.
  - scuff3d 2 hours ago
    
    Because humans make design decisions; AI just bangs its head against the problem until it gets something that "works".
- g0xA52A2A 2 hours ago
  
  Is it worth the effort to review until such implications are understood?
  - nurettin 2 hours ago
    
    No of course not, bike shedding licenses is where it is at.
charcircuit 2 hours ago

>incorporate knowledge carrying an illiberal license.
Copyright prevents copying. It doesn't prevent using knowledge.
- bigfishrunning an hour ago
  
  Good luck proving an LLM has "Knowledge", and isn't just a statistical model that tries to form outputs as a copy of its training data...
hypeatei an hour ago

> This obsession with copyrights between different free software ecosystems - who put the lawyers in charge?
This comment on the article is spot on. I don't vibe code or care about AI really, but it's so exhausting to see people playing lawyer in threads about LLM-generated code. No one knows, a ton of people are using LLMs, the companies behind these models torrented content themselves, and why would you spend your time defending copyright / use it as a tool to spread FUD? Copyright is a made up concept that exists to kill competition and protect those who suck at executing on ideas.
hulitu 26 minutes ago

> Vibe-Coded Ext4 for OpenBSD
Who wants to test it? Preferably on real hardware. /s
bitwizeshift 2 hours ago

Paywalled article on something vibe-coded? That seems like a bold strategy.
- dana321 2 hours ago
  
  click to continue
CodeWriter23 2 hours ago

Well this is ironic, GPL advocate(s) declaring a clean implementation based on specifications infringing due to someone/something reading specs provided under license. Didn't Oracle lose that argument in court as pertains to Android implementation of Java libraries?
- corbet an hour ago
  
  I'm not sure what you're reading; there is a distinct lack of GPL advocates in that conversation.