> "...it would sometimes regurgitate training data verbatim. That’s been patched in the years since..."
> "They are robots. Programs. Fancy robots and big complicated programs, to be sure — but computer programs, nonetheless."
This is totally misleading to anyone less familiar with how LLMs work. They are programs only in the sense that they perform inference from a fixed, stored statistical model. It turns out that treating them theoretically in the same way as other computer programs gives a poor representation of their behaviour.
This distinction is important, because no, "regurgitating data" is not something that was "patched out", like a bug in a computer program. The internal representations became more differentially private as newer (subtly different) training techniques were discovered. There is an objective metric by which one can measure this "plagiarism" in the theory, and it isn't nearly as simple as "copying" vs "not copying".
It's also still an ongoing issue and an active area of research; see [1] for example. It is impossible for the models to never "plagiarize" in the sense we usually mean while remaining useful. But humans repeat things verbatim in little snippets all the time, so there is some threshold below which no one seems to care anymore; think of it like the similarity-percentage threshold in a tool like Turnitin. That's the point researchers would like to target.
Of course, this is separate from all of the ethical issues around training on data collected without explicit consent, and I would argue that's where the real issues lie.
[1] https://arxiv.org/abs/2601.02671
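The Turnitin-style percentage threshold described above can be sketched very roughly as an n-gram overlap metric. This is purely illustrative (the function names and the choice of whitespace tokenization are my own assumptions); real memorization audits use far more sophisticated extraction and matching techniques:

```python
# Hypothetical sketch of a verbatim-overlap metric, in the spirit of a
# Turnitin-style percentage. Names and the n=8 window are illustrative.

def ngrams(tokens, n):
    """Return the set of all contiguous n-token windows of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(output_text, corpus_texts, n=8):
    """Fraction of the output's n-grams that appear verbatim in the corpus.

    0.0 means no n-token span is copied from the corpus; 1.0 means every
    n-token span of the output occurs somewhere in the corpus.
    """
    out_grams = ngrams(output_text.split(), n)
    if not out_grams:
        return 0.0
    corpus_grams = set()
    for doc in corpus_texts:
        corpus_grams |= ngrams(doc.split(), n)
    return len(out_grams & corpus_grams) / len(out_grams)
```

A researcher targeting "the point where no one cares" would then pick some threshold (say, overlap below a few percent at a given n) rather than demanding a score of exactly zero, which the comment above argues is unattainable for a useful model.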
The plagiarism by the models is only part of it. Perhaps it's in such small pieces that it becomes difficult to care. I'm not convinced.
The larger, and I'd argue more problematic, plagiarism is when people take this composite output of LLMs and pass it off as their own.
> Translators are busy
No they're not. They're starving, struggling to find work, and lamenting that AI is eating their lunch. It's quite ironic that after complaining LLMs are plagiarism machines, the author thinks using them for translation is fine.
"LLMs are evil! Except when they're useful for me" I guess.
Simultaneously, if you hire human translators, you are likely to get machine translations. Maybe not often or overtly, but the translation industry has not been healthy for a while.
>As a quick aside, I am not going to entertain the notion that LLMs are intelligent, for any value of “intelligent.” They are robots. Programs. Fancy robots and big complicated programs, to be sure — but computer programs, nonetheless. The rest of this essay will treat them as such. If you are already of the belief that the human mind can be reduced to token regurgitation, you can stop reading here. I’m not interested in philosophical thought experiments.
I can't imagine why someone would want to openly advertise that they're so closed minded. Everything after this paragraph is just anti-LLM ranting.
I disagree that the majority of it is anti-LLM ranting, there are several subtle points here that are grounded in realism. You should read on past the first bit if you're judging mainly from the initial (admittedly naive) first few paragraphs.
I read the rest of it. It was intellectually lazy.
What's wrong about the statement? The black box algorithm might have been generated by machine learning, but it's still a computer program in the end.
> I can't imagine why someone would want to openly advertise that they're so closed minded.
It's not being closed-minded. It's not wanting to get sea-lioned to death by obnoxious people.
> I can't imagine why someone would want to openly advertise that they're so closed minded.
I would say the exact same about you. Rejecting an accurate, factual statement like that as closed-minded strikes me as the same as the people who insist that medical science is closed-minded about crystals and magnets.
I can't imagine why someone would want to openly advertise they think LLMs are actual intelligence, unless they were in a position to benefit financially from the LLM hype train of course.
Cool, so clearly articulate the goalposts. What do LLMs have to do to convince you that they are intelligent? If the answer is that no amount of evidence can change your mind, then you're not arguing in good faith.
It was actually much less anti-LLM than I was expecting from the beginning.
But I agree that it is self limiting to not bother to consider the ways that LLM inference and human thinking might be similar (or not).
To me, they seem to do a pretty reasonable emulation of single-threaded thinking.
> I can't imagine why someone would want to openly advertise that they're so closed minded.
Because humans often anthropomorphize completely inert things? E.g. a coffee machine or a bomb disposal robot.
So far whatever behavior LLMs have shown is basically fueled by Sci-Fi stories of how a robot should behave under such and such.
Give it up. Buddha would not approve.
And there will be more compute for the rest of us :)
Can we as a group agree to stop upvoting "AI is great" and "AI sucks" posts that don't make novel, meaningful arguments that provoke real thought? The plagiarism argument is thin and feels biased, the lock-in argument is counter to the market dynamics that are currently playing out, and in general the takes are just one dude's vibes.
I dunno, I enjoyed reading about how the author personally feels about the act of working with them more than the whole "is this moral" part.
I don't know, this one is a little novel. I've never seen the developer of a Buddhist meditation app discuss whether to use LLMs with a paragraph like:
> Pariyatti’s nonprofit mission, it should be noted, specifically incorporates a strict code of ethics, or sīla: not to kill, not to steal, not to engage in sexual misconduct, not to lie, and not to take intoxicants.
Not a whole lot of Pali in most LLM editorials.
> not to engage in sexual misconduct
I must remember to add this quality guarantee to my own software projects.
My software projects are also uranium-free.
> The plagiarism argument is thin and feels biased
are you being serious with this one
If you're already sold on the plagiarism narrative that big entertainment is trying to propagandize in order to get leverage against the tech companies, nothing I say is going to change your mind.
I don't really know what you mean by "big entertainment" trying to get leverage against tech companies. Tech companies are behemoths. Most of the artists I know fretting about AI don't earn half a junior engineer's salary. And this is coming from someone who is relatively bullish on AI. I just don't think the framing of "big entertainment" makes any sense at all.
I stopped reading after "problem with LLMs is plagiarism"...
Too bad. You missed some interesting stuff. And I say that as someone who sees some of this very differently than the author.
Announcing that one line of the piece made you mad without providing any other thought is not very constructive.
> LLMs will always be plagiarism machines but in 40 years we might not care.
40 years?
Virtually nobody cares about this already... today.
(I'm not refuting the author's claim that LLMs are built on plagiarism, just noting how the world has collectively decided to turn a blind eye to it)
> As a quick aside, I am not going to entertain the notion that LLMs are intelligent, for any value of “intelligent.” They are robots. Programs. Fancy robots and big complicated programs, to be sure — but computer programs, nonetheless.
The same could be said of humans too. Humans are made of cells that work deterministically. Sure, humans are fancy, big complicated combinations of cells - but they're cells, nonetheless.
That view of humans - and LLMs - ignores the fact that when you combine large numbers of simple building blocks, you can get completely novel behavior. Protons, neutrons and electrons come together to create chemistry. Molecules come together to create biological systems. A bunch of neurons taken together created the poetry of Shakespeare.
Unless you have a dualistic view of the world, in which the mind is a separate realm that exists independently of matter and does not arise from neurons interacting in our brains, you have to accept that robots can be intelligent. Just to put this more sharply: Would a perfect simulation of a human brain be intelligent or not? If you answer "no," then you believe that thought comes from some other, immaterial realm, not from our brains.
> That view of humans - and LLMs - ignores the fact that when you combine large numbers of simple building blocks, you can get completely novel behavior.
I can bang smooth rocks to get sharper rocks; that doesn't make sharper rocks more intelligent. Makes them sharper, though.
Which is to say, novel behavior != intelligence.
Yes, that seems to hold for rocks. But that doesn’t shut down the original post’s premise, unless you hold the answer to what can and cannot be banged together to create emergent intelligence.