For context, two days ago some users [1] discovered this sentence reiterated throughout the codex 5.5 system prompt [2]:
> Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.
Does nobody else laugh that a company supposedly worth more than almost anything else at the moment, is basically hacking around a load of text files telling their trillion dollar wonder machine it absolutely must stop talking to customers about goblins, gremlins and ogres? The number one discussion point, on the number one tech discussion site. This literally is, today, the state of the art.
McKenna looks more correct everyday to me atm. Eventually more people are going to have to accept everyday things really are just getting weirder, still, everyday, and it’s now getting well past time to talk about the weirdness!
It can be funny but it should not be surprising. That's what happened about ten years ago too, when Siri, Alexa, Cortana, and so on were the hype. Big tech companies publicly tried to outclass each other has having the best AI, so it was not about doing proper research and development, it was about building hacks, like giant regex databases for request matching.
Is this the "prompt engineering" that I keep hearing will be an indispensable job skill for software engineers in the AI-driven future? I had better start learning or I'll be replaced by someone who has.
I wonder how much energy OpenAI spends each day on pink elephant paradoxing goblins. A prompt like that will preoccupy the LLM with goblins on every request.
Prompt engineering is mostly structured thought. Can you write a lab report? Can you describe the who, what, when, where, and why of a problem and its solution?
You can get it to work with one off commands or specific instructions, but I think that will be seen as hacks, red flags, prompt smells in the long term.
Spoiler: future versions of mainstream AI's will be fine tuned in the exact same way to subtly sneak in favorable mentions of sponsored products as part of their answers. And Chinese open-weight AIs will do the exact same thing, only about China, the Chinese government and the overarching themes of Xi Jinping Thought.
I'd like to see them explain why AI have so distinctive writing style that is very easy to detect most of the time. Even though, it had immense progress in coding, it didn't get better at writing.
Would love if OpenAI did more of these types of posts. Off the top of my head, I'd like to understand:
- The sepia tint on images from gpt-image-1
- The obsession with the word "seam" as it pertains to coding
Other LLM phraseology that I cannot unsee is Claude's "___ is the real unlock" (try google it or search twitter!). There's no way that this phrase is overrepresented in the training data, I don't remember people saying that frequently.
It was always funny how easy it was to spot the people using a Studio Ghibli style generated avatar for their Discord or Slack profile, just from that yellow tinging. A simple LUT or tone-mapping adjustment in Krita/Photoshop/etc. would have dramatically reduced it.
The worst was you could tell when someone had kept feeding the same image back into chatgpt to make incremental edits in a loop. The yellow filter would seemingly stack until the final result was absolutely drenched in that sickly yellow pallor, made any photorealistic humans look like they were all suffering from advanced stages of jaundice.
All GPTisms are like that. In moderation there's nothing wrong with any of them. But you start noticing them because a lot of people use these things, and c/p the responses verbatim (or now use claws, I guess). So they stand out.
I don't think it's training data overrepresentation, at least not alone. RLHF and more broadly "alignment" is probably more impactful here. Likely combined with the fact that most people prompt them very briefly, so the models "default" to whatever it was most straight-forward to get a good score.
I've heard plenty of "the system still had some gremlins, but we decided to launch anyway", but not from tens of thousands of people at the same time. That's "the catch", IMO.
Maybe the only solution to GPTisms is infinite context. If I'm talking to my coworker every day I would consciously recognize when I already used a metaphor recently and switch it up. However if my memory got reset every hour, I certainly might tell the same story or use the same metaphor over and over.
Claude, at least 4.5, not checked recently, has/had an obsession with the number 47 (or number containing 47). Ask it to pick a random time or number, or write prose containing numbers, and the bias was crazy.
Humans tend to be biased towards 47 as well. It’s almost halfway between 1 and 100 and prime so you’ll find people picking it when they have to choose a random number.
> the term originates from Michael Feathers Working Effectively with Legacy Code
I haven’t read the book but, taking the title and Amazon reviews at face value, I feel like this embodies Codex’s coding style as a whole. It treats all code like legacy code.
One I saw recently was "wires" and "wired" from opus.
It was using it like every 3rd sentence and I was like, yeah I have seen people say wired like this but not really for how it was using it in every sentence.
GPT started to ‘wire in’ stuff around 5.2 or 5.3 and clearly Opus, ahem, picked it up. I remember being a tiny bit shocked when I saw ‘wired’ for the first time in an Anthropic model.
> We unknowingly gave particularly high rewards for metaphors with creatures.
I recall a math instructor who would occasionally refer to variables (usually represented by intimidating greek letters) as "this guy". Weirdly, the casual anthropomorphism made the math seem more approachable. Perhaps 'metaphors with creatures' has a similar effect i.e. makes a problem seem more cute/approachable.
On another note, buzzwords spread through companies partly because they make the user of the buzzword sound smart relative to peers, thus increasing status. (examples: "big data" circa 2013, "machine learning" circa 2016, "AI" circa 2023-present..).
The problem is the reputation boost is only temporary; as soon as the buzzword is overused (by others or by the same individual) it loses its value. Perhaps RLHF optimises for the best 'single answer' which may not sufficiently penalise use of buzzwords.
A decade ago I gave a presentation on automata theory. I demonstrated writing arbitrary symbols to tape with greek letters, just like I’d learned at university. The audience was pretty confused and didn’t really grok the presentation. A genius communicator in the audience advised me to replace the greek letters with emoji… I gave the same presentation to the same demographic audience a week later and it was a smash hit, best received tech talk I’ve given. That lesson has always stuck with me.
I had a similar experience explaining logic, especially nested expressions, with cats and boxes. Also for showing syntactic versus semantic. We _can_ use cats if we wanted and retain the semantics. Also my proudest moment as a teacher was students producing a meme based on some of the discrete mathematics on graphs. They understood the point well enough to make a joke of it.
> I recall a math instructor who would occasionally refer to variables (usually represented by intimidating greek letters) as "this guy".
I also had an instructor who was doing that! This was 20 years ago, and I totally forgot about it until I have read your comment. Can’t remember the subject, maybe propositional logic? I wonder if my instructor and your instructor have picked up this habit from the same source.
They give everyone the false and very misleading impression that with One prompt all kinds of complexity minimizes. Its a bed time story for children.
Ashby's Law of Requisite Variety
asserts that for a system to effectively regulate or control a complex environment, it must possess at least as much internal behavioral variety (complexity) as the environment it seeks to control.
This is what we see in nature. Massive variety. Thats a fundamental requirement of surviving all the unpredictablity in the universe.
So the word is actually semantically very close to "bug"! I guess we could still be using it, but the word's just too long for something that is one of the most used terms in software development.
At this point, picking that specific word is not at all a random quirk, as it's using the word literally like it's originally intended to be used.
> the evidence suggests that the broader behavior emerged through transfer from Nerdy personality training.
> The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them
> Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.
Sounds awfully like the development of a culture or proto-culture. Anyone know if this is how human cultures form/propagate? Little rewards that cause quirks to spread?
Just reading through the post, what a time to be an AInthropologist. Anthropologists must be so jealous of the level of detailed data available for analysis.
Also, clearly even in AI land, Nerdz Rule :)
PS: if AInthropologist isn't an official title yet, chances are it will likely be one in the near future. Given the massive proliferation of AI, it's only a matter of time before AI/Data Scientist becomes a rather general term and develops a sub-specialization of AInthropologist...
There is no word anthropodes. :) I guess it would mean man-feet. Antipodes is opposite-feet, literally. Synthetipologist looks to me like a portmanteau of synthetic and apologist. Otherwise the -po- in it comes from nowhere.
Sensible boring versions of this like synthesilogy just end up meaning the study of synthesis. I reckon instead do something with Talos, the man made of bronze who guarded Crete from pirates and argonauts. Talologist, there you go.
yeah I realized that when I looked up podes downthread. I still like synthetologist better than talologist, in general no one in the common folk knows who Talos is.
You're probably right. There's things that are correct, and then there's things people think they know, which win and become true. We already have "synths", after all, which are keyboards. Though that adds to the vagueness of synthetologist, because maybe it refers to Rick Wakeman or Giorgio Moroder.
Yeah, I realize that's more correct. I also realized when someone else downthread bastardized it into synthropologist that the podes part has entirely to do with feet and nothing to do with beings, necessarily. Anthro- -podes is more what I had in mind, not as a pluralization of anthropos.
So unless the AI has feet you wouldn't study Synthetipology.
But since when is there a synthetos? Since right now, I guess. Shrug But you know it's from the same root as thesis, and synthesis (or a more proper ancient Greek spelling) is the noun and doesn't end in -os.
That's not how the Greek word stems work. Technically it would not be synthetipologist, it would more accurately just be Synthetologist, as the Greek podes suffix means having feet.
> Synthetipologists, those who study Synthetic beings.
I see you took the prudent approach of recognizing the being-ness of our future overlords :) ("being" wasn't in your first edit to which I responded below...)
Still, a bit uninspired, methinks. I like AInthropologist better, and my phone's keyboard appears to have immediately adopted that term for the suggestions line. Who am I to fight my phone's auto-suggest :-)
I might have to hard disagree on this one, since my understanding of state machines (the technical term [1] [2]) is that they are determistic, while LLMs (the ai topic of discussion) are probabilistic in most of the commercial implementations that we see.
“The problem with defending the purity of the English language is that English is about as pure as a cribhouse wh***. We don’t just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary.”* --James D. Nicoll
That's fair. Was trying to be funny, so glossed over the difference. Leaving my post above unedited/undeleted as a testament to your precision, and evidence of my folly.
Onwards; more appropriate rebuttals:
"English is a precision instrument assembled from spare parts during a thunderstorm." --ChatGPT
“If the English language made any sense, a catastrophe would be an apostrophe with fur.” -- Doug Larson
I don't think humans are smart enough to be AInthropologists. The models are too big for that.
Nobody really understands what's truly going on in these weights, we can only make subjective interpretations, invent explanations, and derive terminal scriptures and morals that would be good to live by. And maybe tweak what we do a little bit, like OpenAI did here.
I wondered how is training data balanced? If you put in to much Wikipedia, and your model sounds like a walking encyclopedia?
After doing the Karpathy tutorials I tried to train my AI on tiny stories dataset. Soon I noticed that my AI was always using the same name for its stories characters. The dataset contains that name consistently often.
At this scale, that kind of thing is not really a problem; you just dump all of the data you can find into the model (pre-training)1. Of course, the pre-training data influences the model, but the reinforcement learning is really what determines the model’s writing style and, in general, how it “thinks” (post-training).
This is a worry that people have been talking about in various forms for a while now, and I think it's a gigantic one. The only reason this was caught is that the quirk was a very noticeable verbal one. When words like "goblin" and "gremlin" pop up it is easy for us to spot. If the quirk takes another shape (say, ranking certain people with certain features as less trustworthy) it might be too subtle or too weird for us to notice it. Would I ever notice if ChatGPT consistently rates people born in June to be untrustworthy?
I suspected OpenAI was actively training their models to be cringy in the thought that it's charming. Turns out it's true. And they only see a problem when it narrows down on one predicliction. But they should have seen it was bad long before that.
This is funny because it’s a silly topic, but I think it shows something extremely seriously wrong with llms.
The goblins stand out because it’s obvious. Think of all the other crazy biases latent in every interaction that we don’t notice because it’s not as obvious.
Absolutely terrifying that OpenAI is just tossing around that such subtle training biases were hard enough to contain it had to be added to system prompt.
> Absolutely terrifying that OpenAI is just tossing around that such subtle training biases were hard enough to contain it had to be added to system prompt.
May I introduce you to homo sapiens, a species so vulnerable to such subtle (or otherwise) biases (and affiliations) that they had to develop elaborate and documented justice systems to contain the fallouts? :)
We’re really not that vulnerable to such things as a species, because we as individuals all have our own minds and our own sets of biases that cancel out and get lost in the noise. If we all had the exact same bias then it would be a huge problem.
I hear you but of course history is full of examples of biases shared across large groups of people resulting in huge human costs.
The analogy isn’t perfect of course but the way humans learn about their world is full of opportunities to introduce and sustain these large correlated biases—social pressure, tradition, parenting, education standardization. And not all of them are bad of course, but some are and many others are at least as weird as stray references to goblins and creatures
Now imagine that every opinion you have is automatically fully groupthinked and you see the difference/problem with training up a big AI model that has a hundred million users.
The problem does exist when using individual humans but in a much smaller form.
> We’re really not that vulnerable to such things as a species, because we as individuals all have our own minds and our own sets of biases that cancel out and get lost in the noise.
[Citation Needed]
Just because if you have a species-wide bias, people within the species would not easily recognize it. You can't claim with a straight face that "we're really not that vulnerable to such things".
For example, I think it's pretty clear that all humans are vulnerable to phone addiction, especially kids.
Mandatory reading on that topic: www.anthropic.com/research/small-samples-poison
We're probably not noticing a LOT of malicious attempts at poisoning major AI's only because we don't know what keywords to ask (but the scammers do and will abuse it).
I think it's extraordinarily telling that people are capable of being reflexively pessimistic in response to the goblin plague. It's like something Zitron would do.
Doesn't seem that surprising or terrifying to me. Humans come equipped with a lot more internal biases (learned in a fairly similar fashion), and they're usually a lot more resistant to getting rid of them.
The truly terrifying stuff never makes it out of the RLHF NDAs.
We ought to be terrified, when one adjusts for All the use cases people are talking about using these algorithms in. (Even if they ultimately back off, it's a lot of frothy bubble opportunity cost.)
There a great many things people do which are not acceptable in our machines.
Ex: I would not be comfortable flying on any airplane where the autopilot "just zones-out sometimes", even though it's a dysfunction also seen in people.
>Ex: I would not be comfortable flying on any airplane where the autopilot "just zones-out sometimes", even though it's a dysfunction also seen in people.
You might if that was the best auto-pilot could be. Have you never used a bus or taken a taxi ?
The vast majority of things people are using LLMs for isn't stuff deterministic logic machines did great at, but stuff those same machines did poorly at or straight up stuff previously relegated to the domains of humans only.
If your competition also "just zones out sometimes" then it's not something you're going to focus on.
I wish the blog mentioned more about why exactly training for nerdy personality rewarded mention of goblins. Since it's probably not a deterministic verifiable reward, at their level the reward model itself is another LLM. But this just pushes the issue down one layer, why did _that_ model start rewarding mentions of goblin?
> I wish the blog mentioned more about why exactly training for nerdy personality rewarded mention of goblins. Since it's probably not a deterministic verifiable reward, at their level the reward model itself is another LLM. But this just pushes the issue down one layer, why did _that_ model start rewarding mentions of goblin?
Speculation: because nerds stereotypically like sci-fi and fantasy to an unhealthy degree, and goblins, gremlins, and trolls are fantasy creatures which that stereotype should like? Then maybe it hit a sweet spot where it could be a problem that could sneak up on them.
Perhaps it has something to do with recent human trends for saying "goblin" or "gremlin" to describe... basically the opposite of dignified and socially acceptable behavior, like hunching under a blanket, unshowered, playing video games all day and eating shredded cheese directly out of the bag.
The fact that it was strongly associated with the "nerdy" personality makes me think of this connection.
is a kv cache not a kind of state? what does statefulness have to do with selfhood? how does a system prompt work at all if these things have no reference to themselves?
Ahh I see. I guess when I turned off privacy settings and allowed training on my code, then generated 10 million .md files with random fantasy books, the poisoning worked.
Wherein OpenAI admits they have very little understanding of how their models’ personality develops. And implicitly admit it’s not all that important to them, except when it gets so out of hand that they get caught making blunt corrections.
> You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking.
Just; the mentality required to write something like that, and then base part of your "product" on it. Is this meant to be of any actual utility or is it meant to trap a particular user segment into your product's "character?"
The explanation is very concerning. Lexical tidbits shouldn’t be learnt and reinforced across cross sections. Here, gremlin and goblin went from being selected for in the nerdy profile to being selected for in all profiles. The solution was easy: don’t mention goblins.
But what about when the playful profile reinforces usage of emoji and their usage creeps up in all other profiles accordingly? Ban emoji everywhere? Now do the same thing for other words, concepts, approaches? It doesn’t scale!
For context, two days ago some users [1] discovered this sentence reiterated throughout the codex 5.5 system prompt [2]:
> Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query.
[1] https://x.com/arb8020/status/2048958391637401718
[2] https://github.com/openai/codex/blob/main/codex-rs/models-ma...
Does nobody else laugh that a company supposedly worth more than almost anything else at the moment, is basically hacking around a load of text files telling their trillion dollar wonder machine it absolutely must stop talking to customers about goblins, gremlins and ogres? The number one discussion point, on the number one tech discussion site. This literally is, today, the state of the art.
McKenna looks more correct everyday to me atm. Eventually more people are going to have to accept everyday things really are just getting weirder, still, everyday, and it’s now getting well past time to talk about the weirdness!
It can be funny but it should not be surprising. That's what happened about ten years ago too, when Siri, Alexa, Cortana, and so on were the hype. Big tech companies publicly tried to outclass each other has having the best AI, so it was not about doing proper research and development, it was about building hacks, like giant regex databases for request matching.
Is this the "prompt engineering" that I keep hearing will be an indispensable job skill for software engineers in the AI-driven future? I had better start learning or I'll be replaced by someone who has.
If you aren't telling your computer to ignore goblins, you're going to be left behind.
I wonder how much energy OpenAI spends each day on pink elephant paradoxing goblins. A prompt like that will preoccupy the LLM with goblins on every request.
I mean probably not or they wouldn’t have shipped it, right?
Prompt engineering is mostly structured thought. Can you write a lab report? Can you describe the who, what, when, where, and why of a problem and its solution?
You can get it to work with one off commands or specific instructions, but I think that will be seen as hacks, red flags, prompt smells in the long term.
If I could do those things, I wouldn't be using an LLM to write for me, now would I?
Spoiler: future versions of mainstream AI's will be fine tuned in the exact same way to subtly sneak in favorable mentions of sponsored products as part of their answers. And Chinese open-weight AIs will do the exact same thing, only about China, the Chinese government and the overarching themes of Xi Jinping Thought.
Lol yeah it's kinda hilarious actually. This timeline gets a lot of well-earned shit, but it really nails the comic relief, I'll give it that!
Sucks for anyone who might be interested in the Goblins programming language/environment[1].
[1] https://spritely.institute/goblins/
I'd like to see them explain why AI have so distinctive writing style that is very easy to detect most of the time. Even though, it had immense progress in coding, it didn't get better at writing.
Would love if OpenAI did more of these types of posts. Off the top of my head, I'd like to understand:
- The sepia tint on images from gpt-image-1
- The obsession with the word "seam" as it pertains to coding
Other LLM phraseology that I cannot unsee is Claude's "___ is the real unlock" (try google it or search twitter!). There's no way that this phrase is overrepresented in the training data, I don't remember people saying that frequently.
It was always funny how easy it was to spot the people using a Studio Ghibli style generated avatar for their Discord or Slack profile, just from that yellow tinging. A simple LUT or tone-mapping adjustment in Krita/Photoshop/etc. would have dramatically reduced it.
The worst was you could tell when someone had kept feeding the same image back into chatgpt to make incremental edits in a loop. The yellow filter would seemingly stack until the final result was absolutely drenched in that sickly yellow pallor, made any photorealistic humans look like they were all suffering from advanced stages of jaundice.
For context, an example of what happens when you feed the same image back in repeatedly: https://www.instagram.com/reels/DJFG6EDhIHs/
Haha fantastic. I'd love to see a comparison reel of that same image-loop for the entire image gen series (gpt-image-1, gpt-image-1.5, gpt-image-2).
Fixed points are a window to the soul of a LLM
- Lucretius in "De rerum natura", probably
I like how the AI seems forced to change their ethnicity to keep up with the color changes. Absolutely wild.
Its called the piss filter
All GPTisms are like that. In moderation there's nothing wrong with any of them. But you start noticing them because a lot of people use these things, and c/p the responses verbatim (or now use claws, I guess). So they stand out.
I don't think it's training data overrepresentation, at least not alone. RLHF and more broadly "alignment" is probably more impactful here. Likely combined with the fact that most people prompt them very briefly, so the models "default" to whatever it was most straight-forward to get a good score.
I've heard plenty of "the system still had some gremlins, but we decided to launch anyway", but not from tens of thousands of people at the same time. That's "the catch", IMO.
Maybe the only solution to GPTisms is infinite context. If I'm talking to my coworker every day I would consciously recognize when I already used a metaphor recently and switch it up. However if my memory got reset every hour, I certainly might tell the same story or use the same metaphor over and over.
The one phrase that irks me as overly dramatic and both GPT and Claude use it a lot is "__ is the real smoking gun!"
I'm a non-native English speaker, so maybe it's a really common idiom to use when debugging?
It probably was found in a bunch of meaningful code commit messages
Claude, at least 4.5, not checked recently, has/had an obsession with the number 47 (or number containing 47). Ask it to pick a random time or number, or write prose containing numbers, and the bias was crazy.
Also "something shifted" or "cracked".
Humans tend to be biased towards 47 as well. It’s almost halfway between 1 and 100 and prime so you’ll find people picking it when they have to choose a random number.
Then there’s the whole Pomona College thing https://en.wikipedia.org/wiki/47_(number)
Maybe Claude is just a fan of Alias.
>with the word "seam" as it pertains to coding
I thought this was an established term when it comes to working with codebases comprised of multiple interacting parts.
https://softwareengineering.stackexchange.com/questions/1325...
thanks for this.
> the term originates from Michael Feathers Working Effectively with Legacy Code
I haven’t read the book but, taking the title and Amazon reviews at face value, I feel like this embodies Codex’s coding style as a whole. It treats all code like legacy code.
I can't say it isn't, but I have been writing code since about 2004 and this is the first time I've become aware that this is a thing.
One I saw recently was "wires" and "wired" from opus.
It was using it like every 3rd sentence and I was like, yeah I have seen people say wired like this but not really for how it was using it in every sentence.
GPT started to ‘wire in’ stuff around 5.2 or 5.3 and clearly Opus, ahem, picked it up. I remember being a tiny bit shocked when I saw ‘wired’ for the first time in an Anthropic model.
The number of things that Claude has told me are 'load-bearing' or 'belt-and-suspenders' is... very load-bearing
for me, doing the heavy lifting is doing the heavy lifting
Also too many lands and hits.
Seams, spirals, codexes, recursion, glyphs, resonance, the list goes on and on.
Ask any LLM for 10 random words and most of them will give you the same weird words every time.
If you lower the temperature setting, it really will be the same 10 words every single attempt. :p
They are text completion algorithms with little randomness.
"shape" too, at least with gpt5.5, is coming up constantly.
> We unknowingly gave particularly high rewards for metaphors with creatures.
I recall a math instructor who would occasionally refer to variables (usually represented by intimidating greek letters) as "this guy". Weirdly, the casual anthropomorphism made the math seem more approachable. Perhaps 'metaphors with creatures' has a similar effect i.e. makes a problem seem more cute/approachable.
On another note, buzzwords spread through companies partly because they make the user of the buzzword sound smart relative to peers, thus increasing status. (examples: "big data" circa 2013, "machine learning" circa 2016, "AI" circa 2023-present..).
The problem is the reputation boost is only temporary; as soon as the buzzword is overused (by others or by the same individual) it loses its value. Perhaps RLHF optimises for the best 'single answer' which may not sufficiently penalise use of buzzwords.
A decade ago I gave a presentation on automata theory. I demonstrated writing arbitrary symbols to tape with greek letters, just like I’d learned at university. The audience was pretty confused and didn’t really grok the presentation. A genius communicator in the audience advised me to replace the greek letters with emoji… I gave the same presentation to the same demographic audience a week later and it was a smash hit, best received tech talk I’ve given. That lesson has always stuck with me.
I had a similar experience explaining logic, especially nested expressions, with cats and boxes. Also for showing syntactic versus semantic. We _can_ use cats if we wanted and retain the semantics. Also my proudest moment as a teacher was students producing a meme based on some of the discrete mathematics on graphs. They understood the point well enough to make a joke of it.
> I recall a math instructor who would occasionally refer to variables (usually represented by intimidating greek letters) as "this guy".
I also had an instructor who was doing that! This was 20 years ago, and I totally forgot about it until I have read your comment. Can’t remember the subject, maybe propositional logic? I wonder if my instructor and your instructor have picked up this habit from the same source.
I recall my old chemistry/physics teacher doing it too - "now THIS guy, he's really greedy for electrons" and stuff like that.
They give everyone the false and very misleading impression that with One prompt all kinds of complexity minimizes. Its a bed time story for children.
Ashby's Law of Requisite Variety asserts that for a system to effectively regulate or control a complex environment, it must possess at least as much internal behavioral variety (complexity) as the environment it seeks to control.
This is what we see in nature. Massive variety. Thats a fundamental requirement of surviving all the unpredictablity in the universe.
Had a math prof in undergrad that once said, “this guy” 61 times in a 50 minute lecture!
TIL gremlins weren’t just used to explain mysterious mechanical failures in airplanes, it’s the origin story of the term ‘gremlin’ itself[0].
I had always assumed there was some previous use of the term, neat!
[0]https://en.wikipedia.org/wiki/Gremlin
So the word is actually semantically very close to "bug"! I guess we could still be using it, but the word's just too long for something that is one of the most used terms in software development.
At this point, picking that specific word is not at all a random quirk, as it's using the word literally like it's originally intended to be used.
Wow fascinating I’d have thought they were a lot older.
> the evidence suggests that the broader behavior emerged through transfer from Nerdy personality training.
> The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them
> Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.
Sounds awfully like the development of a culture or proto-culture. Anyone know if this is how human cultures form/propagate? Little rewards that cause quirks to spread?
Just reading through the post, what a time to be an AInthropologist. Anthropologists must be so jealous of the level of detailed data available for analysis.
Also, clearly even in AI land, Nerdz Rule :)
PS: if AInthropologist isn't an official title yet, chances are it will likely be one in the near future. Given the massive proliferation of AI, it's only a matter of time before AI/Data Scientist becomes a rather general term and develops a sub-specialization of AInthropologist...
Anthro means human and these are not human. Please do not use anthropology or any derivative of the word to refer to non-human constructs.
I suggest Synthetipologists, those who study beings of synthetic origin or type, aka synthetipodes, just as anthropologists study Anthropodes
It is not in any sense of the word a being, it's a sophisticated generator that relies entirely on what you feed it.
> It is not in any sense of the word a being, it's a sophisticated generator that relies entirely on what you feed it.
OP is hedging bets in case the future overlords review forum postings for evidence of bias against machine beings. [1]
[1] https://knowyourmeme.com/memes/i-for-one-welcome-our-new-ins...
There is no word anthropodes. :) I guess it would mean man-feet. Antipodes is opposite-feet, literally. Synthetipologist looks to me like a portmanteau of synthetic and apologist. Otherwise the -po- in it comes from nowhere.
Sensible boring versions of this like synthesilogy just end up meaning the study of synthesis. I reckon instead do something with Talos, the man made of bronze who guarded Crete from pirates and argonauts. Talologist, there you go.
yeah I realized that when I looked up podes downthread. I still like synthetologist better than talologist, in general no one in the common folk knows who Talos is.
You're probably right. There's things that are correct, and then there's things people think they know, which win and become true. We already have "synths", after all, which are keyboards. Though that adds to the vagueness of synthetologist, because maybe it refers to Rick Wakeman or Giorgio Moroder.
Agree with your sentiment, I think synthetologist (σύνθετος/synthetos + λογία/logia) flows better.
The plural of anthropos is anthropoi, not anthropodes.
Yeah, I realize that's more correct. I also realized when someone else downthread bastardized it into synthropologist that the podes part has entirely to do with feet and nothing to do with beings, necessarily. Anthro- -podes is more what I had in mind, not as a pluralization of anthropos.
So unless the AI has feet you wouldn't study Synthetipology.
You're probably thinking of anthropoids? That's anthrop[os]-oid. Like in humanoid or centroid or factoid. Or dorkazoid.
But since when is there a synthetos? Since right now, I guess. Shrug But you know it's from the same root as thesis, and synthesis (or a more proper ancient Greek spelling) is the noun and doesn't end in -os.
σύνθεσις (súnthesis, “a putting together; composition”), says Wiktionary.
Oh wait there is a σύνθετος, but it's an adjective for "composite". Hmm, OK. Modern Greek, looks like.
Fair, should've gone with σύνθετον.
Synthetipologist vs Synthropologist tho.
Anthropo- is the entire prefix as it relates to human kind. The -thro- does not carry a meaning on its own that can be carried to another word.
> Synthropologist
Have an upvote :)
*thropologist: study of beings
That's not how the Greek word stems work. Technically it would not be synthetipologist, it would more accurately just be Synthetologist, as the Greek podes suffix means having feet.
> That's not how the Greek word stems work.
Sir, I would have you know that we are discussing English terms, not Greek
AInthropologist works fine for me, and is a lot funnier
LoL
> Synthetipologists, those who study Synthetic beings.
I see you took the prudent approach of recognizing the being-ness of our future overlords :) ("being" wasn't in your first edit to which I responded below...)
Still, a bit uninspired, methinks. I like AInthropologist better, and my phone's keyboard appears to have immediately adopted that term for the suggestions line. Who am I to fight my phone's auto-suggest :-)
They are state machines so they have a state of being therefore they are beings. Living is an entirely different argument.
> They are state machines
I might have to hard disagree on this one, since my understanding of state machines (the technical term [1] [2]) is that they are determistic, while LLMs (the ai topic of discussion) are probabilistic in most of the commercial implementations that we see.
[1] https://en.wikipedia.org/wiki/Finite-state_machine
[2] have written some for production use, so have some personal experience here
> Please do not use anthropology or any derivative of the word to refer to non-human constructs
So you, for one, do not welcome our new robot overlords?
A rather risky position to adopt in public, innit ;-)
I’ve already had my Roko’s basilisk existential breakdown a decade ago, so I don’t really care one way or the other.
I just wanna point out that I only called them non-human and I am asking for a precision of language.
> am asking for a precision of language.
“The problem with defending the purity of the English language is that English is about as pure as a cribhouse wh***. We don’t just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary.”* --James D. Nicoll
* Does not generally apply to scientific papers
Precision of ideas isn't purity of language.
> Precision of ideas isn't purity of language
That's fair. Was trying to be funny, so glossed over the difference. Leaving my post above unedited/undeleted as a testament to your precision, and evidence of my folly.
Onwards; more appropriate rebuttals:
"English is a precision instrument assembled from spare parts during a thunderstorm." --ChatGPT
“If the English language made any sense, a catastrophe would be an apostrophe with fur.” -- Doug Larson
I call myself an AI theologian.
I don't think humans are smart enough to be AInthropologists. The models are too big for that.
Nobody really understands what's truly going on in these weights, we can only make subjective interpretations, invent explanations, and derive terminal scriptures and morals that would be good to live by. And maybe tweak what we do a little bit, like OpenAI did here.
I don’t see much of a distinction from anthropology
> AI theologian
no no no, don't stop there, just go full AItheologian, pronounced aetheologian :)
"Anyone know if this is how human cultures form/propagate?" I don't know but can confidently tell you anyone who claims to know is full of it.
I wondered how is training data balanced? If you put in to much Wikipedia, and your model sounds like a walking encyclopedia?
After doing the Karpathy tutorials I tried to train my AI on tiny stories dataset. Soon I noticed that my AI was always using the same name for its stories characters. The dataset contains that name consistently often.
At this scale, that kind of thing is not really a problem; you just dump all of the data you can find into the model (pre-training)1. Of course, the pre-training data influences the model, but the reinforcement learning is really what determines the model’s writing style and, in general, how it “thinks” (post-training).
1 This data is still heavily filtered/cleaned
If a tiny misconfiguration of reward system can cause such noticeable annoyance ...
What dangers lurk beneath the surface.
This is not funny.
For every gremlin spotted, many remain unseen...
This is a worry that people have been talking about in various forms for a while now, and I think it's a gigantic one. The only reason this was caught is that the quirk was a very noticeable verbal one. When words like "goblin" and "gremlin" pop up it is easy for us to spot. If the quirk takes another shape (say, ranking certain people with certain features as less trustworthy) it might be too subtle or too weird for us to notice it. Would I ever notice if ChatGPT consistently rates people born in June to be untrustworthy?
Here is an academic paper discussing this kind of worry: https://link.springer.com/article/10.1007/s11023-022-09605-x
I suspected OpenAI was actively training their models to be cringy in the thought that it's charming. Turns out it's true. And they only see a problem when it narrows down on one predicliction. But they should have seen it was bad long before that.
This is funny because it’s a silly topic, but I think it shows something extremely seriously wrong with llms.
The goblins stand out because it’s obvious. Think of all the other crazy biases latent in every interaction that we don’t notice because it’s not as obvious.
Absolutely terrifying that OpenAI is just tossing around that such subtle training biases were hard enough to contain it had to be added to system prompt.
> Absolutely terrifying that OpenAI is just tossing around that such subtle training biases were hard enough to contain it had to be added to system prompt.
May I introduce you to homo sapiens, a species so vulnerable to such subtle (or otherwise) biases (and affiliations) that they had to develop elaborate and documented justice systems to contain the fallouts? :)
We’re really not that vulnerable to such things as a species, because we as individuals all have our own minds and our own sets of biases that cancel out and get lost in the noise. If we all had the exact same bias then it would be a huge problem.
I hear you but of course history is full of examples of biases shared across large groups of people resulting in huge human costs.
The analogy isn’t perfect of course but the way humans learn about their world is full of opportunities to introduce and sustain these large correlated biases—social pressure, tradition, parenting, education standardization. And not all of them are bad of course, but some are and many others are at least as weird as stray references to goblins and creatures
> If we all had the exact same bias then it would be a huge problem.
And may I introduce you to "groupthink" :))
Now imagine that every opinion you have is automatically fully groupthinked and you see the difference/problem with training up a big AI model that has a hundred million users.
The problem does exist when using individual humans but in a much smaller form.
> The problem does exist when using individual humans but in a much smaller form.
And may I introduce you to organized religion :)
That's still a lot smaller!
Make a major religion where everyone is a scifi clone of one person including their memories and then it'll be in the same ballpark of spreading bias.
> We’re really not that vulnerable to such things as a species, because we as individuals all have our own minds and our own sets of biases that cancel out and get lost in the noise.
[Citation Needed]
Just because if you have a species-wide bias, people within the species would not easily recognize it. You can't claim with a straight face that "we're really not that vulnerable to such things".
For example, I think it's pretty clear that all humans are vulnerable to phone addiction, especially kids.
Mandatory reading on that topic: www.anthropic.com/research/small-samples-poison
We're probably not noticing a LOT of malicious attempts at poisoning major AI's only because we don't know what keywords to ask (but the scammers do and will abuse it).
I think it's extraordinarily telling that people are capable of being reflexively pessimistic in response to the goblin plague. It's like something Zitron would do.
This story is wonderful.
I feel at least partially responsible. I would often instruct agents to "stop being a goblin". I really enjoyed this story too, though.
We do not have the complete picture.
Doesn't seem that surprising or terrifying to me. Humans come equipped with a lot more internal biases (learned in a fairly similar fashion), and they're usually a lot more resistant to getting rid of them.
The truly terrifying stuff never makes it out of the RLHF NDAs.
We ought to be terrified, when one adjusts for All the use cases people are talking about using these algorithms in. (Even if they ultimately back off, it's a lot of frothy bubble opportunity cost.)
There a great many things people do which are not acceptable in our machines.
Ex: I would not be comfortable flying on any airplane where the autopilot "just zones-out sometimes", even though it's a dysfunction also seen in people.
>Ex: I would not be comfortable flying on any airplane where the autopilot "just zones-out sometimes", even though it's a dysfunction also seen in people.
You might if that was the best auto-pilot could be. Have you never used a bus or taken a taxi ?
The vast majority of things people are using LLMs for isn't stuff deterministic logic machines did great at, but stuff those same machines did poorly at or straight up stuff previously relegated to the domains of humans only.
If your competition also "just zones out sometimes" then it's not something you're going to focus on.
Humans also take a lot of time in producing output, and do not feed into a crazy accelerationistic feedback loop (most of the time).
Nice, OpenAI mentioned my HackerNews post in their article :) I appreciate that they wrote a whole blog post to explain!
https://news.ycombinator.com/item?id=47319285
article :
bla blah blah, marketing... we are fun people, bla blah, goblin, we will not destroy the world you live in.. RL rewards bug is a culprit. blah blah.
someone woke up on the wrong side of the goblin today
real goblin-y response
I suspect this was intentionally added. Just to give some personality and to fuel hype
those idiotic remarks at the end of each answer are so unnecessary and annoying
A plausible theory I've seen going around: https://x.com/QiaochuYuan/status/2049307867359162460
If you tell an LLM it's a mushroom you'll get thoughts considering how its mycelium could be causing the goblins.
This "theory" is simply role playing and has no grounding in reality.
I wish the blog mentioned more about why exactly training for nerdy personality rewarded mention of goblins. Since it's probably not a deterministic verifiable reward, at their level the reward model itself is another LLM. But this just pushes the issue down one layer, why did _that_ model start rewarding mentions of goblin?
> I wish the blog mentioned more about why exactly training for nerdy personality rewarded mention of goblins. Since it's probably not a deterministic verifiable reward, at their level the reward model itself is another LLM. But this just pushes the issue down one layer, why did _that_ model start rewarding mentions of goblin?
Speculation: because nerds stereotypically like sci-fi and fantasy to an unhealthy degree, and goblins, gremlins, and trolls are fantasy creatures which that stereotype should like? Then maybe it hit a sweet spot where it could be a problem that could sneak up on them.
Perhaps it has something to do with recent human trends for saying "goblin" or "gremlin" to describe... basically the opposite of dignified and socially acceptable behavior, like hunching under a blanket, unshowered, playing video games all day and eating shredded cheese directly out of the bag.
The fact that it was strongly associated with the "nerdy" personality makes me think of this connection.
It is a stateless text / pixel auto-complete it has no references of self, stop spreading this bs.
It has trained on vast amounts of content that contains the concept of self, of course the idea of self is emergent.
And autoregressive LLMs are not stateless.
is a kv cache not a kind of state? what does statefulness have to do with selfhood? how does a system prompt work at all if these things have no reference to themselves?
The kv cache is not persistent. It's a hyper-short-term memory.
Ask Claude about Claude.
Ahh I see. I guess when I turned off privacy settings and allowed training on my code, then generated 10 million .md files with random fantasy books, the poisoning worked.
Keep using AI and you'll become a goblin too.
Wherein OpenAI admits they have very little understanding of how their models’ personality develops. And implicitly admit it’s not all that important to them, except when it gets so out of hand that they get caught making blunt corrections.
So, you brain damaged your model with a system prompt.
Weird. I thought they came from Nilbog.
> Why it matters
i despise this title so much now
Here are the key insights:
I. Love. This.
> You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking.
Just; the mentality required to write something like that, and then base part of your "product" on it. Is this meant to be of any actual utility or is it meant to trap a particular user segment into your product's "character?"
OpenAI is having fun, love this.
The explanation is very concerning. Lexical tidbits shouldn’t be learnt and reinforced across cross sections. Here, gremlin and goblin went from being selected for in the nerdy profile to being selected for in all profiles. The solution was easy: don’t mention goblins.
But what about when the playful profile reinforces usage of emoji and their usage creeps up in all other profiles accordingly? Ban emoji everywhere? Now do the same thing for other words, concepts, approaches? It doesn’t scale!
It seems like models can be permanently poisoned.