OPENAI_API_KEY="$(llm keys get openai)" \
uv run https://tools.simonwillison.net/python/openai_image.py \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio"
Here's what I got from that prompt. I do not think it included a raccoon holding a ham radio (though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure): https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...
OPENAI_API_KEY="$(llm keys get openai)" \
uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
-m gpt-image-2 \
"Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
--quality high --size 3840x2160
Fed into a clear Claude Code max effort session with: "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio." It sliced the image into small sections and gave:
"Found the raccoon holding a ham radio in waldo2.png (3840×2160).
- Raccoon center: roughly (460, 1680)
- Ham radio (walkie-talkie) center: roughly (505, 1650) — antenna tip around (510, 1585)
- Bounding box (raccoon + radio): approx x: 370–540, y: 1550–1780
It's in the lower-left area of the image, just right of the red-and-white striped souvenir umbrella, wearing a green vest. "
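The thread does not show how Claude Code actually sliced the image, so the following is only a minimal sketch of the general idea: cover a large image with overlapping tiles, inspect each tile separately, and map any tile-local hit back to full-image pixel coordinates. The tile size (960), the overlap (120), and all function names are assumptions made for illustration.

```python
from typing import NamedTuple


class Tile(NamedTuple):
    """A crop box in full-image pixel coordinates (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def tile_boxes(width: int, height: int, tile: int = 960, overlap: int = 120) -> list[Tile]:
    """Return overlapping tile boxes that together cover a width x height image.

    The tile size and overlap are illustrative defaults, not values from the thread.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than the tile size")
    step = tile - overlap
    boxes: list[Tile] = []
    top = 0
    while True:
        bottom = min(top + tile, height)
        left = 0
        while True:
            right = min(left + tile, width)
            boxes.append(Tile(left, top, right, bottom))
            if right == width:
                break
            left += step
        if bottom == height:
            break
        top += step
    return boxes


def to_full_image(box: Tile, x: int, y: int) -> tuple[int, int]:
    """Convert a coordinate measured inside a tile to full-image coordinates."""
    return box.left + x, box.top + y


def tiles_containing(boxes: list[Tile], x: int, y: int) -> list[Tile]:
    """List every tile whose crop box contains the full-image point (x, y)."""
    return [b for b in boxes if b.left <= x < b.right and b.top <= y < b.bottom]


if __name__ == "__main__":
    boxes = tile_boxes(3840, 2160)
    print(len(boxes), "tiles")
    # The reported raccoon center, roughly (460, 1680), falls in these tiles:
    for box in tiles_containing(boxes, 460, 1680):
        print(box)
```

The overlap is there so that a small figure sitting on a tile boundary still appears whole in at least one tile. Each box could be passed to an image library's crop function before being handed to the model.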
I just tried feeding your image to Claude Code, using /effort max. I get better results when Claude has a dedicated VM so I tend to ask more involved questions directly in a clear Claude Code session.
"Inspect waldo.png, and give me the pixel location of a raccoon holding a ham radio."
It used python to split the images into smaller sections, and got this:
"Found it. The raccoon holding a ham radio is inside the HAM RADIO booth, standing between two humans.
Pixel location: approximately (662, 500) in the 1402×1122 image.
Bounding box of the raccoon is roughly (645, 475) to (680, 525). You can see its characteristic dark-masked face, pointy ears,
light belly, and it's holding the radio at chest level." (Baked for 2m 23s)
Looks like a human to me. So I prompted: "That is a human, but I understand the confusion. Look elsewhere.".
It made much smaller sections and inspected them one by one.
"Found it — the raccoon is sitting on the ground near the trash can, holding a ham radio with a visible speaker/dial face.
Pixel location: approximately (370, 1030) in the 1402×1122 image.
Bounding box of the raccoon + radio is roughly (340, 1005) to (400, 1060). It's in the lower-left region, just right of the
General Store, next to a silver trash can." (Sautéed for 4m 26s)
Still not it, but it was an interesting visual test.
One of the images in the blog (https://images.ctfassets.net/kftzwdyauwt9/4d5dizAOajLfAXkGZ7...) is a carbon copy of an image from an article posted Mar 27, 2026 with credits given to an individual: https://www.cornellsun.com/article/2026/03/cornell-accepts-5...
Was this an oversight? Or did their new image generation model generate an image that was essentially a copy of an existing image?
This is hilarious. Seems like kind of a random image for a model to memorize, but it could be.
There is definitely enough empirical validation that shows image models retain lots of original copies in their weights, despite how much AI boosters think otherwise. That said, it is often images that end up in the training set many times, and I would think it strange for this image to do that.
Regardless, great find.
That has to be the wrong stock image included or something, bloody hell.
It's practically all dark except for a few spots. It's the same image just different size compression whatever. I can't find it in any stock image search, though. Surely it could not have memorized the whole image at that fidelity. Maybe I just didn't search well enough.
Or the image was generated with AI in the first place and a test for Images 2.0
Haha! That would really take the cake. If it is, congratulations to them! I could never have known.
Given the recency of that image, it is unlikely it is in the training data and therefore I would go with oversight.
I've been trying out the new model like this:
Code here: https://github.com/simonw/tools/blob/main/python/openai_imag...
Here's what I got from that prompt. I do not think it included a raccoon holding a ham radio (though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure): https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a...
I just got a much better version using this command instead, which uses the maximum image size according to https://github.com/openai/openai-cookbook/blob/main/examples...
https://gist.github.com/simonw/88eecc65698a725d8a9c1c918478a... - I found the raccoon!
I think that image cost 40 cents.
Fed into a clear Claude Code max effort session with: "Inspect waldo2.png, and give me the pixel location of a raccoon holding a ham radio." It sliced the image into small sections and gave:
"Found the raccoon holding a ham radio in waldo2.png (3840×2160).
Which is correct!
I had one problem: finding the raccoon. Now I have two: finding the red-and-white striped souvenir umbrella, and finding the raccoon.
simonw posted 2 different images: make sure to look at the second one.
Yeah, I noticed that just now, but too late to delete the comment :p
The faces...that's nice that it turned a kid's book into an abomination
The people in this image remind me of early this person does not exist, in the best way
I found it on the 2nd image! On the 1st one not yet...
> though the problem with Where's Waldo tests is that I don't have the patience to solve them for sure
I see an opportunity for a new AI test!
There have already been several attempts to procedurally generate Where’s Waldo? style images since the early Stable Diffusion days, including experiments that used a YOLO filter on each face and then processed them with ADetailer.
It's a difficult test for genai to pass. As I mentioned in a different thread, it requires a holistic understanding (in that there can only be one Waldo Highlander style), while also holding up to scrutiny when you examine any individual, ordinary figure.
I've actually been feeding them into Claude Opus 4.7 with its new high resolution image inputs, with mixed results - in one case there was no raccoon but it was SURE there was and told me it was definitely there but it couldn't find it.
I just tried feeding your image to Claude Code, using /effort max. I get better results when Claude has a dedicated VM so I tend to ask more involved questions directly in a clear Claude Code session.
"Inspect waldo.png, and give me the pixel location of a raccoon holding a ham radio."
It used python to split the images into smaller sections, and got this:
"Found it. The raccoon holding a ham radio is inside the HAM RADIO booth, standing between two humans.
Looks like a human to me. So I prompted: "That is a human, but I understand the confusion. Look elsewhere."
It made much smaller sections and inspected them one by one.
"Found it — the raccoon is sitting on the ground near the trash can, holding a ham radio with a visible speaker/dial face.
Still not it, but it was an interesting visual test.
5.4 thinking says "Just right of center, immediately to the right of the HAM RADIO shack. Look on the dirt path there: the raccoon is the small gray figure partly hidden behind the woman in the red-and-yellow shirt, a little above the man in the green hat. Roughly 57% from the left, 48% from the top."
(I don't think it's right).
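For anyone who wants to compare that percentage-style answer with the pixel coordinates Claude Code reported, here is a small sketch that converts "57% from the left, 48% from the top" into pixels, using the 1402×1122 size quoted in the Claude Code output. The helper names are made up for illustration, and rounding to the nearest pixel is an assumption.

```python
def fraction_to_pixel(fx: float, fy: float, width: int, height: int) -> tuple[int, int]:
    """Convert fractional offsets from the top-left corner into pixel coordinates."""
    return round(fx * width), round(fy * height)


def inside(box: tuple[tuple[int, int], tuple[int, int]], point: tuple[int, int]) -> bool:
    """Check whether a point lies within a ((left, top), (right, bottom)) box."""
    (left, top), (right, bottom) = box
    x, y = point
    return left <= x <= right and top <= y <= bottom


# "Roughly 57% from the left, 48% from the top" in the 1402x1122 image.
point = fraction_to_pixel(0.57, 0.48, 1402, 1122)

# Bounding boxes quoted from the two Claude Code answers in this thread.
claude_first_box = ((645, 475), (680, 525))
claude_second_box = ((340, 1005), (400, 1060))

if __name__ == "__main__":
    print(point)
    print(inside(claude_first_box, point), inside(claude_second_box, point))
```

The converted point lands outside both of the boxes Claude Code gave, so the three answers point at three different spots in the image.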
I tried
> please add a giant red arrow to a red circle around the raccoon holding a ham radio or add a cross through the entire image if one does not exist
and got this. I'm not sure I know what a ham radio looks like though.
https://i.ritzastatic.com/static/ffef1a8e639bc85b71b692c3ba1...
Also, the raccoon it circled isn't in the original.
I love how perfectly this captures the difficulties of using generative AI for detection tasks.
Indeed. I suppose one way to ensure you can find Waldo in any image is to add it yourself.
That's excellent. I added it to my post: https://simonwillison.net/2026/Apr/21/gpt-image-2/#update-as...
The second 4K image definitely has a raccoon on the left there! Nice.
Damn. There’s a fun game app to make here ^^
haha took me a while to notice that one of the buildings is labelled 'Ham radio'
I see the raccoon
HN submission for a direct link to the product announcement which for some reason is being penalized by the HN algorithm: https://news.ycombinator.com/item?id=47853000
Suggest renaming this to "OpenAI Livestream: ChatGPT Images 2.0"
or "How we make money with your images 2.0".
the guys presenting are probably all like 25x smarter than I am but good god, literally 0 on screen presence or personality.
That's a trained skill, and they presumably have focused on other skills.
Yeah, skills to make them a cool 10mn a year
eh, i don't think personalities are trained. on screen presence for sure, but you'd see right through it IRL.
I think it's endearing
This is not as exciting as previous models were, but it is incredibly good. I am starting to think that expressing thoughts in words clearly is probably the most important and general skill of the future.
I caught the last minute of this—was it just ChatGPT Images 2.0?
It appears so!
yes
No mention of modifying existing images, which is more important than anything they mentioned.
I think we all know the feeling of getting an image that is ok, but needs a few modifications, and being absolutely unable to get the changes made.
It either keeps coming up with the same image, or gives you a completely new take on the image with fresh problems.
Anyone know if modification of existing images is any better?
Anything better than OpenAI?
There was an Edit button in one of the images in the livestream
If every single image on their blog was generated by Images 2.0 (I've no reason to believe that's not the case), then wow, I'm seriously impressed. The fidelity to text, the photorealism, the ability to show the same character in a variety of situations (e.g. the manga art) -- it's all great!
It seems to still have this gpt image color that you can just feel. The slight sepia and softness.
I was just wondering about that. Did they embrace it as a “signature look”? It can't be accidental, right?
One interesting thing I found comparing OpenAI and Gemini image editing is - Gemini rejects anything involving a well known person. Anything. OpenAI is happy to edit and change every time I tried
I have a side project where I want to display standup comedies. I thought I could edit standup comedy posters with some AI to fit my design. Gemini straight up refuses to change any image of any standup comedy poster involving a well-known human. OpenAI does not care and is happy to edit away
How does it determine they are well known and not just similar looking?
I don't know tbh. I've tried it on 10-20 standups of various levels of fame and Gemini refuses every time
Just for testing, I just tried this https://i.ytimg.com/vi/_KJdP4FLGTo/sddefault.jpg ("Redesign this image in a brutalist graphic design style"). Gemini refuses (api as well as UI), OpenAI does it
It's not super deterministic but it didn't fail once on my attempts. See: https://imgur.com/a/james-acaster-cold-lasagne-1R7fpzQ
Very interesting. It fails every single time for me. I'm in Germany, maybe Google is stricter here?
See https://imgur.com/a/77BRDQv
That makes sense to me. I just Googled around like a fool and got here https://en.wikipedia.org/wiki/Personality_rights#Germany
It seems like they're trying to follow local law. What a nightmare to have to manage all jurisdictions around such a product. Surprised it didn't kill image generation entirely.
What if you change the prompt to tell it specifically it's not a famous person? Or try it without text?
Can it generate anything high resolution at increased cost and time? Or is it always restricted?
https://openai.com/index/introducing-chatgpt-images-2-0/
Thanks, all displayed images look horrible and artificial. This will fail like Sora.
Your single other comment is simplistic hyperbole as well, so this is presumably a bot account.
Hard disagree on this, I was coming here to comment that this is the first time I really can't tell that some of the photos are AI generated.
I felt the same, particularly with the diagrams / magazines anyway.
I don't think it'll fail like Sora though. gpt-image-1.5 didn't fail.
great obfuscation idea - hidden message on a grain of rice
No gpt-5.5
Thursday
Wow, the difference between AI and non-AI images collapses. I hate the future where I won't be able to tell the difference.
I wake up everyday, read the tech news, and usually see some step change in AI or whatever. It's wild to think I'm living through such a massive transformation in my lifetime. The future of tech is going to be so different from when I was born (1980), I guess this is how people born in 1900 felt when they got to see man land on the moon?
> Wow, the difference between AI and non-AI images collapses. I hate the future where I won't be able to tell the difference.
Image generation is now pretty much "solved". Video will be next. Perhaps things will turn out the same as chess: in that even though chess was "solved" by IBM's Deep Blue, we still value humans playing chess. We value "hand made" items (clothes, furniture) over the factory made stuff. We appreciate & value human effort more than machines. Do you prefer a hand-written birthday card or an email?
"Solved" seems a tad overstated if you scroll up to Simonw's Where's Waldo test with deformed faces plus a confabulated target when prompted for an edit to highlight the hidden character with an arrow.
It's "solved" in that we have a way forward to reduce the errors down to 0.00001% (a number I just made up). Throwing more compute/time/money at these problems seems to reduce that error number.
As someone born in 1975 I always felt until the last couple of years that I had been stuck in a long period of stagnation compared to an earlier generation. My grandmother who was born in the 1910s got to witness adoption of electricity, mass transit, radio, television, telephony, jet flights and even space exploration before I was born.
Feels like now is a bit of a catch-up after the pretty tepid period that was most of my life.
Chess exists solely for the sake of the humans playing it. Even if machines solved chess, people would rather play chess against a person than a machine because it is a social activity in a way. It's like playing tennis versus a person compared to tennis against a wall.
Photographs, videos, and digital media in general, in contrast, are used for much, much more than just socializing.
Well, for some of these images for the first time I can't tell that they are AI generated
Can it generate Chibi figures to mask the oligarchy's true intentions on Twitter and make them more relatable?
In 5 years and 3 months between DALL-E and Images 2.0 we've managed to progress from exuberant excitement to jaded indifference.
Because we are all seeing the harm these tools are being used for.
It's just another step into hell.
Image generation? Hmm, would be cool if OpenAI also made a video-generation model someday..
If only there was a social network with solely AI generated videos, I would pay literal money for it...
I am hopeful that OpenAI will potentially offer clarity on their loss-leading subscription model. I'd prefer to know the real cost of a token from OpenAI as opposed to praying the venture-funded tokens will always be this cheap.