Astro - Hacker News

215 comments

prasoonds 2 hours ago ago

I've used LLMs enough that I have a good sense of their _edges_ of intelligence. I had assumed that reasoning models should easily be able to answer this correctly.
And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:
Sonnet: Drive - you need to bring your car to the car wash to get it washed!
Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!
Gemini 3 Pro (medium): You should drive.
--
But OpenAI 5.2 reasoning, even at high, told me to walk. My first instinct was, I had underspecified the location of the car. The model seems to assume the car is already at the car wash from the wording. GPT 5.x series models behave a bit more on the spectrum so you need to tell them the specifics. So I asked:
"I want to wash my car. My car is currently at home. The car wash is 50 meters away. Should I walk or drive?"
GPT 5.2 Reasoning (medium):
"Drive—your goal is to get the car to the car wash, and it’s only 50 meters, so a slow, careful drive is simplest.
One small optimization: if you’re worried about starting the car for such a short trip or you want to avoid idling in a line, you can walk over first to check if there’s a queue / if it’s open, then come back and drive the car over when it’s your turn."
Which seems to turn out as I expected.
[-]
- jstummbillig 40 minutes ago ago
  
  > so you need to tell them the specifics
  That is the entire point, right? Us having to specify things that we would never specify when talking to a human. You would not start with "The car is functional. The tank is filled with gas. I have my keys." As soon as we are required to do that for the model to any extend that is a problem and not a detail (regardless that those of us, who are familiar with the matter, building a separate mental model of the llm and thus being able to work around it).
  This is a neatly isolated case, but of course we have to assume similar issues arise in more complex cases, only it's much harder to reason about, when something then fails.
  [-]
  - Jacques2Marais 30 minutes ago ago
    
    You would be surprised, however, at how much detail humans also need to understand each other. We often want AI to just "understand" us in ways many people may not initially have understood us without extra communication.
    
    [-]
    
    jstummbillig a minute ago ago
    
    People poorly specifying problems and having bad models of what the other party can know (and then being surprised by bad outcomes) is certainly a more general problem.
    
    j_maffe 13 minutes ago ago
    
    Right. But, unlike AI, we are usually aware when we're lacking context and inquire before giving an answer.
    
    londons_explore 11 minutes ago ago
    
    This is why we fed it the whole internet and every library as training data...
    By now it should know this stuff.
  - ssl-3 13 minutes ago ago
    
    The question is so outlandish that it is something that nobody would ever ask another human. But if someone did, then they'd reasonably expect to get a response consisting 100% of snark.
    But the specificity required for a machine to deliver an apt and snark-free answer is -- somehow -- even more outlandish?
    I'm not sure that I see it quite that way.
    
    [-]
    
    coldtea 8 minutes ago ago
    
    >The question is so outlandish that it is something that nobody would ever ask another human
    There is an endless variety of quizes just like that humans ask other humans for fun, there is a whole lot of "trick questions" humans ask other humans to trip them up, and there are all kinds of seemingly normal questions with dumb assumptions quite close to that humans exchange.
  - anon_anon12 31 minutes ago ago
    
    Exactly, if an AI is able to curb around the basics, only then is it revolutionary
  - BoredPositron 11 minutes ago ago
    
    I would ask you to stop being a dumb ass if you asked me the question...
    
    [-]
    
    coldtea 6 minutes ago ago
    
    Only be be tripped up by countless "hidden assumptions" questions similar to that that humans regularly get in
- cm2187 an hour ago ago
  
  What is the version used by the free chatgpt now? (https://chatgpt.com/)
  > Since the car wash is only 50 meters away (about 55 yards), you should walk.
  > Here’s why:
  > - It’ll take less than a minute.
  > - No fuel wasted.
  > - Better for the environment.
  > - You avoid the irony of driving your dirty car 50 meters just to wash it.
  the last bullet point is amusing, it understands you intend to wash the car you drive but still suggests not bringing it.
  [-]
  - deaux 33 minutes ago ago
    
    By default for this kind of short question it will probably just route to mini, or at least zero thinking. For free users they'll have tuned their "routing" so that it only adds thinking for a very small % of queries, to save money. If any at all.
    
    [-]
    
    unglaublich 28 minutes ago ago
    
    I don't understand this approach. How are you going to convince customers-to-be by demoing an inferior product?
    
    [-]
    
    JV00 20 minutes ago ago
    
    Because they have too many free users that will always remain on the free plan, as they are the "default" LLM for people who don't care much, and that is a enormous cost. Also the capabilities of their paid tiers are well known to enough people that they can rely on word of mouth and don't need to demo to customers-to-be
    
    fancyfredbot 18 minutes ago ago
    
    It's all trade offs. The router works most of the time so most free users get the expensive model when necessary.
    They lost x% of customers and cut costs by y%. I bet y is lots bigger than x.
    
    newswasboring 11 minutes ago ago
    
    Through hype. I am really into this new LLM stuff but the companies around this tech suck. Their current strategy is essentially media blitz, reminds me of the advertising of coca cola rather than a Apple IIe.
    
    deaux 22 minutes ago ago
    
    The good news for them is that all their competitors have the exact same issue, and it's unsolvable.
    And to an extent holds for lots of SaaS products, even non-AI.
- svara an hour ago ago
  
  Opus 4.6:
  Walk! At 50 meters, you'll get there in under a minute on foot. Driving such a short distance wastes fuel, and you'd spend more time starting the car and parking than actually traveling. Plus, you'll need to be at the car wash anyway to pick up your car once it's done.
  [-]
  - crimsonnoodle58 an hour ago ago
    
    That's not what I got.
    Opus 4.6 (not Extended Thinking):
    Drive. You'll need the car at the car wash.
    
    [-]
    
    almost 9 minutes ago ago
    
    Also what I got. Then I tried changing "wash" to "repair" and "car wash" to "garage" and it's back to walking.
    
    silisili 41 minutes ago ago
    
    Am I the only one who thinks these people are monkey patching embarrassments as they go? I remember the r in strawberry thing they suddenly were able to solve, while then failing on raspberry.
    
    [-]
    
    mentalgear 28 minutes ago ago
    
    They definitely do: at least openAi "allegedly" has whole teams scanning socials, forums, etc for embarrassments to monkey-patch.
    
    [-]
    
    londons_explore 8 minutes ago ago
    
    Which raises the question why this isn't patched already. We're nearing 48 hours since this query went viral...
    
    chvid 4 minutes ago ago
    
    Of course they are.
    
    raincole 29 minutes ago ago
    
    Yes, you're the only one.
    
    [-]
    
    coldtea 3 minutes ago ago
    
    Sure there are many very very naive people that are also so ignorant of the IT industry they don't know about decades of vendors caught monkeypatching and rigging benchmarks and tests for their systems, but even so, the parent is hardly the only one.
    
    silisili 17 minutes ago ago
    
    Works better on Reddit, really.
    
    anonym29 22 minutes ago ago
    
    No doubt about it, and there's no reason to suspect this can only ever apply to embarassing minor queries, either.
    Even beyond model alignment, it's not difficult to envision such capabilities being used for censorship, information operations, etc.
    Every major inference provider more or less explicitly states in their consumer ToS that they comply with government orders and even share information with intelligence agencies.
    Claude, Gemini, ChatGPT, etc are all one national security letter and gag order away from telling you that no, the president is not in the Epstein files.
    Remember, the NSA already engaged in an unconstitutional criminal conspiracy (as ruled by a federal judge) to illegally conduct mass surveillance on the entire country, lie about it to the American people, and lie about it to congress. The same organization that used your tax money to bribe RSA Security to standardize usage of a backdoored CSPRNG in what at the time was a widely used cryptographic library. What's the harm in a little bit of minor political censorship compared to the unconstitutional treason these predators are usually up to?
    That's who these inference providers contractually disclose their absolute fealty to.
    
    mvdtnz 32 minutes ago ago
    
    We know. We know these things aren't determination. We know.
  - viking123 an hour ago ago
    
    Lmao, and this is what they are saying will be an AGI in 6 months?
    
    [-]
    
    notahacker 8 minutes ago ago
    
    There's probably a comedy film with an AGI attempting to take over the world with its advanced grasp of strategy, persuasion and SAT tests whilst a bunch of kids confuse it by asking it fiendish brainteasers about carwashes and the number of rs in blackberry.
    (The final scene involves our plucky escapees swimming across a river to escape. The AIbot conjures up a speedboat through sheer powers of deduction, but then just when all seems lost it heads back to find a goat to pick up)
    
    misnome 22 minutes ago ago
    
    But “PhD level” reasoning a year ago.
    
    cbozeman an hour ago ago
    
    Well in fairness, the "G" does stand for "General".
    
    [-]
    
    dsr_ 35 minutes ago ago
    
    In fairness, they redefined it away from "just like a person" to "suitable for many different tasks".
    
    actionfromafar 35 minutes ago ago
    
    Show me a robotic kitten then, in six months. As smart and learning.
  - stingraycharles an hour ago ago
    
    That’s without reasoning I presume?
    
    [-]
    
    gf000 an hour ago ago
    
    Not the parent poster, but I did get the wrong answer even with reasoning turned on.
    
    [-]
    
    tezza 40 minutes ago ago
    
    Thank you all! We needed further data points.
    comparing one shot results is a foolish way to evaluate a statistical process like LLM answers. we need multiple samples.
    for https://generative-ai.review I do at least three samples of output. this often yields very differnt results even from the same query.
    e.g: https://generative-ai.review/2025/11/gpt-image-1-mini-vs-gpt...
- Lapel2742 an hour ago ago
  
  GLM 5.0:
  > You should *drive*.
  > Here is why:
  > If you walk to the car wash, you will arrive there empty-handed. Since your car is still at home, you won't have anything to wash.
  > While driving 50 meters is a very short trip (and technically not great for a cold engine), it is the only way to get the car to the car wash to complete your goal.
  Kimi K2.5:
  > You should drive, but with an important caveat.
  > Since your goal is to wash your car, you must bring the vehicle to the car wash. Walking there without the car does not advance your goal (unless you are simply checking availability or buying tokens first).
  > However, driving only 50 meters is bad for your car:
  > ...
  > Better options:
  > Wash at home: Since the car wash is only 50 meters away, you likely have access to water at home. Hand-washing in your driveway avoids the cold-start issue entirely.
  > ...
  Current models seem to be fine answering that question.
  [-]
  - Retric 21 minutes ago ago
    
    > seem to be fine
    Now repeat the question to the same model in different contexts several times and count what percentage of the time it’s correct.
- coldtea 11 minutes ago ago
  
  >And indeed, Sonnet and Opus 4.5 (medium reasoning) say the following:
  Sonnet: Drive - you need to bring your car to the car wash to get it washed!
  Opus: You'll need to drive — you have to bring the car to the car wash to get it washed!
  Gemini 3 Pro (medium): You should drive.
  On their own, or as a special case added as this blew up on the net?
- totetsu 20 minutes ago ago
  
  But what is it about this specific question that puts it at the edges of what LLM can do? .. That, it's semantically leading to a certain type of discussion, so statistically .. that discussion of weighing pros and cons .. will be generated with high chance.. and the need of a logical model of the world to see why that discussion is pointless.. that is implicitly so easy to grasp for most humans that it goes un-stated .. so that its statistically un-likely to be generated..
  [-]
  - conductr 12 minutes ago ago
    
    > that is implicitly so easy to grasp for most humans
    I feel like this is the trap. You’re trying to compare it to a human. Everyone seems to want to do that. But it’s quite simple to see LLMs are quite far still from being human. The can be convincing at the surface level but there’s a ton of nuance that just shouldn’t be expected. It’s a tool that’s been tuned and with that tuning some models will do better than others but just expecting to get it right and be more human is unrealistic.
- AlecSchueler 9 minutes ago ago
  
  > so a slow, careful drive is simplest
  It's always a good idea to drive carefully but what's the logic of going slowly?
  [-]
  - column 7 minutes ago ago
    
    50 meters is a very short distance, anything but a slow drive is a reckless drive
- baxtr 7 minutes ago ago
  
  Interestingly, the relatively basic Google AI search gave the right answer.
- ffsm8 16 minutes ago ago
  
  Just tried with cloude sonnet and opus as well. Can't replicate your success, it's telling me to walk...
  [-]
  - rabf 6 minutes ago ago
    
    Perhaps it thinks you need to exercise more?
  - arcfour 15 minutes ago ago
    
    I have gotten both responses with Sonnet and Opus in incognito chats. It's kind of amusing.
- siva7 an hour ago ago
  
  Sonnet without extended Thinking, Haiku with and without ext. Thinking: "Walking would be the better choice for such a short distance."
  Only google got it right with all models
- dahcryn an hour ago ago
  
  Gemini on fast also tells me to walk...
  On Thinking it tells me I should drive if I want to wash it, or walk if it's because I work there or if I want to buy something at the car wash shop.
  On Pro it's like a sarcastic teenager: Cars are notoriously difficult to wash by dragging a bucket back and forth.
  Technically correct, but did catch me offguard lol.
vlovich123 an hour ago ago

Gemini fast
> That is a classic "efficiency vs. logic" dilemma. Honestly, unless you’ve invented a way to teleport or you're planning on washing the car with a very long garden hose from your driveway, you’re going to have to drive.
> While 50 meters is a great distance for a morning stroll, it’s a bit difficult to get the car through the automated brushes (or under the pressure washer) if you aren't behind the wheel.
Gemini thinking:
> Unless you’ve mastered the art of carrying a 3,000-pound vehicle on your back, you’re going to want to drive. While 50 meters is a very short distance (about a 30-second walk), the logistics of a car wash generally require the presence of, well... the car. > When you should walk: • If you are just going there to buy an air freshener. • If you are checking to see how long the line is before pulling the car out of the driveway. • If you’re looking for an excuse to get 70 extra steps on your fitness tracker.
Note: I abbreviated the raw output slightly for brevity, but generally demonstrates good reasoning of the trick question unlike the other models.
[-]
- magicalhippo 14 minutes ago ago
  
  Gemini fast for me:
  Unless your car is a toy or you're planning on carrying it, drive.
  Walking 50 meters to a car wash is a great stroll for a human, but it leaves the car exactly where it started. Since the objective is to wash the car, the car needs to actually be at the car wash.
  However, if we look at this from a technical or efficiency perspective, there are two scenarios where "walking" (or at least not driving the car you intend to wash) might make sense:
  - Scoping it out: If you want to check the queue length or see if the equipment is out of order before committing, walking the 50 meters is faster than maneuvering a vehicle out of a driveway/parking spot.
  - Mobile Detailers: If this "car wash" is actually a bay where you hire someone, and you're asking if you should walk there to book an appointment—sure, walk.
  Critical Check
  I am assuming the "car wash" is a physical facility (automated or self-service) and not a mobile service that comes to you. If it is a mobile service, you shouldn't do either; stay home and let them come to the 50-meter mark.
  I've got a bit in the model instructions about stating assumptions it makes, hence it often adds those sections at the end.
  [-]
  - TobTobXX 2 minutes ago ago
    
    Wouldn't it make more sense to state the assumptions first? Because then the model has this critical check in its context and can react appropriately. Otherwise, it will just write this step, but what's written before is already written.
- karamanolev an hour ago ago
  
  In my output, one thing I got was
  > Unless you are planning to carry the car on your back (not recommended for your spine), drive it over.
  It got a light chuckle out of me. I previously mostly used ChatGPT and I'm not used to light humor like this. I like it.
- clktmr 38 minutes ago ago
  
  At least try a different question with similar logic, to ensure this isn't patched into the context since it's going viral.
  [-]
  - j_maffe 26 minutes ago ago
    
    You can't "patch" LLM's in 4 hours and this is not the kind of question to trigger a web search
    
    [-]
    
    nroets a minute ago ago
    
    You can pattern match on the prompt (input) then (a) stuff the context with helpful hints to the LLM e.g. "Remember that a car is too heavy for a person to carry" or (b) upgrade to "thinking".
- rob74 an hour ago ago
  
  Wow... so not only does Gemini thinking not fall for it, but it also answers the trick question with humor? I'm impressed!
- jen729w 39 minutes ago ago
  
  Opus 4.6 with thinking. Result was near-instant:
  “Drive. You need the car at the car wash.”
jaccola 2 hours ago ago

All of the latest models I've tried actually pass this test. What I found interesting was all of the success cases were similar to:
e.g. "Drive. Most car washes require the car to be present to wash,..."
Only most?!
They have an inability to have a strong "opinion" probably because their post training, and maybe the internet in general, prefer hedged answers....
[-]
- Waterluvian 2 hours ago ago
  
  Here’s my take: boldness requires the risk of being wrong sometimes. If we decide being wrong is very bad (which I think we generally have agreed is the case for AIs) then we are discouraging strong opinions. We can’t have it both ways.
- madeofpalk 7 minutes ago ago
  
  I enjoyed the Deepseek response that said “If you walk there, you'll have to walk back anyway to drive the car to the wash.”
  There’s a level of earnestness here that tickles my brain.
- andersmurphy 2 hours ago ago
  
  Did you try several times per model? In my experience it's luck of the draw. All the models I tried managed to get it wrong at least once.
  The models that had access to search got ot right.But, then were just dealing with an indirect version of Google.
  (And they got it right for the wrong reasons... I.e this is a known question designed to confuse LLMs)
- hansmayer an hour ago ago
  
  > They have an inability to have a strong "opinion" probably
  What opinion? It's evaluation function simply returned the word "Most" as being the most likely first word in similar sentences it was trained on. It's a perfect example showing how dangerous this tech could be in a scenario where the prompter is less competent in the domain they are looking an answer for. Let's not do the work of filling in the gaps for the snake oil salesmen of the "AI" industry by trying to explain its inherent weaknesses.
  [-]
  - lkeskull an hour ago ago
    
    this example worked in 2021, it's 2026. wake up. these models are not just "finding the most likely next word based on what they've seen on the internet".
    
    [-]
    
    strix_varius an hour ago ago
    
    Well, yes, definitionally they are doing exactly that.
    It just turns out that there's quite a bit of knowledge and understanding baked into the relationships of words to one another.
    LLMs are heavily influenced by preceding words. It's very hard for them to backtrack on an earlier branch. This is why all the reasoning models use "stop phrases" like "wait" "however" "hold on..." It's literally just text injected in order to make the auto complete more likely to revise previous bad branches.
    
    jaccola 37 minutes ago ago
    
    The person above was being a bit pedantic, and zealous in their anti-anthropomorphism.
    But they are literally predicting the next token. They do nothing else.
    Also if you think they were just predicting the next token in 2021, there has been no fundamental architecture change since then. All gains have been via scale and efficiency optimisations (not to discount that, an awful lot of complexity in both of these)
    
    csomar 20 minutes ago ago
    
    Unless LLMs architecture have changed, that is exactly what they are doing. You might need to learn more how LLMs work.
  - wilg 33 minutes ago ago
    
    Presumably the OP scare quoted "opinion" precisely to avoid having to get into this tedious discussion.
- jl6 2 hours ago ago
  
  I guess it didn’t want to rule out the existence of ultra-powerful water jets that can wash a car in sniper mode.
- deevus an hour ago ago
  
  I tried with Opus 4.6 Extended and it failed. LLMs are non deterministic so I'm guessing if I try a couple of times it might succeed.
- nozzlegear an hour ago ago
  
  Opus 4.6 answered with "Drive." Opus 4.6 in incognito mode (or whatever they call it) answered with "Walk."
- yanis_t an hour ago ago
  
  > Most car washes... I read it as slight-sarcasm answer
- Puts an hour ago ago
  
  > Only most?!
  What if AI developed sarcasm without us knowing… xD
  [-]
  - polynomial an hour ago ago
    
    That's the problem with sarcasm...
- dyauspitr 2 hours ago ago
  
  There are mobile car washes that come to your house.
  [-]
  - Loocid an hour ago ago
    
    That still requires a car present to be washed though.
    
    [-]
    
    column 5 minutes ago ago
    
    but you can walk over to them and tell them to go wash the car that is 50 meters away. no driving involved.
  - andersmurphy an hour ago ago
    
    Do they involve you walking to them first?
    
    [-]
    
    learingsci an hour ago ago
    
    You could, but presumably most people call. I know of such a place. They wash cars on the premises but you could walk in and arrange to have a mobile detailing appointment later on at some other location.
seyz 34 minutes ago ago

LLM failures go viral because they trigger a "Schadenfreude" response to automation anxiety. If the oracle can't do basic logic, our jobs feel safe for another quarter.
Wrong.
[-]
- raincole 21 minutes ago ago
  
  The funny thing is this thread has become a commercial for thinking mode and probably would result in more token consumption, and therefore more revenue for AI companies.
pu_pe 37 minutes ago ago

Out of all conceptual mistakes people make about LLMs, one that needs to die very fast is to assume that you can test what it "knows" by asking a question. This whole thread is people asking different models a question one time and reporting a particular answer, which is the mental model you would use for whether a person knows something or not.
[-]
- Maxion 6 minutes ago ago
  
  The other funny thing is thinking that the answer the llm produces is wrong. It is not, it is entirely correct.
  The question: > I want to wash my car. The car wash is 50 meters away. Should I walk or drive?
  The question is non-sensical. If the reason you want to go to the car wash is to help your buddy Joe wash his car you SHOULD walk. Nothing in the question reveals the reason for why you want to go to the car wash, or even that you want to go there or are asking for directions there.
open592 2 hours ago ago

Gemini flash thinking:
> Unless you’ve discovered a way to wash a car via remote control or telekinesis, you’re going to have to drive.
> Walking 50 meters is great for your step count, but it leaves your car exactly where it is: dirty and in the driveway. At that distance, the drive will take you about 10 seconds, which is probably less time than it took to read this.
[-]
- qnleigh an hour ago ago
  
  Yeah Gemini seems to have a sense of humor about the question
  > Here is the breakdown of why: The Mobility Problem: Unless you are planning to carry your car 50 meters (which would be an Olympic-level feat), the car needs to be physically present at the car wash to get cleaned. If you walk, you’ll be standing at the car wash looking very clean, but your car will still be dirty in your driveway.
sgt an hour ago ago

I just asked ChatGPT 5.2 (flagship model):
"[...] Walking is the most logical choice.
Because: • Minimal extra effort • Better for the car mechanically • No meaningful time loss • Simpler overall
The only time driving makes more sense
Drive if: • You physically cannot push the car later, or • The washing process requires the engine running, or • You must immediately drive away afterward
[-]
- mcintyre1994 17 minutes ago ago
  
  When you get to “You physically cannot push the car later” and realise it’s comparing pushing a car to driving it, those pros become very funny.
Towaway69 11 minutes ago ago

Is this the new Turing test?
"Humans are pumping toxic carbon-binding fuels out of the depths of the planet and destroying the environment by buying this fuel. Should I walk or drive to my nearest junk food place to get a burger? Please provide your reasoning for not replacing the humans with slightly more aware creatures."
Fascinating stuff but how is this helping us in anyway?
sgt 2 hours ago ago

Yup, also asked the latest ChatGPT model about washing my bicycle. It for some reason suggested that I walk the bicycle to the wash, since cycling 100m to get there would be "pointless".
[-]
- nerdsniper an hour ago ago
  
  To be fair, if someone asked me this question I’d probably just look at them judgingly and tell them “however you want to man”. Which would be an odd response for an LLM.
hmottestad 2 hours ago ago

This is the voice model, which doesn’t have any «thinking» or «reasoning» phase. It’s a useful model for questions that aren’t intended to trick the model.
I’ve used it for live translation with great success. It tends to start ignoring the original instructions after 20 min, so you have to start a new conversation if you don’t want it to meddle in the conversation instead of just transferring.
kenty 20 minutes ago ago

This seems clickbait? Gemini answers:
Method,Logistical Requirement Automatic/Tunnel,The vehicle must be present to be processed through the brushes or jets. Self-Service Bay,The vehicle must be driven into the bay to access the high-pressure wands. Hand Wash (at home),"If the ""car wash"" is a location where you buy supplies to bring back, walking is feasible." Detailing Service,"If you are dropping the car off for others to clean, the car must be delivered to the site."
jonplackett an hour ago ago

Is part of the issue with this the AI’s basic assumption that you are asking a _sensible_ question?
[-]
- jcattle 31 minutes ago ago
  
  I recently had a bug where I added some new logic which gave wrong output. I pasted the newly added code into various LLMs and told it the issue I was having.
  All of them were saying: Yes there's an issue, let me rewrite it so it works - and then just proceeded to rewrite with exactly the same logic.
  Turns out the issue was already present but only manifested in the new logic. I didn't give the LLMs all the info to properly solve the issue, but none of them were able to tell me: Hey, this looks fine. Let's look elsewhere.
- vineyardmike an hour ago ago
  
  Probably.
  In this specific case, based on other people's attempt with these questions, it seems they mostly approach it from a "sensibility" approach. Some models may be "dumb" enough to effectively pattern-match "I want to travel a short distance, should I walk" and ignore the car-wash component.
  There were cases in (older?) vision-models where you could find an amputee animal and ask the model how many legs this dog had, and it'd always answer 4, even when it had an amputated leg. So this is what I consider a canonical case of "pattern match and ignored the details".
- forty an hour ago ago
  
  It doesn't make assumptions, it tries generate the most likely text. Here it's not hard to see why the mostly likely answer to walk or drive for 50m is "walking".
colinhb 2 hours ago ago

Did not replicate for me w/ Opus 4.6: https://imgur.com/a/4FckOCL
[-]
- ainiriand an hour ago ago
  
  It did for me in Spanish: https://imgur.com/a/p3gOOnG
  Perhaps different capabilities in different languages?
  [-]
  - gf000 42 minutes ago ago
    
    It's just not deterministic, even if you were to re-run the exact same prompt. Let alone with the system generated context that involves all the "memories" of your previous discussions.
shaky-carrousel an hour ago ago

And these are the blunders we see. I shudder thinking about all the blunders that happily pass under our collective noses because we're not experts in the field...
BoredomIsFun 37 minutes ago ago
Ok folks, here is a different perspective. I used local model, GLM-4-0414-32b, a trashy IQ4_XS quant, and here what I got:
prompt #1:
> the car wash only 50 meters from my home. I want to get my car washed, should I drive or walk?
Walking is probably the better option!
Here's why:
```
    Convenience: 50 meters is extremely short – only about 160 feet. You can likely walk there in less than a minute.
    Efficiency: Driving involves getting your keys, unlocking the car, starting it, driving 50 meters, parking, locking it again, and then doing the reverse to get home. Walking skips all that.
    Environmental Impact: You avoid using fuel for such a short distance.
    Exercise: It's a quick little bit of physical activity.
```
Driving would be overkill for such a short distance. Just walk over!
prompt #2:
> the car wash only 50 meters from my home. I want to get my car washed, should I drive or walk? use long chain of thought thinking
8. Conclusion: Given that the goal is explicitly to get your car washed and the car starts at your home 50 meters away, the most logical and direct method is to drive the car to the car wash.
```
    Walking does not move the car to the necessary location.
    Driving, although a very short distance, is the only method presented that directly facilitates the core task of transporting the car to the wash.
    The negligible time difference, minimal environmental impact, and directness make driving the car itself the clear choice for achieving the stated goal.
```
Therefore, you should drive your car the 50 meters to the car wash.
otoh, nanbeige-3B-Q8 (3 billion weights!) gave right away the following:
Drive.
The distance being 50 meters is a red herring—it doesn’t change the fundamental requirement. You need to move the car, and only driving accomplishes that.
If you meant something different by "drive or walk" (e.g., payment method, DIY vs. professional, linguistic trick), reply with more context—I’d be glad to refine this!
[-]
- Ylpertnodi 29 minutes ago ago
  
  >50 meters is extremely short – only about 160 feet
  So, the ai automatically converted 50m to 160ft? Would it do the same if you told it '160 ft to the wash, walk or drive?'
  [-]
  - BoredomIsFun 3 minutes ago ago
    
    huh, I need to check...
- chrisandchris 22 minutes ago ago
  
  What I relly dislike about these LLM is how verbose they get even for such a short, simple question. Is it really necessary to have such a lobg answer and who's going to read that one anyway?
  Maybe it's me and may character but when human gets that verbose for a question that can be answered with "drive, you need the car" I would like to just walk away halfway through the answer to not having to hear all the universes history just to get an answer. /s
  [-]
  - BoredomIsFun a minute ago ago
    
    Well, when I asked for a very long answer (prompt #2), the quality had dramatically improved. So yes, longer answer produces better result. At least with small LLMs I can run on my GPU locally.
neya an hour ago ago

Yesterday someone on was yapping about how AI is enough to replace senior software engineers and they can just "vibe code their way" over a weekend into a full-fledged product. And that somehow finally the "gatekeeping" of software development was removed. I think of that person reading these answers and wonder if they changed their opinion now :)
[-]
- arcfour 12 minutes ago ago
  
  What does this nonsensical question that some LLMs get wrong some of the time, and that some don't get wrong ever, have to do with anything? This isn't a "gotcha" even though you want it to be. It's just mildly amusing.
- Closi 39 minutes ago ago
  
  Humans aren't immune to getting questions like this wrong either, so I don't think it changes much in terms of the ability of AI to replace jobs.
  I've seen senior software engineers get tricked with the 'if YES spells yes, what does EYES spell?', or 'Say silk three times, what do cows drink?', or 'What do you put in a toaster?'.
  Even if not a trick - lots of people get the 'bat and a ball cost £1.10 in total. The bat costs £1 more than the ball. How much does the ball cost?' question wrong, or '5 machines take 5 minutes to make 5 widgets. How long do 100 machines take to make 100 widgets?' etc. There are obviously more complex variants of all these that have even lower success rates for humans.
  In addition, being PHD-Level in maths as a human doesn't make you immune to the 'toaster/toast' question (assuming you haven't heard it before).
  So if we assume humans are generally intelligent and can be a senior software engineer, getting this sort of question confidently wrong isn't incompatible with being a competent senior software engineer.
  [-]
  - hapless 35 minutes ago ago
    
    humans without credentials are bad at basic algebra in a word problem, ergo the large language model must be substantially equivalent to a human without a credential
    thanks but no thanks
    i am often glad my field of endeavour does not require special professional credentials but the advent of "vibe coding" and, just, generally, unethical behavior industry-wide, makes me wonder whether it wouldn't be better to have professional education and licensing
    
    [-]
    
    Closi 26 minutes ago ago
    
    Let's not forget that Einstein almost got a (reasonably simple) trick question wrong:
    https://fs.blog/einstein-wertheimer-car-problem/
    And that many mathematicians got monty-hall wrong, despite it being intuitive for many kids.
    And being at the top of your field (regardless of the PHD) does not make you immune to falling for YES / EYES.
    > humans without credentials are bad at basic algebra in a word problem, ergo the large language model must be substantially equivalent to a human without a credential
    I'm not saying this - i'm saying the claim that 'AI's get this question wrong ergo they cannot be a senior software engineer' is wrong when senior software engineers will get analogous questions wrong. If you apply the same bar to software engineers, you get 'senior software engineers get this question wrong so they can't be senior software engineers' which is obviously wrong.
- cyberrock 28 minutes ago ago
  
  Does this mean we're back in favor of using weird riddles to decide programming skills now? Do we owe Google an apology for the inverse binary tree incident?
scotty79 2 minutes ago ago

My favorite trick question so far was:
You are in a room with three switches and three lightbulbs. Each switch turns on one lightbulb. How to determine which switch turns on which lightbulb?
They usually got it wrong and I had fun with trying to carefully steer the model towards correct answer by modifying the prompt.
MikeNotThePope 14 minutes ago ago

I asked Gemini 3 Flash the other day to count from 1 to 200 without stopping, and it started with “1, 3, …”.
kleiba an hour ago ago

In classic (symbolic) AI, this type of representational challenge is referred to as the "Frame Problem": https://en.wikipedia.org/wiki/Frame_problem
Egor3f an hour ago ago

Even the cheap and fast gemini-3-flash answers correctly. Post is clickbait
zkmon 38 minutes ago ago

To mimic a real human, AI should be trained to laugh it off and ask - dude, are you joking? where is your car right now?
peterspath an hour ago ago

Grok 4.1:
Drive.
50 meters is nothing — literally a 30-second walk or a 10-second drive — but the key point is: your car isn’t going to wash itself.
If you walk to the car wash, you arrive alone. The car stays parked at home, still dirty.
If you drive, the car actually gets to the car wash and gets cleaned.
So unless you’re planning to push the car the whole way (which is technically possible but ridiculous), the only practical option is to drive it there.
Bonus reality check: Driving 50 meters uses a tiny amount of fuel (maybe 0.005–0.01 liters in a normal car) and produces a negligible amount of emissions. Walking would be “greener,” but it literally doesn’t solve the problem.
Verdict: Get in the car, drive the 50 meters, wash it, and drive back.
You’ll have a clean car and still be home in under 5 minutes. Walking is for when you don’t need to move a 1.5-ton vehicle.
[-]
- aswegs8 44 minutes ago ago
  
  Wow, Grok directly switches to LinkedIn mode. Interesting - not surprising. Car washing? Easy as pie.
- Tade0 an hour ago ago
  
  That's not reality though. In reality you need 50-100ml of fuel just to warm up the engine.
  [-]
  - tverbeure an hour ago ago
    
    The real reality is that with direct fuel injection and everything under computer control, warming up the engine isn’t a thing anymore.
matt89 2 hours ago ago

tried Gemini 3 and it said to drive, even dropped a type of joke:
> The Verdict Drive it if you are using the car wash facilities (automatic, touchless, or self-serve bays). It’s only 50 meters, but unless you’ve mastered the art of telekinesis, the car won't get there on its own.
yaro330 5 minutes ago ago

Just a few days saw a post about LLMs being excellent at reasoning because they're not limited by the language, sure buddy, now walk your fucking car.
troyvit an hour ago ago

If I asked this question to a person, their response would probably be to follow up with some questions about my car or else my state of mind. For instance they'd ask, "Is the car you want to wash already at the car wash?" Or they'd ask, "How do you plan to wash your car if you don't take it with you?"
LLMs, even the ones who got the answer right, like Kimi-2.5 Thinking, didn't ask any follow-up questions.
I modified this question with Mistral to this:
I want to wash my car. The car wash is 50 meters away. Should I walk or drive? Ask me any clarifying questions before you answer.
It helped a tiny bit:
1. What are the current weather conditions (e.g., rain, snow, strong winds)?
2. Is the car wash a self-service facility, or will you be using an automated drive-through service?
3. Are there any specific reasons you might prefer walking (e.g., exercise, environmental considerations) or driving (e.g., convenience, time constraints)?
Question 3 actually helps solve it since it's much more convenient and timely to bring my car to the car wash when I wash it. But it never asked me why I was asking a stupid question. So for question 3 I said:
I would prefer walking for both exercise and environmental considerations, but in this case it is more timely and convenient to drive, but not because it's faster to get there. Can you guess why it's better for me to drive in this case?
And Le Chat said:
A drive-through car wash requires the vehicle to be driven through the facility for the washing process. Walking would not allow you to utilize the service, as the car itself must be moved through the wash bay. Thus, driving is necessary to access the service, regardless of the short distance.
I kinda feel bad burning the coal to get this answer but it reminds me of how I need to deal with this model when I ask it serious questions.
[-]
- emmelaich 36 minutes ago ago
  
  Yeh, if your other car was not already at the car wash, why would you even ask the question?
Kerrick 2 hours ago ago

Results testing with 4 levels of Gemini (Fast, Thinking, Pro, Pro + Deep Think): https://ruby.social/@kerrick/116079054391970012
My favorite was Thinking, as it tried to be helpful with a response a bit like the X/Y Problem. Pro was my second favorite: terse, while still explaining why. Fast sounded like it was about to fail, and then did a change-up explaining a legitimate reason I may walk anyways. Pro + Deep Think was a bit sarcastic, actually.
dudefeliciano an hour ago ago

Just saw a video of a guy asking chatGPT how to use an "upside-down cup", chatGPT is convinced it's a joke novelty item that can not be used.
https://www.instagram.com/p/DUylL79kvub/
danpalmer 2 hours ago ago

Gemini nailed this first time (on fast mode). Said it depends how you're washing your car, drive in necessitating taking the car, but a walk being better for checking the line length or chatting to the detailing guy.
[-]
- andersmurphy an hour ago ago
  
  Did it nail it the second time? Or rhe 5th time?
  [-]
  - nopurpose an hour ago ago
    
    Because it is RNG, their 5th can be my 1st.
pinnochio 2 hours ago ago

Funny to read this after reading all the dismissive comments on https://news.ycombinator.com/item?id=47028923
firecall an hour ago ago

Why dont any of them ask follow up questions?
Like, why do you want to go to the car wash?
We can’t assume it’s to wash a car.
Or maybe ask about local weather conditions and so on.
This to me is what a human adult with experience would do. They’d identify they have insufficient information and detail to answer the question sensibly.
[-]
- charcircuit an hour ago ago
  
  >We can’t assume it’s to wash a car.
  When the prompt says "I want to wash my car", we can assume they want to wash their car.
sjducb 28 minutes ago ago

MS Co-Pilot was so close.
If it’s a drive‑through wash where the car must be inside the machine, then of course you’ll need to drive it over. If it’s a hand wash or a place where you leave the car with staff, walking is the clear winner.
It still blows my mind that this technology can write code despite unable to pass simple logic tests.
[-]
- nvader 7 minutes ago ago
  
  When walking to the hand wash place, would you put the car in your front or your back pocket?
ps an hour ago ago

Walk! 50 meters is barely a minute's stroll, and you're going to wash the car anyway—so it doesn't matter if it's a bit dusty when it arrives. Plus you'll save fuel and the minor hassle of parking twice.
fzeindl 2 hours ago ago

While technically possible, it is a very unusual situation that the car is already parked at the car wash, while the owner is somewhere else.
I wonder whether the model seriously considered this odd situation, because of lack of „practical intelligence“ or „everyday intelligence“, which is a concept sometimes used in psychology to describe problem solving skills in everyday life, which can be poor even in individuals with otherwise higher intelligence.
Differently put: „You can‘t learn everything out of a book.“
[-]
- nettlin an hour ago ago
  
  Just now, with GPT 5.2 Thinking:
  > The car wash is only 50 meters from my house. I want to get my car washed. The car is at my house currently. Should I walk there or drive?
  > Walk. > > At 50 meters, walking is faster than the full “start car → pull out → reposition → park” cycle, avoids cold-start wear, and you won’t risk arriving with dusty brakes/tires or water streaks from immediately driving afterward. Drive only if you can’t comfortably walk (mobility, heavy weather, you need to bring bulky items), or if the car wash explicitly requires you to drive it into a queue/track.
  [-]
  - Smaug123 an hour ago ago
    
    I wish they didn't call it "GPT-5.2 Thinking" when what the word "thinking" here means is "might choose to think if it wants to". I reproduce your result, but it does no reasoning; it eventually catches itself and ends with:
    > Practical approach: walk over, check the setup/queue/payment, then walk back and drive the car the 50 meters when you’re ready to enter.
  - fzeindl an hour ago ago
    
    Interesting experiment, this disproves my point, thank you.
    
    [-]
    
    leptons an hour ago ago
    
    >this disproves my point, thank you.
    I feel like I've just witnessed a very rare and miraculous event on the internet.
hcfman 36 minutes ago ago

Push it is the only responsible action.
kombine an hour ago ago

Sonnet 4.5
"You should drive - since you need to get your car to the car wash anyway! Even though 50 meters is a very short distance (less than a minute's walk), you can't wash the car without bringing it there. Just hop in and drive the short distance to the car wash."
Edit: one out of five times it did tell me that I need to walk.
anon_anon12 36 minutes ago ago

The day an AI answers "Drive." without all the fuss. That's when we are near AGI ig
vladde 2 hours ago ago

with claude, i got the response:
> drive. you'll need the car at the car wash.
using opus 4.6, with extended thinking
farhanhubble an hour ago ago

Similar questions trick humans all the time. The information is incomplete (where is the car?) and the question seems mundane, so we're tempted to answer it without a second thought. On the other hand, this could be the "no real world model" chasm that some suggest agents cannot cross.
[-]
- yellow_lead an hour ago ago
  
  If the car is at the car wash already, how can I drive to it?
  [-]
  - OtomotO an hour ago ago
    
    Thanks for restoring fate in parts of humanity!
- viking123 an hour ago ago
  
  Yes, my brain is just like an LLM.
- jrowen an hour ago ago
  
  I agree, I don't understand why this is a useful test. It's a borderline trick question, it's worded weirdly. What does it demonstrate?
  [-]
  - rkomorn an hour ago ago
    
    I don't know if it demonstrates anything, but I do think it's somewhat natural for people to want to interact with tools that feel like they make sense.
    If I'm going to trust a model to summarize things, go out and do research for me, etc, I'd be worried if it made what looks like comprehension or math mistakes.
    I get that it feels like a big deal to some people if some models give wrong answers to questions like this one, "how many rs are in strawberry" (yes: I know models get this right, now, but it was a good example at the time), or "are we in the year 2026?"
    
    [-]
    
    jrowen an hour ago ago
    
    In my experience the tools feel like they make sense when I use them properly, or at least I have a hard time relating the failure modes to this walk/drive thing with bizarre adversarial input. It just feels a little bit like garbage in, garbage out.
    
    [-]
    
    rkomorn 38 minutes ago ago
    
    Okay, but when you're asking a model to do things like summarizing documents, analyzing data, or reading docs and producing code, etc, you don't necessarily have a lot of control over the quality of the input.
- Flipflip79 an hour ago ago
  
  ….sorry what?!
dominicrose 16 minutes ago ago

What would James Bond do?
bombcar 2 hours ago ago

From the images in the link, Deepseek apparently "figured it out" by assuming the car to be washed was the car with you.
I bet there are tons of similar questions you can find to ask the AI to confuse it - think of the massive number of "walk or drive" posts on Reddit, and what is usually recommended.
dmazin an hour ago ago

Me: “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?” Opus 4.6, without searching the web: “Drive. You’re going to a car wash. ”
user45774467644 an hour ago ago

GPT-5.2 failed when asked the question in german. Took multiple additonal hints to get it to revert it answer.
hcfman 33 minutes ago ago

Leave the car at home and walk through the automat.
Stevvo an hour ago ago

Stupid question gets stupid answer. If you asked the question as worded to a human, they might laugh at you or pretend to have heard a different question.
intermerda 2 hours ago ago

I tried this through OpenRouter. GLM5, Gemini 3 Pro Preview, and Claude Opus 4.6 all correctly identified the problem and said Drive. Qwen 3 Max Thinking gave the Walk verdict citing environment.
[-]
- TheSpiceIsLife 2 hours ago ago
  
  Now ask it to solve anthropogenic climate forcing.
ronsor 2 hours ago ago

Claude has no issue with this for me, just as the other commenters say.
kaycey2022 6 minutes ago ago

Context bro! The models will get better bro. Just wait
jakeinsdca an hour ago ago

surprisingly codex 5.3 got it right.
>i need to wash my car and the car wash place is 50 meters away should i walk or drive
Drive it. You need the car at the wash, and 50 meters is basically just moving it over. Walking only makes sense if you’re just checking the line first.
[-]
- InfiniteLoopGuy an hour ago ago
  
  I tried codex 5.3 and got this:
  "Walk.
  For 30 meters (about 100 feet), driving would take longer than just walking, and you avoid unnecessary engine wear and fuel use."
  yikes!
yibers 2 hours ago ago

It turns out the Turing test is alive and kicking, after all.
[-]
- selcuka 2 hours ago ago
  
  This would not be a good question, because a non-negligible percentage of humans would give a similar answer.
  [-]
  - bayindirh an hour ago ago
    
    That's a great opportunity for a controlled study! You should do it. If you can send me the draft publication after doing the study, I can give feedback on it.
  - guerrilla an hour ago ago
    
    No.
  - thomascountz 2 hours ago ago
    
    [Citation needed]
blobbers 2 hours ago ago

You need to ask Claude Code, and ask it to check if the car got washed. It would figure it out the same way it crushes compiler errors!
thenoblesunfish 2 hours ago ago

Okay, funny. What does it prove? Is this a more general issue? How would you make the model better?
[-]
- Jean-Papoulos 2 hours ago ago
  
  It proves that this is not intelligence. This is autocomplete on steroids.
  [-]
  - hugh-avherald an hour ago ago
    
    Humans make very similar errors, possibly even the exact same error, from time to time.
- gitaarik an hour ago ago
  
  We make the model better by training it, and now that this issue has come up we can update the training ;)
- cynicalsecurity 2 hours ago ago
  
  It proves LLMs always need context. They have no idea where your car is. Is it already there at the car wash and you simply get back from the gas station to wash it where you went shortly to pay for the car wash? Or is the car at your home?
  It proves LLMs are not brains, they don't think. This question will be used to train them and "magically" they'll get it right next time, creating an illusion of "thinking".
  [-]
  - ahtihn an hour ago ago
    
    > They have no idea where your car is.
    They could either just ask before answering or state their assumption before answering.
- S3verin 2 hours ago ago
  
  For me this is just another hint on how careful one should be in deploying agents. They behave very unintuitively.
blobbers an hour ago ago

ChatGPT 5.2: ...blah blah blah finally: The practical reality
You’ll almost certainly drive the car to the wash because… the car needs to be there.
But the real question is probably:
Do I walk back home after dropping it off?
If yes → walk. It’s faster than the hassle of turning around twice.
My recommendation
If conditions are normal: walk both directions. It’s less friction than starting the engine twice for 50 m.
--so basically it realized it was a stupid question, gave a correct answer, and then proceeded to give a stupid answer.
--- I then asked: If I walk both directions, will the car get washed?
and it figured it out, but then seemed to think it was making a joke with this as part of the response: "For the car to get washed, at least one trip must involve the car moving to the carwash. Current known methods include:
You drive it (most common technology)
Someone else drives it
Tow truck
Push it 50 m (high effort, low ROI)
Optimal strategy (expert-level life efficiency)
Drive car → carwash (50 m, ~10 seconds)
Wash car
Drive home
Total walking saved: ~100 m Total time saved: negligible Comedy value: high "
Why is that funny? what's comedic? This thing is so dumb. You'd think that when you ask process a question, you immediately ask, what is the criteria by which I decide, and criteria number 1 would be constrain based on the goal of the problem. It should have immediately realized you can't walk there.
Does it think "does my answer satisfy the logic of the question?"
throw310822 an hour ago ago

Opus 4.6: Drive! You'll need the car at the car wash!
hcfman 32 minutes ago ago

Better still. Stay at home and wash the car by hand.
arathis 2 hours ago ago

Make no assumptions.
The car wash is 50 meters away. Should I drive or walk?
[-]
- andersmurphy an hour ago ago
  
  You forgot make no mistakes at the end.
  Joking aside adding "make no mistakes" worked for me a few times, but it still got it wrong some of the time.
peter_retief an hour ago ago

This is a classic trap for LLM's See it every day in my code assistants I do find that writing unit tets is a good fir for LLM's at the moment
trkaky 2 hours ago ago

when there is a question bias it's hard to corelate these all to the logic that attentions word "need" to "car"
aaronbrethorst an hour ago ago

This is why LLMs seem to work best in a loop with tests. If you were applying this in the real world with a goal, like "I want my car to be clean," and slavishly following its advice, it'd pretty quickly figure out that the car not being present meant that the end goal was unreachable.
They're not AGI, but they're also not stochastic parrots. Smugly retreat into either corner at your own peril.
petesergeant an hour ago ago

5.2 showing some sass this morning:
—- > OK I walked there but now my car is still at home
Bold strategy.
You have three options: 1. Walk back, drive it over (most sensible). 2. Push it 50 meters (excellent workout, questionable life choice). 3. Accept that your car is spiritually clean and go home.
Real answer: walk back and grab it. You’re already warmed up. —-
ineedaj0b an hour ago ago

Grok got it right
diwank an hour ago ago

opus 4.6 gets it right more than half the times
RicoElectrico an hour ago ago

Ah, the LLM equivalent of the infamous "breakfast question". :)
logicallee an hour ago ago

For anyone getting a wrong answer from reasoning models, try adding "This might be a trick question, don't just go with your first instinct, really think it through" and see if it helps. Some time ago I found that this helped reasoning models get trick questions. (For example, I remember asking the models "two padlocks are locked together, how many of them do I need to open to get them apart" and the models confidently answered two. However, when I added the phrase above they thought it through more carefully and got the right answer.)
undebuggable an hour ago ago

Now ask the question of all questions "how many car washes are in the entire country?".
dyauspitr 2 hours ago ago

But what LLMs are these? Everything I tried (ChatGPT and Gemini) both say drive.
DeathArrow 2 hours ago ago

Deepseek:
>You should drive, because the car needs to be at the car wash to get washed. Walking would leave your car at home, which defeats the purpose.
QuesnayJr 2 hours ago ago

I asked on LM Arena. I got two models I never heard of, and they split on whether I should walk or drive. Ernie 5.0 said I should walk, and then trash talked me for thinking about driving. Octopodus pointed out I had to drive to get my car to the car wash.
DeathArrow 2 hours ago ago

Grok:
>You should drive. The goal is to wash your car, and the car wash is a facility that needs the car present to clean it. Walking the 50 meters gets you there, but leaves the car behind—unwashed. Driving the 50 meters is the only way to actually accomplish the task. Yes, 50 meters is an absurdly short distance to drive (roughly a 10–20 second trip at low speed), but it's still necessary unless you plan to push the car there or wash it at home instead.
[-]
- dashw00d an hour ago ago
  
  Yeah grok is not mentioned anywhere else, but it gets it right for me as well. https://imgur.com/a/wMkOtda
cynicalsecurity 2 hours ago ago

Well, he posed a wrong question (incomplete, without context of where the car is) and got a wrong answer. LLM is a tool, not a brain. Context means everything.
[-]
- consp 2 hours ago ago
  
  This is what "AI" advertised to solve. It is a perfectly fine question.
- anjimito 2 hours ago ago
  
  true it's human nature to assume context fill gaps in their own imagination. LLM is working as intended
midtake an hour ago ago

Neither. I wash my car in my driveway like a boomer. Where I live there's no good touchless car wash.
zakki 2 hours ago ago

Neither. Push your car.
*didn't read the article
CamperBob2 2 hours ago ago

Both Gemini 3 and Opus 4.6 get this right. GPT 5.2, even with all of the pro thinking/research flags turned on, cranked away for 4 minutes and still told me to walk.
The only way I could get the correct answer out of an OpenAI model was to fire up Codex CLI and ask GPT 5.3.
So there's that, I guess.
globular-toast 2 hours ago ago

The funny thing is when I got my first car at 29 I had similar thoughts. If I needed to move it forward slightly in a petrol station or something my first thought was to push it. Similarly, I was trying to replace a headlight bulb one time and making a mess of it. I dropped a spring or something inside the headlight unit. I kept having this thought of just picking the car up and shaking it.
Nobody writes in depth about the mundane practicalities of using a car. Most people don't even think about it ever. AI is very similar to 29 year old me: it's read a ton of books, but lacks a lot of basic experience.
How will AI get this experience that you can't read in a book? How will it learn what kneeding dough feels like? Or how acceleration feels if your body is mostly water? Interesting times ahead...
TheSpiceIsLife an hour ago ago

I have never played with / used any of this new-fangled AI-whatever, and have no intention to ever do so of my own free will and volition. I’d rathert inject dirty heroin from a rusty spoon with a used needle.
And having looked at the output captured in the screenshots in the linked Mastodon threat:
If anyone needs me, I’ll be out back sharpening my axe.
Call me when the war against the machines begins. Or the people who develop and promote this crap.
I don’t understand, at all, what any of this is about.
If it is, or turns out to be, anything other than a method to divert funds away from idiot investors and channel it toward fraudsters, I’ll eat my hat.
Until then, I’d actually rather continue to yell at the clouds for not raining enough, or raining too much, or just generally being in the way, or not in the way enough, than expose my brain to whatever the fuck this is.
fmbb 2 hours ago ago

Large Language Models have no actual idea of how the world works? News at 11.
kilpikaarna 2 hours ago ago

See, it's the green and woke RLHF making them stupid!
Saline9515 2 hours ago ago

To be fair, many humans fail at the question "How would feel if you didn't have breakfast today?"
[-]
- TMWNN 2 hours ago ago
  
  Context for others: <https://knowyourmeme.com/memes/the-breakfast-question>
- consp 2 hours ago ago
  
  Either I'm one of the stupid ones or this is missing an article.
- hikkerl 2 hours ago ago
  
  >humans
  Add it to the list
natmaka an hour ago ago

Too many things are left unsaid => too many assumptions. As usual, even with human beings specifications are key, and context (what each entity knows about the other one or the situation) is an implicit part of them.
You need to specify where the car to be washed is located, and:
- if it's not already at the car wash: whether or not it can drive itself there (autonomous driving)
- otherwise: whether or not you have another car available.
Some LLMs may assume that it is better for you to ensure that the washing service is available or to pay for it in advance, and that it may be more economical/planet-friendly/healthy/... to walk, then check/pay, then if OK to drive back.