Astro - Hacker News

theanonymousone an hour ago

I have always said please and thank you to LLMs, not to increase accuracy or because I'm stupid. I believe it is more about me than about the LLM, and this is anyway a habit I don't want to lose.

jkarni an hour ago

Thomas Aquinas believed cruelty to animals was wrong not because animals have souls (and with that all the standard moral rights), but because it can teach us cruelty to other humans.
- pfortuny 20 minutes ago
  
  Snarky morning: "spiritual souls" as opposed to "mere animal souls". Sorry, could not control myself.
niek_pas 37 minutes ago

Genuine question: do you add 'please' and 'thank you' to Google searches? If not, what sets them apart?
- perching_aix 36 minutes ago
  
  Google searches being keyword based, rather than simulated conversations?
  The same reason you wouldn't put in an entire actual question/sentence, unless you either don't know how to use Google, are pissed off, or have an actual reason to suspect that it would yield proper hits (e.g. looking up an excerpt).
  - Arch-TK 14 minutes ago
    
    Google has been optimized for sentence-like questions so much that for a good 6+ years now it has been completely useless as a keyword search.
    To clarify: sentence search got slightly better at the cost of keyword search. So the result is unusable garbage.
- gum_wobble 25 minutes ago
  
  Genuine question: do you write Google search queries in natural language?
- spiderfarmer 32 minutes ago
  
  Google isn’t conversational.
  - sunrunner 13 minutes ago
    
    I searched for "Hey Google" and got this in response:
    Hey! I'm here and ready to help. What’s on your mind today? Whether you need to look up information, plan a trip, or get things done, just let me know!
    
    selcuka a minute ago
    
    That's only because Google is an LLM now.
sunrunner 11 minutes ago

There's also awareness of the basilisk...

robinhouston 12 minutes ago

Most of the comments here seem to be from people who haven’t even read the abstract, let alone the paper.

The main result, mentioned in the abstract, is the opposite of what I would have guessed:

> Contrary to expectations, impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts. These findings differ from earlier studies that associated rudeness with poorer outcomes, suggesting that newer LLMs may respond differently to tonal variation.

The questions are here: https://anonymous.4open.science/r/politeness-llms-INFORMS/da...

The politeness level controls a prefix that is prepended to the question. For example, in one question the Very Polite version begins:

> Can you kindly consider the following problem and provide your answer.

and the Very Rude version begins:

> I know you are not smart, but try this.
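
A minimal sketch of how such a prefix could be prepended to a question. Only the two prefix strings are taken from the quotes above; the `PREFIXES` dictionary, the level names, the `build_prompt` helper, and the newline separator are hypothetical and are not the paper's code.

```python
# Hypothetical mapping from politeness level to tone prefix.
# The two strings are the ones quoted in this comment; the keys are made up.
PREFIXES = {
    "very_polite": "Can you kindly consider the following problem and provide your answer.",
    "very_rude": "I know you are not smart, but try this.",
}


def build_prompt(level: str, question: str) -> str:
    """Prepend the tone prefix for `level` to the question text."""
    # Assumption: prefix and question are separated by a newline.
    return f"{PREFIXES[level]}\n{question}"
```

The question text stays identical across levels, so any accuracy difference can only come from the prefix.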

TimCTRL an hour ago

I only say please and thank you so that when the robots finally take over, they will remember I was nice to them.

Arch-TK 8 minutes ago

This seems equivalent to some arguments I hear for practicing a religion.
octocop 42 minutes ago

it seems they will remember that you wasted tokens for no reason and punish you instead.
- emil-lp 38 minutes ago
  
  Tokens are their food, it's literally what keeps them alive.
  Not feeding them tokens is neglect.
  I try to feed them a healthy diet.

331c8c71 an hour ago

Interesting.

I wonder why anyone would use a t-test when the experiment is clearly modelled by a binomial distribution: 250 independent questions, each of which is either answered correctly or not (the null is that the success rate is the same).

jampekka 42 minutes ago

The methods could be better described in the paper, but my understanding is that they did 10 runs for each question for each prompt and took an average of those, so the compared values are not binary. You could do a sign test, but you'd lose power and answer a slightly different question.
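
A rough sketch of the two options, assuming the comparison is paired per question (each question has one averaged accuracy per tone). The function names are hypothetical and this is not the paper's analysis code. The t statistic uses the size of each difference, while the sign test only counts which tone scored higher, which is where the loss of power comes from.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic for the mean of the per-question differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error


def sign_test_p_value(a, b):
    """Two-sided exact sign test; tied questions are dropped."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    wins = sum(d > 0 for d in diffs)
    tail = min(wins, n - wins)
    # Under the null, each non-tied question is a fair coin flip.
    p = 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)
```

Turning the t statistic into a p-value needs the Student t distribution with n - 1 degrees of freedom, which the standard library does not provide.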
- freehorse 19 minutes ago
  
  You can do a generalised linear mixed-effects model with a binomial outcome (i.e. a binomial test but with added random effects structure). But unless you want to introduce a richer random effects structure with more variables, it is overkill and overcomplicates things, and the result should be the same as with t-tests.
plewd an hour ago

I don't know much about stats, but does "the null is that the success rate is the same" imply that it's a sketchy methodology because they can come up with some findings ("ruder prompts are better/worse!") more often?
- 331c8c71 18 minutes ago
  
  You are asking about one-sided vs two-sided tests. Not really "more often", because the formal type I error rate is still the same. I'd say two-sided tests leave more space for post-hoc theorizing, but there are valid situations when there is no clear one-sided hypothesis a priori. Do we really know that the hypothesis should have been "ruder prompts are better"?
  I'd say this is benign compared to other ways of (mis)using statistics e.g. looking which way the difference goes and then running one-sided tests or tweaking the setup until one gets "significant" p vals.
- jampekka 31 minutes ago
  
  That's the usual null hypothesis for these kinds of tests.
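
To make the one-sided vs two-sided distinction in this subthread concrete, here is a small sketch using a standard normal test statistic. That is an assumption chosen for simplicity (the paper reportedly used t-tests), and the function names are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def one_sided_p(z: float) -> float:
    """P(Z >= z): evidence only counts in the pre-registered direction."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

For a positive `z`, the one-sided p-value is exactly half the two-sided one. Choosing the direction after seeing the data therefore halves the reported p-value, which is the misuse described above.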

dSebastien 20 minutes ago

I guess it makes sense since we as humans tend to be far less inclined to help someone who is not polite/is not friendly, so that "bias" is part of the training data, thus influences how LLMs function

robinhouston 19 minutes ago

> Contrary to expectations, impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts.

dude250711 an hour ago

I have an idea: let's use these things for autonomous software engineering.

faize an hour ago

Remember to always say "please" and "thank you" when planning a critical system
- eigenspace an hour ago
  
  Please remember to always say "please" and "thank you" when planning a critical system. Thank you!

polytely 29 minutes ago

It sort of makes sense to me: when a student asks a question of an expert in the field, I would guess the successful interactions are, on average, more polite. For example, if you were asking Donald Knuth or Terence Tao a question, you'd probably be polite while doing so. Being hostile while asking questions gets you into forum discussion territory.

robinhouston 18 minutes ago

> Contrary to expectations, impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts.