[HN Gopher] Improbably Right
___________________________________________________________________
Improbably Right
Author : mlajtos
Score : 74 points
Date : 2023-04-24 13:19 UTC (9 hours ago)
(HTM) web link (mlajtos.mu)
(TXT) w3m dump (mlajtos.mu)
| pajko wrote:
| Another tricky question is the one about 3 doors: two with
| goats behind them, and the 3rd hiding the prize. The host
| opens door #1, which has a goat behind it; doors #2 and #3 are
| still closed. Now what's the chance of door #2 hiding the
| prize? The intuitive answer is 50%, but it's wrong.
| vezuchyy wrote:
| That's still 50% though. Now, if you had decided to open door
| #2 before the host opened #1, you may increase your chances by
| changing to door #3.
|
| The chances of the prize being behind either door are exactly
| 50/50; it's the chance that you made the right choice at the
| beginning that is only 33%. That is called the Monty Hall
| PARADOX for a reason.
| xiande04 wrote:
| That's not how it goes. And, yes, the answer is 50% in this
| case.
|
| The problem relies on first choosing a door (1). The host then
| opens a door that you did NOT choose (3) and asks if you want
| to switch your choice to door #2. In this case, the probability
| that the prize is behind door 2 is not 50%.
|
| https://en.wikipedia.org/wiki/Monty_Hall_problem
| mlajtos wrote:
| You described a slightly wrong Monty Hall problem [0]. When I
| explain the solution to this problem, I always just add more
| doors. 1 million doors, 1 car, 999999 goats. You pick one
| door, and I show you 999998 goats. Only two doors are left.
| What is the chance you picked the car on your first guess?
|
| 0: https://en.wikipedia.org/wiki/Monty_Hall_problem
| anamexis wrote:
| You're not fully describing the Monty Hall problem - as you've
| described it, the answer is indeed 50%.
|
| In the Monty Hall problem:
|
| 1. You pick one of three doors.
|
| 2. The host opens one of the two doors you didn't pick to
| reveal a goat.
|
| 3. You are given the opportunity to keep your original choice,
| or pick the other unopened door.
|
| In this situation, there is a 1/3 chance that the prize is
| behind the original door, and a 2/3 chance it's behind the
| remaining unopened door.
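|
| A quick Monte Carlo sketch (hypothetical Python, not part of
| the original comment) bears out the 1/3 vs 2/3 split:
|
|     import random
|
|     def trial(switch):
|         doors = [1, 0, 0]          # one car, two goats
|         random.shuffle(doors)
|         pick = random.randrange(3)
|         # host opens a goat door that isn't your pick
|         host = next(d for d in range(3)
|                     if d != pick and doors[d] == 0)
|         if switch:
|             pick = next(d for d in range(3)
|                         if d not in (pick, host))
|         return doors[pick]
|
|     n = 100_000
|     print(sum(trial(False) for _ in range(n)) / n)  # ~0.33
|     print(sum(trial(True) for _ in range(n)) / n)   # ~0.67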
| contravariant wrote:
| The answer is 0 because the prize is behind the third door
| (#3), but I feel you're either missing a couple of details or
| you're trying to mess with people who know Monty Hall.
| Jiro wrote:
| >If a card shows an even number on one side, then its opposite
| side is blue.
|
| As the problem fails to specify that the card has a number on one
| side and a color on the other, you need to check three cards.
| mpeg wrote:
| No, you don't. You still only have to check those 2 cards. If
| you're asking about card 3 then that's the common mistake - it
| doesn't matter whether that card has a number or an emoji on
| the other side, because the only logical test here is that if
| it has an even number on one side, the other side must be blue.
|
| It doesn't matter what blue cards have on the other side, we
| don't need to test them.
|
| edit: sorry, I just realised you're referring to card 1 and not
| 3, yeah I suppose you'd need to check it too unless there's a
| rule about always needing to match a color to a number.
| quirino wrote:
| What they mean is that the card that has a 3 on the front
| might have an even number on the back.
| mpeg wrote:
| yep I just realised and edited my comment, I imagine the
| full question has a clearer description that OP cut down
| for the blog post.
| mlajtos wrote:
| To be fair and square, I added an explicit statement
| about consistency of the cards, so it is clear.
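|
| A brute-force check (a hypothetical Python sketch, assuming
| the amended rule that every card has a number on one face and
| a color on the other) shows which visible faces force a flip:
|
|     # Rule: if one side shows an even number, the other side
|     # is blue. A card must be flipped only if its hidden side
|     # could falsify that rule.
|     def must_flip(face):
|         if isinstance(face, int):
|             return face % 2 == 0  # even: hidden color might not be blue
|         return face != "blue"     # non-blue: hidden number might be even
|
|     for face in [1, 2, "blue", "orange"]:
|         print(face, "->", "flip" if must_flip(face) else "leave")
|     # 1 -> leave, 2 -> flip, blue -> leave, orange -> flip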
| dns_snek wrote:
| The cards example doesn't work in Firefox, it reveals the
| opposite side of the card :/
| moffkalast wrote:
| Absolutely hilarious how people claim that Gecko isn't half
| broken anymore.
| tzot wrote:
| Ah, so it's my Firefox :)
| Kim_Bruning wrote:
| Concur. I was very confused until I decided to reopen in
| chromium. %-)
| mlajtos wrote:
| Oh, this browser again.
|
| It seems that Firefox might have a problem with `mix-blend-
| mode: multiply;` and 3D CSS transforms.
|
| Cards now behave correctly in Firefox, but across the board
| they are visually less appealing.
|
| Thank you for bringing this up.
| stared wrote:
| Why would anyone answer 12?
|
| Unless... most human thinking is this lower-end language model,
| which looks at word patterns without giving much thought to them.
|
| "12" is not intuition. It is repetition.
| btbuildem wrote:
| Why? Because if three hens lay three eggs in three days, it
| feels like one hen lays one egg per day - so twelve hens,
| given twelve days, will lay twelve eggs. That's the immediate
| reasoning, right?
|
| It's very easy to fall into that trap, and miss the fact that
| two of the variables were changed.
| hgsgm wrote:
| It's misapplied intuition for double ratios.
|
| If a triangle has sides 1:1:1 in feet, in inches it has sides
| 12:12:12.
| simse wrote:
| Sorry, unrelated to the writing (which I enjoyed), but related to
| the horizontal scrollbar! If you change w-screen to w-full on
| both the navbar and footer, it goes away. 100vw -> 100%. Just a
| tiny enhancement to an otherwise good-looking website.
| mlajtos wrote:
| > unrelated to the writing (which I enjoyed)
|
| Aw, thank you. :)
|
| > related to the horizontal scrollbar
|
| Thank you very much, I was not aware of this issue. Changed
| both to `w-full`, hope the scrollbar goes away. :)
|
| BTW As a pizza aficionado I deeply enjoyed "wood fired pizza".
| :D
| glofish wrote:
| I believe that the author misremembers the common solution for
| the first question
|
| > _3 hens lay 3 eggs in 3 days. How many eggs do 12 hens give in
| 12 days?_
|
| > [...] _When we got to it, everybody shouted "Three!"_
|
| I don't see why the "common" wrong answer would be 3. Why would
| anybody think that? The problem looks like this: 3, 3 -> 3.
|
| When we see the 12 and 12 the intuitive, common, _wrong_ answer
| should be 12 eggs.
|
| That's what _makes sense_ IMHO.
| photochemsyn wrote:
| How to annoy the teacher:
|
| > "Ovulation (release of the yolk from the ovary) occurs every
| 24 - 26 hours regardless of fertilization (so a rooster is not
| needed). A hen ovulates a new yolk after the previous egg was
| laid. It takes 26 hours for an egg to fully form (white and
| shell added), so a hen will lay an egg later and later each
| day. Eventually the hen will lay too late in a day for
| ovulation to be signaled. She will then skip a day or more
| before laying another egg." (UWisconsin Livestock)
|
| So, the 3 hens must have been at the least productive point in
| their egg-laying cycle over the initial three-day time
| period... Now if we have the two-hour daily offset, over 12
| days, ummm, maybe two or three days skipped per hen? So, ah,
| 108-120 eggs is what the farmer could expect from the 12 hens?
| mlajtos wrote:
| This is gold!
| dvh wrote:
| I remember a story where a school teacher was showing plastic
| animals and asking kids what they were. At some point all kids
| wrote "cow" except for one farm boy who wrote "goat". Turns
| out the plastic cow had the wrong number of teats. Cows have
| four, goats have two.
| mlajtos wrote:
| Once I saw a 5-armed snowflake as an icon on a TV weather
| forecast.
| ouid wrote:
| He misremembered the third task as well, since his solution to
| it, as stated, is wrong. If you aren't given that every card
| has a number on one side and a color on the other, then you
| need to check everything which does not satisfy the consequent.
| mlajtos wrote:
| I added "We have cards with a number and color on each face."
| to the task statement. Hope this makes rules clearer as I
| could not phrase it better.
| ouid wrote:
| The rules were clear before, what has changed is that they
| are now _different rules_. You were just wrong before.
|
| You've still got a problem, though: now when you ask the
| reader to verify the rule, you have to explicitly state
| that they're not trying to verify the first rule.
| mlajtos wrote:
| Oh shit, you are right. Added a completely explicit
| statement under the problem question. In both situations.
|
| Thank you for pointing this out.
| menzoic wrote:
| The article says everyone shouted 12, not Three.
| forbiddenlake wrote:
| It said 3 originally.
|
| https://news.ycombinator.com/item?id=35688034
| mlajtos wrote:
| Of course it was "Twelve!". Thank you for bringing this up. It
| is fixed now.
| thechao wrote:
| It took me surprisingly long to convince myself the author was
| correct: the answer is 48 because there's 4x as many hens and
| 4x as many days: 3 x 4 x 4 = 48. The "wrong" way to solve this
| problem is to compute hen-laying in terms of hen-egg-days, and
| then scaling.
| c22 wrote:
| I dunno, I immediately noted that 12 hens must lay 12 eggs in
| 3 days (it takes 3 days for a hen to lay an egg). Then I just
| asked myself how many 3-days are in 12-days--okay, 12 [eggs]
| * 4 [3-day egg periods] = 48...
|
| I didn't need to resort to pen and paper, but I'm also not in
| primary school.
| jimmaswell wrote:
| I always skip straight to dimensional analysis for such a
| thing.
|
| 3 eggs / 3 chickens / 3 days * 12 chickens * 12 days
| cancels out to 48 eggs
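|
| The same cancellation as a quick sketch (hypothetical Python,
| just restating the arithmetic above):
|
|     rate = 3 / (3 * 3)      # eggs per hen per day = 1/3
|     eggs = rate * 12 * 12   # 12 hens laying for 12 days
|     print(eggs)             # 48.0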
| jkubicek wrote:
| There are a lot of right ways to solve this problem, but
| this is clearly the most fun way.
| hgsgm wrote:
| That's because it's output/input = eggs/(hen*day), not
| eggs/(hen/day), which is clear from meaning and logic if you
| aren't a GPT-2 LLM.
| mlajtos wrote:
| There is a visual illustration in the article. I thought it
| might be useful for understanding why it's 48.
| [deleted]
| jaclaz wrote:
| >The "wrong" way to solve this problem is to compute hen-
| laying in terms of hen-egg-days, and then scaling.
|
| I am not sure I understand what you mean by the "wrong" way; I
| got the right result through a different reasoning that seems
| to me based on hen-egg-days.
|
| It takes 3 days for 3 hens to make 3 eggs.
|
| The 3-day time is "fixed", i.e. it takes 3 days for each
| hen to make its own egg.
|
| The egg production rate is 1/3 egg per day per hen.
|
| So I have 12 hens x 12 days x 1/3 = 144 x 1/3 = 48
| moondrek wrote:
| I read it as the students being familiar with a similar "trick"
| question and erroneously pattern matching on that.
|
| Another similar riddle goes like:
|
| "If it takes 5 machines 5 minutes to make 5 widgets, how long
| would it take 100 machines to make 100 widgets?"
|
| The correct answer there being "5 minutes" (each machine makes
| one widget per 5 minutes, so 100 machines make 100 widgets in
| those same 5 minutes), and the intuitive wrong answer being
| "100" as per your post.
|
| That would make sense as to why the teacher also expected "3",
| since they should be familiar with the existence of these non-
| intuitive questions, even if they misremember the specific
| non-intuition.
| Cthulhu_ wrote:
| Or as to why adding more people to a late project makes it
| later: 1 woman takes 9 months to grow a baby, how long does
| it take 9 women to grow a baby?
| firstlink wrote:
| > The teacher was perplexed as to why I said such a wrong
| answer to such a simple question.
|
| This is extremely toxic on many levels. A teacher who doesn't
| know their subject matter or at least can't be bothered to
| refresh before a lesson. A teacher who submits to social pressure
| from their students when determining the answer. A teacher who
| shames students for being wrong. A teacher who shames students
| for being right!
|
| The students who were wrong initially will, thanks to this
| teacher, have completely failed to learn the lesson the problem
| was teaching. They will remember only that they got the problem
| right (the teacher said so) and only due to some technicality in
| the answer book were they subjected to an embarrassing "lesson"
| from the smart kid.
|
| Yes, this is making a mountain out of a molehill, but molehills
| like
| this can have mountainous impacts on society.
| tshaddox wrote:
| I mean, there's a lot going wrong with this "lesson." The
| exercise is pretty clearly designed to trick people, which is
| very counterproductive if the goal is to teach people about
| multiplication or fractions or whatever. It's really not that
| damning that a teacher would be fooled into skipping the actual
| work, since that deceit is clearly the entire point of choosing
| those numbers.
|
| It's a bit like if a proofreading exercise in an English class
| deliberately did that trick where you put two "the"s next to
| each other but across a line break. That will often fool a
| skilled English writer, but that says nothing about the
| writer's skill.
| rcoveson wrote:
| While the teacher in the story obviously bungled it, I
| disagree with your interpretation of the lesson. The point is
| not to "teach people about multiplication or fractions or
| whatever". The point is to teach them to recognize
| dimensional parameters in a natural scenario. This can be
| counter-intuitive; we have a bias towards linear
| relationships. Somebody twice as tall is twice as heavy,
| thinks the child.
|
| Tricks ("surprises" is a less negative descriptor) are
| important because they wake your brain up. Most people don't
| like learning arbitrary things for their own sake, but they
| don't like being fooled either.
|
| Describe the base rate fallacy to a room of college students
| and it might stick for the rest of the semester, but give
| them this problem[0] and tell them how many _licensed
| doctors_ it tricked (after you let them answer incorrectly,
| of course) and they might remember it for a while longer.
|
| 0. http://pi.math.cornell.edu/~mec/2008-2009/TianyiZheng/Bayes....
| tshaddox wrote:
| If the point of the lesson truly was to identify that the
| exercise was chosen deliberately to be deceptive, then
| sure. But I doubt it. Your example is very different,
| because there's no _obvious_ answer that one might spit out
| instantly by doing naive pattern matching.
| rcoveson wrote:
| The fact that there is an obvious but incorrect answer
| that you might spit out if you were just pattern matching
| is the whole point. It tricks you into being _wrong_ ,
| which "tricks" you into _wanting_ to know why, where
| before you might not have cared about the answer or the
| underlying reasoning at all. You think the question just
| accidentally picked tricky numbers? Or you think the
| reason that it picked them was just malice towards
| children?
| tshaddox wrote:
| > You think the question just accidentally picked tricky
| numbers? Or you think the reason that it picked them was
| just malice towards children?
|
| I think it's probably just abysmal pedagogy. This should
| be pretty clear given that almost the entire class and
| even the teacher fell for the trick, which shows that
| it's a terrible way to teach something _and_ even a
| terrible way to test whether they know how to do the thing
| you're wanting them to learn. The teacher and many of the
| children probably did in fact know how to do the
| calculation and solve the problem, and would have done so
| if the numbers weren't very deliberately chosen to invoke
| a hasty pattern-matching reaction.
| tzot wrote:
| 1. I had a similar issue in the fifth grade of elementary school
| (in my country, elementary school is 6 years with an extra year
| at age about 5 before pre-elementary school, then three years of
| middle high, then three years of optional high school before
| university, but almost everyone does the full 12/13 years), when
| we discussed Fahrenheit (foreign) vs Celsius (domestic). The
| teacher said that 50degC is 106degF because 212degF is 100degC
| and 212/2 = 106. At the time I said "this doesn't feel right, 106
| is not in the middle between 32 and 212" but I couldn't phrase it
| mathematically yet, and the teacher didn't take my objection
| well. Afterwards I learnt that he understood his error but
| never said a word to me about it.
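|
| (For the record: degF = degC x 9/5 + 32, so 50degC = 122degF --
| and 122 is indeed exactly halfway between 32 and 212, just as
| 50degC is halfway between 0degC and 100degC.)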
|
| 2. I don't see the colours on the "back sides" of the cards. I
| just see the cards flipped/rotated 180deg around the Y-axis
| (probably through CSS), so all numbers are visible, just in cards
| 3 & 4 the numbers are mirrored.
| eviks wrote:
| How did you learn that about the teacher?
| mlajtos wrote:
| 2. Firefox user, right? It should be fixed now. Thank you for
| reporting this issue.
|
| 1. You reminded me of another nice example of wrong shortcuts.
| The average speed of a car from city A to city B is 120km/h.
| The average speed back (B -> A) is 80km/h. What was the
| average speed of the car on the whole path (A -> B -> A)?
| It's not 100km/h.
| firstlink wrote:
| > It's not 100km/h.
|
| Of course it is... if you're averaging over distance. There's
| no reason that's less valid than whichever average you have
| in mind.
| joebiden2 wrote:
| How do you arrive at 100km/h if you're averaging over
| distance?
|
| I don't doubt you're right, but I don't see it. My
| calculation is as follows: suppose the distance between
| both cities is 120km. The trip A -> B thus takes exactly
| one hour. The trip B -> A takes 1.5 hours (120km / 80km/h)
| = 1.5h. The total trip therefore takes 2.5h, the total
| distance is 240km, which averages at 96km/h (240/(1 +
| 1.5)).
|
| A simple (V1 + V2)/2 is surely possible, but I don't see
| the value of that average, I can't think of a situation
| where it would be useful.
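|
| (For reference, the 96km/h figure is the harmonic mean of the
| two speeds: 2 * 120 * 80 / (120 + 80) = 96. Averaging over
| distance -- weighting each kilometre equally rather than each
| hour -- gives the arithmetic mean, (120 + 80) / 2 = 100.)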
| tkot wrote:
| I think you are thinking about a case where someone goes
| 120km/h for 1 hour and then 80km/h for 1 hour instead of
| going over a fixed distance at 120km/h and then going over
| the same distance at 80km/h.
| atleastoptimal wrote:
| This entire post is "I am le smart, so smart that other people
| think I'm wrong"
|
| I think it would come across better if the author mixed in
| times where their non-traditional intuition gave them the
| wrong answer.
| eyelidlessness wrote:
| I don't think that's a remotely fair characterization of the
| author's self-evaluation. A fair bit of the article is spent
| expressing embarrassment, presuming they _were in fact_ wrong,
| and surprise at discovering that presumption was their error.
|
| What you're describing would be an entirely different post,
| likely making a very different point. There's value in
| exploring "sometimes your idiosyncratic intuitions turn out to
| be right", without muddying that with the obvious intuition
| that this won't always be the case. There's value in exploring
| the idea through demonstrative anecdote without counter
| examples. If anything, counter examples might influence the
| kind of pattern matching biases which lead to such faulty
| intuitions.
|
| Anyway, I think the author has demonstrated that they're
| fallible and happy to be corrected. They've acknowledged more
| concrete problems with the post and made changes to address
| them at least a couple times.
| denton-scratch wrote:
| I didn't get why he deprecated distance/time as a direct way of
| determining average velocity.
|
| Is this pedantry about "average" vs. "mean"? As I understand it,
| "average" is a vague word, one of whose meanings is "mean". And I
| imagine most of the other meanings would have distance/time as
| their value as well.
| mpeg wrote:
| I love these kinds of logic puzzles. If you're up for losing an
| evening, the clips game [0] is one of my long-time favourites.
| It starts easy, but puzzles 6 and 7 are tricky to do within the
| assertion limit.
|
| [0]: https://md5crypt.github.io/clipsgame/
| mdonahoe wrote:
| I can't even beat the first level.
| mlajtos wrote:
| I also struggle. :D I think `(eats tiger plant)` would be
| fine, but it isn't.
| mdonahoe wrote:
| Ah, my smarter friend figured it out.
|
| Hint: what is eatable?
| mlajtos wrote:
| I passed with `(is tiger eatable)`. I don't get it. :D
| olafalo wrote:
| I'm not very good at this but I'll see if I can explain
| my thought process:
|
| (is tiger fed) requires (eats tiger ?food)
|
| which means there must exist ?what such that (is ?what
| eatable) and (is ?what ?food)
|
| But in the fact table, nothing is actually eatable! We
| could have our tiger eat plants by asserting (eats tiger
| plants) and (is grass eatable), but that's two
| assertions. Tigers eat animals, and tigers ARE animals,
| so since we need to make something eatable we might as
| well do it on the only animals we have: (is tiger
| eatable)
|
| I got level 2 as well but only after checking for hints
| in the source. I have been thoroughly nerd-sniped.
| roenxi wrote:
| Interesting. So we can tell from the last example that humans are
| perfectly good at doing this reasoning in theory and that
| therefore the trouble is actually in interpreting what the
| question is asking.
|
| Put another way, if the question could be beamed into someone's
| head they'd get it right. The flaw is in the part of the mind
| that loads the words into their brain and not the reasoning after
| that.
| travisjungroth wrote:
| I don't think this is the most accurate way to describe it.
| It's considering the mind as separate from the brain. There's
| also no true version of the problem to "beam" into the brain.
| That would just be a third version.
|
| Our brains aren't purely general purpose problem solvers. There
| are parts of the brain that are specifically related to facial
| recognition versus all images. We're better at recognizing
| differences in faces versus buildings.
|
| There seems to be a similar "hardware acceleration" for
| problems about people following rules (can't remember the
| source for this). People generally do better on problems if you
| can frame them as who is breaking a rule, who is lying, etc. I
| think the last example would still be easier to solve on
| average even if it was a rule unfamiliar to people.
| mlajtos wrote:
| I think it shows that we suck at generalization. Even though
| the task is the same, we are not able to see the same pattern
| in an arbitrary framing of the same problem. Beaming directly
| might help.
| Robotbeat wrote:
| So I know people are probably tired of posts that just plug
| something into ChatGPT, but I think it might be interesting to
| see how GPT3.5/ChatGPT might handle it.
|
| It got the answer to the hen problem wrong every time. I kept
| modifying the prompt to let it know it was a tricky problem, to
| think step by step, to self-critique and adjust, & tried at
| temperatures of 0, 0.7, and 1 multiple times and after a dozen
| tries it kept giving either 12, 36 (most common), or 144.
| (Starting a new session every time.)
|
| > _This is a tricky arithmetic word problem. Think step by step.
| After having an initial answer, critique it, double check it, and
| change it to the correct answer if necessary:
|
| 3 hens lay 3 eggs in 3 days. How many eggs do 12 hens give in 12
| days?
|
| Again, show your work and don't just give the answer right away
| but reason through the problem._
|
| That is, UNTIL I gave it the specific advice that the article
| gave, adding this to the beginning of the above prompt (with
| another fresh session):
|
| > _I know that humans are biased for intuitive understanding
| rather than logical one. Intuitions are shortcuts that are very
| valuable, but sometimes they are just plainly wrong._
|
| ...and then it got the right answer immediately, first try. So
| this trick isn't just a neat blog post, but empirically
| _useful_, and I'll probably include it in prompts from now on
| for these
| kinds of problems...
|
| (And note this is a blog post from today, so the blog post was
| not in the training set.)
| mlajtos wrote:
| I love your comment.
|
| I tried GPT-4 via ChatGPT and it nailed it without any prompt
| engineering.
|
| ---
|
| Prompt: 3 hens lay 3 eggs in 3 days. How many eggs do 12 hens
| give in 12 days?
|
| GPT-4:
|
| Let's first determine how many eggs one hen can lay in 12 days.
|
| We know that 3 hens can lay 3 eggs in 3 days. This means that
| each hen lays 1 egg in 3 days (3 hens / 3 eggs = 1 hen / 1
| egg).
|
| To find out how many eggs one hen can lay in 12 days, we need
| to multiply the number of eggs laid in 3 days by 4 (since 12
| days is 4 times longer than 3 days). So, one hen will lay 1 egg
| * 4 = 4 eggs in 12 days.
|
| Now, we have 12 hens. To find out how many eggs they can lay in
| 12 days, we just multiply the number of eggs one hen can lay in
| 12 days by the number of hens:
|
| 12 hens * 4 eggs per hen = 48 eggs in 12 days.
|
| ---
|
| To be completely fair, this math problem can be googled, so it
| is probably in the training set.
| gorpy7 wrote:
| Somehow this answer reminded me of the good advice for
| starting to tighten a nut: first turn left till you feel the
| nut drop, then turn right. I feel there is a universal lesson
| here. Maybe going, as the crow flies, directly toward your
| goal isn't always best.
| codeflo wrote:
| Very much yes to your first sentence, but I don't think "just
| plug something into ChatGPT" is a fair summary of your (very
| interesting) post, so you're fine with me.
|
| I've seen several statements recently, including one in a talk
| by a Microsoft researcher working with OpenAI, that claim that
| all the RLHF safety training introduces human biases and makes
| the models worse at abstract reasoning. I really wonder why a
| relatively small amount of human feedback has that effect,
| while training on this gigantic human-written corpus doesn't
| seem to introduce the same biases. That hints at something
| interesting going on.
| mlajtos wrote:
| We are gonna need to prefix all prompts with "From now on,
| act without human biases."
| panarky wrote:
| To human ears, the problem of 3 hens, 3 eggs, 3 days is a
| simplified and more obvious problem than the original:
|
| "If a hen and a half can lay an egg and a half in a day and a
| half ..."
|
| But it's the same problem in the same form, and GPT4 nails it
| without extra goofy prompting like "don't replicate human
| cognitive biases".
|
| A more interesting question would cross logic with problem
| solving.
|
| "If a hen and a half can lay an egg and a half in a day and a
| half, how many hens are needed _to fill Yankee Stadium with
| eggs in a year?_ "
|
| But GPT4 doesn't have very good intuition on how to estimate
| the volume of Yankee Stadium:
|
| _Yankee Stadium has a seating capacity of approximately
| 50,000 spectators.
|
| Let's assume each spectator occupies a space of 2 cubic
| meters (which is a generous estimate, including seating and
| legroom).
|
| Total volume (approx.) = 50,000 seats x 2 m3/seat = 100,000
| m3_
| still_grokking wrote:
| I didn't know this "puzzle" until just now.
|
| Stupid question: what is the "tricky" part about it? How can
| someone even get this wrong?
|
| This puzzle is trivial, and especially doesn't require any kind
| of "thinking around the corner". So what is this about?
|
| That language models can't do math (or logic) and can't even
| reliably tell which of two given written out numbers is the
| bigger one is imho a different story.
|
| Language models are great for text. But when you need a tool
| for math and/or logic you should use an adequate tool. We have,
| for example, algebra systems for that. Or proof assistants. Or
| just good old Prolog. It makes no sense to use the wrong tool
| and then wonder why the results are terribly wrong.
|
| But for the above "puzzle" you don't even need a calculator. So
| I really don't get the point.
| Etherlord87 wrote:
| The trickiness is that it was part of a set of easy questions.
| You have 3 of the same number, x x x, then change two of them
| and ask what the third is: y y ? It seems obvious it's y
| again.
|
| Of course if you have some experience with math (apparently
| more experience than kids in school have), it's immediately
| apparent it's a multiplication and the product doesn't scale
| linearly... But I can imagine myself still answering this
| incorrectly just because I have no incentive to be careful
| and I'm in a state of mind to just answer the question
| quickly. Though while reading the article, I solved it in my
| head.
| still_grokking wrote:
| > The trickiness is that it was a part of a set of easy
| questions. You have 3 same numbers x x x, then change two
| of them, and ask what's the third y y ? It seems obvious
| it's y again.
|
| OK... That's all?
|
| I start to see the problem.
|
| But I'm not going to write what my conclusion is, as this
| would likely get down-voted into oblivion. (I was actually
| expecting down-votes already just for asking, but I still
| wanted to understand what the issue here is, as I honestly
| don't see any "tricky part" in the "puzzle" at all.)
| alanfalcon wrote:
| It is tricky because humans are biased for intuitive
| understanding rather than logical understanding. Intuitions
| are shortcuts that are very valuable, but sometimes they are
| just plainly wrong. Many people will try to apply intuition
| rather than logic because that works with many other
| similarly worded questions.
| still_grokking wrote:
| Still don't get it.
|
| The intuitive answer, imho, is the right one.
|
| I really tried to get what could be tricky about this
| question. But there is just nothing.
|
| I have no clue how anybody could arrive at anything[+] other
| than 48. It's trivial. There is no "puzzle" at all.
|
| + (As long as you don't consider real-life biology, as then
| there is no definite answer, only some probability.)
| Kranar wrote:
| GPT-4 nailed it on the first try:
|
| >3 hens lay 3 eggs in 3 days. How many eggs do 12 hens give in
| 12 days?
|
| >Let's break down the given information and solve this problem
| step by step.
|
| We know that 3 hens lay 3 eggs in 3 days.
|
| First, we'll find out how many eggs 1 hen lays in 3 days. 3
| hens lay 3 eggs, so 1 hen lays 1 egg in 3 days.
|
| Now, let's find out how many eggs 1 hen lays in 12 days. In 3
| days, 1 hen lays 1 egg, so in 12 days, 1 hen will lay (12 / 3)
| * 1 = 4 eggs.
|
| Finally, we'll find out how many eggs 12 hens lay in 12 days.
| If 1 hen lays 4 eggs in 12 days, then 12 hens will lay 12 * 4 =
| 48 eggs in 12 days.
| kqr wrote:
| > So this trick isn't just a neat blog post, but empirically
| useful, and I'll probably include it in prompts from now on for
| these kinds of problems...
|
| Alternative take: this is a very classic problem used to
| illustrate this kind of System 1 inaccuracy, so by deliberately
| including words that increase the conditional probability of us
| looking at a text that includes discussion on System 1
| inaccuracies, you have essentially narrowed down the search
| space for our dear stochastic parrot to those texts that do
| contain the right answer.
|
| This sounds dismissive, but I don't mean it that way. I'm just
| saying that "think about human biases" may not be a universally
| useful cue, it may only work specifically when trying to get
| the model to look like a discussion on human biases, and the
| typical sample problems used to illustrate that.
| rcme wrote:
| I'm always amazed that people can get so hung up on events that
| happen in elementary schools. This piece feels like it could be
| posted to /r/iamverysmart.
| munificent wrote:
| There's a certain set of people who, by virtue of genetics,
| upbringing, or both, feel very strongly:
|
| 1. You should do your best to please authority figures.
|
| 2. You should do your best to be correct.
|
| It's _really_ distressing the first time they run into a clear
| example where those two deep-seated needs are in conflict. For
| a lot of gifted teacher's-pet kind of kids, this is their
| Kobayashi Maru moment.
|
| I still remember mine too. :-/
| mlajtos wrote:
| > I'm always amazed that people can get so hung up on events
| that happen in elementary schools. This piece feels like it
| could be posted to /r/iamverysmart.
|
| Ages were 10, 15 & ~23 respectively. Yes, the post suffers from
| selection bias. I will write more about my failures in the
| future.
|
| And as per your request, I went and tried to post it to
| /r/iamverysmart, but only screenshot submissions are allowed,
| and a screenshot of the whole post wouldn't be very effective.
| yaakov34 wrote:
| Can't pass by without making a public service announcement about
| the average speed example: don't compute integrals using the
| calculus definition of Sum(f(x_i) * delta). Look up quadrature or
| numerical integration methods instead.
|
| Although it is true that you can approximate the average speed by
| taking an average of instantaneous speed measurements, that's
| usually a very bad way to do it in any real world situation.
| Numerical difference values are always noisier than the
| underlying quantity, sometimes to the point of being unusable, so
| of course if you can just read off the quantity you want directly
| (total difference over time), you should do that. But even if you
| can't, you should use a proper integration method instead of the
| calculus definition.
|
| I have seen the Sum(f(x_i) * delta) calculation in a lot of real-
| world code. It has bad convergence properties, bad errors when
| the function has large derivatives, and bad performance when the
| data has noise. Some of the code I've seen produces garbage
| results, or has thousands of function evaluations when you need,
| like, four. "Quadrature? I think I heard that before, but I don't
| remember what it means."
|
| In summary, please don't compute derivatives as
| (f(x_i+1)-f(x_i))/delta, or compute integrals as Sum(f(x_i) *
| delta), and especially, please don't do the first immediately
| followed by the second. Which also happens. Look up numerical
| methods instead.
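|
| A tiny before/after sketch (hypothetical Python, not from the
| original comment) of the convergence gap on a smooth function:
|
|     import math
|
|     # Integrate exp(x) over [0, 1]; exact answer is e - 1.
|     a, b, n = 0.0, 1.0, 100
|     h = (b - a) / n
|     f = math.exp
|     exact = math.e - 1
|
|     # naive Sum(f(x_i) * delta), i.e. a left Riemann sum
|     riemann = sum(f(a + i * h) for i in range(n)) * h
|     # trapezoid rule: same samples, endpoints weighted by 1/2
|     trapezoid = (sum(f(a + i * h) for i in range(1, n))
|                  + (f(a) + f(b)) / 2) * h
|
|     print(abs(riemann - exact))    # ~8.6e-3, shrinks like 1/n
|     print(abs(trapezoid - exact))  # ~1.4e-5, shrinks like 1/n^2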
|
| This has been a public service announcement.
| roelschroeven wrote:
| Isn't it the case that other numerical integration methods only
| work if you have an f(x) that you can evaluate for any x you
| want (albeit possibly costly)?
|
| It seems to me that in many practical applications, the only
| thing you have to work with is samples at discrete moments
| in time. It certainly seems to be the case here: "I would
| measure car's speed at every instant and produce an average of
| those measurements." We only know f(t_0), f(t_1), f(t_2), ...
| (and if we're lucky t_1-t_0 = t_2-t_1 = t_3-t_2 and so on); we
| have no way to compute things like f((t_0 + t_1)/2). In that
| case, how can we improve our calculation?
| yaakov34 wrote:
| Even if you're limited to uniform sampling, something as
| simple as the trapezoid rule will give you quadratic
| convergence instead of linear for the naive Sum(f(t_i) *
| delta). In other words, error proportional to 1/n^2, instead
| of 1/n, where n is the number of samples, which is going to
| be a huge difference. There are many methods depending on the
| constraints of your problem - your ability to choose sampling
| intervals, knowledge of the bounds of your function or its
| derivatives, etc. The PSA is to study these things, instead
| of just writing the first thing that seems familiar from a
| long-ago calculus class.
| hgsgm wrote:
| Huh? Quadrature is a general term for "measuring area". In this
| context it's a synonym for integration.
|
| I think you are trying to say that it's better to do weighted
| sums of fewer samples, instead of a uniformly weighted Riemann
| sum. Both are "calculus definition" integration, of course,
| since calculus is true.
| ouid wrote:
| When mathematicians say Quadrature, they mean that if your
| function is suitably approximated by projecting onto some
| orthogonal basis functions, you can get very cheap
| approximations by cleverly expressing those integrals exactly
| as a linear combination of their values at certain points
| along the interval. You need very few.
|
| https://en.m.wikipedia.org/wiki/Gaussian_quadrature
|
| It is significantly more subtle than what you are thinking.
| kqr wrote:
| > please don't compute derivatives as (f(x_i+1)-f(x_i))/delta
|
| Isn't this exactly what finite differences means? Sure, it's
| not optimal in all respects, but it's incredibly general and
| easy to remember.
| yaakov34 wrote:
| This calculation amplifies any noise present in the values of
| the function, often to the point of the output being
| unusable. There are many methods that can be used to
| approximate derivatives, depending on the problem. Just as we
| shouldn't try to invent our cryptographic methods from
| scratch, we should take advantage of the extensive knowledge
| already in use for numeric methods.
|
| I've seen naive numeric methods cause everything from jerky
| motion in video games to incorrect navigation data for cars.
| kqr wrote:
| So what would you suggest as a general calculation for
| finite differences, especially in those cases when only
| forward differentiation is possible, e.g. with respect to
| time?
| yaakov34 wrote:
| The closest thing to a universal approach would be a
| Kalman filter. It's usually where you start when you have
| noisy measurements coming in, and you need to maintain
| state such as value and derivative.
|
| Since the original question was about computing the
| velocity of a car, and since I work in the automotive
| field, let's take a real example: you want to know the
| approximate position, acceleration, and velocity (linear
| and angular) of your car. Your inputs are driven wheel
| speed (noisy, affected by wheelspin), non-driven wheel
| speed (noisy), accelerometer output (inaccurate, only
| present for some axes), GPS position (updated
| occasionally, has errors), and steering angle (pretty
| accurate, can be put into a chassis dynamics model).
| Almost certainly, you would use a Kalman filter to
| estimate the state of the car. Naive approaches such as
| subtracting two wheel speed values to obtain acceleration
| will not work well.
|
| My point is that we should remember that numerical
| algorithms are a developed field with a lot of knowledge,
| and we should take advantage of the proven approaches.
| Sometimes, programmers who are not specifically from the
| physics or numerical fields, and who need to perform some
| computation, reach for a very simple approach such as the
| rectangle-rule integrals, and get bad results.
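|
| As a toy illustration only (hypothetical Python, nothing like
| production automotive code), here is a 1-D constant-velocity
| Kalman filter recovering velocity from noisy position samples,
| where naive differencing would amplify the noise:
|
|     import random
|
|     dt, q, r = 0.1, 1e-4, 1.0     # time step, process/measurement noise
|     x, v = 0.0, 0.0               # state estimate: position, velocity
|     P = [[1.0, 0.0], [0.0, 1.0]]  # estimate covariance
|
|     def kalman_step(z, x, v, P):
|         # Predict with a constant-velocity model
|         x, v = x + v * dt, v
|         P = [[P[0][0] + dt * (P[0][1] + P[1][0])
|               + dt * dt * P[1][1] + q,
|               P[0][1] + dt * P[1][1]],
|              [P[1][0] + dt * P[1][1], P[1][1] + q]]
|         # Update with the noisy position measurement z
|         s = P[0][0] + r                    # innovation variance
|         kx, kv = P[0][0] / s, P[1][0] / s  # Kalman gains
|         y = z - x                          # innovation
|         x, v = x + kx * y, v + kv * y
|         P = [[(1 - kx) * P[0][0], (1 - kx) * P[0][1]],
|              [P[1][0] - kv * P[0][0], P[1][1] - kv * P[0][1]]]
|         return x, v, P
|
|     random.seed(0)
|     true_v = 2.0
|     for i in range(200):
|         z = true_v * i * dt + random.gauss(0, 1.0)
|         x, v, P = kalman_step(z, x, v, P)
|     print(v)  # settles near 2.0; (z_i - z_{i-1}) / dt would swing wildly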
| kqr wrote:
| I see -- we are talking about two different things!
|
| You are working on the problem of figuring out the hidden
| state based on noisy observations and a transition model.
|
| I interpreted your statement much more broadly, so I was
| trying to discuss the problem of computing the explicit
| next state based on perfect knowledge of the transition
| model, in which case (y_1-y)/dt is a perfectly viable
| approach to estimate the derivative.
|
| (You can do better by adding higher-order terms of
| course, but I haven't found that to be universally useful
| compared to making dt smaller.)
| eviks wrote:
| > task is presented in context of social relations (or more
| broadly "evolutionarily familiar problems"), people tend to get
| it immediately, or rather intuitively. Bonkers, right?
|
| That's obvious, not bonkers - if you have experience in some
| class of problems, it's easier to solve the same kind of
| problem: the patterns match in your head, so the solution
| pattern is used right away. Just like you can solve these
| numeric problems quickly if you're trained on them.
| moffkalast wrote:
| So in a sense we have "hardware acceleration" for solving
| certain types of problems?
| mlajtos wrote:
| Yes. https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow
| eviks wrote:
| Wetware acceleration?
| mlajtos wrote:
| But the task is exactly the same in both situations - only the
| symbols on the cards have a less/more familiar meaning. I see
| this as not being able to generalize from known examples.
| ouid wrote:
| This argument is invalid. These two things are easy to
| pattern match, but every theorem comes with an obligation to
| show that your current state matches your predicate, which is
| very hard, generally.
| mannykannot wrote:
| I question whether it is a matter of not being able to generalize
| from known examples, as I suspect that most of the people who
| do well on the corresponding social questions did not know
| the answer beforehand, but nevertheless found their way to
| the correct one.
|
| The article you link to seems to support this view: _"
| Cosmides and Tooby argued that experimenters have ruled out
| alternative explanations, such as that people learn the rules
| of social exchange through practice and find it easier to
| apply these familiar rules than less-familiar rules."_
|
| I now wonder if people tend to do better on abstract Wason
| tasks after being prompted by logically-equivalent social-
| relations questions, as opposed to being prompted with
| additional abstract problems (without being told whether they
| gave the correct responses to the prompts.)
|
| My guess at what might be underlying these outcomes is that
| people clearly see the purpose of the social rules but not of
| the abstract ones, and our intuition is attuned to assessing
| how various social scenarios will play out, taking into
| account the motives of the actors (this might seem to be the
| same as what Cosmides and Tooby claim has been ruled out, and
| maybe it is, but I _think_ what they are ruling out is us
| having learned various schema to be applied mechanically in
| the corresponding social situations.)
|
| Alternatively, could it be something as simple as this:
| having an understanding of the social rule's purpose helps
| the subjects avoid confusing conditional and biconditional
| rules? At least in the English language, acceptable usage can
| be quite ambiguous in this regard.
|
| https://en.wikipedia.org/wiki/Wason_selection_task
| eviks wrote:
| It's not exactly the same, precisely because the difference in
| symbols maps to different patterns - one is a pattern from
| school, which you forgot since it hasn't been used for many
| years (or never learned properly). Another is a pattern that
| keeps getting refreshed throughout your life. It's a very
| different cognitive workload, leading to differences in the
| required effort/correctness, especially in these simplistic,
| low-value tests.
|
| And why is your assessment of the ability to generalize
| limited to only this math representation which people don't
| use?
| mlajtos wrote:
| > It's not exactly the same, precisely because the difference
| in symbols maps to different patterns
|
| Does this mean that these should be treated differently?
|
| a) "3 * 2 = 6"
|
| b) "III + III = VI"
|
| c) "tres et tres aequalis sex"
|
| d) "ln(2) + ln(3) = ln(6)"
|
| While symbols (and even patterns) are not the same, you
| could see that all these examples point to the same thing.
|
| > And why is your assessment of the ability to generalize
| limited to only this math representation which people don't
| use?
|
| I think people fail to generalize even in simpler
| situations. I can't think of anything obvious that would
| apply to adults, but this is a nice example for children:
| https://en.wikipedia.org/wiki/Conservation_(psychology) &
| video demonstration:
| https://www.youtube.com/watch?v=gnArvcWaH6I
| eviks wrote:
| You could see that mostly because the first example does
| everything as that's what everyone knows, the second is
| similar and solidifies the pattern, which you could then
| match to the rest
|
| The question is, what does it tell you when people
| struggle with Roman LCX*ML more vs the more familiar
| first pattern besides the fact that training/familiarity
| makes it easier?
|
| And it's telling you have to reach for the kids
| mlajtos wrote:
| Oh, I see your point now. Multiplication in Roman
| numerals is strikingly different from (and much harder
| than) addition. So the representation of the pattern might
| be so alien that it's unrecognizable even though the
| pattern is known.
| Beldin wrote:
| Wait, they're not - at least, in the example of the article.
| Allow me to digress:
|
| There is a figure with 4 cards, showing: all orange, all
| blue, the number 2, the number 1.
|
| The only condition is that an even number must have an all
| blue flipside. This condition is either irrelevant or
| satisfied for the blue card. For _all_ other cards, we don't
| know. The card with "1" may have the number "4" on the flip
| side. So we need to check all other cards.
|
| TL;DR: only a blue face guarantees compliance with the rule,
| all other cards need to be flipped.
|
| In the second case, the assumption that the cards all have
| one side with digits and one side with symbols is still
| unstated, but a lot stronger. (These are things to check
| for age-alcohol compliance, so logically they must state
| both an age and whether the drink is alcoholic.)
| mlajtos wrote:
| > The only condition is that an even number must have an
| all blue flipside. This condition is either irrelevant or
| satisfied for the blue card. For all other cards, we don't
| know. The card with "1" may have the number "4" on the flip
| side. So we need to check all other cards.
|
| Very good point, thank you. I have added an explicit
| statement that cards are consistent.
| gdprrrr wrote:
| Could it be explained in terms of abstract vs concrete? I've
| heard that we humans can reason about concrete things like
| beer better than abstract concepts like numbers and colors. (I
| know that age is also a number, but it has more assigned
| meaning.)
| mlajtos wrote:
| I think it could, as dealing with concrete things is a
| direct evolutionary advantage grounded in reality. Usually,
| we are shit at dealing with abstract concepts. A quote that
| beautifully demonstrates this:
|
| "To deal with hyper-planes in 14-dimensional space,
| visualize a 3-D space and say 'fourteen' to yourself very
| loudly. Everybody does it." -- Geoffrey Hinton
| deepsun wrote:
| > That is literally measuring the car's speed at every instant!
|
| Well, there's no way to measure speed instantly; it's always
| over some period, even if small (take some time interval and
| see how much distance has been covered). For example, at time
| zero you cannot know what speed the car has.
|
| Similar to when I was working with audio, producing a frequency
| diagram -- the frequency cannot be gathered from a single
| timestamp on the waveform; you always need some interval, like
| 20ms, to get an FT.
___________________________________________________________________
(page generated 2023-04-24 23:01 UTC)