[HN Gopher] Improbably Right
___________________________________________________________________
Improbably Right
Author : mlajtos
Score : 74 points
Date : 2023-04-24 13:19 UTC (9 hours ago)
(HTM) web link (mlajtos.mu)
(TXT) w3m dump (mlajtos.mu)
| pajko wrote:
| Another tricky question is the one about 3 doors: two with
| goats behind them, and the 3rd hiding the prize. The host
| opens door #1, which has a goat behind it; doors #2 and #3 are
| still closed. Now what's the chance of door #2 hiding the
| prize? The intuitive answer is 50%, but it's wrong.
| vezuchyy wrote:
| That's still 50% though. Now, if you had decided to open door
| #2 before the host opened #1, you may increase your chances by
| changing to door #3.
|
| The chances of the prize being behind either door are exactly
| 50/50; it's the chance that you made the right choice at the
| beginning that is only 33%. That is called the Monty Hall
| PARADOX for a reason.
| xiande04 wrote:
| That's not how it goes. And, yes, the answer is 50% in this
| case.
|
| The problem relies on first choosing a door (1). The host then
| opens a door that you did NOT choose (3) and asks if you want
| to switch your choice to door #2. In this case, the probability
| that the prize is behind door 2 is not 50%.
|
| https://en.wikipedia.org/wiki/Monty_Hall_problem
| mlajtos wrote:
| You described a slightly wrong Monty Hall problem [0]. When I
| explain the solution to this problem, I always just add more
| doors. 1 million doors, 1 car, 999999 goats. You pick one
| door, and I show you 999998 goats. Only two doors are left.
| What is the chance you picked the car on your first guess?
|
| 0: https://en.wikipedia.org/wiki/Monty_Hall_problem
| anamexis wrote:
| You're not fully describing the Monty Hall problem - as you've
| described it, the answer is indeed 50%.
|
| In the Monty Hall problem:
|
| 1. You pick one of three doors.
|
| 2. The host opens one of the two doors you didn't pick to
| reveal a goat.
|
| 3. You are given the opportunity to keep your original choice,
| or pick the other unopened door.
|
| In this situation, there is a 1/3 chance that the prize is
| behind the original door, and a 2/3 chance it's behind the
| remaining unopened door.
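|
| A quick Monte Carlo sketch (hypothetical Python, not part of
| the original comment) bears out the 1/3 vs 2/3 split:
|
|     import random
|
|     def trial(switch):
|         doors = [1, 0, 0]          # one car, two goats
|         random.shuffle(doors)
|         pick = random.randrange(3)
|         # host opens a goat door that isn't your pick
|         host = next(d for d in range(3)
|                     if d != pick and doors[d] == 0)
|         if switch:
|             pick = next(d for d in range(3)
|                         if d not in (pick, host))
|         return doors[pick]
|
|     n = 100_000
|     print(sum(trial(False) for _ in range(n)) / n)  # ~0.33
|     print(sum(trial(True) for _ in range(n)) / n)   # ~0.67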
| contravariant wrote:
| The answer is 0 because the prize is behind the third door
| (#3), but I feel you're either missing a couple of details or
| you're trying to mess with people who know Monty Hall.
| Jiro wrote:
| >If a card shows an even number on one side, then its opposite
| side is blue.
|
| As the problem fails to specify that the card has a number on one
| side and a color on the other, you need to check three cards.
| mpeg wrote:
| No, you don't. You still only have to check those 2 cards. If
| you're asking about card 3 then that's the common mistake - it
| doesn't matter whether that card has a number or an emoji on
| the other side, because the only logical test here is that if
| it has an even number on one side, the other side must be blue.
|
| It doesn't matter what blue cards have on the other side, we
| don't need to test them.
|
| edit: sorry, I just realised you're referring to card 1 and not
| 3, yeah I suppose you'd need to check it too unless there's a
| rule about always needing to match a color to a number.
| quirino wrote:
| What they mean is that the card that has a 3 on the front
| might have an even number on the back.
| mpeg wrote:
| yep I just realised and edited my comment, I imagine the
| full question has a clearer description that OP cut down
| for the blog post.
| mlajtos wrote:
| To be fair and square, I added an explicit statement
| about consistency of the cards, so it is clear.
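|
| A brute-force check (a hypothetical Python sketch, assuming
| the amended rule that every card has a number on one face and
| a color on the other) shows which visible faces force a flip:
|
|     # Rule: if one side shows an even number, the other side
|     # is blue. A card must be flipped only if its hidden side
|     # could falsify that rule.
|     def must_flip(face):
|         if isinstance(face, int):
|             return face % 2 == 0  # even: hidden color might not be blue
|         return face != "blue"     # non-blue: hidden number might be even
|
|     for face in [1, 2, "blue", "orange"]:
|         print(face, "->", "flip" if must_flip(face) else "leave")
|     # 1 -> leave, 2 -> flip, blue -> leave, orange -> flip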
| dns_snek wrote:
| The cards example doesn't work in Firefox, it reveals the
| opposite side of the card :/
| moffkalast wrote:
| Absolutely hilarious how people claim that Gecko isn't half
| broken anymore.
| tzot wrote:
| Ah, so it's my Firefox :)
| Kim_Bruning wrote:
| Concur. I was very confused until I decided to reopen in
| chromium. %-)
| mlajtos wrote:
| Oh, this browser again.
|
| It seems that Firefox might have a problem with `mix-blend-
| mode: multiply;` and 3D CSS transforms.
|
| Cards now behave correctly in Firefox, but across the board
| they are visually less appealing.
|
| Thank you for bringing this up.
| stared wrote:
| Why would anyone answer 12?
|
| Unless... most human thinking is this lower-end language model,
| which looks at word patterns without giving much thought to them.
|
| "12" is not intuition. It is repetition.
| btbuildem wrote:
| Why? Because if three hens lay three eggs in three days, it
| feels like one hen lays one egg per day - so twelve hens,
| given twelve days, will lay twelve eggs. That's the immediate
| reasoning, right?
|
| It's very easy to fall into that trap, and miss the fact that
| two of the variables were changed.
| hgsgm wrote:
| It's misapplied intuition for double ratios.
|
| If a triangle has sides 1:1:1 in feet, in inches it has sides
| 12:12:12.
| simse wrote:
| Sorry, unrelated to the writing (which I enjoyed), but related to
| the horizontal scrollbar! If you change w-screen to w-full on
| both the navbar and footer, it goes away. 100vw -> 100%. Just a
| tiny enhancement to an otherwise good-looking website.
| mlajtos wrote:
| > unrelated to the writing (which I enjoyed)
|
| Aw, thank you. :)
|
| > related to the horizontal scrollbar
|
| Thank you very much, I was not aware of this issue. Changed
| both to `w-full`, hope the scrollbar goes away. :)
|
| BTW As a pizza aficionado I deeply enjoyed "wood fired pizza".
| :D
| glofish wrote:
| I believe that the author misremembers the common solution for
| the first question
|
| > _3 hens lay 3 eggs in 3 days. How many eggs do 12 hens give in
| 12 days?_
|
| > [...] _When we got to it, everybody shouted "Three!"_
|
| I don't see why the "common" wrong answer would be 3. Why would
| anybody think that? The problem looks like this: 3, 3 -> 3.
|
| When we see the 12 and 12 the intuitive, common, _wrong_ answer
| should be 12 eggs.
|
| That's what _makes sense_ IMHO.
| photochemsyn wrote:
| How to annoy the teacher:
|
| > "Ovulation (release of the yolk from the ovary) occurs every
| 24 - 26 hours regardless of fertilization (so a rooster is not
| needed). A hen ovulates a new yolk after the previous egg was
| laid. It takes 26 hours for an egg to fully form (white and
| shell added), so a hen will lay an egg later and later each
| day. Eventually the hen will lay too late in a day for
| ovulation to be signaled. She will then skip a day or more
| before laying another egg." (UWisconsin Livestock)
|
| So, the 3 hens must have been at the least productive point in
| their egg-laying cycle over the initial three-day time
| period... Now if we have the two-hour daily offset, over 12
| days, ummm, maybe two or three days skipped per hen? So, ah,
| 108-120 eggs is what the farmer could expect from the 12 hens?
| mlajtos wrote:
| This is gold!
| dvh wrote:
| I remember a story where a school teacher was showing plastic
| animals and asking kids what they were. At some point all kids
| wrote "cow" except for one farm boy who wrote "goat". Turns
| out the plastic cow had the wrong number of teats. Cows have
| four, goats have two.
| mlajtos wrote:
| Once I saw a 5-armed snowflake as an icon on a TV weather
| forecast.
| ouid wrote:
| He misremembered the third task as well, since his solution to
| it, as stated, is wrong. If you aren't given that every card
| has a number on one side and a color on the other, then you
| need to check everything which does not satisfy the consequent.
| mlajtos wrote:
| I added "We have cards with a number and color on each face."
| to the task statement. Hope this makes rules clearer as I
| could not phrase it better.
| ouid wrote:
| The rules were clear before, what has changed is that they
| are now _different rules_. You were just wrong before.
|
| You've still got a problem, though: now when you ask the
| reader to verify the rule, you have to explicitly state
| that they're not trying to verify the first rule.
| mlajtos wrote:
| Oh shit, you are right. Added a completely explicit
| statement under the problem question. In both situations.
|
| Thank you for pointing this out.
| menzoic wrote:
| The article says everyone shouted 12, not Three.
| forbiddenlake wrote:
| It said 3 originally.
|
| https://news.ycombinator.com/item?id=35688034
| mlajtos wrote:
| Of course it was "Twelve!". Thank you for bringing this up. It
| is fixed now.
| thechao wrote:
| It took me surprisingly long to convince myself the author was
| correct: the answer is 48 because there's 4x as many hens and
| 4x as many days: 3 x 4 x 4 = 48. The "wrong" way to solve this
| problem is to compute hen-laying in terms of hen-egg-days, and
| then scaling.
| c22 wrote:
| I dunno, I immediately noted that 12 hens must lay 12 eggs in
| 3 days (it takes 3 days for a hen to lay an egg). Then I just
| asked myself how many 3-days are in 12-days--okay, 12 [eggs]
| * 4 [3-day egg periods] = 48...
|
| I didn't need to resort to pen and paper, but I'm also not in
| primary school.
| jimmaswell wrote:
| I always skip straight to dimensional analysis for such a
| thing.
|
| 3 eggs / 3 chickens / 3 days * 12 chickens * 12 days
| cancels out to 48 eggs
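|
| The same cancellation as a quick sketch (hypothetical Python,
| just restating the arithmetic above):
|
|     rate = 3 / (3 * 3)      # eggs per hen per day = 1/3
|     eggs = rate * 12 * 12   # 12 hens laying for 12 days
|     print(eggs)             # 48.0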
| jkubicek wrote:
| There are a lot of right ways to solve this problem, but
| this is clearly the most fun way.
| hgsgm wrote:
| That's because it's output/input = eggs/(hen*day), not
| eggs/(hen/day), which is clear from meaning and logic if you
| aren't a GPT-2 LLM.
| mlajtos wrote:
| There is a visual illustration in the article. I thought it
| might be useful for understanding why it's 48.
| [deleted]
| jaclaz wrote:
| >The "wrong" way to solve this problem is to compute hen-
| laying in terms of hen-egg-days, and then scaling.
|
| I am not sure I understand what you mean by the "wrong" way; I
| got the right result through a different reasoning that seems
| to me based on hen-egg-days.
|
| It takes 3 days for 3 hens to make 3 eggs.
|
| The 3-day time is "fixed", i.e. it takes 3 days for each
| hen to make its own egg.
|
| The egg production rate is 1/3 egg per day per hen.
|
| So I have 12 hens x 12 days x 1/3 = 144 x 1/3 = 48
| moondrek wrote:
| I read it as the students being familiar with a similar "trick"
| question and erroneously pattern matching on that.
|
| Another similar riddle goes like:
|
| "If it takes 5 machines 5 minutes to make 5 widgets, how long
| would it take 100 machines to make 100 widgets?"
|
| The correct answer there being "5 minutes" (each machine makes
| one widget per 5 minutes, so 100 machines make 100 widgets in
| those same 5 minutes), and the intuitive wrong answer being
| "100" as per your post.
|
| That would make sense as to why the teacher also expected "3",
| since they should be familiar with the existence of these non-
| intuitive questions, even if they misremember the specific
| non-intuition.
| Cthulhu_ wrote:
| Or as to why adding more people to a late project makes it
| later: 1 woman takes 9 months to grow a baby, how long does
| it take 9 women to grow a baby?
| firstlink wrote:
| > The teacher was perplexed as to why I said such a wrong
| answer to such a simple question.
|
| This is extremely toxic on many levels. A teacher who doesn't
| know their subject matter or at least can't be bothered to
| refresh before a lesson. A teacher who submits to social pressure
| from their students when determining the answer. A teacher who
| shames students for being wrong. A teacher who shames students
| for being right!
|
| The students who were wrong initially will, thanks to this
| teacher, have completely failed to learn the lesson the problem
| was teaching. They will remember only that they got the problem
| right (the teacher said so) and only due to some technicality in
| the answer book were they subjected to an embarrassing "lesson"
| from the smart kid.
|
| Yes, this is making a mountain out of a molehill, but molehills
| like
| this can have mountainous impacts on society.
| tshaddox wrote:
| I mean, there's a lot going wrong with this "lesson." The
| exercise is pretty clearly designed to trick people, which is
| very counterproductive if the goal is to teach people about
| multiplication or fractions or whatever. It's really not that
| damning that a teacher would be fooled into skipping the actual
| work, since that deceit is clearly the entire point of choosing
| those numbers.
|
| It's a bit like if a proofreading exercise in an English class
| deliberately did that trick where you put two "the"s next to
| each other but across a line break. That will often fool a
| skilled English writer, but that says nothing about the
| writer's skill.
| rcoveson wrote:
| While the teacher in the story obviously bungled it, I
| disagree with your interpretation of the lesson. The point is
| not to "teach people about multiplication or fractions or
| whatever". The point is to teach them to recognize
| dimensional parameters in a natural scenario. This can be
| counter-intuitive; we have a bias towards linear
| relationships. Somebody twice as tall is twice as heavy,
| thinks the child.
|
| Tricks ("surprises" is a less negative descriptor) are
| important because they wake your brain up. Most people don't
| like learning arbitrary things for their own sake, but they
| don't like being fooled either.
|
| Describe the base rate fallacy to a room of college students
| and it might stick for the rest of the semester, but give
| them this problem[0] and tell them how many _licensed
| doctors_ it tricked (after you let them answer incorrectly,
| of course) and they might remember it for a while longer.
|
| 0. http://pi.math.cornell.edu/~mec/2008-2009/TianyiZheng/Bayes....
| tshaddox wrote:
| If the point of the lesson truly was to identify that the
| exercise was chosen deliberately to be deceptive, then
| sure. But I doubt it. Your example is very different,
| because there's no _obvious_ answer that one might spit out
| instantly by doing naive pattern matching.
| rcoveson wrote:
| The fact that there is an obvious but incorrect answer
| that you might spit out if you were just pattern matching
| is the whole point. It tricks you into being _wrong_ ,
| which "tricks" you into _wanting_ to know why, where
| before you might not have cared about the answer or the
| underlying reasoning at all. You think the question just
| accidentally picked tricky numbers? Or you think the
| reason that it picked them was just malice towards
| children?
| tshaddox wrote:
| > You think the question just accidentally picked tricky
| numbers? Or you think the reason that it picked them was
| just malice towards children?
|
| I think it's probably just abysmal pedagogy. This should
| be pretty clear given that almost the entire class and
| even the teacher fell for the trick, which shows that
| it's a terrible way to teach something _and_ even a
| terrible way to test whether they know how to do the thing
| you're wanting them to learn. The teacher and many of the
| children probably did in fact know how to do the
| calculation and solve the problem, and would have done so
| if the numbers weren't very deliberately chosen to invoke
| a hasty pattern-matching reaction.
| tzot wrote:
| 1. I had a similar issue in the fifth grade of elementary school
| (in my country, elementary school is 6 years with an extra year
| at age about 5 before pre-elementary school, then three years of
| middle high, then three years of optional high school before
| university, but almost everyone does the full 12/13 years), when
| we discussed Fahrenheit (foreign) vs Celsius (domestic). The
| teacher said that 50degC is 106degF because 212degF is 100degC
| and 212/2 = 106. At the time I said "this doesn't feel right, 106
| is not in the middle between 32 and 212" but I couldn't phrase it
| mathematically yet, and the teacher didn't take my objection
| well. Afterwards I learnt that he understood his error but
| never said a word to me about it.
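|
| (For the record: degF = degC x 9/5 + 32, so 50degC = 122degF --
| and 122 is indeed exactly halfway between 32 and 212, just as
| 50degC is halfway between 0degC and 100degC.)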
|
| 2. I don't see the colours on the "back sides" of the cards. I
| just see the cards flipped/rotated 180deg around the Y-axis
| (probably through CSS), so all numbers are visible, just in cards
| 3 & 4 the numbers are mirrored.
| eviks wrote:
| How did you learn that about the teacher?
| mlajtos wrote:
| 2. Firefox user, right? It should be fixed now. Thank you for
| reporting this issue.
|
| 1. You reminded me of another nice example of wrong shortcuts.
| The average speed of a car from city A to city B is 120km/h.
| The average speed back (B -> A) is 80km/h. What was the
| average speed of the car on the whole path (A -> B -> A)?
| It's not 100km/h.
| firstlink wrote:
| > It's not 100km/h.
|
| Of course it is... if you're averaging over distance. There's
| no reason that's less valid than whichever average you have
| in mind.
| joebiden2 wrote:
| How do you arrive at 100km/h if you're averaging over
| distance?
|
| I don't doubt you're right, but I don't see it. My
| calculation is as follows: suppose the distance between
| both cities is 120km. The trip A -> B thus takes exactly
| one hour. The trip B -> A takes 1.5 hours (120km / 80km/h)
| = 1.5h. The total trip therefore takes 2.5h, the total
| distance is 240km, which averages at 96km/h (240/(1 +
| 1.5)).
|
| A simple (V1 + V2)/2 is surely possible, but I don't see
| the value of that average, I can't think of a situation
| where it would be useful.
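|
| (For reference, the 96km/h figure is the harmonic mean of the
| two speeds: 2 * 120 * 80 / (120 + 80) = 96. Averaging over
| distance -- weighting each kilometre equally rather than each
| hour -- gives the arithmetic mean, (120 + 80) / 2 = 100.)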
| tkot wrote:
| I think you are thinking about a case where someone goes
| 120km/h for 1 hour and then 80km/h for 1 hour instead of
| going over a fixed distance at 120km/h and then going over
| the same distance at 80km/h.
| atleastoptimal wrote:
| This entire post is "I am le smart, so smart that other people
| think I'm wrong"
|
| I think it would come across better if the author mixed in
| times where their non-traditional intuition gave them the
| wrong answer.
| eyelidlessness wrote:
| I don't think that's a remotely fair characterization of the
| author's self-evaluation. A fair bit of the article is spent
| expressing embarrassment, presuming they _were in fact_ wrong,
| and surprise at discovering that presumption was their error.
|
| What you're describing would be an entirely different post,
| likely making a very different point. There's value in
| exploring "sometimes your idiosyncratic intuitions turn out to
| be right", without muddying that with the obvious intuition
| that this won't always be the case. There's value in exploring
| the idea through demonstrative anecdote without counter
| examples. If anything, counter examples might influence the
| kind of pattern matching biases which lead to such faulty
| intuitions.
|
| Anyway, I think the author has demonstrated that they're
| fallible and happy to be corrected. They've acknowledged more
| concrete problems with the post and made changes to address
| them at least a couple times.
| denton-scratch wrote:
| I didn't get why he deprecated distance/time as a direct way of
| determining average velocity.
|
| Is this pedantry about "average" vs. "mean"? As I understand it,
| "average" is a vague word, one of whose meanings is "mean". And I
| imagine most of the other meanings would have distance/time as
| their value as well.
| mpeg wrote:
| I love these kinds of logic puzzles. If you're up for losing an
| evening, the clips game [0] is one of my long-time favourites.
| It starts easy, but puzzles 6 and 7 are tricky to do within the
| assertion limit.
|
| [0]: https://md5crypt.github.io/clipsgame/
| mdonahoe wrote:
| I can't even beat the first level.
| mlajtos wrote:
| I also struggle. :D I think `(eats tiger plant)` would be
| fine, but it isn't.
| mdonahoe wrote:
| Ah, my smarter friend figured it out.
|
| Hint: what is eatable?
| mlajtos wrote:
| I passed with `(is tiger eatable)`. I don't get it. :D
| olafalo wrote:
| I'm not very good at this but I'll see if I can explain
| my thought process:
|
| (is tiger fed) requires (eats tiger ?food)
|
| which means there must exist ?what such that (is ?what
| eatable) and (is ?what ?food)
|
| But in the fact table, nothing is actually eatable! We
| could have our tiger eat plants by asserting (eats tiger
| plants) and (is grass eatable), but that's two
| assertions. Tigers eat animals, and tigers ARE animals,
| so since we need to make something eatable we might as
| well do it on the only animals we have: (is tiger
| eatable)
|
| I got level 2 as well but only after checking for hints
| in the source. I have been thoroughly nerd-sniped.
| roenxi wrote:
| Interesting. So we can tell from the last example that humans are
| perfectly good at doing this reasoning in theory and that
| therefore the trouble is actually in interpreting what the
| question is asking.
|
| Put another way, if the question could be beamed into someone's
| head they'd get it right. The flaw is in the part of the mind
| that loads the words into their brain and not the reasoning after
| that.
| travisjungroth wrote:
| I don't think this is the most accurate way to describe it.
| It's considering the mind as separate from the brain. There's
| also no true version of the problem to "beam" into the brain.
| That would just be a third version.
|
| Our brains aren't purely general purpose problem solvers. There
| are parts of the brain that are specifically related to facial
| recognition versus all images. We're better at recognizing
| differences in faces versus buildings.
|
| There seems to be a similar "hardware acceleration" for
| problems about people following rules (can't remember the
| source for this). People generally do better on problems if you
| can frame them as who is breaking a rule, who is lying, etc. I
| think the last example would still be easier to solve on
| average even if it was a rule unfamiliar to people.
| mlajtos wrote:
| I think it shows that we suck at generalization. Even though
| the task is the same, we are not able to see the same pattern
| in an arbitrary framing of the same problem. Beaming directly
| might help.
| Robotbeat wrote:
| So I know people are probably tired of posts that just plug
| something into ChatGPT, but I think it might be interesting to
| see how GPT3.5/ChatGPT might handle it.
|
| It got the answer to the hen problem wrong every time. I kept
| modifying the prompt to let it know it was a tricky problem, to
| think step by step, to self-critique and adjust, & tried at
| temperatures of 0, 0.7, and 1 multiple times and after a dozen
| tries it kept giving either 12, 36 (most common), or 144.
| (Starting a new session every time.)
|
| > _This is a tricky arithmetic word problem. Think step by step.
| After having an initial answer, critique it, double check it, and
| change it to the correct answer if necessary:
|
| 3 hens lay 3 eggs in 3 days. How many eggs do 12 hens give in 12
| days?
|
| Again, show your work and don't just give the answer right away
| but reason through the problem._
|
| That is, UNTIL I gave it the specific advice that the article
| gave, adding this to the beginning of the above prompt (with
| another fresh session):
|
| > _I know that humans are biased for intuitive understanding
| rather than logical one. Intuitions are shortcuts that are very
| valuable, but sometimes they are just plainly wrong._
|
| ...and then it got the right answer immediately, first try. So
| this trick isn't just a neat blog post, but empirically
| _useful_, and I'll probably include it in prompts from now on
| for these
| kinds of problems...
|
| (And note this is a blog post from today, so the blog post was
| not in the training set.)
| mlajtos wrote:
| I love your comment.
|
| I tried GPT-4 via ChatGPT and it nailed it without any prompt
| engineering.
|
| ---
|
| Prompt: 3 hens lay 3 eggs in 3 days. How many eggs do 12 hens
| give in 12 days?
|
| GPT-4:
|
| Let's first determine how many eggs one hen can lay in 12 days.
|
| We know that 3 hens can lay 3 eggs in 3 days. This means that
| each hen lays 1 egg in 3 days (3 hens / 3 eggs = 1 hen / 1
| egg).
|
| To find out how many eggs one hen can lay in 12 days, we need
| to multiply the number of eggs laid in 3 days by 4 (since 12
| days is 4 times longer than 3 days). So, one hen will lay 1 egg
| * 4 = 4 eggs in 12 days.
|
| Now, we have 12 hens. To find out how many eggs they can lay in
| 12 days, we just multiply the number of eggs one hen can lay in
| 12 days by the number of hens:
|
| 12 hens * 4 eggs per hen = 48 eggs in 12 days.
|
| ---
|
| To be completely fair, this math problem can be googled, so it
| is probably in the training set.
| gorpy7 wrote:
| Somehow this answer reminded me of the good advice for
| starting to tighten a nut: first turn left till you feel the
| nut drop, then turn right. I feel there is a universal lesson
| here. Maybe going, as the crow flies, directly toward your
| goal isn't always best.
| codeflo wrote:
| Very much yes to your first sentence, but I don't think "just
| plug something into ChatGPT" is a fair summary of your (very
| interesting) post, so you're fine with me.
|
| I've seen several statements recently, including one in a talk
| by a Microsoft researcher working with OpenAI, that claim that
| all the RLHF safety training introduces human biases and makes
| the models worse at abstract reasoning. I really wonder why a
| relatively small amount of human feedback has that effect,
| while training on this gigantic human-written corpus doesn't
| seem to introduce the same biases. That hints at something
| interesting going on.
| mlajtos wrote:
| We are gonna need to prefix all prompts with "From now on,
| act without human biases."
| panarky wrote:
| To human ears, the problem of 3 hens, 3 eggs, 3 days is a
| simplified and more obvious problem than the original:
|
| "If a hen and a half can lay an egg and a half in a day and a
| half ..."
|
| But it's the same problem in the same form, and GPT4 nails it
| without extra goofy prompting like "don't replicate human
| cognitive biases".
|
| A more interesting question would cross logic with problem
| solving.
|
| "If a hen and a half can lay an egg and a half in a day and a
| half, how many hens are needed _to fill Yankee Stadium with
| eggs in a year?_ "
|
| But GPT4 doesn't have very good intuition on how to estimate
| the volume of Yankee Stadium:
|
| _Yankee Stadium has a seating capacity of approximately
| 50,000 spectators.
|
| Let's assume each spectator occupies a space of 2 cubic
| meters (which is a generous estimate, including seating and
| legroom).
|
| Total volume (approx.) = 50,000 seats x 2 m3/seat = 100,000
| m3_
| still_grokking wrote:
| I didn't know this "puzzle" until just now.
|
| Stupid question: what is the "tricky" part about it? How can
| someone even get this wrong?
|
| This puzzle is trivial, and especially doesn't require any kind
| of "thinking around the corner". So what is this about?
|
| That language models can't do math (or logic) and can't even
| reliably tell which of two given written out numbers is the
| bigger one is imho a different story.
|
| Language models are great for text. But when you need a tool
| for math and/or logic you should use an adequate tool. We have,
| for example, algebra systems for that. Or proof assistants. Or
| just good old Prolog. It makes no sense to use the wrong tool
| and then wonder why the results are terribly wrong.
|
| But for the above "puzzle" you don't even need a calculator. So
| I really don't get the point.
| Etherlord87 wrote:
| The trickiness is that it was part of a set of easy questions.
| You have 3 of the same number, x x x, then change two of them
| and ask what the third is: y y ? It seems obvious it's y
| again.
|
| Of course if you have some experience with math (apparently
| more experience than kids in school have), it's immediately
| apparent it's a multiplication and the product doesn't scale
| linearly... But I can imagine myself still answering this
| incorrectly just because I have no incentive to be careful
| and I'm in a state of mind to just answer the question
| quickly. Though while reading the article, I solved it in my
| head.
| still_grokking wrote:
| > The trickiness is that it was a part of a set of easy
| questions. You have 3 same numbers x x x, then change two
| of them, and ask what's the third y y ? It seems obvious
| it's y again.
|
| OK... That's all?
|
| I start to see the problem.
|
| But I'm not going to write what my conclusion is, as this
| would likely get down-voted into oblivion. (I was actually
| expecting down-votes already just for asking, but I still
| wanted to understand what the issue here is, as I honestly
| don't see any "tricky part" in the "puzzle" at all.)
| alanfalcon wrote:
| It is tricky because humans are biased for intuitive
| understanding rather than logical understanding. Intuitions
| are shortcuts that are very valuable, but sometimes they are
| just plainly wrong. Many people will try to apply intuition
| rather than logic because that works with many other
| similarly worded questions.
| still_grokking wrote:
| Still don't get it.
|
| The intuitive answer, imho, is the right one.
|
| I really tried to get what could be tricky about this
| question. But there is just nothing.
|
| I have no clue how anybody could arrive at anything[+] other
| than 48. It's trivial. There is no "puzzle" at all.
|
| + (As long as you don't consider real-life biology, as then
| there is no definite answer, only some probability.)
| Kranar wrote:
| GPT-4 nailed it on the first try:
|
| >3 hens lay 3 eggs in 3 days. How many eggs do 12 hens give in
| 12 days?
|
| >Let's break down the given information and solve this problem
| step by step.
|
| We know that 3 hens lay 3 eggs in 3 days.
|
| First, we'll find out how many eggs 1 hen lays in 3 days. 3
| hens lay 3 eggs, so 1 hen lays 1 egg in 3 days.
|
| Now, let's find out how many eggs 1 hen lays in 12 days. In 3
| days, 1 hen lays 1 egg, so in 12 days, 1 hen will lay (12 / 3)
| * 1 = 4 eggs.
|
| Finally, we'll find out how many eggs 12 hens lay in 12 days.
| If 1 hen lays 4 eggs in 12 days, then 12 hens will lay 12 * 4 =
| 48 eggs in 12 days.
| kqr wrote:
| > So this trick isn't just a neat blog post, but empirically
| useful, and I'll probably include it in prompts from now on for
| these kinds of problems...
|
| Alternative take: this is a very classic problem used to
| illustrate this kind of System 1 inaccuracy, so by deliberately
| including words that increase the conditional probability of us
| looking at a text that includes discussion on System 1
| inaccuracies, you have essentially narrowed down the search
| space for our dear stochastic parrot to those texts that do
| contain the right answer.
|
| This sounds dismissive, but I don't mean it that way. I'm just
| saying that "think about human biases" may not be a universally
| useful cue, it may only work specifically when trying to get
| the model to look like a discussion on human biases, and the
| typical sample problems used to illustrate that.
| rcme wrote:
| I'm always amazed that people can get so hung up on events that
| happen in elementary schools. This piece feels like it could be
| posted to /r/iamverysmart.
| munificent wrote:
| There's a certain set of people who, by virtue of genetics,
| upbringing, or both, feel very strongly:
|
| 1. You should do your best to please authority figures.
|
| 2. You should do your best to be correct.
|
| It's _really_ distressing the first time they run into a clear
| example where those two deep-seated needs are in conflict. For
| a lot of gifted teacher's-pet kind of kids, this is their
| Kobayashi Maru moment.
|
| I still remember mine too. :-/
| mlajtos wrote:
| > I'm always amazed that people can get so hung up on events
| that happen in elementary schools. This piece feels like it
| could be posted to /r/iamverysmart.
|
| Ages were 10, 15 & ~23 respectively. Yes, the post suffers from
| selection bias. I will write more about my failures in the
| future.
|
| And as per your request, I went and tried to post it to
| /r/iamverysmart, but only screenshot submissions are allowed,
| and a screenshot of the whole post wouldn't be very effective.
| yaakov34 wrote:
| Can't pass by without making a public service announcement about
| the average speed example: don't compute integrals using the
| calculus definition of Sum(f(x_i) * delta). Look up quadrature or
| numerical integration methods instead.
|
| Although it is true that you can approximate the average speed by
| taking an average of instantaneous speed measurements, that's
| usually a very bad way to do it in any real world situation.
| Numerical difference values are always noisier than the
| underlying quantity, sometimes to the point of being unusable, so
| of course if you can just read off the quantity you want directly
| (total difference over time), you should do that. But even if you
| can't, you should use a proper integration method instead of the
| calculus definition.
|
| I have seen the Sum(f(x_i) * delta) calculation in a lot of real-
| world code. It has bad convergence properties, bad errors when
| the function has large derivatives, and bad performance when the
| data has noise. Some of the code I've seen produces garbage
| results, or has thousands of function evaluations when you need,
| like, four. "Quadrature? I think I heard that before, but I don't
| remember what it means."
|
| In summary, please don't compute derivatives as
| (f(x_i+1)-f(x_i))/delta, or compute integrals as Sum(f(x_i) *
| delta), and especially, please don't do the first immediately
| followed by the second. Which also happens. Look up numerical
| methods instead.
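|
| A tiny before/after sketch (hypothetical Python, not from the
| original comment) of the convergence gap on a smooth function:
|
|     import math
|
|     # Integrate exp(x) over [0, 1]; exact answer is e - 1.
|     a, b, n = 0.0, 1.0, 100
|     h = (b - a) / n
|     f = math.exp
|     exact = math.e - 1
|
|     # naive Sum(f(x_i) * delta), i.e. a left Riemann sum
|     riemann = sum(f(a + i * h) for i in range(n)) * h
|     # trapezoid rule: same samples, endpoints weighted by 1/2
|     trapezoid = (sum(f(a + i * h) for i in range(1, n))
|                  + (f(a) + f(b)) / 2) * h
|
|     print(abs(riemann - exact))    # ~8.6e-3, shrinks like 1/n
|     print(abs(trapezoid - exact))  # ~1.4e-5, shrinks like 1/n^2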
|
| This has been a public service announcement.
| roelschroeven wrote:
| Isn't it the case that other numerical integration methods only
| work if you have an f(x) that you can evaluate for any x you
| want (albeit possibly costly)?
|
| It seems to me that in many practical applications, the only
| thing you have to work with is samples at discrete moments
| in time. It certainly seems to be the case here: "I would
| measure car's speed at every instant and produce an average of
| those measurements." We only know f(t_0), f(t_1), f(t_2), ...
| (and if we're lucky t_1-t_0 = t_2-t_1 = t_3-t_2 and so on); we
| have no way to compute things like f((t_0 + t_1)/2). In that
| case, how can we improve our calculation?
| yaakov34 wrote:
| Even if you're limited to uniform sampling, something as
| simple as the trapezoid rule will give you quadratic
| convergence instead of linear for the naive Sum(f(t_i) *
| delta). In other words, error proportional to 1/n^2, instead
| of 1/n, where n is the number of samples, which is going to
| be a huge difference. There are many methods depending on the
| constraints of your problem - your ability to choose sampling
| intervals, knowledge of the bounds of your function or its
| derivatives, etc. The PSA is to study these things, instead
| of just writing the first thing that seems familiar from a
| long-ago calculus class.
| hgsgm wrote:
| Huh? Quadrature is a general term for "measuring area". In this
| context it's a synonym for integration.
|
| I think you are trying to say that it's better to do weighted
| sums of fewer samples, instead of a uniformly weighted Riemann
| sum. Both are "calculus definition" integration, of course,
| since calculus is true.
| ouid wrote:
| When mathematicians say Quadrature, they mean that if your
| function is suitably approximated by projecting onto some
| orthogonal basis functions, you can get very cheap
| approximations by cleverly expressing those integrals exactly
| as a linear combination of their values at certain points
| along the interval. You need very few.
|
| https://en.m.wikipedia.org/wiki/Gaussian_quadrature
|
| It is significantly more subtle than what you are thinking.
| kqr wrote:
| > please don't compute derivatives as (f(x_i+1)-f(x_i))/delta
|
| Isn't this exactly what finite differences means? Sure, it's
| not optimal in all respects, but it's incredibly general and
| easy to remember.
| yaakov34 wrote:
| This calculation amplifies any noise present in the values of
| the function, often to the point of the output being
| unusable. There are many methods that can be used to
| approximate derivatives, depending on the problem. Just as we
| shouldn't try to invent our cryptographic methods from
| scratch, we should take advantage of the extensive knowledge
| already in use for numeric methods.
|
| I've seen naive numeric methods cause everything from jerky
| motion in video games to incorrect navigation data for cars.
| kqr wrote:
| So what would you suggest as a general calculation for
| finite differences, especially in those cases when only
| forward differentiation is possible, e.g. with respect to
| time?
| yaakov34 wrote:
| The closest thing to a universal approach would be a
| Kalman filter. It's usually where you start when you have
| noisy measurements coming in, and you need to maintain
| state such as value and derivative.
|
| Since the original question was about computing the
| velocity of a car, and since I work in the automotive
| field, let's take a real example: you want to know the
| approximate position, acceleration, and velocity (linear
| and angular) of your car. Your inputs are driven wheel
| speed (noisy, affected by wheelspin), non-driven wheel
| speed (noisy), accelerometer output (inaccurate, only
| present for some axes), GPS position (updated
| occasionally, has errors), and steering angle (pretty
| accurate, can be put into a chassis dynamics model).
| Almost certainly, you would use a Kalman filter to
| estimate the state of the car. Naive approaches such as
| subtracting two wheel speed values to obtain acceleration
| will not work well.
|
| My point is that we should remember that numerical
| algorithms are a developed field with a lot of knowledge,
| and we should take advantage of the proven approaches.
| Sometimes, programmers who are not specifically from the
| physics or numerical fields, and who need to perform some
| computation, reach for a very simple approach such as the
| rectangle-rule integrals, and get bad results.
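|
| As a toy illustration only (hypothetical Python, nothing like
| production automotive code), here is a 1-D constant-velocity
| Kalman filter recovering velocity from noisy position samples,
| where naive differencing would amplify the noise:
|
|     import random
|
|     dt, q, r = 0.1, 1e-4, 1.0     # time step, process/measurement noise
|     x, v = 0.0, 0.0               # state estimate: position, velocity
|     P = [[1.0, 0.0], [0.0, 1.0]]  # estimate covariance
|
|     def kalman_step(z, x, v, P):
|         # Predict with a constant-velocity model
|         x, v = x + v * dt, v
|         P = [[P[0][0] + dt * (P[0][1] + P[1][0])
|               + dt * dt * P[1][1] + q,
|               P[0][1] + dt * P[1][1]],
|              [P[1][0] + dt * P[1][1], P[1][1] + q]]
|         # Update with the noisy position measurement z
|         s = P[0][0] + r                    # innovation variance
|         kx, kv = P[0][0] / s, P[1][0] / s  # Kalman gains
|         y = z - x                          # innovation
|         x, v = x + kx * y, v + kv * y
|         P = [[(1 - kx) * P[0][0], (1 - kx) * P[0][1]],
|              [P[1][0] - kv * P[0][0], P[1][1] - kv * P[0][1]]]
|         return x, v, P
|
|     random.seed(0)
|     true_v = 2.0
|     for i in range(200):
|         z = true_v * i * dt + random.gauss(0, 1.0)
|         x, v, P = kalman_step(z, x, v, P)
|     print(v)  # settles near 2.0; (z_i - z_{i-1}) / dt would swing wildly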
| kqr wrote:
| I see -- we are talking about two different things!
|
| You are working on the problem of figuring out the hidden
| state based on noisy observations and a transition model.
|
| I interpreted your statement much more broadly, so I was
| trying to discuss the problem of computing the explicit
| next state based on perfect knowledge of the transition
| model, in which case (y_1-y)/dt is a perfectly viable
| approach to estimate the derivative.
|
| (You can do better by adding higher-order terms of
| course, but I haven't found that to be universally useful
| compared to making dt smaller.)
| eviks wrote:
| > task is presented in context of social relations (or more
| broadly "evolutionarily familiar problems"), people tend to get
| it immediately, or rather intuitively. Bonkers, right?
|
| That's obvious, not bonkers - if you have experience in some
| class of problems, it's easier to solve the same kind of
| problem: the patterns match in your head, so the solution
| pattern is used right away. Just like you can solve these
| numeric problems quickly if you're trained on them.
| moffkalast wrote:
| So in a sense we have "hardware acceleration" for solving
| certain types of problems?
| mlajtos wrote:
| Yes. https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow
| eviks wrote:
| Wetware acceleration?
| mlajtos wrote:
| But the task is exactly the same in both situations - only the
| symbols on the cards have a less/more familiar meaning. I see
| this as not being able to generalize from known examples.
| ouid wrote:
| This argument is invalid. These two things are easy to
| pattern match, but every theorem comes with an obligation to
| show that your current state matches your predicate, which is
| very hard, generally.
| mannykannot wrote:
| I question whether it is a matter of not being able to generalize
| from known examples, as I suspect that most of the people who
| do well on the corresponding social questions did not know
| the answer beforehand, but nevertheless found their way to
| the correct one.
|
| The article you link to seems to support this view: _"
| Cosmides and Tooby argued that experimenters have ruled out
| alternative explanations, such as that people learn the rules
| of social exchange through practice and find it easier to
| apply these familiar rules than less-familiar rules."_
|
| I now wonder if people tend to do better on abstract Wason
| tasks after being prompted by logically-equivalent social-
| relations questions, as opposed to being prompted with
| additional abstract problems (without being told whether they
| gave the correct responses to the prompts.)
|
| My guess at what might be underlying these outcomes is that
| people clearly see the purpose of the social rules but not of
| the abstract ones, and our intuition is attuned to assessing
| how various social scenarios will play out, taking into
| account the motives of the actors (this might seem to be the
| same as what Cosmides and Tooby claim has been ruled out, and
| maybe it is, but I _think_ what they are ruling out is us
| having learned various schema to be applied mechanically in
| the corresponding social situations.)
|
| Alternatively, could it be something as simple as this:
| having an understanding of the social rule's purpose helps
| the subjects avoid confusing conditional and biconditional
| rules? At least in the English language, acceptable usage can
| be quite ambiguous in this regard.
|
| https://en.wikipedia.org/wiki/Wason_selection_task
| eviks wrote:
| It's not exactly the same, precisely because the difference in
| symbols maps to different patterns - one is a pattern from
| school, which you forgot since it hasn't been used for many
| years (or never learned properly). Another is a pattern that
| keeps getting refreshed throughout your life. It's a very
| different cognitive workload, leading to differences in the
| required effort/correctness, especially in these simplistic,
| low-value tests.
|
| And why is your assessment of the ability to generalize
| limited to only this math representation which people don't
| use?
| mlajtos wrote:
| > It's not exactly the same, precisely because the difference
| in symbols maps to different patterns
|
| Does this mean that these should be treated differently?
|
| a) "3 * 2 = 6"
|
| b) "III + III = VI"
|
| c) "tres et tres aequalis sex"
|
| d) "ln(2) + ln(3) = ln(6)"
|
| While symbols (and even patterns) are not the same, you
| could see that all these examples point to the same thing.
|
| > And why is your assessment of the ability to generalize
| limited to only this math representation which people don't
| use?
|
| I think people fail to generalize even in simpler
| situations. I can't think of anything obvious that would
| apply to adults, but this is a nice example for children:
| https://en.wikipedia.org/wiki/Conservation_(psychology) &
| video demonstration:
| https://www.youtube.com/watch?v=gnArvcWaH6I
| eviks wrote:
| You could see that mostly because the first example does
| everything as that's what everyone knows, the second is
| similar and solidifies the pattern, which you could then
| match to the rest
|
| The question is, what does it tell you when people
| struggle with Roman LCX*ML more vs the more familiar
| first pattern besides the fact that training/familiarity
| makes it easier?
|
| And it's telling you have to reach for the kids
| mlajtos wrote:
| Oh, I see your point now. Multiplication in Roman
| numerals is strikingly different from (and much harder
| than) addition. So the representation of the pattern might
| be so alien that it's unrecognizable even though the
| pattern is known.
| Beldin wrote:
| Wait, they're not - at least, in the example of the article.
| Allow me to digress:
|
| There is a figure with 4 cards, showing: all orange, all
| blue, the number 2, the number 1.
|
| The only condition is that an even number must have an all
| blue flipside. This condition is either irrelevant or
| satisfied for the blue card. For _all_ other cards, we don't
| know. The card with "1" may have the number "4" on the flip
| side. So we need to check all other cards.
|
| TL;DR: only a blue face guarantees compliance with the rule,
| all other cards need to be flipped.
|
| In the second case, the assumption that the cards all have
| one side with digits and one side with symbols is still
| unstated, but a lot stronger. (These are things to check
| for age-alcohol compliance, so logically they must state
| both an age and whether the drink is alcoholic.)
| mlajtos wrote:
| > The only condition is that an even number must have an
| all blue flipside. This condition is either irrelevant or
| satisfied for the blue card. For all other cards, we don't
| know. The card with "1" may have the number "4" on the flip
| side. So we need to check all other cards.
|
| Very good point, thank you. I have added an explicit
| statement that cards are consistent.
| gdprrrr wrote:
| Could it be explained in terms of abstract vs concrete? I've
| heard that we humans can reason about concrete things like
| beer better than abstract concepts like numbers and colors. (I
| know that age is also a number, but it has more assigned
| meaning.)
| mlajtos wrote:
| I think it could, as dealing with concrete things is a
| direct evolutionary advantage grounded in reality. Usually,
| we are shit at dealing with abstract concepts. A quote that
| beautifully demonstrates this:
|
| "To deal with hyper-planes in 14-dimensional space,
| visualize a 3-D space and say 'fourteen' to yourself very
| loudly. Everybody does it." -- Geoffrey Hinton
| deepsun wrote:
| > That is literally measuring the car's speed at every instant!
|
| Well, there's no way to measure speed instantly; it's always
| over some period, even if small (take some time interval and
| see how much distance has been covered). For example, at time
| zero you cannot know what speed the car has.
|
| Similar to when I was working with audio, producing a frequency
| diagram -- the frequency cannot be gathered from a single
| timestamp on the waveform; you always need some interval, like
| 20ms, to get an FT.
___________________________________________________________________
(page generated 2023-04-24 23:01 UTC)