[HN Gopher] Show HN: Semantic Calculator (king-man+woman=?)
___________________________________________________________________
Show HN: Semantic Calculator (king-man+woman=?)
I've been playing with embeddings and wanted to see what
results an embedding model produces from just word-by-word
input and addition / subtraction, beyond what many videos / papers
mention (like the obvious king-man+woman=queen). So I built
something that doesn't just give the first answer, but ranks the
matches by distance / cosine similarity. I polished it a bit so
that others can try it out, too. For now, I only have nouns (and
some proper nouns) in the dataset, and pick the most common
interpretation among the homographs. Also, it's case-sensitive.
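The ranking idea described above can be sketched with toy vectors (an illustrative sketch only; the vectors below are made up and are not the site's actual model or data): embed each word, do the vector arithmetic, then rank every dictionary word by cosine similarity to the result.

```python
import numpy as np

# Toy 3-D "embeddings" -- purely illustrative; a real model uses
# learned vectors with hundreds of dimensions.
vocab = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "apple": np.array([0.5, 0.5, 0.5]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank(vec, exclude=()):
    """Rank all dictionary words by cosine similarity to vec."""
    scores = {w: cosine(vec, v) for w, v in vocab.items() if w not in exclude}
    return sorted(scores.items(), key=lambda kv: -kv[1])

result = vocab["king"] - vocab["man"] + vocab["woman"]
for word, score in rank(result, exclude={"king", "man", "woman"}):
    print(f"{word}: {score:.2f}")
```

With these toy vectors "queen" comes out on top; the percentages the site shows presumably correspond to similarity scores like these.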
Author : nxa
Score : 63 points
Date : 2025-05-14 19:54 UTC (3 hours ago)
(HTM) web link (calc.datova.ai)
(TXT) w3m dump (calc.datova.ai)
| antidnan wrote:
| Neat! Reminds me of infinite craft
|
| https://neal.fun/infinite-craft/
| thaumasiotes wrote:
| I went to look at infinite craft.
|
| It provides a panel filled with slowly moving dots. Right of
| the panel, there are objects labeled "water", "fire", "wind",
| and "earth" that you can instantiate on the panel and drag
| around. As you drag them, the background dots, if nearby, will
| grow lines connecting to them. These lines are not persistent.
|
| And that's it. Nothing ever happens, there are no interactions
| except for the lines that appear while you're holding the mouse
| down, and while there is notionally a help window listing the
| controls, the only controls are "select item", "delete item",
| and "duplicate item". There is also an "about" panel, which
| contains no information.
| n2d4 wrote:
| In the panel, you can drag one of the items (eg. Water) onto
| another one (eg. Earth), and it will create a new word (eg.
| Plant). It uses AI, so it goes very deep
| thaumasiotes wrote:
| No, that was the first thing I tried. The only thing that
| happens is that the two objects will now share their
| location. There are no interactions.
| n2d4 wrote:
| Probably a bug then, you can check YouTube to find videos
| of people playing it (eg. [0])
|
| [0] https://youtu.be/8-ytx84lUK8
| firejake308 wrote:
| King-man+woman=Navratilova, who is apparently a Czech tennis
| player. Apparently, it's very case-sensitive. Cool idea!
| fph wrote:
| "King" (capital) probably was interpreted as
| https://en.wikipedia.org/wiki/Billie_Jean_King , that's why a
| tennis player showed up.
| nxa wrote:
| when I first tried it, king was referring to the instrument
| and I was getting a result king-man+woman=flute ... :-D
| BeetleB wrote:
| Heh. This is fun:
|
| Navratilova - woman + man = Lendl
| nikolay wrote:
 | Really?!
 |
 | man - brain = woman
 |
 | woman - brain = businesswoman
| 2muchcoffeeman wrote:
| Man - brain = Irish sea
| nikolay wrote:
| Case matters, obviously! Try "man" with a lower-case "M"!
| Alifatisk wrote:
| Why does case matter? How does it affect the meaning?
| bfLives wrote:
| "Man" is probably being interpreted as the Isle of Man.
|
| https://en.m.wikipedia.org/wiki/Isle_of_Man
| G1N wrote:
| Man (capital M) is probably being interpreted as some
| proper noun, maybe Isle of Man in this case?
| karel-3d wrote:
| woman+penis=newswoman (businesswoman is second)
|
| man+vagina=woman (ok that is boring)
| sapphicsnail wrote:
| Telling that Jewess, feminist, and spinster were near matches
| as well.
| nxa wrote:
| I probably should have prefaced this with "try at your own
| risk, results don't reflect the author's opinions"
| dmonitor wrote:
| I'm sure it would be trivial to get it to say something
| incredibly racist, so that's probably a worthwhile disclaimer
| to put on the website
| dalmo3 wrote:
| I think subtraction is broken. None of what I tried made any
| sense. Water - oxygen = gin and tonic.
| adzm wrote:
| noodle+tomato=pasta
|
| this is pretty fun
| growlNark wrote:
| Surely the correct answer would be `pasta-in-tomato-sauce`?
| Pasta exists outside of tomato sauce.
| cabalamat wrote:
| What does it mean when it surrounds a word in red? Is this
| signalling an error?
| nxa wrote:
 | Yes, a word in red = word not found. That's mostly the case when
 | you try plurals or non-nouns (for now).
| rpastuszak wrote:
| This is neat!
|
| I think you need to disable auto-capitalisation because on
| mobile the first word becomes uppercase and triggers a
| validation error.
| iambateman wrote:
 | Try lower-casing; my phone tried to capitalize and it was a
 | problem.
| fallinghawks wrote:
| Seems to be a word not in its dictionary. Seems to not have any
| country or language names.
|
| Edit: these must be capitalized to be recognized.
| zerof1l wrote:
| male + age = female
|
| female + age = male
| G1N wrote:
| twelve-ten+five=
|
| six (84%)
|
| Close enough I suppose
| lightyrs wrote:
| I don't get it but I'm not sure I'm supposed to.
 |
 | life + death = mortality
 | life - death = lifestyle
 | drug + time = occasion
 | drug - time = narcotic
 | art + artist + money = creativity
 | art + artist - money = muse
 | happiness + politics = contentment
 | happiness + art = gladness
 | happiness + money = joy
 | happiness + love = joy
| grey-area wrote:
| Does the system you're querying 'get it'? From the answers it
| doesn't seem to understand these words or their relations. Once
| in a while it'll hit on something that seems to make sense.
| bee_rider wrote:
| Life + death = mortality
|
| is pretty good IMO, it is a nice blend of the concepts in an
 | intuitive manner. I don't really get
 |
 | drug + time = occasion
|
| But drug - time = narcotic
|
| Is kind of interesting; one definition of narcotic is
|
| > a drug (such as opium or morphine) that in moderate doses
| dulls the senses, relieves pain, and induces profound sleep but
| in excessive doses causes stupor, coma, or convulsions
|
| https://www.merriam-webster.com/dictionary/narcotic
|
| So we can see some element of losing time in that type of drug.
| I guess? Maybe I'm anthropomorphizing a bit.
| woodruffw wrote:
| colorless+green+ideas doesn't produce anything of interest, which
| is disappointing.
| dmonitor wrote:
| well green is not a creative color, so that's to be expected
| skeptrune wrote:
| This is super fun. Offering the ranked matches makes it
| significantly more engaging than just showing the final result.
| spindump8930 wrote:
| First off, this interface is very nice and a pleasure to use,
| congrats!
|
| Are you using word2vec for these, or embeddings from another
| model?
|
| I also wanted to add some flavor since it looks like many folks
| in this thread haven't seen something like this - it's been known
| since 2013 that we can do this (but it's great to remind folks
| especially with all the "modern" interest in NLP).
|
| It's also known (in some circles!) that a lot of these vector
| arithmetic things need some tricks to really shine. For example,
| excluding the words already present in the query[1]. Others in
| this thread seem surprised at some of the biases present -
| there's also a long history of work on that [2,3].
|
| [1] https://blog.esciencecenter.nl/king-man-woman-
| king-9a7fd2935...
|
| [2] https://arxiv.org/abs/1905.09866
|
| [3] https://arxiv.org/abs/1903.03862
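The exclusion trick in [1] matters because the raw nearest neighbor of king - man + woman is often "king" itself. A tiny illustration with made-up 2-D vectors (deliberately arranged to reproduce the pitfall; not taken from any real model):

```python
import numpy as np

# Made-up 2-D vectors, arranged so the raw arithmetic result
# stays closest to "king" -- the pitfall described in [1].
vocab = {
    "king":  np.array([1.0, 0.1]),
    "queen": np.array([0.95, -0.25]),
    "man":   np.array([0.2, 0.1]),
    "woman": np.array([0.2, -0.05]),
    "apple": np.array([0.3, 0.3]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(vec, exclude=()):
    scores = {w: cosine(vec, v) for w, v in vocab.items() if w not in exclude}
    return max(scores, key=scores.get)

result = vocab["king"] - vocab["man"] + vocab["woman"]
print(nearest(result))                                    # king: the query word wins
print(nearest(result, exclude={"king", "man", "woman"}))  # queen
```

Without the exclusion the calculator would mostly echo its inputs back, which is why most demos quietly apply it.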
| nxa wrote:
| Thank you! I actually had a hard time finding prior work on
| this, so I appreciate the references.
|
| The dictionary is based on https://wordnet.princeton.edu/, no
| word2vec. It's just a plain lookup among precomputed embeddings
| (with mxbai-embed-large). And yes, I'm excluding words that are
 | present in the query, for exactly that reason.
|
| It would be interesting to see how other models perform. I
| tried one (forgot the name) that was focused on coding, and it
| didn't perform nearly as well (in terms of human joy from the
| results).
| kaycebasques wrote:
| (Question for anyone) how could I go about replicating this
| with Gemini Embedding? Generate and store an embedding for
| every word in the dictionary?
| nxa wrote:
| Yes, that's pretty much what it is. Watch out for
| homographs.
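A minimal sketch of that precompute-and-store pipeline. Everything here is hypothetical: `embed()` is a deterministic stand-in where a real implementation would call the embedding API (Gemini, mxbai-embed-large, etc.), and the dictionary is a placeholder.

```python
import hashlib
import numpy as np

def embed(word):
    """Hypothetical stand-in for one embedding-API call (e.g. Gemini).
    Returns a deterministic unit pseudo-vector; swap in the real model."""
    seed = int.from_bytes(hashlib.sha256(word.encode()).digest()[:4], "big")
    vec = np.random.default_rng(seed).standard_normal(64)
    return vec / np.linalg.norm(vec)

# One-time precompute: embed every dictionary word and keep the table.
# (In practice you'd persist this to disk or a vector store.)
dictionary = ["king", "queen", "man", "woman", "apple", "bee"]
table = {w: embed(w) for w in dictionary}

def query(expr, exclude=()):
    """Evaluate a weighted word expression, e.g. {"king": 1, "man": -1,
    "woman": 1}, ranking dictionary words by dot product (equal to
    cosine similarity here, since all stored vectors are unit length)."""
    vec = sum(coef * table[w] for w, coef in expr.items())
    scores = {w: float(vec @ v) for w, v in table.items() if w not in exclude}
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(query({"king": 1, "man": -1, "woman": 1}, exclude={"king", "man", "woman"})[0])
```

At query time there are no model calls at all, just vector arithmetic against the stored table, matching the "plain lookup among precomputed embeddings" described above.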
| 7373737373 wrote:
| it doesn't know the word human
| grey-area wrote:
| As you might expect from a system with knowledge of word
| relations but without understanding or a model of the world, this
| generates gibberish which occasionally sounds interesting.
| fallinghawks wrote:
| goshawk-cocaine = gyrfalcon , which is funny if you know anything
| about goshawks and gyrfalcons
|
| (Goshawks are very intense, gyrs tend to be leisurely in flight.)
| kataqatsi wrote:
| garden + sin = gardening
|
| hmm...
| MYEUHD wrote:
| king - man + woman = queen
|
| queen - woman + man = drone
| bee_rider wrote:
| The second makes sense, I think, if you are a bee.
| blobbers wrote:
| rice + fish = fish meat
|
| rice + fish + raw = meat
|
| hahaha... I JUST WANT SUSHI!
| godelski wrote:
 | data + plural = number
 | data - plural = research
 | king - crown = (didn't work... crown gets circled in red)
 | king - princess = emperor
 | king - queen = kingdom
 | queen - king = worker
 | king + queen = queen + king = kingdom
 | boy + age = (didn't work... boy gets circled in red)
 | man - age = woman
 | woman - age = newswoman
 | woman + age = adult female body (tied with man)
 | girl + age = female child
 | girl + old = female child
|
| The other suggestions are pretty similar to the results I got in
| most cases. But I think this helps illustrate the curse of
| dimensionality (i.e. distances are ill-defined in high
| dimensional spaces). This is still quite an unsolved problem and
| seems a pretty critical one to resolve that doesn't get enough
| attention.
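The distance-concentration effect mentioned above is easy to demonstrate: sample random points and compare the farthest and nearest distances from one fixed point. As dimensionality grows, the max/min ratio collapses toward 1, so "nearest neighbor" carries less and less contrast.

```python
import numpy as np

def spread(dim, n_points=1000, seed=0):
    """Ratio of farthest to nearest distance from one random point to the rest."""
    rng = np.random.default_rng(seed)
    points = rng.standard_normal((n_points, dim))
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    return float(dists.max() / dists.min())

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:>4}: max/min distance ratio = {spread(dim):.1f}")
```

In low dimensions the ratio is huge; by a thousand dimensions (typical for embedding models) nearly every point is about the same distance away.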
| Affric wrote:
| Yeah I did similar tests and got similar results.
|
| Curious tool but not what I would call accurate.
| n2d4 wrote:
 | For fun, I pasted these into ChatGPT o4-mini-high and asked it
 | for an opinion:
 |
 | data + plural = datasets
 | data - plural = datum
 | king - crown = ruler
 | king - princess = man
 | king - queen = prince
 | queen - king = woman
 | king + queen = royalty
 | boy + age = man
 | man - age = boy
 | woman - age = girl
 | woman + age = elderly woman
 | girl + age = woman
 | girl + old = grandmother
|
| The results are surprisingly good, I don't think I could've
| done better as a human. But keep in mind that this doesn't do
| embedding math like OP! Although it does show how generic LLMs
| can solve some tasks better than traditional NLP.
|
| The prompt I used:
|
| _> Remember those "semantic calculators" with AI embeddings?
| Like "king - man + woman = queen"? Pretend you're a semantic
| calculator, and give me the results for the following:_
| nbardy wrote:
 | I hate to be pedantic, but the LLM is definitely doing
 | embedding math. In fact, that's all it does.
| gweinberg wrote:
| I got a bunch of red stuff also. I imagine the author cached
| embeddings for some words but not really all that many to save
| on credits. I gave it mermaid - woman and got merman, but when
| I tried to give it boar + woman - man or ram + woman - man, it
| turns out it has never heard of rams or boars.
| thatguysaguy wrote:
| Can you elaborate on what the unsolved problem you're referring
| to is?
| ericdiao wrote:
| Interesting: parent + male = female (83%)
|
| Can not personally find the connection here, was expecting father
| or something.
| ericdiao wrote:
| Though dad is in the list with lower confidence (77%).
|
 | High-dimensional vectors are always hard to explain. This is
 | an example.
| TZubiri wrote:
| I'm getting Navralitova instead of queen. And can't get other
| words to work, I get red circles or no answer at all.
| gus_massa wrote:
| From another comment,
| https://news.ycombinator.com/item?id=43988861 King (with
 | capital K) was a No. 1 female tennis player.
| nxa wrote:
| This might be helpful: I haven't implemented it in the UI, but
| from the API response you can see what the word definitions are,
| both for the input and the output. If the output has homographs,
| likeliness is split per definition, but the UI only shows the
| best one.
|
| Also, if it gets buried in comments, proper nouns need to be
| capitalized (Paris-France+Germany).
|
| I am planning on patching up the UI based on your feedback.
| ericdiao wrote:
| wine - alcohol = grape juice (32%)
|
| Accurate.
| afandian wrote:
| There was a site like this a few years ago (before all the LLM
| stuff kicked off) that had this and other NLP functionality.
| Styling was grey and basic. That's all I remember.
|
| I've been unable to find it since. Does anyone know which site
| I'm thinking of?
| halter73 wrote:
| I'm not sure this is old enough, but could you be referencing
| https://neal.fun/infinite-craft/ from
| https://news.ycombinator.com/item?id=39205020?
| montebicyclelo wrote:
| > king-man+woman=queen
|
| Is the famous example everyone uses when talking about word
| vectors, but is it actually just very cherry picked?
|
| I.e. are there a great number of other "meaningful" examples like
| this, or actually the majority of the time you end up with some
| kind of vaguely tangentially related word when adding and
| subtracting word vectors.
|
| (Which seems to be what this tool is helping to illustrate,
| having briefly played with it, and looked at the other comments
| here.)
|
| (Btw, not saying wordvecs / embeddings aren't extremely useful,
| just talking about this simplistic arithmetic)
| raddan wrote:
| > is it actually just very cherry picked?
|
| 100%
| gregschlom wrote:
| Also, as I just learned the other day, the result was never
| equal, just close to "queen" in the vector space.
| Retr0id wrote:
| I think it's slightly uncommon for the vectors to "line up"
| just right, but here are a few I tried:
|
| actor - man + woman = actress
|
| garden + person = gardener
|
| rat - sewer + tree = squirrel
|
| toe - leg + arm = digit
| groby_b wrote:
| I think it's worth keeping in mind that word2vec was
| specifically trained on semantic similarity. Most embedding
 | APIs don't really give a lick about the semantic space.
|
| And, worse, most latent spaces are decidedly non-linear. And so
| arithmetic loses a lot of its meaning. (IIRC word2vec mostly
| avoided nonlinearity except for the loss function). Yes, the
| distance metric sort-of survives, but addition/multiplication
| are meaningless.
|
| (This is also the reason choosing your embedding model is a
| hard-to-reverse technical decision - you can't just transform
| existing embeddings into a different latent space. A change
| means "reembed all")
| jumploops wrote:
| This is super neat.
|
| I built a game[0] along similar lines, inspired by infinite
| craft[1].
|
| The idea is that you combine (or subtract) "elements" until you
| find the goal element.
|
| I've had a lot of fun with it, but it often hits the same
| generated element. Maybe I should update it to use the second
| (third, etc.) choice, similar to your tool.
|
| [0] https://alchemy.magicloops.app/
|
| [1] https://neal.fun/infinite-craft/
| ezbie wrote:
 | Can someone explain to me what the fuck this is supposed to be!?
| mhitza wrote:
 | Semantic subtraction within an embedding representation of
 | text ("meaning")
| matallo wrote:
| uncle + aunt = great-uncle (91%)
|
| great idea, but I find the results unamusing
| HWR_14 wrote:
| Your aunt's uncle is your great-uncle. It's more correct than
| your intuition.
| matallo wrote:
| I asked ChatGPT (after posting my comment) and this is the
| response. "Uncle + Aunt = Great-Uncle is incorrect. A great-
| uncle is the brother of your grandparent."
| lcnPylGDnU4H9OF wrote:
| Some of these make more sense than others (and bookshop is
| hilarious even if it's only the best answer by a small margin; no
 | shade to bookshop owners).
 |
 | map - legend = Mercator projection
 | noodle - wheat = egg noodle
 | noodle - gluten = tagliatelle
 | architecture - calculus = architectural style
 | answer - question = comment
 | shop - income = bookshop
 | curry - curry powder = cuisine
 | rice - grain = chicken and rice
 | rice + chicken = poultry
 | milk + cereal = grain
 | blue - yellow = Fiji
 | blue - Fiji = orange
 | blue - Arkansas + Bahamas + Florida - Pluto = Grenada
| kylecazar wrote:
| Woman + president = man
| tlhunter wrote:
| man + woman = adult female body
| __MatrixMan__ wrote:
| Here's a challenge: find something to subtract from "hammer"
| which does not result in a word that has "gun" as a substring.
| I've been unsuccessful so far.
| neom wrote:
| if I'm allowed only 1 something, I can't find anything either,
| if I'm allowed a few somethings, "hammer - wine - beer - red -
| child" will get you there. Guessing given that a gun has a
| hammer and is also a tool, it's too heavily linked in the small
| dataset.
| tough wrote:
| hammer + man = adult male body (75%)
| Retr0id wrote:
| Well that's easy, subtract "gun" :P
| mrastro wrote:
| The word "gun" itself seems to work. Package this as a game and
| you've got a pretty fun game on your hands :)
| downboots wrote:
| Bullet
| neom wrote:
| cool but not enough data to be useful yet I guess. Most of mine
| either didn't have the words or were a few % off the answer,
| vehicle - road + ocean gave me hydrosphere, but the other options
| below were boat, ship, etc. Klimt almost made it from Mozart -
| music + painting. doctor - hospital + school = teacher, nailed
| it.
|
| Getting to cornbread elegantly has been challenging.
| downboots wrote:
| three + two = four (90%)
| LadyCailin wrote:
| [delayed]
___________________________________________________________________
(page generated 2025-05-14 23:00 UTC)