[HN Gopher] Show HN: Semantic Calculator (king-man+woman=?)
       ___________________________________________________________________
        
       Show HN: Semantic Calculator (king-man+woman=?)
        
        I've been playing with embeddings and wanted to try out what
        results the embedding layer will produce based on just word-by-
        word input and addition / subtraction, beyond what many videos /
        papers mention (like the obvious king-man+woman=queen). So I
        built something that doesn't just give the first answer, but
        ranks the matches based on distance / cosine similarity. I
        polished it a bit so that others can try it out, too. For now, I
        only have nouns (and some proper nouns) in the dataset, and pick
        the most common interpretation among the homographs. Also, it's
        case-sensitive.
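
        The arithmetic-plus-ranking idea described above can be
        sketched in a few lines. This is a toy illustration only: the
        3-d vectors below are made up for the example (real embedding
        models use hundreds of dimensions), and the exclusion of query
        words is an assumption about how such calculators typically
        rank results.

```python
from math import sqrt

# Toy 3-d "embeddings". These vectors are made up purely to
# illustrate the arithmetic; real models use hundreds of dimensions.
vecs = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.2, 0.9, 0.1],
    "woman": [0.2, 0.1, 0.9],
    "apple": [0.1, 0.2, 0.2],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def calculate(plus, minus):
    # Sum the "+" vectors and subtract the "-" vectors ...
    target = [0.0] * 3
    for w in plus:
        target = [t + v for t, v in zip(target, vecs[w])]
    for w in minus:
        target = [t - v for t, v in zip(target, vecs[w])]
    # ... then rank every *other* dictionary word by cosine similarity,
    # excluding the query words themselves (otherwise an input word is
    # usually the nearest neighbour).
    candidates = [w for w in vecs if w not in plus and w not in minus]
    return sorted(candidates, key=lambda w: cosine(vecs[w], target), reverse=True)

ranked = calculate(plus=["king", "woman"], minus=["man"])
# With these toy vectors, "queen" comes out on top.
```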
        
       Author : nxa
       Score  : 63 points
       Date   : 2025-05-14 19:54 UTC (3 hours ago)
        
 (HTM) web link (calc.datova.ai)
 (TXT) w3m dump (calc.datova.ai)
        
       | antidnan wrote:
       | Neat! Reminds me of infinite craft
       | 
       | https://neal.fun/infinite-craft/
        
         | thaumasiotes wrote:
         | I went to look at infinite craft.
         | 
         | It provides a panel filled with slowly moving dots. Right of
         | the panel, there are objects labeled "water", "fire", "wind",
         | and "earth" that you can instantiate on the panel and drag
         | around. As you drag them, the background dots, if nearby, will
         | grow lines connecting to them. These lines are not persistent.
         | 
         | And that's it. Nothing ever happens, there are no interactions
         | except for the lines that appear while you're holding the mouse
         | down, and while there is notionally a help window listing the
         | controls, the only controls are "select item", "delete item",
         | and "duplicate item". There is also an "about" panel, which
         | contains no information.
        
           | n2d4 wrote:
           | In the panel, you can drag one of the items (eg. Water) onto
           | another one (eg. Earth), and it will create a new word (eg.
            | Plant). It uses AI, so it goes very deep.
        
             | thaumasiotes wrote:
             | No, that was the first thing I tried. The only thing that
             | happens is that the two objects will now share their
             | location. There are no interactions.
        
               | n2d4 wrote:
               | Probably a bug then, you can check YouTube to find videos
               | of people playing it (eg. [0])
               | 
               | [0] https://youtu.be/8-ytx84lUK8
        
       | firejake308 wrote:
        | King-man+woman=Navratilova, who is apparently a Czech tennis
        | player. It seems to be very case-sensitive. Cool idea!
        
         | fph wrote:
         | "King" (capital) probably was interpreted as
         | https://en.wikipedia.org/wiki/Billie_Jean_King , that's why a
         | tennis player showed up.
        
           | nxa wrote:
           | when I first tried it, king was referring to the instrument
           | and I was getting a result king-man+woman=flute ... :-D
        
           | BeetleB wrote:
           | Heh. This is fun:
           | 
           | Navratilova - woman + man = Lendl
        
       | nikolay wrote:
        | Really?!
        | 
        |     man - brain = woman
        |     woman - brain = businesswoman
        
         | 2muchcoffeeman wrote:
         | Man - brain = Irish sea
        
           | nikolay wrote:
           | Case matters, obviously! Try "man" with a lower-case "M"!
        
             | Alifatisk wrote:
             | Why does case matter? How does it affect the meaning?
        
               | bfLives wrote:
               | "Man" is probably being interpreted as the Isle of Man.
               | 
               | https://en.m.wikipedia.org/wiki/Isle_of_Man
        
               | G1N wrote:
               | Man (capital M) is probably being interpreted as some
               | proper noun, maybe Isle of Man in this case?
        
         | karel-3d wrote:
         | woman+penis=newswoman (businesswoman is second)
         | 
         | man+vagina=woman (ok that is boring)
        
         | sapphicsnail wrote:
         | Telling that Jewess, feminist, and spinster were near matches
         | as well.
        
         | nxa wrote:
         | I probably should have prefaced this with "try at your own
         | risk, results don't reflect the author's opinions"
        
           | dmonitor wrote:
           | I'm sure it would be trivial to get it to say something
           | incredibly racist, so that's probably a worthwhile disclaimer
           | to put on the website
        
         | dalmo3 wrote:
         | I think subtraction is broken. None of what I tried made any
         | sense. Water - oxygen = gin and tonic.
        
       | adzm wrote:
       | noodle+tomato=pasta
       | 
       | this is pretty fun
        
         | growlNark wrote:
         | Surely the correct answer would be `pasta-in-tomato-sauce`?
         | Pasta exists outside of tomato sauce.
        
       | cabalamat wrote:
       | What does it mean when it surrounds a word in red? Is this
       | signalling an error?
        
         | nxa wrote:
          | Yes, a word in red = word not found. That's mostly the case
          | when you try plurals or non-nouns (for now).
        
           | rpastuszak wrote:
           | This is neat!
           | 
           | I think you need to disable auto-capitalisation because on
           | mobile the first word becomes uppercase and triggers a
           | validation error.
        
         | iambateman wrote:
          | Try lower-casing; my phone tried to capitalize and it was a
          | problem.
        
         | fallinghawks wrote:
          | Seems to be a word not in its dictionary. It doesn't seem to
          | have any country or language names.
         | 
         | Edit: these must be capitalized to be recognized.
        
       | zerof1l wrote:
       | male + age = female
       | 
       | female + age = male
        
       | G1N wrote:
       | twelve-ten+five=
       | 
       | six (84%)
       | 
       | Close enough I suppose
        
       | lightyrs wrote:
       | I don't get it but I'm not sure I'm supposed to.
        |     life + death = mortality
        |     life - death = lifestyle
        |     drug + time = occasion
        |     drug - time = narcotic
        |     art + artist + money = creativity
        |     art + artist - money = muse
        | 
        |     happiness + politics = contentment
        |     happiness + art      = gladness
        |     happiness + money    = joy
        |     happiness + love     = joy
        
         | grey-area wrote:
         | Does the system you're querying 'get it'? From the answers it
         | doesn't seem to understand these words or their relations. Once
         | in a while it'll hit on something that seems to make sense.
        
         | bee_rider wrote:
         | Life + death = mortality
         | 
         | is pretty good IMO, it is a nice blend of the concepts in an
          | intuitive manner. I don't really get
          | 
          |     drug + time = occasion
          | 
          | But
          | 
          |     drug - time = narcotic
          | 
          | is kind of interesting; one definition of narcotic is
         | 
         | > a drug (such as opium or morphine) that in moderate doses
         | dulls the senses, relieves pain, and induces profound sleep but
         | in excessive doses causes stupor, coma, or convulsions
         | 
         | https://www.merriam-webster.com/dictionary/narcotic
         | 
         | So we can see some element of losing time in that type of drug.
         | I guess? Maybe I'm anthropomorphizing a bit.
        
       | woodruffw wrote:
       | colorless+green+ideas doesn't produce anything of interest, which
       | is disappointing.
        
         | dmonitor wrote:
         | well green is not a creative color, so that's to be expected
        
       | skeptrune wrote:
       | This is super fun. Offering the ranked matches makes it
       | significantly more engaging than just showing the final result.
        
       | spindump8930 wrote:
       | First off, this interface is very nice and a pleasure to use,
       | congrats!
       | 
       | Are you using word2vec for these, or embeddings from another
       | model?
       | 
       | I also wanted to add some flavor since it looks like many folks
       | in this thread haven't seen something like this - it's been known
       | since 2013 that we can do this (but it's great to remind folks
       | especially with all the "modern" interest in NLP).
       | 
       | It's also known (in some circles!) that a lot of these vector
       | arithmetic things need some tricks to really shine. For example,
       | excluding the words already present in the query[1]. Others in
       | this thread seem surprised at some of the biases present -
       | there's also a long history of work on that [2,3].
       | 
        | [1] https://blog.esciencecenter.nl/king-man-woman-king-9a7fd2935...
       | 
       | [2] https://arxiv.org/abs/1905.09866
       | 
       | [3] https://arxiv.org/abs/1903.03862
        
         | nxa wrote:
         | Thank you! I actually had a hard time finding prior work on
         | this, so I appreciate the references.
         | 
         | The dictionary is based on https://wordnet.princeton.edu/, no
         | word2vec. It's just a plain lookup among precomputed embeddings
          | (with mxbai-embed-large). And yes, I'm excluding words that
          | are already present in the query, since otherwise they tend
          | to dominate the top results.
         | 
         | It would be interesting to see how other models perform. I
         | tried one (forgot the name) that was focused on coding, and it
         | didn't perform nearly as well (in terms of human joy from the
         | results).
        
           | kaycebasques wrote:
           | (Question for anyone) how could I go about replicating this
           | with Gemini Embedding? Generate and store an embedding for
           | every word in the dictionary?
        
             | nxa wrote:
             | Yes, that's pretty much what it is. Watch out for
             | homographs.
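
        That precompute-and-look-up approach might be sketched as
        below. The `embed` function here is a hypothetical stand-in
        for a real embedding API call (e.g. Gemini Embedding); it just
        hashes each word into a small deterministic vector so the
        sketch runs end to end, and the six-word dictionary is made up
        for illustration.

```python
import hashlib

# Hypothetical stand-in for a real embedding API call: hash the word
# into a small deterministic vector so the sketch is runnable.
def embed(word, dims=8):
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dims]]

# One-time precompute over the whole dictionary; persist this
# (e.g. to disk or a vector store) so queries are pure lookups.
dictionary = ["king", "queen", "man", "woman", "flute", "apple"]
store = {word: embed(word) for word in dictionary}

def nearest(vector, exclude=()):
    # Return the stored word with the highest cosine similarity to an
    # arbitrary query vector, optionally excluding some words.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = lambda v: sum(x * x for x in v) ** 0.5
        return dot / (norm(a) * norm(b))
    return max((w for w in store if w not in exclude),
               key=lambda w: cos(store[w], vector))
```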
        
       | 7373737373 wrote:
       | it doesn't know the word human
        
       | grey-area wrote:
       | As you might expect from a system with knowledge of word
       | relations but without understanding or a model of the world, this
       | generates gibberish which occasionally sounds interesting.
        
       | fallinghawks wrote:
       | goshawk-cocaine = gyrfalcon , which is funny if you know anything
       | about goshawks and gyrfalcons
       | 
       | (Goshawks are very intense, gyrs tend to be leisurely in flight.)
        
       | kataqatsi wrote:
       | garden + sin = gardening
       | 
       | hmm...
        
       | MYEUHD wrote:
       | king - man + woman = queen
       | 
       | queen - woman + man = drone
        
         | bee_rider wrote:
         | The second makes sense, I think, if you are a bee.
        
       | blobbers wrote:
       | rice + fish = fish meat
       | 
       | rice + fish + raw = meat
       | 
       | hahaha... I JUST WANT SUSHI!
        
       | godelski wrote:
        |     data + plural = number
        |     data - plural = research
        |     king - crown = (didn't work... crown gets circled in red)
        |     king - princess = emperor
        |     king - queen = kingdom
        |     queen - king = worker
        |     king + queen = queen + king = kingdom
        |     boy + age = (didn't work... boy gets circled in red)
        |     man - age = woman
        |     woman - age = newswoman
        |     woman + age = adult female body (tied with man)
        |     girl + age = female child
        |     girl + old = female child
       | 
       | The other suggestions are pretty similar to the results I got in
       | most cases. But I think this helps illustrate the curse of
       | dimensionality (i.e. distances are ill-defined in high
       | dimensional spaces). This is still quite an unsolved problem and
       | seems a pretty critical one to resolve that doesn't get enough
       | attention.
        
         | Affric wrote:
         | Yeah I did similar tests and got similar results.
         | 
         | Curious tool but not what I would call accurate.
        
         | n2d4 wrote:
          | For fun, I pasted these into ChatGPT o4-mini-high and asked
          | it for an opinion:
          | 
          |     data + plural    = datasets
          |     data - plural    = datum
          |     king - crown     = ruler
          |     king - princess  = man
          |     king - queen     = prince
          |     queen - king     = woman
          |     king + queen     = royalty
          |     boy + age        = man
          |     man - age        = boy
          |     woman - age      = girl
          |     woman + age      = elderly woman
          |     girl + age       = woman
          |     girl + old       = grandmother
         | 
         | The results are surprisingly good, I don't think I could've
         | done better as a human. But keep in mind that this doesn't do
         | embedding math like OP! Although it does show how generic LLMs
         | can solve some tasks better than traditional NLP.
         | 
         | The prompt I used:
         | 
         |  _> Remember those  "semantic calculators" with AI embeddings?
         | Like "king - man + woman = queen"? Pretend you're a semantic
         | calculator, and give me the results for the following:_
        
           | nbardy wrote:
            | I hate to be pedantic, but the LLM is definitely doing
            | embedding math. In fact, that's all it does.
        
         | gweinberg wrote:
         | I got a bunch of red stuff also. I imagine the author cached
         | embeddings for some words but not really all that many to save
         | on credits. I gave it mermaid - woman and got merman, but when
         | I tried to give it boar + woman - man or ram + woman - man, it
         | turns out it has never heard of rams or boars.
        
         | thatguysaguy wrote:
         | Can you elaborate on what the unsolved problem you're referring
         | to is?
        
       | ericdiao wrote:
       | Interesting: parent + male = female (83%)
       | 
        | Can't personally find the connection here; I was expecting
        | father or something.
        
         | ericdiao wrote:
         | Though dad is in the list with lower confidence (77%).
         | 
          | High-dimensional vectors are always hard to explain. This is
          | an example.
        
       | TZubiri wrote:
        | I'm getting Navratilova instead of queen. And can't get other
       | words to work, I get red circles or no answer at all.
        
         | gus_massa wrote:
         | From another comment,
          | https://news.ycombinator.com/item?id=43988861 : King (with a
          | capital K) was a top-ranked tennis player.
        
       | nxa wrote:
       | This might be helpful: I haven't implemented it in the UI, but
       | from the API response you can see what the word definitions are,
       | both for the input and the output. If the output has homographs,
       | likeliness is split per definition, but the UI only shows the
       | best one.
       | 
       | Also, if it gets buried in comments, proper nouns need to be
       | capitalized (Paris-France+Germany).
       | 
       | I am planning on patching up the UI based on your feedback.
        
       | ericdiao wrote:
       | wine - alcohol = grape juice (32%)
       | 
       | Accurate.
        
       | afandian wrote:
       | There was a site like this a few years ago (before all the LLM
       | stuff kicked off) that had this and other NLP functionality.
       | Styling was grey and basic. That's all I remember.
       | 
       | I've been unable to find it since. Does anyone know which site
       | I'm thinking of?
        
         | halter73 wrote:
         | I'm not sure this is old enough, but could you be referencing
         | https://neal.fun/infinite-craft/ from
         | https://news.ycombinator.com/item?id=39205020?
        
       | montebicyclelo wrote:
       | > king-man+woman=queen
       | 
       | Is the famous example everyone uses when talking about word
       | vectors, but is it actually just very cherry picked?
       | 
        | I.e. are there a great number of other "meaningful" examples
        | like this, or do you actually end up, the majority of the
        | time, with some kind of vaguely, tangentially related word
        | when adding and subtracting word vectors?
       | 
       | (Which seems to be what this tool is helping to illustrate,
       | having briefly played with it, and looked at the other comments
       | here.)
       | 
       | (Btw, not saying wordvecs / embeddings aren't extremely useful,
       | just talking about this simplistic arithmetic)
        
         | raddan wrote:
         | > is it actually just very cherry picked?
         | 
         | 100%
        
         | gregschlom wrote:
         | Also, as I just learned the other day, the result was never
         | equal, just close to "queen" in the vector space.
        
         | Retr0id wrote:
         | I think it's slightly uncommon for the vectors to "line up"
         | just right, but here are a few I tried:
         | 
         | actor - man + woman = actress
         | 
         | garden + person = gardener
         | 
         | rat - sewer + tree = squirrel
         | 
         | toe - leg + arm = digit
        
         | groby_b wrote:
          | I think it's worth keeping in mind that word2vec was
          | specifically trained on semantic similarity. Most embedding
          | APIs don't really give a lick about the semantic space.
         | 
         | And, worse, most latent spaces are decidedly non-linear. And so
         | arithmetic loses a lot of its meaning. (IIRC word2vec mostly
         | avoided nonlinearity except for the loss function). Yes, the
         | distance metric sort-of survives, but addition/multiplication
         | are meaningless.
         | 
         | (This is also the reason choosing your embedding model is a
         | hard-to-reverse technical decision - you can't just transform
         | existing embeddings into a different latent space. A change
         | means "reembed all")
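
          Both halves of that claim are easy to check numerically:
          cosine similarity is invariant to rescaling, while addition
          does not commute with a non-linear map. A toy check (the
          vectors are made up, and element-wise squaring stands in for
          a non-linear embedding space):

```python
from math import isclose, sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

a = [0.3, 0.8, 0.5]
b = [0.6, 0.1, 0.9]

# Cosine similarity ignores magnitude, so it survives any rescaling
# of the vectors ...
assert isclose(cosine([2 * x for x in a], [5 * x for x in b]), cosine(a, b))

# ... but addition does not commute with a non-linear map:
# f(a + b) != f(a) + f(b) for a non-linear f.
f = lambda v: [x * x for x in v]       # toy non-linearity
lhs = f([x + y for x, y in zip(a, b)])
rhs = [x + y for x, y in zip(f(a), f(b))]
assert lhs != rhs
```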
        
       | jumploops wrote:
       | This is super neat.
       | 
       | I built a game[0] along similar lines, inspired by infinite
       | craft[1].
       | 
       | The idea is that you combine (or subtract) "elements" until you
       | find the goal element.
       | 
       | I've had a lot of fun with it, but it often hits the same
       | generated element. Maybe I should update it to use the second
       | (third, etc.) choice, similar to your tool.
       | 
       | [0] https://alchemy.magicloops.app/
       | 
       | [1] https://neal.fun/infinite-craft/
        
       | ezbie wrote:
        | Can someone explain to me what the fuck this is supposed to be!?
        
         | mhitza wrote:
          | Semantic subtraction within an embedding representation of
          | text ("meaning")
        
       | matallo wrote:
       | uncle + aunt = great-uncle (91%)
       | 
       | great idea, but I find the results unamusing
        
         | HWR_14 wrote:
         | Your aunt's uncle is your great-uncle. It's more correct than
         | your intuition.
        
           | matallo wrote:
           | I asked ChatGPT (after posting my comment) and this is the
           | response. "Uncle + Aunt = Great-Uncle is incorrect. A great-
           | uncle is the brother of your grandparent."
        
       | lcnPylGDnU4H9OF wrote:
       | Some of these make more sense than others (and bookshop is
       | hilarious even if it's only the best answer by a small margin; no
        | shade to bookshop owners).
        | 
        |     map - legend = Mercator projection
        |     noodle - wheat = egg noodle
        |     noodle - gluten = tagliatelle
        |     architecture - calculus = architectural style
        |     answer - question = comment
        |     shop - income = bookshop
        |     curry - curry powder = cuisine
        |     rice - grain = chicken and rice
        |     rice + chicken = poultry
        |     milk + cereal = grain
        |     blue - yellow = Fiji
        |     blue - Fiji = orange
        |     blue - Arkansas + Bahamas + Florida - Pluto = Grenada
        
       | kylecazar wrote:
       | Woman + president = man
        
       | tlhunter wrote:
       | man + woman = adult female body
        
       | __MatrixMan__ wrote:
       | Here's a challenge: find something to subtract from "hammer"
       | which does not result in a word that has "gun" as a substring.
       | I've been unsuccessful so far.
        
         | neom wrote:
         | if I'm allowed only 1 something, I can't find anything either,
         | if I'm allowed a few somethings, "hammer - wine - beer - red -
         | child" will get you there. Guessing given that a gun has a
         | hammer and is also a tool, it's too heavily linked in the small
         | dataset.
        
         | tough wrote:
         | hammer + man = adult male body (75%)
        
         | Retr0id wrote:
         | Well that's easy, subtract "gun" :P
        
         | mrastro wrote:
          | The word "gun" itself seems to work. Package this up and
          | you've got a pretty fun game on your hands :)
        
         | downboots wrote:
         | Bullet
        
       | neom wrote:
        | Cool, but not enough data to be useful yet, I guess. Most of
        | mine
       | either didn't have the words or were a few % off the answer,
       | vehicle - road + ocean gave me hydrosphere, but the other options
       | below were boat, ship, etc. Klimt almost made it from Mozart -
       | music + painting. doctor - hospital + school = teacher, nailed
       | it.
       | 
       | Getting to cornbread elegantly has been challenging.
        
       | downboots wrote:
       | three + two = four (90%)
        
         | LadyCailin wrote:
         | [delayed]
        
       ___________________________________________________________________
       (page generated 2025-05-14 23:00 UTC)