[HN Gopher] The super effectiveness of Pokemon embeddings using ...
___________________________________________________________________
The super effectiveness of Pokemon embeddings using only raw JSON
and images
Author : minimaxir
Score : 91 points
Date : 2024-06-29 17:03 UTC (1 day ago)
(HTM) web link (minimaxir.com)
(TXT) w3m dump (minimaxir.com)
| axpy906 wrote:
| Nice article. I remember the original work. Can you elaborate on
| this one Max? > Even if the generative AI industry crashes
| pqdbr wrote:
| I think the author is implying that even if you can't extract
| real-world value from generative AI, the current AI hype has
| advanced embeddings to the point where they can provide real-world
| value to a lot of projects (like the semantic search demonstrated
| in the article, where no generative AI was used).
| minimaxir wrote:
| It's a note that embeddings R&D is orthogonal to whatever
| happens with generative AI even though both involve LLMs.
|
| I'm not saying that generative AI _will_ crash, but if it's
| indeed at the top of the S-curve there could be issues, not to
| mention the cost and legal issues that are only increasing.
| qeternity wrote:
| While there is no real definition of LLM I'm not sure I would
| say both involve LLMs. There is a trend towards using the
| hidden state of an LLM as an embedding but this is relatively
| recent, and overkill for most use-cases. Plenty of embedding
| models are not large, and it's fairly trivial to train a
| small domain-specific embedding model that has incredible
| utility.
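A minimal sketch of what a small domain-specific embedding model could look
like in practice; the sentence-transformers library, the all-MiniLM-L6-v2
base checkpoint, the toy training pairs, and the loss choice are all
illustrative assumptions, not anything the commenter specified:

    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    # Start from a small pretrained checkpoint (~22M parameters), not an LLM.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Domain-specific positive pairs, e.g. (query, relevant passage).
    train_examples = [
        InputExample(texts=["fire-type starter", "Charmander is a Fire-type Pokemon"]),
        InputExample(texts=["water-type starter", "Squirtle is a Water-type Pokemon"]),
    ]
    train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

    # Contrastive loss that treats the other in-batch pairs as negatives.
    train_loss = losses.MultipleNegativesRankingLoss(model)

    model.fit(train_objectives=[(train_dataloader, train_loss)],
              epochs=1, warmup_steps=10)
    model.save("small-domain-embedder")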
| vasco wrote:
| > man + women - king = queen
|
| Useless correction, it's king - man, not man - king.
| 01HNNWZ0MV43FF wrote:
| It's also woman not women
| PaulHoule wrote:
| Also you hear that example over and over again because you
| can't get other ones to work reliably with Word2Vec; you'd
| have thought you could train a good classifier for color
| words or nouns or something like that if it worked but
| actually you can't.
|
| Because it could not tell the difference between word senses,
| I think Word2Vec introduced as many false positives as true
| positives; BERT was the revolution we needed.
|
| I use similar embedding models for classification and it is
| great to see improvements in this space.
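As a hedged illustration of the classification use the parent comment
mentions: embed the texts once with an off-the-shelf encoder, then fit a
lightweight classifier on top of the vectors. The model name and toy data
below are assumptions for the sketch, not the commenter's actual setup:

    from sentence_transformers import SentenceTransformer
    from sklearn.linear_model import LogisticRegression

    model = SentenceTransformer("all-MiniLM-L6-v2")

    texts = ["the battery died after a week", "arrived quickly, works great",
             "stopped charging on day two", "exactly as described, very happy"]
    labels = [0, 1, 0, 1]  # e.g. negative / positive

    # Embeddings become fixed feature vectors for a classical classifier.
    X = model.encode(texts, normalize_embeddings=True)
    clf = LogisticRegression(max_iter=1000).fit(X, labels)

    X_new = model.encode(["broke within a month"], normalize_embeddings=True)
    print(clf.predict(X_new))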
| simonw wrote:
| The other example that worked for me with Word2Vec was
| Germany + Paris - France = Berlin:
| https://simonwillison.net/2023/Oct/23/embeddings/#exploring-...
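For reference, the analogy arithmetic discussed in this subthread can be
reproduced with gensim's pretrained vectors; the specific GloVe model below
is an assumption (the classic demos used the word2vec GoogleNews vectors):

    import gensim.downloader as api

    # Downloads a small set of pretrained GloVe vectors (lowercased vocabulary).
    wv = api.load("glove-wiki-gigaword-100")

    # king - man + woman -> queen
    print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

    # germany + paris - france -> berlin (the parent comment's example)
    print(wv.most_similar(positive=["germany", "paris"], negative=["france"], topn=1))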
| jpz wrote:
| Great article - thanks.
| flipflopclop wrote:
| Great post, really enjoyed the narrative flow and the quality of
| the deep technical details.
| jszymborski wrote:
| I would be interested in how this might work with just looking
| for common words between the text fields of the JSON file
| weighted by e.g. TF-IDF or BM25.
|
| I wonder if you might get similar results. I'd also be
| interested in the comparative computational resources it takes.
| Encoding takes a lot of resources, but I imagine look-up would be
| a lot less resource intensive (i.e.: time and/or memory).
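A rough sketch of the TF-IDF variant being proposed, with scikit-learn and
toy strings standing in for the flattened Pokemon JSON (the preprocessing
shown is an assumption). Lookup is just a sparse matrix product, which is
the cheap part the comment alludes to:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # One string per Pokemon, built by concatenating the JSON text fields
    # (hypothetical placeholder data).
    flattened_texts = [
        "bulbasaur grass poison overgrow chlorophyll seed pokemon",
        "charmander fire blaze solar-power lizard pokemon",
        "squirtle water torrent rain-dish tiny turtle pokemon",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(flattened_texts)  # sparse TF-IDF matrix

    # Nearest neighbours of document 0 by cosine similarity over shared terms.
    sims = cosine_similarity(X[0], X).ravel()
    print(sims.argsort()[::-1][1:])  # remaining documents, most similar first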
| bc569a80a344f9c wrote:
| Very nice! This took me about 30 minutes to re-implement for
| Magic: The Gathering cards (with data from mtgjson.com), and then
| about 40 minutes or so to create the embeddings. It does rather
| well at finding similar cards for when you want more than a 4-of,
| or of course for Commander. That's quite useful for weirder
| effects where one doesn't have the common options memorized!
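The comment above doesn't include code, but a minimal version of the same
idea might look like the sketch below; the AtomicCards.json bulk file is
distributed by mtgjson.com, while the flattening and the all-MiniLM-L6-v2
encoder are assumptions rather than the commenter's actual setup:

    import json
    import numpy as np
    from sentence_transformers import SentenceTransformer

    # AtomicCards.json maps each card name to a list of card-face objects.
    with open("AtomicCards.json") as f:
        cards = json.load(f)["data"]

    names = list(cards)
    texts = [json.dumps(cards[name][0], sort_keys=True) for name in names]

    model = SentenceTransformer("all-MiniLM-L6-v2")
    emb = model.encode(texts, normalize_embeddings=True, show_progress_bar=True)

    def similar_cards(name, k=5):
        # Cosine similarity reduces to a dot product on normalized vectors.
        scores = emb @ emb[names.index(name)]
        return [names[i] for i in np.argsort(-scores)[1 : k + 1]]

    print(similar_cards("Lightning Bolt"))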
| minimaxir wrote:
| I was thinking about redoing this with Magic cards too (I have
| quite a lot of code for preprocessing that data already), so
| it's good to know it works there too! :)
___________________________________________________________________
(page generated 2024-06-30 23:00 UTC)