Post AbqNP2SXPXeXWpRhmS by heckj@mastodon.social
       2023-11-15T21:18:04Z
       
       0 likes, 0 repeats
       
      @simon Your article about using LLM embeddings really stuck with me, thank you. I can see some "loss of quality" when you go to smaller models, but I've no idea how to characterize that loss. Is there any sort of standard or estimate? I want to use embedding vectors for full-text search over mixed (partially structured) data, particularly multilingual data, which has been really tricky to arrange so far. But finding a model that works is the hard part for me. Any suggested reading?
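      The embedding-based search described above can be sketched roughly like this: rank documents by cosine similarity between a query vector and precomputed document vectors. The vectors below are made up for illustration; in practice they would come from whatever embedding model you choose.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product of the vectors divided by the
    # product of their magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-dimensional "embeddings" (real ones are hundreds to thousands
# of dimensions, produced by a model).
docs = {
    "doc-en": [0.9, 0.1, 0.0],
    "doc-fr": [0.8, 0.2, 0.1],
    "doc-off-topic": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

# Rank documents by similarity to the query, best match first.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
```

      One appeal of this approach for the multilingual case is that a multilingual embedding model maps semantically similar text in different languages to nearby vectors, so the same cosine ranking works across languages without per-language tokenization rules.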
       
 (DIR) Post #AbqNP3liXiaZabQXJY by simon@fedi.simonwillison.net
       2023-11-15T21:55:16Z
       
       0 likes, 0 repeats
       
       @heckj I've been wondering about that myself. There are definitely approaches - the Hugging Face leaderboard for embedding models demonstrates that - but I have no idea how to scale those down to my own smaller projects https://huggingface.co/spaces/mteb/leaderboard
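      One way to scale the leaderboard idea down to a small project, sketched here as an assumption rather than an established recipe: instead of a full benchmark run, compare the smaller model's nearest-neighbor rankings against a larger reference model's on your own corpus, and measure the top-k overlap (a recall@k-style score). The neighbor lists below are made up for illustration.

```python
def recall_at_k(reference, candidate, k):
    """Fraction of the reference model's top-k neighbors for a query
    that the candidate (smaller) model also returns in its top k."""
    ref_top = set(reference[:k])
    cand_top = set(candidate[:k])
    return len(ref_top & cand_top) / k

# Per-query neighbor lists, best match first (illustrative only; real
# lists would come from running both models over the same corpus).
large_model = ["d1", "d2", "d3", "d4"]
small_model = ["d1", "d3", "d5", "d2"]

score = recall_at_k(large_model, small_model, k=3)
```

      Averaging this score over a sample of real queries gives a rough, corpus-specific estimate of how much ranking quality the smaller model gives up relative to the larger one.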
       
 (DIR) Post #AbqPlyMV0jpDOCgW9Y by heckj@mastodon.social
       2023-11-15T22:22:23Z
       
       0 likes, 0 repeats
       
      @simon Thanks - I’ll see if I can dig up any of their approaches that I can replicate for my own (search) use case.