Post #AbqNP2SXPXeXWpRhmS by heckj@mastodon.social
2023-11-15T21:18:04Z
0 likes, 0 repeats
@simon Your article about using LLM embeddings really stuck with me, thank you. I can see some "loss of quality" when you go to smaller models, but I've no idea how to characterize that loss. Is there any sort of standard or estimate? I want to use embedding vectors for full-text search in a mixed-data (partially structured) setup, particularly multi-lingual, which has been really tricky to arrange so far. But finding a model that works is the hard part for me. Any suggested reading?
Post #AbqNP3liXiaZabQXJY by simon@fedi.simonwillison.net
2023-11-15T21:55:16Z
0 likes, 0 repeats
@heckj I've been wondering about that myself. There are definitely approaches - the Hugging Face leaderboard for embedding models demonstrates that - but I have no idea how to scale those down to my own smaller projects https://huggingface.co/spaces/mteb/leaderboard
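One way to scale the leaderboard's approach down to a small project is to run a miniature retrieval benchmark of your own: embed a handful of queries and documents with each candidate model and measure how often the nearest document is the right one. The sketch below is illustrative only; `eval_retrieval` and `toy_embed` are hypothetical names, and the bag-of-words "model" is just a stand-in so the harness runs without downloading anything. Swap in any real model's encode function to compare large versus small models on your own data.

```python
# Minimal sketch of a scaled-down, MTEB-style retrieval evaluation:
# score an embedding function by how often the top-1 nearest document
# to each query is the expected one. `toy_embed` is a toy stand-in,
# NOT a real embedding model.
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def eval_retrieval(embed_fn, queries, docs, expected):
    """Fraction of queries whose nearest document is the expected one."""
    doc_vecs = [embed_fn(d) for d in docs]
    hits = 0
    for query, want in zip(queries, expected):
        qv = embed_fn(query)
        best = max(range(len(docs)), key=lambda i: cosine(qv, doc_vecs[i]))
        hits += best == want
    return hits / len(queries)

# Toy stand-in for a model: count words from a tiny fixed vocabulary.
VOCAB = ["cat", "food", "feeding", "bicycle", "repair", "fixing"]

def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

score = eval_retrieval(
    toy_embed,
    queries=["feeding a cat", "fixing a bike"],
    docs=["cat food and feeding", "bicycle repair and fixing"],
    expected=[0, 1],
)
print(score)  # 1.0 - run the same harness with two real models to compare them
```

Running the identical query/document set through a large model and a small one gives a concrete, if rough, number for the "loss of quality" on your own corpus, which is exactly where the public leaderboard can't help.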
Post #AbqPlyMV0jpDOCgW9Y by heckj@mastodon.social
2023-11-15T22:22:23Z
0 likes, 0 repeats
@simon Thanks - I'll see if I can dig up any of their approaches that I can replicate for my own (search) use case.