Post AUkLseUm5TQ8azxZGS by nabsiddiqui@h-net.social
Post #AUkK0lP05kbIQR00Ei by simon@fedi.simonwillison.net
       2023-04-17T14:44:35Z
       
       1 like, 1 repeat
       
       MiniGPT-4 is pretty astonishing: a research project that implements an AI chatbot you can use to ask questions about an image (a feature that's been promised but not yet shipped by GPT-4), building on top of the Vicuna-13B LLM (derived from LLaMA) and BLIP-2 vision-language model. https://minigpt-4.github.io/
       
Post #AUkKCClsftB6Ui9mpk by simon@fedi.simonwillison.net
       2023-04-17T14:46:39Z
       
       0 likes, 0 repeats
       
       They're working on a 7B version of the model. Given that vicuna-7b can run entirely in the browser now thanks to WebGPU and the Web LLM project, I honestly wouldn't be surprised if it was possible to run a future version of MiniGPT-4 entirely in the browser at some point soon as well - given a powerful enough host machine https://simonwillison.net/2023/Apr/16/web-llm/
       
Post #AUkLseUm5TQ8azxZGS by nabsiddiqui@h-net.social
       2023-04-17T15:05:33Z
       
       0 likes, 0 repeats
       
       @simon This is very cool. I wonder how art historians may use something like this.
       
Post #AUkVkRooKoNsVK3f8a by RogueWombat@hachyderm.io
       2023-04-17T16:56:11Z
       
       0 likes, 0 repeats
       
       @simon
       
Post #AUm552k0STx8MYKKUi by mw@toot.community
       2023-04-18T11:05:42Z
       
       0 likes, 0 repeats
       
       @simon if you don't mind me asking, what's the astonishing bit? In your example, it's wrong about what the picture is of.
       
Post #AUn8kuglHUIKZWQT7A by simon@fedi.simonwillison.net
       2023-04-18T23:22:50Z
       
       0 likes, 0 repeats
       
       @mw it gets most of it right. Ten years ago this would have been science fiction. Three years ago it would have been cutting edge research. Now it's something a team can cobble together in a couple of weeks by combining other models.
       
Post #AUwX0VtFObhVcGsQeO by mw@toot.community
       2023-04-23T12:06:52Z
       
       0 likes, 0 repeats
       
       @simon that's cool, but what's the use-case for "mostly right" picture descriptions? For all the ones I can think of, you would want a human to proofread the results, which doesn't seem like much of an improvement over just having a human write the description in the first place. Is it even possible to characterize an error rate for this? Presumably the errors will be dependent on the content.
       
Post #AUwlMxNvFaGscQoxcG by simon@fedi.simonwillison.net
       2023-04-23T14:47:57Z
       
       0 likes, 0 repeats
       
       @mw I have over 500 photos on www.niche-museums.com that are currently missing alt text - editing machine-generated 80%-right captions feels feasible to me, writing 500 from scratch is so daunting that I continue to avoid it
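       The "model drafts, human proofreads" workflow Simon describes could be sketched roughly as follows. This is an illustrative assumption, not his actual setup: `generate_caption` is a placeholder standing in for whatever vision-language model (a BLIP-2 or MiniGPT-4 style captioner) would produce the 80%-right draft, and the JSON file is one arbitrary choice for a review queue.

```python
import json
from pathlib import Path


def generate_caption(image_path: Path) -> str:
    """Placeholder for a vision-language model call
    (e.g. a BLIP-2 / MiniGPT-4 style captioner)."""
    return f"DRAFT: description of {image_path.name}"


def draft_alt_text(photo_dir: Path, out_file: Path) -> int:
    """Write a model-drafted caption for every image to a JSON
    file that a human can then proofread and correct.
    Returns the number of drafts written."""
    drafts = {
        p.name: generate_caption(p)
        for p in sorted(photo_dir.glob("*.jpg"))
    }
    out_file.write_text(json.dumps(drafts, indent=2))
    return len(drafts)
```

       The point of the design is the one made in the thread: editing 500 drafts is tractable where writing 500 captions from scratch is not, so the model output is treated as a starting point for review rather than published directly.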
       
Post #AUwlYC238Hv7EQwpn6 by simon@fedi.simonwillison.net
       2023-04-23T14:48:42Z
       
       0 likes, 0 repeats
       
       @mw but there are SO MANY other applications for this kind of thing where any level of accuracy is better than none - analyzing footage from our wildlife night cameras for example, where there's no way I'm going to look at every frame myself