Post #Av32sCv1DkL4y4gU6a by nihilistic_capybara@layer8.space
2025-06-12T08:08:50Z
0 likes, 0 repeats
@alice @alttexthalloffame hey so forgive me for my ignorance but with the advent of multimodal llms isn't alt text kind of obsolete?
Post #Av3OoHeJt8Q9TOPKoy by alttexthalloffame@mastodon.social
2025-06-12T12:14:06Z
0 likes, 0 repeats
@nihilistic_capybara AI/LLM is a pretty divisive technology. https://stefanbohacek.com/blog/a-few-thoughts-on-ai-as-an-assistive-tool/ @alice
Post #Av3nzivzHVlc8SDuJk by nihilistic_capybara@layer8.space
2025-06-12T16:56:50Z
0 likes, 0 repeats
@alice @alttexthalloffame I mean fair enough but it feels like with the right amount of context they could be good. Especially for websites that don't care and will never care. Like Facebook. Right?
Post #Av3oMA1lELtQXAcew4 by jupiter_rowland@hub.netzgemeinde.eu
2025-06-12T16:07:16Z
0 likes, 0 repeats
@nihilistic_capybara LLMs aren't omniscient, and they will never be.

If I make a picture on a sim in an OpenSim-based grid (that's a 3-D virtual world) which has only been started up for the first time 10 minutes ago, and which the WWW knows exactly zilch about, and I feed that picture to an LLM, I do not think the LLM will correctly pinpoint the place where the image was taken. It will not be able to correctly say that the picture was taken at <Place> on <Sim> in <Grid>, and then explain that <Grid> is a 3-D virtual world, a so-called grid, based on the virtual world server software OpenSimulator, and carry on explaining what OpenSim is, why a grid is called a grid, what a region is and what a sim is. But I can do that.

If there's a sign with three lines of text on it somewhere within the borders of the image, but it's so tiny at the resolution of the image that it's only a few dozen pixels altogether, then no LLM will be able to correctly transcribe the three lines of text verbatim. It probably won't even be able to identify the sign as a sign. But I can do that by reading the sign not in the image, but directly in-world.

By the way: all my original images are from within OpenSim grids. I've probably put more thought into describing images from virtual worlds than anyone. And I've pitted my own hand-written image description against an AI-generated image description of the self-same image twice. So I guess I know what I'm writing about.

CC: @nihilistic_capybara

#Long #LongPost #CWLong #OpenSim #OpenSimulator #Metaverse #VirtualWorlds #CWLongPost #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #AI #LLM #AIVsHuman #HumanVsAI
Post #Av3oMC8ZO0Wn4rp7pI by nihilistic_capybara@layer8.space
2025-06-12T17:00:41Z
0 likes, 0 repeats
@jupiter_rowland I am gonna be honest. I am kind of struggling to imagine this. Can you give me an example?
Post #Av3s7xWnJ2ONWvccDY by nihilistic_capybara@layer8.space
2025-06-12T17:43:08Z
0 likes, 0 repeats
@alice @alttexthalloffame that makes sense. Thank you for taking the time to set me straight
Post #Av5JqoCnhnA3ZJqAG8 by jupiter_rowland@hub.netzgemeinde.eu
2025-06-13T06:07:46Z
0 likes, 0 repeats
@nihilistic_capybara Yes. As a matter of fact, I've had an AI describe an image after describing it myself twice already. And I've always analysed the AI-generated description of the image from the point of view of someone who a) is very knowledgeable about these worlds in general and that very place in particular, b) has knowledge about the setting in the image which is not available anywhere on the Web because only he has this knowledge, and c) can see much, much more directly in-world than the AI can see in the scaled-down image.

So here's an example.

This was my first comparison thread. It may not look like it because it clearly isn't on Mastodon (at least I guess it's clear that this is not Mastodon), but it's still in the Fediverse, and it was sent to a whole number of Mastodon instances. Unfortunately, as I don't have any followers on layer8.space and didn't have any when I posted this, the post is not available on layer8.space. So you have to see it at the source in your Web browser rather than in your Mastodon app or otherwise on your Mastodon timeline.

(Caution ahead: by my current standards, the image descriptions are outdated. Also, the explanations are not entirely accurate.)

If you open the link, you'll see a post with a title, a summary and "View article" below. This works like Mastodon CWs because it's the exact same technology. Click or tap "View article" to see the full post. Warning: as the summary/CW indicates, it's very long.

You'll see a bit of introductory post text, then the image with an alt-text that's actually short by my standards (on Mastodon, the image wouldn't be in the post, but below the post as a file attachment), then some more post text with the AI-generated image description, and finally an additional long image description which is longer than 50 standard Mastodon toots.

I first used the same image, largely the same alt-text and the same long description in this post. Scroll further down, and you'll get to a comment in which I pick the AI description apart and analyse it for accuracy and detail level.

For your convenience, here are some points where the AI failed:

- The AI did not clearly identify the image as being from a virtual world. It remained vague. In particular, it did not recognise the location as the central crossing at BlackWhite Castle in Pangea Grid, much less explain what either is. (Then again, explanations do not belong in alt-text. But when I posted the image, BlackWhite Castle had been online for two or three weeks and advertised on the Web for about as long.)
- It failed to mention that the image is greyscale. That is, it actually failed to recognise that it isn't the image that's greyscale, but both the avatar and the entire scenery.
- It referred to my avatar as a "character" and not an avatar.
- It failed to recognise the avatar as my avatar.
- It did not describe at all what my avatar looks like.
- It hallucinated about what my avatar is looking at. Allegedly, my avatar is looking at the advertising board towards the right. Actually, my avatar is looking at the cliff in the background, which the AI does not mention at all. The AI could not possibly see my avatar's eyeballs from behind (and yes, they can move within the head).
- It did not describe anything about the advertising board, especially not what's on it.
- It did not know whether what it thinks my avatar is looking at is a sign or an information board, so it was still vague there, too.
- It hallucinated about a forest with a dense canopy. Actually, there are only a few trees, there is no canopy, the tops of the trees closer to the camera are not within the image, and the AI was confused by the mountain and the little bit of sky in the background.
- The AI misjudged the lighting and hallucinated about the time of day, also because it doesn't know where the avatar and the camera are oriented.
- It used the attributes "calm and serene" on something that's inspired by German black-and-white Edgar Wallace thrillers from the 1950s and 1960s. It had no idea what's going on.
- It did not mention a single bit of text in the image. Instead, it should have transcribed all of it verbatim. All of it, legible in the image at the given resolution or not. (Granted, I myself forgot to transcribe a few little things on the advertisement for the motel on the advertising board, such as the license plate above the office door as well as the bits of text on the old map on the same board. But I didn't have any source for the map at a higher resolution, so I didn't give a detailed description of the map at all, and the text on it was illegible even to me.)
- It did not mention that strange illuminated object towards the right at all. I'd expect a good AI to correctly identify it as an OpenSimWorld beacon, describe what it looks like, transcribe all text on it verbatim and, if asked for it, explain what it is, what it does and what it's there for in a way that everyone will understand. All 100% accurately.

#Long #LongPost #CWLong #CWLongPost #OpenSim #OpenSimulator #Metaverse #VirtualWorlds #AltText #AltTextMeta #CWAltTextMeta #ImageDescription #ImageDescriptions #ImageDescriptionMeta #CWImageDescriptionMeta #AI #LLM #AIVsHuman #HumanVsAI
Post #Av5JqqW17JikjUqXUO by nihilistic_capybara@layer8.space
2025-06-13T10:28:23Z
0 likes, 0 repeats
@jupiter_rowland this does not feel like a fair comparison tbh. The description you have given is a meter long and frankly (again, please forgive my ignorance, I know nothing about blind people and how they navigate the web) it contains so much detail that listening to it with a screen reader turns into a very boring podcast. And stuff like the text not being legible: I don't know how you read that text, cause I am unable to read it either.
Post #Av5JzxN105EWbBypOq by nihilistic_capybara@layer8.space
2025-06-13T10:30:09Z
0 likes, 0 repeats
@jupiter_rowland is such a wall of text really what blind people expect from a screencap of a simulator/game?