[HN Gopher] Improving Accessibility Using Vision Models
___________________________________________________________________
Improving Accessibility Using Vision Models
Author : bearjaws
Score : 61 points
Date : 2024-10-03 18:48 UTC (1 days ago)
(HTM) web link (myswamp.substack.com)
(TXT) w3m dump (myswamp.substack.com)
| bearjaws wrote:
| Funny, Google released gemini-1.5-flash-8b just moments ago, and it
| scores slightly lower on vision. For clarity, this post is based on
| the "older" gemini-1.5-flash.
|
| https://developers.googleblog.com/en/gemini-15-flash-8b-is-n...
| gostsamo wrote:
| Funnily enough, the images in the article do not actually have
| useful alt text and, like every image on Substack I've encountered
| so far, have no useful captions either.
| bearjaws wrote:
| How is the alt text not useful? I even went through the effort
| of putting the data in the alt text for the bar chart. I tend
| to think of alt text as providing the same context as the image.
| For example, the line chart is meant to convey how 1.5-flash
| outperforms 4o, but I am not going to embed each discrete data
| point in the alt text.
| gostsamo wrote:
| Maybe something is lost in translation, but here is what my
| screen reader makes of the article:
|
| Along the way we realized some of our math courses had not
| been updated in quite some time, and some schools were still
| leveraging these courses to teach. Images for equations are
| bad m'kay
|
| It was immediately apparent was the use of images to
| represent equations like this: https%3A%2F%2Fsubstack-post-
| me... https%3A%2F%2Fsubstack-post-me... This is not great...
| the font is a bit on the smaller side and the font itself is
| not very legible, in my non-font expert opinion. Making
| matters worse, there is no alt-text provided that can explain
| the equation.
| gostsamo wrote:
| Checking the later pictures that you mention, the alt text is
| indeed there. My recommendation, though, would be to give a
| summary of the data and not the conclusion, e.g. "Gemini Flash
| has an error rate of x% while the others are at y% and z%."
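
A minimal sketch of the "summarize the data, not the conclusion" suggestion above. The function name `chart_alt_text` and the sample series are hypothetical placeholders, not values from the article.

```python
def chart_alt_text(metric: str, values: dict[str, float]) -> str:
    """Build alt text that states each series' value, not a conclusion."""
    if not values:
        raise ValueError("values must not be empty")
    # ":g" drops trailing zeros, so 10.0 is rendered as "10".
    parts = [f"{name} {value:g}%" for name, value in values.items()]
    return f"Bar chart of {metric}: " + ", ".join(parts) + "."


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    print(chart_alt_text("error rate", {"model A": 2.5, "model B": 10.0}))
```

The resulting string can be pasted into the image's alt attribute, leaving the interpretation of the numbers to the surrounding prose.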
| SalmonSnarker wrote:
| 3 out of 5 images in the post have empty alt text (alt="").
| Most Substacks are pretty careless about alt text, so the
| previous poster is just noting that your accessibility post
| follows this trend. (It's worth noting that your previous post
| has 0 out of 4 images with alt text.)
| bryanrasmussen wrote:
| Looking through it, the images that are definitely content
| controlled by the writer (that is to say, the graphs) have alt
| text. The first alt="" is inside a bit of content that is
| display:none and thus not available to a screen reader, as I
| suppose the others are, so it is not knowable whether that alt
| text will be filled in when the area is rendered (probably not).
| I didn't look for the other one, but I expect it is the same
| situation, because all the images I encountered that were in
| the writer's control had alt text.
|
| I have not investigated the empty ones, but there are numerous
| situations in which an empty alt text makes perfect sense and
| is a better accessibility solution for most users of screen
| readers. For example, if the image is inside something
| clickable that has an aria-label telling you how to use that
| part of the DOM, alt text on a child image just makes things
| overly verbose and annoying in most circumstances.
|
| I have an article in the works that touches on these issues
| with proposed solutions, but unfortunately it would be too
| big to cover in full here.
|
| Edit: of course, it is possible that the writer, having been
| alerted to the fact, has since added the alt text.
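
The distinctions drawn in this comment (hidden content, empty alt inside a labelled control, genuinely missing alt) can be sketched as a small audit script. This is an illustration under stated assumptions, not how any commenter checked the page: the class `AltAudit`, the helper `audit`, and the verdict strings are hypothetical, and only inline `style="display:none"` and the `hidden` attribute are detected, so anything hidden via a stylesheet is not.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AltAudit(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []    # (tag, hidden, labelled) for each open element
        self.results = []  # (src, verdict) for each <img>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        style = (a.get("style") or "").replace(" ", "").lower()
        hidden = "display:none" in style or "hidden" in a
        labelled = tag in ("a", "button") and bool(a.get("aria-label"))
        if tag == "img":
            self.results.append((a.get("src"), self._verdict(a, hidden)))
        elif tag not in VOID_TAGS:
            self.stack.append((tag, hidden, labelled))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def _verdict(self, a, img_hidden):
        if img_hidden or any(hidden for _, hidden, _ in self.stack):
            return "hidden"
        if "alt" not in a:
            return "missing alt"
        if a["alt"] and a["alt"].strip():
            return "has alt"
        if any(labelled for _, _, labelled in self.stack):
            return "empty alt, labelled parent"
        return "empty alt, check manually"


def audit(html: str):
    """Return a list of (src, verdict) pairs, one per <img> in the markup."""
    parser = AltAudit()
    parser.feed(html)
    parser.close()
    return parser.results
```

Calling `audit(page_source)` on saved markup gives one verdict per image; only the "missing alt" and "empty alt, check manually" cases would need a human look.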
| armoredkitten wrote:
| What is the measurement on the x-axis in the graph?? The text is
| talking about equations of 20 or 30 characters, but the graph
| goes up to...6. Six what?? Characters? Terms? If it's characters,
| why do we only get to see the performance from 1-6, when
| apparently 7% of equations had more than 20?
| bearjaws wrote:
| That's a fair point. I bucketed them into lengths of 1-10,
| 11-20, 21-30. I'll do a quick update.
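
The bucketing described here can be sketched as follows, which also yields labels that could go straight onto the x-axis. This assumes length is measured in characters of the transcribed equation and that each result is an `(equation_text, was_correct)` pair; the function names are hypothetical and this is not the author's actual code.

```python
from collections import defaultdict


def bucket_label(length: int, width: int = 10) -> str:
    """Map a length to a label such as '1-10', '11-20', '21-30'."""
    if length < 1:
        raise ValueError("length must be at least 1")
    start = (length - 1) // width * width + 1
    return f"{start}-{start + width - 1}"


def failure_rate_by_bucket(results, width: int = 10) -> dict[str, float]:
    """results: iterable of (equation_text, was_correct) pairs."""
    totals = defaultdict(lambda: [0, 0])  # label -> [failures, count]
    for text, ok in results:
        entry = totals[bucket_label(len(text), width)]
        entry[0] += 0 if ok else 1
        entry[1] += 1
    return {label: failures / count
            for label, (failures, count) in totals.items()}
```

Plotting the returned labels instead of bucket indexes would make the axis read "1-10, 11-20, 21-30" rather than "1, 2, 3".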
| pumanoir wrote:
| I've had great success converting math pics to LaTeX using
| qwen2-vl
| jmull wrote:
| I don't understand "The Results" graph.
|
| The x-axis has integers, 0, 1, 2, 3, 4, 5, 6, but the text talks
| about models struggling at the 30 character mark? On the graph
| they all start getting bad around 3, depending on what you mean
| by bad. Is the x-axis tens of characters??
|
| Anyway...
|
| > anything longer than 20 characters would tend to have more
| issues, we flagged those for manual review.
|
| Even though the failure rate was smaller, is it OK if several of
| the shorter equations are wrong? Maybe they should have manually
| reviewed all of them.
|
| Edit: Now I see someone else brought up the x-axis issue. There's
| a response that seems to say the x-axis is buckets of 10
| characters. I guess the update hasn't gone through yet.
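
A rough sketch of the review policy under discussion: flag everything over 20 characters, as the article quote describes, and optionally spot-check a share of the shorter equations. The `sample_rate` knob is an assumption added here to illustrate the concern about short equations; it is not something the article describes. Setting it to 1.0 reviews everything.

```python
import random


def select_for_review(equations, max_len=20, sample_rate=0.0, seed=0):
    """Return sorted indexes of equations to check by hand."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    # Everything longer than max_len is always flagged.
    long_idx = [i for i, eq in enumerate(equations) if len(eq) > max_len]
    short_idx = [i for i, eq in enumerate(equations) if len(eq) <= max_len]
    # A seeded generator keeps the spot-check sample reproducible.
    rng = random.Random(seed)
    k = min(len(short_idx), round(len(short_idx) * sample_rate))
    return sorted(long_idx + rng.sample(short_idx, k))
```

With the default `sample_rate=0.0` this reproduces the "longer than 20 characters" rule alone.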
___________________________________________________________________
(page generated 2024-10-04 23:02 UTC)