[HN Gopher] Improving Accessibility Using Vision Models
       ___________________________________________________________________
        
       Improving Accessibility Using Vision Models
        
       Author : bearjaws
       Score  : 28 points
       Date   : 2024-10-03 18:48 UTC (4 hours ago)
        
 (HTM) web link (myswamp.substack.com)
 (TXT) w3m dump (myswamp.substack.com)
        
       | bearjaws wrote:
       | Funny, Google released gemini-1.5-flash-8b just moments ago, and it
       | scores slightly lower on vision. For clarity, the article uses the
       | "older" gemini-1.5-flash.
       | 
       | https://developers.googleblog.com/en/gemini-15-flash-8b-is-n...
        
       | gostsamo wrote:
       | Funnily enough, the images in the article do not actually have
       | useful alt text and, like every image on Substack I've encountered
       | so far, have no useful captions either.
        
         | bearjaws wrote:
         | How is the alt-text not useful? I even went through the effort
         | of putting the data in the alt text for the bar chart. I tend
         | to think of alt text as providing the same context as the image,
         | for example the line chart is meant to convey how 1.5-flash
         | outperforms 4o, but I am not going to embed each discrete data
         | point in the alt text.
        
           | gostsamo wrote:
           | Maybe something is lost in translation, but here is what my
           | screen reader makes of the article:
           | 
           | Along the way we realized some of our math courses had not
           | been updated in quite some time, and some schools were still
           | leveraging these courses to teach. Images for equations are
           | bad m'kay
           | 
           | It was immediately apparent was the use of images to
           | represent equations like this: https%3A%2F%2Fsubstack-post-
           | me... https%3A%2F%2Fsubstack-post-me... This is not great...
           | the font is a bit on the smaller side and the font itself is
           | not very legible, in my non-font expert opinion. Making
           | matters worse, there is no alt-text provided that can explain
           | the equation.
        
           | gostsamo wrote:
           | Checking the later pictures that you mention, the alt text is
           | indeed there. My recommendation, though, would be to give a
           | summary of the data and not the conclusion, e.g. "Gemini Flash
           | has an error rate of x% while the others are at y% and z%."
        
           | SalmonSnarker wrote:
           | 3 out of 5 images on the post have empty alt text (alt="").
           | Most Substacks are pretty careless about alt text, so the
           | previous poster is just noting that your accessibility post
           | follows this trend. (It's worth noting that the post you made
           | before this one has 0 out of 4 images with alt text.)
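
A count like the one above can be reproduced with a short script. This is a sketch using only Python's standard-library `html.parser`; the class and function names are made up for illustration, and fetching the page HTML is left out. Note that `alt=""` is the standard way to mark a purely decorative image, so an empty value is only a problem when the image carries information.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Count <img> tags and how many lack useful alt text."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self.missing = 0  # no alt attribute at all
        self.empty = 0    # alt="" or whitespace only

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.total += 1
        attr_map = dict(attrs)
        if "alt" not in attr_map:
            self.missing += 1
        elif not (attr_map["alt"] or "").strip():
            # A valueless alt attribute is parsed as None, so treat it as empty.
            self.empty += 1


def audit_alt_text(html):
    """Return counts of total, missing-alt, and empty-alt images in an HTML string."""
    parser = AltTextAudit()
    parser.feed(html)
    parser.close()
    return {"total": parser.total, "missing": parser.missing, "empty": parser.empty}
```

Usage would be something like `audit_alt_text(page_html)`, where `page_html` is the article's markup as a string.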
        
       | armoredkitten wrote:
       | What is the measurement on the x-axis in the graph?? The text is
       | talking about equations of 20 or 30 characters, but the graph
       | goes up to...6. Six what?? Characters? Terms? If it's characters,
       | why do we only get to see the performance from 1-6, when
       | apparently 7% of equations had more than 20?
        
         | bearjaws wrote:
         | That's a fair point, I bucketed them into lengths of 1-10,
         | 11-20, 21-30. I'll do a quick update.
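
The bucketing described above (1-10, 11-20, 21-30) could look roughly like this. It is a sketch assuming equation length means character count; `length_bucket` and `bucket_counts` are hypothetical names, not the author's code.

```python
from collections import Counter


def length_bucket(length, width=10):
    """Return a label such as '1-10' or '11-20' for a positive length."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # Shift by one so that lengths 1..10 share a bucket, 11..20 the next, etc.
    start = ((length - 1) // width) * width + 1
    return f"{start}-{start + width - 1}"


def bucket_counts(equations, width=10):
    """Count how many equation strings fall into each length bucket."""
    return Counter(length_bucket(len(eq), width) for eq in equations)
```

Labeling the x-axis with these bucket strings instead of bare indices 1-6 would answer the "six what?" question directly.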
        
       | pumanoir wrote:
       | I've had great success converting math pics to LaTeX using
       | qwen2-vl
        
       ___________________________________________________________________
       (page generated 2024-10-03 23:00 UTC)