[HN Gopher] OmniParser V2 - A simple screen parsing tool towards...
       ___________________________________________________________________
        
       OmniParser V2 - A simple screen parsing tool towards pure vision
       based GUI agent
        
       Author : punnerud
       Score  : 33 points
       Date   : 2025-02-15 19:26 UTC (3 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | icodar wrote:
       | This is not the intended use, but it works well for parsing
       | document layout from images.
        
       | NewUser76312 wrote:
       | Very cool work. Accurate GUI text and element parsing is exactly
       | the kind of input that LLMs need to be effective agents.
        
       | rgovostes wrote:
       | The OS has additional information including how different
       | graphics layers are composited, and what accessibility metadata
       | is attached to interface elements. It ought to be useful to
       | exploit this to do better than screenshot parsing.
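
One way to picture the combination suggested above is to match each box from a screenshot parser to the accessibility node it overlaps most, then copy that node's role and label onto the detection. The following is only a minimal sketch under stated assumptions: `AXNode`, `Detection`, `iou`, and `enrich` are hypothetical names invented for illustration, the accessibility nodes are assumed to have already been read from the OS by some other means, and none of this reflects how OmniParser itself works.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (left, top, right, bottom) in screen pixels; an assumed convention.
Box = Tuple[float, float, float, float]


@dataclass
class AXNode:
    """Hypothetical accessibility node already exported from the OS."""
    role: str
    label: str
    box: Box


@dataclass
class Detection:
    """Hypothetical element found by a screenshot parser."""
    box: Box
    ocr_text: str
    role: Optional[str] = None
    label: Optional[str] = None


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    width = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    height = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = width * height
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def enrich(detections: List[Detection], ax_nodes: List[AXNode],
           threshold: float = 0.5) -> List[Detection]:
    """Attach role/label from the best-overlapping node, if it overlaps enough."""
    for det in detections:
        best = max(ax_nodes, key=lambda n: iou(det.box, n.box), default=None)
        if best is not None and iou(det.box, best.box) >= threshold:
            det.role = best.role
            det.label = best.label
    return detections
```

The 0.5 threshold is an arbitrary starting point. Detections with no sufficiently overlapping node keep `role` and `label` as `None`, so the vision-only result is still available as a fallback.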
        
       | nighthawk454 wrote:
       | One ponders the connections with the Recall feature.
        
       ___________________________________________________________________
       (page generated 2025-02-15 23:01 UTC)