[HN Gopher] OmniParser V2 - A simple screen parsing tool towards...
___________________________________________________________________
OmniParser V2 - A simple screen parsing tool towards pure vision
based GUI agent
Author : punnerud
Score : 33 points
Date : 2025-02-15 19:26 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| icodar wrote:
| This is not the intended use, but it works well for parsing
| document layout from images.
| NewUser76312 wrote:
| Very cool work. Accurate GUI text and element parsing is exactly
| the kind of input that LLMs need to be effective agents.
| rgovostes wrote:
| The OS has additional information including how different
| graphics layers are composited, and what accessibility metadata
| is attached to interface elements. It ought to be useful to
| exploit this to do better than screenshot parsing.
| nighthawk454 wrote:
| One ponders the connections with the Recall feature.
___________________________________________________________________
(page generated 2025-02-15 23:01 UTC)