[HN Gopher] Show HN: Magnitude - Open-source AI browser automati...
       ___________________________________________________________________
        
       Show HN: Magnitude - Open-source AI browser automation framework
        
       Hey HN, Anders and Tom here. We had a post about our AI test
       automation framework 2 months ago that got a decent amount of
       traction (https://news.ycombinator.com/item?id=43796003).  We got
       some great feedback from the community, with the most positive
       response being about our vision-first approach used in our browser
       agent. However, many wanted to use the underlying agent outside the
       testing domain. So today, we're releasing our fully featured AI
       browser automation framework.

       You can use it to automate tasks on the web, integrate between
       apps without APIs, extract data, test your web apps, or as a
       building block for your own browser agents.

       Traditionally, browser automation could only be done via the DOM,
       even though that's not how humans use browsers. Most browser agents
       are still stuck in this paradigm. With a vision-first approach, we
       avoid relying on flaky DOM navigation and perform better on complex
       interactions found in a broad variety of sites, for example:

       - Drag and drop interactions
       - Data visualizations, charts, and tables
       - Legacy apps with nested iframes
       - Canvas and WebGL-heavy sites (like design tools or photo editing)
       - Remote desktops streamed into the browser

       To interact accurately with the browser,
       we use visually grounded models to execute precise actions based on
       pixel coordinates. The model used by Magnitude must be smart enough
       to plan out actions but also able to execute them. Not many models
       are both smart *and* visually grounded. We highly recommend Claude
       Sonnet 4 for the best performance, but if you prefer open source,
       we also support Qwen-2.5-VL 72B.

       Most browser agents never make it to production. This is because
       of (1) the flaky DOM navigation mentioned above, and (2) the lack
       of control most browser agents offer. The dominant paradigm is
       that you give the agent a high-level task + tools and hope for the
       best. This quickly falls apart for production automations that
       need to be reliable and specific. With Magnitude, you have
       fine-grained control over the agent with our `act()` and
       `extract()` syntax, and can mix it with your own code as needed.
       You also have full control of the prompts at both the
       action and agent level.

       ```ts
       // Magnitude can handle high-level tasks
       await agent.act('Create an issue', {
         // Optionally pass data that the agent will use where appropriate
         data: {
           title: 'Use Magnitude',
           description: 'Run "npx create-magnitude-app" and follow the instructions',
         },
       });

       // It can also handle low-level actions
       await agent.act('Drag "Use Magnitude" to the top of the in progress column');

       // Intelligently extract data based on the DOM content matching a provided zod schema
       const tasks = await agent.extract(
         'List in progress issues',
         z.array(z.object({
           title: z.string(),
           description: z.string(),
           // Agent can extract existing data or new insights
           difficulty: z.number().describe('Rate the difficulty between 1-5')
         })),
       );
       ```

       We have a setup script that makes it trivial to
       get started with an example, just run "npx create-magnitude-app".
       We'd love to hear what you think!  Repo:
       https://github.com/magnitudedev/magnitude
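
As a rough illustration of mixing agent calls with ordinary code, here is a minimal sketch. The `AgentLike` and `SchemaLike` interfaces, the `Issue` type, and `triageHardIssues` are hypothetical stand-ins written for this example, not Magnitude's actual types; `SchemaLike` only assumes an object with a `parse` method, which zod schemas provide.

```typescript
// Hypothetical minimal shapes; the real Magnitude agent's types may differ.
interface SchemaLike<T> {
  parse(raw: unknown): T;
}

interface AgentLike {
  act(task: string): Promise<void>;
  extract<T>(instruction: string, schema: SchemaLike<T>): Promise<T>;
}

interface Issue {
  title: string;
  difficulty: number;
}

// Extract structured data, filter it with plain code, then act on each match.
async function triageHardIssues(
  agent: AgentLike,
  schema: SchemaLike<Issue[]>,
  threshold = 4,
): Promise<string[]> {
  const inProgress = await agent.extract('List in progress issues', schema);
  const hard = inProgress.filter((issue) => issue.difficulty >= threshold);
  for (const issue of hard) {
    await agent.act(`Drag "${issue.title}" to the top of the in progress column`);
  }
  return hard.map((issue) => issue.title);
}
```

The filtering and looping happen in regular TypeScript, so only the extraction and the individual drag actions depend on the model.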
        
       Author : anerli
       Score  : 28 points
       Date   : 2025-06-26 18:30 UTC (4 hours ago)
        
        
       | grbsh wrote:
       | Why not just use Claude by itself? Opus and Sonnet are great at
       | producing pixel coordinates and tool usages from screenshots of
       | UIs. Curious as to what your framework gives me over the plain
       | base model.
        
         | anerli wrote:
         | Hey! To have a framework that can effectively control browser
         | agents, you need systems to interact with the browser, but also
         | pass relevant content from the page to the LLM. Our framework
         | manages this agent loop in a way that enables flexible agentic
         | execution that can mix with your own code - giving you control
         | but in a convenient way. Claude and OpenAI computer use
         | APIs/loops are slower, more expensive, and tailored for a
         | limited set of desktop automation use cases rather than robust
         | browser automations.
        
       | KeysToHeaven wrote:
       | Finally, a browser agent that doesn't panic at the sight of a
       | canvas
        
         | anerli wrote:
         | Exactly :)
        
           | revskill wrote:
           | Not sure about this because you're the author.
        
             | anerli wrote:
             | Try it out and report back!
        
               | revskill wrote:
               | No
        
               | legucy wrote:
               | Classic new age hacker news hostility. Do you think this
               | response adds anything?
        
       | axlee wrote:
       | Using this for testing instead of regular playwright must 10000x
       | the cost and run time, doesn't it? At what point do the benefits
       | outweigh the costs?
        
         | anerli wrote:
         | I think it depends a lot on how much you value your own time,
         | since it's quite time consuming to write and update playwright
         | scripts. It's gonna save you developer hours to write
         | automations using natural language rather than messing around
         | with and fixing selectors. It's also able to handle tasks that
         | playwright wouldn't be able to do at all - like extracting
         | structured data from a messy/ambiguous DOM and adapting
         | automatically to changing situations.
         | 
         | You can also use cheaper models depending on your needs, for
         | example Qwen 2.5 VL 72B is pretty affordable and works pretty
         | well for most situations.
        
           | plufz wrote:
           | But we can use an LLM to write that script though and give
           | that agent access to a browser to find DOM selectors etc. And
           | then we have a stable script where we, if needed, manually
           | can fix any LLM bugs just once...? I'm sure there are use
           | cases with messy selectors as you say, but for me it feels
           | like most cases are better covered by generating scripts.
        
       | rozap wrote:
       | There are a number of these out there, and this one has a super
       | easy setup and appears to Just Work, so nice job on that. I had
       | it going and producing plausible results within a minute or so.
       | 
       | One thing I'm wondering is if there's anyone doing this at scale?
       | The issue I see is that with complex workflows which take several
       | dozen steps and have complex control flow, the probability of
       | reaching the end falls off pretty hard, because if each step has
       | a .95 chance of completing successfully, after not very many
       | steps you have a pretty small overall probability of success.
       | These use cases are high value because writing a traditional
       | scraper is a huge pain, but we just don't seem to be there yet.
       | 
       | The other side of the coin is simple workflows, but those tend to
       | be the workflows where writing a scraper is pretty trivial. This
       | did work, and I told it to search for a product at a local store,
       | but the program cost $1.05 to run. So doing it at any scale
       | quickly becomes a little bit silly.
       | 
       | So I guess my question is: who is having luck using these tools,
       | and what are you using them for?
       | 
       | One route I had some success with is writing a DSL for scraping
       | and then having the llm generate that code, then interpreting it
       | and editing it when it gets stuck. But then there's the "getting
       | stuck detection" part which is hard etc etc.
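
The compounding-failure arithmetic in the comment above can be sketched as follows. This assumes every step succeeds independently with the same probability, which is a simplification; the function names are made up for this example.

```typescript
// Probability that all `steps` succeed when each succeeds independently with probability `p`.
function chainSuccess(p: number, steps: number): number {
  return Math.pow(p, steps);
}

// Smallest number of steps at which the end-to-end success rate drops below `threshold`.
function stepsUntilBelow(p: number, threshold: number): number {
  if (p <= 0 || p >= 1 || threshold <= 0 || threshold >= 1) {
    throw new RangeError('p and threshold must both be strictly between 0 and 1');
  }
  let steps = 0;
  let success = 1;
  while (success >= threshold) {
    success *= p;
    steps += 1;
  }
  return steps;
}

console.log(chainSuccess(0.95, 10).toFixed(3)); // prints "0.599"
console.log(stepsUntilBelow(0.95, 0.5)); // prints 14
```

With 0.95 per step, a workflow is more likely to fail than succeed once it reaches 14 steps, which is why multi-dozen-step flows degrade so quickly.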
        
         | anerli wrote:
         | Glad you were able to get it set up quickly!
         | 
         | We currently are optimizing for reliability and quality, which
         | is why we suggest Claude - but it can get expensive in some
         | cases. Using Qwen 2.5-VL-72B will be significantly cheaper,
         | though it may not always be reliable.
         | 
         | Most of our usage right now is for running test cases, and
         | people often seem to prefer Qwen for that use case - since
         | it's typically clearer how to execute test cases.
         | 
         | Something that is top of mind for us is figuring out a good way
         | to "cache" workflows that get taken. This way you can repeat
         | automations either with no LLM or with a smaller/cheaper LLM.
         | This would enable deterministic, repeatable flows that
         | are also very affordable and fast. So even if each step on the
         | first run is only 95% reliable - if it gets through it, it
         | could repeat it with 100% reliability.
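
One way the workflow "caching" idea described above might look is sketched below. Everything here is hypothetical: the `Action` shape, the `Planner` and `Executor` callbacks, and `WorkflowCache` are assumptions made for illustration, not Magnitude's design or API.

```typescript
// Hypothetical low-level action recorded from a successful agent run.
type Action =
  | { kind: 'click'; x: number; y: number }
  | { kind: 'type'; text: string };

// Stand-in for the expensive LLM planning call (an assumption, not a real API).
type Planner = (task: string) => Promise<Action[]>;
// Performs one action and reports whether the page reacted as expected.
type Executor = (action: Action) => Promise<boolean>;

class WorkflowCache {
  private readonly store = new Map<string, Action[]>();

  // Replays a cached action list when one exists; otherwise plans, executes, and caches.
  async run(task: string, plan: Planner, execute: Executor): Promise<'replayed' | 'planned'> {
    const cached = this.store.get(task);
    if (cached !== undefined) {
      if (await runAll(cached, execute)) return 'replayed';
      // The page probably changed: drop the stale entry and fall back to planning.
      // A real system would also need to reset page state before retrying.
      this.store.delete(task);
    }
    const fresh = await plan(task);
    if (!(await runAll(fresh, execute))) {
      throw new Error(`Task failed and was not cached: ${task}`);
    }
    this.store.set(task, fresh);
    return 'planned';
  }
}

// Executes actions in order and stops at the first failure.
async function runAll(actions: Action[], execute: Executor): Promise<boolean> {
  for (const action of actions) {
    if (!(await execute(action))) return false;
  }
  return true;
}
```

Replay is only as reliable as the executor's success check: if the page layout shifts, the cached coordinates go stale, so the sketch falls back to planning rather than assuming the replay always works.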
        
       ___________________________________________________________________
       (page generated 2025-06-26 23:00 UTC)