[HN Gopher] Puppeteer Support for Firefox
       ___________________________________________________________________
        
       Puppeteer Support for Firefox
        
       Author : cpeterso
       Score  : 309 points
       Date   : 2024-08-07 16:19 UTC (6 hours ago)
        
 (HTM) web link (hacks.mozilla.org)
 (TXT) w3m dump (hacks.mozilla.org)
        
       | hugs wrote:
       | Ranked #4 on HN at the moment and no comments. So I'll just say
       | hi. (Selenium project creator here. I had nothing to do with this
       | announcement, but feel free to ask me anything!)
       | 
       | My hot take on things: When the Puppeteer team left Google to
       | join Microsoft and continue the project as Playwright, that left
       | Google high and dry. I don't think Google truly realized how
       | complementary a browser automation tool is to an AI-agent
       | strategy. Similar to how they also fumbled the bag on transformer
       | technology. (The T in GPT)... So Google had a choice, abandon
       | Puppeteer and be dependent on MS/Playwright... or find a path
       | forward for Puppeteer. WebDriver BiDi takes all the chocolatey
       | goodness of the Chrome DevTools Protocol (CDP) that Puppeteer
       | (and Playwright) are built on... and moves that forward in a
       | standard way (building on the earlier success of the W3C
       | WebDriver process that browser vendors and members of the
       | Selenium project started years ago.)
       | 
       | Great to see there's still a market for cross-industry standards
       | and collaboration with this announcement from Mozilla today.
        
         | localfirst wrote:
         | is it possible to now use Puppeteer from inside the browser? or
         | do security concerns restrict this?
         | 
         | what does Webdriver Bidi do and what do you mean by "taking the
         | good stuff from CDP"
         | 
         | I don't want to run my scrapes in the cloud and pay a monthly
         | fee
         | 
         | I want to run them locally. I want to run LLM locally too.
         | 
         | I'm sick of SaaS
        
           | hugs wrote:
           | Puppeteer controls a browser... from the outside... like a
           | puppeteer controls a puppet. Other tools like Cypress (and
           | ironically the very first version of Selenium 20 years ago)
           | drive the browser from the inside using JavaScript. But we
           | abandoned that "inside out" approach in later versions of
           | Selenium because of the limitations imposed by the browser JS
           | security sandbox. Cypress is still trying to make it work and
           | I wish them luck.
           | 
           | You could probably figure out how to connect Llama to
           | Puppeteer. (If no one has done it, yet, that would be an
           | awesome project.)
        
             | localfirst wrote:
             | I see. I'm still looking for a way to control the browser
             | from the inside via a browser extension. Very tough problem
             | to solve.
        
               | fitsumbelay wrote:
               | I do a lot of quick manual scrapes via devtools
               | 
               | you could try this
               | 
               | Chrome web scraper extension -
               | https://chromewebstore.google.com/detail/web-scraper-
               | free-we...
        
               | hugs wrote:
               | Yup. Lately, I've been doing it a completely different
               | way (but still from the outside)... Using a Raspberry Pi
               | as a fake keyboard and mouse. (Makes more sense in the
               | context of mobile automation than desktop.)
               | 
               | What's good for security is generally bad for
               | automation... and trying to automate from inside a
               | heavily secured sandbox is... frustrating. It works a
               | little bit (as Cypress folks more recently learned), but
               | you can never get to 100% covering all the things you'd
               | want to cover. Driving from the outside is easier... but
               | still not easy!
        
               | localfirst wrote:
               | interesting so you are emulating hardware inputs from RPi
               | 
               | how is it reading what's on the screen? computer vision?
        
               | hugs wrote:
               | Not to make this an ad for my project, but I'm starting
               | to document it more here: https://valetnet.dev/
               | 
               | The Raspberry Pi is configured to use the USB HID
               | protocol to look and act like a mouse and keyboard when
               | plugged into a phone. (Android and iOS now support mouse
               | and keyboard inputs). For video, we have two models:
               | 
               | - "Valet Link" uses an HDMI capture card (and a multi-
               | port dongle) to pull the video signal directly from the
               | phone if available. (This applies to all iPhones and
               | high-end Samsung phones.)
               | 
               | - "Valet Vision" which uses the Raspberry Pi V3 camera
               | positioned 200mm above the phone to grab the video that
               | way. Kinda crazy, but it works when HDMI output is not
               | available. The whole thing is also enclosed in a black
               | box so light from the environment doesn't affect the
               | video capture.
               | 
               | Then once we have an image, yes, you use whatever library
               | you want to process and understand what's in the image. I
               | currently use OpenCV and Tesseract (with Python). Could
               | probably write a book about the lessons learned getting a
               | "vision first" approach to automation working (as opposed
               | to the lower-level Puppeteer/Playwright/Selenium/Appium
               | way to do it).
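
As a rough illustration of the "fake keyboard" idea above, the sketch below builds the 8-byte input reports used by the USB HID boot keyboard layout (one modifier byte, one reserved byte, six key slots). It is not taken from the Valet project; the function names and the small supported character set are assumptions made for illustration.

```python
from typing import List, Tuple

LEFT_SHIFT = 0x02          # modifier bit for Left Shift
RELEASE_REPORT = bytes(8)  # all zeros: no modifiers, no keys held


def key_usage(ch: str) -> Tuple[int, int]:
    """Return (modifier, usage_id) for a small subset of ASCII."""
    if "a" <= ch <= "z":
        return 0, 0x04 + ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return LEFT_SHIFT, 0x04 + ord(ch) - ord("A")
    if "1" <= ch <= "9":
        return 0, 0x1E + ord(ch) - ord("1")
    special = {"0": 0x27, "\n": 0x28, " ": 0x2C}
    if ch in special:
        return 0, special[ch]
    raise ValueError("unsupported character: %r" % ch)


def press_report(ch: str) -> bytes:
    """Build the report for pressing a single key."""
    modifier, usage = key_usage(ch)
    return bytes([modifier, 0x00, usage, 0, 0, 0, 0, 0])


def type_text(text: str) -> List[bytes]:
    """Press and release each character in turn."""
    reports = []
    for ch in text:
        reports.append(press_report(ch))
        reports.append(RELEASE_REPORT)
    return reports
```

On a Raspberry Pi configured as a USB gadget, reports like these are typically written to a device node such as /dev/hidg0; that path depends on the gadget configuration and is an assumption here.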
        
               | weaksauce wrote:
               | are you using native messaging? there's a way to bridge a
               | program running with full permissions inside the computer
               | that could use puppeteer or the like.
               | https://developer.mozilla.org/en-US/docs/Mozilla/Add-
               | ons/Web...
               | 
               | seems like it wouldn't be that hard to sync the two but
               | the devil is in the details. also installing the native
               | script is outside the purview of the webext so you need
               | to have an installer.
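
For context on the native messaging bridge mentioned above: each message is framed as a 32-bit length in native byte order followed by UTF-8 JSON, exchanged over the native program's stdin and stdout. A minimal sketch of that framing follows; the function names are hypothetical.

```python
import json
import struct
from typing import Any, BinaryIO, Optional


def encode_message(message: Any) -> bytes:
    """Frame a JSON-serializable value as length prefix + UTF-8 payload."""
    payload = json.dumps(message).encode("utf-8")
    return struct.pack("=I", len(payload)) + payload


def read_message(stream: BinaryIO) -> Optional[Any]:
    """Read one framed message, or return None if the stream has ended."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack("=I", header)
    payload = stream.read(length)
    return json.loads(payload.decode("utf-8"))
```

In a real native host, the stream would be `sys.stdin.buffer` and replies would be written to `sys.stdout.buffer`. The extension side, and the host manifest that the installer has to register, are not shown.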
        
               | namukang wrote:
               | I do this for https://browserflow.app (and the AI version
               | in development at https://browserbot.ai) via the
               | chrome.debugger API: https://developer.chrome.com/docs/ex
               | tensions/reference/api/d...
        
           | fitsumbelay wrote:
           | webdriver bidi info
           | - https://www.youtube.com/watch?v=6oXic6dcn9w
           | 
           | local scraping howto - https://www.freecodecamp.org/news/web-
           | scraping-in-javascript...
           | 
           | local LLM framework - https://ollama.com/
        
           | jgraham wrote:
           | > Is it possible to now use Puppeteer from inside the
           | browser?
           | 
           | Talking about WebDriver (BiDi) in general rather than
           | Puppeteer specifically, it depends what exactly you mean.
           | 
           | Classic WebDriver is a HTTP-based protocol. WebDriver BiDi
           | uses websockets (although other transports are a possibility
           | for the future). Script running inside the browser can create
           | HTTP connections and create websockets connections, so you
           | can create a web page that implements a WebDriver or
           | WebDriver BiDi client. But of course you need to have a
           | browser to connect to, and that needs to be configured to
           | actually allow connections from your host; for obvious
           | security reasons that's not allowed by default.
           | 
           | This sounds a bit obscure, but it can be useful. Firefox
           | devtools is implemented in HTML+JS in the browser (like the
           | rest of the Firefox UI), and can connect to a different
           | Firefox instance (e.g. for debugging mobile Firefox from
           | desktop). The default runner for web-platform-tests drives
           | the browser from the outside (typically) using WebDriver, but
           | it also provides an API so the in-browser tests can access
           | some WebDriver commands.
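
To make the websocket side of this concrete: WebDriver BiDi messages are JSON objects. Commands carry an `id`, a `method`, and `params`; replies echo the `id` with a `type` of "success" or "error"; events arrive unprompted with a `type` of "event". The sketch below only builds and classifies such messages and does not open a connection. The class and function names are hypothetical, and a real client would also need a websocket library.

```python
import itertools
import json
from typing import Any, Dict


class CommandBuilder:
    """Assigns increasing ids so replies can be matched to commands."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)

    def command(self, method: str, params: Dict[str, Any]) -> str:
        return json.dumps(
            {"id": next(self._ids), "method": method, "params": params}
        )


def describe(raw: str) -> str:
    """Classify an incoming message by its 'type' field."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "success":
        return "reply to command %d" % message["id"]
    if kind == "error":
        return "command %s failed: %s" % (message.get("id"), message["error"])
    if kind == "event":
        return "event %s" % message["method"]
    return "unknown message"
```

For example, `CommandBuilder().command("browsingContext.navigate", {"context": "ctx-1", "url": "https://example.com/", "wait": "complete"})` produces a navigation command, where "ctx-1" is a placeholder for a context id the browser would have returned earlier.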
        
           | hoten wrote:
           | Yes. I'm not aware of any documentation walking one through
           | it though.
           | 
           | There is an extension API that exposes a CDP connection [1][2]
           | 
           | You can create a Puppeteer.Browser given a CDP connection.
           | 
           | You can bundle Puppeteer in a browser (we do this in
           | Lighthouse/Chrome DevTools[3]).
           | 
           | These two things are probably enough to get it working, though
           | it may be limited to the active tab.
           | 
           | [1] https://chromedevtools.github.io/devtools-
           | protocol/#:~:text=...
           | 
           | [2] https://stackoverflow.com/a/55284340/24042444
           | 
           | [3] https://source.chromium.org/chromium/chromium/src/+/main:
           | thi...
        
         | SomaticPirate wrote:
         | If I wanted to write some simple web-automation as a DevOps
         | engineer with little javascript (or webdev experience at all)
         | what tool would you recommend?
         | 
         | Some example use cases would be writing some basic tests to
         | validate a UI or automate some form-filling on a javascript
         | based website with no API.
        
           | hugs wrote:
           | Unironically, ask ChatGPT (or your favorite LLM) to create a
           | hello world WebDriver or Puppeteer script (and installation
           | instructions) and go from there.
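
For a sense of what a "hello world" over classic WebDriver looks like, here is a minimal sketch that uses only the Python standard library instead of the Selenium or Puppeteer client libraries. It issues the W3C WebDriver HTTP commands for creating a session, navigating, reading the title, and deleting the session. The helper names are hypothetical, and the transport is injectable so the flow can be exercised without a browser.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def http_transport(base_url: str) -> Transport:
    """Return a function that sends WebDriver commands to a driver server."""

    def send(method: str, path: str, body: Optional[Dict[str, Any]]) -> Dict[str, Any]:
        data = None if body is None else json.dumps(body).encode("utf-8")
        request = urllib.request.Request(
            base_url + path,
            data=data,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))

    return send


def fetch_title(send: Transport, url: str) -> str:
    """Open a session, load a page, and return its title."""
    capabilities = {"alwaysMatch": {"browserName": "firefox"}}
    session = send("POST", "/session", {"capabilities": capabilities})
    session_id = session["value"]["sessionId"]
    try:
        send("POST", "/session/%s/url" % session_id, {"url": url})
        return send("GET", "/session/%s/title" % session_id, None)["value"]
    finally:
        # Always end the session so the browser is not left running.
        send("DELETE", "/session/%s" % session_id, None)
```

Assuming geckodriver is running locally on its default port, usage would look like `fetch_title(http_transport("http://localhost:4444"), "https://example.com/")`. In practice a client library handles retries, waits, and error responses that this sketch ignores.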
        
             | righthand wrote:
             | "Go ask ChatGPT" is the new "RTFM".
        
               | hugs wrote:
               | sorry, not sorry?
        
               | distortedsignal wrote:
               | I don't think they're criticizing - I think it's
               | observation.
               | 
               | It makes a lot of sense, and we're early-ish to the tech
               | cycle. Reading the Manual/Google/ChatGPT are all just
               | tools in the toolbelt. If you (an expert) are giving this
               | advice, it should become mainstream soon-ish.
        
               | 0x1ch wrote:
               | I think this is where personal problem solving skills
               | matter. I use ChatGPT to start off a lot of new ideas or
               | projects with unfamiliar tools or libraries I will be
               | using, however the result isn't always good. From here, a
               | good developer will take the information from the A.I
               | tool and look further into current documentation to
               | supplement.
               | 
               | If you can't distinguish bad from good with LLMs, you
               | might as well be throwing crap at the wall hoping it will
               | stick.
        
               | tssge wrote:
               | >If you can't distinguish bad from good with LLMs, you
               | might as well be throwing crap at the wall hoping it will
               | stick.
               | 
               | This is why I think LLMs are more of a tool for the
               | expert rather than for the novice.
               | 
               | They give more speedup the more experience one has on the
               | subject in question. An experienced dev can usually spot
               | bad advice with little effort, while a junior dev might
               | believe almost any advice due to the lack of experience
               | to question things. The same goes for asking the right
               | questions.
        
               | progmetaldev wrote:
               | This is where I tell younger people thinking about
               | getting into computer science or development that there
               | is still a huge need for those skills. I think AI is a
               | long way off from taking away problem solving skills.
               | Most of us that have had the (dis)pleasure of needing to
               | repeatedly change and build on our prompts to get close
               | to what we're looking for will be familiar with this.
               | Without the general problem solving skills we've
               | developed, at best we're going to luck out and get just
               | the right solution, but more than likely will at best
               | have a solution that only gets partially towards what we
               | actually need. Solutions will often be inefficient or
               | subtly wrong in ways that still require knowledge in the
               | technology/language being produced by the LLM. I even
               | tell my teenage son that if he really does enjoy coding
               | and wishes to pursue it as a career, that he should go
               | for it. I shouldn't be, but I'm constantly astounded by
               | the number of people that take output from a LLM without
               | checking for validity.
        
               | devsda wrote:
               | I think it's the new "search/lookup xyz on Google".
               | 
               | Because Google search and search in general is no longer
               | reliable or predictable and top results are likely to be
               | ads or seo optimized fluff pieces, it is hard to make a
               | search recommendation these days.
               | 
               | For now, ChatGPT is the new no-nonsense search
               | engine (with caveats).
        
           | abdusco wrote:
           | Use Playwright's code generator that turns page interactions
           | into code.
           | 
           | https://playwright.dev/python/docs/codegen-intro
        
         | anothername12 wrote:
         | Is the WebDriver standard a good one? (Relative to playwright I
         | guess) I seem to recall some pains implementing it a few years
         | ago.
        
         | huy-nguyen wrote:
         | What's the relationship between Selenium, Puppeteer and
         | Webdriver BiDi? I'm a happy user of Playwright. Is there any
         | reason why I should consider Selenium or Puppeteer?
        
           | imiric wrote:
           | > Is there any reason why I should consider Selenium or
           | Puppeteer?
           | 
           | I'm not a heavy user of these tools, but I've dabbled in this
           | space.
           | 
           | I think Playwright is far ahead as far as features and
           | robustness go compared to alternatives. Firefox has been
           | supported for a long time, as well as other features
           | mentioned in this announcement like network interception and
           | preload scripts. CDP in general is much more mature than
           | WebDriver BiDi. Playwright also has a more modern API, with
           | official bindings in several languages.
           | 
           | One benefit of WebDriver BiDi is that it's in process of
           | becoming a W3C standard, which might lead to wider adoption
           | eventually.
           | 
           | But today, I don't see a reason to use anything other than
           | Playwright. Happy to read alternative opinions, though.
        
           | Vinnl wrote:
           | I think Playwright depends on forking the browsers to support
           | the features they need, so that may be less stable than using
           | a standard explicitly supported by the browsers, and/or more
           | representative of realistic browser use.
        
           | hugs wrote:
           | Maybe you don't want to live in a world where Microsoft owns
           | everything (again)?
        
       | fitsumbelay wrote:
       | Been waiting for this. This _rocks_
        
       | mstijak wrote:
       | Are there any advantages to using Firefox over Chrome for
       | exporting PDFs with Puppeteer?
        
         | lol768 wrote:
         | I've found Firefox to produce better PDFs than Chrome does, for
         | what it's worth. There are some CSS properties that Chrome/Skia
         | doesn't honour properly (e.g. repeating-linear-gradient) or
         | ends up generating PDFs from that don't work universally.
        
           | freedomben wrote:
           | Indeed, Firefox uses PDF.js which I've found to produce
           | really good results.
        
             | mook wrote:
             | Doesn't PDF.js go the other way (convert a PDF into HTML-
             | and-friends for display in a browser, instead of "printing"
             | a page into a PDF)?
             | 
             | I haven't dug into it and am quite possibly incorrect,
             | hence the request for confirmation!
        
       | whatnotests2 wrote:
       | For an alternative approach, try browserbase.com
       | 
       | * https://browserbase.com/
        
         | cebert wrote:
         | Playwright is such a good experience. I don't understand why
         | you would need something like browserbase.
        
       | e12e wrote:
       | What are reasons to prefer puppeteer to playwright which supports
       | many browsers?
       | 
       | > Cross-browser. Playwright supports all modern rendering engines
       | including Chromium, WebKit, and Firefox.
       | 
       | https://playwright.dev/
        
         | Vinnl wrote:
         | I said this in a subthread:
         | 
         | > I think Playwright depends on forking the browsers to support
         | the features they need, so that may be less stable than using a
         | standard explicitly supported by the browsers, and/or more
         | representative of realistic browser use.
         | 
         | (And for Safari/WebKit to support it as well, but I'm not
         | holding my breath for that one.) Though I hope Playwright will
         | adopt BiDi at some point as well, as its testing features and
         | API are really nice.
        
       | yoavm wrote:
       | I know this isn't what the WebDriver BiDi protocol is for, but I
       | feel like it's 90% there to being a protocol through which you
       | can create browsers, with swappable engines. Gecko has come a
       | long way since Servo, and it's actually quite performant these
       | days. The sad thing is that it's so much easier to create a
       | Chromium-based browser than it is to create a Gecko based one.
       | But with APIs for navigating, intercepting requests, reading the
       | console, executing JS - why not just embed the thing, remove all
       | the browser chrome around it, and let us create customized
       | browsers?
        
         | djbusby wrote:
         | I have dreamed about a swappable engine.
         | 
         | Like, a wrapper that does my history and tabs and bookmarks -
         | but lets me move from rendering in Chrome or Gecko or Servo or
         | whatever.
        
           | sorenjan wrote:
           | There used to be an extension for Firefox called "IE Tab for
           | Firefox" that used the IE rendering engine inside a Firefox
           | tab, for sites that only worked in IE.
        
             | hyzyla wrote:
             | The same idea as the built-in Internet Explorer in Microsoft
             | Edge, where you can switch to Internet Explorer mode and
             | open websites that only work correctly in Internet Explorer.
        
       | burntcaramel wrote:
       | This is great! I'm curious about the accessibility tree noted in
       | the unsupported-for-now APIs. Accessing the accessibility tree
       | was something that was in Playwright for the big 3 engines but
       | got removed about a year ago. I think it was partly because as
       | noted it was a dump of engine-specific internal data structures:
       | "page.accessibility.snapshot returns a dump of the Chromium
       | accessibility tree".
       | 
       | I'd like to advocate for more focus on these accessibility trees.
       | They are a distillation of every semantic element on the page,
       | which makes them fantastic for snapshot "tests" or BDD tests.
       | 
       | My dream would be these accessibility trees one day become
       | standardized across the major browser engines. And perhaps from a
       | web dev point-of-view accessible from the other layers like CSS
       | and DOM.
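
The snapshot-test idea above can be sketched without any browser API: render a tree of roles and accessible names as indented text, then compare it with a stored copy. The node shape used here (`role`, `name`, `children`) is a hypothetical simplification and not the format any particular engine or tool returns.

```python
from typing import Any, Dict, List


def render_tree(node: Dict[str, Any], depth: int = 0) -> List[str]:
    """Render one node and its descendants as indented 'role "name"' lines."""
    name = node.get("name", "")
    line = "  " * depth + node["role"]
    if name:
        line += ' "%s"' % name
    lines = [line]
    for child in node.get("children", []):
        lines.extend(render_tree(child, depth + 1))
    return lines


def snapshot(node: Dict[str, Any]) -> str:
    """Produce the text that a snapshot test would store and compare."""
    return "\n".join(render_tree(node))
```

A test would then assert that `snapshot(current_tree)` equals the previously saved text, so a change in headings, links, or button labels shows up as a readable diff.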
        
       ___________________________________________________________________
       (page generated 2024-08-07 23:00 UTC)