[HN Gopher] Helium: Lighter Web Automation with Python
       ___________________________________________________________________
        
       Helium: Lighter Web Automation with Python
        
       Author : mherrmann
       Score  : 135 points
       Date   : 2024-12-11 12:11 UTC (10 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | fermigier wrote:
       | "We shut down the company at the end of 2019 and I felt it would
       | be a shame if Helium simply disappeared from the face of the
       | earth."
       | 
       | I appreciate the effort. Thank you M. Herrmann.
        
       | wslh wrote:
       | How does it compare with the "usual suspects"? I mean Playwright,
       | Selenium, Cypress, and Puppeteer.
        
         | mherrmann wrote:
         | It's more high-level. Instead of saying "click element with ID
         | xv9873", you can say "click Download".
        
           | Yossarrian22 wrote:
           | That's how Playwright works too
        
             | mherrmann wrote:
             | Doesn't work for logging into HN:
             | 
             |     from playwright.sync_api import sync_playwright
             |     playwright = sync_playwright().start()
             |     browser = playwright.chromium.launch()
             |     page = browser.new_page()
             |     page.goto('https://news.ycombinator.com/login?goto=news')
             |     page.get_by_label('username').fill('mherrmann')
             |     # playwright._impl._errors.TimeoutError: Locator.fill:
             |     # Timeout 30000ms exceeded.
             | 
             | I suspect Playwright expects there to be a <label> for an
             | <input> element.
             | 
             | It does work with Helium:
             | 
             |     from helium import *
             |     start_chrome('https://news.ycombinator.com/login?goto=news')
             |     write('mherrmann', into='username')
             | 
             | The two scripts are equivalent, except Helium's works and
             | is half as long.
        
               | mdaniel wrote:
               | Depending on when you tried that, there is no such label
               | "username" in view-source:https://news.ycombinator.com/login?goto=news
               | 
               |     <table border="0"><tr><td>username:</td><td><input
               |     type="text" name="acct" size="20" autocorrect="off"
               |     spellcheck="false" autocapitalize="off" autofocus="true">
               |     </td></tr><tr><td>password:</td><td><input
               |     type="password" name="pw"
               |     size="20"></td></tr></table><br>
               | 
               | as they only put the _text_ "username:", not a <label>, and
               | the input is named "acct" (without even the common decency
               | to include autocomplete=username)
               | 
               | So _if_ your script really did write that string into what
               | it thinks is 'username', then that's arguably one more thing
               | to debug when its wizardry goes awry in some unknown way
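The HN login markup quoted above can be checked with Python's standard-library HTML parser. This is a minimal sketch that runs on the snippet exactly as quoted in this thread (the live page may differ): it counts <label> elements and lists the inputs and visible text, which shows why a strict label lookup has nothing to match even though "username:" sits in the neighbouring cell.

```python
from html.parser import HTMLParser

# The login-form markup as quoted in this thread; HN may change it at any time.
LOGIN_HTML = """<table border="0"><tr><td>username:</td><td><input
type="text" name="acct" size="20" autocorrect="off"
spellcheck="false" autocapitalize="off" autofocus="true">
</td></tr><tr><td>password:</td><td><input
type="password" name="pw"
size="20"></td></tr></table><br>"""


class FormSurvey(HTMLParser):
    """Count <label> tags and collect <input> names and visible text."""

    def __init__(self):
        super().__init__()
        self.labels = 0
        self.inputs = []  # (type, name) pairs in document order
        self.texts = []   # non-blank text nodes, e.g. "username:"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "label":
            self.labels += 1
        elif tag == "input":
            self.inputs.append((attrs.get("type"), attrs.get("name")))

    def handle_data(self, data):
        if data.strip():
            self.texts.append(data.strip())


survey = FormSurvey()
survey.feed(LOGIN_HTML)
survey.close()
print(survey.labels)  # 0: nothing for a strict label lookup to match
print(survey.inputs)  # [('text', 'acct'), ('password', 'pw')]
print(survey.texts)   # ['username:', 'password:']
```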
        
               | mherrmann wrote:
               | Yes, there is no such <label>. There is only such <td>.
               | And Playwright isn't smart enough to understand that that
               | is still the label for the <input> element. Helium _is_
               | smart enough.
               | 
               |  _So if your script really did write that string into what
               | it thinks is 'username', then that's arguably one more thing
               | to debug when its wizardry goes awry in some unknown way_
               | 
               | I tested my script. It writes into the correct field. The
               | logic is not hard: "Find an element to the right of the
               | given label." If there are multiple, then Helium uses the
               | one that's closest to the last element it interacted
               | with. That's just how a human would do it. It works
               | surprisingly well. In many years of using Helium, I
               | barely recall this causing problems once.
               | 
               | Try it before judging. You will be surprised by how well
               | it works.
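The rule described above ("find an element to the right of the given label; if there are several, use the one closest to the last element interacted with") can be sketched over plain bounding boxes. This is a hypothetical model, not Helium's code: `Box`, `field_right_of` and all coordinates are made up for illustration, and a real implementation has to read geometry from the live page and cope with far more layouts.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Hypothetical element model: a name plus a bounding box in page pixels."""
    name: str
    left: float
    top: float
    width: float
    height: float

    @property
    def center(self):
        return (self.left + self.width / 2, self.top + self.height / 2)


def field_right_of(label, fields, last=None):
    """Return the field right of `label` on the same row, or None.

    Among several candidates, prefer the one closest to `last` (the element
    used most recently), falling back to closeness to the label itself.
    """
    candidates = [
        f for f in fields
        if f.left >= label.left + label.width      # starts right of the label
        and f.top < label.top + label.height       # overlaps the label's row
        and label.top < f.top + f.height
    ]
    if not candidates:
        return None
    ax, ay = (last if last is not None else label).center
    return min(candidates, key=lambda f: hypot(f.center[0] - ax, f.center[1] - ay))


# HN's login table, with made-up coordinates.
username = Box("username:", 10, 10, 60, 20)
password = Box("password:", 10, 40, 60, 20)
acct = Box("acct", 80, 10, 150, 20)
pw = Box("pw", 80, 40, 150, 20)
print(field_right_of(username, [acct, pw]).name)  # acct
print(field_right_of(password, [acct, pw]).name)  # pw
```

The tie-break only matters when one row holds several fields; the vertical-overlap test is what keeps "password:" from matching the username input.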
        
               | mdaniel wrote:
               | > And Playwright isn't smart enough to understand that
               | that is still the label for the <input> element. Helium
               | is smart enough.
               | 
               | I'm glad you like your project, and I'm sure there are
               | others who will similarly enjoy that kind of magick.
               | However, it's super disingenuous to write an example that
               | asks a standards-based API to find a non-existent element
               | and then clutch pearls because it didn't find a
               | non-existent element. page.get_by_label("I dunno, I didn't
               | read, do what I am thinking").fill('lol') similarly would
               | not work, but that's not the awesome dunk you think it is.
        
               | mherrmann wrote:
               | It's not disingenuous. The HN example was literally the
               | first one I tried. It's what happens in the real world.
               | The real world doesn't adhere to standards, much of the
               | time.
        
               | bdcravens wrote:
               | Just like what you've done with Selenium, such a wrapper
               | could be written for Playwright (I think that's what most
               | developers end up doing anyways, just in a more domain-
               | specific manner)
        
               | TimTheTinker wrote:
               | This project was started _long_ before Playwright
               | existed.
               | 
               | It's an OSS tool that had very good reason to be made the
               | way it was at that time, and continues to be useful (in
               | my opinion).
        
               | bdcravens wrote:
               | It is a useful tool, similar to others that accomplish
               | the same task (WATIR and Capybara in Ruby, for example).
               | My point was that the comparison of a wrapper to an
               | underlying library is a bit apples to oranges, as a
               | similar wrapper could be written for Playwright as well.
               | I haven't looked at the code, but I assume Helium's API
               | could be used to support Playwright (which itself was an
               | evolution of Puppeteer).
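The point that the same high-level API could sit on Playwright amounts to an adapter layer. A minimal sketch under that assumption: `Backend`, `find_field_by_nearby_text` and `FakeBackend` are hypothetical names, and a real adapter would implement the two backend methods with Selenium or Playwright calls instead of a dictionary lookup.

```python
from typing import Protocol


class Backend(Protocol):
    """Hypothetical minimal driver interface a high-level layer could target."""

    def find_field_by_nearby_text(self, text: str) -> str: ...

    def fill(self, handle: str, value: str) -> None: ...


def write(backend: Backend, text: str, into: str) -> None:
    """Helium-style write(text, into=...), expressed against any backend."""
    handle = backend.find_field_by_nearby_text(into)
    backend.fill(handle, text)


class FakeBackend:
    """In-memory stand-in; a Selenium or Playwright adapter would go here."""

    def __init__(self, fields_by_label):
        self.fields_by_label = fields_by_label  # normalised label -> field handle
        self.values = {}                        # field handle -> text written

    def find_field_by_nearby_text(self, text):
        return self.fields_by_label[text.rstrip(":").lower()]

    def fill(self, handle, value):
        self.values[handle] = value


backend = FakeBackend({"username": "input[name=acct]", "password": "input[name=pw]"})
write(backend, "mherrmann", into="username")
print(backend.values)  # {'input[name=acct]': 'mherrmann'}
```

The high-level `write` never touches the driver directly, so swapping Selenium for Playwright only means writing a new backend class.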
        
               | vunderba wrote:
               | I like what you're doing with Helium, and while you are
               | technically correct that it's half as long - IMHO it's a
               | bit disingenuous considering that in any _meaningful_ web
               | automation script, you'd only need to put in the
               | initialization code a single time, e.g.:
               | 
               |     from playwright.sync_api import sync_playwright
               |     playwright = sync_playwright().start()
               |     browser = playwright.chromium.launch()
               |     page = browser.new_page()
        
       | languagehacker wrote:
       | Importing * is universally discouraged by most Python linters and
       | best practice docs. You can always "import helium as h" if you're
       | looking to type less.
       | 
       | This looks largely like common workarounds that most people will
       | write using Python-based browser automation. Most of the time, we
       | accept that those capabilities aren't there by default because
       | they are not explicit enough and can result in bugs and undefined
       | behavior even when the elements that we expect to be on the page
       | are actually there.
       | 
       | Given the adage "explicit is better than implicit", I worry that
       | a layer like this might create more trouble than it's worth for
       | the sake of readability. When we get into the nitty-gritty of
       | browser automation, it might just make it harder to debug than
       | going straight to Selenium or Playwright.
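The linter complaint about `import *` is easy to see with the standard library alone, since `math` exports its own `pow`. A small script-level illustration, using `math` instead of `helium` so it runs without a browser; with Helium the alias style suggested above would read `import helium as h` followed by calls like `h.click(...)`.

```python
import builtins
import math as m

from math import *  # noqa: F403  (deliberately showing what linters warn about)

# math exports its own pow(), which silently replaces the builtin in this module:
print(pow(2, 3))  # 8.0, a float, because math.pow always returns a float

try:
    pow(2, 3, 5)  # the builtin's three-argument modular form is no longer reachable by name
except TypeError as exc:
    print("TypeError:", exc)

print(builtins.pow(2, 3, 5))  # 3, the builtin is still available explicitly
print(m.pow(2, 3), pow is m.pow)  # 8.0 True: with an alias the origin stays visible
```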
        
         | mherrmann wrote:
         | _Importing * is universally discouraged by most Python linters
         | and best practice docs._
         | 
         | Yup, I would never do it in a .py file. But I do it all of the
         | time in the interpreter, which is what the video shows.
         | 
         |  _This looks largely like common workarounds that most people
         | will write using Python-based browser automation. Most of the
         | time, we accept that those capabilities aren't there by
         | default because they are not explicit enough and can result in
         | bugs and undefined behavior even when the elements that we
         | expect to be on the page are actually there._
         | 
         | It sounds like you haven't tried Helium yet. I think you
         | should, and see for yourself whether the trade-off you talk
         | about actually exists.
         | 
         |  _Given the adage "explicit is better than implicit", I worry
         | that a layer like this might create more trouble than it's
         | worth for the sake of readability._
         | 
         | You could make the same argument about using C / assembly
         | instead of Python. I suggest you try Helium before making
         | statements about the "trouble" it may create. I believe you
         | will find that there is no trouble.
        
       | quickvi wrote:
       | for lightweight automation outside the browser:
       | 
       | https://github.com/elyase/screenium
        
         | okso wrote:
         | macOS only (uses Apple Vision framework)
        
           | erikcw wrote:
           | I've used SikuliX[0] in the past for similar purposes.
           | Unfortunately the author hasn't had much time to maintain it
           | recently.
           | 
           | [0] https://github.com/RaiMan/SikuliX1
        
         | oulipo wrote:
         | Very cool! Could be a kind of open-source, text-based (eg
         | recipes are .md with instructions) version of KeyboardMaestro!
         | 
         | I'd love to see such an "open automation" format (could even be
         | more general than pure software, could also automate your IoT
         | or whatever, through extensions)
         | 
         | e.g. you could have a file "Type my bank login password" for bank
         | websites that don't let you use keyboard input but force you
         | to click on stuff, like a self-documented script using .md with
         | code:
         | 
         |     # Type my bank login password
         | 
         |     ## Trigger
         |     ```trigger:hotkey
         |     key: cmd+l
         |     filter: frontmost-app=Chrome and chrome.tab.url=~mybank.com/login
         |     ```
         | 
         |     ## Deps
         |     ```ensure-deps
         |     shell-runner>=1.*
         |     screen-ocr>=1.*
         |     python-runner>=1.*
         |     ```
         |     Ensure that my system has the proper extensions for the
         |     framework, to run all tasks
         | 
         |     ## What it does
         |     This automation lets me input my password in a "click-only"
         |     input for my lousy bank UI
         | 
         |     ```run:shell /bin/sh:capture-output=password
         |     echo $(op --vault personal --site mybank)
         |     ```
         |     (the above runs the shell script and captures the output as a
         |     "password" variable I can use in other scripts below)
         | 
         |     ```run:screen-ocr:capture-output=ocr-result
         |     window:chrome
         |     ```
         |     ...go on scripting using typescript/python to locate the
         |     numbers in the ocr-result
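One way to consume a recipe like the one proposed above is to treat each fenced block's info string as `kind:options`. This parser is purely hypothetical, since the format itself is only a proposal in this comment: it extracts the blocks and leaves dispatching to runners such as `shell` or `screen-ocr` out of scope. The fence string is built at runtime so the example can itself live inside a code fence.

```python
import re

# Opening fence with an info string, then the body, then a closing fence, all at line starts.
FENCE_RE = re.compile(r"^`{3}([^\n]*)\n(.*?)^`{3}[ \t]*$", re.MULTILINE | re.DOTALL)


def parse_recipe(markdown):
    """Return (kind, options, body) for each fenced block in a recipe.

    The info string is split at its first ':' into a block kind such as
    'trigger' or 'run' and the remaining options text.
    """
    blocks = []
    for info, body in FENCE_RE.findall(markdown):
        kind, _, options = info.strip().partition(":")
        blocks.append((kind, options, body.rstrip("\n")))
    return blocks


TICKS = "`" * 3  # built at runtime so this example can sit inside a Markdown fence
sample = "\n".join([
    "# Type my bank login password",
    "## Trigger",
    TICKS + "trigger:hotkey",
    "key: cmd+l",
    TICKS,
    TICKS + "run:screen-ocr:capture-output=ocr-result",
    "window:chrome",
    TICKS,
    "",
])

for kind, options, body in parse_recipe(sample):
    print(kind, "|", options, "|", body)
# trigger | hotkey | key: cmd+l
# run | screen-ocr:capture-output=ocr-result | window:chrome
```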
        
       | nkrisc wrote:
       | Having done some ad-hoc, temporary automation with Selenium in
       | the past (to help fellow, less technically-inclined designers) I
       | wish I had this at the time.
       | 
       | Looks like a nice, almost natural language-like API around what
       | is otherwise a quite cumbersome API.
        
       | crazymoka wrote:
       | Can it be headless?
        
         | mherrmann wrote:
         | Yes: start_chrome(headless=True)
        
       | __mharrison__ wrote:
       | Thanks for posting. All this AI has been interested in scraping
       | personal sites.
        
         | mherrmann wrote:
         | I have actually been wondering whether Helium's more high-level
         | API lends itself well to use by AI.
        
           | grantc wrote:
           | This. Seems like you could wedge this and a model into a
           | scrappy version of computer use for browsers.
           | 
           | Fwiw, thanks for contributing this. It seems apt for a number
           | of repetitive things I probably do dozens of times a week and
           | don't even notice as cruft anymore.
           | 
           | I'm not sure why there were such hot takes on what this is or
           | isn't. Maybe Big Selenium crisis actors? You made something
           | cool, you shared it w/ world -- that should be the system
           | prompt for people posting about it in my kinder world of
           | things.
        
       | wokwokwok wrote:
       | How can a wrapper around selenium be lighter than it?
       | 
       | A wrapper around an API is by definition heavier (more code, more
       | functions) than using the lower level api.
       | 
       | It's not using less resources.
       | 
       | It's not faster (it has implicit waiting).
       | 
       | It's not less code; it's literally a superset of selenium?
       | 
       | Feels like a "selenium framework" is more accurate than
       | "lightweight web automation"?
       | 
       | Anyway, there's no fixing automation tests with fancy APIs.
       | 
       | No matter what you try to do, if people are only interested in
       | writing quick dirty scripts, you're doomed to a pile of stupid
       | spaghetti no matter what system or framework you have.
       | 
       | If you want sustainable automation, you have to do Real Software
       | Engineering and write actual composable modules; and you can do
       | that in anything, even raw selenium.
       | 
       | So... I'd be more interested if this was pitched as "composable
       | lego for building automation" ...
       | 
       | ...but, personally, as it stands all I can really see is "makes
       | easy things easier with sensible defaults".
       | 
       | That's nice for getting started; but getting started is not the
       | problem with automation tests.
       | 
       | It's maintaining them.
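On the "implicit waiting" point: implicit waits are usually a polling loop, which only costs time while the element is not there yet; if the first check succeeds, nothing sleeps. A generic sketch of such a loop, not Helium's implementation; the default timeout and interval are arbitrary choices.

```python
import time


def wait_until(condition, timeout_secs=10.0, interval_secs=0.5):
    """Poll `condition` until it returns something truthy, and return that value.

    Raises TimeoutError if no truthy value arrives within `timeout_secs`.
    """
    deadline = time.monotonic() + timeout_secs
    while True:
        result = condition()
        if result:
            return result  # success on the first poll means no sleeping at all
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_secs}s")
        time.sleep(interval_secs)


# Hypothetical condition: the "element" shows up on the third poll.
polls = []


def element_on_third_poll():
    polls.append(time.monotonic())
    return "<button>Download</button>" if len(polls) >= 3 else None


print(wait_until(element_on_third_poll, interval_secs=0.01))  # <button>Download</button>
print(len(polls))  # 3
```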
        
         | mherrmann wrote:
         | Its use can be lighter. That is, the wrapper can be easier to
         | use.
         | 
         | Helium helps with maintaining automation tests as well.
         | _click("Compose")_ is infinitely more maintainable than
         | _document.getElementById("eIu7Db").click()_. (I just took this
         | example from Gmail's web interface.)
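Why a visible-text locator tends to survive UI changes better than a generated ID can be shown on static markup. The two snippets below are hypothetical (they are not Gmail's real DOM) and assume the ID is regenerated between builds while the label stays "Compose"; Helium itself works against the live page, not a parser like this.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class TextToId(HTMLParser):
    """Record, for each piece of visible text, the id of its innermost element."""

    def __init__(self):
        super().__init__()
        self._open_ids = []  # id (or None) of each currently open element
        self.ids = {}        # visible text -> id of the element containing it

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:  # void elements never get a closing tag
            self._open_ids.append(dict(attrs).get("id"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._open_ids:
            self._open_ids.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open_ids:
            self.ids.setdefault(text, self._open_ids[-1])


def id_for_text(html, text):
    parser = TextToId()
    parser.feed(html)
    parser.close()
    return parser.ids.get(text)


# Two hypothetical builds of the same toolbar: the generated ids change,
# the visible label does not.
build_a = '<div id="eIu7Db" role="button">Compose</div><div id="k2">Inbox</div>'
build_b = '<div id="T9wq3x" role="button">Compose</div><div id="p8">Inbox</div>'
print(id_for_text(build_a, "Compose"), id_for_text(build_b, "Compose"))  # eIu7Db T9wq3x
```

A script pinned to `eIu7Db` breaks on the second build; one that asks for "Compose" finds the element in both.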
        
           | wokwokwok wrote:
           | How do you compose low level operations like "click here"
           | into composable modules like:
           | 
           | loginAsUser(user)
           | 
           | id = createBooking(user)
           | 
           | loginAsAdmin()
           | 
           | approveBooking(id)
           | 
           | ?
           | 
           | Is it the same as selenium? Do whatever you want yourself?
           | 
           | That's what I'm talking about. Unless you have high level
           | composable modules that let you express high level test
           | activities then your tests will always fall apart.
           | 
           | The syntax of the _low level operations_ doesn't matter
           | because you will never ever care about a click("compose").
           | 
           | That's not a test.
           | 
           | A test might be:
           | 
           | createEmail()
           | 
           | attachFile(...)
           | 
           | ... whatever your bespoke business requirements are.
           | 
           | Having fancy wrappers?
           | 
           | Is it nicer? Sure.
           | 
           | Does it meaningfully improve the tests, maintaining tests?
           | 
           | Nope.
           | 
           | Because at the end of the day the low level operations _will_
           | be bespoke, nasty, messy and different for each website;
           | that's why you wrap them up in functions and _compose them_.
           | 
           | At least, in my experience; this looks a lot like cypress; a
           | high level set of operations with sensible defaults for easy
           | tasks.
           | 
           | ...but, practically, I'm skeptical that hiding the low level
           | nasty details actually makes them go away; it's smoothing
           | them over for the "happy path"; but automation tests are like
           | 90% edge cases.
           | 
           | > Its use can be lighter
           | 
           | I don't think that's the generally accepted meaning of a
           | lightweight framework.
           | 
           | ...but eh, fair enough. I understand what you mean.
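The composition described above (loginAsUser, createBooking, loginAsAdmin, approveBooking) can be built on top of any click/write-style primitives. A sketch under stated assumptions: every function name, label and URL here is hypothetical, and `RecordingBrowser` only logs steps so the example runs without a browser. With Helium, the bodies could call its module-level `go_to`, `write(..., into=...)` and `click` instead; reading a booking ID back is app-specific and is stubbed out.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingBrowser:
    """Stand-in for a real driver: logs every low-level step it is asked to do.

    `texts` maps a label to the text a read_text() call should return.
    """
    texts: dict = field(default_factory=dict)
    steps: list = field(default_factory=list)

    def go_to(self, url):
        self.steps.append(("go_to", url))

    def write(self, text, into):
        self.steps.append(("write", text, into))

    def click(self, target):
        self.steps.append(("click", target))

    def read_text(self, label):
        self.steps.append(("read_text", label))
        return self.texts[label]


# Business-level steps; all names, labels and URLs are hypothetical.
BASE = "https://booking.example.test"


def login_as(browser, username, password):
    browser.go_to(f"{BASE}/login")
    browser.write(username, into="Username")
    browser.write(password, into="Password")
    browser.click("Log in")


def create_booking(browser, room):
    browser.click("New booking")
    browser.write(room, into="Room")
    browser.click("Save")
    return browser.read_text("Booking ID")


def approve_booking(browser, booking_id):
    browser.go_to(f"{BASE}/admin/bookings/{booking_id}")
    browser.click("Approve")


def admin_approves_new_booking(browser):
    """The test reads as the scenario; the clicks live one layer down."""
    login_as(browser, "alice", "alice-pw")
    booking_id = create_booking(browser, "Room 101")
    login_as(browser, "admin", "admin-pw")
    approve_booking(browser, booking_id)
    return booking_id


browser = RecordingBrowser(texts={"Booking ID": "B-42"})
print(admin_approves_new_booking(browser))  # B-42
print(browser.steps[-1])  # ('click', 'Approve')
```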
        
             | mherrmann wrote:
             | > I'm skeptical that hiding the low level nasty details
             | actually makes them go away
             | 
             | It makes 90+% of them go away. That's a big win. Try it.
        
             | Zardoz84 wrote:
             | What you are asking for is Geb.
        
           | n144q wrote:
           | That's just some superficial changes that often lead to
           | confusion and other negative consequences down the road,
           | especially when not handled carefully.
           | 
           | I would much rather rely directly on Selenium's stable APIs
           | than on someone else's wrapped APIs that are opinionated and
           | could be incomplete, incorrect, outdated and potentially
           | unmaintained someday. Far more resources always go into
           | Selenium than into these add-ons.
           | 
           | If I really want, I can choose a few APIs that I actually use
           | and wrap them within my codebase. That's more reliable than
           | this.
        
             | mherrmann wrote:
             | You can freely mix Helium and Selenium API calls. You don't
             | lose any of the power you are describing when you use
             | Helium.
        
         | pryelluw wrote:
         | > but, personally, as it stands all I can really see is "makes
         | easy things easier with sensible defaults".
         | 
         | "Lighter" may be used as an alternative adjective to the word
         | easy or easier. Your post, which comes off as very rude, misses
         | the point of how the project is marketed.
         | 
         | At least the OP did not call it Python automation for humans
         | ...
        
       | edm0nd wrote:
       | Very neat!
       | 
       | Roll in a captcha solving service like DeathByCaptcha or
       | AntiCaptcha and you've got yourself a quick and easy script that
       | can do anything on any website regardless of captchas.
        
       | bg24 wrote:
       | Nice work! I looked at the cheatsheet, and it is not obvious to
       | me how to go through two factor authentication during login.
        
         | mherrmann wrote:
         | Thanks! Helium only automates browsers. If the 2FA is happening
         | in the browser, then you can use Helium to automate the flow.
         | If it's outside, then that part cannot be handled by Helium.
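If the in-browser 2FA step asks for a time-based one-time password and you control the account's TOTP secret, the code itself can be computed in Python and then typed into the page with Helium's `write(...)`. A minimal standard-library sketch of RFC 6238 (HMAC-SHA1, 30-second steps, the common authenticator-app defaults); SMS- or push-based 2FA is not covered by this.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp(secret, for_time=None, step=30, digits=6):
    """RFC 6238 TOTP with HMAC-SHA1, the common authenticator-app default."""
    if for_time is None:
        for_time = time.time()
    counter = int(for_time // step)  # RFC 6238: T = floor(t / X), with T0 = 0
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # RFC 4226 dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Authenticator apps usually show the shared secret in base32.
secret = base64.b32decode("GEZDGNBVGY3TQOJQGEZDGNBVGY3TQOJQ")  # b"12345678901234567890"
print(totp(secret, for_time=59, digits=8))  # 94287082, RFC 6238 Appendix B test vector
print(totp(secret))                          # the 6-digit code for the current time
```

The result could then be passed to something like `write(totp(secret), into='Code')`, where 'Code' stands in for whatever label the site actually uses.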
        
       | slt2021 wrote:
       | Thank you for sharing this project, this is really good
        
       | giis wrote:
       | Looks nice. Is it possible to start_chrome() with a specific
       | Chrome browser profile name, or to reuse an existing open
       | Firefox/Chrome browser session and launch a new tab on a specific
       | domain?
        
         | mherrmann wrote:
         | I don't know. Please check if Selenium supports this and if
         | yes, use Helium's _set_driver(...)_ or _options_ argument to
         | _start_chrome(...)_.
        
       | bilater wrote:
       | Nice - I can see some cool agentic flows created using this. A
       | thing I want to look into is creating a sandbox instance
       | (Ubuntu?) and letting an agent do its thing. Could be collecting
       | data or answering questions and I can pull up the window to check
       | in from time to time. It'll be like having an assistant.
        
       | Byte64 wrote:
       | This is so cool
        
       | bryanrasmussen wrote:
       | How easy is it to detect that this is automation as opposed to a
       | real user? I suppose probably pretty easy, so not sure if it is
       | useful if I want to automate the web for things I do every day as
       | I would really be running the risk of turning off access to those
       | things if they determined I am automating them.
        
         | bdcravens wrote:
         | This is a wrapper on top of Selenium, so unless the library
         | implements additional techniques to improve stealth, it's on
         | par with Selenium's detectability (which as you pointed out can
         | be detected easily enough)
        
       ___________________________________________________________________
       (page generated 2024-12-11 23:01 UTC)