hngopher.com

       [HN Gopher] Show HN: An API that takes a URL and returns a file ...
       ___________________________________________________________________
        
       Show HN: An API that takes a URL and returns a file with browser
       screenshots
        
       Author : gkamer8
       Score  : 84 points
       Date   : 2025-02-06 18:48 UTC (4 hours ago)
        
 (HTM) web link (github.com)
        
       | tantaman wrote:
       | us ai?
        
         | ge96 wrote:
         | the very same
        
         | bangaladore wrote:
         | The website [1] is very strange. What does U.S. stand for? If I
         | were to stumble on this I'd assume it was a phishing / scam
         | website trying to impersonate the government. Bad vibes all
         | around.
         | 
         | [1] - https://us.ai/
        
           | gkamer8 wrote:
           | I'll try to improve the vibes :(
           | 
           | I've been working at this startup for almost two years now
           | and that page and branding etc has been changing a lot as you
           | can imagine ...
        
             | bangaladore wrote:
             | But what is the branding?
             | 
             | United States AI?
             | 
             | Like the premise of the company name is bad. Real bad.
        
             | bbor wrote:
             | A) Thanks for sharing your OSS with the world!!
             | 
             | B) I'm also a little confused. Surely that domain cost(s)
             | $$$ -- why not go with a cute "us" branding rather than
             | "U.S."? Unless you're looking to sell in other countries
             | where maybe U.S. expertise is a selling point, this
             | definitely comes across like you're pretending to be part
             | of the government.
             | 
             | EDIT: For comparison, we.ai costs $500,000/y (!!!)
             | 
             | EDIT2: It looks like you're positioning yourself as a
             | defense/govt contractor, thus the branding? That's
             | certainly cool, but IMHO, if I were you and owned that
             | domain, I'd offer it to Palantir for $$$$$ and just go with
             | your second choice. They're currently starting in on a
             | whole genocide/global war thing, so they have cash to burn!
        
               | gkamer8 wrote:
               | Hi thanks! The domain actually used to be a redirect link
               | to U.S. Automotive Industries (a trade publication). I
               | reached out to them and got a deal, so it was a lot for
               | me but not, like, we.ai expensive lol.
               | 
               | The name was always a corporate placeholder and I liked
               | the idea of US Steel or General Electric type names. Some
               | startups have done similar things, and many people
               | actually like the name a ton. But I know it's
               | controversial and so any products I made have their own
               | names and branding that's pretty separate (see: Abbey).
               | 
               | Over the past few months I've gone the gov contracting
               | route and the name actually made some sense, so I've used
               | it raw. Still, the plan is to get a DBA in the near
               | future and switch it up. Thanks for the advice!
        
           | tolerance wrote:
           | The similarities with WhiteHouse.gov's design can't be much
           | help either, I imagine.
        
             | bangaladore wrote:
             | I thought the same, but I didn't double check so I did not
             | mention it.
        
           | xp84 wrote:
           | Just a totally normal domain for an Anguillan perspective on
           | all things America
        
           | johnmaguire wrote:
           | Unrelated, but I continue to be confused by "ClaudeMind" a
           | JetBrains Plugin by "73signals":
           | https://plugins.jetbrains.com/plugin/25082-claudemind
           | 
           | Their website doesn't even mention 73signals:
           | https://claudemind.com/
           | 
           | Surely Anthropic must have an issue with this use of their
           | trademark? And 73signals seems so similar to 37signals as to
           | be intentional.
        
         | wildzzz wrote:
         | It's one guy running his little AI startup fresh out of
         | college. Claims to be a former national security analyst but
         | makes no such claim on his LinkedIn.
        
           | gkamer8 wrote:
           | Thanks for the catch on my LinkedIn, I really should have
           | that there now. It was originally something I kept private.
        
       | xnx wrote:
       | For anyone who might not be aware, Chrome also has the ability to
       | save screenshots from the command line using:
       | 
       |     chrome --headless --screenshot="path/to/save/screenshot.png" --disable-gpu --window-size=1280,720 "https://www.example.com"
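
For batch use, the command above can be wrapped in a small script. The following is a minimal Python sketch, not part of the original comment. It assumes the browser binary is on the PATH as `chrome` (on many systems it is `google-chrome` or `chromium` instead), and it leaves out `--disable-gpu` because a reply further down the thread says that flag no longer applies to the new headless mode. The function names are hypothetical.

```python
import subprocess
from typing import List


def chrome_screenshot_args(url: str, out_path: str, width: int = 1280,
                           height: int = 720, binary: str = "chrome") -> List[str]:
    """Build the argv for a headless Chrome screenshot without running it."""
    return [
        binary,                              # assumption: binary name on PATH
        "--headless",
        f"--screenshot={out_path}",
        f"--window-size={width},{height}",
        url,
    ]


def take_screenshot(url: str, out_path: str, **kwargs) -> None:
    """Run Chrome; raises CalledProcessError if the browser exits non-zero."""
    subprocess.run(chrome_screenshot_args(url, out_path, **kwargs),
                   check=True, timeout=60)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.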
        
         | martinbaun wrote:
         | Oh man, I needed this so many times and didn't even think of
         | doing it like this. I tried using Selenium and all different external
         | services. Thank you!
         | 
         | Works in chromium as well.
        
         | azhenley wrote:
         | Very nice, I didn't know this. I used pyppeteer and selenium
         | for this previously which seemed excessive.
        
         | Onavo wrote:
         | What features won't work without GPU?
        
           | xnx wrote:
           | LMGTFY (Let Me Gemini That For You) :-)
           | https://gemini.google.com/share/e9428bb57a22
           | 
           | I've used that when running unattended batches, but worth
           | trying with and without disabling to see what is best for
           | your use case.
        
             | dingnuts wrote:
             | oh good an AI summary with none of the facts checked,
             | literally more useless than the old lmgtfy and somehow more
             | rude
             | 
             | "here's some output that looks relevant to your question
             | but I couldn't even be arsed to look any of it up, or copy
             | paste it, or confirm its validity"
        
           | kylecazar wrote:
           | This flag isn't valid anymore in the new chrome headless.
           | Disable GPU doesn't exist unless you're on the old version (and
           | then, it was meant as a workaround for Windows users only).
           | 
           | I've used this via selenium not too long ago
        
         | cmgriffing wrote:
         | Quick note: when trying to do full page screenshots, Chrome
         | does a screenshot of the current view, then scrolls and does
         | another screenshot. This can cause some interesting artifacts
         | when rendering pages with scroll behaviors.
         | 
         | Firefox does a proper full page screenshot and even allows you
         | to set a higher DPR value. I use this a lot when making video
         | content.
         | 
         | Check out some of the args in FF using: `:screenshot --help`
        
         | input_sh wrote:
         | Firefox equivalent:
         | 
         |     firefox -screenshot file.png https://example.com --window-size=1280,720
         | 
         | A bit annoyingly, it won't work if you have Firefox already
         | open.
        
           | cmgriffing wrote:
           | LOL, you and I posted very similar replies at the same time.
        
           | blueflow wrote:
           | > it won't work if you have Firefox already open
           | 
           | Now go ahead and try to work out how you could isolate these
           | instances so they cannot see each other. This leads into a
           | rabbit hole of bad design.
        
           | UnlockedSecrets wrote:
           | Does it work if you use a different profile with -p?
        
             | paulryanrogers wrote:
             | Maybe with --no-remote
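
The two suggestions above (a separate profile and `--no-remote`) can be combined. The Python sketch below builds a Firefox command line that uses a throwaway profile directory. Whether this actually avoids the conflict with an already-open Firefox is an untested assumption taken from the replies, and the function names are hypothetical.

```python
import subprocess
import tempfile
from typing import List


def firefox_screenshot_args(url: str, out_path: str, profile_dir: str,
                            width: int = 1280, height: int = 720,
                            binary: str = "firefox") -> List[str]:
    """Build the argv for a Firefox screenshot using a dedicated profile."""
    return [
        binary,                       # assumption: binary name on PATH
        "--no-remote",                # do not attach to a running instance
        "-profile", profile_dir,      # separate profile directory
        "-screenshot", out_path,
        url,
        f"--window-size={width},{height}",
    ]


def take_screenshot(url: str, out_path: str) -> None:
    """Run Firefox with a temporary profile that is deleted afterwards."""
    with tempfile.TemporaryDirectory() as profile_dir:
        subprocess.run(firefox_screenshot_args(url, out_path, profile_dir),
                       check=True, timeout=120)
```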
        
           | amelius wrote:
           | > A bit annoyingly, it won't work if you have Firefox already
           | open.
           | 
           | I hate it when applications do this.
        
       | aspeckt-112 wrote:
       | I'm looking forward to giving this a go. Great idea!
        
       | manmal wrote:
       | Being a bit frustrated with Linkwarden's resource usage, I've
       | thought about making my own self hosted bookmarking service. This
       | could be a low effort way of loading screenshots for these links,
       | very cool! It'll be interesting how many concurrent requests this
       | can process.
        
       | synthomat wrote:
       | That's nice and everything but what to do about the EU cookie
       | banners? Does hosting outside of the EU help?
        
         | gkamer8 wrote:
         | Yeah the EU cookie banners are annoying, I'm hoping to do some
         | automation to click out of them before taking the screenshots
        
           | cjr wrote:
           | There are browser extensions you could run, like
           | consent-o-matic, to try to click through and hide the cookie
           | banners in your screenshots:
           | 
           | https://chromewebstore.google.com/detail/consent-o-matic/mdj...
           | 
           | Otherwise using a combination of well-known class names,
           | 'accept' strings, and heuristics such as z-index, position:
           | fixed/sticky etc can also narrow down the number of likely
           | elements that could be modals/banners.
           | 
           | You could also ask a vision model whether a screenshot has a
           | cookie banner, and ask for co-ordinates to remove it,
           | although this could get expensive at scale!
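
The heuristic approach described above can be sketched as a simple scoring function. This Python example is illustrative only: the hint strings, the weights, the z-index cutoff, and the `ElementInfo` type are all assumptions, and a real implementation would first have to collect these properties from the page through the browser.

```python
from dataclasses import dataclass
from typing import List

BANNER_HINTS = ("cookie", "consent", "gdpr")      # assumed class-name hints
ACCEPT_WORDS = ("accept", "agree", "allow all")   # assumed button strings


@dataclass
class ElementInfo:
    """Properties of one DOM element, collected elsewhere (hypothetical)."""
    class_names: str
    text: str
    position: str   # computed CSS position, e.g. "fixed"
    z_index: int


def banner_score(el: ElementInfo) -> int:
    """Higher scores mean the element looks more like a cookie banner."""
    score = 0
    if any(hint in el.class_names.lower() for hint in BANNER_HINTS):
        score += 2
    if any(word in el.text.lower() for word in ACCEPT_WORDS):
        score += 2
    if el.position in ("fixed", "sticky"):
        score += 1
    if el.z_index >= 1000:   # arbitrary cutoff for "on top of everything"
        score += 1
    return score


def likely_banners(elements: List[ElementInfo], threshold: int = 4) -> List[ElementInfo]:
    """Keep only elements whose combined score reaches the threshold."""
    return [el for el in elements if banner_score(el) >= threshold]
```

Requiring several signals at once is meant to reduce the risk of removing sticky navigation bars or other legitimate fixed content.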
        
             | gkamer8 wrote:
             | Thanks, that's a great idea! I was originally going to go
             | the vision model route because I'd also like people to be
             | able to send instructions to sign in with some credentials
             | (like when visiting the nytimes or something).
        
             | artur_makly wrote:
             | yeah that's what we basically did here at
             | https://VisualSitemaps.com, but it can also quickly become
             | over-the-top, and you may end up removing important
             | content. That's why in the end we added a second option to
             | just manually enter CSS classes.
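
One way the manual option could work is to turn the user-supplied class names into a single CSS rule that is injected before the screenshot is taken. The Python sketch below is an assumption about such a design, not a description of how VisualSitemaps implements it, and `hide_rule` is a hypothetical helper.

```python
from typing import Iterable


def hide_rule(class_names: Iterable[str]) -> str:
    """Return a CSS rule hiding every given class, or "" if none are given."""
    # Accept names with or without a leading dot and ignore blank entries.
    cleaned = [name.strip().lstrip(".") for name in class_names]
    selectors = ["." + name for name in cleaned if name]
    if not selectors:
        return ""
    return ", ".join(selectors) + " { display: none !important; }"
```

For example, `hide_rule(["cookie-banner", ".modal"])` produces one rule covering both classes, which can then be added to the page in a `<style>` element.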
        
         | cess11 wrote:
         | No. Tell the services you're using to stop with the malicious
         | compliance.
        
         | busymom0 wrote:
         | Would recommend using SeleniumBase's CDP mode to search for
         | those substrings, click accept on those cookie banners and then
         | take screenshot.
        
       | quink wrote:
       | > SCREENSHOT_JPEG_QUALITY
       | 
       | Not two words that should be near each other, and JPEG is the
       | only option.
       | 
       | Almost like it's designed to nerd-snipe someone into a PR to
       | change the format based on Accept headers.
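
Choosing the image format from the Accept header could look roughly like the Python sketch below. It is not taken from the project: the list of supported formats, the JPEG fallback, and the function names are assumptions, and the parsing covers only media types and `q` values rather than the full HTTP grammar.

```python
from typing import List, Tuple

SUPPORTED = ["image/jpeg", "image/png", "image/webp"]  # assumed server formats
DEFAULT = "image/jpeg"                                 # assumed fallback


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q value) pairs."""
    prefs = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    return prefs


def quality_for(media: str, prefs: List[Tuple[str, float]]) -> float:
    """Look up q for a media type, preferring the most specific match."""
    for pattern in (media, media.split("/")[0] + "/*", "*/*"):
        for accepted, q in prefs:
            if accepted == pattern:
                return q
    return 0.0


def choose_format(header: str) -> str:
    """Pick the supported format with the highest q, else the default."""
    prefs = parse_accept(header)
    best, best_q = DEFAULT, 0.0
    for candidate in SUPPORTED:
        q = quality_for(candidate, prefs)
        if q > best_q:
            best, best_q = candidate, q
    return best
```

This sketch falls back to JPEG when nothing matches; a stricter server might answer with 406 Not Acceptable instead.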
        
         | gkamer8 wrote:
         | > Almost like it's designed to nerd-snipe someone into a PR to
         | change the format based on Accept headers
         | 
         | pls
        
       | mpetrovich wrote:
       | Reminds me of this open source library I wrote to do the same
       | thing: https://github.com/nextbigsoundinc/imagely
       | 
       | It uses puppeteer and chrome headless behind the scenes.
        
       | joshstrange wrote:
       | This is cool but at this point MCP is the clear choice for
       | exposing tools to LLMs, I'm sure someone will write a wrapper
       | around this to provide the same functionality as an MCP-SSE
       | server.
       | 
       | I want to try this out though and see how I like it compared to
       | the MCP Puppeteer I'm using now (which does a great job of
       | visiting pages, taking screenshots, interacting with the page,
       | etc).
        
       | jot wrote:
       | If you're worried about the security risks, edge cases,
       | maintenance pain and scaling challenges of self hosting there are
       | various solid hosted alternatives:
       | 
       | - https://browserless.io - low level browser control
       | 
       | - https://scrapingbee.com - scraping specialists
       | 
       | - https://urlbox.com - screenshot specialists*
       | 
       | They're all profitable and have been around for years so you can
       | depend on the businesses and the tech.
       | 
       | * Disclosure: I work on this one and was a customer before I
       | joined the team.
        
         | edm0nd wrote:
         | https://www.scraperapi.com/ is good too. Been using them to
         | scrape via their API on websites that have a lot of captchas or
         | anti scraping tech like DataDome.
        
         | rustdeveloper wrote:
         | Happy to suggest another web scraping API alternative I rely
         | on: https://scrapingfish.com
        
         | bbor wrote:
         | Do these services respect norobot manifests? Isn't this all
         | kinda... illegal...? Or at least non-consensual?
        
           | basilgohar wrote:
           | robots.txt isn't legally binding. I am interested to know if
           | and how services even interact with it. It's more like a clue
           | about where the interesting content for scrapers is on your site.
           | This is how I imagine it goes:
           | 
           | "Hey, don't scrape the data here."
           | 
           | "You know what? I'll scrape it even harder!"
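
For a service that does want to honor robots.txt, Python's standard library includes a parser. The sketch below checks a URL against robots.txt rules that have already been downloaded. The `allowed` helper, the example rules, and the `screenshot-bot` user agent are made up for illustration, and nothing here says whether the services mentioned above do this.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: everything under /private/ is off limits to all agents.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""
```

A screenshot service could call `allowed(...)` before loading a page and skip or reject the request when it returns False.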
        
       | morbusfonticuli wrote:
       | Similar project: gowitness [1].
       | 
       | A really cool tool I recently discovered. Besides scraping and
       | taking screenshots of websites and saving them in multiple
       | formats (including sqlite3), it can grab and save the headers,
       | console logs & cookies and has a super cool web GUI to access all
       | data and compare e.g. the different records.
       | 
       | I'm planning to build my personal archive.org/waybackmachine-like
       | web-log tool via gowitness in the not-so-distant future.
       | 
       | [1] https://github.com/sensepost/gowitness
        
       | westurner wrote:
       | simonw/shot-scraper has a number of cli args, a GitHub actions
       | repo template, and docs:
       | https://shot-scraper.datasette.io/en/stable/
       | 
       | From https://news.ycombinator.com/item?id=30681242 :
       | 
       | > Awesome Visual Regression Testing lists quite a few tools
       | > and online services:
       | > https://github.com/mojoaxel/awesome-regression-testing
       | 
       | > "visual-regression":
       | > https://github.com/topics/visual-regression
        
       ___________________________________________________________________
       (page generated 2025-02-06 23:00 UTC)