[HN Gopher] Show HN: Spegel, a Terminal Browser That Uses LLMs t...
       ___________________________________________________________________
        
       Show HN: Spegel, a Terminal Browser That Uses LLMs to Rewrite
       Webpages
        
       Author : simedw
       Score  : 291 points
       Date   : 2025-07-01 12:49 UTC (10 hours ago)
        
 (HTM) web link (simedw.com)
 (TXT) w3m dump (simedw.com)
        
       | qsort wrote:
       | This is actually very cool. Not really replacing a browser, but
       | it could enable an alternative way of browsing the web with a
       | combination of deterministic search and prompts. It would
       | probably work even better as a command line tool.
       | 
       | A natural next step could be doing things with multiple "tabs" at
       | once, e.g: tab 1 contains news outlet A's coverage of a story,
       | tab 2 has outlet B's coverage, tab 3 has Wikipedia; summarize and
       | provide references. I guess the problem at that point is whether
       | the underlying model can support this type of workflow, which
       | doesn't really seem to be the case even with SOTA models.
        
         | simedw wrote:
         | Thank you.
         | 
         | I was thinking of showing multiple tabs/views at the same time,
         | but only from the same source.
         | 
         | Maybe we could have one tab with the original content optimised
         | for cli viewing, and another tab just doing fact checking (can
         | ground it with google search or brave). Would be a fun
         | experiment.
        
           | wrsh07 wrote:
           | Would really love to see more functionality built into this.
           | Handling post requests, enabling scripting, etc could all be
           | super powerful
        
           | nextaccountic wrote:
           | In your cleanup step, after cleaning obvious junk, I think
           | you should do whatever Firefox's reader mode does to further
           | clean up, and if that fails bail out to the current output.
           | That should reduce the number of tokens you send to the LLM
           | even more
           | 
           | You should also have some way for the LLM to indicate there
           | is no useful output because perhaps the page is supposed to
           | be a SPA. This would force you to execute Javascript to
           | render that particular page though
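The pre-clean pass suggested above can be sketched with nothing but the standard library. This is illustrative only, not Spegel's actual cleanup code; the junk-tag list is an assumption, and attributes are dropped for brevity:

```python
# Sketch of a pre-clean pass: strip obvious junk subtrees before
# handing the page to an LLM, reducing the token count.
from html.parser import HTMLParser

JUNK = {"script", "style", "noscript", "nav", "footer", "aside", "iframe"}

class PreClean(HTMLParser):
    def __init__(self):
        super().__init__()
        self.junk_depth = 0  # > 0 while inside a junk subtree
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in JUNK:
            self.junk_depth += 1
        elif self.junk_depth == 0:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in JUNK:
            self.junk_depth = max(0, self.junk_depth - 1)
        elif self.junk_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Keep text only when we are outside every junk subtree.
        if self.junk_depth == 0 and data.strip():
            self.out.append(data)

def preclean(html: str) -> str:
    parser = PreClean()
    parser.feed(html)
    return "".join(parser.out)
```

A real implementation would fall back to the unmodified page when this pass strips too much, as the comment suggests.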
        
             | simedw wrote:
              | Just had a look, and there is quite a lot going into
              | Firefox's reader mode.
             | 
             | https://github.com/mozilla/readability
        
           | myfonj wrote:
           | Interestingly, the original idea of what we call a "browser"
           | nowadays - the "user agent" - was built on the premise that
           | each user has specific needs and preferences. The user agent
           | was designed to act on their behalf, negotiating data
           | transfers and resolving conflicts between content author and
           | user (content consumer) preferences according to "strengths"
           | and various reconciliation mechanisms.
           | 
           | (The fact that browsers nowadays are usually expected to
           | represent something "pixel-perfect" to everyone with similar
           | devices is utterly against the original intention.)
           | 
           | Yet the original idea was (due to the state of technical
           | possibilities) primarily about design and interactivity. The
           | fact that we now have tools to extend this concept to core
           | language and content processing is... huge.
           | 
            | It seems we're approaching the moment when our individual
            | _personal_ agent, when asked about a new page, will tell
            | us:
            | 
            |     Well, there's nothing new of interest for you,
            |     frankly: All information presented there was present
            |     on pages visited recently.
            |     -- or --
            |     You've already learned everything mentioned
            |     there. (*)
            |     Here's a brief summary: ...
            |     (Do you want to dig deeper, see the content verbatim,
            |     or anything else?)
           | 
           | Because its "browsing history" will also contain a notion of
           | what we "know" from chats or what we had previously marked as
           | "known".
        
             | ffsm8 wrote:
             | > Well, there's nothing new of interest for you, frankly
             | 
             | For this to work like a user would want, the model would
             | have to be sentient.
             | 
             | But you could try to get there with current models, it'd
             | just be very untrustworthy to the point of being pointless
             | beyond a novelty
        
               | myfonj wrote:
                | Not any more "sentient" than existing LLMs already
                | are, even within their limited chat context span.
               | 
                | Naturally, >>nothing new of interest for you<< here
                | is indeed just a proxy for >>does not involve any
                | significant concept that you haven't previously
                | expressed knowledge about<< (or however you put it),
                | which seems pretty doable, provided that a contract
                | of "expressing knowledge about something" had been
                | made beforehand.
               | 
               | Let's say that all pages you have ever bookmarked you
               | have really grokked (yes, a stretch, no "read it later"
               | here) - then your personal model would be able to (again,
               | figuratively) "make qualified guess" about your
               | knowledge. Or some kind of tag that you could add to any
               | browsing history entry, or fragment, indicating "I
               | understand this". Or set the agent up to quiz you when
               | leaving a page (that would be brutal). Or ... I think you
               | got the gist now.
        
             | bee_rider wrote:
             | It would have to have a pretty good model of my brain to
             | help me make these decisions. Just as a random example, it
             | will have to understand that an equation is a sort of thing
             | that I'm likely to look up even if I understand the meaning
             | of it, just to double check and get the particulars right.
             | That's an obvious example, I think there must be other
             | examples that are less obvious.
             | 
             | Or that I'm looking up a data point that I already actually
             | know, just because I want to provide a citation.
             | 
             | But, it could be interesting.
        
               | myfonj wrote:
                | Well, we should first establish some sort of contract
                | for how to convey "I feel that I actually understand
                | this particular piece of information, so when
                | confronted with it in the future, you can mark it as
                | such". My lines of thought were more about a tutorial
                | page that would present the same techniques as a
                | course you finished a week prior, or a news page
                | reporting on an event you had just read about on a
                | different news site a minute before ... stuff like
                | this ... so you would potentially save the time
                | skimming/reading/understanding only to realise there
                | was no added value _for you in that particular
                | moment_. Or
               | while scrolling through a comment section, hide comment
               | parts repeating the same remark, or joke.
               | 
                | Or (and this is actually doable absolutely without
                | any "AI" at all):
                | 
                |     What the bloody hell actually newly appeared on
                |     this particular URL since my last visit?
               | 
               | (There is one page nearby that would be quite unusable
               | for me, had I not a crude userscript aid for this
               | particular purpose. But I can imagine having a digest
               | about "What's new here?" / "Noteworthy responses?" would
               | be way better.)
               | 
               | For the "I need to cite this source", naturally, you
               | would want the "verbatim" view without any amendments
                | anyway. Also, probably before sharing / directing
                | someone to the resource, looking at the "true form"
                | would still be pretty necessary.
        
             | idiotsecant wrote:
              | I can definitely see a future in which we each have our
             | own personal memetic firewall, keeping us safe and cozy in
             | our personal little worldview bubbles.
        
           | baq wrote:
           | wonder if you can work on the DOM instead of HTML...
           | 
           | almost unrelated, but you can also compare spegel to
           | https://www.brow.sh/
        
         | andrepd wrote:
          | LLMs to generate SEO slop of the most utterly piss-poor
          | quality, then another LLM to lossily "summarise" it back.
         | Brave new world?
        
         | TeMPOraL wrote:
         | > _tab 1 contains news outlet A 's coverage of a story, tab 2
         | has outlet B's coverage, tab 3 has Wikipedia; summarize and
         | provide references._
         | 
         | I _think_ this is basically what https://ground.news/ does.
         | 
         | (I'm not affiliated with them; just saw them in the sponsorship
         | section of a Kurzgesagt video the other day and figured they're
         | doing the thing you described +/- UI differences.)
        
           | doctoboggan wrote:
           | I am a ground news subscriber (joined with a Kurzgesagt ref
           | link) and it does work that way (minus the wikipedia
           | summary). It's pretty good and I particularly like their
           | "blindspot" section showing news that is generally missing
           | from a specific partisan new bubble.
        
       | bubblyworld wrote:
       | Classic that the first example is for parsing the goddamn recipe
       | from the goddamn recipe site. Instant thumbs up from me haha,
       | looks like a neat little project.
        
         | andrepd wrote:
         | Which it apparently does by completely changing the recipe in
         | random places including ingredients and amounts thereof. It is
         | _indeed_ a very good microcosm of what LLMs are, just not in
         | the way these comments think.
        
           | throwawayoldie wrote:
           | The output was then posted to the Internet for everyone to
           | see, without the minimal amount of proofreading that would be
           | necessary to catch that, which gives us a good microcosm of
           | how LLMs are used.
           | 
           | On a more pleasant topic the original recipe sounds
           | delicious, I may give it a try when the weather cools off a
           | little.
        
           | simedw wrote:
            | It was actually a bit worse than that: the LLM never got
            | the full recipe due to some truncation logic I had added.
            | So it regurgitated the recipe from training, and
            | apparently it couldn't both do that and convert units at
            | the same time with the Lite model (it worked with just
            | Flash).
           | 
           | I should have caught that, and there are probably other bugs
           | too waiting to be found. That said, it's still a great
           | recipe.
        
             | andrepd wrote:
             | You're missing the point, but okay.
        
           | bubblyworld wrote:
            | What do you mean? The recipes in the screenshot look more
            | or less the same, the formatting has just changed in the
            | Spegel one (which is what was asked for, so no surprises
            | there).
           | 
           | Edit: just saw the author's comment, I think I'm looking at
           | the fixed page
        
         | IncreasePosts wrote:
         | There are extensions that do that for you, in a deterministic
         | way and not relying on LLMs. For example, Recipe Filter for
         | chrome. It just shows a pop up over the page when it loads if
         | it detects a recipe
        
           | bubblyworld wrote:
            | Thanks, I actually already use that plugin; I just found
            | the problem amusingly familiar. Recipe sites are the
            | original AI
           | slop =P
        
         | lpribis wrote:
         | Another great example of LLM hype train re-inventing something
         | that already existed [1] (and was actually thought out) but
         | making it worse and non-deterministic in the worst ways
         | possible.
         | 
         | https://schema.org/Recipe
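For reference, schema.org Recipe data is typically embedded in pages as JSON-LD, so extraction can be fully deterministic. A minimal stdlib sketch (the regex for locating the script tag is a simplification; a real extractor would use a proper HTML parser):

```python
# Deterministic recipe extraction from schema.org JSON-LD, as an
# alternative to asking an LLM to find the recipe.
import json
import re

JSONLD_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def find_recipe(html: str):
    """Return the first schema.org Recipe object found, or None."""
    for block in JSONLD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Recipe":
                return item
    return None
```

Fields like `name`, `recipeIngredient`, and `recipeInstructions` come straight out of the returned dict, with no model in the loop.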
        
       | ohadron wrote:
       | This is a terrific idea and could also have a lot of value with
       | regards to accessibility.
        
         | taco_emoji wrote:
         | The problem, as always, is that LLMs are not deterministic.
         | Accessibility needs to be reliable and predictable above all
         | else.
        
       | pepperonipboy wrote:
       | Could work great with emacs' eww!
        
         | thephotonsphere wrote:
         | also with lynx because it can browse from stdin
        
         | sammy0910 wrote:
         | I built a project that basically does this for emacs
         | 
         | https://github.com/sstraust/simpleweb
        
       | clbrmbr wrote:
        | Suggestion: add a -p option:
        | 
        |     spegel -p "extract only the product reviews" > REVIEWS.md
        
       | sammy0910 wrote:
       | I built something that did this a bit ago
       | 
       | https://github.com/sstraust/simpleweb
        
         | sammy0910 wrote:
          | Something I found challenging when I was building it was:
          | how do you make it fast enough that it still creates a
          | smooth browsing experience?
         | 
         | I'm curious how you tackled that problem
        
           | simedw wrote:
           | That's a cool project.
           | 
           | I think most of it comes down to Flash-Lite being really
           | fast, and the fact that I'm only outputting markdown, which
           | is fairly easy and streams well.
        
           | 4b11b4 wrote:
           | https://github.com/sstraust/simpleweb/blob/79294b461b2e67a24.
           | ..
           | 
           | Not the answer to your question but here's the prompt
        
         | busssard wrote:
         | what does it do about javascript?
        
       | anonu wrote:
       | Don't you need javascript to make most webpages useful?
        
         | inetknght wrote:
         | Good sir, no.
         | 
         | The web has existed for long before javascript was around.
         | 
         | The web was useful for long before javascript was around.
         | 
         | I literally hate javascript -- not the language itself but the
         | way it is used. It has enabled some pretty cool things, yes.
         | But javascript is not required to make useful webpages.
        
           | pmxi wrote:
           | I think you misunderstood him. Yes, it's possible to CREATE a
           | useful webpage without JavaScript, but many EXISTING webpages
           | rely on JavaScript to be functional.
        
         | jazzyjackson wrote:
         | If Amazon.com can work with JavaScript disabled, any site could
         | be rewritten to do without. But I think to even get to the
         | content on a lot of SPAs this would need to be running a
         | headless browser to render the page, before extracting the
         | static content unfortunately
        
         | IncreasePosts wrote:
         | No - an experiment: try disabling javascript in your browser
         | settings, and then whenever you see a webpage that isn't
         | working, enable javascript for that domain. You'd be surprised
         | how fast 90% of the web feels with JS disabled.
        
       | nicklo wrote:
       | Have you considered making an MCP for this? Would be great for
       | use in vibe-coding
        
       | ktpsns wrote:
       | Reminds me of https://www.brow.sh/ which is not AI related at all
       | but just a very powerful terminal browser which in fact supports
       | JS, even videos.
        
       | cheevly wrote:
       | Very cool! My retired AI agent transformed live webpage content,
       | here's an old video clip of transforming HN to My Little Pony
       | (with some annoying sounds):
       | https://www.youtube.com/watch?v=1_j6cYeByOU. Skip to ~37 seconds
       | for the outcome. I made an open-source standalone Chrome
       | extension as well, it should probably still work for anyone
       | curious: https://github.com/joshgriffith/ChromeGPT
        
       | Klaster_1 wrote:
       | Now that's a user agent!
        
         | CaptainFever wrote:
         | Finally, web browsers work for the user, not the website
         | owners!
        
       | adrianpike wrote:
       | Super neat - I did something similar on a lark to enable useful
       | "web browsing" over 1200 baud packet - I have Starlink back at my
       | camp but might be a few miles away, so as long as I can get line
       | of sight I can Google up stuff, albeit slow. Worked well but I
       | never really productionalized it beyond some weekend tinkering.
        
       | eniac111 wrote:
       | Cool! It would be even better if it was able to create simple web
       | pages for vintage browsers.
        
         | stronglikedan wrote:
         | That would violate the do-one-thing-and-do-it-well principle
         | for no apparent benefit. There are plenty of tools to convert
         | markdown to basic HTML already.
        
       | treyd wrote:
       | I wonder if you could use a less sophisticated model (maybe even
       | something based on LSTMs) to walk over the DOM and extract just
       | the chunks that should be emitted and collected into the
       | browsable data structure, but doing it all locally. I feel like
       | it'd be straightforward to generate training data for this, using
       | an LLM-based toolchain like what the author wrote to be used
       | directly.
        
         | askonomm wrote:
         | Unfortunately in the modern web simply walking the DOM doesn't
         | cut it if the website's content loads in with JS. You could
         | only walk the DOM once the JS has loaded, and all the requests
         | it makes have finished, and at that point you're already using
         | a whole browser renderer anyway.
        
           | kccqzy wrote:
           | Yeah but this project doesn't use JS anyway.
        
       | deepdarkforest wrote:
       | The main problem with these approaches is that most sites now are
       | useless without JS or having access to the accessibility tree.
       | Projects like browser-use or other DOM based approaches at least
       | see the DOM(and screenshots).
       | 
       | I wonder if you could turn this into a chrome extension that at
       | least filters and parses the DOM
        
         | jadbox wrote:
         | I actually made a CLI tool recently that uses Puppeteer to
         | render the page including JS, summarizes key info and actions,
         | and enables simple form filling all from a CLI menu. I built it
         | for my own use-cases (checking and paying power bills from
         | CLI), but I'd love to get feedback on the core concept:
         | https://github.com/jadbox/solomonagent
        
           | andoando wrote:
            | Dude I love this. I've been thinking of doing exactly
            | this, but as a screen reader, for accessibility reasons.
        
             | jadbox wrote:
             | Thanks, it's alpha at the moment- next feature is complex
             | forms and bug fixing broken actions (downloading). Do give
             | it a spin! Welcome to contribute or drop feedback on the
             | repo :)
        
         | willsmith72 wrote:
         | True for stuff requiring interaction, but to help their LCP/SEO
         | lots of sites these days render plain html first. It's not
         | "usable" but for viewing it's pretty good
        
       | stared wrote:
       | Any chance it would work for pages like Facebook or LinkedIn? I
       | would love to have a distraction-free way of searching
       | information there.
       | 
       | Obviously, against wishes of these social networks, which want us
       | to be addicted... I mean, engaged.
        
         | simedw wrote:
         | We'll probably have to add some custom code to log in, get an
         | auth token, and then browse with it. Not sure if LinkedIn would
         | like that, but I certainly would.
        
         | aydyn wrote:
         | Does anyone really get addicted to linkedin? Its so sanitized
         | and clinical. Nobody acts real on there or even pretends to.
        
           | encom wrote:
           | The worst[1] part about losing my job last month was having
           | to take LinkedIn seriously, and the best[2] part about now
           | having found a new job is logging off LinkedIn, for a very
           | long time hopefully. The self-aggrandising, pretentious,
           | occasionally virtue signalling, performance-posting make me
           | want to throw up. It takes a considerable amount of effort on
           | my part to not make sarcastic shitposts, but in the interest
           | of self preservation, I restrain myself. My header picture,
           | however, is my extremely messy desk, full of electronics,
           | tools, test equipment, drawings, computers and coffee cups.
           | Because that's just how I work when I'm in the zone, and it
           | serves as a quiet counterpoint to the polished self-promotion
           | people do.
           | 
           | And I didn't even get the new job through LinkedIn, though it
           | did yield one interview.
           | 
           | [1] Not the actual worst.
           | 
           | [2] Not the actual best.
        
       | fzaninotto wrote:
       | Congrats! Now you need an entire datacenter to visualize a web
       | page.
        
         | juujian wrote:
          | Couldn't this run reasonably well on a local machine if
          | you have some kind of neural processing chip and enough
          | RAM? Conversion to MD shouldn't require a huge model.
        
         | busssard wrote:
         | only if you use an API and not a dedicated distill/tune for
         | html to MD conversion.
         | 
         | But the question of Javascript remains
        
       | b0a04gl wrote:
       | this is another layer of abstraction on top of an already broken
       | system. you're running html through an llm to get markdown that
       | gets rendered in a terminal browser. that's like... three format
       | conversions just to read text. the original web had simple html
        | that was readable in any terminal browser already. now they aren't
       | designed as documents anymore but rather designed as applications
       | that happen to deliver some content as a side effect
        
         | MangoToupe wrote:
         | That's the world we live in. You can either not have access to
         | content or you must accept abstractions to remove all the bad
         | decisions browser vendors have forced on us the last 30 years
         | to support ad-browsing.
        
         | _joel wrote:
         | > this is another layer of abstraction on top of an already
         | broken system
         | 
         | pretty much like all modern computing then, hey.
        
         | nashashmi wrote:
          | Think of it as a secretary that is transforming and
          | formatting information. You may wish the original medium
          | were closer to what you want, but you don't get that, so
          | you get a cheap, dumber secretary instead.
        
         | worldsayshi wrote:
         | If the web site is a SPA that is hydrated using an API it would
         | be conceivable that the LLM can build a reusable interface
         | around the API while taking inspiration from the original page.
         | That interface can then be stored in some cache.
         | 
         | I'm not saying it's necessarily a good idea but perhaps a
         | bad/fun idea that can inspire good ideas?
        
         | amelius wrote:
         | I take it you never use "Reader mode" in your browser?
        
         | jrm4 wrote:
         | I 100% agree -- but still I find this a feature and not a bug.
         | It's always an arms race, and I like this shot fired.
        
       | 098799 wrote:
        | You could also use headless Selenium under the hood and pipe
        | the entire DOM of the document to the model after the
        | JavaScript has loaded. Of course it would make it much
        | slower, but it would also address the main worry people
        | have, which is that many websites will flat out not show
        | anything in the initial GET request.
        
         | busssard wrote:
          | Can you flesh this out a tiny bit? Because for indie
          | crawlers, the JavaScript rendering is the main problem.
        
           | 098799 wrote:
           | Here's a sketch: https://chatgpt.com/share/68640b97-9a48-8007
           | -a27c-fdf85ff412... -- selenium drives your actual browser
           | under the hood.
        
       | web3aj wrote:
       | Very cool. I've been interested in browsing the web directly from
       | my terminal; this feels accessible.
        
       | insane_dreamer wrote:
       | Interesting, but why round-trip through an LLM just to convert
       | HTML to Markdown?
        
         | markstos wrote:
         | Because the modern web isn't reliably HTML, it's "web apps"
         | with heavy use of JavaScript and API calls. To first display
         | the HTML that you see in your browser, you need a user agent
         | that runs JavaScript and makes all the backend calls that
         | Chrome would make to put together some HTML.
         | 
         | Some websites may still return some static upfront that could
         | be usefully understood without JavaScript processing, but a lot
         | don't.
         | 
         | That's not to say you need an LLM, there are projects like
         | Puppeteer that are like headless browsers that can return the
         | rendered HTML, which can _then_ be sent through an HTML to
         | Markdown filter. That would be less computationally intensive.
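The HTML-to-Markdown filter step described above can be approximated in a few lines of stdlib Python. This is a toy sketch handling only headings, paragraphs, and list items; real tools (e.g. html2text) cover links, nesting, and much more:

```python
# Toy HTML-to-Markdown filter, meant to run on already-rendered HTML
# (e.g. the output of a headless browser).
from html.parser import HTMLParser

class ToMarkdown(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            # Map heading level to the matching number of '#' marks.
            self.out.append("\n" + "#" * int(tag[1]) + " ")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "p":
            self.out.append("\n\n")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.out.append(text)

def to_markdown(html: str) -> str:
    converter = ToMarkdown()
    converter.feed(html)
    return "".join(converter.out).strip()
```

No model is involved, which is exactly the "less computationally intensive" path the comment describes.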
        
           | insane_dreamer wrote:
           | > That's not to say you need an LLM, ... then be sent through
           | an HTML to Markdown filter. That would be less
           | computationally intensive.
           | 
           | which was exactly my point
        
         | crent wrote:
         | Because this isn't just converting HTML to markdown. I'd
         | recommend taking another look at the website and particularly
         | read the recipe example as it demonstrates the goal of the
         | project pretty well.
        
       | nashashmi wrote:
       | You should call this software a lens and filter instead of a
       | mirror. It takes the essential information and transforms it into
       | another medium.
        
       | amelius wrote:
       | Can it strip ads?
        
         | tossandthrow wrote:
         | It can inject its own!
        
           | amelius wrote:
           | You have a point as it uses Gemini under the hood. However,
           | the moment Google introduces ads in the model users will run
           | away. So Google really has no opportunity here to inject ads.
           | 
           | And wouldn't it be ironic if Gemini was used to strip ads
           | from webpages?
        
             | tossandthrow wrote:
              | The field of "SEO for AI", i.e. seeking to have your
              | company featured in LLM output, is already established.
              | 
              | So in the rare cases where the model would jam ads in
              | on its own, this is likely already happening.
        
       | mromanuk wrote:
       | I definitely like the LLM in the middle, it's a nice way to
       | circumvent the SEO machine and how Google has optimized writing
       | in recent years. Removing all the cruft from a recipe is a
       | brilliant case for an LLM. And I suspect more of this is coming:
       | LLMs to filter. I mean, it would be nice to just read the recipe
       | from HTML, but SEO has turned everything into an arms race.
        
         | hirako2000 wrote:
          | Do you also like what it costs you to browse the web via
          | an LLM, potentially swallowing millions of tokens per
          | minute?
        
           | prophesi wrote:
           | This seems like a suitable job for a small language model.
           | Bit biased since I just read this paper[0]
           | 
           | [0] https://research.nvidia.com/labs/lpr/slm-agents/
        
         | yellow_lead wrote:
         | LLM adds cruft, LLM removes cruft, never a miscommunication
        
         | visarga wrote:
          | I foresaw this a couple of years ago. We already have web
          | search tools in LLMs, and they are amazing when they chain
          | multiple searches. But Spegel is a completely different
          | take.
         | 
         | I think the ad blocker of the future will be a local LLM, small
         | and efficient. Want to sort your timeline chronologically? Or
         | want a different UI? Want some things removed, and others
         | promoted? Hide low quality comments in a thread? All are
         | possible with LLM in the middle, in either agent or proxy mode.
         | 
         | I bet this will be unpleasant for advertisers.
        
         | tines wrote:
         | > Removing all the cruft from a recipe is a brilliant case for
         | an LLM
         | 
         | Is it though, when the LLM might mutate the recipe
         | unpredictably? I can't believe people trust probabilistic
         | software for cases that cannot tolerate error.
        
           | kccqzy wrote:
           | I agree with you in general, but recipes are not a case where
           | precision matters. I sometimes ask LLMs to give me a recipe
            | and if it hallucinates something, it will simply taste bad.
           | Not much different from a human-written recipe where the
           | human has drastically different tastes than I do. Also you
           | basically never apply the recipe blindly; you have intuition
           | from years of cooking to know you need to adjust recipes to
           | taste.
        
             | tines wrote:
             | Huh? You don't care if an LLM switches pounds to kilograms
             | because... recipes might taste bad anyway????
        
               | kccqzy wrote:
               | Switching pounds with kilograms is off by a factor of
               | two. Most people capable of cooking should have the
               | intuition to know something is awfully wrong if you are
               | off by a factor of two, especially since pounds and
               | kilograms are fairly large units when it comes to
               | cooking.
        
             | Uehreka wrote:
             | Hard disagree. I don't have "years of cooking" experience
             | to draw from necessarily. If I'm looking up a recipe it's
             | because I'm out of my comfort zone, and if the LLM version
             | of the recipe says to add 1/2 cup of paprika I'm not gonna
             | intuitively know that the right amount was actually 1
             | teaspoon. Well, at least until I eat the dish and realize
             | it's total garbage.
             | 
             | Also like, forget amounts, cook times are super important
             | and not always intuitive. If you screw them up you have to
             | throw out all your work and order take out.
        
               | kccqzy wrote:
               | All I'm arguing is that you should have the intuition to
               | know the difference between 1/2 cup of paprika and a
               | teaspoon. Okay maybe if you just graduated from college
               | and haven't cooked much you could make such a mistake but
               | realistically outside the tech bubble of HN you won't
               | find people confusing 1/2 cup with a teaspoon. It's just
               | intuitively wrong. An entire bottle of paprika I recently
               | bought has only 60 grams.
               | 
               | And yes cook times are important but no, even for a
               | human-written recipe you need the intuition to apply
               | adjustments. A recipe might be written presuming a
               | powerful gas burner but you have a cheap underpowered
               | electric. Or the recipe asks for a convection oven but
               | your oven doesn't have the feature. Or the recipe
               | presumes a 1100W microwave but you have a 1600W one. You
               | stand by the food while it cooks. You use a food
               | thermometer if needed.
        
             | whatevertrevor wrote:
             | Not really an apt comparison.
             | 
             | For one an AI generated recipe could be something that no
             | human could possibly like, whereas the human recipe comes
             | with at least one recommendation (assuming good faith on
             | the source, which you're doing anyway LLM or not).
             | 
             | Also an LLM may generate things that are downright inedible
             | or even toxic, though the latter is probably unlikely even
             | if possible.
             | 
             | I personally would never want to spend roughly an hour or
             | so making bad food from a hallucinated recipe wasting my
             | ingredients in the process, when I could have spent at most
             | 2 extra minutes scrolling down to find the recommended
             | recipe to avoid those issues. But to each their own I
             | guess.
        
           | joshvm wrote:
           | There is a well-defined solution to this. Provide your
           | recipes as a Recipe schema: https://schema.org/Recipe
           | 
           | Seems like most of the usual food blog plugins use it,
           | because it allows search engines to report calories and star
           | ratings without having to rely on a fuzzy parser. So while
           | the experience sucks for users, search engines use the
           | structured data to show carousels with overviews, calorie
           | totals and stuff like that.
           | 
           | https://recipecard.io/blog/how-to-add-recipe-structured-
           | data...
           | 
           | https://developers.google.com/search/docs/guides/intro-
           | struc...
           | 
            | EDIT: Sure enough, if you look at the OP's recipe example, the
           | schema is in the source. So for certain examples, you would
           | probably be better off having the LLM identify that it's a
           | recipe website (or other semantic content), extract the
           | schema from the header and then parse/render it
           | deterministically. This seems like one of those context-
           | dependent things: getting an LLM to turn a bunch of JSON into
           | markdown is fairly reliable. Getting it to extract that from
            | an entire HTML page risks cluttering the context,
           | but you could separate the two and have one agent summarise
           | any of the steps in the blog that might be pertinent.
            | {"@context":"https://schema.org/","@type":"Recipe","name":
            | "Slowly Braised Lamb Ragu ...
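The deterministic path joshvm describes can be sketched with nothing but the Python standard library: pull the JSON-LD block out of the page and render the Recipe fields directly, with no LLM in the loop. All class and function names here are illustrative, and a real implementation would also handle `@graph` wrappers and multi-valued `@type` entries:

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks.append(data)

def recipe_to_markdown(html):
    """Find a schema.org Recipe in the page and render it deterministically."""
    parser = JSONLDExtractor()
    parser.feed(html)
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Recipe":
                lines = [f"# {item.get('name', 'Recipe')}", "", "## Ingredients"]
                lines += [f"- {i}" for i in item.get("recipeIngredient", [])]
                return "\n".join(lines)
    return None
```

Because the ingredient list is copied verbatim from the structured data, pounds can never silently become kilograms.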
        
       | kelsey98765431 wrote:
       | People here are not realizing that html is just the start. If you
       | can render a webpage into a view, you can render any input the
       | model accepts. PDF to this view. Zip file of images to this view.
       | Giant json file into this view. Whatever. The view is the product
       | here, not the html input.
        
       | nartho wrote:
        | I think the project itself is really cool. That said, I really
        | don't like the trend of having LLMs regurgitate content back to
        | us. This kinda makes me think of Browsh, which took the
        | opposite approach and tries to render the HTML in the terminal
        | (without LLMs, as far as I know)
       | 
       | https://github.com/browsh-org/browsh
       | https://www.youtube.com/watch?v=HZq86XfBoRo
        
         | hirako2000 wrote:
          | That would also keep your wallet or GPU rig cooler
        
       | hyperific wrote:
       | Why not use pandoc to convert html to markdown and have the LLM
       | condense from there?
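pandoc can indeed do the deterministic half (`pandoc -f html -t gfm page.html -o page.md`), leaving the LLM only the condensing step. For illustration, a crude stdlib-only stand-in for that conversion; pandoc itself handles vastly more of HTML, and every name below is mine:

```python
from html.parser import HTMLParser

class CrudeMarkdown(HTMLParser):
    """A toy HTML-to-markdown pass covering headings, paragraphs and
    list items. A stand-in for what pandoc does far more completely."""
    def __init__(self):
        super().__init__()
        self.out = []
        self._prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self._prefix = "#" * int(tag[1]) + " "
        elif tag == "li":
            self._prefix = "- "

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "li", "p"):
            self._prefix = ""

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.out.append(self._prefix + text)
            self._prefix = ""

def html_to_markdown(html):
    p = CrudeMarkdown()
    p.feed(html)
    return "\n".join(p.out)
```

The appeal of this split is that the markdown step is reproducible; only the summarisation step carries LLM risk.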
        
       | cyrillite wrote:
       | I have been thinking of a project extremely similar to this for a
       | totally different purpose. It's lovely to see something like
       | this. Thank you for sharing it, inspiring
        
         | amelius wrote:
         | Curious about that different purpose ...
        
       | __MatrixMan__ wrote:
        | It would be cool if it were smart enough to figure out whether it
       | was necessary to rewrite the page on every visit. There's a large
       | chunk of the web where one of us could visit once, rewrite to
       | markdown, and then serve the cleaned up version to each other
       | without requiring a distinct rebuild on each visit.
        
         | pmxi wrote:
         | The author says this is for "personalized views using your own
         | prompts." Though, I suppose it's still useful to cache the
         | outputs for the default prompt.
        
           | __MatrixMan__ wrote:
           | Or to cache the output for whatever prompt your peers think
           | is most appropriate for that particular site.
        
         | myfonj wrote:
          | Each user has distinct needs and a distinct prior
         | knowledge about the topic, so even the "raw" super clean source
         | form will probably be eventually adjusted differently for most
         | users.
         | 
         | But yes, having some global shared redundant P2P cache (of the
         | "raw" data), like IPFS (?) could possibly help and save some
         | processing power and help with availability and data
         | preservation.
        
           | __MatrixMan__ wrote:
           | I imagine it sort of like a microscope. For any chunk of data
           | that people bothered to annotate with prompts re: how it
           | should be rendered you'd end up with two or three "lenses"
           | that you could toggle between. Or, if the existing lenses
           | don't do the trick, you could publish your own and, if your
           | immediate peers find them useful, maybe your transitive peers
           | will end up knowing about them as well.
        
         | simedw wrote:
         | If the goal is to have a more consistent layout on each visit,
         | I think we could save the last page's markdown and send it to
         | the model as a one-shot example...
        
         | markstos wrote:
         | Cache headers exist for servers to communicate to clients how
          | long it is safe to cache things for. The client could be updated
         | to add a cache layer that respects cache headers.
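A minimal sketch of that cache layer, keying rendered markdown by URL and expiring entries per the server's `Cache-Control: max-age` directive. All names are mine, and real headers carry many more directives (`no-cache`, `s-maxage`, `private`, ...) than this handles:

```python
import re
import time

# Hypothetical in-memory cache: url -> {"markdown": ..., "expires": ...}
_cache = {}

def max_age_seconds(cache_control):
    """Extract max-age from a Cache-Control header, or None if uncacheable."""
    if cache_control is None or "no-store" in cache_control:
        return None
    m = re.search(r"max-age=(\d+)", cache_control)
    return int(m.group(1)) if m else None

def get_cached(url, now=None):
    """Return cached markdown for url, or None if absent or expired."""
    now = time.time() if now is None else now
    entry = _cache.get(url)
    if entry and now < entry["expires"]:
        return entry["markdown"]
    return None

def store(url, markdown, cache_control, now=None):
    """Cache rendered markdown only when the server allows it."""
    now = time.time() if now is None else now
    ttl = max_age_seconds(cache_control)
    if ttl:
        _cache[url] = {"markdown": markdown, "expires": now + ttl}
```

Since the expensive step here is the LLM render rather than the fetch, even short server-granted TTLs would save a lot of tokens on repeat visits.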
        
       | WD-42 wrote:
       | Does anyone know why LLMs love emojis so much?
        
       | coder543 wrote:
       | Just a typo note: the flow diagram in the article says "Gemini
       | 2.5 Pro Lite", but there is no such thing.
        
         | simedw wrote:
         | You are right, it's Gemini 2.5 Flash Lite
        
       | neocodesoftware wrote:
       | Does it fail cloudflare captcha?
        
       | willm wrote:
       | Why not just use ncurses?
        
       | Bluestein wrote:
       | Gosh. Lovely project and cool, and - likewise - a bit _scary_ :
       | This is where the "bubble" seals itself "from the inside" and
       | custom (or cloud, biased) LLMs sear the "bubble" in.-
       | 
       | The ultimate rose (or red, or blue or black ...) coloured
       | glasses.-
        
       | mossTechnician wrote:
       | Changes Spegel made to the linked recipe's ingredients:
       | 
       | Pounds of lamb become kilograms (more than doubling the quantity
       | of meat), a medium onion turns large, one celery stalk becomes
       | two, six cloves of garlic turn into four, tomato paste vanishes,
       | we lose nearly half a cup of wine, beef stock gets an extra 3/4
       | cup, rosemary is replaced with oregano.
        
         | achierius wrote:
          | Did you actually observe this, or is it just meant to be
         | illustrative of what could happen?
        
           | mossTechnician wrote:
           | This is what actually happened in the linked article. The
           | recipe is around the text that says
           | 
           | > Sometimes you don't want to read through someone's life
           | story just to get to a recipe... That said, this is a great
           | recipe
           | 
           | I compared the list of ingredients to the screenshot, did a
           | couple unit conversions, and these are the discrepancies I
           | saw.
        
         | orliesaurus wrote:
         | oh damn...
        
         | jugglinmike wrote:
         | Great catch. I was getting ready to mention the theoretical
          | risk of asking an LLM to be your arbiter of truth; it didn't even
         | occur to me to check the chosen example for correctness. In a
         | way, this blog post is a useful illustration not just of the
         | hazards of LLMs, but also of our collective tendency to eschew
         | verity for novelty.
        
           | andrepd wrote:
           | > Great catch. I was getting ready to mention the theoretical
            | risk of asking an LLM to be your arbiter of truth; it didn't
           | even occur to me to check the chosen example for correctness.
           | 
           | It's beyond parody at this point. Shit just doesn't work, but
           | this fundamental flaw of LLMs is just waved away or simply
           | not acknowledged at all!
           | 
           | You have an algorithm that rewrites textA to textB (so nice),
            | where textB potentially has no relation to textA (oh no).
           | Were it anything else this would mean "you don't have an
           | algorithm to rewrite textA to textB", but for gen ai?
           | Apparently this is not a fatal flaw, it's not even a flaw at
           | all!
           | 
           | I should also note that there is no indication that this
           | fundamental flaw can be corrected.
        
         | simedw wrote:
         | Fantastic catch! It led me down a rabbit hole, and I finally
         | found the root cause.
         | 
         | The recipe site was so long that it got truncated before being
         | sent to the LLM. Then, based on the first 8000 characters,
         | Gemini hallucinated the rest of the recipe, it was definitely
         | in its training set.
         | 
         | I have fixed it and pushed a new version of the project. Thanks
         | again, it really highlights how we can never fully trust
         | models.
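One mitigation for the truncation bug simedw describes is to never cut the page silently: if the cleaned text exceeds the context budget, split it into explicitly labelled chunks so the model knows it is looking at a partial document rather than inventing the rest. The 8000-character budget comes from the comment above; the function name and label format are illustrative:

```python
MAX_CHARS = 8000  # per-request budget mentioned in the thread

def chunk_for_llm(text, limit=MAX_CHARS):
    """Split text into limit-sized chunks, each labelled so the model
    cannot mistake a fragment for the whole document."""
    if len(text) <= limit:
        return [text]
    chunks = [text[i:i + limit] for i in range(0, len(text), limit)]
    total = len(chunks)
    return [
        f"[PART {n} OF {total} -- do not invent content outside this part]\n{c}"
        for n, c in enumerate(chunks, start=1)
    ]
```

Whether the downstream model obeys the label is still probabilistic, but at least the failure mode becomes visible instead of a silent hallucination.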
        
       | jannniii wrote:
       | gopher is back!
        
       | IncreasePosts wrote:
       | I did something similar, but with a chrome extension. Basically,
       | for every web page, I feed the HTML to a local LLM (well, on a
       | server in my basement). I ask it to consider if the content is
       | likely clickbait or can be summarized without losing too many
       | interesting details, and if so, it adds a little floating icon to
       | the top of the page that I can click on to see the summary
       | instead.
       | 
       | My next plan is to rewrite hyperlinks to provide a summary of the
       | page on hover, or possibly to rewrite the hyperlinks to be more
        | indicative of the content at the end of it (no more complaining
       | about the titles of HN posts...). But, my machine isn't too beefy
       | and I'm not sure how well that will work, or how to prioritize
       | links on the page.
        
       | benrutter wrote:
       | Welcome to 2025 where it's more reasonable to filter all content
       | through an LLM than to expect web developers to make use of the
       | semantic web that's existed for more than a decade. . .
       | 
        | Seriously though, looks like a novel fix for the problem that
       | most terminal browsers face. Namely that terminals are text
       | based, but the web, whilst it contains text, is often subdivided
       | up in a way that only really makes sense graphically.
       | 
       | I wonder if a similar type of thing might work for screen readers
       | or other accessibility features
        
       | cout wrote:
       | This is a neat idea!
       | 
       | I wonder if it could be adapted to render as gopher pages.
        
       | Buttons840 wrote:
       | A step towards the future of ad-blocking maybe? Just rewrite
       | every page?
        
         | conradkay wrote:
         | Something tells me we'll see more ad-inserting
        
           | Modified3019 wrote:
           | >Companies burning energy with llms to dynamically hide ads
           | and bullshit on every pageload
           | 
           | >Individuals burning energy using personal llm internet
            | condoms to strip ads and bullshit from every pageload
           | 
           | Eventually there will be a project where volunteers use llms
           | to harvest the real internet and "launder" both the copyright
           | and content into some kind of pre-processed distributed
            | shadow internet where things are actually usable, while being
           | just as wrong as the real internet.
           | 
           | What a future.
        
       | revskill wrote:
       | Use uv instead of pip
        
       | tartoran wrote:
       | Loving the text only browsing. Is this as fast as in the preview?
        
       | eevmanu wrote:
       | great POC
       | 
       | looks very similar to a chrome extension i use for a similar
       | goal: reader view -
       | https://chromewebstore.google.com/detail/ecabifbgmdmgdllomnf...
        
       | deadbabe wrote:
       | I would like to see a version of this where an LLM just takes the
       | highlights of various social media content from your feed and
       | just gives you the stuff worth watching. This also means
       | excluding crap you had no interest in and was simply inserted
       | into your feed. Fight algorithms with algorithms. Eliminate doom
       | scrolling.
        
       ___________________________________________________________________
       (page generated 2025-07-01 23:00 UTC)