[HN Gopher] Show HN: Spegel, a Terminal Browser That Uses LLMs t...
___________________________________________________________________
Show HN: Spegel, a Terminal Browser That Uses LLMs to Rewrite
Webpages
Author : simedw
Score : 291 points
Date : 2025-07-01 12:49 UTC (10 hours ago)
(HTM) web link (simedw.com)
(TXT) w3m dump (simedw.com)
| qsort wrote:
| This is actually very cool. Not really replacing a browser, but
| it could enable an alternative way of browsing the web with a
| combination of deterministic search and prompts. It would
| probably work even better as a command line tool.
|
| A natural next step could be doing things with multiple "tabs" at
| once, e.g: tab 1 contains news outlet A's coverage of a story,
| tab 2 has outlet B's coverage, tab 3 has Wikipedia; summarize and
| provide references. I guess the problem at that point is whether
| the underlying model can support this type of workflow, which
| doesn't really seem to be the case even with SOTA models.
| simedw wrote:
| Thank you.
|
| I was thinking of showing multiple tabs/views at the same time,
| but only from the same source.
|
| Maybe we could have one tab with the original content optimised
| for cli viewing, and another tab just doing fact checking (can
| ground it with google search or brave). Would be a fun
| experiment.
| wrsh07 wrote:
| Would really love to see more functionality built into this.
| Handling post requests, enabling scripting, etc could all be
| super powerful
| nextaccountic wrote:
| In your cleanup step, after cleaning obvious junk, I think
| you should do whatever Firefox's reader mode does to further
| clean up, and if that fails bail out to the current output.
| That should reduce the number of tokens you send to the LLM
| even more
|
| You should also have some way for the LLM to indicate there
| is no useful output because perhaps the page is supposed to
| be a SPA. This would force you to execute Javascript to
| render that particular page though
| simedw wrote:
| Just had a look and there is quite a lot going into
| Firefox's reader mode.
|
| https://github.com/mozilla/readability
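|
| A minimal sketch of that pre-clean idea, assuming the
| readability-lxml port of the same algorithm (hypothetical
| wiring, not the actual Spegel pipeline):

      from readability import Document  # readability-lxml

      def preclean(html: str) -> str:
          """Shrink a page to its main article before the
          LLM call; fall back to the raw HTML if the
          extraction comes back empty."""
          doc = Document(html)
          cleaned = doc.summary(html_partial=True)
          return cleaned if cleaned.strip() else html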
| myfonj wrote:
| Interestingly, the original idea of what we call a "browser"
| nowadays - the "user agent" - was built on the premise that
| each user has specific needs and preferences. The user agent
| was designed to act on their behalf, negotiating data
| transfers and resolving conflicts between content author and
| user (content consumer) preferences according to "strengths"
| and various reconciliation mechanisms.
|
| (The fact that browsers nowadays are usually expected to
| represent something "pixel-perfect" to everyone with similar
| devices is utterly against the original intention.)
|
| Yet the original idea was (due to the state of technical
| possibilities) primarily about design and interactivity. The
| fact that we now have tools to extend this concept to core
| language and content processing is... huge.
|
| It seems we're approaching the moment when our individual
| _personal_ agent, when asked about a new page, will tell us:
| Well, there's nothing new of interest for you, frankly:
| All information presented there was present on pages visited
| recently. -- or -- You've already learned
| everything mentioned there. (*) Here's a brief
| summary: ... (Do you want to dig deeper, see the
| content verbatim, or anything else?)
|
| Because its "browsing history" will also contain a notion of
| what we "know" from chats or what we had previously marked as
| "known".
| ffsm8 wrote:
| > Well, there's nothing new of interest for you, frankly
|
| For this to work like a user would want, the model would
| have to be sentient.
|
| But you could try to get there with current models, it'd
| just be very untrustworthy to the point of being pointless
| beyond a novelty
| myfonj wrote:
| Not any more "sentient" than existing LLMs even in the
| limited chat context span are already.
|
| Naturally, >>nothing new of interest for you<< here is
| really just a proxy for >>does not involve any
| significant concept that you haven't previously expressed
| knowledge about<< (or however you put it), which seems
| pretty doable, provided that a contract of "expressing
| knowledge about something" has been made beforehand.
|
| Let's say you have really grokked all the pages you have
| ever bookmarked (yes, a stretch, no "read it later"
| here) - then your personal model would be able to (again,
| figuratively) make a qualified guess about your
| knowledge. Or there could be some kind of tag that you
| could add to any browsing history entry, or fragment,
| indicating "I understand this". Or set the agent up to
| quiz you when leaving a page (that would be brutal).
| Or ... I think you get the gist by now.
| bee_rider wrote:
| It would have to have a pretty good model of my brain to
| help me make these decisions. Just as a random example, it
| would have to understand that an equation is a sort of thing
| that I'm likely to look up even if I understand the meaning
| of it, just to double check and get the particulars right.
| That's an obvious example, I think there must be other
| examples that are less obvious.
|
| Or that I'm looking up a data point that I already actually
| know, just because I want to provide a citation.
|
| But, it could be interesting.
| myfonj wrote:
| Well, we should first establish some sort of contract for
| how to convey "I feel that I actually understand this
| particular piece of information, so when confronted with
| it in the future, you can mark it as such". My lines of
| thought were more about a tutorial page that presents the
| same techniques as a course you finished a week prior, or
| a news page reporting on an event you just read about on
| a different news site a minute before ... stuff like this
| ... so you would potentially save the time spent
| skimming/reading/understanding only to realise there was
| no added value _for you in that particular moment_. Or,
| while scrolling through a comment section, hide comment
| parts repeating the same remark or joke.
|
| Or (and this is actually doable absolutely without any
| "AI" at all): What the bloody hell actually newly
| appeared on this particular URL since my last visit?
|
| (There is one page nearby that would be quite unusable
| for me, had I not a crude userscript aid for this
| particular purpose. But I can imagine having a digest
| about "What's new here?" / "Noteworthy responses?" would
| be way better.)
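|
| A rough sketch of that no-AI version, using nothing but
| Python's standard library (one plain-text snapshot per
| URL, stored in a made-up cache directory, diffed on the
| next visit):

      import difflib, hashlib, pathlib

      SNAPSHOTS = pathlib.Path.home() / ".page_snapshots"

      def whats_new(url: str, text: str) -> str:
          """Lines added since the previous visit to url."""
          SNAPSHOTS.mkdir(exist_ok=True)
          snap = SNAPSHOTS / hashlib.sha256(
              url.encode()).hexdigest()
          old = snap.read_text() if snap.exists() else ""
          snap.write_text(text)
          diff = difflib.unified_diff(
              old.splitlines(), text.splitlines(),
              lineterm="")
          return "\n".join(
              line[1:] for line in diff
              if line.startswith("+")
              and not line.startswith("+++"))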
|
| For the "I need to cite this source", naturally, you
| would want the "verbatim" view without any amendments
| anyway. Also probably before sharing / directing someone
| to the resource, looking at the "true form" would be
| still pretty necessary.
| idiotsecant wrote:
| I can definitely see a future in which we each have our
| own personal memetic firewall, keeping us safe and cozy in
| our personal little worldview bubbles.
| baq wrote:
| wonder if you can work on the DOM instead of HTML...
|
| almost unrelated, but you can also compare spegel to
| https://www.brow.sh/
| andrepd wrote:
| LLMs to generate SEO slop of the most utterly piss-poor
| quality, then another LLM to lossily "summarise" it back.
| Brave new world?
| TeMPOraL wrote:
| > _tab 1 contains news outlet A 's coverage of a story, tab 2
| has outlet B's coverage, tab 3 has Wikipedia; summarize and
| provide references._
|
| I _think_ this is basically what https://ground.news/ does.
|
| (I'm not affiliated with them; just saw them in the sponsorship
| section of a Kurzgesagt video the other day and figured they're
| doing the thing you described +/- UI differences.)
| doctoboggan wrote:
| I am a ground news subscriber (joined with a Kurzgesagt ref
| link) and it does work that way (minus the wikipedia
| summary). It's pretty good and I particularly like their
| "blindspot" section showing news that is generally missing
| from a specific partisan news bubble.
| bubblyworld wrote:
| Classic that the first example is for parsing the goddamn recipe
| from the goddamn recipe site. Instant thumbs up from me haha,
| looks like a neat little project.
| andrepd wrote:
| Which it apparently does by completely changing the recipe in
| random places including ingredients and amounts thereof. It is
| _indeed_ a very good microcosm of what LLMs are, just not in
| the way these comments think.
| throwawayoldie wrote:
| The output was then posted to the Internet for everyone to
| see, without the minimal amount of proofreading that would be
| necessary to catch that, which gives us a good microcosm of
| how LLMs are used.
|
| On a more pleasant topic the original recipe sounds
| delicious, I may give it a try when the weather cools off a
| little.
| simedw wrote:
| It was actually a bit worse than that: the LLM never got
| the full recipe due to some truncation logic I had added.
| So it regurgitated the recipe from its training data, and
| apparently it couldn't both do that and convert units at
| the same time with the Lite model (it worked with just
| Flash).
|
| I should have caught that, and there are probably other bugs
| too waiting to be found. That said, it's still a great
| recipe.
| andrepd wrote:
| You're missing the point, but okay.
| bubblyworld wrote:
| What do you mean? The recipes in the screenshot look more or
| less the same, the formatting has just changed in the Spegel
| one (which is what was asked for, so no surprises there).
|
| Edit: just saw the author's comment, I think I'm looking at
| the fixed page
| IncreasePosts wrote:
| There are extensions that do that for you, in a deterministic
| way and not relying on LLMs. For example, Recipe Filter for
| chrome. It just shows a pop up over the page when it loads if
| it detects a recipe
| bubblyworld wrote:
| Thanks, I already use that plugin, actually, I just found the
| problem amusingly familiar. Recipe sites are the original AI
| slop =P
| lpribis wrote:
| Another great example of LLM hype train re-inventing something
| that already existed [1] (and was actually thought out) but
| making it worse and non-deterministic in the worst ways
| possible.
|
| https://schema.org/Recipe
| ohadron wrote:
| This is a terrific idea and could also have a lot of value with
| regards to accessibility.
| taco_emoji wrote:
| The problem, as always, is that LLMs are not deterministic.
| Accessibility needs to be reliable and predictable above all
| else.
| pepperonipboy wrote:
| Could work great with emacs' eww!
| thephotonsphere wrote:
| also with lynx because it can browse from stdin
| sammy0910 wrote:
| I built a project that basically does this for emacs
|
| https://github.com/sstraust/simpleweb
| clbrmbr wrote:
| Suggestion: add a -p option:
|     spegel -p "extract only the product reviews" > REVIEWS.md
| sammy0910 wrote:
| I built something that did this a bit ago
|
| https://github.com/sstraust/simpleweb
| sammy0910 wrote:
| something I found challenging when I was building it was --
| how do you make it fast enough that it still feels like a
| smooth browsing experience?
|
| I'm curious how you tackled that problem
| simedw wrote:
| That's a cool project.
|
| I think most of it comes down to Flash-Lite being really
| fast, and the fact that I'm only outputting markdown, which
| is fairly easy and streams well.
| 4b11b4 wrote:
| https://github.com/sstraust/simpleweb/blob/79294b461b2e67a24.
| ..
|
| Not the answer to your question but here's the prompt
| busssard wrote:
| what does it do about javascript?
| anonu wrote:
| Don't you need javascript to make most webpages useful?
| inetknght wrote:
| Good sir, no.
|
| The web has existed for long before javascript was around.
|
| The web was useful for long before javascript was around.
|
| I literally hate javascript -- not the language itself but the
| way it is used. It has enabled some pretty cool things, yes.
| But javascript is not required to make useful webpages.
| pmxi wrote:
| I think you misunderstood him. Yes, it's possible to CREATE a
| useful webpage without JavaScript, but many EXISTING webpages
| rely on JavaScript to be functional.
| jazzyjackson wrote:
| If Amazon.com can work with JavaScript disabled, any site could
| be rewritten to do without. But I think to even get to the
| content on a lot of SPAs this would need to be running a
| headless browser to render the page, before extracting the
| static content unfortunately
| IncreasePosts wrote:
| No - an experiment: try disabling javascript in your browser
| settings, and then whenever you see a webpage that isn't
| working, enable javascript for that domain. You'd be surprised
| how fast 90% of the web feels with JS disabled.
| nicklo wrote:
| Have you considered making an MCP for this? Would be great for
| use in vibe-coding
| ktpsns wrote:
| Reminds me of https://www.brow.sh/ which is not AI related at all
| but just a very powerful terminal browser which in fact supports
| JS, even videos.
| cheevly wrote:
| Very cool! My retired AI agent transformed live webpage content,
| here's an old video clip of transforming HN to My Little Pony
| (with some annoying sounds):
| https://www.youtube.com/watch?v=1_j6cYeByOU. Skip to ~37 seconds
| for the outcome. I made an open-source standalone Chrome
| extension as well, it should probably still work for anyone
| curious: https://github.com/joshgriffith/ChromeGPT
| Klaster_1 wrote:
| Now that's a user agent!
| CaptainFever wrote:
| Finally, web browsers work for the user, not the website
| owners!
| adrianpike wrote:
| Super neat - I did something similar on a lark to enable useful
| "web browsing" over 1200 baud packet - I have Starlink back at my
| camp but might be a few miles away, so as long as I can get line
| of sight I can Google up stuff, albeit slow. Worked well but I
| never really productionalized it beyond some weekend tinkering.
| eniac111 wrote:
| Cool! It would be even better if it was able to create simple web
| pages for vintage browsers.
| stronglikedan wrote:
| That would violate the do-one-thing-and-do-it-well principle
| for no apparent benefit. There are plenty of tools to convert
| markdown to basic HTML already.
| treyd wrote:
| I wonder if you could use a less sophisticated model (maybe even
| something based on LSTMs) to walk over the DOM and extract just
| the chunks that should be emitted and collected into the
| browsable data structure, but doing it all locally. I feel like
| it'd be straightforward to generate training data for this, using
| an LLM-based toolchain like what the author wrote to be used
| directly.
| askonomm wrote:
| Unfortunately in the modern web simply walking the DOM doesn't
| cut it if the website's content loads in with JS. You could
| only walk the DOM once the JS has loaded, and all the requests
| it makes have finished, and at that point you're already using
| a whole browser renderer anyway.
| kccqzy wrote:
| Yeah but this project doesn't use JS anyway.
| deepdarkforest wrote:
| The main problem with these approaches is that most sites now are
| useless without JS or having access to the accessibility tree.
| Projects like browser-use or other DOM based approaches at least
| see the DOM(and screenshots).
|
| I wonder if you could turn this into a chrome extension that at
| least filters and parses the DOM
| jadbox wrote:
| I actually made a CLI tool recently that uses Puppeteer to
| render the page including JS, summarizes key info and actions,
| and enables simple form filling all from a CLI menu. I built it
| for my own use-cases (checking and paying power bills from
| CLI), but I'd love to get feedback on the core concept:
| https://github.com/jadbox/solomonagent
| andoando wrote:
| Dude I love this. I've been thinking of doing exactly
| this, but as a screen reader, for accessibility reasons.
| jadbox wrote:
| Thanks, it's alpha at the moment - next features are
| complex forms and fixing broken actions (downloading). Do
| give it a spin! You're welcome to contribute or drop
| feedback on the repo :)
| willsmith72 wrote:
| True for stuff requiring interaction, but to help their LCP/SEO
| lots of sites these days render plain html first. It's not
| "usable" but for viewing it's pretty good
| stared wrote:
| Any chance it would work for pages like Facebook or LinkedIn? I
| would love to have a distraction-free way of searching
| information there.
|
| Obviously, against wishes of these social networks, which want us
| to be addicted... I mean, engaged.
| simedw wrote:
| We'll probably have to add some custom code to log in, get an
| auth token, and then browse with it. Not sure if LinkedIn would
| like that, but I certainly would.
| aydyn wrote:
| Does anyone really get addicted to linkedin? Its so sanitized
| and clinical. Nobody acts real on there or even pretends to.
| encom wrote:
| The worst[1] part about losing my job last month was having
| to take LinkedIn seriously, and the best[2] part about now
| having found a new job is logging off LinkedIn, for a very
| long time hopefully. The self-aggrandising, pretentious,
| occasionally virtue signalling, performance-posting make me
| want to throw up. It takes a considerable amount of effort on
| my part to not make sarcastic shitposts, but in the interest
| of self preservation, I restrain myself. My header picture,
| however, is my extremely messy desk, full of electronics,
| tools, test equipment, drawings, computers and coffee cups.
| Because that's just how I work when I'm in the zone, and it
| serves as a quiet counterpoint to the polished self-promotion
| people do.
|
| And I didn't even get the new job through LinkedIn, though it
| did yield one interview.
|
| [1] Not the actual worst.
|
| [2] Not the actual best.
| fzaninotto wrote:
| Congrats! Now you need an entire datacenter to visualize a web
| page.
| juujian wrote:
| Couldn't this run reasonably well on a local machine if you
| have some kind of neural processing chip and enough RAM?
| Conversion to MD shouldn't require a huge model.
| busssard wrote:
| only if you use an API and not a dedicated distill/tune for
| html to MD conversion.
|
| But the question of Javascript remains
| b0a04gl wrote:
| this is another layer of abstraction on top of an already broken
| system. you're running html through an llm to get markdown that
| gets rendered in a terminal browser. that's like... three format
| conversions just to read text. the original web had simple html
| that was readable in any terminal browser already. now they arent
| designed as documents anymore but rather designed as applications
| that happen to deliver some content as a side effect
| MangoToupe wrote:
| That's the world we live in. You can either not have access to
| content or you must accept abstractions to remove all the bad
| decisions browser vendors have forced on us the last 30 years
| to support ad-browsing.
| _joel wrote:
| > this is another layer of abstraction on top of an already
| broken system
|
| pretty much like all modern computing then, hey.
| nashashmi wrote:
| Think of it as a secretary that is transforming and formatting
| information. You might wish the original medium were
| already in the form you want, but you don't get that, so
| you get a cheap, dumber secretary instead.
| worldsayshi wrote:
| If the web site is a SPA that is hydrated using an API it would
| be conceivable that the LLM can build a reusable interface
| around the API while taking inspiration from the original page.
| That interface can then be stored in some cache.
|
| I'm not saying it's necessarily a good idea but perhaps a
| bad/fun idea that can inspire good ideas?
| amelius wrote:
| I take it you never use "Reader mode" in your browser?
| jrm4 wrote:
| I 100% agree -- but still I find this a feature and not a bug.
| It's always an arms race, and I like this shot fired.
| 098799 wrote:
| You could also use headless Selenium under the hood and pipe
| the entire DOM of the document to the model after the
| JavaScript has loaded. Of course it would make things much
| slower, but it would also address the main worry people have,
| which is that many websites will flat out not show anything
| in the initial GET request.
| busssard wrote:
| Can you flesh this out a tiny bit? Because for indie
| crawlers the JavaScript rendering is the main problem.
| 098799 wrote:
| Here's a sketch: https://chatgpt.com/share/68640b97-9a48-8007
| -a27c-fdf85ff412... -- selenium drives your actual browser
| under the hood.
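|
| The page-fetch part is only a few lines (a sketch assuming
| Chrome and the selenium package; the fixed sleep is a
| deliberately naive way to let scripts settle):

      import time
      from selenium import webdriver
      from selenium.webdriver.chrome.options import Options

      def rendered_html(url: str, settle: float = 2.0) -> str:
          """Return the DOM after client-side JS has run."""
          opts = Options()
          opts.add_argument("--headless=new")
          driver = webdriver.Chrome(options=opts)
          try:
              driver.get(url)
              time.sleep(settle)  # crude: let scripts finish
              return driver.page_source
          finally:
              driver.quit()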
| web3aj wrote:
| Very cool. I've been interested in browsing the web directly from
| my terminal; this feels accessible.
| insane_dreamer wrote:
| Interesting, but why round-trip through an LLM just to convert
| HTML to Markdown?
| markstos wrote:
| Because the modern web isn't reliably HTML, it's "web apps"
| with heavy use of JavaScript and API calls. To first display
| the HTML that you see in your browser, you need a user agent
| that runs JavaScript and makes all the backend calls that
| Chrome would make to put together some HTML.
|
| Some websites may still return some static HTML upfront that could
| be usefully understood without JavaScript processing, but a lot
| don't.
|
| That's not to say you need an LLM, there are projects like
| Puppeteer that are like headless browsers that can return the
| rendered HTML, which can _then_ be sent through an HTML to
| Markdown filter. That would be less computationally intensive.
| insane_dreamer wrote:
| > That's not to say you need an LLM, ... then be sent through
| an HTML to Markdown filter. That would be less
| computationally intensive.
|
| which was exactly my point
| crent wrote:
| Because this isn't just converting HTML to markdown. I'd
| recommend taking another look at the website and particularly
| read the recipe example as it demonstrates the goal of the
| project pretty well.
| nashashmi wrote:
| You should call this software a lens and filter instead of a
| mirror. It takes the essential information and transforms it into
| another medium.
| amelius wrote:
| Can it strip ads?
| tossandthrow wrote:
| It can inject its own!
| amelius wrote:
| You have a point as it uses Gemini under the hood. However,
| the moment Google introduces ads in the model users will run
| away. So Google really has no opportunity here to inject ads.
|
| And wouldn't it be ironic if Gemini was used to strip ads
| from webpages?
| tossandthrow wrote:
| The field of "seo for Ai", ie, seeking to have your company
| featured in LLMs, is already established.
|
| In the rare cases where the model would jam on its own,
| this will likely already happen.
| mromanuk wrote:
| I definitely like the LLM in the middle, it's a nice way to
| circumvent the SEO machine and how Google has optimized writing
| in recent years. Removing all the cruft from a recipe is a
| brilliant case for an LLM. And I suspect more of this is coming:
| LLMs to filter. I mean, it would be nice to just read the recipe
| from HTML, but SEO has turned everything into an arms race.
| hirako2000 wrote:
| Do you also like what it costs you to browse the web via an LLM
| potentially swallowing millions of tokens per minute?
| prophesi wrote:
| This seems like a suitable job for a small language model.
| Bit biased since I just read this paper[0]
|
| [0] https://research.nvidia.com/labs/lpr/slm-agents/
| yellow_lead wrote:
| LLM adds cruft, LLM removes cruft, never a miscommunication
| visarga wrote:
| I foresaw this a couple of years ago. We already have web search
| tools in LLMs, and they are amazing when they chain multiple
| searches. But Spegel is a completely different take.
|
| I think the ad blocker of the future will be a local LLM, small
| and efficient. Want to sort your timeline chronologically? Or
| want a different UI? Want some things removed, and others
| promoted? Hide low quality comments in a thread? All are
| possible with LLM in the middle, in either agent or proxy mode.
|
| I bet this will be unpleasant for advertisers.
| tines wrote:
| > Removing all the cruft from a recipe is a brilliant case for
| an LLM
|
| Is it though, when the LLM might mutate the recipe
| unpredictably? I can't believe people trust probabilistic
| software for cases that cannot tolerate error.
| kccqzy wrote:
| I agree with you in general, but recipes are not a case where
| precision matters. I sometimes ask LLMs to give me a recipe
| and if it hallucinates something it will simply taste bad.
| Not much different from a human-written recipe where the
| human has drastically different tastes than I do. Also you
| basically never apply the recipe blindly; you have intuition
| from years of cooking to know you need to adjust recipes to
| taste.
| tines wrote:
| Huh? You don't care if an LLM switches pounds to kilograms
| because... recipes might taste bad anyway????
| kccqzy wrote:
| Switching pounds with kilograms is off by a factor of
| two. Most people capable of cooking should have the
| intuition to know something is awfully wrong if you are
| off by a factor of two, especially since pounds and
| kilograms are fairly large units when it comes to
| cooking.
| Uehreka wrote:
| Hard disagree. I don't have "years of cooking" experience
| to draw from necessarily. If I'm looking up a recipe it's
| because I'm out of my comfort zone, and if the LLM version
| of the recipe says to add 1/2 cup of paprika I'm not gonna
| intuitively know that the right amount was actually 1
| teaspoon. Well, at least until I eat the dish and realize
| it's total garbage.
|
| Also like, forget amounts, cook times are super important
| and not always intuitive. If you screw them up you have to
| throw out all your work and order take out.
| kccqzy wrote:
| All I'm arguing is that you should have the intuition to
| know the difference between 1/2 cup of paprika and a
| teaspoon. Okay maybe if you just graduated from college
| and haven't cooked much you could make such a mistake but
| realistically outside the tech bubble of HN you won't
| find people confusing 1/2 cup with a teaspoon. It's just
| intuitively wrong. An entire bottle of paprika I recently
| bought has only 60 grams.
|
| And yes cook times are important but no, even for a
| human-written recipe you need the intuition to apply
| adjustments. A recipe might be written presuming a
| powerful gas burner but you have a cheap underpowered
| electric. Or the recipe asks for a convection oven but
| your oven doesn't have the feature. Or the recipe
| presumes a 1100W microwave but you have a 1600W one. You
| stand by the food while it cooks. You use a food
| thermometer if needed.
| whatevertrevor wrote:
| Not really an apt comparison.
|
| For one an AI generated recipe could be something that no
| human could possibly like, whereas the human recipe comes
| with at least one recommendation (assuming good faith on
| the source, which you're doing anyway LLM or not).
|
| Also an LLM may generate things that are downright inedible
| or even toxic, though the latter is probably unlikely even
| if possible.
|
| I personally would never want to spend roughly an hour or
| so making bad food from a hallucinated recipe wasting my
| ingredients in the process, when I could have spent at most
| 2 extra minutes scrolling down to find the recommended
| recipe to avoid those issues. But to each their own I
| guess.
| joshvm wrote:
| There is a well-defined solution to this. Provide your
| recipes as a Recipe schema: https://schema.org/Recipe
|
| Seems like most of the usual food blog plugins use it,
| because it allows search engines to report calories and star
| ratings without having to rely on a fuzzy parser. So while
| the experience sucks for users, search engines use the
| structured data to show carousels with overviews, calorie
| totals and stuff like that.
|
| https://recipecard.io/blog/how-to-add-recipe-structured-
| data...
|
| https://developers.google.com/search/docs/guides/intro-
| struc...
|
| EDIT: Sure enough, if you look at the OPs recipe example, the
| schema is in the source. So for certain examples, you would
| probably be better off having the LLM identify that it's a
| recipe website (or other semantic content), extract the
| schema from the header and then parse/render it
| deterministically. This seems like one of those context-
| dependent things: getting an LLM to turn a bunch of JSON into
| markdown is fairly reliable. Getting it to extract that from
| an entire HTML page potentially clutters the context,
| but you could separate the two and have one agent summarise
| any of the steps in the blog that might be pertinent.
| {"@context":"https://schema.org/","@type":"Recipe","name":"Sl
| owly Braised Lamb Ragu ...
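|
| A sketch of that extract-then-render-deterministically
| route (assuming BeautifulSoup; real pages may also nest the
| recipe inside an @graph array, which this ignores):

      import json
      from bs4 import BeautifulSoup

      def find_recipe(html: str) -> dict | None:
          """Return the schema.org Recipe object from the
          page's JSON-LD blocks, if one is present."""
          soup = BeautifulSoup(html, "html.parser")
          for tag in soup.find_all(
                  "script", type="application/ld+json"):
              try:
                  data = json.loads(tag.string or "")
              except json.JSONDecodeError:
                  continue
              items = data if isinstance(data, list) else [data]
              for item in items:
                  if (isinstance(item, dict)
                          and item.get("@type") == "Recipe"):
                      return item
          return None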
| kelsey98765431 wrote:
| People here are not realizing that html is just the start. If you
| can render a webpage into a view, you can render any input the
| model accepts. PDF to this view. Zip file of images to this view.
| Giant json file into this view. Whatever. The view is the product
| here, not the html input.
| nartho wrote:
| I think the project itself is really cool; that said, I really
| don't like the trend of having LLMs regurgitate content back to
| us. On another note, this kinda makes me think of Browsh, which
| takes the opposite approach and tries to render the HTML in the
| terminal (without LLMs, as far as I know)
|
| https://github.com/browsh-org/browsh
| https://www.youtube.com/watch?v=HZq86XfBoRo
| hirako2000 wrote:
| That would also keep your wallet or GPU rig cooler
| hyperific wrote:
| Why not use pandoc to convert html to markdown and have the LLM
| condense from there?
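|
| That split is cheap to prototype; a sketch of the
| deterministic half, shelling out to pandoc so the model
| only has to condense already-clean Markdown:

      import subprocess

      def html_to_markdown(html: str) -> str:
          """Deterministic HTML -> Markdown via pandoc."""
          result = subprocess.run(
              ["pandoc", "-f", "html", "-t", "gfm",
               "--wrap=none"],
              input=html, capture_output=True,
              text=True, check=True)
          return result.stdout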
| cyrillite wrote:
| I have been thinking of a project extremely similar to this for a
| totally different purpose. It's lovely to see something like
| this. Thank you for sharing it, inspiring
| amelius wrote:
| Curious about that different purpose ...
| __MatrixMan__ wrote:
| It would be cool if it were smart enough to figure out whether it
| was necessary to rewrite the page on every visit. There's a large
| chunk of the web where one of us could visit once, rewrite to
| markdown, and then serve the cleaned up version to each other
| without requiring a distinct rebuild on each visit.
| pmxi wrote:
| The author says this is for "personalized views using your own
| prompts." Though, I suppose it's still useful to cache the
| outputs for the default prompt.
| __MatrixMan__ wrote:
| Or to cache the output for whatever prompt your peers think
| is most appropriate for that particular site.
| myfonj wrote:
| Each user has distinct needs and distinct prior knowledge
| about the topic, so even the "raw" super-clean source form
| will probably end up adjusted differently for most users.
|
| But yes, having some global shared redundant P2P cache (of the
| "raw" data), like IPFS (?) could possibly help and save some
| processing power and help with availability and data
| preservation.
| __MatrixMan__ wrote:
| I imagine it sort of like a microscope. For any chunk of data
| that people bothered to annotate with prompts re: how it
| should be rendered you'd end up with two or three "lenses"
| that you could toggle between. Or, if the existing lenses
| don't do the trick, you could publish your own and, if your
| immediate peers find them useful, maybe your transitive peers
| will end up knowing about them as well.
| simedw wrote:
| If the goal is to have a more consistent layout on each visit,
| I think we could save the last page's markdown and send it to
| the model as a one-shot example...
| markstos wrote:
| Cache headers exist for servers to communicate to clients how
| long it is safe to cache things for. The client could be
| updated to add a cache layer that respects those headers.
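|
| A sketch of that layer on the rewrite side, keyed on URL
| plus prompt (the cache directory name is made up, and
| max_age could be taken from the response's Cache-Control
| header instead of a constant):

      import hashlib, json, pathlib, time

      CACHE = pathlib.Path.home() / ".spegel_cache"  # made up

      def _entry(url: str, prompt: str) -> pathlib.Path:
          key = hashlib.sha256(
              f"{url}\n{prompt}".encode()).hexdigest()
          return CACHE / f"{key}.json"

      def get_cached(url, prompt, max_age=3600):
          """Return a previous rewrite if still fresh."""
          path = _entry(url, prompt)
          if not path.exists():
              return None
          record = json.loads(path.read_text())
          if time.time() - record["ts"] > max_age:
              return None
          return record["markdown"]

      def put_cached(url, prompt, markdown):
          CACHE.mkdir(exist_ok=True)
          _entry(url, prompt).write_text(json.dumps(
              {"ts": time.time(), "markdown": markdown}))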
| WD-42 wrote:
| Does anyone know why LLMs love emojis so much?
| coder543 wrote:
| Just a typo note: the flow diagram in the article says "Gemini
| 2.5 Pro Lite", but there is no such thing.
| simedw wrote:
| You are right, it's Gemini 2.5 Flash Lite
| neocodesoftware wrote:
| Does it fail cloudflare captcha?
| willm wrote:
| Why not just use ncurses?
| Bluestein wrote:
| Gosh. Lovely project and cool, and - likewise - a bit _scary_ :
| This is where the "bubble" seals itself "from the inside" and
| custom (or cloud, biased) LLMs sear the "bubble" in.-
|
| The ultimate rose (or red, or blue or black ...) coloured
| glasses.-
| mossTechnician wrote:
| Changes Spegel made to the linked recipe's ingredients:
|
| Pounds of lamb become kilograms (more than doubling the quantity
| of meat), a medium onion turns large, one celery stalk becomes
| two, six cloves of garlic turn into four, tomato paste vanishes,
| we lose nearly half a cup of wine, beef stock gets an extra 3/4
| cup, rosemary is replaced with oregano.
| achierius wrote:
| Did you actually observe this, or is just meant to be
| illustrative of what could happen?
| mossTechnician wrote:
| This is what actually happened in the linked article. The
| recipe is around the text that says
|
| > Sometimes you don't want to read through someone's life
| story just to get to a recipe... That said, this is a great
| recipe
|
| I compared the list of ingredients to the screenshot, did a
| couple unit conversions, and these are the discrepancies I
| saw.
| orliesaurus wrote:
| oh damn...
| jugglinmike wrote:
| Great catch. I was getting ready to mention the theoretical
| risk of asking an LLM to be your arbiter of truth; it didn't even
| occur to me to check the chosen example for correctness. In a
| way, this blog post is a useful illustration not just of the
| hazards of LLMs, but also of our collective tendency to eschew
| verity for novelty.
| andrepd wrote:
| > Great catch. I was getting ready to mention the theoretical
| risk of asking an LLM be your arbiter of truth; it didn't
| even occur to me to check the chosen example for correctness.
|
| It's beyond parody at this point. Shit just doesn't work, but
| this fundamental flaw of LLMs is just waved away or simply
| not acknowledged at all!
|
| You have an algorithm that rewrites textA to textB (so nice),
| where textB potentially has no relation to textA (oh no).
| Were it anything else this would mean "you don't have an
| algorithm to rewrite textA to textB", but for gen ai?
| Apparently this is not a fatal flaw, it's not even a flaw at
| all!
|
| I should also note that there is no indication that this
| fundamental flaw can be corrected.
| simedw wrote:
| Fantastic catch! It led me down a rabbit hole, and I finally
| found the root cause.
|
| The recipe site was so long that it got truncated before being
| sent to the LLM. Then, based on the first 8000 characters,
| Gemini hallucinated the rest of the recipe; it was definitely
| in its training set.
|
| I have fixed it and pushed a new version of the project. Thanks
| again, it really highlights how we can never fully trust
| models.
| jannniii wrote:
| gopher is back!
| IncreasePosts wrote:
| I did something similar, but with a chrome extension. Basically,
| for every web page, I feed the HTML to a local LLM (well, on a
| server in my basement). I ask it to consider if the content is
| likely clickbait or can be summarized without losing too many
| interesting details, and if so, it adds a little floating icon to
| the top of the page that I can click on to see the summary
| instead.
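|
| The model call itself is small; a sketch assuming an
| Ollama-style endpoint on the home server (the URL and model
| name are placeholders):

      import json
      import urllib.request

      OLLAMA = "http://basement-box:11434/api/generate"

      PROMPT = (
          "Here is a web page's text. If it is clickbait or "
          "can be summarized without losing much, reply with "
          "a short summary; otherwise reply with exactly "
          "SKIP.\n\n{text}")

      def summarize(page_text, model="llama3.2"):
          """Return a summary, or None if not worthwhile."""
          body = json.dumps({
              "model": model,
              "prompt": PROMPT.format(text=page_text),
              "stream": False}).encode()
          req = urllib.request.Request(
              OLLAMA, data=body,
              headers={"Content-Type": "application/json"})
          with urllib.request.urlopen(req) as resp:
              answer = json.loads(
                  resp.read())["response"].strip()
          return None if answer == "SKIP" else answer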
|
| My next plan is to rewrite hyperlinks to provide a summary of the
| page on hover, or possibly to rewrite the hyperlinks to be more
| indicative of the content at the end of it(no more complaining
| about the titles of HN posts...). But, my machine isn't too beefy
| and I'm not sure how well that will work, or how to prioritize
| links on the page.
| benrutter wrote:
| Welcome to 2025 where it's more reasonable to filter all content
| through an LLM than to expect web developers to make use of the
| semantic web that's existed for more than a decade...
|
| Seriously though, looks like a novel fix for the problem that
| most terminal browsers face. Namely that terminals are text
| based, but the web, whilst it contains text, is often subdivided
| up in a way that only really makes sense graphically.
|
| I wonder if a similar type of thing might work for screen readers
| or other accessibility features
| cout wrote:
| This is a neat idea!
|
| I wonder if it could be adapted to render as gopher pages.
| Buttons840 wrote:
| A step towards the future of ad-blocking maybe? Just rewrite
| every page?
| conradkay wrote:
| Something tells me we'll see more ad-inserting
| Modified3019 wrote:
| >Companies burning energy with llms to dynamically hide ads
| and bullshit on every pageload
|
| >Individuals burning energy using personal llm internet
| condoms to strips ads and bullshit from every pageload
|
| Eventually there will be a project where volunteers use llms
| to harvest the real internet and "launder" both the copyright
| and content into some kind of pre-processed distributed
| shadow internet where things are actually usable, while being
| just as wrong as the real internet.
|
| What a future.
| revskill wrote:
| Use uv instead of pip
| tartoran wrote:
| Loving the text only browsing. Is this as fast as in the preview?
| eevmanu wrote:
| great POC
|
| looks very similar to a Chrome extension I use for a similar
| goal: reader view -
| https://chromewebstore.google.com/detail/ecabifbgmdmgdllomnf...
| deadbabe wrote:
| I would like to see a version of this where an LLM just takes the
| highlights of various social media content from your feed and
| just gives you the stuff worth watching. This also means
| excluding crap you had no interest in and was simply inserted
| into your feed. Fight algorithms with algorithms. Eliminate doom
| scrolling.
___________________________________________________________________
(page generated 2025-07-01 23:00 UTC)