[HN Gopher] Show HN: Page Replica - Tool for Web Scraping, Prere...
___________________________________________________________________
Show HN: Page Replica - Tool for Web Scraping, Prerendering, and
SEO Boost
Author : nirvanist
Score : 108 points
Date : 2024-01-01 15:09 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| ssgodderidge wrote:
| Curious, how would caching the pages and serving them over NGINX
| help with SEO? Are there any benefits over serving a static site?
| nodja wrote:
| If your web app serves dynamic routes (i.e. client only) this
| helps with SEO because those routes are now directly visible
| through most crawlers.
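|
| A rough sketch of the usual setup, assuming nginx with a
| hypothetical /var/www/cache directory of prerendered HTML (the
| paths and user-agent list are illustrative, not from the
| project):
|
|     # Flag common crawler user agents (illustrative list)
|     map $http_user_agent $is_bot {
|         default                                       0;
|         "~*(googlebot|bingbot|yandexbot|duckduckbot)" 1;
|     }
|
|     server {
|         listen 80;
|         root /var/www/app;   # the normal JS app build
|
|         location / {
|             # Crawlers get the prerendered snapshot,
|             # everyone else gets the client-side app shell
|             if ($is_bot) {
|                 rewrite ^(.*)$ /cache$1/index.html last;
|             }
|             try_files $uri /index.html;
|         }
|
|         location /cache/ {
|             root /var/www;   # serves /var/www/cache/...
|         }
|     }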
| nirvanist wrote:
| Routes yes, but maybe not content. Most of the time Google
| Search Console will show that you have no content. Also, the
| Google Ads bot needs access to the content, so this is useful
| for getting your AdSense account validated quickly.
| jotto wrote:
| A "static site" implies HTML rather than a JavaScript app.
|
| With respect to JavaScript apps (React, Angular, etc.):
|
| It's not clear these days, because the major search engines
| don't explicitly clarify whether they parse JavaScript apps (or
| whether they only parse high-ranking JS apps/sites). But 10 years
| ago it was a must-have to be indexed.
|
| One theory on pre-rendering is that it reduces cost for the
| crawlers, since they don't need to spend 1-3s of CPU time
| rendering your site. And by reducing their costs, it may increase
| the chance of being indexed or ranked higher.
|
| My hunch is that long-term, pre-rendering is not necessary for
| getting indexed. But it is typically still necessary for URL
| unfurls (link previews) for various social media and chat apps.
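|
| Unfurlers generally only read meta tags out of the raw HTML
| response, so something like this has to be present without
| running any JS (values here are illustrative):
|
|     <!-- Link-preview bots read these from the raw HTML -->
|     <meta property="og:title" content="Example page title">
|     <meta property="og:description" content="Summary for previews">
|     <meta property="og:image" content="https://example.com/preview.png">
|     <meta name="twitter:card" content="summary_large_image">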
|
| disclosure: I operate https://headless-render-api.com
| nirvanist wrote:
| Absolutely, bots still need to access content as fast as
| possible.
| colesantiago wrote:
| It seems in the AI era SEO is now starting to become irrelevant
| and a relic of the 2000-2020 era.
|
| Why is SEO still needed here when AI / LLMs can just conjure up
| answers with references to valid links, bypassing search
| engines?
|
| Even privacy-based search engines like DuckDuckGo, Brave and Kagi
| don't prioritise 'SEO'.
| paulcole wrote:
| In the AI era, which services provide up to date information
| about local queries -- like which dentists near me are open
| today?
| nirvanist wrote:
| yep AI will not return a listing.
| LASR wrote:
| Any reasonable AI product that does this would use RAG under
| the hood.
|
| I hope nobody is using ChatGPT to query for information. This
| is how you get hallucinations.
| paulcole wrote:
| What is the reasonable AI product that actually does this?
| wongarsu wrote:
| I still use Google (and DDG and Kagi), so people who want to
| sell me stuff try to get better rankings in those search
| engines. I'd also wager that people who primarily use LLMs to
| answer their questions are only a rounding error.
| imiric wrote:
| How will AI tools know which sources are "valid"? It's likely
| SEO will transform into ways of tricking bots scraping training
| data into considering their information as being more "valid".
|
| Alternative search engines must rely on AI themselves to filter
| for good results, or on some form of manual curation by humans,
| like Kagi's boost/block/pin feature.
| spinningslate wrote:
| >Why is SEO still needed here when AI / LLMs can just conjure up
| answers with references to valid links, bypassing search
| engines?
|
| In short: money. LLMs will no doubt change the implementation,
| but the commercial dynamics are fundamentally the same. It's
| expensive to build and run a search engine, whether
| conventional or LLM-based. Someone has to pay for that - and
| it's not search users _. Advertising and its derivatives have
| become that revenue source, with all the good and bad that
| brings with it. As long as that commercial dynamic remains,
| there'll be SEO or some derivative thereof.
|
| --
|
| _Other than Kagi - but that's a tiny niche.
| 8organicbits wrote:
| Because AI doesn't provide accurate information and you need to
| validate it yourself? Has anyone who cared about SEO stopped
| recently?
| chrischen wrote:
| Because AI conjures up crap like this: https://www.nops.io/blog
| /k0s-vs-k3s-vs-k8s/#:~:text=While%20....
| rgrieselhuber wrote:
| SEO has been "dead" since the late 90s.
| hipadev23 wrote:
| Google's PageRank is why SEO became a thing, so I
| respectfully disagree with your timeline.
| rgrieselhuber wrote:
| SEO was a thing in the 90s, I know people who were doing it
| then.
| conradfr wrote:
| But there are still people doing it today.
| rgrieselhuber wrote:
| Yes, that's the point of my original comment.
| la_fayette wrote:
| I think we must distinguish between on-page and off-page SEO.
| This proposal is only relevant for on-page SEO, for which I
| would mostly agree with your comment. However, inbound links are
| and will remain the most important signal for search. What else
| would be left for ranking?
| mrtksn wrote:
| It's amusing how, after all these years, people keep building
| tools to solve problems that wouldn't have to exist had people
| stuck with the original Web architecture, where stuff is sent to
| the browser in full.
|
| In places where JS is truly useful, that is when the UI is more
| than a text document, SEO is not a concern or possibility.
|
| I mean, good luck, and I don't want to be a buzzkill, but IMHO
| the sad state of the Web and of search engines can be traced
| back to SEO and JavaScript misuse.
| pmx wrote:
| > In places where JS is truly useful, that is when the UI is
| more than a text document, SEO is not a concern or possibility.
|
| I don't think you've thought this through. E-commerce is more
| than just a document, and SEO is massively important there, but
| JavaScript makes the user experience miles better: think product
| variation selectors, bulk pricing selectors, product filtering,
| realtime carts, etc. It's insane to say we shouldn't use new
| tech so that the search engines can index us; do we just forever
| stick with what we had 15 years ago and never progress? Madness.
| mrtksn wrote:
| I should have been clearer: I'm not against JS, I'm a huge fan
| actually, but I don't see the promised land of better user
| experience in these overly complex architectures where the UI is
| JS-heavy and has to be pre-rendered on the server side so that
| the search engines can make sense of it. Larger and larger,
| heavier and heavier JS never delivered those perfected UI
| experiences that Web technologists promised.
|
| The Web technology folks still circle around the same problems
| as 10 years ago, while actually popular and successful websites
| like HN and others deliver superb UX with the dinosaur tech.
|
| Anything that actually uses the advanced Web technologies is not
| in the domain of SEO; think true web apps like Figma or Google
| Docs. At best, you can index a list of content for those, but
| that list could have been JS-free HTML rendered by PHP.
| nirvanist wrote:
| Even with PHP you need to use caching; the deal is to provide
| bots with content quickly :)
| mrtksn wrote:
| Sure, my complaint wasn't targeting your project but the need
| for it in the first place. Given the state of the web, this is a
| nice tool that does some useful things.
| davedx wrote:
| A lot of people on HN are very against JavaScript, and I often
| get the feeling they're academics or don't work in the
| web-software part of IT. The reality is that huge swathes of the
| web benefit from it whether people like it or not; this project
| is just one example.
| mrtksn wrote:
| As I'm the subject here: no, I'm not an academic, but I used to
| work with Web technologies. I quit working with web technologies
| at the time when a new JS framework was hitting the home page
| with a promise to make everything better. I quit because I lost
| my productivity: I got carried away with all this JS stuff and
| found myself in infinite complexity eating away my time and
| energy for less than marginal gains on the final product. It was
| extremely cool but utterly useless.
|
| Now that I'm on the receiving end of Web-based products, I
| assure you the experience is horrific. I don't even use search
| engines as much since AI went mainstream, and I couldn't be
| happier.
|
| Good riddance.
|
| Of course I do enjoy things made with the advanced Web
| technologies like Wasm (Figma, for example). Every now and then
| I'm impressed by some web app that does something I thought was
| not possible before.
|
| However, I'm still disgusted by documents that are just some
| images and text yet load tons of stuff that does nothing beyond
| degrading my experience, slowing down the response time or
| acting weird when I interact with them.
| xnx wrote:
| What's the use case? Scrape someone else's dynamic site and serve
| it statically as your own?
| digitcatphd wrote:
| Curious
| nirvanist wrote:
| And Reddit too, thank you for your feedback.
| nirvanist wrote:
| For me that's not the purpose, but you can do it if you want.
|
| The use case for me was that Meteor.js apps are poorly SEO
| friendly, and I needed prerendered HTML to serve to bots.
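|
| As a minimal sketch of the general prerendering idea (not the
| project's actual code; the URL and output path are made up), a
| headless browser can render a route and dump the resulting HTML
| to disk:
|
|     // prerender.js - rough sketch using Puppeteer (npm i puppeteer)
|     const fs = require('fs');
|     const puppeteer = require('puppeteer');
|
|     (async () => {
|       const browser = await puppeteer.launch();
|       const page = await browser.newPage();
|
|       // Wait until the network is idle so client-side rendering finishes
|       await page.goto('https://example.com/some-route', {
|         waitUntil: 'networkidle0',
|       });
|
|       // Capture the fully rendered DOM as static HTML
|       const html = await page.content();
|       fs.mkdirSync('cache/some-route', { recursive: true });
|       fs.writeFileSync('cache/some-route/index.html', html);
|
|       await browser.close();
|     })();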
| binarymax wrote:
| I read it as being able to dev your site in whatever you want,
| then scrape it and publish it as a static, SEO-optimized site.
| throwoutway wrote:
| That sounds sort of like an inefficient static site generator.
| amadeuspagel wrote:
| It sounds like a hack, which is after all what this website
| is dedicated to.
| janjones wrote:
| I have done something similar when archiving a dynamic site,
| serving it as a static snapshot for free.
| Lukkaroinen wrote:
| I wonder where this is actually needed, since most React
| frameworks support metadata with server-side rendering.
| nirvanist wrote:
| Thank you for the comment. While it's true that not all web
| applications leverage the React library, it's important to note
| that Next.js inherently supports React. However, the choice of
| technology stack depends on the specific use case and
| requirements of your project.
| CharlesW wrote:
| It's not about React specifically, but about whether pre-
| rendering will make any difference from an SEO perspective.
|
| Not only do most frameworks do SSR, but Google is able to
| crawl dynamic content just fine. Here's an article from 2015
| on the topic: https://searchengineland.com/tested-googlebot-
| crawls-javascr...
| ahoka wrote:
| Google renders all sites in Chrome according to their
| documentation. So why would you render it? AFAIK they even
| penalize serving different content to their crawlers.
| nirvanist wrote:
| Sure, but you also want your web app available to other bots.
| And you are not serving different content; it's the same
| content, you just avoid the JS layer so it gets served quickly.
| So far it has worked very well for my projects: Google Search
| Console showed a better ranking. It's not a new technique, and
| other services like prerender.io do the same thing, just not for
| free.
| nirvanist wrote:
| Companies still struggle with SEO. I've worked at two tech
| companies so far, and both utilized third-party services for
| SSR. The reason is that bots allow only a few seconds to render
| your HTML. If it's not available in that time frame, you can use
| Next.js; it's a good alternative. In any case, it remains useful
| for my own projects and it's free; maybe it can be useful for
| other devs.
| nkg wrote:
| Absolutely. We have a Next.js website and we faced some SEO
| problems due to the Google bot not being able to get to some
| pages. To be fair, we used too many sliders.
___________________________________________________________________
(page generated 2024-01-01 23:00 UTC)