[HN Gopher] Show HN: Page Replica - Tool for Web Scraping, Prere...
       ___________________________________________________________________
        
       Show HN: Page Replica - Tool for Web Scraping, Prerendering, and
       SEO Boost
        
       Author : nirvanist
       Score  : 108 points
       Date   : 2024-01-01 15:09 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | ssgodderidge wrote:
       | Curious, how would caching the pages and serving over NGINX help
       | with SEO? Are there any benefits over serving a static site?
        
         | nodja wrote:
         | If your web app serves dynamic routes (i.e. client only) this
         | helps with SEO because those routes are now directly visible
         | through most crawlers.
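A minimal Python sketch of the routing decision nodja describes. It assumes the prerendered pages are kept in a dict that maps each path to its HTML, and that crawlers are recognized by User-Agent substrings. Crawlers get the stored snapshot, and everyone else gets the JavaScript shell. `BOT_PATTERN`, `choose_response`, and the sample pages are hypothetical. This is not Page Replica's code; the thread only says the tool serves cached pages through NGINX.

```python
import re

# Hypothetical, deliberately short list of crawler User-Agent substrings.
# Real deployments maintain much longer lists.
BOT_PATTERN = re.compile(
    r"googlebot|bingbot|duckduckbot|twitterbot|facebookexternalhit|slackbot",
    re.IGNORECASE,
)


def choose_response(user_agent: str, path: str,
                    snapshots: dict[str, str], spa_shell: str) -> str:
    """Return prerendered HTML for crawlers when a snapshot exists.

    Human visitors, and crawlers asking for a path with no snapshot,
    receive the client-side app shell, which fills in content via JS.
    """
    if BOT_PATTERN.search(user_agent or "") and path in snapshots:
        return snapshots[path]
    return spa_shell
```

The key point is that the snapshot carries the same content users eventually see. Only the JavaScript step is skipped for the crawler.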
        
           | nirvanist wrote:
           | Routes, yes, but maybe not content. Most of the time Google
           | Search Console will show that you have no content. Also, the
           | Google Ads bot needs access to the content; this is useful for
           | getting your AdSense account validated quickly.
        
         | jotto wrote:
         | A "static site" implies HTML rather than a JavaScript app.
         | 
         | With respect to JavaScript apps (React, Angular, etc.):
         | 
         | It's not clear these days because the major search engines
         | don't explicitly clarify whether they parse JavaScript apps (or
         | if they only parse high-ranking JS apps/sites). But 10 years ago
         | it was a must-have to be indexed.
         | 
         | One theory on pre-rendering is it reduces cost for the crawlers
         | since they don't need to spend 1-3s of CPU time pre-rendering
         | your site. And by reducing costs, it may increase chances of
         | being indexed or higher rank.
         | 
         | My hunch is that long-term, pre-rendering is not necessary for
         | getting indexed. But it is typically still necessary for URL
         | unfurls (link previews) for various social media and chat apps.
         | 
         | disclosure: I operate https://headless-render-api.com
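jotto's point about URL unfurls can be illustrated with a small check. It assumes that a link-preview bot reads Open Graph `<meta property="og:...">` tags from the raw HTML without executing JavaScript. That is common, but it is not guaranteed for every service. The sketch uses Python's standard `html.parser`. `preview_tags` and the sample pages are hypothetical.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> also arrives here: the default
        # handle_startendtag calls handle_starttag.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        prop = attr_map.get("property") or ""
        content = attr_map.get("content")
        if prop.startswith("og:") and content is not None:
            self.tags[prop] = content


def preview_tags(raw_html: str) -> dict[str, str]:
    """Return the Open Graph tags an unfurler would see without running JS."""
    parser = OpenGraphParser()
    parser.feed(raw_html)
    parser.close()
    return parser.tags
```

Run against a client-only app shell, this returns an empty dict, because the tags would only be injected later by JavaScript. Run against a prerendered page, it returns the title and image a chat app or social network needs to build a preview.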
        
           | nirvanist wrote:
           | Absolutely, bots still need to access content as fast as
           | possible.
        
       | colesantiago wrote:
       | It seems in the AI era SEO is now starting to become irrelevant
       | and a relic of the 2000-2020 era.
       | 
       | Why is SEO still needed here when AI / LLMs can just conjure up
       | answers with references to valid links, bypassing search
       | engines?
       | 
       | Even privacy-based search engines like DuckDuckGo, Brave and Kagi
       | don't prioritise 'SEO'.
        
         | paulcole wrote:
         | In the AI era, which services provide up-to-date information
         | about local queries -- like which dentists near me are open
         | today?
        
           | nirvanist wrote:
           | Yep, AI will not return a listing.
        
           | LASR wrote:
           | Any reasonable AI product that does this would use RAG under
           | the hood.
           | 
           | I hope nobody is using ChatGPT to query for information. This
           | is how you get hallucinations.
        
             | paulcole wrote:
             | What is the reasonable AI product that actually does this?
        
         | wongarsu wrote:
         | I still use Google (and ddg and kagi), so people who want to
         | sell me stuff try to get better rankings in these search
         | engines. I'd also wager that people who primarily use LLMs to
         | answer their questions are only a rounding error.
        
         | imiric wrote:
         | How will AI tools know which sources are "valid"? It's likely
         | SEO will transform into ways of tricking bots scraping training
         | data into considering their information as being more "valid".
         | 
         | Alternative search engines must rely on AI themselves to filter
         | out good results, or some form of manual curation by humans,
         | like Kagi's boost/block/pin feature.
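This is a rough illustration of the kind of manual curation imiric mentions: a hypothetical per-domain reranker in Python. It is not Kagi's implementation. It only assumes that "block" drops a domain, "pin" moves it to the top, and "boost" moves it above unmarked results. Hostnames are matched exactly, with no subdomain handling.

```python
from urllib.parse import urlsplit


def rerank(results: list[str], pinned: set[str],
           boosted: set[str], blocked: set[str]) -> list[str]:
    """Reorder result URLs by per-domain user preferences.

    Blocked domains are dropped, pinned domains go first, boosted
    domains next; everything else keeps its original relative order.
    """
    def host(url: str) -> str:
        return urlsplit(url).hostname or ""

    def tier(url: str) -> int:
        h = host(url)
        if h in pinned:
            return 0
        if h in boosted:
            return 1
        return 2

    kept = [u for u in results if host(u) not in blocked]
    # sorted() is stable, so results within a tier keep the engine's order.
    return sorted(kept, key=tier)
```

The design choice is that user curation only reorders or filters what the engine already returned. Deciding which sources are "valid" in the first place is still left to the engine.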
        
         | spinningslate wrote:
         | >Why is SEO still needed here when AI / LLMs can just
         | conjure up answers with references to valid links, bypassing
         | search engines?
         | 
         | In short: money. LLMs will no doubt change the implementation,
         | but the commercial dynamics are fundamentally the same. It's
         | expensive to build and run a search engine, whether
         | conventional or LLM-based. Someone has to pay for that - and
         | it's not search users*. Advertising and its derivatives have
         | become that revenue source, with all the good and bad that
         | brings with it. As long as that commercial dynamic remains,
         | there'll be SEO or some derivative thereof.
         | 
         | --
         | 
         | *Other than Kagi - but that's a tiny niche.
        
         | 8organicbits wrote:
         | Because AI doesn't provide accurate information and you need to
         | validate it yourself? Has anyone who cared about SEO stopped
         | recently?
        
         | chrischen wrote:
         | Because AI conjures up crap like this:
         | https://www.nops.io/blog/k0s-vs-k3s-vs-k8s/#:~:text=While%20....
        
         | rgrieselhuber wrote:
         | SEO has been "dead" since the late 90s.
        
           | hipadev23 wrote:
           | Google's PageRank is why SEO became a thing, so I
           | respectfully disagree with your timeline.
        
             | rgrieselhuber wrote:
             | SEO was a thing in the 90s, I know people who were doing it
             | then.
        
               | conradfr wrote:
               | But there's still people doing it today.
        
               | rgrieselhuber wrote:
               | Yes, that's the point of my original comment.
        
         | la_fayette wrote:
         | I think we must distinguish between on-page and off-page SEO.
         | This proposal is only relevant for on-page SEO, for which I
         | would mostly agree with your comment. However, inbound links
         | are and will be the most important signal for search. What else
         | would be left for ranking?
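To make the "inbound links as a ranking signal" argument concrete, here is the textbook PageRank power iteration in Python. PageRank is the algorithm hipadev23 credits with making SEO a thing. This is a teaching sketch with the conventional damping factor of 0.85. It does not describe how any search engine ranks pages today, and the link graph in the example is made up.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank via power iteration.

    `links` maps each page to the pages it links out to. Pages with no
    outbound links ("dangling" pages) spread their score evenly over all
    pages, so the scores always sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score across the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its score uniformly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank
```

In a graph such as `{"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}`, `hub` ends up with the highest score, purely because three pages link to it. Nothing about its own content matters here, which is la_fayette's distinction between on-page and off-page signals.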
        
       | mrtksn wrote:
       | It's amusing how after all these years people keep building tools
       | to solve problems that wouldn't have to exist had people stuck
       | with the original Web architecture, where stuff is sent to the
       | browser in full.
       | 
       | In places where JS is truly useful, that is when the UI is more
       | than a text document, SEO is not a concern or possibility.
       | 
       | I mean, good luck, I don't want to be a buzzkill, but IMHO the sad
       | state of the Web and the search engines can be traced back to SEO
       | and JavaScript misuse.
        
         | pmx wrote:
         | > In places where JS is truly useful, that is when the UI is
         | more than a text document, SEO is not a concern or possibility.
         | 
         | I don't think you're thinking this through. E-commerce is more
         | than just a document and SEO is massively important there, but
         | JavaScript makes the user experience miles better; think
         | product variation selectors, bulk pricing selectors, product
         | filtering, realtime cart, etc, etc. It's insane to say we
         | shouldn't use new tech so that the search engines can index us,
         | do we just forever stick with what we had 15 years ago and
         | never progress? Madness.
        
           | mrtksn wrote:
           | I should have been clearer: I'm not against JS, I'm a huge
           | fan actually, but I don't see the promised land of better user
           | experience with these overly complex architectures
           | where the UI is JS-heavy and has to be pre-rendered on the
           | server side so that search engines can make sense of
           | it. Larger and larger, heavier and heavier JS never delivered
           | the perfected UI experiences that Web technologists
           | promised.
           | 
           | The Web technology folks still circle around the same
           | problems as 10 years ago, while the actually popular and
           | successful websites like HN and others deliver superb UX with
           | the dinosaur tech.
           | 
           | Anything that actually uses advanced Web technologies is
           | not in the domain of SEO: true web apps like Figma or
           | Google Docs, let's say. At best, you can index a list of
           | content for those, but that list could have been JS-free
           | HTML rendered by PHP.
        
             | nirvanist wrote:
             | Even with PHP you need caching; the point is to provide
             | bots with content quickly :)
        
               | mrtksn wrote:
               | Sure, my complaint wasn't targeting your project but the
               | need for it in the first place. Given the state of the
               | web, this is a nice tool for doing some useful things.
        
           | davedx wrote:
           | A lot of people on HN are very against JavaScript and I often
           | get the feeling they're academics or don't work in the part
           | of IT with web software. The reality is huge swathes of the
           | web benefit from it whether people like it or not, this
           | project is just one example.
        
             | mrtksn wrote:
             | As I'm the subject here: no, I'm not an academic, but I used
             | to work with Web technologies. I quit working with web
             | technologies around the time when a new JS framework was
             | hitting the home page with a promise to make everything
             | better. I quit because I lost my productivity: I got
             | carried away with all this JS stuff and found myself in
             | infinite complexity eating away my time and energy for less
             | than marginal gains on the final product. It was extremely
             | cool but utterly useless.
             | 
             | Now that I'm on the receiving end of Web-based
             | products, I assure you the experience is horrific. I don't
             | even use search engines as much since AI went
             | mainstream, and I couldn't be happier.
             | 
             | Good riddance.
             | 
             | Of course I do enjoy things made with advanced Web
             | technologies like Wasm, Figma for example. Every now and
             | then I'm impressed by some web app that does something I
             | thought was not possible before.
             | 
             | However, I'm still disgusted by documents that are just some
             | images and text but load tons of stuff that doesn't do
             | anything beyond degrading my experience by slowing down the
             | response time or acting weird when interacting.
        
       | xnx wrote:
       | What's the use case? Scrape someone else's dynamic site and serve
       | it statically as your own?
        
         | digitcatphd wrote:
         | Curious
        
           | nirvanist wrote:
           | And Reddit too, thank you for your feedback.
        
         | nirvanist wrote:
         | For me that's not the purpose, but you can do it if you want.
         | 
         | The use case for me was that Meteor.js apps are poorly
         | SEO-friendly, and I needed prerendered HTML to serve to
         | bots.
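Here is one way prerendered snapshots could be laid out on disk so that a web server such as NGINX can find them from the request host and path alone, for example via a `try_files` rule. The directory layout, the `/var/cache/prerender` root, and `snapshot_path` are assumptions for illustration. They are not Page Replica's actual layout.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def snapshot_path(url: str, cache_root: str = "/var/cache/prerender") -> str:
    """Map a page URL to the file where its prerendered HTML would live.

    Hypothetical layout: <cache_root>/<host>/<path segments>/index.html.
    Query strings and fragments are ignored in this sketch, and
    percent-encoded segments are kept literally (not decoded).
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # Refuse "." and ".." so a crafted URL cannot escape cache_root.
    if any(s in (".", "..") for s in segments):
        raise ValueError(f"refusing path traversal in {url!r}")
    return str(PurePosixPath(cache_root, parts.hostname or "", *segments, "index.html"))
```

Using one `index.html` per path keeps the lookup trivial for the web server. The catch is that pages which differ only by query string share a single snapshot.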
        
         | binarymax wrote:
         | I read it as being able to dev your site in whatever you want,
         | then scrape it and publish it as a static and seo optimized
         | site.
        
           | throwoutway wrote:
             | That sounds sort of like an inefficient static site generator
        
             | amadeuspagel wrote:
             | It sounds like a hack, which is after all what this website
             | is dedicated to.
        
         | janjones wrote:
         | I have done something similar when archiving a dynamic site,
         | serving it as a static snapshot for free.
        
       | Lukkaroinen wrote:
       | I wonder where this is actually needed, since most React
       | frameworks support metadata with server-side rendering.
        
         | nirvanist wrote:
         | Thank you for the comment. While it's true that not all web
         | applications leverage the React library, it's important to note
         | that Next.js inherently supports React. However, the choice of
         | technology stack depends on the specific use case and
         | requirements of your project.
        
           | CharlesW wrote:
           | It's not about React specifically, but about whether
           | pre-rendering will make any difference from an SEO perspective.
           | 
           | Not only do most frameworks do SSR, but Google is able to
           | crawl dynamic content just fine. Here's an article from 2015
           | on the topic:
           | https://searchengineland.com/tested-googlebot-crawls-javascr...
        
             | ahoka wrote:
             | Google renders all sites in Chrome according to their
             | documentation. So why would you render it? AFAIK they even
             | penalize serving different content to their crawlers.
        
               | nirvanist wrote:
               | Sure, but you also want your web app to be available to
               | other bots, and you are not serving different content. It's
               | the same content; you just skip the JS layer so it gets
               | served quickly. So far it has worked very well for my
               | projects: Google Search Console showed a better ranking.
               | It's not a new technique, and other services like
               | prerender.io do the same thing, but not for free.
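ahoka's concern is cloaking, meaning crawlers are shown something different from what users see. Here is a hypothetical parity check, sketched in Python with the standard `html.parser`. It compares the visible text of the snapshot served to bots with the DOM a browser produced after running the JavaScript, ignoring `<script>` and `<style>` contents. How you obtain `rendered_dom_html` (for example from a headless browser) is assumed and not shown. Whitespace is normalized, and markup differences are ignored.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Gather visible words, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.words: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def visible_text(html: str) -> str:
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.words)


def same_content(snapshot_html: str, rendered_dom_html: str) -> bool:
    """True if bots and users would read the same text."""
    return visible_text(snapshot_html) == visible_text(rendered_dom_html)
```

A check like this could run after each snapshot is taken, which supports nirvanist's claim that only the JS layer is removed and the content stays the same.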
        
             | nirvanist wrote:
             | Companies still struggle with SEO. I've worked at two tech
             | companies so far, and both used third-party services for
             | SSR. The reason is that bots allow only a few seconds for
             | your HTML to render, so it needs to be available within
             | that time frame. You can use Next.js; it's a good
             | alternative. In any case, it remains useful for my own
             | projects and it's free, so maybe it can be useful for other
             | devs too.
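nirvanist's point that bots allow only a few seconds for rendering suggests keeping snapshots ready, rather than rendering on every bot request. Below is a minimal sketch of a time-to-live snapshot cache in Python. It assumes a hypothetical, slow `render` callable, such as a headless-browser render. The one-hour default TTL is arbitrary, and the injectable `clock` exists only to make the behavior testable.

```python
import time
from typing import Callable


class SnapshotCache:
    """Serve stored HTML immediately; re-render only when missing or stale."""

    def __init__(self, render: Callable[[str], str], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._render = render  # hypothetical slow renderer (e.g. headless browser)
        self._ttl = ttl        # seconds a snapshot is considered fresh
        self._clock = clock
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fast path: fresh snapshot, no rendering
        html = self._render(url)  # slow path: missing or expired
        self._store[url] = (now, html)
        return html
```

A cache miss still pays the full render cost. In practice, pages would therefore be rendered ahead of time, for example by crawling the site's own URLs and calling `get` for each one before any bot arrives.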
        
               | nkg wrote:
               | Absolutely. We have a NextJS website and we faced some
               | SEO problems due to Googlebot not being able to get to
               | some pages. To be fair, we used too many sliders.
        
       ___________________________________________________________________
       (page generated 2024-01-01 23:00 UTC)