[HN Gopher] CNET is deleting old articles to try to improve its ...
       ___________________________________________________________________
        
       CNET is deleting old articles to try to improve its Google Search
       ranking
        
       Author : mikece
       Score  : 51 points
       Date   : 2023-08-09 21:10 UTC (1 hours ago)
        
 (HTM) web link (www.theverge.com)
 (TXT) w3m dump (www.theverge.com)
        
       | _jal wrote:
       | They're also using LLMs to write articles.
       | 
       | I don't understand why they think I or anyone else won't just
       | skip the part where they lard it up with ads and just talk to the
       | robot directly.
        
       | skilled wrote:
       | Funny timing, because Google said yesterday not to do this.
       | 
       | https://www.seroundtable.com/google-dont-delete-older-helpfu...
        
         | winwang wrote:
         | At the same time, is there any reason to trust them over
         | empirical evidence ( _if_ there is evidence)?
         | 
         | Incentives are incentivizing.
        
         | ijustwanttovote wrote:
         | > https://www.seroundtable.com/google-dont-delete-older-helpfu...
         | 
         | I think older articles will eventually hold more weight in
         | Google Searches. There will be a before OpenAI vs after OpenAI
         | weighting.
        
       | imchillyb wrote:
       | Alternate title: "How the internet ate itself."
       | 
       | This seems antithetical to the internet approach of 'data wants
       | to be free.'
       | 
       | Data most assuredly doesn't want to die. Why U kill data?
        
       | ilrwbwrkhv wrote:
       | One of my favourite websites growing up: download.com
        
         | brink wrote:
         | Same. I was a kid in the country, so we didn't have internet. I
         | used to ride my bike with my thumb drive to the library's
         | computers to see what was new on download.com at least once a
         | week. It was like visiting a candy store. It's a shame those
         | days are over.
        
       | 1970-01-01 wrote:
       | "It became necessary to destroy the town to save it."
        
       | jzb wrote:
       | This is some bullshit. It's bad enough that a lot of sites with
       | content going back 10-20 years have linkrot or have simply gone
       | offline. But I am at a loss for words that they're disappearing
       | content on purpose just for SEO rankings.
       | 
       | If this is what online publishing has come to we have seriously
       | screwed up.
        
         | kenjackson wrote:
         | In fairness to them. If one of their top ways of getting
         | traffic is being marked as less relevant because of older
         | articles, what do you want them to do? Just continue to lose
         | money because Google can't rank them appropriately?
        
           | slater wrote:
           | Or mayhap Google et al could realize and understand that
           | having old articles in archive shouldn't penalize your
           | ranking.
        
           | ta988 wrote:
           | move the old articles to another domain or non indexed pages.
        
             | kenjackson wrote:
             | That's what the article says they're doing.
        
           | 1123581321 wrote:
           | I'd rather see them address it with a more complicated
           | approach that preserves the old content. For example, they
           | could move it to an unindexed search archive.
        
       | ghaff wrote:
       | While it would be nice to believe that search engines solve the
       | issue automagically, there are a lot of reasons why organizations
       | want to reduce the amount of old/outdated/stale information
       | that's served to searchers. I know where I work, we're constantly
       | deleting older material (at least for certain types of content--
       | which news and quasi-news pubs don't necessarily fall into).
        
         | flangola7 wrote:
         | What reasons would there be to delete old news articles?
         | 
         | My inner historian is screaming.
        
           | ghaff wrote:
           | Companies are not, in general, in the business of being the
           | archive of record. They're, among other things, in the
           | business of providing their current customers with the
           | information that's most applicable and correct for their
           | current needs and not something relevant to a completely
           | different version of software from 10 years ago.
        
       | [deleted]
        
       | Espionage724 wrote:
       | I haven't visited a CNET page in years and didn't know they were
       | even still relevant :p
       | 
       | The times I have visited CNET pages in the past were to find
       | specific information. If such information happens to be in
       | deleted articles, that would reduce my interactions with CNET in
       | the future.
       | 
       | I think they should archive the old articles or even offload them
       | to Wayback willingly, but it's possible some of the articles
       | they're purging aren't worthwhile. If I write up an article about
       | a cat playing on a scratching post, there's a good chance there's
       | nothing unique or valuable about it and it doesn't need to sit
       | around gathering bit rot :p
        
         | sydbarrett74 wrote:
         | From the article:
         | 
         | 'Stories slated to be "deprecated" are archived using the
         | Internet Archive's Wayback Machine, and authors are alerted at
         | least 10 days in advance, according to the memo.'
        
           | LordShredda wrote:
           | Offloading trash to an already strained organisation. Are
           | they donating any help or is it just taking advantage of
           | charity?
        
             | Espionage724 wrote:
             | Lmao there's no winning with you people :p
             | 
             | Don't worry, Wayback is capable of telling CNET where to go
             | if they were really concerned. If anything they should be
             | thankful for the newly-generated interest that a major
             | company would use them instead of some randoms cherry-
             | picking sites for arguments.
        
         | jzb wrote:
         | Article says that's what they're doing. It says exactly that.
         | 
         | Doesn't really make things better, IMO, but they are at least
         | doing that.
        
         | tenpies wrote:
         | > I haven't visited a CNET page in years and didn't know they
         | were even still relevant :p
         | 
         | I had the same feeling about the Verge.
         | 
         | It feels like going to Britannica online to read about what
         | Encarta encyclopaedia was and why it came in something called a
         | CD-ROM.
         | 
         | ---
         | 
         | E: Awesome: https://www.britannica.com/topic/Encarta
        
       | crazygringo wrote:
       | Is there any evidence this would even work?
       | 
       | Surely Google determines "fresh, relevant" content according to
       | whatever has recently been published, which this doesn't change.
       | If anything, doesn't Google consider sites with a long history of
       | content with tons of inbound links as _more_ authoritative and
       | therefore higher-ranked?
       | 
       | This baffles me. It baffles me why this would be successful SEO
       | -- and assuming that it actually isn't, it baffles me why CNET
       | thinks it _would_ be.
        
         | burnhamup wrote:
         | The theory I've heard is related to 'crawl budget'. Google is
         | only going to devote a finite amount of time to indexing your
         | site. If the number of articles on your site exceeds that time,
         | some portion of your site won't be indexed. So by 'pruning'
         | undesirable pages, you might boost attention on the articles
         | you want indexed. No clue how this ends up working in practice.
         | 
         | Google's suggestion isn't to delete pages, but maybe mark some
         | pages with a no index header.
         | 
         | https://developers.google.com/search/docs/crawling-indexing/...
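A minimal sketch of the "no index header" idea mentioned above. `X-Robots-Tag: noindex` is a response header Google documents (the in-page equivalent is `<meta name="robots" content="noindex">`). The helper name `robots_headers` and the cutoff date are assumptions made up for illustration; nothing here reflects what CNET or Google actually does.

```python
from datetime import date

# Hypothetical cutoff: articles published before this date stay online
# but ask search engines not to index them.
NOINDEX_BEFORE = date(2015, 1, 1)

def robots_headers(published: date, cutoff: date = NOINDEX_BEFORE) -> dict:
    """Return extra HTTP response headers for an article page."""
    if published < cutoff:
        # The page is still served to readers; only indexing is declined.
        return {"X-Robots-Tag": "noindex"}
    return {}
```

A crawler has to be able to fetch the page to see this header, so the approach does not combine with blocking the same URL in robots.txt.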
        
           | nevi-me wrote:
           | It could be better to opt those articles out of the crawler.
           | Unless that's more effort. If articles included the year and
           | month in the URL prefix, I would disallow /201* instead.
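A rough sketch of how a `Disallow: /201*` rule would be matched, assuming the wildcard semantics in RFC 9309 and Google's robots.txt documentation (`*` matches any run of characters, a trailing `$` anchors the end). The function name `rule_matches` and the sample paths are hypothetical. Since rules already match by prefix, `/201` and `/201*` should behave the same.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Check a URL path against one robots.txt path pattern.

    Rules match from the start of the path; '*' matches any run of
    characters and a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with a wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string (prefix match).
    return re.match(regex, path) is not None

print(rule_matches("/201*", "/2015/08/some-article"))  # True
print(rule_matches("/201*", "/2023/08/some-article"))  # False
```

Note that a Disallow rule only stops crawling. A blocked URL can still show up in results if other pages link to it, which is a different outcome from noindex.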
        
           | 0cf8612b2e1e wrote:
           | Even if that rule were true, why wouldn't everything in the
           | say, top NNN internet sites get an exemption? It is the
           | Internet's most hit content, why would it not be exhaustively
           | indexed?
           | 
           | Alternatively, other than ads, what is changing on a CNN
           | article from 10 years ago? Why would that still be getting
           | daily scans?
        
             | kenjackson wrote:
             | That's a good point about the static nature of some pages.
             | Is there any way to tell a crawler to crawl this page, but
             | after this date don't crawl again, but keep anything you
             | previously crawled?
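There does not appear to be a standard directive that does exactly this. The closest common mechanism is probably the HTTP conditional request: the crawler sends `If-Modified-Since`, and an unchanged page answers `304 Not Modified` with no body, so re-crawls of static pages become cheap rather than stopping. A minimal sketch, with the function name `conditional_status` chosen here for illustration:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional

def conditional_status(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, else 200.

    `last_modified` is assumed to be timezone-aware (UTC).
    """
    if if_modified_since is None:
        return 200
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: send the full page
    # HTTP dates have one-second resolution, so drop microseconds.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304
    return 200

page_time = datetime(2013, 8, 9, 12, 0, tzinfo=timezone.utc)
header = format_datetime(datetime(2023, 8, 9, 21, 10, tzinfo=timezone.utc),
                         usegmt=True)
print(conditional_status(page_time, header))  # 304
```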
        
             | bhandziuk wrote:
             | CNET* not CNN. But everything you say is still true.
        
           | tedunangst wrote:
           | How does Wikipedia manage to remain indexed?
        
         | SoftTalker wrote:
         | Perhaps sites with a small ratio of new:total content would be
         | downranked --- but I really don't think that makes sense
         | because that's going to be the case for any long-established
         | site.
        
         | laweijfmvo wrote:
         | I have noticed some articles (and not just "Best XXX of 202Y"
         | articles) that seem to always update their "Updated on" date
         | which Google unhelpfully picks up and shows in search results
         | leading me to think the page is much more recent than it is.
        
           | aliasxneo wrote:
           | I've been curious about how they are doing this. It seems to
           | be an increasing trend and is making the query mostly
           | useless.
        
       | loandbehold wrote:
       | That's why I moved to ChatGPT4 for most of my info queries.
       | Google is badly compromised by SEO. Plus ChatGPT4 gives an answer
       | right away without having to go through multiple results and
       | search within each page for the specific line I need.
        
         | JRKrause wrote:
         | Right there with you. I've gotten so used to having it give me
         | exactly the answer to my specific question that, when I must
         | fall back to traditional search, it's noticeably unpleasant.
        
           | barbariangrunge wrote:
           | GPT always has an answer for you. Even if it's made up!
        
         | pavel_lishin wrote:
         | Heck, sometimes that answer is even correct!
        
           | 6510 wrote:
           | Think of it like floating point logic.
        
           | guy98238710 wrote:
           | I have been recently forking off a subproject from a Git repo.
           | After spending a lot of time messing around with it and
           | getting into a lot of unforeseen trouble, I finally asked
           | ChatGPT how to do it and of course ChatGPT knew the correct
           | answer all along. I felt like an idiot. Now I always ask
           | ChatGPT first. These LLMs are way smarter than you would
           | think.
           | 
           | GPT4 with WolframAlpha plugin even gave me enough information
           | to implement Taylor polynomial approximation for Gaussian
           | function (don't ask why I needed that), which would have
           | otherwise taken me hours of studying if I could even solve it
           | at all.
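For readers curious what a Taylor polynomial for a Gaussian looks like, here is a minimal sketch. It is not the commenter's implementation, which the thread does not show. It assumes the unnormalized Gaussian exp(-x^2) expanded around 0, where the series is the sum over k of (-1)^k x^(2k) / k!. The name `gaussian_taylor` and the default term count are arbitrary choices.

```python
import math

def gaussian_taylor(x: float, terms: int = 20) -> float:
    """Approximate exp(-x**2) with the first `terms` terms of its
    Maclaurin series: sum over k of (-x**2)**k / k!."""
    total = 0.0
    term = 1.0  # k = 0 term: (-x^2)^0 / 0!
    for k in range(terms):
        total += term
        # Build the next term from the current one: multiply by -x^2 / (k + 1).
        term *= -x * x / (k + 1)
    return total

# Compare against the library value near the expansion point.
print(abs(gaussian_taylor(1.0) - math.exp(-1.0)) < 1e-12)  # True
```

The truncated series is only reliable near 0. For larger |x| the alternating terms grow before they shrink, so more terms (or a different method) would be needed.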
        
       | petee wrote:
       | I've seen a couple news sources that are altering their publish
       | dates to show near the top of news feeds. Google will announce "3
       | hours old", despite being weeks old.
        
         | [deleted]
        
         | throwitawayfam wrote:
         | Reddit does this and it's very frustrating
        
           | mnholt wrote:
           | Wow, I thought that was just me. The elation of googling for
           | a niche Reddit topic and finding one very recent only to
           | reach that sad realization that the content is 11y old and
           | likely not relevant anymore.
        
         | laweijfmvo wrote:
         | Yup, "Last Updated on xxx" but no obvious updates called out in
         | the article
        
       | jprd wrote:
       | Bad. Antithetical to both Google's original ideals and the early
       | 'netizen goals.
       | 
       | Google's deteriorating performance shouldn't result in deleting
       | valuable historical viewpoints, journalistic trends and research
       | material just to raise your newly AI-generated sh1t to the top of
       | the trash fire.
        
       ___________________________________________________________________
       (page generated 2023-08-09 23:00 UTC)