[HN Gopher] CNET is deleting old articles to try to improve its ...
___________________________________________________________________
CNET is deleting old articles to try to improve its Google Search
ranking
Author : mikece
Score : 51 points
Date : 2023-08-09 21:10 UTC
(HTM) web link (www.theverge.com)
| _jal wrote:
| They're also using LLMs to write articles.
|
| I don't understand why they think I, or anyone else, won't just
| skip the part where they lard it up with ads and talk to the
| robot directly.
| skilled wrote:
| Funny timing, because Google said yesterday not to do this.
|
| https://www.seroundtable.com/google-dont-delete-older-helpfu...
| winwang wrote:
| At the same time, is there any reason to trust them over
| empirical evidence (_if_ there is evidence)?
|
| Incentives are incentivizing.
| ijustwanttovote wrote:
| > https://www.seroundtable.com/google-dont-delete-older-helpfu...
|
| I think older articles will eventually hold more weight in
| Google searches. There will be a before-OpenAI vs. after-OpenAI
| weighting.
| imchillyb wrote:
| Alternate title: "How the internet ate itself."
|
| This seems antithetical to the internet approach of 'data wants
| to be free.'
|
| Data most assuredly doesn't want to die. Why U kill data?
| ilrwbwrkhv wrote:
| One of my favourite websites growing up: download.com
| brink wrote:
| Same. I was a kid in the country, so we didn't have internet. I
| used to ride my bike with my thumb drive to the library's
| computers to see what was new on download.com at least once a
| week. It was like visiting a candy store. It's a shame those
| days are over.
| 1970-01-01 wrote:
| "It became necessary to destroy the town to save it."
| jzb wrote:
| This is some bullshit. It's bad enough that a lot of sites with
| content going back 10-20 years have linkrot or have simply gone
| offline. But I am at a loss for words that they're disappearing
| content on purpose just for SEO rankings.
|
| If this is what online publishing has come to we have seriously
| screwed up.
| kenjackson wrote:
| In fairness to them: if one of their top ways of getting
| traffic is being marked as less relevant because of older
| articles, what do you want them to do? Just keep losing
| money because Google can't rank them appropriately?
| slater wrote:
| Or mayhap Google et al. could realize and understand that
| having old articles in archive shouldn't penalize your
| ranking.
| ta988 wrote:
| Move the old articles to another domain or to non-indexed pages.
| kenjackson wrote:
| That's what the article says they're doing.
| 1123581321 wrote:
| I'd rather see them address it with a more complicated
| approach that preserves the old content. For example, they
| could move it to an unindexed search archive.
| ghaff wrote:
| While it would be nice to believe that search engines solve the
| issue automagically, there are a lot of reasons why organizations
| want to reduce the amount of old/outdated/stale information
| that's served to searchers. I know that where I work, we're
| constantly deleting older material (at least for certain types
| of content, a category that news and quasi-news pubs don't
| necessarily fall into).
| flangola7 wrote:
| What reasons would there be to delete old news articles?
|
| My inner historian is screaming.
| ghaff wrote:
| Companies are not, in general, in the business of being the
| archive of record. They're, among other things, in the
| business of providing their current customers with the
| information that's most applicable and correct for their
| current needs and not something relevant to a completely
| different version of software from 10 years ago.
| Espionage724 wrote:
| I haven't visited a CNET page in years and didn't know they were
| even still relevant :p
|
| When I have visited CNET pages in the past, it was to find
| specific information. If such information happens to be in
| deleted articles, that would reduce my interactions with CNET in
| the future.
|
| I think they should archive the old articles or even offload them
| to Wayback willingly, but it's possible some of the articles
| they're purging aren't worthwhile. If I write up an article about
| a cat playing on a scratching post, there's a good chance there's
| nothing unique or valuable about it and it doesn't need to sit
| around gathering bit rot :p
| sydbarrett74 wrote:
| From the article:
|
| 'Stories slated to be "deprecated" are archived using the
| Internet Archive's Wayback Machine, and authors are alerted at
| least 10 days in advance, according to the memo.'
| LordShredda wrote:
| Offloading trash to an already strained organisation. Are
| they donating any help or is it just taking advantage of
| charity?
| Espionage724 wrote:
| Lmao there's no winning with you people :p
|
| Don't worry, Wayback is capable of telling CNET where to go
| if they were really concerned. If anything, they should be
| thankful for the new interest: a major company is using them,
| rather than just some randoms cherry-picking sites for
| arguments.
| jzb wrote:
| Article says that's what they're doing. It says exactly that.
|
| Doesn't really make things better, IMO, but they are at least
| doing that.
| tenpies wrote:
| > I haven't visited a CNET page in years and didn't know they
| were even still relevant :p
|
| I had the same feeling about the Verge.
|
| It feels like going to Britannica online to read about what
| Encarta encyclopaedia was and why it came in something called a
| CD-ROM.
|
| ---
|
| E: Awesome: https://www.britannica.com/topic/Encarta
| crazygringo wrote:
| Is there any evidence this would even work?
|
| Surely Google determines "fresh, relevant" content according to
| whatever has recently been published, which this doesn't change.
| If anything, doesn't Google consider sites with a long history of
| content with tons of inbound links as _more_ authoritative and
| therefore higher-ranked?
|
| This baffles me. It baffles me why this would be successful SEO
| -- and assuming that it actually isn't, it baffles me why CNET
| thinks it _would_ be.
| burnhamup wrote:
| The theory I've heard is related to 'crawl budget'. Google is
| only going to devote a finite amount of time to indexing your
| site. If your site has more articles than fit in that time,
| some portion of your site won't be indexed. So by 'pruning'
| undesirable pages, you might boost attention on the articles
| you want indexed. No clue how this ends up working in practice.
|
| Google's suggestion isn't to delete pages, but perhaps to mark
| some pages with a noindex header.
|
| https://developers.google.com/search/docs/crawling-indexing/...
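The noindex option burnhamup mentions can be sent as an HTTP response header, `X-Robots-Tag: noindex`, instead of a `<meta name="robots">` tag in the HTML. Below is a minimal sketch, assuming a hypothetical `robots_headers` helper and a made-up cutoff date; nothing here is CNET's actual setup. One caveat worth knowing: a crawler only sees the header when it fetches the page, so the page must stay crawlable (not blocked in robots.txt), which means noindex keeps old pages out of results rather than necessarily freeing up crawl budget.

```python
from datetime import date

# Hypothetical policy: articles published before this date stay online
# but ask search engines to leave them out of results.
NOINDEX_BEFORE = date(2015, 1, 1)


def robots_headers(published: date) -> dict:
    """Extra HTTP response headers to send with an article page."""
    if published < NOINDEX_BEFORE:
        # Same effect as <meta name="robots" content="noindex"> in the HTML,
        # but works for any response type and needs no template change.
        return {"X-Robots-Tag": "noindex"}
    return {}


print(robots_headers(date(2012, 5, 1)))  # {'X-Robots-Tag': 'noindex'}
print(robots_headers(date(2023, 8, 9)))  # {}
```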
| nevi-me wrote:
| It could be better to opt those articles out of crawling,
| unless that's more effort. If articles included the year and
| month in the URL prefix, I would disallow /201* instead.
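nevi-me's idea corresponds to a robots.txt `Disallow` rule on the year prefix. A quick sketch, assuming a hypothetical site whose article URLs start with the year (e.g. `/2015/03/...`) and checking the rule with Python's standard-library `urllib.robotparser`. Google treats robots.txt rules as path prefixes and supports `*` wildcards, so `/201*` and `/201` should match the same URLs there; `urllib.robotparser` only does plain prefix matching, hence the `/201` spelling below. Note that robots.txt controls crawling, not indexing: Google documents that a blocked URL can still appear in results if other pages link to it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every URL whose path starts with /201,
# i.e. the 2010-2019 archive on a site with year-prefixed article URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /201
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/2015/03/old-review", "/2023/08/new-review"]:
    allowed = parser.can_fetch("Googlebot", "https://www.example.com" + path)
    print(path, "allowed" if allowed else "blocked")
# /2015/03/old-review blocked
# /2023/08/new-review allowed
```

A plain prefix also catches any other path that happens to start with `/201`, so a real rule would need the site's actual URL layout in mind.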
| 0cf8612b2e1e wrote:
| Even if that rule were true, why wouldn't everything in, say,
| the top NNN internet sites get an exemption? It is the
| Internet's most-visited content; why would it not be
| exhaustively indexed?
|
| Alternatively, other than ads, what is changing on a CNN
| article from 10 years ago? Why would that still be getting
| daily scans?
| kenjackson wrote:
| That's a good point about the static nature of some pages.
| Is there any way to tell a crawler: crawl this page, but after
| this date don't crawl it again, and keep whatever you
| previously crawled?
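As far as I know there is no robots directive that means "stop recrawling after date X but keep the old copy" (Google's `unavailable_after` rule does roughly the opposite: it asks for the page to drop out of results after a date). The closest standard mechanism is an HTTP conditional request: the server sends `Last-Modified`, a crawler that honors it sends `If-Modified-Since` on the next visit, and the server answers `304 Not Modified` with no body. That doesn't stop recrawls, but it makes re-checking a static old article cheap for both sides. A minimal sketch, assuming a hypothetical `conditional_status` helper rather than any particular web framework:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Pick the status for a GET that may carry an If-Modified-Since header.

    last_modified must be timezone-aware (UTC). Returns 304 (Not Modified,
    no body) when the client's copy is current, otherwise 200 (full page).
    """
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: just send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are always GMT
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


published = datetime(2013, 5, 1, 12, 0, tzinfo=timezone.utc)
last_modified_header = format_datetime(published, usegmt=True)
print(last_modified_header)  # Wed, 01 May 2013 12:00:00 GMT
print(conditional_status(published, last_modified_header))  # 304
```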
| bhandziuk wrote:
| CNET* not CNN. But everything you say is still true.
| tedunangst wrote:
| How does Wikipedia manage to remain indexed?
| SoftTalker wrote:
| Perhaps sites with a small ratio of new:total content would be
| downranked --- but I really don't think that makes sense
| because that's going to be the case for any long-established
| site.
| laweijfmvo wrote:
| I have noticed some articles (and not just "Best XXX of 202Y"
| articles) that seem to always update their "Updated on" date
| which Google unhelpfully picks up and shows in search results
| leading me to think the page is much more recent than it is.
| aliasxneo wrote:
| I've been curious about how they are doing this. It seems to
| be an increasing trend and is making the query mostly
| useless.
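One place a page's "Updated on" date can come from is schema.org markup with separate `datePublished` and `dateModified` fields, which Google's Article structured-data documentation describes as a date signal; the thread doesn't show what these particular sites actually do. A minimal sketch of the honest version, assuming hypothetical helpers `content_hash`, `next_modified`, and `article_jsonld`: the modified date only moves when the article text's hash changes, so re-rendering or re-saving an untouched page doesn't make it look fresh.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Tuple


def content_hash(body: str) -> str:
    """Fingerprint of the article text (not of ads, sidebars, etc.)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def next_modified(body: str, last_hash: str, last_modified: datetime,
                  now: datetime) -> Tuple[str, datetime]:
    """Bump the modified time only when the article text actually changed."""
    new_hash = content_hash(body)
    if new_hash != last_hash:
        return new_hash, now
    return last_hash, last_modified


def article_jsonld(headline: str, published: datetime,
                   modified: datetime) -> str:
    """Schema.org NewsArticle markup with separate published/modified dates."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "NewsArticle",
        "headline": headline,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    })


published = datetime(2012, 3, 1, tzinfo=timezone.utc)
now = datetime(2023, 8, 9, 21, 10, tzinfo=timezone.utc)
body = "Original review text."
h, modified = next_modified(body, content_hash(body), published, now)
print(article_jsonld("Old review", published, modified))
```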
| loandbehold wrote:
| That's why I moved to ChatGPT4 for most of my info queries.
| Google is badly compromised by SEO. Plus, ChatGPT4 gives the
| answer right away, without my having to go through multiple
| results and search within each page for the specific line I need.
| JRKrause wrote:
| Right there with you. I've gotten so used to having it give me
| exactly the answer to my specific question that, when I must
| fall back to traditional search, it's noticeably unpleasant.
| barbariangrunge wrote:
| GPT always has an answer for you. Even if it's made up!
| pavel_lishin wrote:
| Heck, sometimes that answer is even correct!
| 6510 wrote:
| Think of it like floating point logic.
| guy98238710 wrote:
| I was recently forking a subproject off of a Git repo.
| After spending a lot of time messing around with it and
| getting into a lot of unforeseen trouble, I finally asked
| ChatGPT how to do it, and of course ChatGPT knew the correct
| answer all along. I felt like an idiot. Now I always ask
| ChatGPT first. These LLMs are way smarter than you would
| think.
|
| GPT4 with the WolframAlpha plugin even gave me enough information
| to implement a Taylor polynomial approximation of the Gaussian
| function (don't ask why I needed that), which would otherwise
| have taken me hours of study, if I could have solved it at all.
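For anyone curious what the Taylor-polynomial route looks like: the commenter doesn't say which form of the Gaussian or which expansion point they used, so this sketch assumes the unnormalized Gaussian expanded around its mean. With u = (x - mu)^2 / (2 sigma^2), it truncates exp(-u) = sum over n of (-u)^n / n!, giving a polynomial of degree 2(terms - 1) in (x - mu). The function name `gaussian_taylor` and its defaults are hypothetical.

```python
import math


def gaussian_taylor(x: float, mu: float = 0.0, sigma: float = 1.0,
                    terms: int = 10) -> float:
    """Approximate exp(-(x - mu)**2 / (2 * sigma**2)) by a truncated Taylor series.

    Expands exp(-u) = sum_{n>=0} (-u)**n / n! around u = 0 (i.e. x = mu)
    and keeps the first `terms` terms.
    """
    u = (x - mu) ** 2 / (2 * sigma ** 2)
    total = 0.0
    term = 1.0  # n = 0 term: (-u)**0 / 0!
    for n in range(terms):
        total += term
        term *= -u / (n + 1)  # turn (-u)**n / n! into (-u)**(n+1) / (n+1)!
    return total


print(gaussian_taylor(1.0, terms=12), math.exp(-0.5))
```

Because the series alternates, accuracy drops quickly several sigmas away from the mean unless you add many more terms; near the mean a dozen terms is already very close to `math.exp`.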
| petee wrote:
| I've seen a couple of news sources altering their publish dates
| so their stories show near the top of news feeds. Google will
| announce "3 hours old" when the article is actually weeks old.
| throwitawayfam wrote:
| Reddit does this and it's very frustrating
| mnholt wrote:
| Wow, I thought that was just me. The elation of googling for
| a niche Reddit topic and finding a very recent thread, only to
| reach the sad realization that the content is 11 years old and
| likely not relevant anymore.
| laweijfmvo wrote:
| Yup, "Last Updated on xxx" but no obvious updates called out in
| the article
| jprd wrote:
| Bad. Antithetical to both Google's original ideals and the early
| netizens' goals.
|
| Google's deteriorating performance shouldn't result in deleting
| valuable historical viewpoints, journalistic trends and research
| material just to raise your newly AI-generated sh1t to the top of
| the trash fire.
___________________________________________________________________
(page generated 2023-08-09 23:00 UTC)