[HN Gopher] Loss of nearly a full decade of information from ear...
       ___________________________________________________________________
        
       Loss of nearly a full decade of information from early days of
       Chinese internet
        
       Author : cubefox
       Score  : 161 points
       Date   : 2024-06-01 16:13 UTC (6 hours ago)
        
 (HTM) web link (chinamediaproject.org)
        
       | Ajay-p wrote:
       | _Written by He Jiayan (He Jia Yan), an internet influencer
       | active since 2018, the essay concluded, based on a wide range of
       | searches of various entertainment and cultural figures from the
       | late 1990s through the mid-2000s, that nearly 100 percent of
       | content from major internet portals and private websites from the
       | first decade of China's internet has now been obliterated._
        
       | lelandfe wrote:
       | > _Within the selected date range of "May 22, 1998 to May 22,
       | 2005" on Baidu, there is just one positive result for "Jack Ma"
       | (dated May 22, 2024). [..] Click on the result and you'll find it
       | is an article posted in 2021_
       | 
       | US Google: About 2,580,000 results
       | 
       | A pretty remarkable scrubbing of history.
        
         | Arnavion wrote:
         | https://www.google.com/search?q=jack+ma&tbs=cdr%3A1%2Ccd_min...
         | 
         | First result for me is https://www.scb.co.th/en/personal-
         | banking/stories/business-m... which Google thinks is from
         | 2003-03-15, except it mentions COVID-19 so it obviously isn't.
         | 
         | Second result is https://www.instagram.com/jack_overpower/feed/
         | which Google thinks is from 2001-01-02, except Instagram didn't
         | exist at that time. It might have pictures from 2001 though.
         | 
         | Third result is http://pacificpower.foreignpolicy.com/15-jack-
         | ma/ which Google thinks is from 1999-02-15, except it mentions
         | Alibaba's 2014 IPO so it obviously isn't.
         | 
         | Fourth result is
         | https://www.facebook.com/story.php/?story_fbid=5041357966634...
         | which Google doesn't show a date for, but it's a Facebook post
         | from 2018.
         | 
         | ...
         | 
         | I don't doubt that some of those results are from 1998 to 2005,
         | but the millions of results number specifically is meaningless.
        
           | prophesi wrote:
           | Yep; there may be a lack of incentive to preserve old sites,
           | but what's worse are the ranking algorithms that prevent
           | their discoverability in the first place.
        
             | ccgreg wrote:
             | Both the Internet Archive and Common Crawl have tools that
             | reveal actual crawl dates. Search engines are not really
             | intended to be archives, so it's no surprise that they
             | aren't very good archives.
        
             | bbarnett wrote:
             | Not really prevented; the _huge_ factor is HTTP sites being
             | heavily down-ranked by Google.
             | 
             | But they are still there. Do a specific enough search and
             | they'll be at the top of the search results.
        
           | rasz wrote:
           | Google has perfect vision of the past (didn't the latest leak
           | confirm they keep everything crawled indefinitely and have
           | extensive historical records for all domains?) but zero
           | incentive for redirecting you to old websites with no
           | advertisements.
        
             | MichaelZuo wrote:
             | This is false; many old forums are only sporadically indexed
             | by Google, even if you do verbatim text searches using the
             | site:... modifier.
        
           | boomboomsubban wrote:
           | The "custom range" feature simultaneously feels broken, gamed
           | by spammers, and intentionally being scrubbed. I'm surprised
           | they haven't completely removed it yet.
        
           | Ylpertnodi wrote:
           | >except it mentions COVID-19 so it obviously isn't.
           | 
           | Perhaps it was just updated?
           | 
           | I generally ignore (or get annoyed by) articles that don't have a
           | date or an "updated on" date in the byline.
        
             | mycall wrote:
             | Sometimes you can find the date embedded inside the source
             | asset files.
        
         | asdasdsddd wrote:
         | There's pretty much nothing in that time range on Baidu. I
         | looked up Mao, George Washington, Yue Fei (a popular Chinese
         | folk hero), garlic bread, etc.
         |
         | But without the time filter, there are millions of search results.
        
           | tw04 wrote:
           | It's probably easier to just blanket scrub everything beyond
           | a small set of allowed information (like positive articles
           | about the party) than to selectively delete. Why do they care
           | if valuable information is lost?
        
       | mensetmanusman wrote:
       | Par for the course.
       | 
       | China as we (the world) know it is only about 60 years old. This
       | is more true as they go about systematically destroying their own
       | history and forcing village traditions to be stamped out and
       | guided towards the city life.
       | 
       | Losing a blip of internet history during the regime of mass
       | censorship is probably a blessing in disguise.
        
       | wumeow wrote:
       | > Posted on Wednesday, May 22, He's post had been removed from
       | WeChat by the following day, yielding a 404 message that read:
       | "This content violates regulations and cannot be viewed."
        
         | actionfromafar wrote:
         | He will be educated.
        
       | alephnerd wrote:
       | Is the original MIT BBS still archived? I haven't used it for
       | some time.
        
       | Cheer2171 wrote:
       | There are a few commenters in this thread making blatant false
       | equivalence with the Western internet. This post is about how, on
       | major search engines in China, if you now set the years to
       | 1998-2005 and search for a non-controversial celebrity, you get
       | zero search results from content actually published in that era.
       | 
       | The loss of the early web due to web hosters not maintaining
       | their own hosting and moving to walled gardens is painful and
       | tragic, but it is not in any way similar (or functionally
       | equivalent) to this blatant censorship.
        
         | anonzzzies wrote:
         | Yep... Only archive.org has it sometimes and then you need to
         | search there because you won't find it via others.
        
           | Cheer2171 wrote:
           | But for the Western internet, it disappears because the
           | people hosting those websites gave up, so all we have is
           | archive.org. With this case, there appears to be a
           | government-level purge.
        
             | ccgreg wrote:
             | The Western internet has a bunch of government archives, in
             | addition to the Internet Archive and Common Crawl.
             | 
             | Many of the government archives are not public for
             | copyright reasons.
        
         | wumeow wrote:
         | Yes, this is like if nothing turned up for Bill Gates when you
         | did a search for pre-2006 material.
        
       | lostemptations5 wrote:
       | Intentionally or not this is -- exactly -- what 1984 is all
       | about: changing our perception of history by rewriting or erasing
       | previous writings.
       | 
       | Unfortunately a lot of it from the article seems typical: blogs
       | going offline as bloggers move to new technologies, social media
       | companies going defunct or just not keeping old content.
       |
       | A lot of these things can happen in the West. Remember those books
       | you could read? "The Feynman Letters", etc. I'm paraphrasing,
       | but that's impossible now.
       | 
       | Think of emails: a person dies and their laptop dies or is
       | disposed of -- they're all gone. In the past the physicality of
       | the letters would persist. Not so now. All this correspondence
       | vanishes.
       | 
       | Facebook, are you kidding me? If someone famous thought to export
       | their data -- and it can be found on a laptop still working (and
       | you have the login password), then maybe. See above. This repeats
       | and repeats for each system we interact with for communication.
       | 
       | Aside from the laptop scenario, all this is lost. We now live in
       | a black hole of historical information, soon perhaps to be
       | replaced by a fabricated history hallucinated by LLMs.
       | 
       | Those that love historical understanding should be very worried.
        
         | Cheer2171 wrote:
         | Another false equivalence. "Intentionally or not" actually
         | really matters here. It took work to maintain archives in the
         | pre-digital era, and it takes work to maintain archives in the
         | digital era. So many of those physical letters were lost,
         | rotted, burned, etc.
         | 
         | This is a purge, not a failure to maintain archives. This is
         | like when during the Cultural Revolution, they literally burned
         | archives and letters by intellectuals.
        
           | bloomingeek wrote:
           | I love your reply; your answer is a near-perfect summing
           | up of the issue! My view is that some here in America are starting
           | to get too lenient towards Russia and other authoritarian
           | states. Do we not understand that these states want complete
           | control and don't care how they get it? Information and
           | educational purges are two of many ways this is done. After
           | that, it gets dirty.
           | 
           | Rule of thumb: if the Constitution says it stinks, it does.
           | If we don't like something in it, work for a change. In China
           | and Russia they don't have that right.
        
         | jimbob45 wrote:
         | Should the rewritten history still be preserved as history
         | then?
        
         | demosthanos wrote:
         | > Posted on Wednesday, May 22, He's post had been removed from
         | WeChat by the following day, yielding a 404 message that read:
         | "This content violates regulations and cannot be viewed."
         | 
         | You don't get your comments censored by commenting about
         | natural entropy on the internet. You do get your comments
         | censored by drawing attention to the censors.
         | 
         | I get very tired of people drawing false equivalences between
         | organic human behaviors in the West and intentional abuse by
         | central authorities in China. We can and should do more to
         | preserve our history in the West, but we are already preserving
         | orders of magnitude more data per person than any of our
         | ancestors could have dreamed of. There's no comparison between
         | emails getting lost when someone dies and centralized censors
         | actively purging old content to make it easier to change the
         | party's narrative.
        
           | pessimizer wrote:
           | > I get very tired of people drawing false equivalences
           | between organic human behaviors in the West
           | 
           | I get tired of people referring to intelligence agencies as
           | organic human behaviors. The farming of anti-whatever the
           | DoD, administration, or intelligence agencies dislike is
           | anything but organic, and has had millions to billions of
           | dollars of the budget assigned to it since WWII.
           | 
           | Forums like this are completely helpless against it. All
           | anyone has to do is farm a few accounts to flag the mildest
           | mention of this out of existence, and to upvote the most
           | obtuse, simplistic anti-enemy animus to the top.
           | 
           | Very few actual people are fooled by this. The US is
           | _jealous_ of the control China has over the discussions its
           | citizens have, and is closing the gap quickly and
           | dishonestly. I'm jealous of the fact that Chinese people
           | speaking out of the government-range just get deleted, rather
           | than patronized.
        
             | trealira wrote:
             | I don't see how your comment relates to the parent's
             | comment; however, here's a reply.
             | 
             | > All anyone has to do is farm a few accounts to flag the
             | mildest mention of this out of existence, and to upvote the
             | most obtuse, simplistic anti-enemy animus to the top.
             | 
             | Have you considered that the negative sentiment against
             | Russia and China is genuine? I know of no evidence that the
             | DoD has shills or bots upvoting pro-US-government comments
             | and downvoting other ones. People probably just read the
             | news and form their opinions that way, and there's a
             | variety of different news sources with many different
             | perspectives, which don't get censored.
             | 
             | > I'm jealous of the fact that Chinese people speaking out
             | of the government-range just get deleted, rather than
             | patronized.
             | 
             | It's strange to be jealous of them not having protection
             | from government censorship.
        
           | WalterBright wrote:
           | I'd love to have a single letter from some of my ancestors.
        
             | Natsu wrote:
             | I have one, actually, from my grandpa's generation. He told
             | another family member about his time growing up in the
             | early 1900s, riding trolleys and eating Walnettos (a
             | strange walnut-based candy bar). Then the Spanish Flu came
             | around and the eldest sister just died at the breakfast
             | table one day. Later, the family rallied together to care
             | for each other after his father lost his job due to
             | automation. He moved on to doing odd jobs, then later fell
             | off the roof and broke his back, ending up as an invalid
             | for the rest of his days. They talked about the cherry
             | trees they used to feed themselves, which explains
             | grandma's fondness for the cherry soup I hated so much, and
             | how my grandma and grandpa got married and took care of
             | great grandpa while he was invalid.
             | 
             | They also talked about how Wonder Bread (the original
             | sliced bread and origin of the phrase "best thing since
             | sliced bread") came into town, and how the eldest son, after
             | the local baker he had worked for folded, went to work for
             | them to support the family and lost a finger to the
             | machinery. At some point, he had some kind of heated dispute
             | at work over this, was beaten by security, and, as I'm told,
             | died from injuries sustained in that beating some time
             | afterwards.
             | 
             | It was a weird little window into bits of family history
             | that would have otherwise been erased.
        
           | yorwba wrote:
           | The original post was about natural entropy on the internet.
           | Websites from 2005 that have disappeared or been redesigned
           | so that you can't find their old content anymore, and the
           | uselessness of search engines, domestic or foreign, for date
           | range queries reaching that far back into the past. Even on
           | the Internet Archive, the earliest working snapshot of Baidu
           | Tieba is from 2006.
           | 
           | You may think that it's impossible for an innocuous post to
           | get censored unless it has inadvertently unmasked a
           | conspiracy to bury the past, but censorship decisions also
           | get made to prevent unwanted _reactions_. If a post about
           | disappearing content inspires people to complain about
           | censorship, that's enough to suppress it.
           | 
           | If the disappearance of old websites were entirely
           | deliberate, you'd also need to explain why the West is in on
           | it.
        
             | demosthanos wrote:
             | > The original post was about natural entropy on the
             | internet.
             | 
             | The post by He Jiayan was, but that post was taken down for
             | violating regulations. TFA is largely about the censorship
             | angle which He Jiayan specifically avoided talking about
             | (not that it helped him).
             | 
             | > If the disappearance of old websites were entirely
             | deliberate, you'd also need to explain why the West is in
             | on it.
             | 
             | Name one figure who was prominent between 1995 and 2005 who
             | you can't find any content about from that era when using
             | Google's date filters. A single figure.
             | 
             | Some sites go down organically. It happens. _Every_ site
             | that references a figure who was once favored and is now
             | out of favor? That doesn't happen in the Western internet.
        
         | yterdy wrote:
         | Recently: Google refuses to turn up old pages. I was recently
         | searching for a person who used to have a notable web presence
         | before passing away about a decade ago. I had to dig to find a
         | few links, through DDG and Yandex.
        
           | flir wrote:
           | Yandex is getting more and more of my web queries lately.
           | There's a definite irony there.
        
             | netsharc wrote:
             | Google and Bing (so DuckDuckGo as well) seem to like
             | searching for synonyms of search terms and returning the
             | most popular results, thinking popular means relevant. I
             | remember looking for something where I remembered the exact
             | terms and not getting anywhere with them, but on Yandex it
             | was the first hit.
        
         | jncfhnb wrote:
         | I would guess that 99.9% of letters are destroyed
        
         | akira2501 wrote:
         | > In the past the physicality of the letters would persist
         | 
         | I'm willing to bet that these physical letters have
         | historically fared about as well as our digital letters do;
         | otherwise, our world would be absolutely filled with the
         | written detritus of the past.
         | 
         | > Those that love historical understanding should be very
         | worried.
         | 
         | As humans we've always disposed of more than we've kept. It's
         | just not worth the energy cost to operate any other way.
         | Thankfully history is recorded as several overlapping
         | collections and not as a series of single data points.
        
         | abecedarius wrote:
         | Tangential, but what is "The Feynman Letters" here? I know of a
         | book of some of his letters, but not about censorship/loss
         | thereof.
        
       | ck2 wrote:
       | China is too easy an example of rewriting history by political
       | will.
       | 
       | In North Korea it is illegal to mention famine or hunger.
       | 
       | In Florida it is illegal to mention climate change in any state
       | document.
        
         | tromp wrote:
         | > In Florida it is illegal to mention climate change in any
         | state document.
         | 
         | citation needed. Oh, found one:
         | https://www.miamiherald.com/news/state/florida/article129837...
         | ( https://archive.is/P9k4m )
         | 
         | > DEP officials have been ordered not to use the term "climate
         | change" or "global warming" in any official communications,
         | emails, or reports
         | 
         | I'm not sure that amounts to illegal, but they did at least
         | make it career impairing. Would be interesting to see someone
         | sue for wrongful termination on that basis...
        
       ___________________________________________________________________
       (page generated 2024-06-01 23:01 UTC)