hngopher.com

       [HN Gopher] Reflections as the Internet Archive turns 25
       ___________________________________________________________________
        
       Reflections as the Internet Archive turns 25
        
       Author : ingve
       Score  : 339 points
       Date   : 2021-07-22 03:04 UTC (19 hours ago)
        
 (HTM) web link (blog.archive.org)
 (TXT) w3m dump (blog.archive.org)
        
       | DonHopkins wrote:
       | A search of youtube for "wayback machine" produces pages of stuff
       | about the Internet Archive, and only the 24th result has anything
       | to do with the origin of the term.
       | 
       | People who didn't spend their Saturday mornings glued in front of
       | the TV screen as a child of the 1970's might not remember how
       | American kids learned about history back then:
       | 
       | Peabody's Improbable History - Surrender of Cornwallis
       | 
       | Peabody and Sherman travel back to October 19, 1781 to witness
       | when Cornwallis surrendered to Washington. However, when they
       | got there, he didn't show up.
       | 
       | https://www.youtube.com/watch?v=3E8zmaOiCVw&ab_channel=bullw...
        
         | pwdisswordfish0 wrote:
         | FYI, it's not nowadays obscure. There's a current series _The
         | Mr. Peabody & Sherman Show_ on Netflix, and Hollywood made a
         | Mr. Peabody and Sherman movie (by Dreamworks) in 2014.
        
         | ConceptJunkie wrote:
         | Actually, "The Bullwinkle Show" premiered in 1959, but I
         | discovered it in the early 70s, as I suspect you did.
        
       | ZeroGravitas wrote:
       | This is a bit of a tangent, but there was mention of non-
       | advertising based funding models.
       | 
       | Is anyone working on an advertising model that achieves the basic
       | goal of advertising, but without the centralising aspect which
       | seems to be the root cause of many of the issues? Giant
       | monopolies are always going to subvert regulation but the same
       | industry as disconnected units might be easier to police.
       | Obviously you can just try to split them up or limit their size
       | with regulation after the fact but a good technical basis might
       | help out.
       | 
       | Does web advertising just not make sense unless you can amass
       | lots of private user data and track people across the web? If so
       | can we subcontract that data to smaller companies we can trust
       | with our data and effectively punish if they break the rules?
        
         | coldpie wrote:
         | > Does web advertising just not make sense unless you can amass
         | lots of private user data and track people across the web?
         | 
         | It does make sense, but it has to compete with the invasive-
         | style advertising, and it will always lose. If you want the
         | "good advertising" you have to kill the "bad advertising".
        
       | akkartik wrote:
       | I wish IA hosted Usenet archives. Even if they stopped at year
       | 2000 and didn't update further.
        
         | generationP wrote:
         | They host some: https://archive.org/details/usenet
         | 
         | As far as my own old posts are concerned, it looks complete :)
         | But it isn't easily findable or searchable; the intended way of
         | interaction is apparently to download an entire hierarchy and
         | grep.
        
           | akkartik wrote:
           | Oh good to know! That's adequate.
        
       | soheil wrote:
       | For anyone interested I made a quick and dirty way to pull up the
       | archive of any URL by prefixing it with arxiv.link
       | 
       | e.g.,
       | 
       | http://arxiv.link/https://news.ycombinator.com/
        
         | johtso wrote:
         | Is there an advantage compared to just putting
         | `web.archive.org/web/` before the url?
        
           | soheil wrote:
           | Didn't know that, thanks. I guess just shorter.
        
           | fouc wrote:
           | Oh interesting,
           | http://web.archive.org/web/https://news.ycombinator.com also
           | works, didn't know that.
        
         | ipsum2 wrote:
         | a little confusing since arxiv refers to a popular research
         | paper archive. Useful project though!
        
         | fouc wrote:
         | Nice, thanks! Ever since I started using archive.is last year,
         | I've always wanted something like this! Could be paired with a
         | bookmarklet too.
         | 
         | javascript:(function(){window.open('http://arxiv.link/'+location.href)})();
        
           | alexislours wrote:
           | You have always been able to do this with archive.is, I've
           | been using this bookmarklet for quite some time.
           | 
           | javascript: (() => { window.open("https://archive.is/" +
           | window.location.href, '_blank')})();
        
             | codethief wrote:
             | archive.is is not related to the Internet Archive's Wayback
             | Machine, though, is it?
        
               | fouc wrote:
               | It's not related, but it will give you a link there if it
               | doesn't have the page archived.
        
             | fouc wrote:
             | I meant I wanted to go directly to the most recent result
             | in the wayback engine for the given page.
        
               | alexislours wrote:
               | In this case you can just use the following and it will
               | redirect to the latest archive without being dependent on
               | a 3rd party website:
               | 
               | javascript: (() => {
               | window.open("https://web.archive.org/web/" +
               | window.location.href, '_blank')})();
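
The prefix trick from the comments above can also be written as a small standalone function. This is a minimal sketch: `waybackUrl` is a hypothetical helper name, and the redirect to the latest capture is the behaviour reported in this thread, not something the code checks.

```javascript
// Build a Wayback Machine lookup URL by prefixing the target URL,
// as described in the thread. `waybackUrl` is a hypothetical helper name.
function waybackUrl(targetUrl) {
  // The URL constructor throws a TypeError on malformed absolute URLs,
  // so bad input fails here instead of producing a broken link.
  const parsed = new URL(targetUrl);
  return "https://web.archive.org/web/" + parsed.href;
}

console.log(waybackUrl("https://news.ycombinator.com/"));
// prints "https://web.archive.org/web/https://news.ycombinator.com/"
```

The same function body could be dropped into a bookmarklet by passing `window.location.href` as the argument.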
        
       | spiritplumber wrote:
       | The Internet Archive location is beautiful. It's a church that
       | has been partially turned into a server farm. Big Neuromancer
       | energy when you go inside and look.
        
         | N3cr0ph4g1st wrote:
         | Did a hackathon there in 2016, it is so cool!
        
         | DoingIsLearning wrote:
         | I was too curious not to look this up:
         | 
         | https://www.atlasobscura.com/places/internet-archive-headqua...
        
       | Thoreandan wrote:
       | It's a shame that the mirror in Alexandria appears to be long
       | abandoned.
        
       | nonbirithm wrote:
       | Isn't the publisher's lawsuit still going forward in November of
       | this year? It would be terrible if the Archive had to shut down
       | because they decided to do an unnecessary experiment with
       | copyright while simultaneously being tied to an irreplaceable,
       | yet centralized, historical resource.
       | 
       | I donate to them with the hopes that they won't try to do
       | anything that carries that kind of risk to their continued
       | existence again.
        
       | IfOnlyYouKnew wrote:
       | Big fan of the archive, especially openlibrary. But the lofty
       | talk of "democratizing" this-or-that is a bit overblown: in
       | authoritarian countries, the archive does zero democratizing,
       | because it's either just not available, or all the relevant
       | individual documents are blocked.
       | 
       | For democratic countries that have democratically elected to
       | restrict the distribution of some materials the society considers
       | harmful, such as Germany with Nazi propaganda, the archive
       | happily decides to undermine those clear, longstanding, and --
       | disproving the single argument for free speech absolutism -- not
       | slippery-sloping anywhere over decades. Why? Because laws you
       | disagree with are, apparently, illegitimate.
       | 
       | It probably helps their bold defense of all that is holy to
       | intimately know that these really _are_ democratic countries,
       | which aren't going to just send a wet team to dismember them,
       | Saudis-style, or to spend millions on an elaborate plan whose
       | only purpose is to let you live for another month, with a clear
       | mind that has complete certainty that you will die, and who did
       | it.
       | 
       | The anti-semitism and racism on archive.org, plus some copyright
       | violations isn't a byproduct of their "freedom". It's all of it.
       | There are plenty of free hosts for video or documents, and an
       | hour at minimum wage would pay for hosting quite a lot for quite
       | a while for most of the archive's audience. But the killer
       | feature is immunity, through anonymity and DMCA's Safe Harbour.
       | 
       | Sure, everyone here is only defending "free speech" and would
       | never agree with the swastika-fetishists. Only, somehow, they
       | never complain about ISIS having a hard time on Twitter, or porn
       | being censored on Facebook. It's the scans of _Der Sturmer_,
       | especially of the 38 to 44 vintage, that are the chosen symbols of
       | "democratization".
       | 
       | [0]: https://www.sfgate.com/california-politics/article/Internet-...
        
         | textfiles wrote:
         | "Big Fan" is doing a lot of heavy lifting there.
        
         | ZeroGravitas wrote:
         | I was intrigued by this comment, but couldn't really figure out
         | what was being claimed so read the linked article.
         | 
         | It's still not 100% clear but elements of it include:
         | 
         | - Historical racist texts (Mein Kampf, 1930s newspapers) which
         | I feel on balance is a good thing to preserve in a library. I
         | would assume having those papers available does more to combat
         | fascism than to encourage it (though who knows really)
         | 
         | - archives of websites that are dodgy (ISIS and Neo-Nazi are
         | mentioned but there must be all sorts of crazy and or bad stuff
         | on the web that gets archived)
         | 
         | - following US law rather than local law (a general conundrum
         | for internet sites like this, especially if you do it for
         | countries whose laws you like but not for other countries)
         | 
         | - providing a place for people to upload and share files
         | anonymously
         | 
         | So yeah, I'm not really a Free Speech absolutist myself
         | (doesn't seem like many that claim they are actually believe it
         | when it comes to things they disagree with) but doesn't feel
         | like they're in the same boat as social media platforms who
         | actively spread bad things if it increases ad impressions. Some
         | similar issues around policing large amounts of content and
         | dealing with different legal, political and moral frameworks at
         | scale though.
        
       | ezequiel-garzon wrote:
       | _This library would be available not only to those who could pay
       | the $1 per minute that LexusNexus charged_
       | 
       | Any idea on what LexusNexus is, or was back then? Thanks!
        
         | Teever wrote:
         | https://en.wikipedia.org/wiki/LexisNexis
        
         | lifekaizen wrote:
         | Very expensive research tool places like large law firms would
         | use.
        
         | fumeux_fume wrote:
         | I think it's a reference to LexisNexis which still exists today
         | in a similar capacity. Back in the day (1990s), it was an
         | online service to search for and read virtually any legal or
         | news article among many other things. It was very expensive, but
         | was usually accessed through a corporate or educational
         | institution's license.
        
         | Mountain_Skies wrote:
         | If you live in the US, they have a file on you and it's
         | probably very thick. A credit report will run a couple of pages
         | long, the LexusNexus report on me (a complete nobody) cost them
         | about $8 to mail. It contained many errors, mostly about
         | property I don't own but also about an insurance claim I never
         | made.
         | 
         | Like your credit report, you can get a free copy by writing
         | them and requesting a copy. IIRC when I did it a few years ago,
         | I had to make the request in writing, I wasn't able to order it
         | online, at least not for free.
        
           | jacquesm wrote:
           | They ought to be liable for those errors.
        
       | ovebepari wrote:
       | Brewster Kahle mentions his goal of creating "a library available
       | to anybody, anywhere in the world."
       | 
       | Fun fact: Archive.org is blocked in Bangladesh for god knows what
       | reasons.
        
       | ArtDev wrote:
       | I think we need a new src attribute that can have a fallback.
       | There are so many parts of the internet where most of the links
       | are dead. It's pretty sad actually.
        
       | elric wrote:
       | Without trying to be contrarian, I don't think that everything
       | should be archived. Random tweets, random blog posts, random
       | personal web sites. Let them wither and die and be forgotten.
       | Notable content by notable people? Sure.
       | 
       | Everyone else ought to have the right to be forgotten, including
       | some drunk tweet they wrote 10 years ago and regret, or an old
       | personal page which contained too much PII.
       | 
       | Archive no longer has a way to opt-out, which is bad enough, but
       | I still think they should be opt-in.
        
         | jacquesm wrote:
         | You never know in advance what will and what will not be
         | worthwhile archiving, you only know that at some unspecified
         | point in the future.
        
           | elric wrote:
           | That's lovely from the perspective of a historian 200 years
           | in the future. But it does nothing to alleviate the pain
           | of people living in the present. People whose prospective
           | employers comb over their embarrassing past. Or bullies. Or
           | any other number of evildoers whose life is made easier by
           | unfettered access to indelible information.
        
             | pwdisswordfish8 wrote:
             | I don't like your proposed solution that uses right to be
             | forgotten as a blunt instrument to paper over serious
             | problems, to use a mixed metaphor. When you appeal to the
             | right to be forgotten to fix problems like people and
             | "prospective employers comb over their embarrassing past",
             | it's a way of throwing up your hands and neglecting deeper
             | issues.
             | 
             | In a right-to-be-forgotten world, the way it would end up
             | going is:
             | 
             | 1. problematic potentates punish pitiable proles
             | 
             | 2. someone invokes right to be forgotten
             | 
             | 3. this is considered "good enough"
             | 
             | 4. the problem conditions that allowed #1 to fester remain
             | uncorrected
             | 
             | I feel this way about a lot of stuff these days (especially
             | where the erosion of the tenets of a liberal society is
             | involved), where people argue vociferously for a "solution"
             | that can at best be considered an indirect way of handling
             | the problem. You see this with a lot of contemporary calls
             | for the dismantling of the tradition of free speech/free
             | inquiry/freedom of association, for example. People end up
             | chafing in the direction of proposals that have dual-use
             | effects in the first instance and perniciously "null"
             | effects in the second instance.
        
         | jl6 wrote:
         | Perhaps the Internet Archive could do more to help people who
         | find their personal/sensitive/embarrassing content made
         | available in perpetuity (I'm not sure exactly _what_ they could
         | do), but it's incredibly valuable to have archiving on by
         | default. The voice of un-notable people is underrepresented in
         | every field of study, and the voice of notable people tends to
         | get preserved in other ways anyway.
        
           | jjkaczor wrote:
           | It is helpful to get the perspectives of "un-notable" people
           | from a historical perspective.
           | 
           | For example - the graffiti at Pompeii is interesting (and is
           | pretty much at the same "quality bar" as Twitter):
           | 
           | https://www.theatlantic.com/technology/archive/2016/03/adrie...
           | 
           | https://kashgar.com.au/blogs/history/the-bawdy-graffiti-of-p...
        
         | X6S1x6Okd1st wrote:
         | Historians spend a lot of time poring over minutiae from
         | non-notable people. There is plenty about the world that doesn't
         | seem worth writing down, but can be intuited from tangential
         | texts.
         | 
         | The cost seems low enough to just keep it.
        
           | dogorman wrote:
           | From what I understand, present day historians are sitting on
           | a huge pile of cuneiform tablets that have yet to be
           | transcribed or translated because there is much more material
           | than there is interest/manpower.
           | 
           | Of course, to your point, they do keep it around. They don't
           | just throw it in the trash.
        
             | jacobolus wrote:
             | Only a tiny number of people in the world can read Sumerian
             | or Akkadian, even for them the process is slow and error
             | prone because we are missing a lot of the original context,
             | and they have better things to do than skim through piles
             | of delivery receipts.
             | 
             | * * *
             | 
             | It would be pretty neat if someone could figure out how to
             | OCR all the cuneiform tablets and turn them into something
             | searchable.
        
         | generationP wrote:
         | Who decides what "notable" is? I frequently use the Archive to
         | find old academic grey lit (preprints, lecture notes, newsgroup
         | posts, etc.). Much of it is on "random" blog posts and personal
         | websites. Even the authors aren't usually notable by Wikipedia
         | standards. Yeah, there is some PII on those pages, but also
         | treasures of useful information.
        
         | stared wrote:
         | Most of history we know is from the perspective of the
         | wealthiest 0.1%. Even though the Internet is still biased
         | towards the wealthier and more educated, having a history of
         | the wealthiest 10% would be enormous progress.
        
       | causi wrote:
       | As a question related to archival, what's the best tool for local
       | archiving? HTTRACK is getting long in the tooth and it just doesn't
       | work for all the dynamic content on modern web pages.
        
       | TekMol wrote:
       | How can IA just go out there and copy+republish the content of
       | others and get away with it?
       | 
       | Aren't they breaching copyright on a massive massive scale?
        
         | ghaff wrote:
         | Because the vast majority of people don't care if someone
         | archives something they've made public. And the Internet
         | Archive bends over (to some too far) backwards with robots.txt
         | to exclude anything someone wants removed from public view.
        
       | dleslie wrote:
       | I love the Internet Archive; I worry that its utility will wane
       | as content becomes more dynamic than static. What does it mean to
       | archive the experience of scrolling through a social feed?
        
         | petertodd wrote:
         | The paid, legal-oriented, archiving service Perma.cc that
         | Harvard Law runs actually lets you upload your own PDFs and
         | screenshots in addition to allowing Perma.cc's bots to capture
         | webpages. Of course, since you could upload anything the
         | difference is made clear in the UI.
         | 
         | In a legal context, simply attesting to the validity of a
         | screenshot is really common. So when that functionality is used
         | Perma.cc is operating more as a permanent file storage service
         | than a trusted archive.
         | 
         | Regardless, this does go a long way to solving the problem of
         | dynamic sites.
        
       | ignoramous wrote:
       | A little-known bit of trivia: Apache Hadoop (and the multi-billion
       | dollar open source big-data ecosystem it spawned) was first worked
       | on at the Internet Archive [0].
       | 
       | Speaking of billions: According to Kahle, Alexa Internet's
       | compute infrastructure informed Amazon's take on IaaS (AWS) [1].
       | 
       | Another perhaps lost nugget is that Amazon once funded (either in
       | part or in full) the development of the Wayback Machine, Internet
       | Archive's most impactful product. In addition, to date (if I'm
       | not mistaken) Amazon continues to donate data it fetches from
       | Alexa Toolbar installations to the Wayback Machine.
       | 
       | [0] https://archive.is/Le3id
       | 
       | [1] https://archive.is/EnzHq
        
       | endisneigh wrote:
       | I'm curious - how does Archive.org get around DMCA? I assume it's
       | because of Fair Use and the fact that they're a not-for-profit, but
       | more details would be great.
       | 
       | Other sites like outline.com (which I guess is a for-profit
       | entity) don't really allow you to get around paywalls the way the
       | Wayback Machine does.
       | 
       | As someone interested in building a site that gets around
       | paywalls for semi-educational purposes I'm curious if anyone has
       | details!
        
         | pabs3 wrote:
         | IA are a designated library, which confers a set of privileges
         | under US copyright law.
        
           | ghaff wrote:
           | A very limited set of privileges. And AFAIK it's only a
           | "designated library" in California. I'm not sure there is
           | such a thing at the federal level other than the Library of
           | Congress.
           | 
           | Basically, most people are fine with what the Wayback Machine
           | does and they'll take down any mirror that the domain owner
           | asks them to.
        
           | DoingIsLearning wrote:
           | Yes as a non-US based user this is one of the most amazing
           | things for me with IA.
           | 
           | In my head IA was just the wayback machine with cached pages.
           | But during the pandemic I realized that there is a plethora
           | of actual books that one can check out with far less friction
           | than in a standard library.
           | 
           | It is such a cool concept, also there are IA satellite
           | projects (not sure if they are owned by or in partnership),
           | for different non-english languages e.g. arquivo.pt so you
           | can have the same plethora of content in other non-english
           | languages as well.
        
         | joe_the_user wrote:
         | My impression is that archive.org follows whatever directions
         | are in robots.txt. If those directions allow a site to be
         | archived, they do it, otherwise they don't. That's from looking
         | at non-archived sites such as ezboard, which I mention in
         | another post.
        
           | Santosh83 wrote:
           | And what about sites that do not have robots.txt? Does IA
           | snapshot those?
        
             | nlitened wrote:
             | Yes. By default, if you put information on the web, it is
             | assumed that you are okay with people seeing it.
        
         | toomuchtodo wrote:
         | Takedowns cause the content to be "darked." It's no longer
         | publicly available, but still archived on disk until a future
         | date.
        
       | joe_the_user wrote:
       | Nice they're there. At the same time, it's amazingly easy for
       | content to be removed from there - if someone objects or even if
       | things are murky.
       | 
       | For example, all content from the old ezboard site has been
       | removed based on the configuration of the current URL owner's
       | robots.txt, and the current URL owner is just a domain parker.
       | Ezboard hosted a lot of content back in the day.
       | 
       | https://archive.org/post/560730/ezboard-is-there-any-hope
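
To make the ezboard case concrete, here is a minimal sketch of how a blanket `Disallow: /` from a domain parker would exclude every path on a site. It is a deliberately simplified robots.txt check (no `Allow` rules, wildcards, or longest-match precedence) and is not the Internet Archive's actual logic. The function name `isDisallowed` and the agent string `ia_archiver` are assumptions used only for illustration.

```javascript
// Simplified robots.txt check: returns true if `path` is disallowed for `agent`.
// Handles only User-agent and Disallow lines with plain prefix matching.
// Hypothetical helper, not the Internet Archive's real implementation.
function isDisallowed(robotsTxt, agent, path) {
  const groups = []; // each group: { agents: [...], disallow: [...] }
  let current = null;
  let lastWasAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue; // blank or malformed line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      // An empty Disallow value means "nothing is disallowed".
      if (field === "disallow" && current && value !== "") {
        current.disallow.push(value);
      }
      lastWasAgent = false;
    }
  }

  const name = agent.toLowerCase();
  // Prefer a group naming the agent exactly; otherwise fall back to "*".
  const group =
    groups.find((g) => g.agents.includes(name)) ||
    groups.find((g) => g.agents.includes("*"));
  if (!group) return false;
  return group.disallow.some((prefix) => path.startsWith(prefix));
}

// A parked domain serving a blanket exclusion (assumed example content).
const parkedDomain = "User-agent: *\nDisallow: /\n";
console.log(isDisallowed(parkedDomain, "ia_archiver", "/forum/thread1")); // true
console.log(isDisallowed("", "ia_archiver", "/forum/thread1")); // false
```

The second call illustrates the point made elsewhere in the thread: with no robots.txt rules at all, nothing is excluded.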
        
         | 1vuio0pswjnm7 wrote:
         | This is an old problem. I could have sworn there were promises
         | they were going to change their procedures.
         | 
         | The question I have is how fast the content is removed after
         | the domain name registration changes, i.e., is there a
         | window of time between the appearance of a new robots.txt and
         | the next scheduled crawl, and if so, would it be possible to
         | "rescue" the content, as ArchiveTeam would do, during that
         | window, before it disappears.
         | 
         | If this is possible, there could be a service for monitoring
         | changes to domain name registrations for sites that have large
         | amounts of historical content. I would happily volunteer to set
         | up such a service.
        
           | joe_the_user wrote:
           | Well, "complain on hn" has been a way to get stuff from
           | Google. Maybe someone at archive will notice this thread.
        
             | hidden-spyder wrote:
             | I'm curious. What changes has Google made due to complaints
             | on HN?
        
         | SilverRed wrote:
         | Hopefully it is just hidden and not deleted. But this is the
         | main reason why alternative archive sites exist which ignore
         | the original poster's requests. They are frequently used to
         | archive posts from public figures that are expected to be
         | scrubbed later.
        
           | techrat wrote:
           | > Hopefully it is just hidden and not deleted.
           | 
           | Hidden. Even when you request for them to remove stuff.
           | 
           | Had domain, stuff got archived, asked for them to remove it,
           | added robots.txt. Domain lapsed. Someone else picked it up.
           | Their robots.txt is now permissive, and the old stuff that I
           | asked them to remove is now visible.
        
             | throwslackforce wrote:
             | That's insidious. Does it even make sense to revive data
             | that has been removed based on _current_ configuration?
             | 
             | Even if the owner is the same, allowing the site to be
             | archived going forward isn't the same thing as permitting
             | it retroactively.
        
             | mavhc wrote:
             | Wonder how you'd overcome that flaw, is there a history of
             | domain name ownership?
        
               | newswasboring wrote:
               | Just delete the things when requested. No need to make it
               | more complicated than that.
        
               | mavhc wrote:
               | When requested by who? The current owner of the domain?
               | Do they own what was on that domain 20 years ago?
               | 
               | What if you lost your domain, but owned it in the past?
               | Can you delete stuff from that era?
        
               | account42 wrote:
               | Maybe we need a whois.archive.org.
        
           | fwn wrote:
           | As far as I was able to experience it's just hidden and not
           | deleted.
           | 
           | I have to keep an old domain indefinitely to host a
           | robots.txt just to keep sensitive personal data hidden that
           | little me foolishly published on the open internet.
           | 
           | But I'm not complaining. The internet archive is a great
           | gift. Using it with a bookmarklet really feels like a super
           | power.
        
             | mercora wrote:
             | It sounds a lot like this would need some kind of
             | delegation mechanism where you could point to a different
             | URL in time before abandoning the place. Or maybe some kind
             | of sealing using a cryptographic function that lets you
             | prove you are the owner of the current/previous content
             | and would also prove you are not the owner of the newer
             | content, while this proof could be used to release the ban
             | if ever needed.
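
One possible reading of the "sealing" idea is a hash commitment: the original owner publishes the hash of a secret while they still control the domain, and can later reveal the secret to show they owned the earlier content. The sketch below is only an illustration of that interpretation under stated assumptions. The names `commit` and `verify` are hypothetical, and nothing here reflects an existing archive.org feature.

```javascript
const { createHash, randomBytes } = require("crypto");

// Hash a secret into a commitment that is safe to publish.
// `commit` is a hypothetical helper name.
function commit(secretHex) {
  return createHash("sha256").update(secretHex, "utf8").digest("hex");
}

// Check a revealed secret against a previously published commitment.
function verify(secretHex, commitment) {
  return commit(secretHex) === commitment;
}

// The original owner generates a secret and publishes only its hash,
// for example in a page that gets archived while they still hold the domain.
const ownerSecret = randomBytes(32).toString("hex");
const published = commit(ownerSecret);

console.log(verify(ownerSecret, published)); // true
console.log(verify("guess-by-new-domain-owner", published)); // false
```

A later domain owner cannot produce the secret, so only the original owner could ask for the old snapshots to be hidden or released. A real design would more likely use public-key signatures so the proof can be repeated without revealing the secret.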
        
           | joe_the_user wrote:
           | Got any examples of these alternative archives?
        
             | mellosouls wrote:
             | A couple of pointers to the wider world of web archiving:
             | 
             | https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-...
             | 
             | https://github.com/iipc/awesome-web-archiving
        
             | SilverRed wrote:
             | This is the main one https://archive.is/
             | 
             | From the FAQ, they do not respect robots.txt since they
             | only archive on request by a user and they do not remove
             | archives unless they contain illegal content.
        
               | 1vuio0pswjnm7 wrote:
               | HN commenters like to use archive.is but I always wonder
               | if people are aware that archive.is is (a) blocked in
               | some countries^1 and (b) may block access to itself in a
               | country if it feels threatened.^2
               | 
               | There is also the issue of EDNS subnet.^3 archive.is
               | tries to require it; it wants to know what location a
               | request is coming from. In addition to EDNS, archive.is
               | inserts the IP address and geolocation of the incoming
               | request into the HTML of the returned page as a tracking
               | pixel.^4
               | 
               | Thus archive.is does some things archive.org does not do
               | besides just ignoring robots.txt
               | 
               | One of the things archive.org does that archive.is does
               | not do is that archive.org inserts an HTTP response
               | header intended to disable Chrome FLoC.^5 I add this
               | header for all sites in a local proxy; however I do not
               | see many sites adding it as a courtesy. Thanks
               | archive.org for doing that.
               | 
               | 1. https://en.wikipedia.org/wiki/Archive.is
               | 
               | 2. https://www.reddit.com/r/KotakuInAction/comments/3e29vm/arch...
               | 
               | 3. https://webapps.stackexchange.com/questions/135222/why-does-...
               | 
               | 4. https://news.ycombinator.com/item?id=27498902
               | 
               | 5. permissions-policy: interest-cohort=()
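
For anyone who wants to add the header from footnote 5 to their own responses, a minimal sketch follows. `withFlocOptOut` is a hypothetical helper that works on a plain headers object. How the object is handed to a server or proxy is left out, since that depends on the setup.

```javascript
// Return a copy of `headers` with the FLoC opt-out header from footnote 5 added.
// `withFlocOptOut` is a hypothetical helper; the input object is not modified.
function withFlocOptOut(headers) {
  return { ...headers, "Permissions-Policy": "interest-cohort=()" };
}

console.log(withFlocOptOut({ "Content-Type": "text/html" }));
```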
        
             | v0x wrote:
             | There's archive.is but I get the sense that the major use
             | case for that is getting around paywalls as opposed to
             | permanently archiving a page - indeed, since they host
             | content that the site owner probably doesn't want them to,
             | it would stand to reason the service would not be likely to
             | stand the test of time. But I could be wrong.
        
         | Jiro wrote:
         | They actually posted about this in 2017:
         | https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...
         | . At the time it sounded like they might change their
         | robots.txt policy. I guess they never followed up on it.
         | 
         | (I checked and ezboard is still excluded.)
        
       | tkgally wrote:
       | In his reflections, Brewster Kahle mentions his goal of creating
       | "a library available to anybody, anywhere in the world." He
       | doesn't mention, though, the costs of making that library
       | available to the world for free or the fact that the Internet
       | Archive accepts donations. So I will:
       | 
       | https://archive.org/donate/
        
         | state_less wrote:
         | I sent my bones or clams or whatever you call them.
         | 
         | Will send more when I have more and when I've learned to be
         | more generous. It's good to know that you're near Internet
         | Archive.
         | 
         |  _But, oh, what a wonderful feeling
         | 
         | Just to know that you are near
         | 
         | Sets my heart a-reeling
         | 
         | From my toes up to my ears_
         | 
         | -Bob Dylan, The Man in Me
        
           | mkaufman wrote:
           | state_less sounds like a Little Lebowski Urban Achiever.
        
             | state_less wrote:
             | Little Lebowski Urban Achievers - inner city children of
             | promise but without the necessary means for a - necessary
             | means for a higher education. So Mr Lebowski is committed
             | to sending all of them to college.
        
         | walterbell wrote:
         | Anyone know the relative budgets/donations/staff of Wikipedia
         | vs. Archive.org?
        
           | tkgally wrote:
           | ProPublica has a database of tax filing information for
           | nonprofits.
           | 
           | Internet Archive: https://projects.propublica.org/nonprofits/
           | organizations/943...
           | 
           | Wikimedia Foundation: https://projects.propublica.org/nonprof
           | its/organizations/200...
           | 
           | Mozilla Foundation: https://projects.propublica.org/nonprofit
           | s/organizations/200...
           | 
           | Electronic Frontier Foundation: https://projects.propublica.o
           | rg/nonprofits/organizations/430...
        
             | jdc wrote:
             | Wow, never thought I'd see Wikimedia at more than 4x the
             | budget of Mozilla!
        
               | feudalism wrote:
               | It's how Katherine Maher funds her trips to exotic
               | locations.
        
               | [deleted]
        
         | raybb wrote:
         | Donations are fantastic but if you have engineering (or project
         | management, design, etc) skills spending just 1 hour a week
         | contributing to their open source goes a very long way!
         | 
         | Open Library in particular has a very active repo with lots of
         | volunteers, a weekly community call, and a rather accessible
         | codebase. https://github.com/internetarchive/openlibrary
         | 
         | If anyone knows webpack well, we would LOVE to have this
         | dev-facing issue resolved so that CSS auto-reloads:
         | https://github.com/internetarchive/openlibrary/issues/4955
        
           | dailyanchovy wrote:
           | Alright, I'm new to pull requests but I had a go at it!
           | https://github.com/internetarchive/openlibrary/pull/5451
        
       | shrubby wrote:
       | Pictures and touchscreens have ruined the internet ;-)
        
       | TedDoesntTalk wrote:
       | How many books have you read that were published in the 1800s?
       | 
       | I'm betting close to zero.
       | 
       | Unfortunately, most people 200 years from now won't care about
       | the 70 petabytes the Internet Archive has saved.
       | 
       | Don't misunderstand: I am glad they do this and love their work.
       | I just think we overestimate the long-term value of this info
       | beyond a very small set of future historians or social
       | historians.
       | 
       | Most people have their lives to live in this moment, and if they
       | have a chance to look backwards before they were born, it's not a
       | big piece of their time.
        
         | 5etho wrote:
         | When I first read Balzac's novel Nucingen Bank (1839) [1] and
         | learned about the business 'mindset' of the time, I was
         | mesmerized. [1] https://pl.wikipedia.org/wiki/Bank_Nucingena (no
         | English-language wiki article, or at least no hyperlink)
        
         | wolverine876 wrote:
         | Jane Austen, Dickens, Melville, Oscar Wilde, Mark Twain,
         | Tolstoy, Emily Dickinson, Dostoevsky, Emerson, Charlotte
         | Bronte, Lewis Carroll, Victor Hugo, Grimm bros., Goethe,
         | Darwin, Ibsen, Nietzsche, Pushkin ...
         | 
         | It's a pretty good century for literature and other books.
         | 
         | Also, lots of people are interested in history. Those can be
         | best sellers.
        
           | Andrew_nenakhov wrote:
           | You must be biased against the French, mentioning only Hugo
           | in your list! What about Dumas, Stendhal, Balzac, Zola,
           | Flaubert, Maupassant, George Sand, Verlaine, Rimbaud,
           | Valery???!
           | 
           | (That's only from a Russian, pretty sure actual French will
           | add dozens more to this list)
        
         | mdp2021 wrote:
         | >I'm betting
         | 
         | You are betting very wrongly. People use large amounts of older
         | literature. Maybe not in your territory - well, be aware then
         | that many cultures do.
         | 
         | >Unfortunately, most people
         | 
         | That the median individual should be considered a parameter is
         | very controversial. (Contextually: services can very easily be
         | for interested minorities.)
         | 
         | >overestimate the long-term value
         | 
         | As if Project Gutenberg had not arguably been one of the most
         | important endeavours in history.
         | 
         | >if they have a chance to look backwards
         | 
         | It is a fundamental part of education...
        
         | someguy101010 wrote:
         | I was reading every newspaper that mentioned Spanish flu from
         | the 1910s when covid was starting. I never would have thought
         | that I would have been wishing for easier access to full text
         | search newspaper archives. Here we are :p
        
         | loughnane wrote:
         | Just add a few more that are popular
         | 
         | - Ralph Waldo Emerson
         | 
         | - Henry David Thoreau
         | 
         | - Rudyard Kipling (jungle book)
         | 
         | - Anna Sewell (black beauty)
         | 
         | - Walt Whitman (leaves of grass)
         | 
         | - Edgar Allan Poe
         | 
         | - Alexandre Dumas (Count of Monte Cristo, musketeers)
         | 
         | - Tocqueville (democracy in America)
        
         | dannyobrien wrote:
         | How many books have you read that were written by people who
         | read books published in the 1800s?
        
         | jfoutz wrote:
         | Eh, Shelley, Dickens, Twain and Wells are all I can think of.
         | Not zero, but a vanishingly small percentage.
        
         | TheCowboy wrote:
         | > How many books have you read that were published in the
         | 1800s? I'm betting close to zero.
         | 
         | Quite a few actually, and I'm not an outlier. Plus there are
         | many adaptations and derivative works that exist.
        
         | dleslie wrote:
         | 19th century literature is full of treasures.
        
         | kart23 wrote:
         | I hope you're wrong. Video and pictures coupled with text are a
         | goldmine for me. I would've loved to see a 'vlog' from the
         | 1800s honestly. What was it like for a perfectly normal person,
         | not a career writer? I don't think we have a lot of that, or if
         | we do, it's from a singular viewpoint, and we're subject to
         | their view and biases.
         | 
         | I found an archive of videos in my city from 1970, a street-view
         | like recording of select roads. I pored over it for a couple
         | hours, noting the buildings that were still there, the
         | completely empty hills now filled with houses, etc. That kind
         | of stuff is really cool.
        
         | SilverRed wrote:
         | How many times have you clicked a link and found that the page
         | 404s or redirects you somewhere else? That is the real value of
         | the web archive to me. Wikipedia uses it a lot as well: they
         | automatically snapshot any link used as a reference and if the
         | page goes away they automatically swap the link for a web
         | archive version.
         | 
         | Reading a reference 50 or even 200 years old is not absurd. A
         | post detailing some research findings which is referenced in
         | Wikipedia is still greatly valuable. YouTube historians often
         | reference ancient patents to uncover the history of old items.
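
The link-swapping idea described above can be sketched in a few lines. This is an illustration under stated assumptions, not Wikipedia's actual tooling: the function names `wayback_url` and `swap_dead_links` are hypothetical, the caller is assumed to already know which URLs are dead and when each snapshot was taken, and no network requests are made. The only external detail relied on is the Wayback Machine's `https://web.archive.org/web/<timestamp>/<url>` address format.

```python
from datetime import datetime, timezone


def wayback_url(original_url, snapshot_time):
    """Build a Wayback Machine address for original_url at snapshot_time."""
    # Wayback timestamps are 14 digits: YYYYMMDDhhmmss.
    stamp = snapshot_time.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{stamp}/{original_url}"


def swap_dead_links(references, dead_urls):
    """Replace each dead reference URL with its archived counterpart.

    references: list of (url, snapshot datetime) pairs.
    dead_urls: set of URLs already known to be dead; how that is
    determined is outside the scope of this sketch.
    """
    return [
        wayback_url(url, taken) if url in dead_urls else url
        for url, taken in references
    ]


if __name__ == "__main__":
    taken = datetime(2021, 7, 22, 3, 4, 0, tzinfo=timezone.utc)
    refs = [("https://example.com/a", taken), ("https://example.com/b", taken)]
    # Only the first reference is treated as dead and rewritten.
    print(swap_dead_links(refs, {"https://example.com/a"}))
```

Live links pass through untouched, so the function can be run repeatedly over the same reference list as more pages disappear.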
        
           | techbio wrote:
           | Interesting point about managing against link rot.
           | 
           | But what got me is YouTube historians and their "ancient
           | patents..." I want to see the ones for reinventing the wheel.
        
         | instagraham wrote:
         | Quite a spectacularly wrong take on multiple levels.
         | 
         | So much great literature comes from centuries before our own.
         | And considering that the internet is likely to be around
         | forever or as long as humans persist, a snapshot of its initial
         | decades will one day be one of the greatest "archaeological"
         | treasures.
         | 
         | Perhaps your main point is that you do not care for their work
         | nor for the work of literature written before your time. You
         | need not apply your yardstick to anything else in a bid to
         | gauge its value.
        
           | TedDoesntTalk wrote:
           | But 70 petabytes worth? Most people in this thread mention a
           | few dozen authors. Maybe this amounts to a few hundred
           | books. Not 70 petabytes (and counting).
        
             | allturtles wrote:
             | Sure, no human has ever read 70 petabytes worth of books.
             | 
             | An archive is not the same as a local public library. The
             | latter holds a small collection of mostly frequently
             | accessed items (e.g. the published works of Dickens,
             | Austen, etc.). The former holds a much larger collection of
             | rarely accessed items (e.g. every letter written by/to
             | Dickens that survived, every political pamphlet published
             | in Philadelphia in the nineteenth century, etc.).
             | 
             | If your point is that most items in the archive will be
             | rarely accessed, I don't think anyone will disagree with
             | you, but suggesting that the literature of the nineteenth
             | century is no longer of any interest was perhaps not the
             | best way of making that point.
        
         | Mortiffer wrote:
         | Via LibriVox, I (and many others) consume 1800s content
         | almost daily.
        
         | kilroy123 wrote:
         | I've read several from that time period. There are loads of
         | good books from the 1800s.
        
           | TedDoesntTalk wrote:
           | Yes. But 70 petabytes worth?
        
             | StrictDabbler wrote:
             | 70 petabytes of anything didn't exist in stored form in the
             | 1800s so that's a ridiculous standard, but...
             | 
             | The value here is genuinely historical. In a hundred years,
             | how will we track the etymology of common terms that
             | originated in this age?
             | 
             | Memes make preserving the internet extremely important.
             | Terms and ideas evolve so quickly that the history of
             | language and thought will become obscure almost instantly.
             | Even now it can be almost impossible to understand some
             | internet terms if you weren't part of the subculture that
             | spawned them at the exact time they were spawned.
             | 
             | Do you know what hunter2 means? Do you know it because of
             | bash.org?
             | 
             | A person doesn't have to read all this material. The
             | material has to be stored because our future society will
             | have descended from this material, and if they don't have
             | it they won't know how they got there.
             | 
             | You know, except for the bit where civilisation collapses
             | over the next hundred years as the planet warms and hot
             | countries invade cooler, developed countries looking for
             | living space.
        
       ___________________________________________________________________
       (page generated 2021-07-22 23:02 UTC)