[HN Gopher] Reflections as the Internet Archive turns 25
___________________________________________________________________
Reflections as the Internet Archive turns 25
Author : ingve
Score : 339 points
Date : 2021-07-22 03:04 UTC (19 hours ago)
(HTM) web link (blog.archive.org)
(TXT) w3m dump (blog.archive.org)
| DonHopkins wrote:
| A search of youtube for "wayback machine" produces pages of stuff
| about the Internet Archive, and only the 24th result has anything
| to do with the origin of the term.
|
| People who didn't spend their Saturday mornings glued in front of
| the TV screen as a child of the 1970's might not remember how
| American kids learned about history back then:
|
| Peabody's Improbable History - Surrender of Cornwallis
|
| Peabody and Sherman travel back to October 19, 1781 to witness
| when Cornwallis surrendered to Washington. However, when they
| got there, he didn't show up.
|
| https://www.youtube.com/watch?v=3E8zmaOiCVw&ab_channel=bullw...
| pwdisswordfish0 wrote:
| FYI, it's not nowadays obscure. There's a current series _The
| Mr. Peabody & Sherman Show_ on Netflix, and Hollywood made a
| Mr. Peabody and Sherman movie (by Dreamworks) in 2014.
| ConceptJunkie wrote:
| Actually, "The Bullwinkle Show" premiered in 1959, but I
| discovered it in the early 70s, as I suspect you did.
| ZeroGravitas wrote:
| This is a bit of a tangent, but there was mention of non-
| advertising based funding models.
|
| Is anyone working on an advertising model that achieves the basic
| goal of advertising, but without the centralising aspect which
| seems to be the root cause of many of the issues? Giant
| monopolies are always going to subvert regulation but the same
| industry as disconnected units might be easier to police.
| Obviously you can just try to split them up or limit their size
| with regulation after the fact but a good technical basis might
| help out.
|
| Does web advertising just not make sense unless you can amass
| lots of private user data and track people across the web? If so
| can we subcontract that data to smaller companies we can trust
| with our data and effectively punish if they break the rules?
| coldpie wrote:
| > Does web advertising just not make sense unless you can amass
| lots of private user data and track people across the web?
|
| It does make sense, but it has to compete with the invasive-
| style advertising, and it will always lose. If you want the
| "good advertising" you have to kill the "bad advertising".
| akkartik wrote:
| I wish IA hosted Usenet archives. Even if they stopped at year
| 2000 and didn't update further.
| generationP wrote:
| They host some: https://archive.org/details/usenet
|
| As far as my own old posts are concerned, it looks complete :)
| But it isn't easily findable or searchable; the intended way of
| interaction is apparently to download an entire hierarchy and
| grep.
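The "download and grep" workflow described above can be approximated in a few lines. This is only a sketch: it assumes the downloaded archive has already been read into a string, and the sample text is made up for illustration rather than taken from the Usenet collection.

```javascript
// Minimal grep-like search over text that is already in memory.
// A string pattern is treated as a case-insensitive regular expression.
function grepLines(text, pattern) {
  // Drop the g/y flags so repeated test() calls do not depend on lastIndex.
  const regex =
    pattern instanceof RegExp
      ? new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, ""))
      : new RegExp(pattern, "i");
  const matches = [];
  text.split(/\r?\n/).forEach((line, index) => {
    if (regex.test(line)) {
      matches.push({ lineNumber: index + 1, line });
    }
  });
  return matches;
}

// Hypothetical sample, not real archive data.
const sample = [
  "From: alice@example.com",
  "Subject: Re: lambda calculus question",
  "",
  "See my earlier post about combinators.",
].join("\n");

console.log(grepLines(sample, "subject:"));
```

For a multi-gigabyte hierarchy, a streaming reader or plain command-line grep would be the more realistic choice; the function above only shows the matching logic.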
| akkartik wrote:
| Oh good to know! That's adequate.
| soheil wrote:
| For anyone interested I made a quick and dirty way to pull up the
| archive of any URL by prefixing it with arxiv.link
|
| e.g.,
|
| http://arxiv.link/https://news.ycombinator.com/
| johtso wrote:
| Is there an advantage compared to just putting
| `web.archive.org/web/` before the url?
| soheil wrote:
| Didn't know that, thanks. I guess just shorter.
| fouc wrote:
| Oh interesting,
| http://web.archive.org/web/https://news.ycombinator.com also
| works, didn't know that.
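The prefix trick mentioned above is easy to wrap in a helper. A minimal sketch: the function name `waybackUrl` is made up here, and the optional digits-only timestamp segment is an assumption based on the form seen in snapshot URLs, not something stated in the thread.

```javascript
// Build a Wayback Machine URL by prefixing the target URL.
// Without a timestamp, the thread reports a redirect to the latest capture.
const WAYBACK_PREFIX = "https://web.archive.org/web/";

function waybackUrl(targetUrl, timestamp) {
  // new URL() throws a TypeError on malformed input, so bad links fail early.
  const normalized = new URL(targetUrl).href;
  if (timestamp === undefined) {
    return WAYBACK_PREFIX + normalized;
  }
  // Assumption: timestamps are digits only, up to YYYYMMDDhhmmss.
  if (!/^\d{4,14}$/.test(timestamp)) {
    throw new RangeError("timestamp must be 4 to 14 digits, e.g. 20210722");
  }
  return `${WAYBACK_PREFIX}${timestamp}/${normalized}`;
}

console.log(waybackUrl("https://news.ycombinator.com/"));
console.log(waybackUrl("https://news.ycombinator.com/", "20210722"));
```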
| ipsum2 wrote:
| a little confusing since arxiv refers to a popular research
| paper archive. Useful project though!
| fouc wrote:
| Nice, thanks! Ever since I started using archive.is last year,
| I've always wanted something like this! Could be paired with a
| bookmarklet too.
|
| javascript:(function(){window.open('http://arxiv.link/'+location.href)})();
| alexislours wrote:
| You have always been able to do this with archive.is, I've
| been using this bookmarklet for quite some time.
|
| javascript: (() => { window.open("https://archive.is/" +
| window.location.href, '_blank')})();
| codethief wrote:
| archive.is is not related to the Internet Archive's Wayback
| Machine, though, is it?
| fouc wrote:
| It's not related, but it will give you a link there if it
| doesn't have the page archived.
| fouc wrote:
| I meant I wanted to go directly to the most recent result
| in the wayback engine for the given page.
| alexislours wrote:
| In this case you can just use the following and it will
| redirect to the latest archive without depending on
| a third-party website:
|
| javascript: (() => {
| window.open("https://web.archive.org/web/" +
| window.location.href, '_blank')})();
| spiritplumber wrote:
| The Internet Archive location is beautiful. It's a church that
| has been partially turned into a server farm. Big Neuromancer
| energy when you go inside and look.
| N3cr0ph4g1st wrote:
| Did a hackathon there in 2016, it is so cool!
| DoingIsLearning wrote:
| I was too curious not to look this up:
|
| https://www.atlasobscura.com/places/internet-archive-headqua...
| [deleted]
| Thoreandan wrote:
| It's a shame that the mirror in Alexandria appears to be long
| abandoned.
| nonbirithm wrote:
| Isn't the publisher's lawsuit still going forward in November of
| this year? It would be terrible if the Archive had to shut down
| because they decided to do an unnecessary experiment with
| copyright while simultaneously being tied to an irreplaceable,
| yet centralized, historical resource.
|
| I donate to them with the hopes that they won't try to do
| anything that carries that kind of risk to their continued
| existence again.
| IfOnlyYouKnew wrote:
| Big fan of the archive, especially openlibrary. But the lofty
| talk of "democratizing" this-or-that is a bit overblown: in
| authoritarian countries, the archive does zero democratizing,
| because it's either just not available, or all the relevant
| individual documents are blocked.
|
| For democratic countries that have democratically elected to
| restrict the distribution of some materials the society considers
| harmful, such as Germany with Nazi propaganda, the archive
| happily decides to undermine those clear, longstanding, and --
| disproving the single argument for free speech absolutism -- not
| slippery-sloping anywhere over decades. Why? Because laws you
| disagree with are, apparently, illegitimate.
|
| It probably helps their bold defense of all that is holy to
| intimately know that these really _are_ democratic countries,
| which aren't going to just send a wet team to dismember them,
| Saudis-style, or to spend millions on an elaborate plan whose
| only purpose is to let you live for another month, with a clear
| mind that has complete certainty that you will die, and who did
| it.
|
| The anti-semitism and racism on archive.org, plus some copyright
| violations, aren't a byproduct of their "freedom". It's all of it.
| There are plenty of free hosts for video or documents, and an
| hour at minimum wage would pay for hosting quite a lot for quite
| a while for most of the archive's audience. But the killer
| feature is immunity, through anonymity and DMCA's Safe Harbour.
|
| Sure, everyone here is only defending "free speech" and would
| never agree with the swastika-fetishists. Only, somehow, they
| never complain about ISIS having a hard time on Twitter, or porn
| being censored on Facebook. It's the scans of _Der Sturmer_,
| especially of the 38 to 44 vintage, that are the chosen symbols of
| "democratization".
|
| [0]: https://www.sfgate.com/california-politics/article/Internet-...
| textfiles wrote:
| "Big Fan" is doing a lot of heavy lifting there.
| ZeroGravitas wrote:
| I was intrigued by this comment, but couldn't really figure out
| what was being claimed so read the linked article.
|
| It's still not 100% clear but elements of it include:
|
| - Historical racist texts (Mein Kampf, 1930s newspapers) which
| I feel on balance is a good thing to preserve in a library. I
| would assume having those papers available does more to combat
| fascism than to encourage it (though who knows really)
|
| - archives of websites that are dodgy (ISIS and Neo-Nazi are
| mentioned but there must be all sorts of crazy and or bad stuff
| on the web that gets archived)
|
| - following US law rather than local law (a general conundrum
| for internet sites like this, especially if you do it for
| countries whose laws you like but not for other countries)
|
| - providing a place for people to upload and share files
| anonymously
|
| So yeah, I'm not really a Free Speech absolutist myself
| (doesn't seem like many that claim they are actually believe it
| when it comes to things they disagree with) but doesn't feel
| like they're in the same boat as social media platforms who
| actively spread bad things if it increases ad impressions. Some
| similar issues around policing large amounts of content and
| dealing with different legal, political and moral frameworks at
| scale though.
| ezequiel-garzon wrote:
| _This library would be available not only to those who could pay
| the $1 per minute that LexusNexus charged_
|
| Any idea on what LexusNexus is, or was back then? Thanks!
| Teever wrote:
| https://en.wikipedia.org/wiki/LexisNexis
| lifekaizen wrote:
| Very expensive research tool places like large law firms would
| use.
| fumeux_fume wrote:
| I think it's a reference to LexisNexis which still exists today
| in a similar capacity. Back in the day (1990s), it was an
| online service to search for and read virtually any legal or
| news article among many other things. It was very expensive, but
| was usually accessed through a corporate or educational
| institution's license.
| Mountain_Skies wrote:
| If you live in the US, they have a file on you and it's
| probably very thick. A credit report will run a couple of pages
| long, the LexusNexus report on me (a complete nobody) cost them
| about $8 to mail. It contained many errors, mostly about
| property I don't own but also about an insurance claim I never
| made.
|
| Like your credit report, you can get a free copy by writing
| them and requesting a copy. IIRC when I did it a few years ago,
| I had to make the request in writing, I wasn't able to order it
| online, at least not for free.
| jacquesm wrote:
| They ought to be liable for those errors.
| ovebepari wrote:
| Brewster Kahle mentions his goal of creating "a library available
| to anybody, anywhere in the world."
|
| Fun fact: Archive.org is blocked in Bangladesh for god knows what
| reasons.
| ArtDev wrote:
| I think we need a new src attribute that can have a fallback.
| There are so many parts of the internet where most of the links
| are dead. It's pretty sad actually.
| elric wrote:
| Without trying to be contrarian, I don't think that everything
| should be archived. Random tweets, random blog posts, random
| personal web sites. Let them wither and die and be forgotten.
| Notable content by notable people? Sure.
|
| Everyone else ought to have the right to be forgotten, including
| some drunk tweet they wrote 10 years ago and regret, or an old
| personal page which contained too much PII.
|
| Archive no longer has a way to opt-out, which is bad enough, but
| I still think they should be opt-in.
| jacquesm wrote:
| You never know in advance what will and what will not be
| worthwhile archiving, you only know that at some unspecified
| point in the future.
| elric wrote:
| That's lovely from the perspective of a historian 200 years
| in the future. But it does nothing to alleviate the pain
| of people living in the present. People whose prospective
| employers comb over their embarrassing past. Or bullies. Or
| any other number of evildoers whose life is made easier by
| unfettered access to indelible information.
| pwdisswordfish8 wrote:
| I don't like your proposed solution that uses right to be
| forgotten as a blunt instrument to paper over serious
| problems, to use a mixed metaphor. When you appeal to the
| right to be forgotten to fix problems like people and
| "prospective employers comb over their embarrassing past",
| it's a way of throwing up your hands and neglecting deeper
| issues.
|
| In a right-to-be-forgotten world, the way it would end up
| going is:
|
| 1. problematic potentates punish pitiable proles
|
| 2. someone invokes right to be forgotten
|
| 3. this is considered "good enough"
|
| 4. the problem conditions that allowed #1 to fester remain
| uncorrected
|
| I feel this way about a lot of stuff these days (especially
| where the erosion of the tenets of a liberal society is
| involved), where people argue vociferously for a "solution"
| that can at best be considered an indirect way of handling
| the problem. You see this with a lot of contemporary calls
| for the dismantling of the tradition of free speech/free
| inquiry/freedom of association, for example. People end up
| chafing in the direction of proposals that have dual-use
| effects in the first instance and perniciously "null"
| effects in the second instance.
| jl6 wrote:
| Perhaps the Internet Archive could do more to help people who
| find their personal/sensitive/embarrassing content made
| available in perpetuity (I'm not sure exactly _what_ they could
| do), but it's incredibly valuable to have archiving on by
| default. The voice of un-notable people is underrepresented in
| every field of study, and the voice of notable people tends to
| get preserved in other ways anyway.
| jjkaczor wrote:
| It is helpful to get the perspectives of "un-notable" people
| from a historical perspective.
|
| For example - the graffiti at Pompeii is interesting (and is
| pretty much at the same "quality bar" as Twitter):
|
| https://www.theatlantic.com/technology/archive/2016/03/adrie...
|
| https://kashgar.com.au/blogs/history/the-bawdy-graffiti-of-p...
| X6S1x6Okd1st wrote:
| Historians spend a lot of time poring over minutiae from
| non-notable people. There is plenty about the world that doesn't
| seem worth writing down, but can be intuited from tangential
| texts.
|
| The cost seems low enough to just keep it.
| dogorman wrote:
| From what I understand, present day historians are sitting on
| a huge pile of cuneiform tablets that have yet to be
| transcribed or translated because there is much more material
| than there is interest/manpower.
|
| Of course, to your point, they do keep it around. They don't
| just throw it in the trash.
| jacobolus wrote:
| Only a tiny number of people in the world can read Sumerian
| or Akkadian, even for them the process is slow and error
| prone because we are missing a lot of the original context,
| and they have better things to do than skim through piles
| of delivery receipts.
|
| * * *
|
| It would be pretty neat if someone could figure out how to
| OCR all the cuneiform tablets and turn them into something
| searchable.
| generationP wrote:
| Who decides what "notable" is? I frequently use the Archive to
| find old academic grey lit (preprints, lecture notes, newsgroup
| posts, etc.). Much of it is on "random" blog posts and personal
| websites. Even the authors aren't usually notable by Wikipedia
| standards. Yeah, there is some PII on those pages, but also
| treasures of useful information.
| stared wrote:
| Most of the history we know is from the perspective of the
| wealthiest 0.1%. Even though the Internet is still biased
| towards the wealthier and more educated, having a history of
| the wealthiest 10% would be enormous progress.
| [deleted]
| causi wrote:
| As a question related to archival, what's the best tool for local
| archiving? HTTRACK is getting long in the tooth and it just doesn't
| work for all the dynamic content on modern web pages.
| TekMol wrote:
| How can IA just go out there and copy+republish the content of
| others and get away with it?
|
| Aren't they breaching copyright on a massive massive scale?
| ghaff wrote:
| Because the vast majority of people don't care if someone
| archives something they've made public. And the Internet
| Archive bends over (to some too far) backwards with robots.txt
| to exclude anything someone wants removed from public view.
| dleslie wrote:
| I love the Internet Archive; I worry that its utility will wane
| as content becomes more dynamic than static. What does it mean to
| archive the experience of scrolling through a social feed?
| petertodd wrote:
| The paid, legal-oriented, archiving service Perma.cc that
| Harvard Law runs actually lets you upload your own PDFs and
| screenshots in addition to allowing Perma.cc's bots to capture
| webpages. Of course, since you could upload anything the
| difference is made clear in the UI.
|
| In a legal context, simply attesting to the validity of a
| screenshot is really common. So when that functionality is used
| Perma.cc is operating more as a permanent file storage service
| than a trusted archive.
|
| Regardless, this does go a long way to solving the problem of
| dynamic sites.
| ignoramous wrote:
| A little-known bit of trivia: Apache Hadoop (and the multi-billion
| dollar open source big-data ecosystem it spawned) was first
| worked on at the Internet Archive [0].
|
| Speaking of billions: According to Kahle, Alexa Internet's
| compute infrastructure informed Amazon's take on IaaS (AWS) [1].
|
| Another perhaps lost nugget is Amazon once funded (either in part
| or in full) the development of the Wayback Machine, Internet
| Archive's most impactful product. In addition, till date (if I'm
| not mistaken) Amazon continues to donate data it fetches from
| Alexa Toolbar installations to the Wayback Machine.
|
| [0] https://archive.is/Le3id
|
| [1] https://archive.is/EnzHq
| endisneigh wrote:
| I'm curious - how does Archive.org get around DMCA? I assume it's
| because Fair Use and the fact that they're a not-for-profit, but
| more details would be great.
|
| Other sites like outline.com (which I guess is a for-profit
| entity) don't really allow you to get around paywalls the way the
| Wayback Machine does.
|
| As someone interested in building a site that gets around
| paywalls for semi-educational purposes I'm curious if anyone has
| details!
| pabs3 wrote:
| IA are a designated library, which confers a set of privileges
| under US copyright law.
| ghaff wrote:
| A very limited set of privileges. And AFAIK it's only a
| "designated library" in California. I'm not sure there is
| such a thing at the federal level other than the Library of
| Congress.
|
| Basically, most people are fine with what the Wayback Machine
| does and they'll take down any mirror that the domain owner
| asks them to.
| DoingIsLearning wrote:
| Yes as a non-US based user this is one of the most amazing
| things for me with IA.
|
| In my head IA was just the wayback machine with cached pages.
| But during the pandemic I realized that there is a plethora
| of actual books that one can checkout with far less friction
| than in a standard library.
|
| It is such a cool concept, also there are IA satellite
| projects (not sure if they are owned by or in partnership),
| for different non-english languages e.g. arquivo.pt so you
| can have the same plethora of content in other non-english
| languages as well.
| joe_the_user wrote:
| My impression is that archive.org follows whatever directions are
| in robots.txt. If those directions allow a site to be archived,
| they do it, otherwise they don't. That's from looking at
| not-archived sites such as ezboard, which I mention in another post.
| Santosh83 wrote:
| And what about sites that do not have robots.txt? Does IA
| snapshot those?
| nlitened wrote:
| Yes. By default, if you put information on the web, it is
| assumed that you are okay with people seeing it.
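The behaviour described in this sub-thread (honour robots.txt when present, archive by default when it is absent) can be illustrated with a deliberately simplified check. This is a sketch under stated assumptions, not the Internet Archive's actual logic: it handles only `User-agent` and `Disallow` lines with plain prefix matching, and the crawler name `ia_archiver` is used purely as an example value.

```javascript
// Simplified robots.txt check: prefix-matching Disallow rules only.
// No wildcards, Allow rules, or Crawl-delay handling.
function isPathAllowed(robotsTxt, userAgent, path) {
  // No robots.txt at all: treated as permission to crawl, per the thread.
  if (!robotsTxt) return true;

  const agent = userAgent.toLowerCase();
  const groups = []; // each group: { agents: [...], disallow: [...] }
  let current = null;
  let lastWasAgent = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one rule group.
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      // An empty Disallow value means "nothing is disallowed".
      if (field === "disallow" && current && value !== "") {
        current.disallow.push(value);
      }
      lastWasAgent = false;
    }
  }

  // Prefer a group naming this agent; otherwise fall back to "*".
  const group =
    groups.find((g) => g.agents.includes(agent)) ||
    groups.find((g) => g.agents.includes("*"));
  if (!group) return true;
  return !group.disallow.some((prefix) => path.startsWith(prefix));
}

// A parked domain that blocks everything, versus a site with no robots.txt.
const parked = "User-agent: *\nDisallow: /";
console.log(isPathAllowed(parked, "ia_archiver", "/forum/thread/1"));
console.log(isPathAllowed("", "ia_archiver", "/forum/thread/1"));
```

The parked-domain case is the one raised elsewhere in the thread: a single blanket `Disallow: /` from a new owner is enough to hide everything previously captured under that domain.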
| toomuchtodo wrote:
| Takedowns cause the content to be "darked." It's no longer
| publicly available, but still archived on disk until a future
| date.
| joe_the_user wrote:
| Nice they're there. At the same time, it's amazingly easy for
| content to be removed from there - if someone objects or even if
| things are murky.
|
| For example, all content from the old ezboard site has been
| removed based on the configuration of the current URL owner's
| robots.txt, and the current URL owner is just a domain parker.
| Ezboard hosted a lot of content back in the day.
|
| https://archive.org/post/560730/ezboard-is-there-any-hope
| 1vuio0pswjnm7 wrote:
| This is an old problem. I could have sworn there were promises
| they were going to change their procedures.
|
| The question I have is how fast the content is removed after
| the domain name registration changes, i.e., is there a
| window of time between the appearance of a new robots.txt and
| the next scheduled crawl, and if so, is it possible to
| "rescue" the content, as ArchiveTeam would do, during that
| window, before it disappears?
|
| If this is possible, there could be a service for monitoring
| changes to domain name registrations for sites that have large
| amounts of historical content. I would happily volunteer to set
| up such a service.
| joe_the_user wrote:
| Well, "complain on hn" has been a way to get stuff from
| Google. Maybe someone at archive will notice this thread.
| hidden-spyder wrote:
| I'm curious. What changes has Google made due to complaints
| on HN?
| SilverRed wrote:
| Hopefully it is just hidden and not deleted. But this is the
| main reason why alternative archive sites exist which ignore
| the original poster's requests. They are frequently used to
| archive posts from public figures that are expected to be
| scrubbed later.
| techrat wrote:
| > Hopefully it is just hidden and not deleted.
|
| Hidden. Even when you request for them to remove stuff.
|
| Had domain, stuff got archived, asked for them to remove it,
| added robots.txt. Domain lapsed. Someone else picked it up.
| Their robots.txt is now permissive, so the old stuff that I
| requested they remove is visible again.
| throwslackforce wrote:
| That's insidious. Does it even make sense to revive data
| that has been removed based on _current_ configuration?
|
| Even if the owner is the same, allowing the site to be
| archived going forward isn't the same thing as permitting
| it retroactively.
| mavhc wrote:
| Wonder how you'd overcome that flaw, is there a history of
| domain name ownership?
| newswasboring wrote:
| Just delete the things when requested. No need to make it
| more complicated than that.
| mavhc wrote:
| When requested by who? The current owner of the domain?
| Do they own what was on that domain 20 years ago?
|
| What if you lost your domain, but owned it in the past?
| Can you delete stuff from that era?
| account42 wrote:
| Maybe we need a whois.archive.org.
| fwn wrote:
| As far as I was able to experience it's just hidden and not
| deleted.
|
| I have to keep an old domain indefinitely to host a
| robots.txt just to keep sensitive personal data hidden that
| little me foolishly published on the open internet.
|
| But I'm not complaining. The internet archive is a great
| gift. Using it with a bookmarklet really feels like a super
| power.
| mercora wrote:
| it sounds a lot like this would need some kind of
| delegation mechanism where you could point to a different
| URL in-time before abandoning the place. or maybe some kind
| of sealing using a cryptographic function that lets you
| prove you are the owner of the current/previous content
| and also would prove you are not the owner of the newer
| content while this proof could be used to release the ban
| if ever needed.
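One way to picture the "cryptographic sealing" idea is with an ordinary signature scheme. The sketch below is hypothetical: no archive is known from this thread to support it, the request strings are invented, and where the public key would be published is left open. It only shows that a former owner could later prove control of a key that was captured together with the old content.

```javascript
const crypto = require("crypto");

// Owner side, while still controlling the site: create a key pair and
// publish the public key alongside the content so it gets archived too.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");

// Later, even after the domain has lapsed: sign a request about old content.
function signRequest(message, key) {
  // Ed25519 takes no separate digest algorithm, hence the null argument.
  return crypto.sign(null, Buffer.from(message, "utf8"), key).toString("base64");
}

// Archive side: check the request against the public key captured earlier.
function verifyRequest(message, signatureBase64, key) {
  return crypto.verify(
    null,
    Buffer.from(message, "utf8"),
    key,
    Buffer.from(signatureBase64, "base64")
  );
}

const request = "hide example.com/old-page"; // made-up request format
const signature = signRequest(request, privateKey);
console.log(verifyRequest(request, signature, publicKey));
```

A new domain owner without the private key could not produce a valid signature, which is the property the comment is asking for; key loss and key publication are the hard parts this sketch ignores.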
| joe_the_user wrote:
| Got any examples of these alternative archives?
| mellosouls wrote:
| A couple of pointers to the wider world of web archiving:
|
| https://github.com/ArchiveBox/ArchiveBox/wiki/Web-Archiving-...
|
| https://github.com/iipc/awesome-web-archiving
| SilverRed wrote:
| This is the main one https://archive.is/
|
| From the FAQ, they do not respect robots.txt since they
| only archive on request by a user and they do not remove
| archives unless they contain illegal content.
| 1vuio0pswjnm7 wrote:
| HN commenters like to use archive.is but I always wonder
| if people are aware that archive.is is (a) blocked in
| some countries^1 and (b) may block access to itself in a
| country if it feels threatened.^2
|
| There is also the issue of EDNS subnet.^3 archive.is
| tries to require it; it wants to know what location a
| request is coming from. In addition to EDNS, archive.is
| inserts the IP address and geolocation of the incoming
| request into the HTML of the returned page as a tracking
| pixel.^4
|
| Thus archive.is does some things archive.org does not do
| besides just ignoring robots.txt
|
| One of the things archive.org does that archive.is does
| not do is that archive.org inserts an HTTP response
| header intended to disable Chrome FLoC.^5 I add this
| header for all sites in a local proxy; however I do not
| see many sites adding it as a courtesy. Thanks
| archive.org for doing that.
|
| 1. https://en.wikipedia.org/wiki/Archive.is
|
| 2. https://www.reddit.com/r/KotakuInAction/comments/3e29vm/arch...
|
| 3. https://webapps.stackexchange.com/questions/135222/why-does-...
|
| 4. https://news.ycombinator.com/item?id=27498902
|
| 5. permissions-policy: interest-cohort=()
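Adding the header from footnote 5 in a proxy comes down to a small transformation of the response headers. The following is a sketch only: the function name `withFlocOptOut` is made up, headers are modelled as a plain object, and wiring it into an actual proxy is left out.

```javascript
// Return a copy of the response headers with the FLoC opt-out directive added.
// Header names are compared case-insensitively.
function withFlocOptOut(headers) {
  const directive = "interest-cohort=()";
  const result = { ...headers };
  const existingName = Object.keys(result).find(
    (name) => name.toLowerCase() === "permissions-policy"
  );

  if (existingName === undefined) {
    result["permissions-policy"] = directive;
    return result;
  }

  const current = String(result[existingName]);
  // Leave the header alone if an interest-cohort directive is already present.
  if (!/(^|,)\s*interest-cohort\s*=/.test(current)) {
    result[existingName] =
      current.trim() === "" ? directive : `${current}, ${directive}`;
  }
  return result;
}

console.log(withFlocOptOut({ "content-type": "text/html" }));
console.log(withFlocOptOut({ "Permissions-Policy": "geolocation=()" }));
```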
| v0x wrote:
| There's archive.is but I get the sense that the major use
| case for that is getting around paywalls as opposed to
| permanently archiving a page - indeed, since they host
| content that the site owner probably doesn't want them to,
| it would stand to reason the service would not be likely to
| stand the test of time. But I could be wrong.
| Jiro wrote:
| They actually posted about this in 2017:
| https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...
| At the time it sounded like they might change their
| robots.txt policy. I guess they never followed up on it.
|
| (I checked and ezboard is still excluded.)
| tkgally wrote:
| In his reflections, Brewster Kahle mentions his goal of creating
| "a library available to anybody, anywhere in the world." He
| doesn't mention, though, the costs of making that library
| available to the world for free or the fact that the Internet
| Archive accepts donations. So I will:
|
| https://archive.org/donate/
| state_less wrote:
| I sent my bones or clams or whatever you call them.
|
| Will send more when I have more and when I've learned to be
| more generous. It's good to know that you're near Internet
| Archive.
|
| _But, oh, what a wonderful feeling
|
| Just to know that you are near
|
| Sets my heart a-reeling
|
| From my toes up to my ears_
|
| -Bob Dylan, The man in me
| mkaufman wrote:
| state_less sounds like a Little Lebowski Urban Achiever.
| state_less wrote:
| Little Lebowski Urban Achievers - inner city children of
| promise but without the necessary means for a - necessary
| means for a higher education. So Mr Lebowski is committed
| to sending all of them to college.
| walterbell wrote:
| Anyone know the relative budgets/donations/staff of Wikipedia
| vs. Archive.org?
| tkgally wrote:
| ProPublica has a database of tax filing information for
| nonprofits.
|
| Internet Archive:
| https://projects.propublica.org/nonprofits/organizations/943...
|
| Wikimedia Foundation:
| https://projects.propublica.org/nonprofits/organizations/200...
|
| Mozilla Foundation:
| https://projects.propublica.org/nonprofits/organizations/200...
|
| Electronic Frontier Foundation:
| https://projects.propublica.org/nonprofits/organizations/430...
| jdc wrote:
| Wow, never thought I'd see Wikimedia at more than 4x the
| budget of Mozilla!
| feudalism wrote:
| It's how Katherine Maher funds her trips to exotic
| locations.
| [deleted]
| raybb wrote:
| Donations are fantastic but if you have engineering (or project
| management, design, etc) skills spending just 1 hour a week
| contributing to their open source goes a very long way!
|
| Open Library in particular has a very active repo with lots of
| volunteers, a weekly community call, and a rather accessible
| codebase. https://github.com/internetarchive/openlibrary
|
| If anyone knows webpack well, I would LOVE to have this
| dev-facing issue about auto-reloading CSS resolved:
| https://github.com/internetarchive/openlibrary/issues/4955
| dailyanchovy wrote:
| Alright, I'm new to pull requests but I had a go at it!
| https://github.com/internetarchive/openlibrary/pull/5451
| shrubby wrote:
| Pictures and touchscreens have ruined the internet ;-)
| TedDoesntTalk wrote:
| How many books have you read that were published in the 1800s?
|
| I'm betting close to zero.
|
| Unfortunately, most people 200 years from now won't care about
| the 70 petabytes the Internet Archive has saved.
|
| Don't misunderstand: I am glad they do this and love their work.
| I just think we overestimate the long-term value of this info
| beyond a very small set of future historians or social
| historians.
|
| Most people have their lives to live in this moment, and if they
| have a chance to look backwards before they were born, it's not a
| big piece of their time.
| 5etho wrote:
| When I first read Balzac's novel _Nucingen Bank_ (1839) [1] and
| learned about the business 'mindset' of the time, I was
| mesmerized. [1] https://pl.wikipedia.org/wiki/Bank_Nucingena (no
| English wiki article, or at least no hyperlink)
| wolverine876 wrote:
| Jane Austen, Dickens, Melville, Oscar Wilde, Mark Twain,
| Tolstoy, Emily Dickinson, Dostoevsky, Emerson, Charlotte
| Bronte, Lewis Carroll, Victor Hugo, Grimm bros., Goethe,
| Darwin, Ibsen, Nietzsche, Pushkin ...
|
| It's a pretty good century for literature and other books.
|
| Also, lots of people are interested in history. Those can be
| best sellers.
| Andrew_nenakhov wrote:
| You must be biased against the French, mentioning only Hugo
| in your list! What about Dumas, Stendhal, Balzac, Zola,
| Flaubert, Maupassant, George Sand, Verlaine, Rimbaud,
| Valery???!
|
| (That's only from a Russian, pretty sure actual French will
| add dozens more to this list)
| mdp2021 wrote:
| >I'm betting
|
| You are betting very wrongly. People use large amounts of older
| literature. Maybe not in your territory - well, be aware then
| that many cultures do.
|
| >Unfortunately, most people
|
| That the median individual should be considered a parameter is
| very controversial. (Contextually: services are very easily for
| interested minorities.)
|
| >overestimate the long-term value
|
| As if Project Gutenberg had not arguably been one of the most
| important endeavours in history.
|
| >if they have a chance to look backwards
|
| It is a fundamental part of education...
| someguy101010 wrote:
| I was reading every newspaper that mentioned spanish flu from
| the 1910's when covid was starting. I never would have thought
| that I would have been wishing for easier access to full text
| search newspaper archives. Here we are :p
| loughnane wrote:
| Just add a few more that are popular
|
| - Ralph Waldo Emerson
|
| - Henry David Thoreau
|
| - Rudyard Kipling (jungle book)
|
| - Anna Sewell (black beauty)
|
| - Walt Whitman (leaves of grass)
|
| - Edgar Allan Poe
|
| - Alexandre Dumas (Count of Monte Cristo, musketeers)
|
| - Tocqueville (democracy in America)
| dannyobrien wrote:
| How many books have you read that were written by people who
| read books published in the 1800s?
| jfoutz wrote:
| Eh, Shelley, Dickens, Twain and Wells are all I can think of.
| Not zero, but a vanishingly small percentage.
| TheCowboy wrote:
| > How many books have you read that were published in the
| 1800s? I'm betting close to zero.
|
| Quite a few actually, and I'm not an outlier. Plus there are
| many adaptations and derivative works that exist.
| dleslie wrote:
| 19th century literature is full of treasures.
| kart23 wrote:
| I hope you're wrong. Video and pictures coupled with text are a
| goldmine for me. I would've loved to see a 'vlog' from the
| 1800s honestly. What was it like for a perfectly normal person,
| not a career writer? I don't think we have a lot of that, or if
| we do, it's from a singular viewpoint, and we're subject to
| their view and biases.
|
| I found an archive of videos in my city from 1970, a street-view
| like recording of select roads. I pored over it for a couple
| hours, noting the buildings that were still there, the
| completely empty hills now filled with houses, etc. That kind
| of stuff is really cool.
| SilverRed wrote:
| How many times have you clicked a link and found that the page
| 404s or redirects you somewhere else? That is the real value of
| the web archive to me. Wikipedia uses it a lot as well: they
| automatically snapshot any link used as a reference, and if the
| page goes away they automatically swap the link for a web
| archive version.
|
| Reading a reference 50 or even 200 years old is not absurd. A
| post detailing some research findings which is referenced in
| Wikipedia is still greatly valuable. YouTube historians often
| reference ancient patents to uncover the history of old items.
| techbio wrote:
| Interesting point about managing against link rot.
|
| But what got me is YouTube historians and their "ancient
| patents..." I want to see the ones for reinventing the wheel.
| instagraham wrote:
| Quite a spectacularly wrong take on multiple levels.
|
| So much great literature comes from centuries before our own.
| And considering that the internet is likely to be around
| forever or as long as humans persist, a snapshot of its initial
| decades will one day be one of the greatest "archaeological"
| treasures.
|
| Perhaps your main point is that you do not care for their work
| nor for the work of literature written before your time. You
| need not apply your yardstick to anything else in a bid to
| gauge its value.
| TedDoesntTalk wrote:
| But 70 petabytes worth? Most people in this thread mention a
| few dozen authors. Maybe this amounts to a few hundred
| books. Not 70 petabytes (and counting).
| allturtles wrote:
| Sure, no human has ever read 70 petabytes worth of books.
|
| An archive is not the same as a local public library. The
| latter holds a small collection of mostly frequently
| accessed items (e.g. the published works of Dickens,
| Austen, etc.). The former holds a much larger collection of
| rarely accessed items (e.g. every letter written by/to
| Dickens that survived, every political pamphlet published
| in Philadelphia in the nineteenth century, etc.).
|
| If your point is that most items in the archive will be
| rarely accessed, I don't think anyone will disagree with
| you, but suggesting that the literature of the nineteenth
| century is no longer of any interest was perhaps not the
| best way of making that point.
| Mortiffer wrote:
| Via LibriVox, I and many others consume 1800s content
| almost daily.
| kilroy123 wrote:
| I've read several from that time period. There are loads of
| good books from the 1800s.
| TedDoesntTalk wrote:
| Yes. But 70 petabytes worth?
| StrictDabbler wrote:
| 70 petabytes of anything didn't exist in stored form in the
| 1800s so that's a ridiculous standard, but...
|
| The value here is genuinely historical. In a hundred years,
| how will we track the etymology of common terms that
| originated in this age?
|
| Memes make preserving the internet extremely important.
| Terms and ideas evolve so quickly that the history of
| language and thought will become obscure almost instantly.
| Even now it can be almost impossible to understand some
| internet terms if you weren't part of the subculture that
| spawned them at the exact time they were spawned.
|
| Do you know what hunter2 means? Do you know it because of
| bash.org?
|
| A person doesn't have to read all this material. The
| material has to be stored because our future society will
| have descended from this material, and if they don't have
| it they won't know how they got there.
|
| You know, except for the bit where civilisation collapses
| over the next hundred years as the planet warms and hot
| countries invade cooler, developed countries looking for
| living space.
___________________________________________________________________
(page generated 2021-07-22 23:02 UTC)