[HN Gopher] Archive.is owner on "continuity of his project"
       ___________________________________________________________________
        
       Archive.is owner on "continuity of his project"
        
       Author : spzx
       Score  : 170 points
       Date   : 2021-09-08 14:35 UTC (8 hours ago)
        
 (HTM) web link (blog.archive.today)
 (TXT) w3m dump (blog.archive.today)
        
       | jeffalo wrote:
       | Does anyone know what's up with all the different domains for
       | archive.is?
       | 
       | The blog is at blog.archive.today, it calls itself the
       | "archive.is blog", but when I visit archive.is or archive.today,
       | I'm brought to archive.vn. When I click the "archive.today" logo
       | in the header, I'm taken to archive.ph
        
         | benjaminikuta wrote:
         | He's mentioned this before. He has multiple domains because
         | they can go down sometimes.
        
       | Seattle3503 wrote:
       | Didn't realize people used it for anything other than getting
       | around paywalls.
        
         | john-doe wrote:
         | That's my main use and I sometimes feel bad about it, since I
         | just want to see the content and not archive some mediocre
         | article for "posterity".
        
       | indigodaddy wrote:
       | Perhaps he can somehow join forces with archive.org? Maybe they
       | take over when/if he is no longer able to do it?
        
       | indigodaddy wrote:
       | On a related note, I guess Tumblr is still around to use as a
       | blog?
        
         | goodrubyist wrote:
         | Yes it has been around, but it's quite jarring to see anyone
         | actually use it.
        
       | rosetremiere wrote:
       | About reliable email addresses: I'm using my university alumni
       | "email forwarding for life", but this loses me some emails due to
       | DMARC and friends. What are the alternatives?
        
         | autarch wrote:
         | Register your own domain and use that for email. Then you can
         | move that domain's email service to different providers.
         | 
         | I use Gmail right now, but I could move my domain's email to
         | Fastmail or another provider without _too_ much work.
         | 
         | Of course, it's also a good idea to back up your email archives
         | too, since that _can_ go away with the loss of a provider's
         | service.
        
           | thom wrote:
           | I've recently migrated four different accounts, three of them
           | Gmail with their own domains, to Fastmail and I was
           | astonished by how easy the process (that I'd put off for
           | _years_) was. Huge weight off my mind and I've been very
           | happy with the service since then.
        
           | raxi wrote:
           | Domain "ownership" is too fragile. Well, you do not lose the
           | email archive, but a new domain "owner" could steal your
           | identity.
        
       | glanard_frugner wrote:
       | even if we archive everything, hundreds of years from now all of
       | "the world's information" could very well be unusable and
       | unreadable for a variety of factors (no one remembers how to deal
       | with the file formats, EMP, bit rot). books however will continue
       | to work just fine as they have for thousands of years
        
         | derivagral wrote:
         | If we're talking that long of a timescale, how long does your
         | typical book these days actually last? I'm no expert, but it
         | makes me wonder how long consumer paper actually lasts.
         | Reasonable(?) search result below.
         | 
         | https://www.quora.com/How-long-does-it-take-for-paper-to-dec...
        
           | glanard_frugner wrote:
           | adding paper to a compost pile will give different results
           | than keeping a book stored in the proper conditions
        
       | Shank wrote:
       | Archive.is still doesn't work if you use Cloudflare DNS due to a
       | spat with Cloudflare and the operator. So to me, the continuity
       | and reliability is already a big question. Not only is it a
       | question of sustainability economically, but also ideologically:
       | what happens if another similar decision is made to lock out a
       | portion of users?
        
         | throwawaysea wrote:
         | I lost faith in Cloudflare when they switched from being a
         | neutral infrastructure service to yet another
         | politically-motivated big tech company. In their blog post
         | around the ban of 8chan
         | (https://blog.cloudflare.com/terminating-service-for-8chan/),
         | they acknowledged that they didn't know if 8chan
         | broke any laws but they nevertheless decided to pass personal
         | judgment based on a vague notion that 8chan "inspired" a
         | shooting. That's quite an unprincipled way to operate a
         | fundamental network utility that backs 10% of the Fortune 1000
         | and 20% of the top 10000 websites.
        
         | oh_sigh wrote:
         | For reference: the spat is that Cloudflare DNS does not leak
         | geographic information of the querier through EDNS, and the
         | archive.is fellow is requiring geographic information to
         | provide valid DNS lookups. So he intentionally sends back bad
         | results when it is 1.1.1.1 querying his nameservers.
         | 
         | I love the site, but his stance on this doesn't really make
         | sense to me, and it's a shame that millions and millions of
         | people use 1.1.1.1 daily and archive.is is the one website that
         | doesn't work for those people.
        
           | slim wrote:
           | Millions use 1.1.1.1? Any source for that?
        
             | sp332 wrote:
             | The Android app has 418,000 reviews and over 50 million
             | installs, and the iPhone app has 230,000 reviews. The
             | number of people who use it without an app is probably a
             | lot higher.
        
           | spzx wrote:
           | HN discussion from 2019 on this topic:
           | https://news.ycombinator.com/item?id=19828317
        
           | [deleted]
        
           | hansel_der wrote:
           | > I love the site, but his stance on this doesn't really make
           | sense to me
           | 
           | would not be surprised if he has some personal axe to grind
           | with cf (they are no sheep either).
           | 
           | also i would be wary of overestimating market penetration of
           | any 3rd-party dns provider; iirc google has total dominance
           | of this segment and is still below 10%.
        
           | silisili wrote:
           | I'm not sure the term 'leak' applies. It's an anti-cdn play.
           | Refusing to use EDNS correctly makes the web slower for a lot
           | of people. And it adds little to nothing to privacy since the
           | answer IP is going to know your IP at the next step
           | anyways...
           | 
           | As for why archive.is cares so much...that I don't know.
           | Perhaps they rely on such data to give a fast experience, and
           | are tired of this charade...but that's just speculation.
        
             | magila wrote:
             | Cloudflare's edge network is sufficiently dense that ECS
             | data is unnecessary in almost all cases. The requesting
             | data center will be close enough to the client that doing
             | geoip on the source IP will have the same results as using
             | ECS.
             | 
             | There's nothing incorrect about what Cloudflare is doing,
             | EDNS does not require ECS data to be included in requests,
             | but for whatever reason the maintainer of Archive.is
             | decided to block 1.1.1.1 over it.
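
For context on what is being withheld: the ECS data discussed in this
subthread is the EDNS Client Subnet option defined in RFC 7871, which a
resolver may attach to its upstream queries. Below is a minimal Python
sketch of how that option is encoded on the wire. The function name
`build_ecs_option` and the example address are illustrative assumptions,
not code from Cloudflare or archive.is.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS option code for Client Subnet (RFC 7871)


def build_ecs_option(client_ip: str, prefix_len: int) -> bytes:
    """Encode an EDNS Client Subnet option for the given client address.

    Only the first `prefix_len` bits of the address are sent; the rest
    are zeroed and truncated, which is how a resolver limits how much of
    the client's address it reveals to the authoritative server.
    """
    addr = ipaddress.ip_address(client_ip)
    family = 1 if addr.version == 4 else 2  # 1 = IPv4, 2 = IPv6
    if not 0 <= prefix_len <= addr.max_prefixlen:
        raise ValueError("prefix_len out of range for this address family")

    # Zero the host bits, then keep only the bytes covering the prefix.
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    n_bytes = (prefix_len + 7) // 8
    addr_bytes = network.network_address.packed[:n_bytes]

    # FAMILY (2 bytes), SOURCE PREFIX-LENGTH (1), SCOPE PREFIX-LENGTH (1,
    # always 0 in queries), then the truncated address.
    payload = struct.pack("!HBB", family, prefix_len, 0) + addr_bytes
    # OPTION-CODE (2 bytes), OPTION-LENGTH (2 bytes), then the payload.
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload
```

Calling `build_ecs_option("203.0.113.77", 24)` yields an option that
carries only the /24 network of the client. A resolver that omits the
option entirely, as described above for 1.1.1.1, leaves the
authoritative server with just the resolver's own source IP.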
        
               | nextaccountic wrote:
               | But not sending ECS data harms non-cloudflare CDNs,
               | right?
               | 
               | edit: from here
               | https://news.ycombinator.com/item?id=19828702 I gather
               | that this indeed harms CDNs outside the ones that
               | Cloudflare has a business relationship with.
               | 
               | > EDNS IP subsets can be used to better geolocate
               | responses for services that use DNS-based load balancing.
               | However, 1.1.1.1 is delivered across Cloudflare's entire
               | network that today spans 180 cities. We publish the
               | geolocation information of the IPs that we query from.
               | That allows any network with less density than we have to
               | properly return DNS-targeted results. For a relatively
               | small operator like archive.is, there would be no loss in
               | geo load balancing fidelity relying on the location of
               | the Cloudflare PoP in lieu of EDNS IP subnets.
               | 
               | > We are working with the small number of networks with a
               | higher network/ISP density than Cloudflare (e.g.,
               | Netflix, Facebook, Google/YouTube) to come up with an
               | EDNS IP Subnet alternative that gets them the information
               | they need for geolocation targeting without risking user
               | privacy and security. Those conversations have been
               | productive and are ongoing. If archive.is has suggestions
               | along these lines, we'd be happy to consider them.
        
         | forgotmypw17 wrote:
         | Cloudflare also often blocks no-JS users.
         | 
         | All-around a huge accessibility impediment.
        
           | deadalus wrote:
           | Cloudflare is not neutral. They blocked 8chan after political
           | pressure from MSM.
        
           | MrStonedOne wrote:
           | This is kinda irrelevant in a discussion about archive.is
           | blocking Cloudflare _DNS_ users and Cloudflare _DNS_
           | servers/IPs.
        
           | syshum wrote:
           | CloudFlare really is an enemy of the Free (Libre) open
           | internet.
           | 
           | They are the next Google in terms of "evil companies"
        
             | Thorentis wrote:
             | What is a good alternative to Cloudflare DNS? I've been
             | using 1.1.1.1 since it was launched, but now this makes me
             | want to switch to something better.
        
               | forgotmypw17 wrote:
               | Why not your ISP?
        
             | kook_throwaway wrote:
             | Was MitMing half the web the giveaway?
        
       | spzx wrote:
       | Also some interesting numbers in this post:
       | 
       | >How much does hosting cost you per month at the moment?
       | 
       | >about ~$2600/mo of pure expenses on servers/domains, not
       | counting "work time", "buying laptop/furniture", etc.
       | ($100...300/mo covered by donations + $300...500 by ads)
       | 
       | https://blog.archive.today/post/659383959382294528/you-said-...
        
         | john_moscow wrote:
         | As someone who has managed to run more or less successful
         | donation campaigns in the past, here are my 2 cents.
         | 
         | There's a huge difference between "please donate to our
         | project" and "it costs us $X/month to run it, we have Y users
         | and managed to collect $Z so far, please donate".
         | 
         | The first one will get you about $1 per 10K-1M users. The
         | second one will get your goal fulfilled, as long as you are
         | reasonable, and have enough users. All it takes is a noticeable
         | message and a way to update it automatically based on the money
         | received.
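
The second kind of message described above is easy to generate
automatically. Here is a minimal Python sketch, assuming the site can
read its own monthly cost, user count, and donations received so far.
The function name `donation_banner` and the wording of the message are
hypothetical, not taken from any real campaign.

```python
def donation_banner(monthly_cost: float, users: int, collected: float) -> str:
    """Build a donation message that states cost, audience, and progress."""
    if monthly_cost <= 0:
        raise ValueError("monthly_cost must be positive")

    # Cap progress at 100% and never report a negative amount remaining.
    percent = min(100.0, 100.0 * collected / monthly_cost)
    remaining = max(0.0, monthly_cost - collected)

    return (
        f"It costs us ${monthly_cost:,.0f}/month to run this site. "
        f"We have {users:,} users and have collected ${collected:,.0f} "
        f"so far ({percent:.0f}% of this month's goal, "
        f"${remaining:,.0f} to go). Please donate."
    )
```

For example, `donation_banner(2600, 1_000_000, 300)` uses the ~$2600/mo
figure quoted earlier in the thread together with the upper end of the
quoted donation range. The user count is made up for illustration.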
        
       | dmix wrote:
       | I see the answer to this question more around backups and helping
       | future people overcome technical limitations (knowledge transfer
       | + data archiving).
       | 
       | All things are ephemeral after a certain point but archiving
       | typically lasts much much longer than the human operators.
       | Likewise documenting the process and barriers to overcome will
       | help people in the future solve the problem (and a broader amount
       | of people).
       | 
       | This doesn't _have_ to be public, just needs a way to become
       | public.
        
       | growt wrote:
       | He'd better also have built up some savings for legal defense,
       | because he could easily be sued into oblivion, especially in Europe.
        
       | ignoramous wrote:
       | archive.is is my browser for text-heavy websites like blogs, news,
       | twitter, and documentation (outline.com is another one, reader
       | mode, yet another). It completely debloats a webpage as it
       | archives (unlike web.archive.org, say). Suits my purposes just
       | fine. I must note though that archive.is (from what I recall)
       | forwards the IP address of whoever initiated an archive process
       | to the origin.
       | 
       | archive.is was also a great mirror to instagram and linkedin for
       | public profiles, but it doesn't archive instagram anymore.
        
         | spzx wrote:
         | Here's his quote on instagram archiving problems:
         | 
         | "There is no Instagram content which don't need to login.
         | 
         | If you can access the page without login, it is sort of "promo
         | preview", after few pages accessed this way, they add your IP
         | into "promo is over" list and will redirect to /login on every
         | future request.
         | 
         | I just have not enough fresh IPs to abuse this mechanism."
         | 
         | https://blog.archive.today/post/659927354404192256/instagram...
        
       | kixiQu wrote:
       | This is a very fair answer, and all "permalinks" are lies. At the
       | same time, I wonder if it might not be possible to have snapshots
       | up on, I dunno, IPFS or torrent sites or something such that when
       | the unavoidable happens, not all is lost.
        
         | dannyw wrote:
         | nothing is permanent in life. heaps of torrents i want have 0
         | seeders.
         | 
         | ipfs's economically incentivised hosting layer (filecoin)
         | offers storage in monthly increments, not perpetual.
        
           | hansel_der wrote:
           | > nothing is permanent in life
           | 
           | practically yes, but
           | 
           | https://github.com/philipl/pifs
        
         | kordlessagain wrote:
         | An image of a Web asset is useful for archival purposes and may
         | be indexed using text recognition ML.
        
           | wizzwizz4 wrote:
           | Images take up _way_ more space than HTML documents...
           | actually, what with how bloated most "websites" are these
           | days, that's probably not true any more.
        
         | legrande wrote:
         | > This is a very fair answer, and all "permalinks" are lies
         | 
         | Cool URIs don't change:
         | https://www.w3.org/Provider/Style/URI.html
         | 
         | You can believe that cool URIs don't change or you could go the
         | IPFS route. Similar to the way torrents have a 'health' score
         | based on the number of seeders, IPFS resources could live as
         | long as people want that resource to exist (not sure if that
         | situation is baked into IPFS though).
         | 
         | Also: have you looked into Filecoin?[0]
         | 
         | [0] https://filecoin.io/
        
         | golemotron wrote:
         | Yes, the site owner's response seems excessively didactic when
         | a system of mirrors would solve the problem.
        
           | ragona wrote:
           | Mirrors don't solve the problem, they move it.
        
             | dewey wrote:
             | How is a mirror not solving the problem of data loss if a
             | single instance goes away?
        
               | gambler wrote:
               | The problem isn't a single instance going away. The
               | problem is what happens when for whatever reason the
               | owner stops maintaining the project. This is a common
               | problem and despite all the bluster and buzzwords, the IT
               | community hasn't really found a solution. Torrents are
               | the only kind-of-solution, but they are not ideal for
               | something that needs to be constantly updated.
        
           | wolrah wrote:
           | > a system of mirrors would solve the problem.
           | 
           | A system of mirrors prevents a single node going down from
           | taking the whole system (in theory at least, we've all seen
           | plenty of times where failover goes poorly), but it doesn't
           | do anything to ensure the long term survival of the system as
           | people lose interest, lose the ability to participate, and
           | sometimes die.
           | 
           | If you were around certain internet forums in the late '00s
           | you might have run in to an image hosting platform called
           | WaffleImages which was created in response to yet another
           | popular free image hosting service locking down their
           | embedding and ruining thousands of old posts. The goal was to
           | distribute image hosting among community-operated mirrors,
           | and it worked great for a few years. Over time though people
           | lost interest while the rate of new mirrors getting added
           | dropped to basically zero and eventually it fell apart.
        
         | toomuchtodo wrote:
         | Depending on the format of the archives (hopefully WARC), the
         | owner could hand the entire archive over to the Internet
         | Archive or Archive Team for ingestion by Wayback Machine.
         | 
         | If the concern is perpetual access to archived content but
         | under your terms, that is where the cost comes in. Somebody
         | somewhere is paying for power, cooling, connectivity, and
         | disks. The Internet Archive estimates it costs them $2/GB to
         | host data uploaded in perpetuity. Please consider donating if
         | you're uploading content for permanent archival and/or deriving
         | value from hosted content.
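
To put the $2/GB figure above in concrete terms, here is a small Python
sketch of the arithmetic. The function name and the example archive
size are assumptions for illustration; only the $2/GB estimate comes
from the comment.

```python
def perpetual_hosting_cost_usd(size_gb: float, usd_per_gb: float = 2.0) -> float:
    """Estimate the one-time cost of hosting `size_gb` in perpetuity.

    The default rate is the $2/GB estimate quoted in the thread; it is
    an estimate, not a published price list.
    """
    if size_gb < 0 or usd_per_gb < 0:
        raise ValueError("size_gb and usd_per_gb must be non-negative")
    return size_gb * usd_per_gb
```

At that rate a hypothetical 1 TB upload (1000 GB) would correspond to
`perpetual_hosting_cost_usd(1000)`, which gives a rough sense of what a
proportionate donation might look like.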
        
           | benjaminikuta wrote:
           | The Internet Archive isn't perfect either. They can be
           | pressured to remove pages.
        
           | SllX wrote:
           | Pretty much. If you really want to maintain access to old
           | stuff in perpetuity, you have to pay for it yourself by
           | either storing it on your own equipment or paying someone to
           | store it on theirs.
        
             | kixiQu wrote:
             | Sure -- but at least with something p2p hooked up to it I
             | can let someone else benefit from my desire to assure my
             | own access.
        
       | fouc wrote:
       | Archive.is is an incredible service, but the fact he's paying out
       | $3k-4k/mo out of pocket in expenses & time doesn't strike me as
       | sustainable for the long term.
       | 
       | I'm reminded of some non-profit organization that was forced to
       | shut down their websites because they ran out of money. In
       | retrospect they could've set up a trust fund early on, stuck all
       | their money in there, and then had a perpetual annual income from
       | that for operating costs, instead of spending down. All
       | fundraising could've gone into the trust fund in order to boost
       | the annual budget, etc.
        
       | glofish wrote:
       | There is an interesting conundrum here, when we post to the
       | internet do we also consent to having that information saved for
       | all eternity?
       | 
       | Archiving everything is still a novel and not fully understood
       | concept; it is not that clear that it is useful or beneficial over
       | the long term.
       | 
       | Forgetting can be a bliss.
        
         | 019a wrote:
         | This is kind of an antithesis to his message in the post; that
         | nothing is actually permanent, and while many people are
         | concerned about continuity of service, ultimately perfect
         | continuity is impossible, whether that's due to the
         | organization running out of money, bad backup practices and a
         | fire, global warming wiping Ashburn Virginia off the map, or
         | the end of the human civilization by some other means.
         | 
         | I've helped with a couple risk registers at tech companies. Two
         | things I've never seen appear in a risk register: The company
         | runs out of money. Human society is wiped out. I've been
         | laughed at once for bringing up variations on these. They're
         | out of scope; risks stop being a threat when there's no one
         | left to care about them.
         | 
         | I think the goal of keeping an internet-scale level of data
         | accessible and searchable, for longer than one lifetime, is an
         | impossible task. Maybe Archive.org/Archive.is can pull it off;
         | I doubt it. It's an insane amount of data. Most of it is totally
         | pointless, but it's really difficult to pick apart what's useful
         | and what's useless, so you have to keep as much as possible
         | without bias. All of that is on hard disks which violently spin
         | around at 8 meters per second, accessed by software which we
         | all know breaks every day but are too afraid to admit it, over
         | a network of other computers with all the same flaws,
         | distributed globally, yet can be significantly disrupted by one
         | roadside construction worker and a jackhammer.
         | 
         | The internet didn't increase the lifetime of data; it decreased
         | it. Sure, we have far more of it at our fingertips than any
         | other point in history, but that's not lifetime; that's just
         | volume. And that volume has desensitized us; it's fundamentally
         | impacting our innate biological memory capacity, and the social
         | structures we form around memory. We know the Library of
         | Alexandria existed because people wrote about it; the pages
         | laid for thousands of years; its memory passed verbally from
         | person to person.
         | 
         | If all computers stopped functioning tomorrow, not even
         | disappear, they're still there, they just don't work: Would the
         | memory of Stranger Things still be known in two thousand years?
         | I doubt it, but: if the only thing which offers us a satisfying
         | "Yes" is "we keep the computers running, accessible, indexable,
         | searchable"; that seems, at the very least, given the extreme
         | challenges we as a species will be facing over the next
         | century, beyond the scope of human possibility
        
           | decasteve wrote:
           | The Sun's ability to function like an Earth-wide Eprom eraser
           | might cause some catastrophic disruption given our reliance
           | on the Internet and computing devices. A large enough
           | geomagnetic/solar storm is not unprecedented.
           | 
           | See: https://en.wikipedia.org/wiki/Carrington_Event
        
         | [deleted]
        
         | rm_-rf_slash wrote:
         | What blissful times we lived in before the daily drumbeat of
         | "Accomplished young professional discovered to have said
         | offensive things on the internet when they were a dumb
         | teenager, reputation tarred and feathered for the rest of their
         | life."
        
           | SamoyedFurFluff wrote:
           | we don't know if it'll be a lifelong thing though. I suspect
           | with all this pushback and emotional exhaustion (and I really
           | do believe it's emotionally exhausting to constantly be
           | hounding over people's morality on the internet) people will
           | just stop giving a shit about what someone said 5 years ago
           | pretty soon.
        
             | rm_-rf_slash wrote:
             | It's a numbers game. Billions of people won't care, but a
             | few dozen can make enough of a stink that a habitually risk
             | averse institution would rather let someone be canceled
             | than risk the controversy snowballing into something
             | bigger.
        
               | thingification wrote:
               | Only if we let them.
               | 
               | There are all kinds of feedback cycles going on. Neither
               | of these two are rational:
               | 
               | 1. There are no ways to correct the problem you describe
               | 
               | 2. The problem will correct itself without individuals
               | and institutions making it happen
        
         | kart23 wrote:
         | Yes. I think people need to understand that anything on the
         | internet is, by default, there forever. Post something privately
         | or behind a password if you don't want everyone seeing it.
        
           | glofish wrote:
           | A lot of people think that the current status-quo is the only
           | way forward. Why? There is no reason for it.
           | 
           | There is no practical reason why a forum could not have an
           | optin-optout-delete API/protocol/etc that all search
           | engines/archivers should follow.
           | 
           | When I delete something all should follow the orders and
           | those that do not need to be responsible for it.
           | 
           | Of course there are plenty of business reasons why engines
           | don't want to do it.
        
             | ergl wrote:
             | >There is no reason for it.
             | 
             | The reason is that bits are not physical. Anyone can copy
             | them and re-upload them, without any cost.
        
             | raxi wrote:
             | This is the difference between "speech" and "text".
             | 
             | When you write a brief or a book, it's a performative act;
             | you cannot pretend nothing was said.
             | 
             | The Internet is textual media, just like books, and unlike
             | television or talking in the park.
             | 
             | If you don't like it, you can only burn books or drive
             | modern technology in that direction
        
             | hansel_der wrote:
             | there are plenty of human reasons as well, as indicated by
             | your usage of the word "should"
        
             | Y_Y wrote:
             | > all should follow the orders and those that do not need
             | to be responsible for it
             | 
             | Furthermore the tide will be ordered back, the contents
             | returned to Pandora's box, and universal entropy decreased.
             | Failure to comply will result in a fine.
        
         | legrande wrote:
         | > There is an interesting conundrum here, when we post to the
         | internet do we also consent to having that information saved
         | for all eternity?
         | 
         | I'm not sure about consent, but presume it will be 'stuck' and
         | un-removeable from the net once it's out there. (So be careful
         | what you disseminate). Some people even go out of their way to
         | make sure certain content will never be forgotten from the web.
        
         | SkyMarshal wrote:
         | I think it's one of those things modern parents are going to
         | have to understand about their brave new world, and teach their
         | modern children. Like look both ways before crossing the
         | street, remember that everything you write on the Internet is
         | permanent, so think before you write, or if you don't want to
         | do that then at least write it under a pseudonym.
        
           | glofish wrote:
           | why does it have to be that way though?
           | 
           | you are stating the current status quo as unavoidable - why
           | is that?
           | 
           | Archiving everything has a massive cost and if it were
           | illegal to archive unless people consent it would be much
           | harder to get away with it.
           | 
           | When I post to a public forum I gave that forum rights to
           | publish what I said, I did not give everyone else the right
           | to store what I said
        
             | SkyMarshal wrote:
             | I am not a lawyer but I think it would be legally difficult
             | to make it illegal to record what people say and do in
             | public spaces. That was a right that the entire Western
             | media depended on long before the Internet was invented.
             | 
             | The crux may be whether websites that require you to log in
             | to use them, like Facebook, are considered public or not.
             | But anything you say or do in Facebook that you don't
             | restrict to being viewable only by 1st degree friends is
             | probably considered public.
        
             | Lammy wrote:
             | > why does it have to be that way though?
             | 
             | That'll be -5 points for questioning the social credit
             | system, Citizen.
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2021-09-08 23:02 UTC)