[HN Gopher] Internet Archive breached again through stolen acces...
       ___________________________________________________________________
        
       Internet Archive breached again through stolen access tokens
        
       Author : vladyslavfox
       Score  : 293 points
       Date   : 2024-10-20 15:00 UTC (8 hours ago)
        
 (HTM) web link (www.bleepingcomputer.com)
 (TXT) w3m dump (www.bleepingcomputer.com)
        
       | wkat4242 wrote:
       | Ouch. Once can happen, twice in a row...
        
         | fallingknife wrote:
         | Once makes the second time more likely. Shows you are a soft
         | target.
        
       | TheFreim wrote:
       | > "It's dispiriting to see that even after being made aware of
       | the breach weeks ago, IA has still not done the due diligence of
       | rotating many of the API keys that were exposed in their gitlab
       | secrets," reads an email from the threat actor.
       | 
       | This is quite embarrassing. One of the first things you do when
       | breached at this level is to rotate your keys. I seriously hope
       | that they make some systemic changes, it seems that there were a
       | variety of different bad security practices.
        
         | galleywest200 wrote:
         | >"It's dispiriting to see that even after being made aware of
         | the breach weeks ago..."
         | 
         | These people are not dispirited whatsoever, if anything they
         | are half-cocked that these script kiddies found an easy target.
        
           | chrisrhoden wrote:
           | The words came from a message written by the people you are
           | calling script kiddies, rather than being editorializing by
           | bleepingcomputer, as you seem to believe.
        
             | compootr wrote:
             | script kiddie or blackhat hacker is irrelevant. IA has shit
             | security practices, and that's a fact regardless of who
             | figures that out
        
           | EasyMark wrote:
           | I highly doubt they are script kiddies. More than likely they
           | are state actors or mercenaries of state actors attempting to
           | bring down the free transmittal of information between
           | regular folks. IA evidently has not so good security and
           | wikipedia must be doing pretty well I guess? I can't recall
           | the last time one of these attacks worked on Wiki.
        
             | luckylion wrote:
             | Why would they publicly call them out and lay open the way
             | they breached them if they were "attempting to bring down
             | the free transmittal of information between regular folks"?
             | 
             | They could have done much worse but they chose not to and
             | instead made it public. Which state actor does that?
        
         | ghostly_s wrote:
         | IA is in bad need of a leadership change. The content of the
         | archive is immensely valuable (largely thanks to volunteers)
         | but the decisions and priorities of the org have been far off
         | base for years.
        
           | echelon wrote:
           | I support archival of films, books, and music, but those
           | items need to be write-only until copyright expires. The
           | purpose of the Internet Archive is to achieve a wide-
           | reaching, comprehensive archival, not provide easy and free
           | read access to commercial works.
           | 
           | Website caches can be handled differently, but bulk
           | collection of commercial works can't have this same public
           | access treatment. It's crazy to think this wouldn't be a huge
           | liability.
           | 
           | Battling for copyright changes is valiant, but orthogonal.
           | And the IA by trying to do both puts its main charter--
           | archival--at risk.
           | 
           | The IA should let some other entity fight for copyright
           | changes.
           | 
           | I say this as an IA proponent and donor.
        
             | withinboredom wrote:
             | I'd agree with you if you live in a country where you can
             | walk into your local library and read these for "free." For
             | people who live where there may not even be a library, your
             | argument makes no sense except to make the publishers
             | richer. They typically price some of these books at
             | "library prices" so normal people won't be able to afford
             | them, but libraries will.
        
             | giantrobot wrote:
             | > I support archival of films, books, and music, but those
             | items need to be write-only until copyright expires.
             | 
             | Which means no one alive today would ever be able to see
             | them out of copyright. It also requires an unfounded belief
             | that major copyright owning companies won't extend
             | copyright lengths beyond current lengths which are
             | effectively "forever".
        
           | fngjdflmdflg wrote:
           | Do you have any examples?
        
             | soygem wrote:
             | Deleting archives of kiwifarms.
        
               | fngjdflmdflg wrote:
               | I don't believe IA itself takes down pages that kiwifarms
               | archives/links to. Rather they get a request to take it
               | down and comply with it (correct me if I'm wrong here). I
               | think IA is actually in a tough spot on this issue
               | because they might be able to be sued eg. for defamation
               | if they don't take down pages with personal info after a
               | request to do so is made. Lastly, I doubt any new
               | leadership would be less harsh on kiwifarms.
        
               | dazhengca wrote:
               | There was no illegal content on kiwi farms. Even then,
               | I'd say taking down a single page by request is
               | understandable. However, they surrendered to the mob and
               | chose to stop archiving the entire site. This was to
               | censor any criticism of the people involved, but as a
               | result, we lost all of the other information on the rest
               | of the site as well. It's clear this organization cannot
               | handle pressure, and is relying on people treating it
               | kindly.
        
               | shkkmo wrote:
               | They chose to stop serving archives of a site that had
               | started explicitly using them as a distribution mechanism
               | to get around a much broader attempt to censor them.
               | 
               | I'm curious what other information on that site you think
               | was valuable to have available to the general public?
               | Nothing has been lost in terms of historical data, it's
               | only the immediate dissemination that has been slowed.
               | 
               | I'm really trying to understand why I should disagree
               | with the IA's choice here. The IA is an archival service,
               | not a distribution platform and it is not their job to
               | help you distribute content that other people find
               | objectionable. Their job is to make and keep an archive
               | of internet content so that we don't lose the historical
               | record. Blocking unrestricted public access to some of
               | that content doesn't harm that mission and can even
               | support it.
        
               | wkat4242 wrote:
               | That's something I completely support. There's a limit
               | and that site crosses it.
        
               | tylerchilds wrote:
               | the funny thing about the internet archive is that anyone
               | else on this planet could do exactly what they are doing,
               | but they consistently choose not to.
               | 
               | kiwifarms could spin up their own infrastructure, serve
               | their own content for the world, but it turns out
               | technology is a social problem more than a technical
               | problem.
               | 
               | anyone that wants to stand up and be the digital backbone
               | of "kiwi farms" can, but only the internet archive gets
               | flak for not volunteering to be the literal kiwi farm.
               | 
               | for example, the pirate bay goes offline all the time,
               | but it turns out the people that use it, care enough to
               | keep it online themselves.
        
             | wkat4242 wrote:
             | Putting the organisation at risk by playing chicken with
             | large publishing corporations. Trying to stretch fair use a
             | little too far so they had to go to court.
        
           | superkuh wrote:
           | It's the least worst option. Remember when that happened with
           | Mozilla? Now they're an ad company. Take the bad (some bad
           | mis-steps re:multiple lending during the pandemic, not
           | rotating keys immediately after a hack) with the good
           | (staying true to the human centric mission and not the money
           | flows).
        
           | ranger_danger wrote:
           | The content of the archive is 90% mass piracy and Jason Scott
           | is demonstrably complicit in encouraging users to upload
           | copyrighted content without permission.
           | 
           | Edit: Downvoting doesn't change the truth.
        
         | tgsovlerkhgsel wrote:
         | There are many "first things" you need to do if breached, and
         | good luck identifying and doing them all in a timely fashion if
         | you're a small organization, likely heavily relying on
         | volunteers and without a formal security response team...
        
       | trompetenaccoun wrote:
       | We need archives built on decentralized storage. Don't get me
       | wrong, I really like and support the work Internet Archive is
       | doing, but preserving history is too important to entrust it
       | solely to singular entities, which means singular points of
       | failure.
        
         | oytis wrote:
         | We'll need to find even more people willing to expose
         | themselves to legal threats and cyberattacks then.
        
           | trompetenaccoun wrote:
           | The legal side is a big issue, true. The simplest and best
           | workaround that I'm aware of is how the Arweave network
           | handles it. They leave it up to the individual what parts of
           | the data they want to host, but they're financially
           | incentivized to take on rare data that others aren't hosting,
           | because the rarer it is the more they get rewarded. Since
           | it's decentralized and globally distributed, if something is
           | risky to host in one jurisdiction, people in another can take
           | that job and vice versa. The data also can not be altered
           | after it's uploaded, and that's verifiable through hashes and
           | sampling. Main downside in its current form is that
           | decentralized storage isn't as fast as having central
           | servers. And the experience can vary of course, depending on
           | the host you connect to.
           | 
           | As for technical attacks, I'm not an expert but I'd assume
           | it's more difficult for bad actors to bring down
           | decentralized networks. Has the BitTorrent network ever gone
           | offline because it was hacked for example? That seems like it
           | would be extremely hard to do, not even the movie industry
           | managed to take them down.
        
             | Aachen wrote:
             | > decentralized storage isn't as fast as having central
             | servers.
             | 
             | With the 30-second "time to first byte" speed we all know
             | and love from IA, I'm pretty sure it'd only get faster when
             | you're the only person accessing an obscure document on a
             | random person's shoebox in Korea as compared to trying to
             | fetch it from a centralised server that has a few thousand
             | other clients to attend to simultaneously
        
           | Aachen wrote:
           | I collect, archive, and host data. Haven't gotten any threats
           | or attacks. Not one. The average r/selfhosted user hiding
           | their personal OwnCloud behind the DDoS mafia seems more
           | afraid than one needs to be even for hosting all sorts of
           | things publicly
        
         | MattPalmer1086 wrote:
         | Lots of Copies Keeps Stuff Safe
         | 
         | https://www.lockss.org/
         | 
         | This is a brilliant system relying on a randomised consensus
         | protocol. I wanted to do my info sec dissertation on it, but
         | its security model is extremely well thought out. There wasn't
         | anything I felt I could add to it.
        
           | TZubiri wrote:
           | High Costs Makes Lots of Copies Unfeasible
        
             | MattPalmer1086 wrote:
             | That was actually one of the key constraints in the LOCKSS
             | system, since it was designed to be run by libraries that
             | don't have big budgets.
             | 
             | The design is really very good.
        
           | ChadNauseam wrote:
           | I wish IPFS wasn't so wasteful with respect to storage. I
           | tried pinning a 200mb PDF on IPFS and doing so ended up
           | taking almost a gigabyte of disk space altogether. It's also
           | relatively slow. However its implementation of global
           | deduplication is super cool - it means that I can host 5
           | pages and you can host 50, and any overlap between them means
           | we can both help one another keep them available even if we
           | don't know about one another beforehand.
           | 
           | For a large-scale archival project, it might not be ideal.
           | Maybe something based on erasure coding would be better. Do
           | you know how LOCKSS compares?
        
             | diggan wrote:
             | > I tried pinning a 200mb PDF on IPFS and doing so ended up
             | taking almost a gigabyte of disk space altogether
             | 
             | Was that any file in particular? I just tried it myself
             | with a 257mb PDF (as reported by `ls -lrth`) and doesn't
             | seem to add that much overhead:
             | 
             |     $ du -sh ~/.ipfs
             |     84K     /home/user/.ipfs
             | 
             |     $ ipfs add ~/Downloads/large\ PDF\ File.pdf
             |     added QmSvbEgCuRNZpkKyQm6nA5vz5RTHW1nxb6MJdR4cZUrnDj large PDF File.pdf
             |     256.58 MiB / 256.58 MiB [============] 100.00%
             | 
             |     $ du -sh ~/.ipfs
             |     264M    /home/user/.ipfs
        
           | Kinrany wrote:
           | Is there a high level explanation of the model?
        
         | jdiff wrote:
         | This seems to get brought up at least once in the comments for
         | every one of these articles that pops up.
         | 
         | The IA has tried distributing their stores, but nowhere near
         | enough people actually put their storage where their mouths
         | are.
        
           | immibis wrote:
           | Keep in mind the IA archives a lot of garbage. If it could be
           | more focused it would be more likely to work.
        
             | db48x wrote:
             | The attempts have actually been focused on specific types
             | of content, such as historical videos.
        
             | Blackthorn wrote:
             | The IA only works because it archives everything. You don't
             | know what you need until you need it.
        
             | Spooky23 wrote:
             | Archives generally purposefully don't have a strong
             | editorial streak. My trash is your treasure.
        
             | unleaded wrote:
             | personally I love all the random crap on IA!
        
           | WarOnPrivacy wrote:
           | > nowhere near enough people actually put their storage where
           | their mouths are.
           | 
           | Typically because most people who have the upload, don't know
           | that they can. And if they come to the notion on their own,
           | they won't know how.
           | 
           | If they put the notion to a search engine, the keywords they
           | come up with probably don't return the needed ELI5 page.
           | 
           | As in: _How do I [?] for the Internet Archive?_ , most folks
           | won't know what [?] needs to be.
        
             | TZubiri wrote:
             | This is literally torrents. Just give up
        
               | briandear wrote:
               | The problem with torrents is they have a bad reputation
               | since people use it to steal and redistribute other
               | people's content without their consent.
        
               | card_zero wrote:
               | Is there any form of torrent where you can do a full text
               | search? That, to me, is the more important problem with
               | torrents.
        
               | TZubiri wrote:
               | But internet archive doesn't do this? It's a key based
               | search (url keys)
        
               | AlienRobot wrote:
               | Give it a good reputation then.
               | 
               | What are some legal torrent trackers?
        
               | unleaded wrote:
               | archive.org to name one
        
               | boomboomsubban wrote:
               | That's debatable. Most of their torrents are for things
               | under copyright, though any other decentralized archive
               | would have the same problem.
        
               | tourmalinetaco wrote:
               | That's a copyright problem. 99% of things made in the
               | last 100 years fall under copyright.
        
               | ranger_danger wrote:
               | What is your definition of a legal torrent tracker? I was
               | not aware there were even any illegal ones.
        
               | AlienRobot wrote:
               | A tracker that only tracks legal torrents, e.g. free
               | software, OCRemix content, etc.
        
               | TZubiri wrote:
               | How would you keep the definition of legality without a
               | centralizing authority?
        
               | boomboomsubban wrote:
               | https://linuxtracker.org/
               | http://www.publicdomaintorrents.info/
               | https://ocremix.org/torrents
        
               | mikae1 wrote:
               | _> I was not aware there were even any illegal ones._
               | 
               | Depends on the jurisdiction. Remember what happened in
               | The Pirate Bay trial?
        
               | seam_carver wrote:
               | Humble Bundle. Various Linux iso
        
               | ranger_danger wrote:
               | To me this is like saying you shouldn't use a knife
               | because they are also used by criminals.
        
               | John_Cena wrote:
               | This kind of talk is simply modern politik-speak. I can't
               | stand it and the people who fall for their deception.
               | Stretch the truth to disarm the constituents
        
               | thwarted wrote:
               | The problem with file transfer is they have a bad
               | reputation since people use it to [insert illegal or
               | immoral activity here].
               | 
               | Then rename it from "torrent" to something else.
        
               | TZubiri wrote:
               | I'm not sure what the argumentative line is here. But
               | file uploading and downloading needs to have
               | accountability for hosting, which p2p obscures.
               | 
               | The bad reputation is inherent to the tech, not a random
               | quirk.
        
               | tourmalinetaco wrote:
               | Torrents have a bad reputation due to malicious
               | executables, I have never met someone who genuinely saw
               | piracy as stealing, only as dangerous. In fact, stealing
               | as a definition cannot cover digital piracy, as stealing
               | is to take something away, and to take is to possess
               | something _physically_. The correct term is copying,
               | because you are duplicating files. And that's not even
               | getting into the cultural protection piracy affords in
               | today's DRM and license-filled world.
        
               | WarOnPrivacy wrote:
               | > This is literally torrents. Just give up
               | 
               | Most casual visitors to IA don't know that. Which is the
               | point.
               | 
               | Giving up is for others.
        
           | creer wrote:
           | And it's guaranteed not to happen if the efforts don't
           | continue.
        
             | acdha wrote:
             | You could say the same thing about perpetual motion. Being
             | realistic about why past efforts have failed is key to
             | doing better in the future: for example, people won't
             | mirror content which could get them in trouble and most
             | people want to feel some kind of benefit or thanks. People
             | should be thinking about how to change dynamics like those
             | rather than burning out volunteers trying more ideas which
             | don't change the underlying game.
        
           | zelphirkalt wrote:
           | Perhaps one idea is to let people choose what they want to
           | protect. This way people wanting to support it can have their
           | mission.
        
             | card_zero wrote:
             | I want it to protect all sorts of random obscure documents,
             | mostly kind of crappy, that I can't predict in advance, so
             | I can pursue my hobby of answering random obscure
             | questions. For instance:
             | 
             | * What is a "bird famine", and did one happen in 1880?
             | 
             | * Did any astrologer ever claim that the constellations
             | "remember" the areas of the sky, and hence zodiac signs,
             | that they belonged to in ancient times before precession
             | shifted them around?
             | 
             | * Who first said "psychology is pulling habits out of
             | rats", and in what context? (That one's on Wikiquote now,
             | but only because I put it there after research on IA.)
             | 
             | Or consider the recently rediscovered Bram Stoker short
             | story. That was found in an actual library, but only
             | because the library kept copies of old Irish newspapers
             | instead of lining cupboards with them.
             | 
             | The necessary documents to answer highly specific questions
             | are very boring, and nobody has any reason to like them.
        
             | dawnerd wrote:
             | You already can, they have torrents for everything.
        
               | diggan wrote:
               | > they have torrents for everything
               | 
               | Including the index itself? That would be awesome.
        
               | tourmalinetaco wrote:
               | Their torrents suck and IME don't update to changes in
               | the archive.
        
         | sksxihve wrote:
         | There's no real financial incentive for people to archive the
         | data as a singular entity so even less for a distributed
         | collection. Also it's probably easier to fund a single entity
         | sufficiently so they can have security/code audits than a bunch
         | of entities all trying to work together.
        
           | riiii wrote:
           | Some people are motivated by more than just financial
           | incentive.
        
             | sksxihve wrote:
             | That's true, but something like archiving the internet is
             | very costly, IA has an annual budget in the tens of
             | millions.
        
               | trompetenaccoun wrote:
               | Yes, it's a good point. Though they could take that money
               | and reward people for hosting the data as well, couldn't
               | they? They don't have to be in charge of hosting.
        
               | sksxihve wrote:
               | Yes, they could, that's not much different than a single
               | company distributing the archive to multiple storage
               | centers though. My original comment was about it being
               | more cost effective for a single company to do that than
               | coordinating with a bunch of disjoint entities.
        
               | trompetenaccoun wrote:
               | Our digital memory shouldn't be in the hands of a small
               | number of organizations in my view. You're right about
               | cost effectiveness. There are pros and cons to both but
               | it's not just external threats that have to be
               | considered.
               | 
               | History has always gotten rewritten throughout time. If
               | you have a giant library it's easier for bad actors to
               | gain influence and alter certain books, or remove them.
               | This isn't just theoretical, under external pressure IA
               | has already removed sites from its archive for copyright
               | and political reasons.
               | 
               | There are also threats that are generally not even
               | considered because they happen with rare frequency, but
               | when they happen they're devastating. The library of
               | Alexandria was burned by Julius Caesar during a war.
               | Likewise, if all your servers are in one country, that's a
               | geographic risk: they can get destroyed in the event of a
               | war or such. No one expects this to happen today in the
               | US, but archives should be robust long term, for decades,
               | ideally even centuries.
        
               | delfinom wrote:
               | >Our digital memory shouldn't be in the hands of a small
               | number of organizations in my view.
               | 
               | I would wager at least 95% of "digital memory" archived
               | is just absolute garbage from SEO spam to just some small
               | websites holding no actual value.
               | 
               | The true digital memory of the world is almost entirely
               | behind the walls of reddit, twitter, facebook, and very
               | few other sites. The internet landscape has changed
               | massively from the 90s and 2000s.
        
               | BlueTemplar wrote:
               | So, about $0.01 per person per year ?
               | 
               | We _are_ talking about an (almost) worldwide archive
               | after all.
        
         | __MatrixMan__ wrote:
         | To make the web distributed-archive-friendly I think we need to
         | start referencing things by hash and not by a path which some
         | server has implied it will serve consistently but which
         | actually shows you different data at different times for a
         | million different reasons.
         | 
         | If different data always gets a different reference, it's easy
         | to know if you have enough backups of it. If the same name gets
         | you a pile of snapshots taken under different conditions, it's
         | hard to be sure which of those are the thing that we'd want to
         | back up for that particular name.
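
A minimal sketch of the idea above, assuming SHA-256 as the hash and a plain in-memory dict per host (the host names, `content_id`, and `replica_count` are hypothetical and not taken from any real archive system): when the reference is derived from the bytes themselves, counting good backups is just counting the hosts whose stored copy still hashes to that reference.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return the hex SHA-256 digest, used here as the content's reference."""
    return hashlib.sha256(data).hexdigest()


def replica_count(ref: str, hosts: dict[str, dict[str, bytes]]) -> int:
    """Count hosts whose stored bytes for `ref` actually hash to `ref`."""
    return sum(
        1
        for store in hosts.values()
        if ref in store and content_id(store[ref]) == ref
    )


# Two snapshots served under the same URL on different days:
# different bytes, therefore different references.
page_v1 = b"<html>snapshot taken Monday</html>"
page_v2 = b"<html>snapshot taken Tuesday</html>"

# Hypothetical hosts, each mapping reference -> stored bytes.
hosts = {
    "alice": {content_id(page_v1): page_v1},
    "bob": {content_id(page_v1): page_v1, content_id(page_v2): page_v2},
    "carol": {content_id(page_v2): b"corrupted"},  # fails verification
}

for page in (page_v1, page_v2):
    ref = content_id(page)
    print(ref[:12], "verified copies:", replica_count(ref, hosts))
```

With a path-based name there is no equivalent check, since nothing in the name says which of the two snapshots a given host is supposed to hold.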
        
           | Cheer2171 wrote:
           | Done. It is called IPFS. The IA already supports it.
           | 
           | https://github.com/internetarchive/dweb-archive/blob/master/...
        
             | __MatrixMan__ wrote:
             | Right, what I'm saying is that now we need to get the rest
             | of the web (or at least the parts we want to keep) on
             | board.
        
             | majorchord wrote:
             | IPFS has shown that the protocol is fundamentally broken at
             | the level of growth they want to achieve and it is already
             | extremely slow as it is. It often takes several minutes to
             | locate a single file.
        
               | diggan wrote:
               | The beauty is that IA could offer their own distribution
               | of IPFS that uses their own DHT for example, and they
               | could allow only public read access to it. This would
               | solve the slow part of finding a file, for IA
               | specifically. Then the actual transfers tend to be pretty
               | quick with IPFS.
               | 
               | What's the point of using IPFS then? Others can still
               | spread the file elsewhere and verify it's the correct
               | one, by using the exact same ID of the file, although on
               | two different networks. The beauty of content-addressing
               | I guess.
        
               | acdha wrote:
               | That isn't solving the problem, it's just giving them
               | more of it to work on. IA has enough material that I'd be
               | surprised if they didn't hit IPFS's design limits on
               | their own, and they'd likely need to change the design in
               | ways which would be hard to get upstream.
        
               | BlueTemplar wrote:
               | Several minutes sounds more than fine for this purpose ?
               | 
               | Especially if it's about having an Internet Archive
               | backup.
        
               | Aachen wrote:
               | I think the point is that it's already slow at the
               | current amount of data, let alone when you stuff dozens
               | more PB into it
        
             | Groxx wrote:
             | Which has a rather lengthy section explaining why it's
             | currently a failed experiment:
             | https://github.com/internetarchive/dweb-archive/blob/master/...
             | 
             | (this doc is 5-6 years old though, and I'm not sure what
             | may have changed since then)
             | 
             | In my own (toy-scale) IPFS experiments a couple years ago
             | it was rather usable, but the software was also
             | utterly insane for operators and users, and if I were IA I
             | would only consider it if I budgeted for a from-scratch
             | rewrite (of the stuff in use). Nearly uncontrollable and
             | unintrospectable and high resource use for no apparent
             | reason.
        
         | TechSquidTV wrote:
         | This has really shown that to be true. I am stuck in a
         | situation right now where I have some lost media I want to
         | upload but they have been down for over a week. I plan to
         | create a torrent in the meantime but that means relying on my
         | personal network connection for the vast majority of downloads
         | up front. I looked into CloudFlare R2, not terrible but not
         | free either.
         | 
         | I was looking into using R2 as a web seed for the torrent but I
         | don't _really_ want to spend much to upload content that is
         | going to get "stolen" and reuploaded by content farms anyway
         | you know?
        
           | tourmalinetaco wrote:
           | Why not subscribe to a seedbox? They're about $5/2TB/mo. It
           | protects your IP, you can buy for only the month, and since
           | seedboxes are hosted in DMCA-resistant data centers you can
           | download riskier torrents lightning fast, meaning you're not
           | _just_ spending money for others, you can get something out
           | of it too.
        
         | Cheer2171 wrote:
         | You say this as if the IA is not already deeply invested in the
         | DWeb movement. If you go to a DWeb event in the Bay Area, there
         | is a good chance it will be held at the IA.
        
         | sschueller wrote:
         | Yes, I was quite shocked when I found out that all their DCs
         | are within driving distance.
        
         | delfinom wrote:
         | Yea so, who pays for the decentralized storage long term? What
         | happens when someone storing decentralized data decides to
         | exit? Will data be copied to multiple places, who is going to
         | pay for doubling, tripling or more the storage costs for
         | backups?
         | 
         | Centralized entities emerge to absorb costs because nobody else
         | can do it as efficiently alone.
        
         | NelsonMinar wrote:
         | Is anyone using ArchiveBox regularly? It's a self-hosted
         | archiving solution. Not the ambitious decentralized system I
         | think this comment is thinking of but a practical way for
         | someone to run an archive for themselves.
         | https://archivebox.io/
        
       | _fat_santa wrote:
       | I don't know what their funding model looks like but if they have
       | some cash I'd say hiring a security team would be on top of the
       | list of things to invest in.
        
         | brendoelfrendo wrote:
         | I believe that, at this point in time at least, IA's funding
         | model consists of sweating profusely while awaiting a colossal
         | legal judgement.
        
       | udev4096 wrote:
       | Is it the same email spoofing attack vector of zendesk which was
       | disclosed last week?
        
         | steffanA wrote:
         | Article says API token was stolen in original breach.
        
       | myself248 wrote:
       | I'd like to imagine a world where every lawyer, when their case
       | is helped by a Wayback Machine snapshot of something, flips a few
       | bucks to IA. They could afford a world-class admin team in no
       | time flat.
        
         | thaumasiotes wrote:
         | That's a terrible solution. The Wayback Machine takes down
         | their snapshots at the request of whoever controls the domain.
         | That's not archival.
         | 
         | If the state of a webpage in the past matters to you, you need
         | a record that won't cease to exist when your opposition asks it
         | to. This is the concept behind perma.cc.
        
           | myself248 wrote:
           | Ooo, excellent. Yes, hiding items is imperfect, but I
           | understood that it was legally required or something. (IANAL
           | and IDFK, TBH) I wonder how perma.cc gets around that.
        
             | immibis wrote:
             | Most likely by breaking the law.
        
             | berdario wrote:
             | I'm afraid that it just hasn't been tested in court yet.
             | 
             | I haven't read this paper yet, but...
             | 
             | https://www.tesble.com/10.1080/0270319x.2021.1886785
             | 
             | from the abstract:
             | 
             | > The article concludes that Perma.cc's archival use is
             | neither firmly grounded in existing fair use nor library
             | exemptions; that Perma.cc, its "registrar" library,
             | institutional affiliates, and its contributors have some
             | (at least theoretical) exposure to risk
             | 
             | It seems that the article is about copyright, but of course
             | there are several other reasons that might justify takedown
             | of content stored on perma.cc:
             | 
             | - Right to be forgotten... perma.cc might be able to ignore
             | it, but could this lead to perma.cc being blocked by
             | european ISPs
             | 
             | - ITAR stuff
             | 
             | - content published by entities recognized by $GOVERNMENT
             | as terrorist organizations
             | 
             | - revenge porn
             | 
             | - CSAM
        
           | db48x wrote:
           | No, they don't delete the archived content. When the domain's
           | robots.txt file bans spidering, then the Wayback Machine
           | _hides_ the content archived at that domain. It is still
           | stored and maintained, but it isn't distributed via the
           | website. The content will be unhidden if the robots.txt file
           | stops banning spiders, or if an appropriate request is made.
        
             | speerer wrote:
             | In some cases they do appear to delete, on request.
             | 
             | edit: "Other types of removal requests may also be sent to
             | info@archive.org. Please provide as clear an explanation as
             | possible as to what you are requesting be removed for us to
             | better understand your reason for making the request.",
             | https://help.archive.org/help/how-do-i-request-to-remove-
             | som...
        
               | db48x wrote:
               | Nope. Nothing is deleted, just hidden.
        
               | rascul wrote:
               | How do you know?
        
               | db48x wrote:
               | I worked there for a short while.
        
               | bombcar wrote:
               | So if the Internet Archive accidentally archived child
               | porn, they wouldn't delete it?
               | 
               | I suspect they DO delete some things.
        
             | Raed667 wrote:
             | They do delete entire domains from the archive upon request
             | & proof of ownership.
        
               | db48x wrote:
               | Again, no they don't. They just hide them.
        
           | speerer wrote:
           | That's correct, but only for present evidence - what about
           | the past evidence, that you didn't know you needed until it
           | was too late? IA is broad enough to cover the past five times
           | out of ten.
        
       | badlibrarian wrote:
       | Restating my love for Internet Archive and my plea to put a
       | grownup in charge of the thing.
       | 
       | Washington Post: The organization has "industry standard"
       | security systems, Kahle said, but he added that, until this year,
       | the group had largely stayed out of the crosshairs of
       | cybercriminals. Kahle said he'd opted not to prioritize
       | additional investments in cybersecurity out of the Internet
       | Archive's limited budget of around $20 million to $30 million a
       | year.
       | 
       | https://archive.ph/XzmN2
        
         | semicolon_storm wrote:
         | In security, industry standard seems to be about the same as
         | military grade: the cheapest possible option that still checks
         | all the boxes for SOC.
        
           | incahoots wrote:
           | Basically, whatever the liability insurance wants for you to
           | be in compliance, then that's the standard.
        
           | Spivak wrote:
           | Hot take, this is the way it should be. If you want better
           | security then you update the requirements to get your
           | certification.
           | 
           | Security by its very nature has a problem of knowing when to
           | stop. There's always better security for an ever increasing
           | amount of money and companies don't sign off on budgets of
           | infinity dollars and projects of indefinite length. If you
           | want security _at all_ you have to bound the cost and have well-
           | defined stopping points.
           | 
           | And since 5 security experts in a room will have 10 different
           | opinions on what those stopping points should be -- what
           | constitutes "good-enough" -- they only become meaningful when
           | there's industry-wide agreement on them.
        
             | db48x wrote:
             | Yep. And worse, no matter how much you pay for security it
             | is still possible for someone to make a mistake and publish
             | a credential somewhere public.
        
             | gjsman-1000 wrote:
             | This ^
             | 
             | We can't all have the latest EPYC processors with the
             | latest bug fixes using Secure Enclaves and homomorphic
             | encryption for processing user data while using remote
             | attestation of code running within multiple layers of
             | virtualization. With, of course, that code also being
             | written in Rust, running on a certified microkernel, and
             | only updatable when at least 4 of 6 programmers, 1 from
             | each continent, unite their signing keys stored on HSMs to
             | sign the next release. All of that code is open source, by
             | the way, and has a ratio of 10 auditors per programmer with
             | 100% code coverage and 0 external dependencies.
             | 
             | Then watch as a kid fakes a subpoena using a hacked police
             | account and your lawyers, who receive dozens every day,
             | fall for it.
        
               | gjsman-1000 wrote:
               | Hilariously, I've been downvoted to -2 by butthurt
               | security experts without a counter-argument.
        
               | evilduck wrote:
               | No, it's your demeanor that is unbecoming and not worth
               | engaging with. Villainizing your poor behavior not
               | successfully baiting people into replying as you want is
               | childish too. Take a breather.
        
             | abadpoli wrote:
             | There never will be an adequate industry-wide
             | certification. There is no universal "good enough" or "when
             | to stop" for security. What constitutes "good enough" is
             | entirely dependent on what you are protecting and who you
             | are protecting it from, which changes from system to system
             | and changes from day to day.
             | 
             | The budget that it takes to protect against a script kiddy
             | is a tiny fraction of the budget it takes to protect from a
             | professional hacker group, which is a fraction of what it
             | takes to protect from nation state-funded trolls. You can
             | correctly decide that your security is "good enough" one
             | day, but all it takes is a single random news story or
             | internet comment to put a target on your back from someone
             | more powerful, and suddenly that "good enough" isn't good
             | enough anymore.
             | 
             | The Internet Archive might have been making the correct
             | decision all this time to invest in things that further its
             | mission rather than burning extra money on security, and it
             | seems their security for a long time was "good enough"...
             | until it wasn't.
        
             | goodpoint wrote:
             | > since 5 security experts in a room will have 10 different
             | opinions
             | 
             | If that happens you need to seriously rethink your hiring
             | process.
        
           | EasyMark wrote:
           | Military grade has different meanings. I've worked in the
           | electronics industry a long time and will say with confidence
           | that the pcbs and chips we sent to the military were our
           | best. Higher temperature ranges, much more thorough
           | environmental testing, many more thermal and humidity cycles,
           | lots more vibration testing. However we also sell them for
           | 5-10x our regular prices but in much lower quantities. It's a
           | failed meme in many instances as the internet uses it though.
        
       | pessimizer wrote:
       | The Internet Archive has a management problem. They seem to be
       | more comfortable _disrupting libraries_ than managing an online,
       | publicly accessible database of disputed, disorganized material.
       | 
       | Despite all of the positive self-talk, I don't know if they
       | realize how important they are, or how easy it would be for them
       | to find good help and advice if their management were transparent
       | and everything was debated in public. That may have protected it
       | to some extent; as a counterexample, Wikipedia has been extremely
       | fragile due to its transparency and accessibility to everyone.
       | With IA being driven by its creator's ideology, maybe that
       | ideology should be formalized and set in stone as bylaws, and the
       | torch passed to people openly debating how IA should be run, its
       | operations, and what it should be taking on.
       | 
       | I don't mean they should be run by the random set of Confucian-
       | style libertarian aphorisms that is running the credibility of
       | Wikipedia into the ground, but Debian is a good model to follow.
       | Or maybe do better than both?
        
         | avazhi wrote:
         | https://www.wired.com/story/internet-archive-memory-wayback-...
         | 
         | I appreciate their ethos and I've used the site many times (and
         | donated!), but clearly it's at the point where Kahle et al just
         | aren't equipped either personally (as a matter of technical
         | expertise) or collectively (they are just a handful of people)
         | to be dealing with what are probably in many cases nation-state
         | attacks. Kahle's attitude towards (and misunderstanding of)
         | copyright law is IMO proof that he shouldn't be running things,
         | because his legal gambles (gambles that a first year law
         | student could have predicted would fail spectacularly) have put
         | IA at long term risk (see: Napster). And this information
         | coming out over the past few weeks about their technical
         | incompetence is arguably worse, because the tech side of things
         | is what he and his team are actually supposed to be good at.
         | 
         | It's true that Google and Microsoft and others should be
         | propping up the IA financially but that isn't going to solve
         | the IA's lack of technical expertise or its delusional hippie
         | ethos.
        
         | badlibrarian wrote:
         | Don't forget the time Brewster tried to run a bank -- Internet
         | Archive Federal Credit Union. Or that the physical archives are
         | stored on an active fault line and unlikely to receive prompt
         | support during an emergency. Or that, when someone told him
         | that archives are often stored in salt mines he replied, "cool,
         | where can I buy one?"
        
         | mrweasel wrote:
         | > Debian is a good model to follow.
         | 
         | While I have no idea how Debian is actually funded I'd agree.
         | One issue might be that The Internet Archive actually needs to
         | have people on staff, not sure if Debian has that requirement.
         | You're not going to get people to man scanners or VHS players 8
         | hours a day without pay, at least not at this scale.
         | 
         | The Internet Archive needs a better funding strategy than
         | asking for money on their own site. People aren't visiting them
         | frequently enough for that to work. They need a fundraising
         | team, and a good one.
         | 
         | Finding managers is probably even worse. They can't get a
         | normal CEO type person, because they aren't a company and the
         | type of people who apply to or are attracted to running a
         | non-profit, serve-the-community, don't-be-evil organisation are
         | frequently bat-shit crazy.
        
         | kmeisthax wrote:
         | > Confucian-style libertarian aphorisms that is running the
         | credibility of Wikipedia
         | 
         | Can you elaborate? I'm aware of Wikipedia having very
         | particular rules and lots of very territorial editors, but I'm
         | not sure how this runs their credibility into the ground aside
         | from pissing off the far right when they come in with an agenda
         | to push.
        
       | notmysql_ wrote:
       | I sent them a resume almost a year ago, and got nothing back in
       | response until yesterday. Looks like they are going through their
       | backlog right now to find more hands.
        
         | TZubiri wrote:
         | Interesting, for a security position?
        
           | notmysql_ wrote:
           | It was a while ago, I think it was for their general position
           | option, though I did talk about sec experience in it
        
       | sirolimus wrote:
       | It's incredibly sad to see threat actors attack something as
       | altruistic as an internet library. Truly demoralizing to see such
       | degeneracy.
        
         | codezero wrote:
         | There are many state actors that attack targets of opportunity
         | just to cause chaos and asymmetric financial costs.
        
         | croes wrote:
         | Seems like the actor did it only for the street cred and the
         | second breach is only a reminder that IA didn't properly fix
         | it after the first breach.
         | 
         | Could be worse.
        
         | userbinator wrote:
         | When there are plenty of people who are steeped in the dogma of
         | Imaginary Property, and whose lives depend on it, it's not too
         | surprising.
        
         | xyst wrote:
         | Blame bad leadership.
        
           | callc wrote:
           | Is there a reason to blame the victim, rather than the
           | attackers?
           | 
           | I'm asking seriously - did IA do shitty things that make them
           | a worthy cause for politically/ideologically motivated
           | hacking?
        
             | lolinder wrote:
             | I imagine they're referring to the fact that the leadership
             | showed extremely bad judgement in deciding to pick a battle
             | with the major publishing companies that _everyone_ knew
             | they would lose before it even began [0].
             | 
             | I don't think that justifies blaming the victim here, and
             | from what I can see the attacker doesn't seem to be
             | motivated by anything other than funsies, but I absolutely
             | lost a lot of faith in their leadership when they pulled
             | the NEL nonsense. The IA is too valuable for them to act
             | like a young activist org--there's too much for us to lose
             | at this point. They need to hold the ground they've won and
             | leave the activism to other organizations.
             | 
             | [0] https://www.wired.com/story/internet-archive-loses-
             | hachette-...
        
               | jampekka wrote:
               | > there's too much for us to lose at this point
               | 
               | Feeling entitled?
        
         | luckylion wrote:
         | A different framing is: be grateful that it's these types of
         | people breaching IA and being vocal about it & asking IA to fix
         | their systems. Others might just nuke them, or subtly alter
         | content, or do whatever else bad thing you can think of.
         | 
         | They're providing a public service by pointing out that a
         | massive organization controlling a lot of PII doesn't care
         | about security at all.
        
         | A4ET8a8uTh0 wrote:
         | Not defending attacker, because I see IA as common good. That
         | said one of the messages from this particular instance reads
         | almost as if they were trying to help by pointing out issues
         | that IA clearly missed:
         | 
         | "Whether you were trying to ask a general question, or
         | requesting the removal of your site from the Wayback Machine
         | your data is now in the hands of some random guy. If not me,
         | it'd be someone else."
         | 
         | I am starting to wonder if the chorus of 'maybe one org should
         | not be responsible for all this; it is genuinely too important'
         | has a point.
        
         | sim7c00 wrote:
         | anything with tons of traffic going to it is a target. it has
         | nothing to do with what the entity does, more with what
         | potential reach it has. criminal behaviour is what it is.
         | people pulling loads of visitors need to properly secure their
         | shit, to prevent their customers becoming their victims.
        
       | gweinberg wrote:
       | Does anyone know who is targeting the Internet Archive, and why?
       | I get the impression the attacks are too sophisticated for it to
       | just be vandal punks.
        
         | xyst wrote:
         | Is it sophisticated if IA leaves the door wide open? I blame
         | shit leadership.
        
         | lolinder wrote:
         | > I get the impression the attacks are too sophisticated for it
         | to just be vandal punks.
         | 
         | What gives that impression? Everything I've seen about the
         | attacker's messaging says "vandal punk(s)" to me, and nothing
         | in what I've seen of the IA's systems screams Fort Knox. It
         | wouldn't surprise me if they actually had a pretty lax approach
         | to security on the assumption that there's very little reason
         | to target them.
        
         | jrm4 wrote:
         | It strikes me as reasonable to _assume_ (or at least strongly
         | bet on) -- I'm not sure of the right phrase for it -- but like
         | a mercenary type operation on behalf of some larger old media
         | company?
         | 
         | There's just too much "means, motive and opportunity" there.
        
         | dokyun wrote:
         | The group that claimed to be responsible for the first hack was
         | said to be Russian-based, anti-U.S., pro-Palestine, and their
         | reasoning for the attack was because of IA's violation of
         | copyright....
         | 
         | I think you should draw your own more informed conclusions, but
         | it smells a lot like feds to me.
        
         | polytely wrote:
         | With the amount of comments calling for a leadership change my
         | tinfoilhat theory is that this is a concerted effort to get a
         | leadership change.
        
       | alexey-salmin wrote:
       | A genuine question to commenters asking to "put a grownup in
       | charge of the thing" and saying that "Kahle shouldn't be running
       | things": he built the thing, why exactly can't he run it the way
       | he sees fit?
        
         | et-al wrote:
         | He is. But at the cost of the greater good.
         | 
         | Most of us care mainly about the Wayback Machine and archiving
         | webpages; not borrowing books still under copyright and
         | fighting publishers.
        
           | TZubiri wrote:
           | Speak for yourself, the internet archive successfully
           | increased its scope and made creative contributions to case
           | law (although it lost at the appeals court)
        
         | pvg wrote:
         | A good place to direct that question might be in a reply to the
         | person who made that comment.
        
       | anthk wrote:
       | The Internet Archive had legal gems such as the Jamendo Album
       | Collection, a huge CC haven. Yes, most of it under NC licenses,
       | but for non-commercial streaming radio with podcasts, these have
       | been invaluable.
       | 
       | Do you know Nanowar? They began there.
       | 
       | Also, as commercial music has been deliberately dumbed down for
       | the masses (in paper, not by cheap talking), discovering Jamendo
       | and Magnatune in late 00's has been like crossing a parallel
       | universe.
        
       | 999900000999 wrote:
       | Do any organizations have a mirror of this?
       | 
       | Even if it's not publicly available...
        
       | butz wrote:
       | Is there any way IA could be mirrored in read-only mode, while
       | security concerns are addressed?
        
       | kleiba wrote:
       | People with solid info sec knowledge: this is a good opportunity
       | to offer your expertise pro-bono for a good cause!
        
         | kyleyeats wrote:
         | They're buried in these offers right now.
        
           | op00to wrote:
           | I wonder how many offers are legitimate.
        
             | TZubiri wrote:
             | An org amidst an attack might not be the most open to
             | giving credentials and access to strangers.
        
       | RcouF1uZ4gsC wrote:
       | The Library of Congress should be archiving the Internet and it
       | should have the budget required to do so.
       | 
       | This is in line with its mission as the "Library of Congress".
       | Being able to have an accurate record of what was on the Internet
       | at a specific point in time would be helpful when discussing
       | legislation or potential regulation involving the internet.
        
         | awkwardpotato wrote:
         | The Library of Congress does currently archive limited
         | collections of the internet[0]. They have a blog post[1]
         | breaking down the effort, currently it's 8 full time staff with
         | a team of part time members. According to Wikipedia[2], it's
         | built on Heritrix and Wayback which are both developed by the
         | Internet Archive (blog post also mentions "Wayback software").
         | Current archives are available at: http://webarchive.loc.gov/
         | 
         | [0] https://www.loc.gov/programs/web-archiving/about-this-
         | progra...
         | 
         | [1] https://blogs.loc.gov/thesignal/2023/08/the-web-archiving-
         | te...
         | 
         | [2]
         | https://en.m.wikipedia.org/wiki/List_of_Web_archiving_initia...
        
       ___________________________________________________________________
       (page generated 2024-10-20 23:00 UTC)