hngopher.com

       [HN Gopher] Anna's Archive Faces Millions in Damages and a Perma...
       ___________________________________________________________________
        
       Anna's Archive Faces Millions in Damages and a Permanent Injunction
        
       Author : mistercheph
       Score  : 114 points
       Date   : 2024-07-08 19:52 UTC (3 hours ago)
        
 (HTM) web link (torrentfreak.com)
        
       | helloworld42024 wrote:
       | At this point, we need a service that "offers" an 8-bay NAS
       | (with 12TB? 14TB? drives) filled with the whole ~80TB Anna's
       | Archive. It's essentially all of human knowledge and to be frank
       | it belongs to no one - rather...everyone.
       | 
       | People can store this at their house, keep it offline. Just to
       | have these seeds of knowledge everywhere.
       | 
       | ...I suppose LLMs trained on this data (essentially their model
       | weights and tokenization) are a much more efficient way of
       | storing and condensing this 80TB archive?
        
         | falsaberN1 wrote:
         | The problem with the latter is reliability, or rather it's
         | efficient but unreliable. I'd rather overdo my offline storage
         | and figure out some way to script/code my way into searching it
         | in a convenient way.
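
Searching an offline copy by filename can be scripted without much machinery. A minimal sketch, assuming the files sit in an ordinary directory tree and that their filenames carry the title and author (the paths used here are hypothetical): build an in-memory inverted index from filename tokens, then intersect the matches for each query term.

```python
import os
import re
from collections import defaultdict
from typing import Iterable, Iterator


def tokenize(text: str) -> list[str]:
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def iter_files(root: str) -> Iterator[str]:
    """Yield the full path of every file under `root`, recursively."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def build_index(paths: Iterable[str]) -> dict[str, set[str]]:
    """Map each filename token to the set of paths whose filename contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path in paths:
        for token in tokenize(os.path.basename(path)):
            index[token].add(path)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the paths whose filenames contain every token of the query (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```

Usage would look like `search(build_index(iter_files("/mnt/archive")), "art computer programming")`, where `/mnt/archive` is a hypothetical mount point. Rebuilding the index at startup is fine for a modest collection; something archive-sized would want a persistent index instead.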
        
         | jononor wrote:
         | If the content is to be trustworthy then using LLMs to compress
         | it makes no sense.
        
           | miloignis wrote:
           | It's possible to do lossless compression with LLMs, basically
           | using the LLM as a predictor and then storing differences
           | when the LLM would have predicted incorrectly. The incredible
           | Fabrice Bellard actually implemented this idea:
           | https://bellard.org/ts_zip/
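
The predict-then-correct idea can be shown with a toy model. This is only a sketch, assuming a character-level "most likely next character" table stands in for the LLM; as far as I know, ts_zip instead feeds the model's probabilities into an arithmetic coder, which is far more compact than storing explicit corrections. Compression keeps the length, the first character, and every position the model mispredicts. Decompression replays the same model, so the output is identical to the input (lossless) as long as both sides use exactly the same model.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict[str, str]:
    """Toy 'model': for each character, predict the character that most often follows it."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        follows[prev][nxt] += 1
    return {prev: counts.most_common(1)[0][0] for prev, counts in follows.items()}


def compress(text: str, model: dict[str, str]) -> tuple[int, str, dict[int, str]]:
    """Keep the length, the first character, and only the characters the model mispredicts."""
    corrections: dict[int, str] = {}
    for i in range(1, len(text)):
        if model.get(text[i - 1]) != text[i]:
            corrections[i] = text[i]
    return len(text), text[:1], corrections


def decompress(length: int, first: str, corrections: dict[int, str],
               model: dict[str, str]) -> str:
    """Replay the model's predictions, overriding them wherever a correction was stored."""
    out = list(first)
    for i in range(1, length):
        out.append(corrections[i] if i in corrections else model[out[-1]])
    return "".join(out)
```

The better the model predicts the text, the fewer corrections need storing, which is why a strong language model compresses natural-language text well.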
        
             | rustcleaner wrote:
             | Can we do this in physics?
             | 
             | Use a universal function approximator to approximate the
             | universe, seek Erf(x)>threshold, interrogate universe for
             | fresh data, retrain new universal approximator, ... loop
             | previous ... , universe in a bottle.
        
         | squigz wrote:
         | You dropped a 0. Anna's Archive is currently 862.4 TB
        
           | RachelF wrote:
           | Yes, too much for one person, but collectively it is possible
           | to keep it alive.
           | 
           | If anyone wishes to help, you can generate a chunk in 1TB
           | units and seed via BitTorrent here:
           | 
           | https://annas-archive.gs/torrents
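
Choosing what to seed within a fixed disk budget is a small knapsack-style problem. A greedy sketch, assuming you already have a list of torrents with sizes and seeder counts; the `Torrent` fields and sample values are hypothetical and are not the format of Anna's Archive's actual torrent list. It sorts by fewest seeders (most at risk) and fills up to roughly 1 TB.

```python
from dataclasses import dataclass


@dataclass
class Torrent:
    name: str       # hypothetical torrent identifier
    size_gb: float  # torrent size in decimal gigabytes
    seeders: int    # current seeder count, as reported by a tracker


def pick_torrents(torrents: list[Torrent], budget_gb: float = 1000.0) -> list[Torrent]:
    """Greedily fill a storage budget, preferring the least-seeded torrents first."""
    chosen: list[Torrent] = []
    used = 0.0
    # Fewest seeders first; among equals, smaller torrents first so more of them fit.
    for t in sorted(torrents, key=lambda t: (t.seeders, t.size_gb)):
        if used + t.size_gb <= budget_gb:
            chosen.append(t)
            used += t.size_gb
    return chosen
```

Greedy selection won't always pack the disk optimally, but it prioritizes the torrents most likely to disappear, which matters more for preservation than squeezing out the last few gigabytes.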
        
           | shrubble wrote:
           | If you only care about non-fiction and science journals it is
           | more like 250TB I think? Still several thousand dollars' worth
           | of 22TB drives with ZFS though.
        
             | ndriscoll wrote:
             | 22 TB drives are around $230 on ebay, so if you used 15 of
             | them in raidz2, that'd be around $3500 (so maybe a little
             | over $4k with the rest of the server), which is around the
             | cost of a new mirrorless camera and a decent lens, so
             | certainly within the realm of a hobbyist. You probably
             | couldn't get away with downloading 250 TB in any reasonable
             | timeframe with most US ISPs (or at least Comcast) though.
             | That'd be over 2.5 months of 300 Mb/s non-stop. Even
             | copying it from a friend using 2.5 Gbit/s Ethernet would
             | take over a week.
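
A quick check of the numbers above, assuming decimal units (1 TB = 10^12 bytes), raidz2 losing exactly two drives' worth of capacity to parity, and links running at full line rate with no protocol overhead:

```python
SECONDS_PER_DAY = 86_400


def raidz2_usable_tb(drives: int, drive_tb: float) -> float:
    """Rough raidz2 capacity: two drives' worth goes to parity (ignores ZFS overhead)."""
    return (drives - 2) * drive_tb


def transfer_days(size_tb: float, link_mbps: float) -> float:
    """Days to move size_tb (decimal TB) over a link running flat out at link_mbps."""
    bits = size_tb * 1e12 * 8
    return bits / (link_mbps * 1e6) / SECONDS_PER_DAY


drives, drive_tb, price_usd = 15, 22, 230
print(f"usable: ~{raidz2_usable_tb(drives, drive_tb):.0f} TB, drives: ${drives * price_usd}")
print(f"250 TB @ 300 Mb/s: {transfer_days(250, 300):.1f} days")
print(f"250 TB @ 2.5 Gb/s: {transfer_days(250, 2500):.1f} days")
```

This prints about 286 TB usable for $3450 in drives, 77.2 days at 300 Mb/s (roughly 2.5 months), and 9.3 days at 2.5 Gbit/s. Real ZFS overhead and TB-vs-TiB accounting shrink the usable figure somewhat, but it still clears 250 TB.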
        
           | bityard wrote:
           | That is true. However, it also has a staggering amount of
           | duplicate data. I have _heard_ that if you search for most
           | any particular book, you often get a dozen results of varying
           | sizes and quality. Even for the same filetype. It's a hard
           | problem to solve, but if we had something that could somehow
           | pick the "best" copy of a particular title, for every title
           | in the library, Anna could likely drop the zero herself.
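
Picking a "best" copy usually comes down to grouping files by a normalized key and ranking within each group. A minimal sketch, assuming each file has title, author, extension, and size metadata; the `FileRecord` type, the `FORMAT_RANK` preference order, and the "bigger file wins" tie-breaker are all hypothetical heuristics, not how Anna's Archive actually ranks files.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass
class FileRecord:
    title: str
    author: str
    extension: str
    size_bytes: int


# Hypothetical preference order: lower rank wins; unknown formats rank last.
FORMAT_RANK = {"epub": 0, "pdf": 1, "djvu": 2, "mobi": 3}


def normalize(text: str) -> str:
    """Fold accents, lowercase, and collapse punctuation so near-identical titles match."""
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", " ", ascii_text.lower()).strip()


def score(record: FileRecord) -> tuple[int, int]:
    """Smaller is better: preferred format first, then the larger file (often a better scan)."""
    return (FORMAT_RANK.get(record.extension.lower(), len(FORMAT_RANK)), -record.size_bytes)


def best_copies(records: list[FileRecord]) -> dict[tuple[str, str], FileRecord]:
    """Group records by normalized (title, author) and keep the best-scoring one per group."""
    groups: dict[tuple[str, str], list[FileRecord]] = {}
    for r in records:
        key = (normalize(r.title), normalize(r.author))
        groups.setdefault(key, []).append(r)
    return {key: min(group, key=score) for key, group in groups.items()}
```

The hard part the comment alludes to is the grouping key: editions, translations, and inconsistent metadata mean real matching would likely need ISBNs or fuzzy matching rather than exact normalized strings.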
        
       | advael wrote:
       | I feel obligated to say my usual "IP is at this point doing more
       | harm than good" spiel here but don't have the time budget to
       | argue it with people today
        
         | data_maan wrote:
         | It's good that you still said it!
        
         | NuSkooler wrote:
         | I've been trying to convince people of this for years. The
         | problem, from my view, is that people have the idea of IP as
         | some sort of American Dream so ingrained in their heads that
         | they can't even reason about anything else.
        
           | noman-land wrote:
           | What's a suitable replacement?
        
             | ElevenLathe wrote:
             | Copyright should probably grant some benefit to an author,
             | but just the bare minimum necessary to incentivize people
             | to actually submit their work and file for copyright (also,
             | we should resume requiring that you actually file for
             | copyright, as was the case before 1978). This probably
             | means some period of exclusive monetization rights. Anyone
             | should be able to search for and read any filed works for
             | free from the moment they are filed, or possibly after this
             | exclusive monopoly period.
             | 
             | Any additional benefits to copyright holders beyond what is
             | needed to make sure we don't lose the works are essentially
             | graft, no different in principle to the medieval church
             | selling lucrative offices.
             | 
             | If these things are valuable only because of scarcity, then
             | we are incentivizing scarcity by granting monopoly, so we
             | should do as little of that as we can manage. If they are
             | inherently valuable, they should be as widely disseminated
             | as possible (a cost that government can easily afford given
             | modern technology). If they are worthless, there is no harm
             | in the government keeping a copy anyway.
        
             | wongarsu wrote:
             | A copyright term of 30 years after first publication, or 30
             | years after creation if no publication happens in this
             | period. Ideally add some provision that ensures works are
             | actually available after that period (similar to how many
             | countries require all printed books to be submitted to the
             | national library, but extend it to all media that achieves
             | some benchmark of significance). Patent duration adjusted
             | on a per-industry basis. Trademarks are fine as is
        
         | noman-land wrote:
         | What would we use instead?
        
           | rvense wrote:
           | Copyright should expire after like 15 years.
           | 
           | Academic publishers should not exist, research - especially
           | publicly funded research - should just go straight to the
           | public domain.
        
             | dylan604 wrote:
             | Some research is funded through private donations to a
             | specific school, or because the team shopped their project
             | around to find private funding directly. This goes beyond
             | public funding, which makes the case for public access to
             | the research a grayer area, especially when compared to
             | something like data produced by NASA.
        
             | autoexec wrote:
             | 14 years was considered good enough back when it was
             | prohibitively expensive to publish anything and worldwide
             | distribution was basically impossible. Today, when
             | publishing is essentially free and worldwide distribution
             | happens at close to light speed you think we should expand
             | copyright for another year?
             | 
             | I lean more towards 7-10 years, with required registration
             | involving a DRM free copy of the work submitted to the US
             | copyright office (where possible) who will automatically
             | host that file for free once the copyright term has
             | expired. There should be an RSS feed from copyright.gov
             | with download links to the latest works entering the public
             | domain. That'd also make it dead simple to find who you
             | need to contact if you want to negotiate rights to use a
             | work still under copyright's protection.
             | 
             | I agree that anything getting public funding should be
             | public domain on day one (normal exceptions for national
             | security etc)
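
The RSS idea is easy to prototype. A sketch, assuming a registry of works with registration dates and a flat 10-year term; the `Registration` type, the `example.org` URLs, and the term length are hypothetical placeholders, not an existing copyright.gov feed. It emits an RSS 2.0 document listing works whose term has expired, newest first.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import date


@dataclass
class Registration:
    title: str
    registered: date
    download_url: str  # hypothetical URL of the deposited DRM-free copy


def expiry(reg: Registration, term_years: int) -> date:
    """Naive term end: same month/day, term_years later (Feb 29 falls back to Feb 28)."""
    try:
        return reg.registered.replace(year=reg.registered.year + term_years)
    except ValueError:
        return reg.registered.replace(year=reg.registered.year + term_years, day=28)


def public_domain_feed(regs: list[Registration], today: date, term_years: int = 10) -> str:
    """Build an RSS 2.0 document listing registered works whose term has expired."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link, and description on the channel.
    ET.SubElement(channel, "title").text = "Works entering the public domain"
    ET.SubElement(channel, "link").text = "https://example.org/public-domain"  # placeholder
    ET.SubElement(channel, "description").text = f"Works whose {term_years}-year term has expired"
    expired = [r for r in regs if expiry(r, term_years) <= today]
    for reg in sorted(expired, key=lambda r: expiry(r, term_years), reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = reg.title
        ET.SubElement(item, "link").text = reg.download_url
        ET.SubElement(item, "description").text = (
            f"Public domain since {expiry(reg, term_years).isoformat()}"
        )
    return ET.tostring(rss, encoding="unicode")
```

Because each registration carries the deposited copy's location, the same registry would double as the lookup for who to contact about works still under copyright.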
        
         | globular-toast wrote:
         | Definitely recommend everyone interested to listen to rms's
         | talk "Copyright vs Community". It changed the way I thought
         | about it some 15 years ago. It's only got worse since, but it
         | seems more people are coming to the same conclusion. Maybe we
         | can do something about it.
         | 
         | rms suggests dialling back copyright rather than completely
         | abolishing it: 10 years from date of publication. Of course he
         | doesn't believe in copyright for _software_ at all, but that's
         | another matter.
         | 
         | The funny thing is the way these greedy assholes in the
         | copyright industry are behaving is just making it worse for
         | them. It's driving people to places like z-library because
         | essentially everything is in copyright. A child has just been
         | born who won't ever see a work that was published 50 years ago
         | go out of copyright. It's insane. With sensible copyright
         | lengths we wouldn't need z-library.
        
         | chongli wrote:
         | IP-intensive industries contribute 41% of the US GDP and employ
         | 44% of the US workforce [1]. If you abolished IP all of that
         | would go away. How would you replace that? I'm not a fan of IP
         | either but I think it's pretty hard to escape that reality. Big
         | companies like NVIDIA (3T market cap) are almost 100% IP.
         | 
         | IP is more-or-less central to the US's economic and security
         | strategy. Without it, the country loses a huge amount of power
         | and influence in the world.
         | 
         | [1] https://www.uspto.gov/ip-policy/economic-research/intellectu...
        
           | Super_Jambo wrote:
           | Pay them from general taxation to dig holes and fill them in
           | again with spoons?
           | 
           | If you can only succeed because your government is holding
           | back others at the barrel of a gun, do you really deserve to?
        
       | data_maan wrote:
       | "Whether we're supporting advancements on the leading edge of
       | science or helping children build a strong learning foundation,
       | shared knowledge is the common thread". (source:
       | https://www.oclc.org/en/about.html)
       | 
       | Well, now it's shared on a torrent, but I guess for them that was
       | "over shared" lol.
       | 
       | Also, they are a library but spent $5 million on cyber
       | defense... seriously??
        
         | aftbit wrote:
         | Yeah why would a library of all places be upset about Anna's
         | Archive? Their whole mission is sharing and preserving
         | knowledge.
        
       | data_maan wrote:
       | What is the deal with this Njalla hosting service? Is it really
       | so hard to take sites from there down?
        
         | dtx1 wrote:
         | It's made by one of the Pirate Bay founders. When you buy a
         | domain from them, they act as a middleman and keep legal
         | ownership of the domain, thus shielding your personal
         | information somewhat from legal enforcement. They claim their
         | hosting is "In secret locations in Sweden". Their LLC is located
         | in Nevis (some tax evasion island).
         | 
         | There's also https://1984.hosting/ which runs a similar
         | operation out of Iceland.
        
           | rustcleaner wrote:
           | Sweet! Thanks for the links guys! :^)
        
           | autoexec wrote:
           | Considering I haven't managed to run into malware or phishing
           | sites on either of those providers, they seem to be doing
           | something right. Why are these so-called "evil" hosting
           | companies less of a problem for me than Namecheap, GoDaddy,
           | and Google?
        
       | catlikesshrimp wrote:
       | Mixed feelings about the case.
       | 
       | Sharing is the best thing that can happen to knowledge. It is
       | great that gatekeepers lose money over this.
       | 
       | However, the blame for the loss might fall on OCLC, which may
       | have been doing a positive job.
       | 
       | https://www.oclc.org/en/about.html
        
         | data_maan wrote:
         | What is OCLC's added value? They didn't create the data.
        
           | catlikesshrimp wrote:
           | Useless in my country because there are no libraries. But I
           | can look up libraries "near my country" and beyond.
           | 
           | I suppose some libraries will allow ebook loans through
           | WorldCat. They seem to be more about sharing within US law
           | without directly charging people.
           | 
           | That's why I said positive. Torrent sharing is better, but idk
           | if that will be sustainable.
        
             | knowaveragejoe wrote:
             | What country has no libraries?
        
               | autoexec wrote:
               | Good question. The internet is full of lies, but it
               | suggests that Papua New Guinea ranks at the bottom since
               | it only has one and that was a gift from Australia.
        
           | bawolff wrote:
           | Organizing data is valuable.
           | 
           | Not as valuable as the actual data, but it's not nothing
           | either.
        
           | freehorse wrote:
           | Suing Anna's Archive and similar products, i.e. being the
           | lapdogs of big publishers, it seems. Why else would they care
           | if they had no shared interests with them?
           | 
           | Maybe I am missing a way to use their database that makes
           | sense, but for me WorldCat is on the level of Pinterest and
           | other SEO pollution in my search results when I need to find
           | some real information. I do not care if this book is in some
           | library 1000km away. If I need a book I will search in my
           | local ones. I never understood the point of what WorldCat
           | does, but maybe others use it in some way useful to them.
        
         | wongarsu wrote:
         | OCLC's slogan is literally "Because what is known must be
         | shared". Them being a litigious gatekeeper is mighty
         | hypocritical.
         | 
         | I don't really get their dilemma. They claim that a publicly
         | available copy of the data on Anna's Archive is a direct threat
         | to their business. But the same data is freely available by
         | going to their own worldcat.org. Any library that was satisfied
         | with pure read access to the data was already not going to pay
         | them money.
         | 
         | They allege that scraping the 2.2TB of data cost them $5
         | million over 2 years. That's over $2 per megabyte. If the cost of
         | providing this was the only issue surely within those two years
         | someone would have gotten the idea to just put up an XML dump
         | for download, or to shoot Anna's Archive an email with an offer
         | to just send them the data as soon as it became clear that it
         | was them.
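
For reference, the per-megabyte figure using the exact $5,333,064 damages claim quoted in the article and the 2.2 TB figure above, assuming decimal units:

```python
claimed_damages_usd = 5_333_064  # damages figure quoted in the article
scraped_tb = 2.2                 # amount of scraped data cited in the comment above

scraped_mb = scraped_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
usd_per_mb = claimed_damages_usd / scraped_mb
usd_per_gb = usd_per_mb * 1_000

print(f"${usd_per_mb:.2f} per MB, ${usd_per_gb:,.0f} per GB")
```

This prints `$2.42 per MB, $2,424 per GB`.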
        
       | lynguist wrote:
       | What a ridiculous claim. They call computer hardware and salaries
       | damages.
       | 
       | I want a competent judge to make sure that these are not damages
       | and I hope that Anna's Archive continues to operate in a sensible
       | jurisdiction for the foreseeable future.
       | 
       | If I had the means I would donate to them.
        
         | teractiveodular wrote:
         | https://annas-archive.gs/donate
        
           | dylan604 wrote:
           | Having the means normally means the extra finances to give
           | away vs knowing the link to their donation form
        
             | autoexec wrote:
             | Still kind of nice to have that link thrown out here for
             | those with the means to support their work.
        
             | shkkmo wrote:
             | It's an easily misunderstood idiom, since the difference
             | between the two meanings is just the definite vs indefinite
             | article: 'the means' vs 'a means'.
        
         | tehwebguy wrote:
         | Article says nobody is responding as defendant except one
         | individual who has filed a motion to dismiss based on being
         | misidentified. I don't know the status of that motion though.
        
       | ASalazarMX wrote:
       | They're just piling on anything they can regardless of reason. A
       | real damage count would be lost sales due to web scraping, but
       | they don't sell anything.
       | 
       | > For example, the organization spent $1,548,693 on upgrades for
       | its hardware infrastructure, and an additional $608,069 for a
       | two-year Cloudflare contract [..] Other costs include the
       | salaries of 34 full-time employees, who were tasked with
       | mitigating the harm caused by the attacks, as well as various
       | other investigation, security, and hardware-related costs.
       | 
       | > "OCLC has incurred damages of $5,333,064 as a direct result of
       | Anna's Archive's cyberattacks, but that amount does not fully
       | compensate OCLC for the harm from Anna's Archive's wrongful
       | actions. OCLC continues to suffer from harms that cannot be
       | remedied by monetary damages."
       | 
       | Is web scraping now considered a cyberattack? Was it eating their
       | bandwidth even if it was served through Cloudflare? LOL.
        
         | gmuslera wrote:
         | Ask Aaron Swartz
        
           | rustcleaner wrote:
           | I'm convinced that was not self-inflicted.
        
             | neilv wrote:
             | Some friends and family are likely to be on HN, and I'd
             | assume probably don't want to see speculation like that.
        
               | 2OEH8eoCRo0 wrote:
               | The conspiracy/speculation on every damn topic around
               | here is getting out of hand.
        
         | bawolff wrote:
         | > Is web scraping now considered a cyberattack? Was it eating
         | their bandwidth even if it was served through Cloudflare? LOL.
         | 
         | We've been tobogganing down the slippery slope of "cyber"
         | damages for a long time now.
        
         | odo1242 wrote:
         | The original "damages" in a lawsuit are almost always made up.
         | They're supposed to get adjusted downward during and after the
         | lawsuit.
         | 
         | This is also why having a default judgement delivered against
         | you for failing to show up generally isn't great.
        
       | squigz wrote:
       | https://annas-archive.gs/torrents
       | 
       | https://annas-archive.gs/donate
        
       | pornel wrote:
       | The mistake was calling it Anna's Archive, and not Anna's AI
       | startup.
        
       | greenie_beans wrote:
       | what happens if somebody used the metadata from anna's archive?
        
       ___________________________________________________________________
       (page generated 2024-07-08 23:00 UTC)