[HN Gopher] Anna's Archive Faces Millions in Damages and a Perma...
___________________________________________________________________
Anna's Archive Faces Millions in Damages and a Permanent Injunction
Author : mistercheph
Score : 114 points
Date : 2024-07-08 19:52 UTC (3 hours ago)
(HTM) web link (torrentfreak.com)
(TXT) w3m dump (torrentfreak.com)
| helloworld42024 wrote:
| At this point, we need a service that "offers" an 8-bay (with
| 12TB? 14TB? drives) filled with the whole ~80TB Anna's Archive.
| It's essentially all of human knowledge and to be frank it
| belongs to no one - rather...everyone.
|
| People can store this at their house, keep it offline. Just to
| have these seeds of knowledge everywhere.
|
| ...I suppose LLMs trained on this data, essentially their model
| weights and tokenization are a much more efficient way of storing
| and condensing this 80TB archive?
| falsaberN1 wrote:
| The problem with the latter is reliability, or rather it's
| efficient but unreliable. I'd rather overdo my offline storage
| and figure out some way to script/code my way into searching it
| in a convenient way.
| jononor wrote:
| If the content is to be trustworthy then using LLMs to compress
| it makes no sense.
| miloignis wrote:
| It's possible to do lossless compression with LLMs, basically
| using the LLM as a predictor and then storing differences
| when the LLM would have predicted incorrectly. The incredible
| Fabrice Bellard actually implemented this idea:
| https://bellard.org/ts_zip/
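
A minimal sketch of the predict-and-store-the-misses idea described above. It assumes a toy order-2 character predictor (`ToyPredictor`, a hypothetical stand-in for an LLM) and is not how ts_zip itself works; practical tools typically feed the model's probabilities into an entropy coder rather than storing literal misses.

```python
from typing import Dict, List, Optional

ORDER = 2  # context length for the toy predictor (arbitrary assumption)


class ToyPredictor:
    """Hypothetical stand-in for an LLM: guesses the next char from recent context."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}

    def predict(self, context: str) -> Optional[str]:
        return self.table.get(context)

    def update(self, context: str, actual: str) -> None:
        # Remember the most recent character seen after this context.
        self.table[context] = actual


def compress(text: str) -> List[Optional[str]]:
    """Emit None when the predictor is right, the literal character otherwise."""
    model = ToyPredictor()
    out: List[Optional[str]] = []
    for i, ch in enumerate(text):
        context = text[max(0, i - ORDER):i]
        out.append(None if model.predict(context) == ch else ch)
        model.update(context, ch)
    return out


def decompress(tokens: List[Optional[str]]) -> str:
    """Replay the same predictor; a None token means 'take the prediction'."""
    model = ToyPredictor()
    chars: List[str] = []
    for tok in tokens:
        context = "".join(chars[-ORDER:])
        ch = model.predict(context) if tok is None else tok
        chars.append(ch)
        model.update(context, ch)
    return "".join(chars)
```

The round trip is exact because the encoder and decoder update an identical predictor in lockstep, so only the mispredicted characters need to be stored.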
| rustcleaner wrote:
| Can we do this in physics?
|
| Use a universal function approximator to approximate the
| universe, seek Erf(x)>threshold, interrogate universe for
| fresh data, retrain new universal approximator, ... loop
| previous ... , universe in a bottle.
| squigz wrote:
| You dropped a 0. Anna's Archive is currently 862.4 TB
| RachelF wrote:
| Yes, too much for one person, but collectively it is possible
| to keep it alive.
|
| If anyone wishes to help, you can generate a chunk in 1TB
| units and seed via BitTorrent here:
|
| https://annas-archive.gs/torrents
| shrubble wrote:
| If you only care about non-fiction and science journals it is
| more like 250TB I think? Still several thousands in 22TB
| drives with ZFS though.
| ndriscoll wrote:
| 22 TB drives are around $230 on ebay, so if you used 15 of
| them in raidz2, that'd be around $3500 (so maybe a little
| over $4k with the rest of the server), which is around the
| cost of a new mirrorless camera and a decent lens, so
| certainly within the realm of a hobbyist. You probably
| couldn't get away with downloading 250 TB in any reasonable
| timeframe with most US ISPs (or at least Comcast) though.
| That'd be over 2.5 months of 300 Mb/s non-stop. Even
| copying it from a friend using 2.5 Gbit/s Ethernet would
| take over a week.
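
The back-of-the-envelope numbers above can be rechecked with a short sketch. It assumes decimal units (1 TB = 10^12 bytes), a fully saturated link with no protocol overhead, and ignores ZFS metadata overhead; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_tb: float, link_mbit_per_s: float) -> float:
    """Days needed to move size_tb terabytes over a saturated link."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbit_per_s * 1e6)
    return seconds / SECONDS_PER_DAY


def raidz2_usable_tb(drives: int, drive_tb: float) -> float:
    """Raw usable capacity of one raidz2 vdev: two drives' worth goes to parity."""
    return (drives - 2) * drive_tb


def drive_cost(drives: int, price_each: float) -> float:
    """Total price of the drives alone."""
    return drives * price_each


if __name__ == "__main__":
    print(f"250 TB at 300 Mb/s:  {transfer_days(250, 300):.1f} days")
    print(f"250 TB at 2.5 Gb/s:  {transfer_days(250, 2500):.1f} days")
    print(f"15 x 22 TB raidz2:   {raidz2_usable_tb(15, 22):.0f} TB usable")
    print(f"15 drives at $230:   ${drive_cost(15, 230):.0f}")
```

Under these assumptions the 300 Mb/s download takes about 77 days and the 2.5 Gbit/s copy a little over 9 days, consistent with the estimates in the comment.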
| bityard wrote:
| That is true. However, it also has a staggering amount of
| duplicate data. I have _heard_ that if you search for most
| any particular book, you often get a dozen results of varying
| sizes and quality. Even for the same filetype. It's a hard
| problem to solve, but if we had something that could somehow
| pick the "best" copy of a particular title, for every title
| in the library, Anna could likely drop the zero herself.
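
One way to picture "pick the best copy per title" is the sketch below. Everything here is an assumption for illustration: the `Copy` record, the `FORMAT_RANK` preferences, and the scoring rule (preferred format first, then larger file) are hypothetical and not taken from Anna's Archive. Deciding what "best" really means is the hard part the comment points at.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical format preference; higher is better.
FORMAT_RANK = {"epub": 3, "pdf": 2, "djvu": 1}


@dataclass(frozen=True)
class Copy:
    title: str
    author: str
    filetype: str
    size_bytes: int


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return " ".join(cleaned.split())


def score(copy: Copy) -> Tuple[int, int]:
    """Naive quality heuristic: preferred format first, then larger file."""
    return (FORMAT_RANK.get(copy.filetype.lower(), 0), copy.size_bytes)


def pick_best(copies: Iterable[Copy]) -> List[Copy]:
    """Keep the highest-scoring copy for each normalized (title, author) pair."""
    best: Dict[Tuple[str, str], Copy] = {}
    for c in copies:
        key = (normalize(c.title), normalize(c.author))
        if key not in best or score(c) > score(best[key]):
            best[key] = c
    return list(best.values())
```

Exact-match grouping like this misses different editions, translations, and misspelled metadata, which is part of why the problem is hard in practice.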
| advael wrote:
| I feel obligated to say my usual "IP is at this point doing more
| harm than good" spiel here but don't have the time budget to
| argue it with people today
| data_maan wrote:
| It's good that you still said it!
| NuSkooler wrote:
| I've been trying to convince people of this for years. The
| problem, from my view, is that people have the idea of IP being
| some sort of American Dream so ingrained in their heads that
| they can't even reason about anything else.
| noman-land wrote:
| What's a suitable replacement?
| ElevenLathe wrote:
| Copyright should probably grant some benefit to an author,
| but just the bare minimum necessary to incentivize people
| to actually submit their work and file for copyright (also,
| we should resume requiring that you actually file for
| copyright, as was the case before 1978). This probably
| means some period of exclusive monetization rights. Anyone
| should be able to search for and read any filed works for
| free from the moment they are filed, or possibly after this
| exclusive monopoly period.
|
| Any additional benefits to copyright holders beyond what is
| needed to make sure we don't lose the works are essentially
| graft, no different in principle to the medieval church
| selling lucrative offices.
|
| If these things are valuable only because of scarcity, then
| we are incentivizing scarcity by granting monopoly, so we
| should do as little of that as we can manage. If they are
| inherently valuable, they should be as widely disseminated
| as possible (a cost that government can easily afford given
| modern technology). If they are worthless, there is no harm
| in the government keeping a copy anyway.
| wongarsu wrote:
| A copyright term of 30 years after first publication, or 30
| years after creation if no publication happens in this
| period. Ideally add some provision that ensures works are
| actually available after that period (similar to how many
| countries require all printed books to be submitted to the
| national library, but extend it to all media that achieves
| some benchmark of significance). Patent duration adjusted
| on a per-industry basis. Trademarks are fine as is
| noman-land wrote:
| What would we use instead?
| rvense wrote:
| Copyright should expire after like 15 years.
|
| Academic publishers should not exist, research - especially
| publicly funded research - should just go straight to the
| public domain.
| dylan604 wrote:
| Some research is funded through private donations to a
| specific school, or because the team shopped their project
| around to find private funding directly. This goes beyond
| public funding which grays up that public access to the
| research especially when compared to something like data
| produced by NASA
| autoexec wrote:
| 14 years was considered good enough back when it was
| prohibitively expensive to publish anything and worldwide
| distribution was basically impossible. Today, when
| publishing is essentially free and worldwide distribution
| happens at close to light speed you think we should expand
| copyright for another year?
|
| I lean more towards 7-10 years, with required registration
| involving a DRM free copy of the work submitted to the US
| copyright office (where possible) who will automatically
| host that file for free once the copyright term has
| expired. There should be an RSS feed from copyright.gov
| with download links to the latest works entering the public
| domain. That'd also make it dead simple to find who you
| need to contact if you want to negotiate rights to use a
| work still under copyright's protection.
|
| I agree that anything getting public funding should be
| public domain on day one (normal exceptions for national
| security etc)
| globular-toast wrote:
| Definitely recommend everyone interested to listen to rms's
| talk "Copyright vs Community". It changed the way I thought
| about it some 15 years ago. It's only got worse since, but it
| seems more people are coming to the same conclusion. Maybe we
| can do something about it.
|
| rms suggests dialling back copyright rather than completely
| abolishing it: 10 years from date of publication. Of course he
| doesn't believe in copyright for _software_ at all, but that's
| another matter.
|
| The funny thing is the way these greedy assholes in the
| copyright industry are behaving is just making it worse for
| them. It's driving people to places like z-library because
| essentially everything is in copyright. A child has just been
| born who won't ever see a work that was published 50 years ago
| go out of copyright. It's insane. With sensible copyright
| lengths we wouldn't need z-library.
| chongli wrote:
| IP-intensive industries contribute 41% of the US GDP and employ
| 44% of the US workforce [1]. If you abolished IP all of that
| would go away. How would you replace that? I'm not a fan of IP
| either but I think it's pretty hard to escape that reality. Big
| companies like NVIDIA (3T market cap) are almost 100% IP.
|
| IP is more-or-less central to the US's economic and security
| strategy. Without it, the country loses a huge amount of power
| and influence in the world.
|
| [1] https://www.uspto.gov/ip-policy/economic-
| research/intellectu...
| Super_Jambo wrote:
| Pay them from general taxation to dig holes and fill them in
| again with spoons?
|
| If you can only succeed because your government is holding
| back others at the barrel of a gun do you really deserve to?
| data_maan wrote:
| "Whether we're supporting advancements on the leading edge of
| science or helping children build a strong learning foundation,
| shared knowledge is the common thread". (source:
| https://www.oclc.org/en/about.html)
|
| Well, now it's shared on a torrent, but I guess for them that was
| "over shared" lol.
|
| Also, they are a library but spent 5 million on cyber defense...
| seriously??
| aftbit wrote:
| Yeah why would a library of all places be upset about Anna's
| Archive? Their whole mission is sharing and preserving
| knowledge.
| data_maan wrote:
| What is the deal with this Njalla hosting service? Is it really
| so hard to take sites from there down?
| dtx1 wrote:
| It's made by a former Pirate Bay founder. When you buy a domain
| from them, they act as a middleman and keep legal ownership of
| the domain thus shielding your PI somewhat from legal
| enforcement. They claim their hosting is "In secret locations
| in Sweden". Their LLC is located in Nevis (some tax evasion
| island).
|
| There's also https://1984.hosting/ which runs a similar
| operation out of Iceland.
| rustcleaner wrote:
| Sweet! Thanks for the links guys! :^)
| autoexec wrote:
| Considering I haven't managed to run into malware or phishing
| sites on either of those providers they seem to be doing
| something right. Why are these so-called "evil" hosting
| companies less of a problem for me than namecheap, godaddy,
| and google?
| catlikesshrimp wrote:
| Mixed feelings about the case.
|
| Sharing is the best thing that can happen to knowledge. It is
| great that gatekeepers lose money over this.
|
| However, the blame for the loss might burden OCLC, which might
| have been doing a positive job.
|
| https://www.oclc.org/en/about.html
| data_maan wrote:
| What is OCLC's added value? They didn't create the data.
| catlikesshrimp wrote:
| Useless in my country because there are no libraries. But I
| can look up libraries "Near my country" and beyond.
|
| I suppose some libraries will allow ebook loans through
| worldcat. They seem to be more about sharing within US law
| without directly charging people.
|
| That's why I said positive. Torrent sharing is better, but idk
| if that will be sustainable
| knowaveragejoe wrote:
| What country has no libraries?
| autoexec wrote:
| Good question. The internet is full of lies, but it
| suggests that Papua New Guinea ranks at the bottom since
| it only has one and that was a gift from Australia.
| bawolff wrote:
| Organizing data is valuable.
|
| Not as valuable as the actual data, but it's not nothing
| either.
| freehorse wrote:
| Suing Anna's Archive and similar projects, i.e. being the
| lapdogs of big publishers, it seems. Why else would they care
| if they had no shared interests with them?
|
| Maybe I am missing a way to use their database that makes
| sense, but for me worldcat is pinterest-level and other SEO-
| pollution on my search results when I need to find some real
| information. I do not care if this book is in some library
| 1000km away. If I need a book I will search in my local ones.
| Never understood what is the point in what worldcat does but
| maybe others use it in some way useful to them.
| wongarsu wrote:
| OCLC's slogan is literally "Because what is known must be
| shared". Them being a litigious gatekeeper is mighty
| hypocritical.
|
| I don't really get their dilemma. They claim that a publicly
| available copy of the data on Anna's Archive is a direct threat
| to their business. But the same data is freely available by
| going to their own worldcat.org. Any library that was satisfied
| with pure read access to the data was already not going to pay
| them money.
|
| They allege that scraping the 2.2TB of data cost them $5
| million over 2 years. That's $2 per megabyte. If the cost of
| providing this was the only issue surely within those two years
| someone would have gotten the idea to just put up an XML dump
| for download, or to shoot Anna's Archive an email with an offer
| to just send them the data as soon as it became clear that it
| was them.
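
The per-megabyte figure above can be rechecked with a quick sketch, assuming decimal units (1 TB = 10^6 MB) and the rounded $5 million figure used in the comment; `cost_per_mb` is an illustrative name.

```python
def cost_per_mb(total_usd: float, size_tb: float) -> float:
    """Claimed cost divided by data volume, in US dollars per megabyte."""
    megabytes = size_tb * 1_000_000  # decimal units: 1 TB = 10^6 MB
    return total_usd / megabytes


if __name__ == "__main__":
    print(f"${cost_per_mb(5_000_000, 2.2):.2f} per MB")
```

Under these assumptions the result is roughly $2.27 per megabyte, in line with the "$2 per megabyte" estimate.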
| lynguist wrote:
| What a ridiculous claim. They call computer hardware and salaries
| damage.
|
| I want a competent judge to make sure that these are not damages
| and I wish that Anna's Archive continues to operate in a sensible
| jurisdiction for the foreseeable future.
|
| If I had the means I would donate to them.
| teractiveodular wrote:
| https://annas-archive.gs/donate
| dylan604 wrote:
| Having the means normally means the extra finances to give
| away vs knowing the link to their donation form
| autoexec wrote:
| Still kind of nice to have that link thrown out here for
| those with the means to support their work.
| shkkmo wrote:
| It's an easily misunderstood idiom since the difference
| between the two meanings is just the definite vs indefinite
| article: 'the means' vs 'a means'.
| tehwebguy wrote:
| Article says nobody is responding as defendant except one
| individual who has filed a motion to dismiss based on being
| misidentified. I don't know the status of that motion though.
| ASalazarMX wrote:
| They're just piling on anything they can regardless of reason. A
| real damage count would be lost sales due to web scraping, but
| they don't sell anything.
|
| > For example, the organization spent $1,548,693 on upgrades for
| its hardware infrastructure, and an additional $608,069 for a
| two-year Cloudflare contract [..] Other costs include the
| salaries of 34 full-time employees, who were tasked with
| mitigating the harm caused by the attacks, as well as various
| other investigation, security, and hardware-related costs.
|
| > "OCLC has incurred damages of $5,333,064 as a direct result of
| Anna's Archive's cyberattacks, but that amount does not fully
| compensate OCLC for the harm from Anna's Archive's wrongful
| actions. OCLC continues to suffer from harms that cannot be
| remedied by monetary damages."
|
| Is web scraping now considered a cyberattack? Was it eating their
| bandwidth even if it was served through Cloudflare? LOL.
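
For a sense of how the quoted figures relate, the sketch below adds up the two itemized amounts from the quote and compares them with the claimed total. Only the three dollar figures come from the quoted text; the variable and function names are illustrative, and the quote does not break down the remainder.

```python
HARDWARE_UPGRADES = 1_548_693    # from the quoted article
CLOUDFLARE_CONTRACT = 608_069    # two-year contract, from the quoted article
CLAIMED_TOTAL = 5_333_064        # from the quoted filing


def itemized_total() -> int:
    """Sum of the two amounts the quote spells out."""
    return HARDWARE_UPGRADES + CLOUDFLARE_CONTRACT


def unitemized_remainder() -> int:
    """What is left for salaries and the other costs the quote mentions."""
    return CLAIMED_TOTAL - itemized_total()


if __name__ == "__main__":
    share = itemized_total() / CLAIMED_TOTAL
    print(f"Itemized:  ${itemized_total():,} ({share:.0%} of total)")
    print(f"Remainder: ${unitemized_remainder():,}")
```

The two itemized amounts cover about 40% of the claimed total, leaving roughly $3.2 million for salaries and the other costs.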
| gmuslera wrote:
| Ask Aaron Swartz
| rustcleaner wrote:
| I'm convinced that was not self-inflicted.
| neilv wrote:
| Some friends and family are likely to be on HN, and I'd
| assume probably don't want to see speculation like that.
| 2OEH8eoCRo0 wrote:
| The conspiracy/speculation on every damn topic around
| here is getting out of hand.
| bawolff wrote:
| > Is web scraping now considered a cyberattack? Was it eating
| their bandwidth even if it was served through Cloudflare? LOL.
|
| We've been tobogganing down the slippery slope of "cyber"
| damages for a long time now.
| odo1242 wrote:
| The original "damages" in a lawsuit are almost always made up.
| They're supposed to get adjusted downward during and after the
| lawsuit.
|
| This is also why having a default judgement delivered against
| you for failing to show up generally isn't great.
| squigz wrote:
| https://annas-archive.gs/torrents
|
| https://annas-archive.gs/donate
| pornel wrote:
| The mistake was calling it Anna's Archive, and not Anna's AI
| startup.
| greenie_beans wrote:
| what happens if somebody used the metadata from anna's archive?
___________________________________________________________________
(page generated 2024-07-08 23:00 UTC)