[HN Gopher] Lawsuit accuses Anna's Archive of hacking WorldCat
___________________________________________________________________
Lawsuit accuses Anna's Archive of hacking WorldCat
Author : nickthegreek
Score : 51 points
Date : 2024-02-07 19:18 UTC (3 hours ago)
(HTM) web link (torrentfreak.com)
| sillysaurusx wrote:
| Honest question: how is a lawsuit filed against someone whose
| identity isn't known? Is there a term for it?
|
| I'm curious about the specific mechanics of the lawsuit.
| Presumably the judge would make a ruling, and then that ruling
| would stick if and when the identity became known. But is that
| how it actually works?
|
| Also, is there a way for Anna's Archive to defend themselves
| without revealing their identity?
| artninja1988 wrote:
| >The complaint accuses Washington citizen Maria Dolores
| Anasztasia Matienzo and several "John Does" of operating the
| search engine and scraping WorldCat data.
|
| So I guess they're accusing Maria of running the site?
| sillysaurusx wrote:
| Oh.
|
| Well. Thank you for pointing that out.
| popcalc wrote:
| https://www.linkedin.com/in/anastasiamatienzo
|
| Library Aide Volunteer Mansfield Public Library
|
| Applications Developer at AT&T
|
| DOJ indictment is coming any day now... then 10-20 years in
| the slammer.
| wongarsu wrote:
| If you aren't identified you can't defend yourself, which I
| assume would be an issue. I'd expect that the ruling can only
| stick for the one named defendant, unless the names of the
| other accused become known as the lawsuit progresses.
| Thoreandan wrote:
| The term I've heard is "John Doe lawsuit"
|
| https://en.wikipedia.org/wiki/Fictitious_defendants
| https://en.wikipedia.org/wiki/Doe_subpoena
|
| There are several attorneys whose websites outline what's
| involved.
| devilbunny wrote:
| IANAL, but AIUI filing a John Doe lawsuit is acceptable because
| there are statutes of limitations on how long you have to file
| a suit. By filing a case with an unknown defendant, you protect
| your right to sue while you go about gathering evidence.
|
| You're still going to have to sue someone identifiable at the
| end of it, but they can't argue that your suit is too old
| (since the reason it's too old is that the defendant was hiding
| their identity, not that you weren't timely in noticing and
| responding to the infringement).
| artninja1988 wrote:
| So they found the operator because they made some opsec mistakes
| while scraping? Man that sucks. Now is the time to back up the
| Archive I guess. Hope some good samaritan comes along as
| dedicated to the cause as them... Godspeed Anna, you've been
| incredibly helpful in setting information free
| gpm wrote:
| They claim to have found the operator...
|
| The evidence they present for that is that she wrote a python
| library to interact with their websites (or in their words
| "developed a repository for a python module for interacting
| with OCLC's WorldCat® Affiliate web services"). Also that she
| worked for a competitor, describes herself as an archivist, and
| has publicly stated that libraries and archives should be open
| and publicly available.
|
| It's not exactly convincing evidence that she's involved with
| Anna's Archive IMHO.
| chefandy wrote:
| As a former library developer, I've written all sorts of
| little bits like that for all sorts of archives and databases
| and publicly supported open access. If that's really all
| they've got, it's pretty light. I imagine they've at least
| got some sort of access logs, surreptitiously viewed
| conversations about it, or something similar.
| hruzgar wrote:
| They probably found her identity a different way but only
| show this to the public (she most likely is 'Anna')
| toyg wrote:
| That's just your speculation. This is not the FBI or NSA,
| it's a private company doing some random homework. They
| could well have got it wrong.
| wongarsu wrote:
| A software developer who once volunteered at a library in
| 2012 wrote in that very year a Python library to interact
| with a library catalog [1]. What more evidence do you need
| that she was involved in a scraping operation in [checks
| notes] 2023?
|
| Her name (Maria Dolores Anasztasia Matienzo) even contains
| the letters Anna (almost), so she must be the mastermind
| behind Anna's Archive!
|
| Her library is the top search result if you search for
| WorldCat in github. Though I personally would have gone with
| bookops-worldcat, since they claim they made major changes to
| the library to account for 2020 API changes in WorldCat.
|
| Edit: just looked at the complaint again, I'm shocked that
| they seem to have completely missed the dual meaning behind
| the username anarchivist - they got "an archivist" but missed
| "anarchy-vist". I'm sure if they had caught that they would
| have added that to the pile of damning evidence.
|
| 1: https://github.com/anarchivist/worldcat
| gpm wrote:
| Welp, I glanced at her GitHub and, when I didn't see any
| recent related projects, assumed that it had been taken
| private. It didn't even occur to me to look at projects
| that hadn't been touched in a decade!
| hruzgar wrote:
| Yes. Hopefully!
| okasaki wrote:
| It says scraping. I thought scraping was legal?
| uberman wrote:
| "In addition to harvesting data from WorldCat.org, the
| defendants are also accused of obtaining and using credentials
| of a member library to access WorldCat Discovery Services. This
| opened the door to yet more detailed records that are not
| available on WorldCat.org."
| throwup238 wrote:
| AKA they used their library card number to log in via their
| public library. At least that's how it works with my local
| library.
| wongarsu wrote:
| So in essence another "is scraping a cyberattack" suit, along
| with asking for damages because their hard work is now available
| for free.
|
| The interesting thing here is imho that WorldCat is a
| bibliographic database, mostly listing the collections of
| OCLC members, along with other information about these works.
| Obviously this information takes work to collect and organize,
| and that's what the membership fees are for. But for financial
| harm to come to OCLC (beyond the costs of being scraped) it seems
| to me that libraries would have to decide that they don't want to
| pay membership fees and instead use the scraped, less up-to-date
| version of the catalog? How likely is that to actually happen at
| any scale?
| fsckboy wrote:
| Between "scraping is a cyberattack" and "damages" sits the fact
| that "Lists, Directories, and Databases" generally do not fall
| "Under Copyright Law".
|
| from https://www.justia.com/intellectual-property/copyright/lists...
|
| _A work must have at least a minimal amount of creativity to
| get copyright protection. This can pose a barrier to
| copyrighting compilations of facts, such as lists, directories,
| and databases. The information in them is not original, but
| they still may receive protection if the people who compiled
| them used some creativity in the process of selecting or
| arranging the information. This does not mean that the
| selection process must be unprecedented or bizarre, but it
| should not be so mechanical that it required no thought. For
| example, a telephone directory likely will not receive
| copyright protection because listing the names in alphabetical
| order does not show enough creativity. (The Supreme Court
| reviewed this situation in the main case discussing copyrights
| for lists and directories.)_
|
| ...
|
| _Many lists are similar to the telephone directory example
| considered by the Supreme Court. These include mailing lists,
| membership and subscriber lists, street addresses, and
| directories that list the contact information for certain
| groups of people, such as college alumni. In most cases,
| copyright will not be appropriate for these lists because they
| are arranged in an obvious manner. They are typically arranged
| alphabetically or numerically, and the people compiling the
| list do not choose which items to include. Moreover, a mailing
| list of people who made political contributions that was
| organized by zip code was found to lie outside the boundaries
| of copyright protection._
|
| ...
|
| _Factory and store inventories do not receive copyright
| protection. They are meant to be comprehensive, so no choice is
| involved in compiling their content. Also, they are generally
| arranged in alphabetical or numerical order. Businesses may be
| able to protect their inventories as trade secrets instead._
|
| This lawsuit is for tortious interference with prospective
| business relationships
|
| _"Plaintiff OCLC Online Computer Library Center, Inc ("OCLC")
| brought suit against Defendants Clarivate, Plc; Clarivate
| Analytics (US) LLC; ProQuest, LLC; and Ex Libris (USA), Inc.,
| alleging that Defendants are tortiously interfering and
| conspiring to tortiously interfere with its contractual
| relationships and tortiously interfering and conspiring to
| tortiously interfere with its prospective business
| relationships. This matter is before the Court on Plaintiff's
| motion for temporary restraining order and preliminary
| injunction, Doc. 4."_ from
| https://librarytechnology.org/docs/27467.pdf
| victorbjorklund wrote:
| Worth noting is that in the EU, databases can have special IP
| protections of their own
| wongarsu wrote:
| A part of their damage claim is based on the costs incurred
| by this so-called cyberattack. "These hacking attacks
| [scraping mostly JSON] materially affected OCLC's production
| systems and servers, requiring around-the-clock efforts from
| November 2022 to March 2023 to attempt to limit service
| outages and maintain the production systems' performance for
| customers. To respond to these ongoing attacks, OCLC spent
| over 1.4 million dollars on its systems' infrastructure and
| devoted nearly 10,000 employee hours to the same." Also
| "During this time, customers threatened and likely did cancel
| their products and services with OCLC due to these
| disruptions."
|
| Those are probably the most valid angles to push here. Mostly
| the invested time and money since they are actually
| quantifiable, as opposed to the people who maybe probably
| definitely canceled their service because of the disruption.
| lupusreal wrote:
| Based Anna, absolute queen.
| mindslight wrote:
| I've fallen behind and haven't really been keeping up with the
| newest digital institutions, but this article has given me some
| great pointers for digital libraries to investigate when I've got
| the time or need. Thank you, OCLC!
___________________________________________________________________
(page generated 2024-02-07 23:01 UTC)