hngopher.com

       [HN Gopher] Lawsuit accuses Anna's Archive of hacking WorldCat
       ___________________________________________________________________
        
       Lawsuit accuses Anna's Archive of hacking WorldCat
        
       Author : nickthegreek
       Score  : 51 points
       Date   : 2024-02-07 19:18 UTC (3 hours ago)
        
 (HTM) web link (torrentfreak.com)
        
       | sillysaurusx wrote:
       | Honest question: how is a lawsuit filed against someone whose
       | identity isn't known? Is there a term for it?
       | 
       | I'm curious about the specific mechanics of the lawsuit.
       | Presumably the judge would make a ruling, and then that ruling
       | would stick if and when the identity became known. But is that
       | how it actually works?
       | 
       | Also, is there a way for Anna's Archive to defend themselves
       | without revealing their identity?
        
         | artninja1988 wrote:
         | >The complaint accuses Washington citizen Maria Dolores
         | Anasztasia Matienzo and several "John Does" of operating the
         | search engine and scraping WorldCat data. So I guess they're
         | accusing Maria of running the site?
        
           | sillysaurusx wrote:
           | Oh.
           | 
           | Well. Thank you for pointing that out.
        
           | popcalc wrote:
           | https://www.linkedin.com/in/anastasiamatienzo
           | 
           | Library Aide Volunteer Mansfield Public Library
           | 
           | Applications Developer at AT&T
           | 
           | DOJ indictment is coming any day now... then 10-20 years in
           | the slammer.
        
         | wongarsu wrote:
         | If you aren't identified you can't defend yourself, which I
         | assume would be an issue. I'd expect that the ruling can only
         | stick for the one named defendant, unless the names of the
         | other accused become known as the lawsuit progresses.
        
         | Thoreandan wrote:
         | The term I've heard is "John Doe lawsuit"
         | 
         | https://en.wikipedia.org/wiki/Fictitious_defendants
         | https://en.wikipedia.org/wiki/Doe_subpoena
         | 
         | There are several attorneys whose websites outline what's
         | involved.
        
         | devilbunny wrote:
         | IANAL, but AIUI filing a John Doe lawsuit is acceptable because
         | there are statutes of limitations on how long you have to file
         | a suit. By filing a case with an unknown defendant, you protect
         | your right to sue while you go about gathering evidence.
         | 
         | You're still going to have to sue someone identifiable at the
         | end of it, but they can't argue that your suit is too old
         | (since the reason it's too old is that the defendant was hiding
         | their identity, not that you weren't timely in noticing and
         | responding to the infringement).
        
       | artninja1988 wrote:
       | So they found the operator because they made some opsec mistakes
       | while scraping? Man that sucks. Now is the time to back up the
       | Archive I guess. Hope some good samaritan comes along as
       | dedicated to the cause as them... Godspeed Anna, you've been
       | incredibly helpful in setting information free
        
         | gpm wrote:
         | They claim to have found the operator...
         | 
         | The evidence they present for that is that she wrote a python
         | library to interact with their websites (or in their words
         | "developed a repository for a python module for interacting
         | with OCLC's WorldCat® Affiliate web services"). Also that she
         | worked for a competitor, describes herself as an archivist, and
         | has publicly stated that libraries and archives should be open
         | and publicly available.
         | 
         | It's not exactly convincing evidence that she's involved with
         | Anna's Archive IMHO.
        
           | chefandy wrote:
           | As a former library developer, I've written all sorts of
           | little bits like that for all sorts of archives and databases
           | and publicly supported open access. If that's really all
           | they've got, it's pretty light. I imagine they've at least
           | got some sort of access logs, surreptitiously viewed
           | conversations about it, or something similar.
        
           | hruzgar wrote:
           | They probably found her identity a different way but only
           | show this to the public (she most likely is 'Anna')
        
             | toyg wrote:
             | That's just your speculation. This is not the FBI or NSA,
             | it's a private company doing some random homework. They
             | could well have got it wrong.
        
           | wongarsu wrote:
           | A software developer who once volunteered at a library in
           | 2012 wrote in that very year a Python library to interact
           | with a library catalog [1]. What more evidence do you need
           | that she was involved in a scraping operation in [checks
           | notes] 2023?
           | 
           | Her name (Maria Dolores Anasztasia Matienzo) even contains
           | the letters Anna (almost), so she must be the mastermind
           | behind Anna's Archive!
           | 
           | Her library is the top search result if you search for
           | WorldCat on GitHub. Though I personally would have gone with
           | bookops-worldcat, since they claim they made major changes to
           | the library to account for 2020 API changes in WorldCat.
           | 
           | Edit: just looked at the complaint again, I'm shocked that
           | they seem to have completely missed the dual meaning behind
           | the username anarchivist - they got "an archivist" but missed
           | "anarchy-vist". I'm sure if they had caught that they would
           | have added that to the pile of damning evidence.
           | 
           | 1: https://github.com/anarchivist/worldcat
        
             | gpm wrote:
             | Welp, I glanced at her GitHub and, when I didn't see any
             | recent related projects, assumed that it had been taken
             | private, it didn't even occur to me to look at projects
             | that hadn't been touched in a decade!
        
         | hruzgar wrote:
         | Yes. Hopefully!
        
       | okasaki wrote:
       | It says scraping. I thought scraping was legal?
        
         | uberman wrote:
         | "In addition to harvesting data from WorldCat.org, the
         | defendants are also accused of obtaining and using credentials
         | of a member library to access WorldCat Discovery Services. This
         | opened the door to yet more detailed records that are not
         | available on WorldCat.org."
        
           | throwup238 wrote:
           | AKA they used their library card number to log in via their
           | public library. At least that's how it works with my local
           | library.
        
       | wongarsu wrote:
       | So in essence another "is scraping a cyberattack" suit, along
       | with asking for damages because their hard work is now available
       | for free.
       | 
       | The interesting thing here, imho, is that WorldCat
       | is a bibliographic database, mostly listing the collections of
       | OCLC members, along with other information about these works.
       | Obviously this information takes work to collect and organize,
       | and that's what the membership fees are for. But for financial
       | harm to come to OCLC (beyond the costs of being scraped) it seems
       | to me that libraries would have to decide that they don't want to
       | pay membership fees and instead use the scraped, less up-to-date
       | version of the catalog? How likely is that to actually happen at
       | any scale?
        
         | fsckboy wrote:
         | between "scraping is cyberattack" and "damages" is "Lists,
         | Directories, and Databases" generally do not fall "Under
         | Copyright Law"
         | 
         | from https://www.justia.com/intellectual-property/copyright/lists...
         | 
         |  _A work must have at least a minimal amount of creativity to
         | get copyright protection. This can pose a barrier to
         | copyrighting compilations of facts, such as lists, directories,
         | and databases. The information in them is not original, but
         | they still may receive protection if the people who compiled
         | them used some creativity in the process of selecting or
         | arranging the information. This does not mean that the
         | selection process must be unprecedented or bizarre, but it
         | should not be so mechanical that it required no thought. For
         | example, a telephone directory likely will not receive
         | copyright protection because listing the names in alphabetical
         | order does not show enough creativity. (The Supreme Court
         | reviewed this situation in the main case discussing copyrights
         | for lists and directories.)_
         | 
         | ...
         | 
         |  _Many lists are similar to the telephone directory example
         | considered by the Supreme Court. These include mailing lists,
         | membership and subscriber lists, street addresses, and
         | directories that list the contact information for certain
         | groups of people, such as college alumni. In most cases,
         | copyright will not be appropriate for these lists because they
         | are arranged in an obvious manner. They are typically arranged
         | alphabetically or numerically, and the people compiling the
         | list do not choose which items to include. Moreover, a mailing
         | list of people who made political contributions that was
         | organized by zip code was found to lie outside the boundaries
         | of copyright protection._
         | 
         | ...
         | 
         |  _Factory and store inventories do not receive copyright
         | protection. They are meant to be comprehensive, so no choice is
         | involved in compiling their content. Also, they are generally
         | arranged in alphabetical or numerical order. Businesses may be
         | able to protect their inventories as trade secrets instead._
         | 
         | This lawsuit is for tortious interference with prospective
         | business relationships
         | 
         |  _"Plaintiff OCLC Online Computer Library Center, Inc ("OCLC")
         | brought suit against Defendants Clarivate, Plc; Clarivate
         | Analytics (US) LLC; ProQuest, LLC; and Ex Libris (USA), Inc.
         | Alleging that Defendants are tortiously interfering and
         | conspiring to tortiously interfere with its contractual
         | relationships and tortiously interfering and conspiring to
         | tortiously interfere with its prospective business
         | relationships. This matter is before the Court on Plaintiff's
         | motion for temporary restraining order and preliminary
         | injunction, Doc. 4."_ from
         | https://librarytechnology.org/docs/27467.pdf
        
           | victorbjorklund wrote:
           | Worth noting that in the EU, databases can have special IP
           | protections in themselves
        
           | wongarsu wrote:
           | A part of their damage claim is based on the costs incurred
           | by this so-called cyberattack. "These hacking attacks
           | [scraping mostly JSON] materially affected OCLC's production
           | systems and servers, requiring around-the-clock efforts from
           | November 2022 to March 2023 to attempt to limit service
           | outages and maintain the production systems' performance for
           | customers. To respond to these ongoing attacks, OCLC spent
           | over 1.4 million dollars on its systems' infrastructure and
           | devoted nearly 10,000 employee hours to the same." Also
           | "During this time, customers threatened and likely did cancel
           | their products and services with OCLC due to these
           | disruptions."
           | 
           | Those are probably the most valid angles to push here. Mostly
           | the invested time and money since they are actually
           | quantifiable, as opposed to the people who maybe probably
           | definitely canceled their service because of the disruption.
        
       | lupusreal wrote:
       | Based Anna, absolute queen.
        
       | mindslight wrote:
       | I've fallen behind and haven't really been keeping up with the
       | newest digital institutions, but this article has given me some
       | great pointers for digital libraries to investigate when I've got
       | the time or need. Thank you, OCLC!
        
       ___________________________________________________________________
       (page generated 2024-02-07 23:01 UTC)