hngopher.com

       [HN Gopher] Taking action against scraping for hire
       ___________________________________________________________________
        
       Taking action against scraping for hire
        
       Author : pawelkobojek
       Score  : 202 points
       Date   : 2022-07-07 12:33 UTC (10 hours ago)
        
 (HTM) web link (about.fb.com)
 (TXT) w3m dump (about.fb.com)
        
       | samsoftstuff wrote:
       | It's like they don't know that courts made it legal:
       | https://techcrunch.com/2022/04/18/web-scraping-legal-court/
        
       | xvector wrote:
       | HN is hypocritical - most commenters here are against this
       | because "Meta bad," but at the same time, most commenters
       | wouldn't want their posts shared privately amongst friends to be
       | scraped and made available publicly.
        
         | mpeg wrote:
         | For that to happen, one of your friends would have had to
         | willingly allow this tool to scrape their social network, which
         | would include your private posts.
         | 
         | Is the scraper to blame here, or the friend?
        
         | Komodai wrote:
         | lol maybe if you don't want that happening you shouldn't be
         | using Facebook
        
         | pawelkobojek wrote:
         | There are two cases they brought up, one being web scraping and
         | the other is making a clone website publicly displaying content
         | from Instagram.
         | 
         | I think Meta might be mixing up these two cases here on purpose
         | to make it look like web scraping is as bad as stealing photos
         | to publish it on a clone website.
        
         | postalrat wrote:
         | Who is scraping their private messages? Themselves or their
         | friends?
        
         | oefrha wrote:
         | > most commenters wouldn't want their posts shared privately
         | amongst friends to be scraped and made available publicly.
         | 
         | Where's the "posts shared privately amongst friends made
         | public" part? There are two cases here:
         | 
         | 1. A service that logs in as the customer (who voluntarily
         | provide their credentials) and scrapes information visible to
         | said customer on their behalf. Nothing about "made available
         | publicly" is alleged.
         | 
         | 2. An individual using a pool of bot accounts to scrape posts
         | visible to any logged in user. Nothing about "shared privately"
         | is alleged. To be clear I don't like the method, but I'll also
         | have to admit I've used one of the Instagram "clone sites" in
         | the past thanks to their login wall.
         | 
         | Unless I missed something, it sounds like you just made it up.
        
         | ogurechny wrote:
         | Like many other people, you are calling something "private" when
         | it is not.
         | 
         | "Privately shared with friends" used to mean that only you and
         | your friends know something. You don't "share" anything with
         | "friends" on a social network. You give the information to a
         | giant corporation. If it finds it suitable, it then delivers it
         | to other users, but only after it records your location,
         | analyzes the content to check if you were, say, affected by
         | some melodramatic event (and therefore should be tricked into
         | spending more time... I mean, get "personal recommendations"
         | for a certain kind of content), and does a billion other
         | things.
         | 
         | If you consider that this is fine, please relay all your
         | conversations with family and friends through me from now on. I
         | offer secure, reliable, fast, yada yada communication service.
         | And it's hip! Ask anyone on the street what they use.
        
       | almog wrote:
       | Ironically, around a year ago I disclosed (using their White Hat
       | bug bounty program) that I'm able to access recruitment data
       | (candidates' details mostly) using a very cheap form of scraping
       | against a 3rd party service provider. They dismissed it and
       | instructed me to report it to the 3rd party that operates that
       | service (which I did beforehand, but the issue had not been
       | fixed).
       | 
       | Sorry for being vague here, I haven't publicly disclosed it yet,
       | but will probably have to if it doesn't get fixed.
        
       | Hedepig wrote:
       | Is this much different from LinkedIn vs hiQ?
        
         | nojito wrote:
         | Logged in vs not logged in data.
        
           | logifail wrote:
           | > Logged in
           | 
           | Is this actually private data, or is it public stuff that's
           | become annoyingly hard to view anonymously because Meta chose
           | to stick it behind a login box?
        
             | cupofpython wrote:
             | >public stuff that's become annoyingly hard to view
             | anonymously because Meta chose to stick it behind a login
             | box
             | 
             | this one
        
             | nojito wrote:
             | Anything behind a login gate is private data for that
             | registered user only.
        
               | logifail wrote:
               | > Anything behind a login gate is private data for that
               | registered user only
               | 
               | That's quite the claim, if only the login gate were
               | either always there or indeed always not.
               | 
               | Presumably such "private" data ought not to be being
               | indexed by search engines and returned to users who
               | search?
               | 
               | "site:instagram.com" is of the order of 228 million pages
               | on google.com, and "site:facebook.com" is another 422
               | million.
        
               | nojito wrote:
               | pretty sure you get hit with a login gate if you navigate
               | to the results via site:instagram.com no?
        
               | logifail wrote:
               | > you get hit with a login gate if you navigate to the
               | results via site:instagram.com
               | 
               | Nope, I just tried it (private browser session, no IG
               | activity from my IP recently)
               | 
               | google.com -> "site:instagram.com nojito" -> results ->
               | www.instagram.com/explore/tags/nojito/ with a page of
               | photos.
               | 
               | Quickly scrolling down the page for several dozen photos
               | does eventually trigger the login box, though.
        
               | upupandup wrote:
               | but you make it public for everybody with the publicly
               | accessible login so it wouldn't be considered private
               | data for the same reason news outlets can use your
               | instagram images and share it widely without your
               | permission.
               | 
               | you can't throw up a login screen but then allow people
               | to post themselves that ends up in public domain because
               | the login does not distinguish from public or
               | permissioned user authorized to view your selfie pics.
        
               | Nextgrid wrote:
               | Depends if another user can also access it, or whether
               | the original author/owner of the data in question intends
               | for it to be public. In Facebook's case, there are
               | permission levels you can set on posts, including a
               | "public" option (which isn't actually public though and
               | will require a login anyway, but it can be _any_ login)
               | which would settle that debate quickly - hell I wouldn't
               | be surprised if that option were to be hidden as to not
               | acknowledge that a particular bit of data was explicitly
               | posted for everyone to see.
        
               | logifail wrote:
               | > In Facebook's case, there are permission levels you can
               | set on posts, including a "public" option (which isn't
               | actually public though and will require a login anyway,
               | but it can be any login)
               | 
               | Q: Have you tried this?
               | 
               | In a private browser session I started at google.com,
               | searched for "site:facebook.com nextgrid", picked some
               | random post, clicked through, and was reading the post
               | without anything other than seeing FB's cookie banner. No
               | sign of any login (which is good 'cause I don't have one)
        
               | Nextgrid wrote:
               | I suspect it depends on your region, page/post in
               | question and browser fingerprint. A post marked as public
               | isn't 100% guaranteed to be publicly viewable. Sometimes
               | you can view it but merely scrolling down on the page
               | would trigger a login form for example (I've had this
               | happen for pages that are definitely meant to be public
               | such as businesses who'd have an interest in getting as
               | many eyeballs as possible on their content).
               | 
               | I might be wrong and maybe the behavior is actually fully
               | deterministic and isn't nefarious, but knowing the
               | company behind it I'll assume malice until proven
               | otherwise.
        
       | Nextgrid wrote:
       | So much bad faith in this press release but not surprising from
       | such a disgusting company, with of course some China-related
       | fear-mongering despite no evidence of wrongdoing.
       | 
       | > After paying for access to the scraping software, customers
       | self-compromised their Facebook and Instagram accounts by
       | providing their authentication information to Octopus.
       | 
       | They didn't "self-compromise" their account. They trust Octopus
       | to act on their behalf, and unlike Facebook, Octopus' interests
       | are most likely more aligned with their users' since their
       | service is paid. This is no different from handing your Facebook
       | credentials to your social media manager or secretary. There's no
       | evidence that Octopus misused this access in any way.
       | 
       | > Octopus designed the software to scrape data accessible to the
       | user when logged into their accounts, including data about their
       | Facebook Friends such as email address, phone number, gender and
       | date of birth, as well as Instagram followers and engagement
       | information such as name, user profile URL, location and number
       | of likes and comments per post.
       | 
       | This is either information people intend to be public or
       | information they trust their friends to keep private. Now if
       | Octopus was leaking the private information to third-parties it
       | would be one thing, but so far I see no evidence Octopus was
       | disclosing the scraped information to anyone but their customer
       | (who is already authorized to access it).
       | 
       | > Meta is an industry leader in taking legal action to protect
       | people from scraping and exposing these types of services
       | 
       | Translation: Meta is an industry leader in protecting its
       | disgusting business model that hinges on making public data
       | behind a walled garden with an unacceptable "privacy" policy.
       | There wouldn't be a market for Octopus (or other scrapers) if
       | Facebook already allowed customers to efficiently access
       | information they're already entitled to, but that would be
       | against their interests as their entire business hinges on
       | information being held hostage.
       | 
       | They've created a problem, are selling the cure (well in this
       | case monetizing it via ads) and are now pissed off that someone
       | else is selling the cure for cheaper.
        
       | nicholasjarnold wrote:
       | Funny story from the early days of TheFaceBook, probably around
       | 2005ish:
       | 
       | I was a webmaster of a set of servers on a major university's
       | network. I also had access (enough to run arbitrary programs that
       | had pretty much full ingress/egress to the public internet) to a
       | number of machines across the campus's network. Through some of
       | my coursework and ACM chapter activities I met some other
       | similarly minded technical people with similar levels of access.
       | 
       | We decide that it would be fun to use our superpowers (access +
       | programming abilities + curiosity) to sign up for various
       | accounts on FB and essentially scrape and friend as much as
       | possible. At the time they had some rate limiting, some IP
       | banning (which wasn't terrible because the Uni gave public IPv4
       | addrs to all machines on campus by default) and then added some
       | early CAPTCHA which we ended up breaking pretty trivially with
       | some python and image recognition code.
       | 
       | Never got sued... :) Never really did much with the scripts or
       | data except test that they worked. Fun times.
        
       | Komodai wrote:
       | Is it Octopus Data Inc. aka Octoparse they are suing?
        
       | typon wrote:
       | Google has turned Google Search into a walled garden by scraping
       | people's content and serving it up on their own platter. Is
       | anyone going to stand up to them?
        
       | allenleein wrote:
       | Ironically, Octopus reminds me of "Octopus VR" in the Silicon
       | Valley show.
       | 
       | https://www.youtube.com/watch?v=ltFB4WBdDg4
        
         | mothsonasloth wrote:
         | "It's a water animal"
        
       | NelsonMinar wrote:
       | Octopus sounds really useful; is there an open source equivalent?
       | I'd love to be able to scrape my own data on Facebook. Their data
       | export feature is fairly good but far from complete.
        
       | jascii wrote:
       | So, Facebook doesn't want to share the data it wants us to share
       | with them? Figures...
        
       | [deleted]
        
       | jacooper wrote:
       | They are still using the fb.com domain? I thought Meta is not
       | Facebook?....
        
         | Silica6149 wrote:
         | I think it's like Google vs Alphabet. Alphabet is the parent
         | company like Meta.
         | 
         | As for why their domain is facebook for their news site, not
         | sure why. It would make more sense for it to be under meta
         | instead.
        
       | carride wrote:
       | In the early days of FB, they convinced people that pages (or
       | some content, sorry I do not know the FB terms) could be public
       | for anyone to view without needing to login to FB. This was very
       | helpful for small businesses and communities. In many countries
       | this is still the quickest place to make a public page. Though
       | now, every small business or community page I want to visit is
       | locked out unless I login FB. Even if I do login it is impossible
       | to copy paste the important details of a page or post, plus the
       | UI is as ugly as it has always been.
        
         | carride wrote:
         | I am currently in the USA and when I visit a public FB page
         | e.g. [1], there is a small login header, and a very big
         | annoying footer login. I estimate 15% of the content is
         | blocked. I had spent the past year outside USA until one month
         | ago. When I visited the same sites while traveling outside the
         | USA, the annoying login footer moves to the middle of the page
         | blocking almost all content. I do not have proof at the moment,
         | but that was my experience trying to read 95% of government,
         | business, and community pages who are almost all on FB.
         | [1] https://www.facebook.com/ParquesNacionalesdeArgentina
        
       | Litost wrote:
       | Anyone else heard of Tim Berners-Lee's idea of hosting your data
       | in pods outside the relevant corps wanting access to it and you
       | controlling what's shared and how? This is such a completely
       | different way of doing it, I'm not sure of all the implications,
       | be that from admin (how much effort) to security (would this be a
       | massive hacking opportunity) etc.
       | https://www.theregister.com/2022/01/20/tim_bernerslee/
        
       | dmje wrote:
       | Or Facebook could just open up their data. Oh wait, not _their
       | data_, silly me. Everyone else's data. Keep on scraping, I say.
        
       | rmbyrro wrote:
       | The fact they're wasting time on that is a sign that Facebook
       | decay phase has already started.
        
       | rustdeveloper wrote:
       | "This industry makes scraping available to individuals and
       | companies that otherwise would not have the capabilities." -
       | seems like web scraping companies are doing a good job :)
        
         | theincredulousk wrote:
         | Maybe some irony here as IIRC Facebook started as essentially a
         | scraping company, pulling student profiles from college
         | websites and re-publishing it for their own profit.
         | 
         | The scrapers have become the scrapees. The horror.
        
         | jhoelzel wrote:
         | The phone charger makes energy available to individuals and
         | companies that otherwise would not have the capabilities. ;)
        
       | dangerlibrary wrote:
       | Fingers crossed they eventually get around to suing Clearview AI
       | out of existence.
       | 
       | https://www.nytimes.com/2020/01/18/technology/clearview-priv...
        
       | i_have_an_idea wrote:
       | > After paying for access to the scraping software, customers
       | self-compromised their Facebook and Instagram accounts by
       | providing their authentication information to Octopus
       | 
       | "self-compromised" lol
       | 
       | clearly these people just wanted an automated way to access their
       | own data
        
         | antonf wrote:
         | > clearly these people just wanted an automated way to access
         | their own data
         | 
         | GDPR and CCPA (and probably many other national/state privacy
         | laws) forces facebook/instagram/etc to let you download and/or
         | delete your data without using third party websites. Usually
         | people self-compromise their accounts in exchange for money:
         | https://www.buzzfeednews.com/article/craigsilverman/facebook...
        
       | throwaway5959 wrote:
       | Wasn't Meta stealing news articles and not paying news
       | organizations for them?
        
       | htrp wrote:
       | This is different from LinkedIn v HiQ because HiQ was only
       | scraping publicly available data that was generally accessible to
       | the broader internet. In these two cases, the data is being
       | scraped from FB/Insta using credentials that the client handed
       | over or the mass creation of accounts solely for scraping
       | purposes.
        
         | postalrat wrote:
         | What would be your position if the data being scraped is data
         | the site selectively provides to Google for indexing but
         | doesn't provide publicly?
        
         | squaresmile wrote:
         | Yeah, I think this is more like the Cambridge Analytica
         | situation.
        
           | Nextgrid wrote:
           | I wish the Cambridge Analytica FUD would stop. CA's "attack"
           | was to set up a malicious website that convinced idiots to
           | give it access to their Facebook account using the standard
           | OAuth2 flow.
           | 
           | Did they misuse the collected data? Sure. But people granted
           | access to that data knowingly. This wasn't really an attack
           | in my view.
           | 
           | Facebook wasn't really complicit and definitely didn't
           | sell/give away any data.
        
           | benwad wrote:
           | Did FB ever take any legal action against Cambridge
           | Analytica? I can't remember anything about it and this sounds
           | very similar to that (although back in those days FB's tools
           | made this incredibly easy).
        
             | lesuorac wrote:
             | No. FBs ToS at the time [1] allowed CA to do what they did.
             | 
             | Namely, CA didn't resell the data or give it to an ad
             | agency.
             | 
             | [1]: https://web.archive.org/web/20180329131546/https://dev
             | eloper...
        
         | Nextgrid wrote:
         | > the mass creation of accounts solely for scraping purposes.
         | 
         | Those accounts wouldn't be allowed to view private data though
         | unless they friend/follow the person first, so they'll only
         | still be limited to data the account holders intend to be
         | public and available to anyone.
         | 
         | There's also no evidence that the scraped data was aggregated
         | at scale or commingled in any way, so even if customers
         | provided their actual credentials which grant them access to
         | private data of their friends, the scraper didn't share it with
         | anyone else but them.
        
       | upupandup wrote:
       | whoa wasn't there somebody on HN that ran a web scraping shop
       | that were boasting they can scrape instagram a while back? are
       | these the same guys???
       | 
       | I don't know how far Facebook can get with this, thought
       | Linkedin's court ruling made scraping legal de-facto
        
       | neya wrote:
       | _Evil Big Co._ that literally STEALS people's personal
       | information everywhere they go even after they've indicated they
       | want to be left alone is now offended when someone does the same
       | to them?
       | 
       | Well, color me surprised /s
       | 
       | Fuck Facebook. Meta. Or whatever you want to call it.
        
       | PhilipA wrote:
       | >Octopus, a US subsidiary of a Chinese national high-tech
       | enterprise, built a cloud-based platform designed to provide
       | paying customers access to on-demand scraping software and
       | services.
       | 
       | It is interesting as how they try to position this as a Chinese
       | attack on them.
        
         | upupandup wrote:
         | It must coincide with Christopher Wray's sudden claim that
         | there is an active dragnet of sorts that is trying to subvert
         | America from within much like the recent election interference
         | of a former Tiananmen square activist who tried to run for
         | congress I think.
         | 
         | It makes me think that there are many people on CCP's dole,
         | rich powerful famous people are somehow beholden to the CCP in
         | some unknown way but we can all guess correctly that they are
         | _all old white men_ who have previously been seen with young
         | females.
        
         | MangoCoffee wrote:
         | it looks like Zack is giving up on the Chinese market.
        
           | romanovcode wrote:
           | I guess after Winnie the Pooh rejected to name his children
           | for him he got sour grapes for China.
        
       | ok123456 wrote:
       | Remember back when facebook grew their little network by scraping
       | your gmail contacts.
       | 
       | Google blocked them.
       | 
       | There was animus between the two companies that resulted in
       | Facebook not making an official android app until 2010.
        
       | romanovcode wrote:
       | > Meta is an industry leader in taking legal action to protect
       | people from scraping and exposing these types of services, which
       | provide scraping as a service across multiple websites.
       | 
       | Sure, as long as Meta is not the one selling the data to
       | Cambridge Analytica it's wrong.
        
       | uhtred wrote:
       | Fuck off Facebook you scumbags
        
       | iandanforth wrote:
       | Collecting the rhetorical BS:
       | 
       | "scraping attacks"
       | 
       | Scraping is not an attack. Monopolists want to pretend they own
       | your data because they get unlimited access to monetize it
       | whereas competitors should have none.
       | 
       | "self-compromised"
       | 
       | Monopolists want to sell _you_ thus it's imperative they
       | maintain the fiction of "one person, one account". By admitting
       | you own your account, they'd have to allow sharing and they
       | wouldn't be able to provide their customers (advertisers) with
       | reliable data about individuals.
       | 
       | "protect people from scraping"
       | 
       | Monopolists will protect themselves and call it protecting you.
       | They will attempt to make you afraid of some _other_ actor using
       | your data in harmful ways so as to detract from how they monetize
       | you and use your data in harmful ways.
       | 
       | "deter the abuse"
       | 
       | Monopolists don't want to argue about what constitutes abuse.
       | Anything they write in their TOS is entirely for their benefit
       | and only constrained by local law (if that). They will abuse you
       | to the fullest extent they can get away with while arguing that
       | any action to use your rights is "abuse."
       | 
       | "safeguard people against clone sites"
       | 
       | Monopolists want to maintain their monopoly, there is no greater
       | threat than a direct challenge to that monopoly by allowing data
       | to move freely.
       | 
       | --
       | 
       | More subtle but even more ironic rhetorical points
       | 
       | "for hire" / "paying for access"
       | 
       | Emphasizing that people making _money_ (gasp) for providing this
       | service, is _bad_.
       | 
       | "industry leader in taking legal action" + "across many platforms
       | and national boundaries, also requires a collective effort from
       | platforms, policymakers and civil society"
       | 
       | Monopolists can pay high priced marketers to rebrand them as
       | patriotic hero figures fighting valiantly for the little guy.
        
         | rmbyrro wrote:
         | Missed this one:
         | 
         | > _a US subsidiary of a_ "Chinese national" "high-tech"
         | _enterprise_
         | 
         | Replacing it with "a business" would do just fine.
        
           | lupire wrote:
        
         | mylons wrote:
         | they also toss in the chinese affiliation in hopes to bring
         | even more ill will from the reader towards the company. china
         | is probably doing some bad things, but scraping facebook ain't
         | one of them.
        
           | iandanforth wrote:
           | Good point, I missed that one.
        
           | kube-system wrote:
           | Scraping social media is something that China is very
           | notorious for doing. They are 100% positively scraping all
           | major social networks around the world.
           | 
           | They do this to collect information of foreign policy
           | interest to them, to silence political dissidents abroad,
           | etc.
           | 
           | For example: https://www.washingtonpost.com/national-
           | security/china-harve...
           | 
           | And: https://www.propublica.org/article/even-on-us-campuses-
           | china...
        
         | pr0zac wrote:
         | While I agree with your assessment of the BS in the article wrt
         | scraping, and also agree with your assessment that the
         | behaviour is completely about FB protecting itself and its
         | monopoly control (the word control being important), I think
         | it's important to emphasize it's not about FB caring whether
         | other entities have access to the data, it's about FB caring
         | about its public perception with regard to its having that
         | data at all.
         | 
         | Over the last few years or so it feels like, to reference a
         | @dril tweet[1], Facebook has just been 'turning a big dial taht
         | says "data access" on it and constantly looking back at the
         | audience for approval like a contestant on the price is right'
         | with how much it allows 3rd parties to get at its data.
         | 
         | Keep in mind ~5 years ago the big thing at FB was "Open Graph"
         | and "Graph Search" which gave everyone really in-depth access
         | to their data with the idea that Facebook would be the "data
         | platform" on top of which all of these 3rd parties would build
         | apps and interfaces. This of course eventually resulted in the
         | whole Cambridge Analytica thing and now this gigantic swing in
         | the other direction of being overly protective of the data as a
         | kneejerk PR reaction to all the bad press.
         | 
         | FB loved sharing data and provided a direct API for accessing
         | it when the public narrative was about data freedom and 3rd
         | party developer friendliness and it hates giving any access at
         | all and goes around suing web scrapers now that the public
         | narrative is all about privacy.
         | 
         | Facebook will happily align itself in whatever way results in
         | the least public outcry arguing they shouldn't be allowed to
         | have the data in the first place regardless of if that means
         | giving access or restricting it.
         | 
         | 1: https://twitter.com/dril/status/841892608788041732
        
         | noslenwerdna wrote:
         | The users agreed to share their data with Facebook, not some
         | other company. If they didn't prevent this, they'd be asking
         | for another Cambridge Analytica
        
           | greatgib wrote:
           | The user agreed on Facebook to make his data "public", so he
           | can't complain that a robot scrapes it.
           | 
           | Nothing prevents him from restricting access to his pages and
           | data to "trusted" friends.
        
             | kube-system wrote:
             | The description in the article sounds like it scrapes
             | private profile data.
             | 
             | > Octopus designed the software to scrape data accessible
             | to the user when logged into their accounts
        
               | greatgib wrote:
               | I don't think so, it is more like you scrape what is
               | accessible to this user. So in the end you will scrape
               | your friends' data. This is why I said that you are free
               | to only share with friends that 'you trust'.
        
               | Kwpolska wrote:
               | Were they showing the private data to everyone, or just
               | to the person whose account was used for the scraping? If
               | it's the latter, then this is also not a crime, it is
               | just someone accessing data they have been authorized to
               | access, but in an automated way.
        
           | stickfigure wrote:
           | The users agreed to share their data with everyone that uses
           | Instagram. Because that's how the site works.
        
             | kube-system wrote:
             | There's an important difference between technically
             | consenting and informed consent.
             | 
             | Given what I know about the bot problem on Instagram, I
             | would imagine many people have been tricked into sharing
             | their private profiles with scraping bots. Many bots are
             | copying real people's profiles and then spamming their
             | friends with follow requests. It's highly effective and
             | gives these bots access to private profiles.
             | 
             | Fooling people is fraudulent, period.
        
           | jasfi wrote:
           | That is a very good point, but surely it was taken into
           | consideration when scraping was declared legal?
        
             | danuker wrote:
             | https://techcrunch.com/2022/04/18/web-scraping-legal-court/
        
             | stefan_ wrote:
             | All that case says is "scraping is not a violation of the
             | CFAA". But of course the scraped data still exists in legal
             | limbo; maybe you can compute derived information from it,
             | but the moment a scraper _reproduces_ it there is all of
             | copyright law waiting for them.
        
         | TechBro8615 wrote:
         | Indeed. It's the height of hypocrisy for a company to define
         | the borders of its own system and then prosecute those who they
         | consider in violation of them. There is no consideration given
         | to whether the data should have been collected and retained by
         | Facebook in the first place, regardless of whatever arbitrary
         | access policies they defined to fit their own business and data
         | model.
         | 
         | It's not clear what Facebook's position on scraping truly is.
         | Sometimes they downplay it as "normalized and widespread," and
         | other times they castigate it as inexplicably legal and clearly
         | immoral, or even outright "in violation of state and federal
         | law." For example:
         | 
         | - April 2021. Researchers find an exposed database containing
         | the scraped data of 533 million facebook users. Some news
         | reports refer to it as a "breach." Facebook attempts to
         | downplay the issue as the result of third party scraping.
         | Headline in ZDNet: "Internal Facebook email reveals intent to
         | frame data scraping as 'normalized, broad industry issue'" [0]
         | 
         | - October 2020. Facebook announces lawsuits against companies
         | it claimed created a "malicious extension on Google's Chrome
         | Web Store designed to scrape Facebook, in violation of
         | Facebook's Terms and Policies and state and federal law." [1]
         | 
         | So... which is it? Does Facebook believe that scraping is a
         | "broad, normalized industry issue?" Or is it a violation of
         | "state and federal law?" It seems like they measure severity of
         | its impact primarily based on the reactions of political
         | commentators.
         | 
         | And what's the difference between automating a browser and
         | automating an API client? Why did Facebook design an API for
         | accessing the data they collected, if it's illegal to collect?
         | They've even claimed to be the victim of Cambridge Analytica,
         | who purchased a "quiz" application created by a developer who
         | pieced it together using code straight from the "examples"
         | section of Facebook's API documentation.
         | 
         | There is one obvious resolution to this apparent contradiction.
         | If we remove Facebook from the question, then the contradiction
         | resolves itself. All we need to do is stop presuming that
         | Facebook has the right to collect and retain this data in the
         | first place. And as a user, if you publish your data to a
         | website designed for sharing it with other people, then by
         | definition it is no longer private data. Therein lies the
         | central question: what is "semi-private" data, and who controls
         | its boundaries?
         | 
         | [0] https://www.zdnet.com/article/facebook-internal-email-
         | reveal...
         | 
         | [1] https://about.fb.com/news/2020/10/taking-legal-action-
         | agains...
         | 
         | p.s. another thing they never mention is _why_ companies want
         | to scrape lists of facebook users. perhaps it might have
         | something to do with the "lookalike audience" feature, and its
         | more precisely targetable predecessors, which allow advertisers
         | to upload a list of usernames and email addresses for targeted
         | advertising?
        
         | utahcon wrote:
         | The only argument I have here (sadly in favor of FB) is with
         | "safeguard people against clone sites". While I did give my
         | data to FB, I didn't approve that transfer to another
         | site/system. That is the only place I could possibly see some
         | legal foothold.
        
           | kbenson wrote:
           | It's impossible to control information once it's been created.
           | The longer it's existed and the more locations you can see it
           | in, the more likely it is to spread.
           | 
           | Whether we make that spread of information legal or not does
           | little to affect whether it happens.
           | 
           | There are two things that might help. First, don't share as
           | much information. Once it's no longer limited to you or your
           | close group of friends which hopefully won't share it along
           | with your name, it's mostly out of your control. Second, put
           | limits (laws) on what information companies are able to
           | synthesize about you, and how long they can retain it. If
           | there's less information created about you (or it's
           | ephemeral, created and destroyed as needed), and if they need
           | to clean out older data, there's less to be shared or stolen.
        
             | kube-system wrote:
             | "It's hard to enforce the rule of law" is not a good reason
             | to abandon it entirely. Data privacy laws make data privacy
             | better even without being 100% infallible.
             | 
             | We should be both practicing good data hygiene _and_ using
             | legal tools to combat those who abuse data privacy.
        
               | kbenson wrote:
               | > "It's hard to enforce the rule of law" is not a good
               | reason to abandon it entirely.
               | 
               | I didn't?
               | 
               | > We should be both practicing good data hygiene and
               | using legal tools to combat those who abuse data privacy.
               | 
               | That's what I said. The first thing is data hygiene, the
               | second is legal requirements. The difference I think is
               | that the legal requirements should be on the actual
               | creation and retention of the data, not just who owns it,
               | who it can be shared with, etc.
               | 
               | As soon as PII information over a certain age is
               | radioactive and linked to a fine _per person_, all of a
               | sudden there'll be a lot less giant repositories of PII
               | to worry about.
        
           | asdff wrote:
           | What happens when FB builds a shadow instagram profile of you
           | based on your FB account? That already happens. FB clones
           | their own data for other projects no different than what you
           | might fear happening if this data were cloned to a third
           | party. The cat is out of the bag already but FB wants to
           | pretend they are the only ones with the right to abuse.
        
         | blantonl wrote:
        
           | coffeeblack wrote:
           | And that's the trick. You use the bad apples to delegitimise
           | the good ones. Works every time.
        
           | dhzhzjsbevs wrote:
           | > the vast majority of Web scraping efforts are to build
           | businesses on top of other organizations hard work and
           | innovation. Period. End of story.
           | 
           | Yeah and the vast majority of the internet and all these mega
           | corps run on open source while paying pittance back to the
           | ecosystem. Cry me a fuckin river.
           | 
           | Can't wait til someone sues them for "scraping" their site
           | for web previews and thumbnails every time someone shares a
           | link on Facebook.
           | 
           | The double standard of these muppets.
        
           | trasz wrote:
           | >the vast majority of Web scraping efforts are to build
           | businesses on top of other organizations hard work and
           | innovation
           | 
           | The vast majority of Facebook/Google's efforts are to build
           | businesses on top of other organizations hard work and
           | innovation.
        
           | jjoonathan wrote:
        
             | lcnPylGDnU4H9OF wrote:
             | If simp is supposed to be short for simpleton, you might
             | want to consider how simple your thoughts are.
        
               | [deleted]
        
               | DaveFr wrote:
               | It's not, see
               | https://www.urbandictionary.com/define.php?term=Simp
        
               | lcnPylGDnU4H9OF wrote:
               | I can also link to a source that's going to be biased in
               | my favor: https://www.etymonline.com/word/simp
        
               | jjoonathan wrote:
               | > 1903
               | 
               | > 1640s
               | 
               | Lol, no. I'm using the definition from _this_ century:
               | 
               | > Someone who does way too much for a person they like
        
             | EGreg wrote:
             | What a pretty picture capitalism is.
             | 
             | "Give us all your data for free." "They 'trust me', dumb
             | fucks."
             | 
             | https://www.esquire.com/uk/latest-news/a19490586/mark-
             | zucker...
             | 
             |  _Proceeds to build entire business on this data..._
             | 
             | "You can't scrape us!"
             | 
             | LinkedIn tried this:
             | 
             | https://www.zdnet.com/google-amp/article/court-rules-that-
             | da...
             | 
             | And it's not like capitalist enterprises even try to be
             | consistent in their legal complaints:
             | 
             | https://9to5mac.com/2022/04/14/apple-calls-out-meta-for-
             | hypo...
        
           | smt88 wrote:
           | I don't sympathize with a monopoly that people are trying to
           | weaken.
           | 
           | I loathe Meta and want to boycott it. Unfortunately this
           | means I'm now locked out of the only repository of most local
           | events and gatherings in my city.
           | 
           | In some countries, life is literally not possible without
           | WhatsApp.
           | 
           | If Meta wants to cry about the mean bullies trying to
           | exfiltrate data, they need to stop wiping out competitors.
        
           | latexr wrote:
           | > the vast majority (...) Period. End of story.
           | 
           | If you're going to assert something as definitely true to the
           | point of closing off discussion, I'd expect a modicum of
           | evidence. At a minimum that you'd explain the reasoning
           | behind your conclusion. What's the source of the "vast
           | majority" claim? There's little point to advertising when
           | you're scraping a website for personal consumption, so it
           | seems dubious anyone would have reliable numbers on which
           | kind is more prevalent.
        
           | sneak wrote:
           | > _the vast majority of Web scraping efforts are to build
           | businesses on top of other organizations hard work and
           | innovation._
           | 
           | Not really. Scraping just gets data, not code, so it's hard
           | to support this argument. The anti-scraping view is that the
           | right to use the data rests with the company that collected
           | it, but I don't think that view is held by most people.
        
             | blantonl wrote:
             | If you are arguing that an organization's data is worthless
             | but only their code has worth, then I'm not quite sure
             | where to go from this point in this discussion, other than
             | to say _that is crazy_.
        
               | PeterisP wrote:
               | The data is obviously valuable, but they don't
               | necessarily deserve a monopoly on that data, since that
               | data primarily belongs to the users who created the data;
               | so while it's understandable that organizations want to
               | restrict that data, we have no obligation (moral or
               | otherwise) to respect that desire.
        
               | jjoonathan wrote:
               | Exactly. Your list of friends does not belong to
               | Facebook, it belongs to you.
               | 
               | I am sure Facebook believes they deserve a monopoly for
               | having obtained it first. They do not. The market forces
               | _you_ to compete for every dollar you earn, so you have
               | every right to expect Facebook to compete for every
               | dollar _they_ earn, and "I touched it first therefore
               | it's mine!" is not competition.
        
               | blantonl wrote:
               | But, but, but..... you _agreed_ that Facebook _does own
               | your friends list_ when you signed up for an account and
               | started giving them all your data.
               | 
               | If I run a restaurant, and I stipulate that when you walk
               | through the doors and place an order I reserve the right
               | to take your picture and post it on the bulletin board,
               | why would you place the order and then get pissed off
               | when I post a picture of you on the bulletin board? And
               | why would you be mad at me if I stipulated that no one
               | else can use a camera in my restaurant? Terms of service,
               | my friend. Unless prohibited by legislation, I can
               | stipulate how things run in my restaurant.
        
               | jjoonathan wrote:
               | If your bulletin board somehow let you monopolize the
               | restaurant industry (? lol) then we should absolutely
               | vote for some politicians to boot your entitled ass back
               | into competition.
               | 
               | Obviously, the idea of a bulletin board granting a
               | restaurant an effective monopoly is ridiculous so your
               | analogy is trash, but even if your analogy wasn't trash,
               | your conclusion would still be wrong.
        
               | sneak wrote:
               | I'm not saying that the data isn't valuable, but that
               | possession of the data, valuable though it may be, is not
               | related to the organization's hard work or innovation.
               | For the most part, any control rights to the data likely
               | rest or should rest with the people who provided it to
               | the company.
               | 
               | Meta claiming that all of the photos on Instagram are
               | Meta's property does not comport with current IP law or
               | the views/opinions of most of the users on Instagram who
               | do own the copyrights to those photos.
               | 
               | You really shouldn't be able to sue anyone for use or
               | copying of data to which you do not hold copyright. The
               | stuff on FB is licensed to FB by the people who own it
               | (their users).
        
           | kordlessagain wrote:
           | I've reread the previous comment and I really don't see where
           | there is any justification stated for acting in an unethical
           | manner. While Facebook may be making an argument against
           | unethical behavior by a few, using the language they do is
           | detrimental to legitimate uses of crawling content available
           | on the Web.
           | 
           | Corporations, by nature, work in a way that individuals at
           | those companies don't. They are literally "non-corporal"
           | entities and work toward increasing profit and stakeholder
           | value, not improving the lives or situations of their users,
           | unless that happens to correspond to making them more money.
           | 
           | We should all be wary of corporate control and claim to
           | rights built from their user base, especially if those
           | services are offered for "free".
        
             | [deleted]
        
             | blantonl wrote:
             | _We should all be wary of corporate control and claim to
             | rights built from their user base, especially if those
             | services are offered for "free"._
             | 
             | That's fine then. And I agree with you. But I'll leave you
             | with this.
             | 
             | Do. Not. Give. The. Company. Your. Data.
             | 
             |  _They are literally "non-corporal" entities and work
             | toward increasing profit and stakeholder value_
             | 
             | Again, I agree. But if you think this is a bad thing, then
             | you don't believe in capitalism, and I'm not quite sure
             | what the intention is to argue this point on a platform
             | (HN) that encourages the most basic forms of capitalism -
             | starting up companies with innovative technology and
             | solutions.
        
               | EGreg wrote:
               | What a pretty picture capitalism is. Break out the
               | popcorn for the latest regular installment of "ok for me
               | but not for thee":
               | 
               | People You May Know employs tons of shady stuff Facebook
               | doesn't reveal and has saved their bacon early on from
               | stagnating at around 100M users.
               | 
               | https://mashable.com/article/people-you-may-know-
               | facebook-cr...
               | 
               | Facebook Beacon and others had a big outcry. They got
               | hauled into Congress multiple times. And of course
               | whenever they get caught, they always throw a "mea culpa"
               | and do it all over again in a year under a different
               | name. Here they are recording faces of their users
               | secretly using camera permissions!!
               | 
               | https://www.independent.co.uk/tech/facebook-app-
               | recording-ca...
               | 
               | Their entire business model is "Give us all your data for
               | free." Mark Z early on was flabbergasted himself when he
               | realized he no longer needed to scrape sites on Harvard's
               | house websites and could just ask people to submit the
               | data for each other: "They 'trust me', dumb fucks."
               | 
               | https://www.esquire.com/uk/latest-news/a19490586/mark-
               | zucker...
               | 
               |  _Proceeds to build entire business on this data..._
               | 
               | BUT THEN. Someone else does it to them and they get mad.
               | "You can't scrape us!" LinkedIn tried this:
               | 
               | https://www.zdnet.com/google-amp/article/court-rules-
               | that-da...
               | 
               | And it's not like capitalist enterprises even try to be
               | consistent in their legal complaints:
               | 
               | https://9to5mac.com/2022/04/14/apple-calls-out-meta-for-
               | hypo...
        
               | mechanical_bear wrote:
               | The problem isn't "capitalism", it's crony-capitalism
               | enabled by certain elements of state complicity.
        
               | EGreg wrote:
               | Okay, is there a single problem with capitalism, or is it
               | perfect? The problem is never with capitalism?
        
               | jjoonathan wrote:
               | Yeah, nothing says "commie" like trust busting and
               | keeping markets competitive.
        
           | ramses0 wrote:
           | Cough, cough, Google, cough, cough...
           | 
           | I'm not ashamed to admit that I've done some jquery
           | shenanigans on my Facebook friends page to "export" my friend
           | list so I can retake control of my friend relationships
           | (disintermediation for the in-crowd).
           | 
           | So easy to push data in to Facebook, so hard to get even
           | basic data out of it.
        
           | EGreg wrote:
           | Pretty ironic that Mark Z himself started out exactly like
           | this: scraping Harvard servers and photos to power facemash.
           | 
           | He subsequently realized that he doesn't need to scrape if he
           | can just make a viral site that lets people share this info
           | with each other while he can eavesdrop on ALL OF IT:
           | 
           | https://www.esquire.com/uk/latest-news/a19490586/mark-
           | zucker...
        
           | ConstantVigil wrote:
           | > I love to hate on Meta, but their actions here are spot on
           | and make my morning very enjoyable as I sip my cup of coffee.
           | 
           | You might want to reassess your intelligence there friend. It
           | seems to be suffering from a common form of cognitive
           | dissonance combined with some form of confirmation bias.
           | 
           | How so?
           | 
           | Well you clearly don't like scraping, otherwise you wouldn't
           | be agreeing with a criminal... So there's the confirmation
           | bias...
           | 
           | Which is also the cognitive dissonance part. You clearly
           | don't like Meta/Zuckerberg by your own admission; but you are
           | agreeing with an empty rhetorical attack against people who are
           | smart enough to make use of Zuckerbergs terrible security
           | practices...
           | 
           | Do you not see the problem in this?
        
             | pc86 wrote:
             | Who is the criminal here? Scraping is not illegal. This is
             | a civil suit, so even if Meta wins, it's still not remotely
             | criminal for anyone involved.
             | 
             | Also please explain to me how someone giving a company
             | their Facebook credentials is an example of "people who are
             | smart enough to make use of Zuckerbergs terrible security
             | practices."
        
             | blantonl wrote:
             | This is a total non-sequitur argument here. You've gone
             | from accusing me of lack of intelligence to suffering from
             | cognitive dissonance and confirmation bias, to Facebook's
             | terrible security practices: simply because I'm pleased
             | that an organization has taken action against Web scrapers
             | for violation of Terms of Service.
             | 
             | Yes, I've gone on record indicating that I believe Web
             | scraping to be generally unethical, and that I'm pleased
             | that some action was taken against those that make it their
             | business to do so. And that is all that I have stated in my
             | OP. You've decided to take me on some circular mental
             | gymnastics journey I'm still trying to wrap my head around.
        
               | windexh8er wrote:
               | Let me restate this how I view what you've stated: your
               | position is that because Facebook has a Terms of Service
               | that may define something that is not illegal - means
               | that one must abide by it? Also...
               | Facebook/Meta/Zuckerberg have lied over and over and over
               | very publicly to get their way or to give themselves an
               | advantage: by giving themselves unfettered and
               | unwarranted access to data that they profit from by their
               | own fast and loose rules.
               | 
               | If Facebook/Meta/Zuckerberg are OK with lying, stealing
               | and cheating - then why should anyone leveraging their
               | online properties need to abide? Until they're held
               | accountable under broader rules I see no reason the
               | consumption side can't bend them as well. And you may
               | argue "this isn't how it works" but we all know this
               | isn't how Facebook/Meta/Zuckerberg operate. They operate
               | under the premise of: do whatever makes us money because
               | breaking the rules is the cost of doing business. So, no
               | - they don't get to spew propaganda to the advantage of
               | their business under the guise of protecting users. That
               | is complete and utter bullshit.
        
           | hdjjhhvvhga wrote:
           | I disagree precisely for the simple reason that these
           | businesses are using Meta's weapon against them. It will be
           | an interesting battle to watch - and if my memory doesn't
           | fail me, LinkedIn lost one already. The more the press writes
           | about it, the better: (ordinary) people will sooner or later
           | see through their doublespeak and realize what is at stake.
        
           | cmiles74 wrote:
           | In my opinion, breaking a click-through license agreement or
           | violating the small print on some dense and difficult to read
           | web page is hardly an issue of morality or ethics.
           | 
           | Let's also remember that a big reason Meta is hating on
           | scraping is because of their own problematic behavior. It
           | wasn't so long ago that they were suing NYU over research on
           | political ads and how Facebook targets their readers.[0] In
           | fact, it wouldn't surprise me if Meta's larger goal is to
           | prevent this sort of research.
           | 
           | [0]: https://news.bloomberglaw.com/privacy-and-data-
           | security/face...
        
           | 14 wrote:
           | Sorry but there are many legitimate reasons to scrape a
           | website. Detecting price manipulation is one example. Because of
           | scraping we know Amazon does things like price gouging and
           | raising prices right before they go on "sale". Scraping can
           | be very useful for researchers to monitor trends and find
           | correlations. It's not just about bad guys stealing personal
           | information. There are so many legitimate uses that banning
           | scraping would be a bad thing.
        
           | basetwojesus wrote:
           | Regardless, it's very rich that a company like meta is mad
           | that they're being beat at their own game (making money off
           | of data that they obtained through shady means).
        
           | matthewmacleod wrote:
           | Nah, you are straight-up wrong. In fact, it's the opposite -
           | the only companies who are scared of scraping are the ones
           | whose business models rely on artificial lock-in, and we
           | should all be working as hard as we can to demolish them.
        
             | dylan604 wrote:
             | >the only companies who are scared of scraping are the ones
             | whose business models...<snip whatever other nonsense
             | followed>
             | 
             | This is just patently false. There is an expense incurred by
             | scraping. There is no benefit from those scrapers to a host
             | providing the data. My logs are full of various bots that
             | pull data from my webhost that costs me money to serve. I
             | run various sites that do not serve ads. I do not include
             | any 3rd party tracking. They're just simple sites that I
             | pay for out of my own pocket because that's what I've chosen
             | to do. Nothing shady about any of it.
             | 
             | It's just sad that your own personal feelings towards
             | scraping prevent you from being able to accept that there
             | are people with views other than your own.
        
               | matthewmacleod wrote:
               | Hey, I totally accept people have views other than my
               | own. I just disagree with them.
               | 
               | It seems extremely weird that you'd want to publish
               | content, but then get mad that people are using the thing
               | that you published. But you do you.
        
               | dylan604 wrote:
               | How is that weird? I publish on my site to have people
               | visit my site. I don't publish for people to take my data
               | and do what they will without attribution for where they
               | got the data. How that makes no sense to others has me
               | saying please don't do you, because you are being
               | inconsiderate to others.
        
             | jjoonathan wrote:
             | It's wild that people are arguing that their friend list
             | should belong exclusively to facebook and not, you know, to
             | them and their friends.
        
           | dylan604 wrote:
           | I feel the same way. My biggest pet peeve is that
           | scrapers/bots traversing my site generate more data than the
           | target audience of users. The scrapers get all of this data
           | for "free" at my expense of the hosting costs to provide them
           | that "free" data.
        
           | macinjosh wrote:
           | Google search's business model is scraping the web, indexing
           | it, and then pasting ads all over their search results made
           | up of other people's content. If Google can build a business
           | on third-party data then these meta scrapers can do the same
           | thing.
           | 
           | It is like saying a photographer can't photograph a building
           | from the street because she doesn't own it. The building is
           | there, taking a picture takes nothing from the building. That
           | is all that is going on here, repeating publicly available
           | information.
        
             | injidup wrote:
             | No it's more like you subletting an apartment to a dodgy
             | photographer who wants to take pictures of the children's
             | playground your back window looks out on even though your
             | contract explicitly forbids subletting. The suit is
             | against companies that use login credentials that are not
             | theirs. It is not public information that is being scraped.
             | It is information behind a login with a terms of service
             | for what you are allowed to do with that login.
        
         | nathanaldensr wrote:
         | Great post that summarizes exactly what I feel about
         | globocorps. The euphemisms and propaganda are disgusting.
        
         | lupire wrote:
        
       | fxtentacle wrote:
       | Of course, Facebook wants to make it sound like scraping is
       | illegal, when it generally isn't.
       | 
       | But account hijacking and mass-creation of accounts just to
       | access private pages are clear violations of the Facebook and
       | Instagram ToS, so they surely can sue for that.
        
         | crawsome wrote:
        
         | [deleted]
        
         | dementiapatien wrote:
         | Since when do you get sued for breaching TOS?
        
           | thallium205 wrote:
           | Since when do you get sued for breaching a contract? When the
           | offense is worth it.
        
           | curiousllama wrote:
           | Since you start a business on the violation.
           | 
           | "Since when do I get sued for taking too many free samples
           | from Costco?" -> "Since you started taking millions of them
           | to resell"
        
             | jhoelzel wrote:
             | im not sure on american law, but if you give me those
             | samples willingly i can do whatever i want with them.
             | 
             | Actually this is the reason why many products come with the
             | label "not for resale" but i have yet to find somebody who
             | cares about it :D
        
               | treis wrote:
               | >give me those samples willingly
               | 
               | Doesn't seem like Facebook is giving them willingly.
        
           | golemotron wrote:
           | You can get sued for anything that causes harm.
           | 
           | Relevant life lesson: don't do things to people with money
           | that they might perceive as harm.
           | 
           | Corollary: Being sued is as much punishment as losing a suit
           | for most people.
        
           | contravariant wrote:
           | I don't know but it's at least been that way since Aaron
           | Swartz did it I suppose.
        
         | Raed667 wrote:
         | Violation of ToS does not mean a violation of the law.
        
           | CoastalCoder wrote:
           | I don't think I know the answer, but I'm curious:
           | 
           | Does violating a website's TOS mean you're accessing it beyond
           | your authority, making it a violation of the US's Computer
           | Fraud and Abuse Act?
        
             | zja wrote:
             | Violating TOS no; Gaining access beyond your authority maybe
             | https://www.eff.org/deeplinks/2010/07/court-violating-terms-...
        
               | CoastalCoder wrote:
               | I was assuming that in this case, a person's authority
               | was specifically granted _by_ the ToS.
               | 
               | I wondered if the interplay of those two concepts muddied
               | the waters.
        
             | danaris wrote:
             | I don't have a source for this, but my recollection is that
             | this has been successfully argued by a couple of companies
             | --but then an appeals court found very firmly that it was
             | _not_ the case.
             | 
             | Essentially, having that be true would mean that any given
             | website could create whole new classes of criminal
             | behavior.
        
               | zinekeller wrote:
               | > having that be true would mean that any given website
               | could create whole new classes of criminal behavior.
               | 
               | While this is true, reading the lawsuit it is clear that
               | Meta is suing in civil court, so maybe they're trying to
               | enforce their contract, especially their automated
               | collection ToS
               | (https://www.facebook.com/apps/site_scraping_tos_terms.php)?
        
             | tumult wrote:
             | Not a violation. Decided by Supreme Court in 2021. Van
             | Buren vs. United States. It was a big deal.
        
           | closewith wrote:
           | Most lawsuits aren't due to breaches of the law, but
           | breaches of contract. Whether terms of service constitute an
           | enforceable contract is another matter.
        
             | jhoelzel wrote:
             | if a bot creates the account, who breaches the contract?
        
               | sneak wrote:
               | The person who ran the bot. Programs do not have agency,
               | they are just tools.
               | 
               | That's like saying "If the gun fires the bullet, who is
               | liable for murder?" It's a silly question.
        
               | CSMastermind wrote:
               | > That's like saying "If the gun fires the bullet, who is
               | liable for murder?" It's a silly question.
               | 
               | I don't know, I've seen several people unironically argue
               | that it should be the gun's manufacturer.
        
               | bee_rider wrote:
               | Software that exclusively has illegitimate uses has been
               | shut down. Whether we agree that it is a good argument or
               | not, it is definitely _an_ argument people have made
               | (that some types of guns are mainly designed to hurt
               | people).
               | 
               | With software of course it is a little complicated
               | because:
               | 
               | * it can be produced really easily in a distributed
               | fashion over the internet by anonymous people in many
               | jurisdictions, so there isn't always an obvious company
               | or entity to sue
               | 
               | * most automation tools can be repurposed for malicious
               | use (nobody would sue John Deere because their tractors
               | can be armored and turned into pseudo-tank things)
        
               | lesuorac wrote:
               | Probably should also add "successfully", there's a reason
               | NYPD had/has guns that require 12 pounds of force to pull
               | the trigger (instead of a normal ~5 lbs).
        
             | adamsmith143 wrote:
             | ToS have been around for decades, surely this question is
             | settled by now?
        
               | marlowe221 wrote:
               | Former attorney turned software developer here!
               | 
               | Nope, it's not a settled question in the way that I think
               | you mean. Each ToS is different so each would be subject
               | to individual legal analysis in court on its own terms.
               | 
               | Questions would include whether the ToS is
               | unconscionable, whether the terms violate laws of the
               | locality/nation, and so forth.
               | 
               | It's the same with traditional contracts - the fact that
               | contracts have been around for hundreds (maybe thousands)
               | of years doesn't mean much if you and I create a brand
               | new one between us. Our contract's specific terms (and
               | events/actions between us as a result) would be the issue
               | in court.
        
               | kaivi wrote:
               | Why can't FB simply include a clause like "No kind of
               | automated scraping is allowed, except for search engines
               | in robots.txt"? This would save them so much time in
               | court, arguing over the use of fake accounts which should
               | really be irrelevant.
        
               | closewith wrote:
               | It's not clear that clause would be enforceable. Scraping
               | has been found to be lawful in many jurisdictions,
               | including the US, even without the consent of the host.
        
               | adamsmith143 wrote:
               | So even the general question of "Whether terms of service
               | constitute an enforceable contract" depends on each
               | individual ToS?
        
               | marlowe221 wrote:
               | Congress or a state legislature could pass a law that
               | says "No terms of service are ever enforceable" but to my
               | knowledge no one has done that.
               | 
               | So, under the current state of the law whether or not a
               | contract is enforceable depends entirely on what the
               | terms in that specific contract are.
               | 
               | Unfortunately, this is yet another instance where the law
               | has failed to keep up with technology. Contract laws (at
               | least in the USA) date back long before anyone ever
               | dreamed up the idea of a EULA or ToS. Our laws
               | contemplate two or more parties with roughly equal
               | bargaining power sitting down and hashing things out, and
               | go from there.
               | 
               | Laws based on that assumption are a pretty poor fit for a
               | world filled with EULAs and ToS but it's what we are
               | stuck with at the moment.
        
           | stonemetal12 wrote:
           | That is why they are suing rather than pressing charges. When
           | someone steals your car you don't sue them you press charges.
           | When someone doesn't uphold their end of a contract you don't
           | press charges you sue for breach of contract.
        
             | sneak wrote:
             | "pressing charges" isn't a thing.
        
               | onionisafruit wrote:
               | It is a thing. In America pressing charges is when you
               | accuse somebody of a crime and ask a prosecutor to bring
               | criminal charges against them.
        
               | sneak wrote:
               | Prosecutors exclusively decide who is charged. No charges
               | can be "pressed" by a victim.
        
               | onionisafruit wrote:
               | Yes, in most cases it is the prosecutor's discretion
               | whether to bring a case to a grand jury, but that isn't
               | what pressing charges is. See Merriam Webster's
               | definition[0].
               | 
               | [0] https://www.merriam-webster.com/dictionary/press%20charges
        
               | stonemetal12 wrote:
               | As far as I am aware it isn't a specific thing, but a
               | general catchall term for going through the process of
               | filing a criminal complaint, and seeing it through to
               | completion. Maybe there are better words for it but
               | "pressing charges" is what they use on TV so it is top of
               | mind.
               | 
               | In general I meant there is a difference between criminal
               | and civil law, and suing generally refers to civil not
               | criminal law.
        
             | compsciphd wrote:
             | in reality, you as an individual can't press charges. Only
             | the state can. And many times the state chooses not to. You
             | can sue in civil court, but individuals can't bring cases
             | in criminal court.
        
               | closewith wrote:
               | Many countries do have the concept of private criminal
               | prosecutions.
        
               | onionisafruit wrote:
               | You are confusing pressing charges and indictment.
               | Pressing charges just means you accuse somebody of a
               | crime and "press" the prosecutor to indict them. So the
               | state does have the ultimate say on who is prosecuted,
               | but that doesn't mean you can't press charges.
        
               | [deleted]
        
       | cosmiccatnap wrote:
       | I would consider this appropriate if one of the largest offenders
       | of scraping weren't the one pretending to be the offended.
        
       | HeckFeck wrote:
       | Data harvesting is moral for me, but not for thee.
        
         | mateuszbuda wrote:
         | In general I agree that harvesting _public_ data is moral. I
         | think that in these particular cases it's: 1) extracting data
         | from profiles that opted for not being public (only available
         | to logged-in users) and 2) reposting scraped data (publicly?)
         | as belonging to the guy who scraped it without users' consent.
        
           | lolinder wrote:
           | I agree with the moral argument against posting the scraped
           | data publicly, but if someone gave my account access to their
           | data, I don't think they have a _moral_ right to say I can't
           | use a script to do something private with it.
           | 
           | Scripts are tools, and like any tool they're extensions of
           | the self. If it's morally okay to do it by hand, it's morally
           | okay to do it with a script, so long as my script is
           | respectful of server resources.
        
           | upupandup wrote:
           | Instagram behind a login screen is public. If you say were an
           | OnlyFans model and somebody paid for your videos, scraped
           | them, then there would've been implicit agreement.
           | 
           | Sharing photos on Instagram, there is no such understanding,
           | news outlets have been logging in to view and publish your
           | instagram photos so.
        
           | adolph wrote:
           | The state of "opted for not being public" and 'available to
           | any system authenticated person' seem contradictory.
           | 
           | I appreciate that 'system authenticated person' is a smaller
           | set than those who can access anything publicly accessible,
           | and that the former is a subset of the latter.
        
           | trasz wrote:
           | If they are being harvested it makes them public by
           | definition. Unless there was a break-in.
        
           | kordlessagain wrote:
           | Facebook has hidden much of Instagram's content behind
           | logins, so that makes most of it "not public".
           | 
           | At the same time, I don't think all of Instagram's users care
           | if their images are hidden, or not.
           | 
           | It's quite unfortunate Facebook/Meta is using hostile
           | language and the word "scraping" together in this case.
           | Scraping is a legitimate process used by various business
           | models to gather information from the Web, which itself was
           | originally intended to be an open forum for people to share
           | content.
           | 
           | Hostile business models have corrupted that intent and turned
           | it into a competitive environment that is harming users and
           | legitimate models which may not have the funding larger
           | corporations can muster.
           | 
           | I have a "scraper" I've built that will either snapshot a
           | page from a user's browser or crawl it remotely with
           | Selenium/Firefox, on the user's behalf, to save the content
           | in an index for searching later, by that user. It's not
           | automated, nor does it parse and crawl URLs in the pages
           | saved. It doesn't use page content in a wider context,
           | either.
           | 
           | I've spent a significant amount of time trying to "work
           | around" anti-scraping efforts by various companies and it's
           | frustrating to see hostility instead of cooperation in
           | certain types of use.
        
             | car_analogy wrote:
             | > Facebook has hidden much of Instagram's content behind
             | logins, so that makes most of it "not public".
             | 
             | 1) It was public when the content was posted by its
             | authors. Facebook locked it down retroactively, regardless
             | of the author's intent.
             | 
             | 2) A login requirement doesn't make it non-public, if
             | making an account is trivial, and there are already
             | hundreds of millions of accounts. Is the plot of Avengers:
             | Endgame also not public, because it's locked behind a
             | ticket purchase or subscription?
        
           | Alex3917 wrote:
           | > extracting data from profiles that opted for not being
           | public
           | 
           | The tool lets you download the contact info of your friends,
           | which you should be able to do anyway. In fact Facebook tries
           | to trick its users into thinking they can do this with their
           | data takeout option, but the downloaded files don't actually
           | include any of the contact info for your contacts. Which
           | makes zero sense, considering the entire point of Facebook is
           | that it's a digital rolodex for storing your friends' contact
           | info.
        
           | slightwinder wrote:
           | From the article, it seems to be a service for scraping data
           | you have access to anyway. As long as they only hand that
           | data to the requesting customer, whose login they used, I
           | don't see a difference between the general public and this
           | user's personalized "public". If access is still limited to
           | the people who have the access rights, then I don't see a
           | difference between accessing through the official interface
           | or via scraped data.
        
             | saddlerustle wrote:
             | Users make information available on facebook with the
             | expectation that they are able to later control access to
             | it (other than the obvious threat model of screenshotting,
             | etc). This is violating that expectation and thus their
             | privacy.
        
               | Nextgrid wrote:
               | There's no evidence of the accused scraper sharing the
               | scraped data with anyone but the account-holder, so the
               | privacy of their friends is still protected.
        
               | falcolas wrote:
               | > they are able to later control access to it
               | 
               | This has never realistically been the case. An illusion
               | of control is provided by facebook, but they've never
               | really put much effort into it. For a really simple
               | example, look at how long content remained available to
               | the entire internet after "deletion". Sometimes it took
               | years.
               | 
               | Expecting any semblance of privacy from a company who
               | profits from using and selling your data is, if I'm being
               | blunt, lunacy.
        
               | gfodor wrote:
               | This is a false expectation and it's important people
               | learn this.
        
               | IfOnlyYouKnew wrote:
               | They'll stop posting in the way they currently enjoy and
               | will, therefore, have lost some freedom. Great outcome!
               | 
               | In other news: your partner may also leak your most
               | intimate secrets. I hope they do, to teach you a lesson?
               | 
               | Every trust can be betrayed. Why do you believe a world
               | without trust would be better? Only because you cannot
               | handle the nuance of different levels of trust?
        
               | ogurechny wrote:
               | So taking shackles off is called "losing freedom" now?
               | Also, people enjoy many things, just look at the
               | junkheads. Still, it's more natural to have trust in a
               | heroin addict than to have trust in businesses like
               | Facebook.
        
               | gfodor wrote:
               | The counterparty risk from Facebook has almost nothing to
               | do with trust of individual human beings. It has to do
               | with the nature of systems, failure, vulnerabilities,
               | attack surface area, etc. It's "privacy through
               | obscurity" to act in a way that your data is not on the
               | precipice of being leaked by a bad actor or a mistake.
        
               | vorpalhex wrote:
               | The freedom to live in a fictional world where Facebook
               | safeguards your data is just as available regardless the
               | reality of the situation.
               | 
               | The reality of the situation is that Facebook is a walled
               | garden built on the labor of its users and it is
               | objecting to those users reclaiming the fruits of their
               | labor by scraping.
        
               | the_fury wrote:
               | "They'll stop posting in the way they currently enjoy and
               | will, therefore, have lost some freedom."
               | 
               | That is, quite honestly, one of the oddest definitions of
               | freedom I've come across.
        
         | bko wrote:
         | It's their platform. Do you really want some random companies
         | scraping your facebook and instagram posts?
        
           | logifail wrote:
           | > Do you really want some random companies scraping your
           | facebook and instagram posts?
           | 
           | Thought experiment: if you want to keep control over your
           | data, try something radical: _don't hand it to Meta/FB/IG at
           | all_
           | 
           | (Full disclosure, I'm neither on FB nor IG)
        
           | vorpalhex wrote:
           | You published them for the world to see... so yes,
           | presumably.
        
           | iandanforth wrote:
           | Yes. I want a free and open web.
        
             | xvector wrote:
             | Good for you. Normal people do not want posts shared
             | privately amongst friends to become publicly available.
        
               | Nextgrid wrote:
               | There's no evidence the scraper companies mentioned there
               | are making the scraped data public or sharing it with
               | anyone beyond the individual customer that is already
               | entitled to access that data through the official
               | clients.
        
               | falcolas wrote:
               | Then why would you ever put it on a website that
               | generates its revenue from using and selling your data?
        
               | nathanaldensr wrote:
               | Because you (not you, but people in general) are dumb
               | and overly trusting.
        
               | marlowe221 wrote:
               | This is the correct answer.
        
               | blantonl wrote:
               | Because you agreed to do so under the terms and conditions
               | of that website.
        
               | nlh wrote:
               | Look I understand your point from a legal standpoint, but
               | do you really truly believe even a small fraction of FB
               | and IG users actually "agreed to do so under the terms
               | and conditions of that website"? They just clicked
               | whatever was necessary to create their accounts. I doubt
               | there was much affirmative agreement going on there.
        
               | orangecat wrote:
               | Then you need to trust your friends, because copy/paste
               | and screenshots exist.
        
           | trasz wrote:
           | It's not "your Facebook", it's Facebook's Facebook. You
           | already made that data public, otherwise it would be
           | impossible to scrape it.
        
           | ceejayoz wrote:
           | I'd rather _anyone_ than "just Facebook".
           | 
           | "Just Facebook" has made the web shittier; entire realms of
           | essentially public, often great content hidden behind a login
           | wall.
        
           | ogurechny wrote:
           | As others said, there is no "you" in the scheme. It's
           | Facebook's data. When people access that data without paying,
           | they are "bad guys". When the very same people pay for it,
           | they are "legal partners". In both cases they can do anything
           | with it, while Facebook can't be held responsible because of
           | all the official agreements. So as long as there is no
           | specifically bad publicity or money loss anything goes either
           | way.
           | 
           | "You" only exist in numerous empty statements about
           | "privacy", "respect", etc. If you are feeling artsy, you can
           | make that hyped NFT thing out of those, and see whether those
           | kilobytes of text are really worth anything.
        
             | lbriner wrote:
             | What you are claiming here is not true in Europe. If FB
             | hold data about you, the data is still your legal right.
             | You can have it deleted and changed if it is somehow untrue
             | and have various other rights too.
             | 
             | There is a relationship involved because ultimately as a FB
             | user, if I don't like what they are doing, I can ask them
             | to remove my data permanently and they must legally do
             | that. If someone has "scraped" that data (if it is
             | considered PID), without my permission or a legal basis to
             | do so, they are in breach of the GDPR and can have
             | enforcement taken against them.
             | 
             | I think some of these "aggregation" businesses will fall
             | foul of this in Europe but I don't know what will
             | realistically happen if that business does not exist in
             | Europe and breaches the GDPR.
        
               | Nextgrid wrote:
               | > breaches the GDPR.
               | 
               | Facebook breaches the GDPR all the time and manages to
               | stay in business. GDPR enforcement is barely existent,
               | and when it does happen, it's insufficient.
        
               | ogurechny wrote:
               | This is how it works in press releases. The problem is
               | that data protection laws were in fact lobbied by
               | corporations either openly or behind the scenes, and
               | focus on things like real names and passport numbers that
               | look impressive but aren't really important for the data
               | market. These are just put into some high security
               | database (e.g. for billing info), and it's fine. However,
               | the real behavioral data that costs money is shared as
               | easy as it ever was in the form of "User ID <long number>
               | was at the location of Wi-Fi AP ID <another long
               | number>". It doesn't matter that the data owner still
               | trades all the history of activity of a certain
               | individual, or that Wi-Fi station locations can be
               | matched with some external database. Everything is fine
               | as long as you don't slap someone's real name on that.
               | And, contrary to the show social networks make, they
               | couldn't care less about real names. Even if you trick
               | the system by calling yourself John Doe, you still look
               | at the specific content, and have specific contacts, you
               | are you, and the data is the same.
               | 
               | I remember that about a decade ago some IT guys paid
               | for the common Facebook advertiser access, then targeted
               | the ad campaigns using filters in such a way that their
               | intersection only resulted in a single user, or just a
               | couple of them, and were able to match those "anonymized"
               | accounts to real ones. You didn't have to be a genius to
               | do that. Facebook certainly knew it could be used like
               | that. Everyone who made money on that simply agreed to
               | use "anonymization" as a smokescreen. Later, with all the
               | scandals, those routine operations were presented as
               | something exceptional done by a small number of bad
               | actors.
        
       | trasz wrote:
       | We need to update the law to make sure Meta loses in cases like
       | this.
        
       | jmyeet wrote:
       | I'm torn on Web scraping because the extreme of each end of the
       | spectrum on this issue both seem unreasonable.
       | 
       | On one side, you have people who say any form of scraping should
       | be disallowed, even prosecutable. This went so far that the
       | Department of Justice on behalf of AT&T prosecuted a case of URL
       | modification [1]. One of the few bright spots for this psychotic
       | Supreme Court was to curtail the government's power under the
       | CFAA by limiting what constituted "unauthorized" access [2].
       | 
       | On the other hand, there are those who think that any level of
       | scraping should be fine and I think that's untenable too.
       | Consider Yahoo indexing of Stack Overflow [3]:
       | 
       | > In the meantime, since Yahoo (via Slurp!) is about 0.3% of our
       | traffic, but insists on rudely consuming a huge chunk of our
       | prime-time bandwidth, they're getting IP banned and blocked.
       | 
       | Do these "scraping extremists" think such actions should be
       | illegal? It's actually not that far-fetched given the Ninth
       | Circuit decided LinkedIn wrongly blocked HiQ scraping [4]. Like
       | if you change your website with the intent that it'll make
       | scraping more difficult, is that a problem? What if it's an
       | unintended side effect?
       | 
       | Additionally, companies like Meta, Google and Apple are going to
       | be way more accountable for abiding by data retention laws and
       | regulations than any scraper. If it's OK to scrape FB.com
       | completely, that information is out there forever.
       | 
       | I certainly think the government shouldn't prosecute on behalf of
       | companies. At least that should expose to people how the
       | government's #1 priority is in fact to protect the true
       | constituents: corporations and the capital-owning class.
       | 
       | [1]: https://www.techdirt.com/2013/09/30/dojs-insane-argument-
       | aga...
       | 
       | [2]: https://en.wikipedia.org/wiki/Van_Buren_v._United_States
       | 
       | [3]: https://stackoverflow.blog/2009/06/16/the-perfect-web-
       | spider...
       | 
       | [4]: https://blog.ericgoldman.org/archives/2019/09/ninth-
       | circuit-...
        
         | ConstantVigil wrote:
         | > So much about this case is ridiculous, and it's complicated
         | by the fact that nearly everyone agrees that weev is a world-
         | class jerk. But, you need to separate that out from the details
         | of what he did here, to note that it was nothing particularly
         | special, and it involved the sort of thing that security
         | researchers do all the time, and which all sorts of non-
         | security researchers do quite often.
         | 
         | Yeah... uhm... I used to do exactly this sort of thing...
         | 
         | When I was a teenager, I would look at the URL of whatever site
         | I was on, and would change a number here, or a letter there;
         | and see what I got.
         | 
         | Sometimes you get nothing, sometimes you get something.
         | Sometimes that something is quite interesting.
        
       | paultopia wrote:
       | "Scraping attacks" LOL
        
         | sophacles wrote:
         | Why not? weev was put in jail over incrementing a number in a
         | url. Surely writing software to put values into urls is even
         | worse.
        
           | sneak wrote:
           | Let's be clear and accurate: technically weev was put in jail
           | for conspiring on IRC with JacksonBrown. JacksonBrown was the
           | one who wrote a PHP script that incremented a value in a URL
           | (and appended a valid Luhn check digit following
           | incrementation).
           | 
           | Conspiracy to access a protected computer system - that is,
           | typing on IRC. weev didn't write any of the code or access
           | the API.
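           | 
           | For readers unfamiliar with the Luhn step mentioned above, here
           | is a minimal sketch of how such a check digit is computed. It is
           | written in Python purely for illustration and is not the PHP
           | script from the case; the function names are hypothetical.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    if not (payload.isascii() and payload.isdigit()):
        raise ValueError("payload must contain only decimal digits")
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first,
    # then every second digit after it.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    # The check digit brings the total up to a multiple of 10.
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """Check that the last digit of `number` is its Luhn check digit."""
    if len(number) < 2 or not (number.isascii() and number.isdigit()):
        return False
    return luhn_check_digit(number[:-1]) == int(number[-1])
```

           | For example, luhn_check_digit("7992739871") gives 3, so
           | 79927398713 passes luhn_is_valid.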
        
       | pclmulqdq wrote:
       | They have to keep the walls up on their garden so they can get
       | maximum value from harvesting.
        
       | viburnum wrote:
       | One of Facebook's earliest acquisitions was a scraping company
       | called Octazen.
        
       | [deleted]
        
       | throwaway_meta wrote:
       | People that are criticizing this probably were also critical of
       | the Cambridge Analytica scandal, but it would be useful to
       | compare what happened there and here.
       | 
       | With Cambridge Analytica:
       | 
       | - Facebook allowed users (with informed consent) to allow
       | external developers to access their data and limited data about
       | their friends, in order to build social-enabled apps.
       | 
       | - CA exploited this to scrape basic profile data from a large
       | number of users. It broke the ToS by doing so (in particular by
       | using the data for purposes different than stated)
       | 
       | Here the same is happening:
       | 
       | - people are giving a third-party company access to their
       | profile, which includes access to friends' data (in fact a lot
       | more than what the app platform allowed)
       | 
       | - the company is scraping all the data.
       | 
       | At the time of CA, the criticism was that Facebook didn't do
       | enough to enforce its ToS (or maybe that the data sharing should
       | not have been allowed in the first place? But the terms were
       | common knowledge and the attack potential became clear only in
       | hindsight), here people are criticizing that Facebook is in fact
       | enforcing its ToS.
       | 
       | Also note that strong enforcement against scraping is one of the
       | mandates that came from the FTC settlement.
       | 
       | It seems inevitable that any news about Facebook/Meta is read in
       | the worst possible light these days, even when the criticism is
       | self-contradictory. I would expect less superficial commentary
       | from HN.
        
         | unosama wrote:
         | The real reason _most_ people were upset about Cambridge
         | Analytica was it revealed to the public how advertising and PR
         | companies manipulate us. The fact they violated Facebook ToS is
         | more so the excuse for the press covering it when they wanted to
         | write another anti-Trump piece. If you were accusing a specific
         | newspaper of hypocrisy based on two articles I might agree. But
         | you're referring to general public sentiment, and I really
         | don't think most people cared or were surprised about the data
         | collection. The shock and scandal was the realization that
         | targeted advertising campaigns and information bubbles have the
         | potential to sway elections.
        
           | throwaway_meta wrote:
           | I'm referring to the HN crowd, I'm not sure that can be
           | equated to "general public sentiment".
           | 
           | I agree with your first paragraph, and my point is that it is
           | not possible to argue at the same time that Facebook should
           | share data more broadly and allow scraping, and at the same
           | time be critical that Facebook allowed CA to happen in the
           | first place.
           | 
           | If the CA scandal was a wake-up call, it appears it was not
           | internalized enough for people to understand the implications
           | of what they're suggesting in this thread?
        
       | pid-1 wrote:
       | > scrapping attack
        
         | mohamez wrote:
         | That cracked me up when I read it lol
        
       | throw20220707 wrote:
       | From a GDPR point of view, this kind of third-party data
       | collection is not acceptable (assuming it covers personal
       | information, for example names of people and what they have
       | posted). The difference with Meta's own data collection is that
       | the users have a relationship with Meta and have given their
       | permission for Meta to handle the data. Users also know they can
       | contact Meta and ask them to remove the data.
       | 
       | Third parties don't have consent from users. Users don't even
       | know these companies might be holding their data.
        
         | Nextgrid wrote:
         | From a GDPR point of view the scraper would be acting as a data
         | processor on behalf of their customer, no different from using
         | a cloud storage service for your contacts. It's fine as long as
         | the third party doesn't misuse the scraped data or share it
         | with other third parties, and there's no evidence they did so in
         | this case.
        
           | danuker wrote:
           | > and there's no evidence they did so in this case.
           | 
           | Indeed; the users probably wanted to make the data public, if
           | scraper accounts could see it. There is a GDPR allowance for
           | data "manifestly made public by the data subject".
           | 
           | https://gdpr-info.eu/art-9-gdpr/
           | 
           | Here, it's just Facebook wanting to keep the data inside a
           | walled garden.
           | 
           | For the same reason, I quit LinkedIn and made my own site. I
           | don't want people to have to sign in to see my profile.
        
       | oxff wrote:
       | Pretty rich idea coming from FB, lol. They do human scraping.
        
       | samsoftstuff wrote:
       | It's like they don't know that courts just made it legal:
       | https://techcrunch.com/2022/04/18/web-scraping-legal-court/
        
         | blantonl wrote:
         | "Legal" doesn't make it ethical, nor does it shield you from
         | liability if you willfully violate contract law (terms of
         | service)
        
         | brushfoot wrote:
         | From the article: "[T]he Ninth Circuit reaffirmed its original
         | decision and found that scraping data that is publicly
         | accessible on the internet is not a violation of the Computer
         | Fraud and Abuse Act."
         | 
         | The key phrase is "publicly accessible." This wasn't that. The
         | scraping was done by automating Facebook accounts, which have
         | terms of service, which forbid scraping.
         | 
         | ToS/EULAs make a big difference. They're the reason Blizzard
         | could shut down bnetd's StarCraft server. They're why no one
         | can legally reverse engineer Oracle to create a drop-in
         | replacement, despite interoperability provisions.
         | 
         | More and more platforms are putting the majority of your user-
         | generated content behind auth walls with ToS because that's how
         | they prevent competitors from swiping it.
        
           | EMIRELADERO wrote:
           | > ToS/EULAs make a big difference. They're the reason
           | Blizzard could shut down bnetd's StarCraft server. They're
           | why no one can legally reverse engineer Oracle to create a
           | drop-in replacement, despite interoperability provisions.
           | 
           | Strictly referencing EULAs for user-owned copies of software
           | here, not ToS:
           | 
           | That is not true. The Blizzard court clearly erred in not
           | considering unconscionability when analyzing the EULA. As for
           | Oracle, the interoperability provisions are what _overrides_
           | that part of the EULA.
        
           | Nextgrid wrote:
           | Does it go into detail about the actual meaning of "publicly
           | accessible"? Because most content on Facebook/Instagram
           | requires _any_ valid login (as opposed to a specific account)
           | and is data people intend to be public (especially on
           | Insta).
           | 
           | In this case, the account requirement would be a technicality
           | and the data, for all intents and purposes, would still be
           | considered "publicly accessible" if _anyone_ with an account
           | can access it.
        
           | upupandup wrote:
           | Putting up a login screen that any member of the public can
           | get past doesn't make the information private. Private info
           | would be OnlyFans videos. So far there is no such feature on
           | Instagram.
        
       | postalrat wrote:
       | Hey instagram/facebook/linkedin/etc: It's not your data.
        
       ___________________________________________________________________
       (page generated 2022-07-07 23:02 UTC)