[HN Gopher] Taking action against scraping for hire
Author : pawelkobojek
Score : 202 points
Date : 2022-07-07 12:33 UTC (10 hours ago)
(HTM) web link (about.fb.com)
| samsoftstuff wrote:
| It's like they don't know that courts made it legal:
| https://techcrunch.com/2022/04/18/web-scraping-legal-court/
| xvector wrote:
| HN is hypocritical - most commenters here are against this
| because "Meta bad," but at the same time, most commenters
| wouldn't want their posts shared privately amongst friends to be
| scraped and made available publicly.
| mpeg wrote:
| For that to happen, one of your friends would have had to
| willingly allow this tool to scrape their social network, which
| would include your private posts.
|
| Is the scraper to blame here, or the friend?
| Komodai wrote:
| lol maybe if you don't want that happening you shouldn't be
| using Facebook
| pawelkobojek wrote:
| There are two cases they brought up, one being web scraping and
| the other is making a clone website publicly displaying content
| from Instagram.
|
| I think Meta might be mixing up these two cases here on purpose
| to make it look like web scraping is as bad as stealing photos
| to publish them on a clone website.
| postalrat wrote:
| Who is scraping their private messages? Themselves or their
| friends?
| oefrha wrote:
| > most commenters wouldn't want their posts shared privately
| amongst friends to be scraped and made available publicly.
|
| Where's the "posts shared privately amongst friends made
| public" part? There are two cases here:
|
| 1. A service that logs in as the customer (who voluntarily
| provides their credentials) and scrapes information visible to
| said customer on their behalf. Nothing about "made available
| publicly" is alleged.
|
| 2. An individual using a pool of bot accounts to scrape posts
| visible to any logged in user. Nothing about "shared privately"
| is alleged. To be clear I don't like the method, but I'll also
| have to admit I've used one of the Instagram "clone sites" in
| the past thanks to their login wall.
|
| Unless I missed something, it sounds like you just made it up.
| ogurechny wrote:
| Like many other people, you are calling something "private" when
| it is not.
|
| "Privately shared with friends" used to mean that only you and
| your friends know something. You don't "share" anything with
| "friends" on a social network. You give the information to a
| giant corporation. If it finds it suitable, it then delivers it
| to other users, but only after it records your location,
| analyzes the content to check if you were, say, affected by
| some melodramatic event (and therefore should be tricked into
| spending more time... I mean, get "personal recommendations"
| for a certain kind of content), and does a billion other
| things.
|
| If you consider that this is fine, please relay all your
| conversations with family and friends through me from now on. I
| offer secure, reliable, fast, yada yada communication service.
| And it's hip! Ask anyone on the street what they use.
| almog wrote:
| Ironically, around a year ago I disclosed (using their White Hat
| bug bounty program) that I was able to access recruitment data
| (mostly candidate details) using a very cheap form of scraping
| against a 3rd party service provider. They dismissed it and
| instructed me to report it to the 3rd party that operates that
| service (which I had already done beforehand, but the issue had
| not been fixed).
|
| Sorry for being vague here; I haven't publicly disclosed it yet,
| but will probably have to if it doesn't get fixed.
| Hedepig wrote:
| Is this much different from LinkedIn vs hiQ?
| nojito wrote:
| Logged in vs not logged in data.
| logifail wrote:
| > Logged in
|
| Is this actually private data, or is it public stuff that's
| become annoyingly hard to view anonymously because Meta chose
| to stick it behind a login box?
| cupofpython wrote:
| >public stuff that's become annoyingly hard to view
| anonymously because Meta chose to stick it behind a login
| box
|
| this one
| nojito wrote:
| Anything behind a login gate is private data for that
| registered user only.
| logifail wrote:
| > Anything behind a login gate is private data for that
| registered user only
|
| That's quite the claim, if only the login gate were
| either always there or indeed always not.
|
| Presumably such "private" data ought not to be
| indexed by search engines and returned to users who
| search?
|
| "site:instagram.com" is of the order of 228 million pages
| on google.com, and "site:facebook.com" is another 422
| million.
| nojito wrote:
| pretty sure you get hit with a login gate if you navigate
| to the results via site:instagram.com no?
| logifail wrote:
| > you get hit with a login gate if you navigate to the
| results via site:instagram.com
|
| Nope, I just tried it (private browser session, no IG
| activity from my IP recently)
|
| google.com -> "site:instagram.com nojito" -> results ->
| www.instagram.com/explore/tags/nojito/ with a page of
| photos.
|
| Quickly scrolling down the page for several dozen photos
| does eventually trigger the login box, though.
| upupandup wrote:
| but you make it public for everybody via a publicly
| accessible login, so it wouldn't be considered private
| data, for the same reason news outlets can use your
| instagram images and share them widely without your
| permission.
|
| you can't throw up a login screen but then allow people
| to post content that ends up in the public domain when
| the login does not distinguish between the public and a
| user authorized to view your selfie pics.
| Nextgrid wrote:
| Depends if another user can also access it, or whether
| the original author/owner of the data in question intends
| for it to be public. In Facebook's case, there are
| permission levels you can set on posts, including a
| "public" option (which isn't actually public though and
| will require a login anyway, but it can be _any_ login)
| which would settle that debate quickly - hell, I wouldn't
| be surprised if that option were hidden so as not to
| acknowledge that a particular bit of data was explicitly
| posted for everyone to see.
| logifail wrote:
| > In Facebook's case, there are permission levels you can
| set on posts, including a "public" option (which isn't
| actually public though and will require a login anyway,
| but it can be any login)
|
| Q: Have you tried this?
|
| In a private browser session I started at google.com,
| searched for "site:facebook.com nextgrid", picked some
| random post, clicked through, and was reading the post
| without seeing anything other than FB's cookie banner. No
| sign of any login (which is good 'cause I don't have one)
| Nextgrid wrote:
| I suspect it depends on your region, page/post in
| question and browser fingerprint. A post marked as public
| isn't 100% guaranteed to be publicly viewable. Sometimes
| you can view it but merely scrolling down on the page
| would trigger a login form for example (I've had this
| happen for pages that are definitely meant to be public
| such as businesses who'd have an interest in getting as
| many eyeballs as possible on their content).
|
| I might be wrong and maybe the behavior is actually fully
| deterministic and isn't nefarious, but knowing the
| company behind it I'll assume malice until proven
| otherwise.
| Nextgrid wrote:
| So much bad faith in this press release but not surprising from
| such a disgusting company, with of course some China-related
| fear-mongering despite no evidence of wrongdoing.
|
| > After paying for access to the scraping software, customers
| self-compromised their Facebook and Instagram accounts by
| providing their authentication information to Octopus.
|
| They didn't "self-compromise" their account. They trust Octopus
| to act on their behalf, and unlike Facebook, Octopus' interests
| are most likely more aligned with their users' since their
| service is paid. This is no different from handing your Facebook
| credentials to your social media manager or secretary. There's no
| evidence that Octopus misused this access in any way.
|
| > Octopus designed the software to scrape data accessible to the
| user when logged into their accounts, including data about their
| Facebook Friends such as email address, phone number, gender and
| date of birth, as well as Instagram followers and engagement
| information such as name, user profile URL, location and number
| of likes and comments per post.
|
| This is either information people intend to be public or
| information they trust their friends to keep private. Now if
| Octopus was leaking the private information to third-parties it
| would be one thing, but so far I see no evidence Octopus was
| disclosing the scraped information to anyone but their customer
| (who is already authorized to access it).
|
| > Meta is an industry leader in taking legal action to protect
| people from scraping and exposing these types of services
|
| Translation: Meta is an industry leader in protecting its
| disgusting business model that hinges on putting public data
| behind a walled garden with an unacceptable "privacy" policy.
| There wouldn't be a market for Octopus (or other scrapers) if
| Facebook already allowed customers to efficiently access
| information they're already entitled to, but that would be
| against their interests as their entire business hinges on
| information being held hostage.
|
| They've created a problem, are selling the cure (well in this
| case monetizing it via ads) and are now pissed off that someone
| else is selling the cure for cheaper.
| nicholasjarnold wrote:
| Funny story from the early days of TheFaceBook, probably around
| 2005ish:
|
| I was a webmaster of a set of servers on a major university's
| network. I also had access (enough to run arbitrary programs that
| had pretty much full ingress/egress to the public internet) to a
| number of machines across the campus's network. Through some of
| my coursework and ACM chapter activities I met some other
| similarly minded technical people with similar levels of access.
|
| We decided that it would be fun to use our superpowers (access +
| programming abilities + curiosity) to sign up for various
| accounts on FB and essentially scrape and friend as much as
| possible. At the time they had some rate limiting, some IP
| banning (which wasn't terrible because the Uni gave public IPv4
| addrs to all machines on campus by default) and then added some
| early CAPTCHA which we ended up breaking pretty trivially with
| some python and image recognition code.
|
| Never got sued... :) Never really did much with the scripts or
| data except test that they worked. Fun times.
| Komodai wrote:
| Is it Octopus Data Inc. aka Octoparse they are suing?
| typon wrote:
| Google has turned Google Search into a walled garden by scraping
| people's content and serving it up on their own platter. Is
| anyone going to stand up to them?
| allenleein wrote:
| Ironically, Octopus reminds me of "Octopus VR" in the Silicon
| Valley show.
|
| https://www.youtube.com/watch?v=ltFB4WBdDg4
| mothsonasloth wrote:
| "It's a water animal"
| NelsonMinar wrote:
| Octopus sounds really useful; is there an open source equivalent?
| I'd love to be able to scrape my own data on Facebook. Their data
| export feature is fairly good but far from complete.
| jascii wrote:
| So, Facebook doesn't want to share the data it wants us to share
| with them? Figures...
| [deleted]
| jacooper wrote:
| They are still using the fb.com domain? I thought Meta is not
| Facebook?...
| Silica6149 wrote:
| I think it's like Google vs Alphabet. Alphabet is the parent
| company like Meta.
|
| As for why their domain is facebook for their news site, not
| sure why. It would make more sense for it to be under meta
| instead.
| carride wrote:
| In the early days of FB, they convinced people that pages (or
| some content, sorry I do not know the FB terms) could be public
| for anyone to view without needing to login to FB. This was very
| helpful for small businesses and communities. In many countries
| this is still the quickest place to make a public page. Though
| now, every small business or community page I want to visit is
| locked unless I log in to FB. Even if I do log in, it is impossible
| to copy and paste the important details of a page or post, plus the
| UI is as ugly as it has always been.
| carride wrote:
| I am currently in the USA and when I visit a public FB page
| e.g. [1], there is a small login header, and a very big
| annoying footer login. I estimate 15% of the content is
| blocked. I had spent the past year outside USA until one month
| ago. When I visited the same sites while traveling outside the
| USA, the annoying login footer moves to the middle of the page
| blocking almost all content. I do not have proof at the moment,
| but that was my experience trying to read 95% of government,
| business, and community pages, which are almost all on FB.
| [1] https://www.facebook.com/ParquesNacionalesdeArgentina
| Litost wrote:
| Anyone else heard of Tim Berners-Lee's idea of hosting your data
| in pods outside the relevant corps wanting access to it and you
| controlling what's shared and how? This is such a completely
| different way of doing it, I'm not sure of all the implications,
| be that from admin (how much effort) to security (would this be a
| massive hacking opportunity) etc.
| https://www.theregister.com/2022/01/20/tim_bernerslee/
| dmje wrote:
| Or Facebook could just open up their data. Oh wait, not _their
| data_, silly me. Everyone else's data. Keep on scraping, I say.
| rmbyrro wrote:
| The fact they're wasting time on that is a sign that Facebook's
| decay phase has already started.
| rustdeveloper wrote:
| "This industry makes scraping available to individuals and
| companies that otherwise would not have the capabilities." -
| seems like web scraping companies are doing a good job :)
| theincredulousk wrote:
| Maybe some irony here as IIRC Facebook started as essentially a
| scraping company, pulling student profiles from college
| websites and re-publishing them for their own profit.
|
| The scrapers have become the scrapees. The horror.
| jhoelzel wrote:
| The phone charger makes energy available to individuals and
| companies that otherwise would not have the capabilities. ;)
| dangerlibrary wrote:
| Fingers crossed they eventually get around to suing Clearview AI
| out of existence.
|
| https://www.nytimes.com/2020/01/18/technology/clearview-priv...
| i_have_an_idea wrote:
| > After paying for access to the scraping software, customers
| self-compromised their Facebook and Instagram accounts by
| providing their authentication information to Octopus
|
| "self-compromised" lol
|
| clearly these people just wanted an automated way to access their
| own data
| antonf wrote:
| > clearly these people just wanted an automated way to access
| their own data
|
| GDPR and CCPA (and probably many other national/state privacy
| laws) force facebook/instagram/etc to let you download and/or
| delete your data without using third party websites. Usually
| people self-compromise their accounts in exchange for money:
| https://www.buzzfeednews.com/article/craigsilverman/facebook...
| throwaway5959 wrote:
| Wasn't Meta stealing news articles and not paying news
| organizations for them?
| htrp wrote:
| This is different from LinkedIn v HiQ because HiQ was only
| scraping publicly available data that was generally accessible to
| the broader internet. In these two cases, the data is being
| scraped from FB/Insta using credentials that the client handed
| over or the mass creation of accounts solely for scraping
| purposes.
| postalrat wrote:
| What would your position be if the data being scraped is data
| the site selectively provides to Google for indexing but doesn't
| provide publicly?
| squaresmile wrote:
| Yeah, I think this is more like the Cambridge Analytica
| situation.
| Nextgrid wrote:
| I wish the Cambridge Analytica FUD would stop. CA's "attack"
| was to set up a malicious website that convinced idiots to
| give it access to their Facebook account using the standard
| OAuth2 flow.
|
| Did they misuse the collected data? Sure. But people granted
| access to that data knowingly. This wasn't really an attack
| in my view.
|
| Facebook wasn't really complicit and definitely didn't
| sell/give away any data.
| benwad wrote:
| Did FB ever take any legal action against Cambridge
| Analytica? I can't remember anything about it and this sounds
| very similar to that (although back in those days FB's tools
| made this incredibly easy).
| lesuorac wrote:
| No. FBs ToS at the time [1] allowed CA to do what they did.
|
| Namely, CA didn't resell the data or give it to an ad
| agency.
|
| [1]: https://web.archive.org/web/20180329131546/https://developer...
| Nextgrid wrote:
| > the mass creation of accounts solely for scraping purposes.
|
| Those accounts wouldn't be allowed to view private data though
| unless they friend/follow the person first, so they'll still
| only be limited to data the account holders intend to be
| public and available to anyone.
|
| There's also no evidence that the scraped data was aggregated
| at scale or commingled in any way, so even if customers
| provided their actual credentials which grant them access to
| private data of their friends, the scraper didn't share it with
| anyone else but them.
| upupandup wrote:
| whoa, wasn't there somebody on HN a while back who ran a web
| scraping shop that was boasting they could scrape instagram? are
| these the same guys???
|
| I don't know how far Facebook can get with this; I thought
| LinkedIn's court ruling made scraping legal de facto.
| neya wrote:
| _Evil Big Co._ that literally STEALS people's personal
| information everywhere they go even after they've indicated they
| want to be left alone is now offended when someone does the same
| to them?
|
| Well, color me surprised /s
|
| Fuck Facebook. Meta. Or whatever you want to call it.
| PhilipA wrote:
| >Octopus, a US subsidiary of a Chinese national high-tech
| enterprise, built a cloud-based platform designed to provide
| paying customers access to on-demand scraping software and
| services.
|
| It is interesting how they try to position this as a Chinese
| attack on them.
| upupandup wrote:
| It must coincide with Christopher Wray's sudden claim that
| there is an active dragnet of sorts that is trying to subvert
| America from within, much like the recent election interference
| targeting a former Tiananmen Square activist who tried to run
| for congress, I think.
|
| It makes me think that there are many people on CCP's dole,
| rich powerful famous people are somehow beholden to the CCP in
| some unknown way but we can all guess correctly that they are
| _all old white men_ who have previously been seen with young
| females.
| MangoCoffee wrote:
| it looks like Zack is giving up on the Chinese market.
| romanovcode wrote:
| I guess after Winnie the Pooh refused to name his children
| for him, he got sour grapes about China.
| ok123456 wrote:
| Remember back when facebook grew their little network by scraping
| your gmail contacts.
|
| Google blocked them.
|
| There was animus between the two companies that resulted in
| Facebook not making an official android app until 2010.
| romanovcode wrote:
| > Meta is an industry leader in taking legal action to protect
| people from scraping and exposing these types of services, which
| provide scraping as a service across multiple websites.
|
| Sure, as long as Meta is not the one selling the data to
| Cambridge Analytica it's wrong.
| uhtred wrote:
| Fuck off Facebook you scumbags
| iandanforth wrote:
| Collecting the rhetorical BS:
|
| "scraping attacks"
|
| Scraping is not an attack. Monopolists want to pretend they own
| your data because they get unlimited access to monetize it
| whereas competitors should have none.
|
| "self-compromised"
|
| Monopolists want to sell _you_, thus it's imperative they
| maintain the fiction of "one person, one account". By admitting
| you own your account, they'd have to allow sharing and they
| wouldn't be able to provide their customers (advertisers) with
| reliable data about individuals.
|
| "protect people from scraping"
|
| Monopolists will protect themselves and call it protecting you.
| They will attempt to make you afraid of some _other_ actor using
| your data in harmful ways so as to detract from how they monetize
| you and use your data in harmful ways.
|
| "deter the abuse"
|
| Monopolists don't want to argue about what constitutes abuse.
| Anything they write in their TOS is entirely for their benefit
| and only constrained by local law (if that). They will abuse you
| to the fullest extent they can get away with while arguing that
| any action to use your rights is "abuse."
|
| "safeguard people against clone sites"
|
| Monopolists want to maintain their monopoly, there is no greater
| threat than a direct challenge to that monopoly by allowing data
| to move freely.
|
| --
|
| More subtle but even more ironic rhetorical points
|
| "for hire" / "paying for access"
|
| Emphasizing that people making _money_ (gasp) by providing this
| service is _bad_.
|
| "industry leader in taking legal action" + "across many platforms
| and national boundaries, also requires a collective effort from
| platforms, policymakers and civil society"
|
| Monopolists can pay high priced marketers to rebrand them as
| patriotic hero figures fighting valiantly for the little guy.
| rmbyrro wrote:
| Missed this one:
|
| > _a US subsidiary of a_ "Chinese national" "high-tech"
| _enterprise_
|
| Replacing it with "a business" would do just fine.
| lupire wrote:
| mylons wrote:
| they also toss in the Chinese affiliation in hopes of bringing
| even more ill will from the reader towards the company. China
| is probably doing some bad things, but scraping Facebook ain't
| one of them.
| iandanforth wrote:
| Good point, I missed that one.
| kube-system wrote:
| Scraping social media is something that China is very
| notorious for doing. They are 100% positively scraping all
| major social networks around the world.
|
| They do this to collect information of foreign policy
| interest to them, to silence political dissidents abroad,
| etc.
|
| For example: https://www.washingtonpost.com/national-security/china-harve...
|
| And: https://www.propublica.org/article/even-on-us-campuses-china...
| pr0zac wrote:
| While I agree with your assessment of the BS in the article wrt
| scraping, and also agree with your assessment that the
| behaviour is completely about FB protecting itself and its
| monopoly control (the word control being important), I think
| it's important to emphasize it's not about FB caring whether
| other entities have access to the data; it's about FB caring
| about its public perception with regard to its having that
| data at all.
|
| Over the last few years or so it feels like, to reference a
| @dril tweet[1], Facebook has just been 'turning a big dial taht
| says "data access" on it and constantly looking back at the
| audience for approval like a contestant on the price is right'
| with how much it allows 3rd parties to get at its data.
|
| Keep in mind ~5 years ago the big thing at FB was "Open Graph"
| and "Graph Search" which gave everyone really in-depth access
| to their data with the idea that Facebook would be the "data
| platform" on top of which all of these 3rd parties would build
| apps and interfaces. This of course eventually resulted in the
| whole Cambridge Analytica thing and now this gigantic swing in
| the other direction of being overly protective of the data as a
| kneejerk PR reaction to all the bad press.
|
| FB loved sharing data and provided a direct API for accessing
| it when the public narrative was about data freedom and 3rd
| party developer friendliness, and it hates giving any access at
| all and goes around suing web scrapers now that the public
| narrative is all about privacy.
|
| Facebook will happily align itself in whatever way results in
| the least public outcry from people arguing it shouldn't be
| allowed to have the data in the first place, regardless of
| whether that means giving access or restricting it.
|
| 1: https://twitter.com/dril/status/841892608788041732
| noslenwerdna wrote:
| The users agreed to share their data with Facebook, not some
| other company. If they didn't prevent this, they'd be asking
| for another Cambridge Analytica
| greatgib wrote:
| The user agreed on Facebook to make his data "public", so he
| can't complain that a robot scrapes it.
|
| Nothing prevents him from restricting access to his pages and
| data to "trusted" friends.
| kube-system wrote:
| The description in the article sounds like it scrapes
| private profile data.
|
| > Octopus designed the software to scrape data accessible
| to the user when logged into their accounts
| greatgib wrote:
| I don't think so, it is more like you scrape what is
| accessible to this user. So in the end you will scrape
| your friends data. This is why I said that you are free
| to only share with friends that 'you trust'.
| Kwpolska wrote:
| Were they showing the private data to everyone, or just
| to the person whose account was used for the scraping? If
| it's the latter, then this is also not a crime, it is
| just someone accessing data they have been authorized to
| access, but in an automated way.
| stickfigure wrote:
| The users agreed to share their data with everyone that uses
| Instagram. Because that's how the site works.
| kube-system wrote:
| There's an important difference between technically
| consenting and informed consent.
|
| Given what I know about the bot problem on Instagram, I
| would imagine many people have been tricked into sharing
| their private profiles with scraping bots. Many bots are
| copying real people's profiles and then spamming their
| friends with follow requests. It's highly effective and
| gives these bots access to private profiles.
|
| Fooling people is fraudulent, period.
| jasfi wrote:
| That is a very good point, but surely it was taken into
| consideration when scraping was declared legal?
| danuker wrote:
| https://techcrunch.com/2022/04/18/web-scraping-legal-court/
| stefan_ wrote:
| All that case says is "scraping is not a violation of the
| CFAA". But of course the scraped data still exists in legal
| limbo; maybe you can compute derived information from it,
| but the moment a scraper _reproduces_ it there is all of
| copyright law waiting for them.
| TechBro8615 wrote:
| Indeed. It's the height of hypocrisy for a company to define
| the borders of its own system and then prosecute those who they
| consider in violation of them. There is no consideration given
| to whether the data should have been collected and retained by
| Facebook in the first place, regardless of whatever arbitrary
| access policies they defined to fit their own business and data
| model.
|
| It's not clear what Facebook's position on scraping truly is.
| Sometimes they downplay it as "normalized and widespread," and
| other times they castigate it as inexplicably legal and clearly
| immoral, or even outright "in violation of state and federal
| law." For example:
|
| - April 2021. Researchers find an exposed database containing
| the scraped data of 533 million facebook users. Some news
| reports refer to it as a "breach." Facebook attempts to
| downplay the issue as the result of third party scraping.
| Headline in ZDNet: "Internal Facebook email reveals intent to
| frame data scraping as 'normalized, broad industry issue'" [0]
|
| - October 2020. Facebook announces lawsuits against companies
| it claimed created a "malicious extension on Google's Chrome
| Web Store designed to scrape Facebook, in violation of
| Facebook's Terms and Policies and state and federal law." [1]
|
| So... which is it? Does Facebook believe that scraping is a
| "broad, normalized industry issue?" Or is it a violation of
| "state and federal law?" It seems like they measure the severity of
| its impact primarily based on the reactions of political
| commentators.
|
| And what's the difference between automating a browser and
| automating an API client? Why did Facebook design an API for
| accessing the data they collected, if it's illegal to collect?
| They've even claimed to be the victim of Cambridge Analytica,
| who purchased a "quiz" application created by a developer who
| pieced it together using code straight from the "examples"
| section of Facebook's API documentation.
|
| There is one obvious resolution to this apparent contradiction.
| If we remove Facebook from the question, then the contradiction
| resolves itself. All we need to do is stop presuming that
| Facebook has the right to collect and retain this data in the
| first place. And as a user, if you publish your data to a
| website designed for sharing it with other people, then by
| definition it is no longer private data. Therein lies the
| central question: what is "semi-private" data, and who controls
| its boundaries?
|
| [0] https://www.zdnet.com/article/facebook-internal-email-reveal...
|
| [1] https://about.fb.com/news/2020/10/taking-legal-action-agains...
|
| p.s. another thing they never mention is _why_ companies want
| to scrape lists of facebook users. perhaps it might have
| something to do with the "lookalike audience" feature, and its
| more precisely targetable predecessors, which allow advertisers
| to upload a list of usernames and email addresses for targeted
| advertising?
| utahcon wrote:
| The only argument I have here (sadly in favor of FB) is with
| "safeguard people against clone sites". While I did give my
| data to FB, I didn't approve that transfer to another
| site/system. That is the only place I could possibly see some
| legal foothold.
| kbenson wrote:
| It's impossible to control information once it has been created.
| The longer it has existed and the more places you can see it,
| the exponentially more likely it is to spread.
|
| Whether we make that spread of information legal or not does
| little to affect whether it happens.
|
| There are two things that might help. First, don't share as
| much information. Once it's no longer limited to you or your
| close group of friends which hopefully won't share it along
| with your name, it's mostly out of your control. Second, put
| limits (laws) on what information companies are able to
| synthesize about you, and how long they can retain it. If
| there's less information created about you (or it's
| ephemeral, created and destroyed as needed), and if they need
| to clean out older data, there's less to be shared or stolen.
| kube-system wrote:
| "It's hard to enforce the rule of law" is not a good reason
| to abandon it entirely. Data privacy laws make data privacy
| better even without being 100% infallible.
|
| We should be both practicing good data hygiene _and_ using
| legal tools to combat those who abuse data privacy.
| kbenson wrote:
| > "It's hard to enforce the rule of law" is not a good
| reason to abandon it entirely.
|
| I didn't?
|
| > We should be both practicing good data hygiene and
| using legal tools to combat those who abuse data privacy.
|
| That's what I said. The first thing is data hygiene, the
| second is legal requirements. The difference I think is
| that the legal requirements should be on the actual
| creation and retention of the data, not just who owns it,
| who it can be shared with, etc.
|
| As soon as PII over a certain age is
| radioactive and linked to a fine _per person_, all of a
| sudden there'll be a lot fewer giant repositories of PII
| to worry about.
| asdff wrote:
| What happens when FB builds a shadow instagram profile of you
| based on your FB account? That already happens. FB clones
| their own data for other projects no different than what you
| might fear happening if this data were cloned to a third
| party. The cat is out of the bag already but FB wants to
| pretend they are the only ones with the right to abuse.
| blantonl wrote:
| coffeeblack wrote:
| And that's the trick. You use the bad apples to delegitimise
| the good ones. Works every time.
| dhzhzjsbevs wrote:
| > the vast majority of Web scraping efforts are to build
| businesses on top of other organizations hard work and
| innovation. Period. End of story.
|
| Yeah and the vast majority of the internet and all these mega
| corps run on open source while paying pittance back to the
| ecosystem. Cry me a fuckin river.
|
| Can't wait till someone sues them for "scraping" their site
| for web previews and thumbnails every time someone shares a
| link on Facebook.
|
| The double standard of these muppets.
| trasz wrote:
| >the vast majority of Web scraping efforts are to build
| businesses on top of other organizations hard work and
| innovation
|
| The vast majority of Facebook/Google's efforts are to build
| businesses on top of other organizations' hard work and
| innovation.
| jjoonathan wrote:
| lcnPylGDnU4H9OF wrote:
| If simp is supposed to be short for simpleton, you might
| want to consider how simple your thoughts are.
| [deleted]
| DaveFr wrote:
| It's not, see
| https://www.urbandictionary.com/define.php?term=Simp
| lcnPylGDnU4H9OF wrote:
| I can also link to a source that's going to be biased in
| my favor: https://www.etymonline.com/word/simp
| jjoonathan wrote:
| > 1903
|
| > 1640s
|
| Lol, no. I'm using the definition from _this_ century:
|
| > Someone who does way too much for a person they like
| EGreg wrote:
| What a pretty picture capitalism is.
|
| "Give us all your data for free." "They 'trust me', dumb
| fucks."
|
| https://www.esquire.com/uk/latest-news/a19490586/mark-
| zucker...
|
| _Proceeds to build entire business on this data..._
|
| "You can't scrape us!"
|
| LinkedIn tried this:
|
| https://www.zdnet.com/google-amp/article/court-rules-that-
| da...
|
| And it's not like capitalist enterprises even try to be
| consistent in their legal complaints:
|
| https://9to5mac.com/2022/04/14/apple-calls-out-meta-for-
| hypo...
| smt88 wrote:
| I don't sympathize with a monopoly that people are trying to
| weaken.
|
| I loathe Meta and want to boycott it. Unfortunately this
| means I'm now locked out of the only repository of most local
| events and gatherings in my city.
|
| In some countries, life is literally not possible without
| WhatsApp.
|
| If Meta wants to cry about the mean bullies trying to
| exfiltrate data, they need to stop wiping out competitors.
| latexr wrote:
| > the vast majority (...) Period. End of story.
|
| If you're going to assert something as definitely true to the
| point of closing off discussion, I'd expect a modicum of
| evidence. At a minimum that you'd explain the reasoning
| behind your conclusion. What's the source of the "vast
| majority" claim? There's little reason to advertise that you're
| scraping a website for personal consumption, so it
| seems dubious anyone would have reliable numbers on which
| kind is more prevalent.
| sneak wrote:
| > _the vast majority of Web scraping efforts are to build
| businesses on top of other organizations hard work and
| innovation._
|
| Not really. Scraping just gets data, not code, so it's hard
| to support this argument. The anti-scraping view is that the
| right to use the data rests with the company that collected
| it, but I don't think that view is held by most people.
| blantonl wrote:
| If you are arguing that an organization's data is worthless
| but only their code has worth, then I'm not quite sure
| where to go from this point in this discussion, other than
| to say _that is crazy_.
| PeterisP wrote:
| The data is obviously valuable, but they don't
| necessarily deserve a monopoly on that data, since that
| data primarily belongs to the users who created the data;
| so while it's understandable that organizations want to
| restrict that data, we have no obligation (moral or
| otherwise) to respect that desire.
| jjoonathan wrote:
| Exactly. Your list of friends does not belong to
| Facebook, it belongs to you.
|
| I am sure Facebook believes they deserve a monopoly for
| having obtained it first. They do not. The market forces
| _you_ to compete for every dollar you earn, so you have
| every right to expect Facebook to compete for every
| dollar _they_ earn, and "I touched it first therefore
| it's mine!" is not competition.
| blantonl wrote:
| But, but, but..... you _agreed_ that Facebook _does own
| your friends list_ when you signed up for an account and
| started giving them all your data.
|
| If I run a restaurant, and I stipulate that when you walk
| through the doors and place an order I reserve the right
| to take your picture and post it on the bulletin board,
| why would you place the order and then get pissed off
| when I post a picture of you on the bulletin board? And
| why would you be mad at me if I stipulated that no one
| else can use a camera in my restaurant? Terms of service,
| my friend. Unless prohibited by legislation, I can
| stipulate how things run in my restaurant.
| jjoonathan wrote:
| If your bulletin board somehow let you monopolize the
| restaurant industry (? lol) then we should absolutely
| vote for some politicians to boot your entitled ass back
| into competition.
|
| Obviously, the idea of a bulletin board granting a
| restaurant an effective monopoly is ridiculous so your
| analogy is trash, but even if your analogy wasn't trash,
| your conclusion would still be wrong.
| sneak wrote:
| I'm not saying that the data isn't valuable, but that
| possession of the data, valuable though it may be, is not
| related to the organization's hard work or innovation.
| For the most part, any control rights to the data likely
| rest or should rest with the people who provided it to
| the company.
|
| Meta claiming that all of the photos on Instagram are
| Meta's property does not comport with current IP law or
| the views/opinions of most of the users on Instagram who
| do own the copyrights to those photos.
|
| You really shouldn't be able to sue anyone for use or
| copying of data to which you do not hold copyright. The
| stuff on FB is licensed to FB by the people who own it
| (their users).
| kordlessagain wrote:
| I've reread the previous comment and I really don't see where
| there is any justification stated for acting in an unethical
| manner. While Facebook may be making an argument against
| unethical behavior by a few, using the language they do is
| detrimental to legitimate uses of crawling content available
| on the Web.
|
| Corporations, by nature, work in a way that individuals at
| those companies don't. They are literally "non-corporal"
| entities and work toward increasing profit and stakeholder
| value, not improving the lives or situations of their users,
| unless that happens to correspond to making them more money.
|
| We should all be wary of corporate control and claim to
| rights built from their user base, especially if those
| services are offered for "free".
| [deleted]
| blantonl wrote:
| _We should all be wary of corporate control and claim to
| rights built from their user base, especially if those
| services are offered for "free"._
|
| That's fine then. And I agree with you. But I'll leave you
| with this.
|
| Do. Not. Give. The. Company. Your. Data.
|
| _They are literally "non-corporal" entities and work
| toward increasing profit and stakeholder value_
|
| Again, I agree. But if you think this is a bad thing, then
| you don't believe in capitalism, and I'm not quite sure
| what the point is of arguing this on a platform
| (HN) that encourages the most basic forms of capitalism -
| starting up companies with innovative technology and
| solutions.
| EGreg wrote:
| What a pretty picture capitalism is. Break out the
| popcorn for the latest regular installment of "ok for me
| but not for thee":
|
| People You May Know employs tons of shady stuff Facebook
| doesn't reveal, and it saved their bacon early on, when
| they were stagnating at around 100M users.
|
| https://mashable.com/article/people-you-may-know-
| facebook-cr...
|
| Facebook Beacon and others caused a big outcry. They got
| hauled into Congress multiple times. And of course
| whenever they get caught, they always throw a "mea culpa"
| and do it all over again in a year under a different
| name. Here they are secretly recording their users' faces
| using camera permissions!!
|
| https://www.independent.co.uk/tech/facebook-app-
| recording-ca...
|
| Their entire business model is "Give us all your data for
| free." Mark Z early on was flabbergasted himself when he
| realized he no longer needed to scrape Harvard's house
| websites and could just ask people to submit the data
| for each other: "They 'trust me', dumb fucks."
|
| https://www.esquire.com/uk/latest-news/a19490586/mark-
| zucker...
|
| _Proceeds to build entire business on this data..._
|
| BUT THEN. Someone else does it to them and they get mad.
| "You can't scrape us!" LinkedIn tried this:
|
| https://www.zdnet.com/google-amp/article/court-rules-
| that-da...
|
| And it's not like capitalist enterprises even try to be
| consistent in their legal complaints:
|
| https://9to5mac.com/2022/04/14/apple-calls-out-meta-for-
| hypo...
| mechanical_bear wrote:
| The problem isn't "capitalism", it's crony-capitalism
| enabled by certain elements of state complicity.
| EGreg wrote:
| Okay, is there a single problem with capitalism, or is it
| perfect? Is the problem never with capitalism?
| jjoonathan wrote:
| Yeah, nothing says "commie" like trust busting and
| keeping markets competitive.
| ramses0 wrote:
| Cough, cough, Google, cough, cough...
|
| I'm not ashamed to admit that I've done some jQuery
| shenanigans on my Facebook friends page to "export" my friend
| list so I can retake control of my friend relationships
| (disintermediation for the in-crowd).
|
| So easy to push data in to Facebook, so hard to get even
| basic data out of it.
| EGreg wrote:
| Pretty ironic that Mark Z himself started out exactly like
| this: scraping Harvard servers and photos to power Facemash.
|
| He subsequently realized that he doesn't need to scrape if he
| can just make a viral site that lets people share this info
| with each other while he can eavesdrop on ALL OF IT:
|
| https://www.esquire.com/uk/latest-news/a19490586/mark-
| zucker...
| ConstantVigil wrote:
| > I love to hate on Meta, but their actions here are spot on
| and make my morning very enjoyable as I sip my cup of coffee.
|
| You might want to reassess your intelligence there, friend. It
| seems to be suffering from a common form of cognitive
| dissonance combined with some form of confirmation bias.
|
| How so?
|
| Well you clearly don't like scraping, otherwise you wouldn't
| be agreeing with a criminal... So there's the confirmation
| bias...
|
| Which is also the cognitive dissonance part. You clearly
| don't like Meta/Zuckerberg by your own admission, but you are
| agreeing with an empty rhetorical attack against people who are
| smart enough to make use of Zuckerberg's terrible security
| practices...
|
| Do you not see the problem in this?
| pc86 wrote:
| Who is the criminal here? Scraping is not illegal. This is
| a civil suit, so even if Meta wins, it's still not remotely
| criminal for anyone involved.
|
| Also please explain to me how someone giving a company
| their Facebook credentials is an example of "people who are
| smart enough to make use of Zuckerbergs terrible security
| practices."
| blantonl wrote:
| This is a total non-sequitur argument here. You've gone
| from accusing me of lack of intelligence to suffering from
| cognitive dissonance and confirmation bias, to Facebook's
| terrible security practices: simply because I'm pleased
| that an organization has taken action against Web scrapers
| for violation of Terms of Service.
|
| Yes, I've gone on record indicating that I believe Web
| scraping to be generally unethical, and that I'm pleased
| that some action was taken against those that make it their
| business to do so. And that is all that I have stated in my
| OP. You've decided to take me on some circular mental
| gymnastics journey I'm still trying to wrap my head around.
| windexh8er wrote:
| Let me restate how I understand what you've stated: your
| position is that because Facebook's Terms of Service
| prohibit something that is not illegal, one must abide
| by it? Also...
| Facebook/Meta/Zuckerberg have lied over and over and over
| very publicly to get their way or to give themselves an
| advantage: by giving themselves unfettered and
| unwarranted access to data that they profit from by their
| own fast and loose rules.
|
| If Facebook/Meta/Zuckerberg are OK with lying, stealing
| and cheating - then why should anyone leveraging their
| online properties need to abide? Until they're held
| accountable under broader rules I see no reason the
| consumption side can't bend them as well. And you may
| argue "this isn't how it works" but we all know this
| isn't how Facebook/Meta/Zuckerberg operate. They operate
| under the premise of: do whatever makes us money because
| breaking the rules is the cost of doing business. So, no
| - they don't get to spew propaganda to the advantage of
| their business under the guise of protecting users. That
| is complete and utter bullshit.
| hdjjhhvvhga wrote:
| I disagree precisely for the simple reason that these
| businesses are using Meta's weapon against them. It will be
| an interesting battle to watch - and if my memory doesn't
| fail me, LinkedIn lost one already. The more the press writes
| about it, the better: (ordinary) people will sooner or later
| see through their doublespeak and realize what is at stake.
| cmiles74 wrote:
| In my opinion, breaking a click-through license agreement or
| violating the small print on some dense and difficult to read
| web page is hardly an issue of morality or ethics.
|
| Let's also remember that a big reason Meta is hating on
| scraping is because of their own problematic behavior. It
| wasn't so long ago that they were suing NYU over research on
| political ads and how Facebook targets their readers.[0] In
| fact, it wouldn't surprise me if Meta's larger goal is to
| prevent this sort of research.
|
| [0]: https://news.bloomberglaw.com/privacy-and-data-
| security/face...
| 14 wrote:
| Sorry, but there are many legitimate reasons to scrape a
| website. Detecting price manipulation is one example. Because
| of scraping we know Amazon does things like price gouging and
| raising prices right before they go on "sale". Scraping can
| be very useful for researchers to monitor trends and find
| correlations. It's not just about bad guys stealing personal
| information. There are so many legitimate uses that
| banning scraping would be a bad thing.
| basetwojesus wrote:
| Regardless, it's very rich that a company like meta is mad
| that they're being beat at their own game (making money off
| of data that they obtained through shady means).
| matthewmacleod wrote:
| Nah, you are straight-up wrong. In fact, it's the opposite -
| the only companies who are scared of scraping are the ones
| whose business models rely on artificial lock-in, and we
| should all be working as hard as we can to demolish them.
| dylan604 wrote:
| >the only companies who are scared of scraping are the ones
| whose business models...<snip whatever other nonsense
| followed>
|
| This is just patently false. There is an expense incurred by
| scraping, and the host providing the data gets no benefit
| from those scrapers. My logs are full of various bots that
| pull data from my webhost that costs me money to serve. I
| run various sites that do not serve ads. I do not include
| any 3rd party tracking. They're just simple sites that I
| pay for out of my own pocket because that's what I've chosen
| to do. Nothing shady about any of it.
|
| It's just sad that your own personal feelings towards
| scraping prevents you from being able to accept that there
| are people with views other than your own.
| matthewmacleod wrote:
| Hey, I totally accept people have views other than my
| own. I just disagree with them.
|
| It seems extremely weird that you'd want to publish
| content, but then get mad that people are using the thing
| that you published. But you do you.
| dylan604 wrote:
| How is that weird? I publish on my site to have people
| visit my site. I don't publish for people to take my data
| and do what they will without attribution for where they
| got the data. That this makes no sense to others has me
| saying: please don't "do you", because you are being
| inconsiderate to others.
| jjoonathan wrote:
| It's wild that people are arguing that their friend list
| should belong exclusively to facebook and not, you know, to
| them and their friends.
| dylan604 wrote:
| I feel the same way. My biggest pet peeve is that the
| scrapers/bots traversing my site generate more traffic than
| my actual target audience of users. The scrapers get all of
| this data for "free", while I pay the hosting costs to
| provide them that "free" data.
| macinjosh wrote:
| Google search's business model is scraping the web, indexing
| it, and then pasting ads all over their search results made
| up of other people's content. If Google can build a business
| on third-party data then these meta scrapers can do the same
| thing.
|
| It is like saying a photographer can't photograph a building
| from the street because she doesn't own it. The building is
| there, taking a picture takes nothing from the building. That
| is all that is going on here, repeating publicly available
| information.
| injidup wrote:
| No it's more like you subletting an apartment to a dodgy
| photographer who wants to take pictures of the children's
| playground your back window looks out on even though your
| contract explicitly forbids subletting. The suit is
| against companies that use login credentials that are not
| theirs. It is not public information that is being scraped.
| It is information behind a login with a terms of service
| for what you are allowed to do with that login.
| nathanaldensr wrote:
| Great post that summarizes exactly what I feel about
| globocorps. The euphemisms and propaganda are disgusting.
| lupire wrote:
| fxtentacle wrote:
| Of course, Facebook wants to make it sound like scraping is
| illegal, when it generally isn't.
|
| But account hijacking and mass-creation of accounts just to
| access private pages are clear violations of the Facebook and
| Instagram ToS, so they surely can sue for that.
| crawsome wrote:
| [deleted]
| dementiapatien wrote:
| Since when do you get sued for breaching TOS?
| thallium205 wrote:
| Since when do you get sued for breaching a contract? When the
| offense is worth it.
| curiousllama wrote:
| Since you start a business on the violation.
|
| "Since when do I get sued for taking too many free samples
| from Costco?" -> "Since you started taking millions of them
| to resell"
| jhoelzel wrote:
| I'm not sure about American law, but if you give me those
| samples willingly, I can do whatever I want with them.
|
| Actually, this is the reason why many products come with the
| label "not for resale", but I have yet to find somebody who
| cares about it :D
| treis wrote:
| >give me those samples willingly
|
| Doesn't seem like Facebook is giving them willingly.
| golemotron wrote:
| You can get sued for anything that causes harm.
|
| Relevant life lesson: don't do things to people with money
| that they might perceive as harm.
|
| Corollary: Being sued is as much punishment as losing a suit
| for most people.
| contravariant wrote:
| I don't know but it's at least been that way since Aaron
| Swartz did it I suppose.
| Raed667 wrote:
| Violation of ToS does not mean a violation of the law.
| CoastalCoder wrote:
| I don't think I know the answer, but I'm curious:
|
| Does violating a website's TOS mean you're accessing it beyond
| your authority, making it a violation of the US's Computer
| Fraud and Abuse Act?
| zja wrote:
| Violating the TOS, no; gaining access beyond your authority,
| maybe: https://www.eff.org/deeplinks/2010/07/court-
| violating-terms-...
| CoastalCoder wrote:
| I was assuming that in this case, a person's authority
| was specifically granted _by_ the ToS.
|
| I wondered if the interplay of those two concepts muddied
| the waters.
| danaris wrote:
| I don't have a source for this, but my recollection is that
| this has been successfully argued by a couple of companies
| --but then an appeals court found very firmly that it was
| _not_ the case.
|
| Essentially, having that be true would mean that any given
| website could create whole new classes of criminal
| behavior.
| zinekeller wrote:
| > having that be true would mean that any given website
| could create whole new classes of criminal behavior.
|
| While this is true, reading the lawsuit it is clear that
| Meta is suing in civil court, so maybe they're trying to
| enforce their contract, especially their automated
| collection ToS
| (https://www.facebook.com/apps/site_scraping_tos_terms.php)?
| tumult wrote:
| Not a violation. Decided by the Supreme Court in 2021: Van
| Buren v. United States. It was a big deal.
| closewith wrote:
| Most law suits aren't due to breaches of the law, but
| breaches of contract. Whether terms of service constitute an
| enforceable contract is another matter.
| jhoelzel wrote:
| If a bot creates the account, who breaches the contract?
| sneak wrote:
| The person who ran the bot. Programs do not have agency,
| they are just tools.
|
| That's like saying "If the gun fires the bullet, who is
| liable for murder?" It's a silly question.
| CSMastermind wrote:
| > That's like saying "If the gun fires the bullet, who is
| liable for murder?" It's a silly question.
|
| I don't know I've seen several people unironically argue
| that it should be the gun's manufacturer.
| bee_rider wrote:
| Software that exclusively has illegitimate uses has been
| shut down. Whether we agree that it is a good argument or
| not, it is definitely _an_ argument people have made
| (that some types of guns are mainly designed to hurt
| people).
|
| With software of course it is a little complicated
| because:
|
| * it can be produced really easily in a distributed
| fashion over the internet by anonymous people in many
| jurisdictions, so there isn't always an obvious company
| or entity to sue
|
| * most automation tools can be repurposed for malicious
| use (nobody would sue John Deere because their tractors
| can be armored and turned into pseudo-tank things)
| lesuorac wrote:
| Probably should also add "successfully"; there's a reason
| NYPD had/has guns that require 12 pounds of force to pull
| the trigger (instead of a normal ~5 lbs).
| adamsmith143 wrote:
| ToS have been around for decades, surely this question is
| settled by now?
| marlowe221 wrote:
| Former attorney turned software developer here!
|
| Nope, it's not a settled question in the way that I think
| you mean. Each ToS is different so each would be subject
| to individual legal analysis in court on its own terms.
|
| Questions would include whether the ToS is
| unconscionable, whether the terms violate laws of the
| locality/nation, and so forth.
|
| It's the same with traditional contracts - the fact that
| contracts have been around for hundreds (maybe thousands)
| of years doesn't mean much if you and I create a brand
| new one between us. Our contract's specific terms (and
| events/actions between us as a result) would be the issue
| in court.
| kaivi wrote:
| Why can't FB simply include a clause like "No kind of
| automated scraping is allowed, except for search engines
| in robots.txt"? This would save them so much time in
| court, arguing over the use of fake accounts which should
| really be irrelevant.
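| A robots.txt policy along the lines kaivi describes (named
| search-engine crawlers allowed, everyone else disallowed) can be
| sketched and checked with Python's standard urllib.robotparser.
| The crawler names and URL below are illustrative assumptions, not
| Facebook's actual file, and robots.txt only states a preference;
| it does not by itself make such a clause enforceable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the named search-engine crawlers may fetch
# everything, every other user agent is disallowed. Names are illustrative.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("Googlebot", "Bingbot", "MyScraper/1.0"):
        # Prints True, True, False respectively.
        print(agent, rp.can_fetch(agent, "https://example.com/profile/123"))
```

| Groups are matched on the part of the user-agent string before the
| "/", so "MyScraper/1.0" falls through to the catch-all `*` group
| and is refused. A scraper is free to ignore all of this, which is
| exactly why the enforceability question matters.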
| closewith wrote:
| It's not clear that clause would be enforceable. Scraping
| has been found to be lawful in many jurisdictions,
| including the US, even without the consent of the host.
| adamsmith143 wrote:
| So even the general question of "Whether terms of service
| constitute an enforceable contract" depends on each
| individual ToS?
| marlowe221 wrote:
| Congress or a state legislature could pass a law that
| says "No terms of service are ever enforceable" but to my
| knowledge no one has done that.
|
| So, under the current state of the law whether or not a
| contract is enforceable depends entirely on what the
| terms in that specific contract are.
|
| Unfortunately, this is yet another instance where the law
| has failed to keep up with technology. Contract laws (at
| least in the USA) date back long before anyone ever
| dreamed up the idea of a EULA or ToS. Our laws
| contemplate two or more parties with roughly equal
| bargaining power sitting down and hashing things out, and
| go from there.
|
| Laws based on that assumption are a pretty poor fit for a
| world filled with EULAs and ToS but it's what we are
| stuck with at the moment.
| stonemetal12 wrote:
| That is why they are suing rather than pressing charges. When
| someone steals your car you don't sue them you press charges.
| When someone doesn't uphold their end of a contract you don't
| press charges you sue for breach of contract.
| sneak wrote:
| "pressing charges" isn't a thing.
| onionisafruit wrote:
| It is a thing. In America pressing charges is when you
| accuse somebody of a crime and ask a prosecutor to bring
| criminal charges against them.
| sneak wrote:
| Prosecutors exclusively decide who is charged. No charges
| can be "pressed" by a victim.
| onionisafruit wrote:
| Yes, in most cases it is the prosecutor's discretion
| whether to bring a case to a grand jury, but that isn't
| what pressing charges is. See Merriam Webster's
| definition[0].
|
| [0] https://www.merriam-
| webster.com/dictionary/press%20charges
| stonemetal12 wrote:
| As far as I am aware it isn't a specific thing, but a
| general catchall term for going through the process of
| filing a criminal complaint, and seeing it through to
| completion. Maybe there are better words for it, but
| "pressing charges" is what they use on TV, so it is top of
| mind.
|
| In general I meant there is a difference between criminal
| and civil law, and suing generally refers to civil not
| criminal law.
| compsciphd wrote:
| in reality, you as an individual can't press charges. Only
| the state can. And many times the state chooses not to. You
| can sue in civil court, but individuals can't bring cases
| in criminal court.
| closewith wrote:
| Many countries do have the concept of private criminal
| prosecutions.
| onionisafruit wrote:
| You are confusing pressing charges and indictment.
| Pressing charges just means you accuse somebody of a
| crime and "press" the prosecutor to indict them. So the
| state does have the ultimate say on who is prosecuted,
| but that doesn't mean you can't press charges.
| [deleted]
| cosmiccatnap wrote:
| I would consider this appropriate if one of the largest
| scraping offenders weren't the one pretending to be offended.
| HeckFeck wrote:
| Data harvesting is moral for me, but not for thee.
| mateuszbuda wrote:
| In general I agree that harvesting _public_ data is moral. I
| think that in these particular cases it's: 1) extracting data
| from profiles that opted for not being public (only available
| to logged-in users) and 2) reposting scraped data (publicly?)
| as belonging to the guy who scraped it, without users' consent.
| lolinder wrote:
| I agree with the moral argument against posting the scraped
| data publicly, but if someone gave my account access to their
| data, I don't think they have a _moral_ right to say I can't
| use a script to do something private with it.
|
| Scripts are tools, and like any tool they're extensions of
| the self. If it's morally okay to do it by hand, it's morally
| okay to do it with a script, so long as my script is
| respectful of server resources.
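| As a minimal sketch of what "respectful of server resources" could
| mean in practice, the Python class below enforces a minimum delay
| between requests to the same host. The class name PoliteThrottle
| and the intervals are illustrative assumptions, not any established
| standard; the clock and sleep functions are injectable so the
| logic can be checked without actually waiting.

```python
import time
from urllib.parse import urlsplit


class PoliteThrottle:
    """Enforce a minimum interval between requests to the same host.

    Illustrative sketch: the default interval is an arbitrary choice.
    """

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock    # returns the current time in seconds
        self.sleep = sleep    # blocks for the given number of seconds
        self._next_ok = {}    # host -> earliest time the next request may go out

    def wait(self, url):
        """Sleep until url's host may be contacted again; return seconds waited."""
        host = urlsplit(url).netloc
        now = self.clock()
        delay = max(0.0, self._next_ok.get(host, now) - now)
        if delay > 0:
            self.sleep(delay)
        # The request goes out at now + delay; the next one must wait min_interval.
        self._next_ok[host] = now + delay + self.min_interval
        return delay


if __name__ == "__main__":
    throttle = PoliteThrottle(min_interval=0.5)
    for path in ("/a", "/b", "/c"):
        waited = throttle.wait("https://example.com" + path)
        # A real scraper would issue its HTTP request here.
        print(f"{path}: waited {waited:.2f}s")
```

| The delay is tracked per host, so a script touching several sites
| only slows down where it would otherwise hammer one server.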
| upupandup wrote:
| Instagram behind a login screen is public. If you were, say, an
| OnlyFans model and somebody paid for your videos and then
| scraped them, there would have been an implicit agreement.
|
| When you share photos on Instagram, there is no such
| understanding; news outlets have been logging in to view and
| publish your Instagram photos.
| adolph wrote:
| The state of "opted for not being public" and 'available to
| any system authenticated person' seem contradictory.
|
| I appreciate that 'system authenticated person' is a smaller
| set than those who can access anything publicly accessible,
| and that the former is a subset of the latter.
| trasz wrote:
| If they are being harvested it makes them public by
| definition. Unless there was a break-in.
| kordlessagain wrote:
| Facebook has hidden much of Instagram's content behind
| logins, so that makes most of it "not public".
|
| At the same time, I don't think all of Instagram's users care
| if their images are hidden, or not.
|
| It's quite unfortunate Facebook/Meta is using hostile
| language and the word "scraping" together in this case.
| Scraping is a legitimate process used by various business
| models to gather information from the Web, which itself was
| originally intended to be an open forum for people to share
| content.
|
| Hostile business models have corrupted that intent and turned
| it into a competitive environment that is harming users and
| legitimate models which may not have the funding larger
| corporations can muster.
|
| I have a "scraper" I've built that will either snapshot a
| page from a user's browser or crawl it remotely with
| Selenium/Firefox, on the user's behalf, to save the content
| in an index for searching later, by that user. It's not
| automated, nor does it parse and crawl URLs in the pages
| saved. It doesn't use page content in a wider context,
| either.
|
| I've spent a significant amount of time trying to "work
| around" anti-scraping efforts by various companies and it's
| frustrating to see hostility instead of cooperation in
| certain types of use.
| car_analogy wrote:
| > Facebook has hidden much of Instagram's content behind
| logins, so that makes most of it "not public".
|
| 1) It was public when the content was posted by its
| authors. Facebook locked it down retroactively, regardless
| of the author's intent.
|
| 2) A login requirement doesn't make it non-public, if
| making an account is trivial, and there are already
| hundreds of millions of accounts. Is the plot of Avengers:
| Endgame also not public, because it's locked behind a
| ticket purchase or subscription?
| Alex3917 wrote:
| > extracting data from profiles that opted for not being
| public
|
| The tool lets you download the contact info of your friends,
| which you should be able to do anyway. In fact Facebook tries
| to trick its users into thinking they can do this with their
| data takeout option, but the downloaded files don't actually
| include any of the contact info for your contacts. Which
| makes zero sense, considering the entire point of Facebook is
| that it's a digital rolodex for storing your friends' contact
| info.
| slightwinder wrote:
| From the article, it seems to be a service for scraping data
| you have access to anyway. As long as they only hand that
| data to the requesting customer, whose login they used, I
| don't see a difference between the general public and this
| user's personalized "public". If access is still limited to
| the people who have the access rights, then I don't see a
| difference between accessing through the official interface
| or via scraped data.
| saddlerustle wrote:
| Users make information available on facebook with the
| expectation that they are able to later control access to
| it (other than the obvious threat model of screenshotting,
| etc). This is violating that expectation and thus their
| privacy.
| Nextgrid wrote:
| There's no evidence of the accused scraper sharing the
| scraped data with anyone but the account-holder, so the
| privacy of their friends is still protected.
| falcolas wrote:
| > they are able to later control access to it
|
| This has never realistically been the case. An illusion
| of control is provided by facebook, but they've never
| really put much effort into it. For a really simple
| example, look at how long content remained available to
| the entire internet after "deletion". Sometimes it took
| years.
|
| Expecting any semblance of privacy from a company who
| profits from using and selling your data is, if I'm being
| blunt, lunacy.
| gfodor wrote:
| This is a false expectation and it's important people
| learn this.
| IfOnlyYouKnew wrote:
| They'll stop posting in the way they currently enjoy and
| will, therefore, have lost some freedom. Great outcome!
|
| In other news: your partner may also leak your most
| intimate secrets. I hope they do, to teach you a lesson?
|
| Every trust can be betrayed. Why do you believe a world
| without trust would be better? Only because you cannot
| handle the nuance of different levels of trust?
| ogurechny wrote:
| So taking shackles off is called "losing freedom" now?
| Also, people enjoy many things, just look at the
| junkheads. Still, it's more natural to have trust in a
| heroin addict than to have trust in businesses like
| Facebook.
| gfodor wrote:
| The counterparty risk from Facebook has almost nothing to
| do with trust of individual human beings. It has to do
| with the nature of systems, failure, vulnerabilities,
| attack surface area, etc. It's "privacy through
| obscurity" to act as though your data is not on the
| precipice of being leaked by a bad actor or a mistake.
| vorpalhex wrote:
| The freedom to live in a fictional world where Facebook
| safeguards your data is just as available regardless of the
| reality of the situation.
|
| The reality of the situation is that Facebook is a walled
| garden built on the labor of its users and it is
| objecting to those users reclaiming the fruits of their
| labor by scraping.
| the_fury wrote:
| "They'll stop posting in the way they currently enjoy and
| will, therefore, have lost some freedom."
|
| That is, quite honestly, one of the oddest definitions of
| freedom I've come across.
| bko wrote:
| It's their platform. Do you really want some random companies
| scraping your facebook and instagram posts?
| logifail wrote:
| > Do you really want some random companies scraping your
| facebook and instagram posts?
|
| Thought experiment: if you want to keep control over your
| data, try something radical: _don't hand it to Meta/FB/IG at
| all_
|
| (Full disclosure, I'm neither on FB nor IG)
| vorpalhex wrote:
| You published them for the world to see... so yes,
| presumably.
| iandanforth wrote:
| Yes. I want a free and open web.
| xvector wrote:
| Good for you. Normal people do not want posts shared
| privately amongst friends to become publicly available.
| Nextgrid wrote:
| There's no evidence the scraper companies mentioned there
| are making the scraped data public or sharing it with
| anyone beyond the individual customer that is already
| entitled to access that data through the official
| clients.
| falcolas wrote:
| Then why would you ever put it on a website that
| generates its revenue from using and selling your data?
| nathanaldensr wrote:
| Because you (not you, but people in general) are dumb
| and overly trusting.
| marlowe221 wrote:
| This is the correct answer.
| blantonl wrote:
| Because you agreed to do so under the terms and conditions
| of that website.
| nlh wrote:
| Look, I understand your point from a legal standpoint, but
| do you really truly believe even a small fraction of FB
| and IG users actually "agreed to do so under the terms
| and conditions of that website"? They just clicked
| whatever was necessary to create their accounts. I doubt
| there was much affirmative agreement going on there.
| orangecat wrote:
| Then you need to trust your friends, because copy/paste
| and screenshots exist.
| trasz wrote:
| It's not "your Facebook", it's Facebook's Facebook. You
| already made that data public, otherwise it would be
| impossible to scrape it.
| ceejayoz wrote:
| I'd rather _anyone_ than "just Facebook".
|
| "Just Facebook" has made the web shittier; entire realms of
| essentially public, often great content hidden behind a login
| wall.
| ogurechny wrote:
| As others said, there is no "you" in the scheme. It's
| Facebook's data. When people access that data without paying,
| they are "bad guys". When the very same people pay for it,
| they are "legal partners". In both cases they can do anything
| with it, while Facebook can't be held responsible because of
| all the official agreements. So as long as there is no
| specifically bad publicity or money loss anything goes either
| way.
|
| "You" only exist in numerous empty statements about
| "privacy", "respect", etc. If you are feeling artsy, you can
| make that hyped NFT thing out of those, and see whether those
| kilobytes of text are really worth anything.
| lbriner wrote:
| What you are claiming here is not true in Europe. If FB
| holds data about you, you still have legal rights over it.
| You can have it deleted, or changed if it is somehow untrue,
| and you have various other rights too.
|
| There is a relationship involved because ultimately as a FB
| user, if I don't like what they are doing, I can ask them
| to remove my data permanently and they must legally do
| that. If someone has "scraped" that data (if it is
| considered PID), without my permission or a legal basis to
| do so, they are in breach of the GDPR and can have
| enforcement taken against them.
|
| I think some of these "aggregation" businesses will fall
| foul of this in Europe but I don't know what will
| realistically happen if that business does not exist in
| Europe and breaches the GDPR.
| Nextgrid wrote:
| > breaches the GDPR.
|
| Facebook breaches the GDPR all the time and manages to
| stay in business. GDPR enforcement is barely existent,
| and when it does happen, it's insufficient.
| ogurechny wrote:
| This is how it works in press releases. The problem is
| that data protection laws were in fact lobbied by
| corporations either openly or behind the scenes, and
| focus on things like real names and passport numbers that
| look impressive but aren't really important for the data
| market. These are just put into some high security
| database (e.g. for billing info), and it's fine. However,
| the real behavioral data that costs money is shared as
| easily as it ever was in the form of "User ID <long number>
| was at the location of Wi-Fi AP ID <another long
| number>". It doesn't matter that the data owner still
| trades all the history of activity of a certain
| individual, or that Wi-Fi station locations can be
| matched with some external database. Everything is fine
| as long as you don't slap someone's real name on that.
| And, contrary to the show social networks make, they
| couldn't care less about real names. Even if you trick
| the system by calling yourself John Doe, you still look
| at the specific content, and have specific contacts, you
| are you, and the data is the same.
|
| I remember that about a decade ago some IT guys paid
| for the common Facebook advertiser access, then targeted
| the ad campaigns using filters in such a way that their
| intersection only resulted in a single user, or just a
| couple of them, and were able to match those "anonymized"
| accounts to real ones. You didn't have to be a genius to
| do that. Facebook certainly knew it could be used like
| that. Everyone who made money on that simply agreed to
| use "anonymization" as a smokescreen. Later, with all the
| scandals, those routine operations were presented as
| something exceptional done by a small number of bad
| actors.
| trasz wrote:
| We need to update the law to make sure Meta loses in cases like
| this.
| jmyeet wrote:
| I'm torn on Web scraping because the extreme of each end of the
| spectrum on this issue both seem unreasonable.
|
| On one side, you have people who say any form of scraping should be
| disallowed, even prosecutable. This went so far that the
| Department of Justice on behalf of AT&T prosecuted a case of URL
| modification [1]. One of the few bright spots for this psychotic
| Supreme Court was to curtail the government's power under the
| CFAA by limiting what constituted "unauthorized" access [2].
|
| On the other hand, there are those who think that any level of
| scraping should be fine and I think that's untenable too.
| Consider Yahoo's indexing of Stack Overflow [3]:
|
| > In the meantime, since Yahoo (via Slurp!) is about 0.3% of our
| traffic, but insists on rudely consuming a huge chunk of our
| prime-time bandwidth, they're getting IP banned and blocked.
|
| Do these "scraping extremists" think such actions should be
| illegal? It's actually not that far-fetched given the Ninth
| Circuit decided LinkedIn wrongly blocked hiQ scraping [4]. Like
| if you change your website with the intent that it'll make
| scraping more difficult, is that a problem? What if it's an
| unintended side effect?
|
| Additionally, companies like Meta, Google and Apple are going to
| be way more accountable for abiding by data retention laws and
| regulations than any scraper. If it's OK to scrape FB.com
| completely, that information is out there forever.
|
| I certainly think the government shouldn't prosecute on behalf of
| companies. At least that should expose to people how the
| government's #1 priority is in fact to protect the true
| constituents: corporations and the capital-owning class.
|
| [1]: https://www.techdirt.com/2013/09/30/dojs-insane-argument-
| aga...
|
| [2]: https://en.wikipedia.org/wiki/Van_Buren_v._United_States
|
| [3]: https://stackoverflow.blog/2009/06/16/the-perfect-web-
| spider...
|
| [4]: https://blog.ericgoldman.org/archives/2019/09/ninth-
| circuit-...
| ConstantVigil wrote:
| > So much about this case is ridiculous, and it's complicated
| by the fact that nearly everyone agrees that weev is a world-
| class jerk. But, you need to separate that out from the details
| of what he did here, to note that it was nothing particularly
| special, and it involved the sort of thing that security
| researchers do all the time, and which all sorts of non-
| security researchers do quite often.
|
| Yeah... uhm... I used to do exactly this sort of thing...
|
| When I was a teenager, I would look at the URL of whatever site
| I was on, and would change a number here, or a letter there;
| and see what I got.
|
| Sometimes you get nothing, sometimes you get something.
| Sometimes that something is quite interesting.
| paultopia wrote:
| "Scraping attacks" LOL
| sophacles wrote:
| Why not? weev was put in jail over incrementing a number in a
| url. Surely writing software to put values into urls is even
| worse.
| sneak wrote:
| Let's be clear and accurate: technically weev was put in jail
| for conspiring on IRC with JacksonBrown. JacksonBrown was the
| one who wrote a PHP script that incremented a value in a URL
| (and appended a valid Luhn check digit following
| incrementation).
|
| Conspiracy to access a protected computer system - that is,
| typing on IRC. weev didn't write any of the code or access
| the API.
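| The incrementing step described above is easy to sketch. What
| follows is a minimal, hypothetical Python illustration of the
| general technique (increment a numeric identifier, then append a
| Luhn check digit so the result validates). It is not the original
| PHP script; the function names and the sample value 7992739871
| (the standard textbook Luhn example) are assumptions, not real
| ICC-IDs.

```python
def luhn_check_digit(payload: str) -> int:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left. The check digit will sit to the right of the
    # payload, so the payload's rightmost digit (i == 0) is doubled.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return (10 - total % 10) % 10


def is_luhn_valid(number: str) -> bool:
    """True if the last digit is the correct Luhn check digit."""
    return luhn_check_digit(number[:-1]) == int(number[-1])


def next_ids(payload: str, count: int) -> list[str]:
    """Hypothetical helper: increment the payload `count` times, keeping
    its width, and append a Luhn check digit to each result."""
    width = len(payload)
    value = int(payload)
    ids = []
    for _ in range(count):
        value += 1
        body = str(value).zfill(width)  # preserve leading zeros
        ids.append(body + str(luhn_check_digit(body)))
    return ids


if __name__ == "__main__":
    for candidate in next_ids("7992739870", 3):
        print(candidate, is_luhn_valid(candidate))
```

| Every generated value passes the check digit test by construction,
| which is the point: a check digit catches typos, not enumeration.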
| pclmulqdq wrote:
| They have to keep the walls up on their garden so they can get
| maximum value from harvesting.
| viburnum wrote:
| One of Facebook's earliest acquisitions was a scraping company
| called Octazen.
| throwaway_meta wrote:
| People that are criticizing this probably were also critical of
| the Cambridge Analytica scandal, but it would be useful to
| compare what happened there and here.
|
| With Cambridge Analytica:
|
| - Facebook allowed users (with informed consent) to allow
| external developers to access their data and limited data about
| their friends, in order to build social-enabled apps.
|
| - CA exploited this to scrape basic profile data from a large
| number of users. It broke the ToS by doing so (in particular by
| using the data for purposes different from those stated)
|
| Here the same is happening:
|
| - people are giving a third company access to their profile,
| which includes access to friends' data (in fact a lot more than
| what the app platform allowed)
|
| - the company is scraping all the data.
|
| At the time of CA, the criticism was that Facebook didn't do
| enough to enforce its ToS (or maybe that the data sharing should
| have not been allowed in the first place? But the terms were
| common knowledge and the attack potential became clear only in
| hindsight), here people are criticizing that Facebook is in fact
| enforcing its ToS.
|
| Also note that strong enforcement against scraping is one of the
| mandates that came from the FTC settlement.
|
| It seems inevitable that any news about Facebook/Meta is read in
| the worst possible light these days, even when the criticism is
| self-contradictory. I would expect less superficial commentary
| from HN.
| unosama wrote:
| The real reason _most_ people were upset about Cambridge
| Analytica was it revealed to the public how advertising and PR
| companies manipulate us. The fact they violated Facebook's ToS
| is more the excuse for the press covering it when they wanted
| to write another anti-Trump piece. If you were accusing a
| specific newspaper of hypocrisy based on two articles I might
| agree. But you're referring to general public sentiment, and I really
| don't think most people cared or were surprised about the data
| collection. The shock and scandal was the realization that
| targeted advertising campaigns and information bubbles have the
| potential to sway elections.
| throwaway_meta wrote:
| I'm referring to the HN crowd, I'm not sure that can be
| equated to "general public sentiment".
|
| I agree with your first paragraph, and my point is that it is
| not possible to argue both that Facebook should share data
| more broadly and allow scraping, and that Facebook was wrong
| to allow CA to happen in the first place.
|
| If the CA scandal was a wake-up call, it appears it was not
| internalized enough for people to understand the implications
| of what they're suggesting in this thread?
| pid-1 wrote:
| > scrapping attack
| mohamez wrote:
| That cracked me up when I read it lol
| throw20220707 wrote:
| From GDPR point-of-view this kind of 3rd party data collection is
| not acceptable (assuming it covers personal information, for
| example names of people and what they have posted). The
| difference with Meta's own data collection is that the users have
| a relationship with Meta and users have given their permission for
| Meta to handle the data. Users also know they can contact Meta
| and ask them to remove the data.
|
| 3rd parties don't have the users' consent. Users don't even
| have an idea these companies might be holding their data.
| Nextgrid wrote:
| From a GDPR point of view the scraper would be acting as a data
| processor on behalf of their customer, no different from using
| a cloud storage service for your contacts. It's fine as long as
| the third-party doesn't misuse the scraped data or share it
| with third-parties and there's no evidence they did so in this
| case.
| danuker wrote:
| > and there's no evidence they did so in this case.
|
| Indeed; the users probably wanted to make the data public, if
| scraper accounts could see it. There is a GDPR allowance for
| data "manifestly made public by the data subject".
|
| https://gdpr-info.eu/art-9-gdpr/
|
| Here, it's just Facebook wanting to keep the data inside a
| walled garden.
|
| For the same reason, I quit LinkedIn and made my own site. I
| don't want people to have to sign in to see my profile.
| oxff wrote:
| Pretty rich idea coming from FB, lol. They do human scraping.
| samsoftstuff wrote:
| It's like they don't know that courts just made it legal:
| https://techcrunch.com/2022/04/18/web-scraping-legal-court/
| blantonl wrote:
| "Legal" doesn't make it ethical, nor does it shield you from
| liability if you willfully violate contract law (terms of
| service).
| brushfoot wrote:
| From the article: "[T]he Ninth Circuit reaffirmed its original
| decision and found that scraping data that is publicly
| accessible on the internet is not a violation of the Computer
| Fraud and Abuse Act."
|
| The key phrase is "publicly accessible." This wasn't that. The
| scraping was done by automating Facebook accounts, which have
| terms of service, which forbid scraping.
|
| ToS/EULAs make a big difference. They're the reason Blizzard
| could shut down bnetd's StarCraft server. They're why no one
| can legally reverse engineer Oracle to create a drop-in
| replacement, despite interoperability provisions.
|
| More and more platforms are putting the majority of your user-
| generated content behind auth walls with ToS because that's how
| they prevent competitors from swiping it.
| EMIRELADERO wrote:
| > ToS/EULAs make a big difference. They're the reason
| Blizzard could shut down bnetd's StarCraft server. They're
| why no one can legally reverse engineer Oracle to create a
| drop-in replacement, despite interoperability provisions.
|
| Strictly referencing EULAs for user-owned copies of software
| here, not ToS:
|
| That is not true. The Blizzard court clearly erred in not
| considering unconscionability when analyzing the EULA. As for
| Oracle, the interoperability provisions are what _overrides_
| that part of the EULA.
| Nextgrid wrote:
| Does it go into detail about the actual meaning of "publicly
| accessible"? Because most content on Facebook/Instagram
| requires _any_ valid login (as opposed to a specific account)
| and is data people intend to be public (especially on
| Insta).
|
| In this case, the account requirement would be a technicality
| and the data, for all intents and purposes, would still be
| considered "publicly accessible" if _anyone_ with an account
| can access it.
| upupandup wrote:
| Putting information behind a login screen that any member of
| the public can get past doesn't make it private. Private info
| would be OnlyFans videos. So far there is no such feature on
| Instagram.
| postalrat wrote:
| Hey instagram/facebook/linkedin/etc: It's not your data.
___________________________________________________________________
(page generated 2022-07-07 23:02 UTC)