[HN Gopher] Facebook wants to 'normalize' the mass scraping of p...
___________________________________________________________________
Facebook wants to 'normalize' the mass scraping of personal data
Author : gbrown_
Score : 136 points
Date : 2021-04-20 18:48 UTC (4 hours ago)
(HTM) web link (www.vice.com)
(TXT) w3m dump (www.vice.com)
| kuroguro wrote:
| Starting to feel like my login manager will soon need a 'generate
| random profile' button next to the 'generate password' one...
| arminiusreturns wrote:
| What a great idea actually. Tired of manually managing my
| firefox profiles.
| novok wrote:
| Lemme mass scrape facebook and linked in then ;)
| pocket_cheese wrote:
| Scraping facebook has been one of my dreams. I remember taking
| a cursory crack at it and giving up after seeing how much
| time, money and effort it would involve. The cost of storing
| the data alone would be prohibitive, even if you have a
| fancy FAANG salary.
|
| Scraping facebook is an operation. One I wish to make happen
| one day :D
| spaced-out wrote:
| Courts have already said you can
|
| https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn
| agogdog wrote:
| I've tried for various reasons, they apparently have a team to
| fight scraping so it's a game of rate-limit whack-a-mole.
|
| This is a decent example of the imbalance of power too...
| Facebook could probably scrape whatever they want with little
| resistance from average sized sites... if they really wanted to
| they could put a team of skilled engineers on it.
|
| On the other end, if you want to scrape Facebook you're staring
| down a team of skilled engineers that's actively trying to
| prevent you from doing much. At this point in technology
| they're more formidable than most countries.
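|
| For anyone who hasn't hit it, here is a rough sketch of the kind
| of per-client rate limiting a scraper ends up fighting, written
| as a token bucket. The class names, capacity and refill rate are
| made up for illustration; it is an assumption about the general
| technique, not a description of Facebook's actual system.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TokenBucket:
    """One client's allowance: holds up to `capacity` request tokens."""
    capacity: float
    refill_per_sec: float
    tokens: float
    last: float  # timestamp of the previous check, in seconds

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        # Spend one token per request; refuse when the bucket is empty.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Hypothetical limiter keyed by a client id (IP, account, ...)."""

    def __init__(self, capacity: float = 3, refill_per_sec: float = 1.0):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str, now: float) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            # A client seen for the first time starts with a full bucket.
            bucket = TokenBucket(self.capacity, self.refill_per_sec,
                                 tokens=self.capacity, last=now)
            self.buckets[client_id] = bucket
        return bucket.allow(now)
```

| The whack-a-mole part: every new client id starts with a full
| bucket, so a scraper rotates IPs or accounts and the defender
| has to keep finding new things to key the limit on.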
| MattGaiser wrote:
| If you can get away with it, you can.
| jandrese wrote:
| The only thing stopping you is Facebook's tech team. It's
| basically an arms race between the scrapers and anti-scraping
| tech.
| asdff wrote:
| It is so frustrating how many brilliant minds on either side
| are just wasted fighting this bullshit war and countless
| others like it for someone else's bonus check. There are so
| many hard engineering problems that need to be solved, and
| it's depressing how we as a society reward who can sell the
| most widgets instead of who can help the most people.
| freshpots wrote:
| Instagram is a prime example of it. They started the platform by
| making it default that you can't delete old comments. You can't
| even easily view them. That makes it easy to change a user's
| behavior as 'that is how it always has been'. It is scary how
| much control these platforms have and how they are increasingly
| preventing users from removing/viewing past content.
| greyscale_2 wrote:
| I'm confused by the conflation of 'scraping' and 'leaking'. Is
| this FB email talking about the mass scraping of information
| users put on their public profiles or the illicit acquisition of
| non-public user data?
| MattGaiser wrote:
| > the mass scraping of information users put on their public
| profiles
|
| I believe it is this.
| kuroguro wrote:
| > the illicit acquisition of non-public user data
|
| I thought they meant this, as the phone numbers were
| 'scraped' through one of their public facing features (I think
| it was contacts import this time, before that a lot of phones
| were leaked through the search bar / forgot password).
|
| I think they're misusing 'scrape' here intentionally as if to
| say they did nothing wrong.
| jwalton wrote:
| Yes, it's a broad industry issue that companies like Facebook
| will ask us for our phone numbers and promise they will only be
| used for authentication [1], and then leak them by publishing
| them on the Internet and letting third parties scrape them... I'm
| not sure it's a problem we should normalize though.
|
| [1] https://nakedsecurity.sophos.com/2019/03/05/facebook-
| critici...
| aviraldg wrote:
| A better headline would be "Facebook wants to inform users about
| the fact that scraping publicly available information is easy
| (and there's nothing wrong about that.)" Of course, that wouldn't
| get Vice as many clicks.
|
| Don't publish information you don't want to be part of _some_
| database publicly on the internet. I wish schools had some sort
| of tech literacy class where they explained this stuff to
| people...
| jwalton wrote:
| The information that was scraped here, though, consists of phone
| numbers that Facebook requires you to add to your account and
| that they promised would not be used for anything other than
| authentication purposes. Most of these numbers should not have
| been publicly available in the first place.
| intricatedetail wrote:
| I will vote for any party that will make online tracking illegal.
| Ad targeting should be limited.
| Ticklee wrote:
| They already have, the internet is a mess.
|
| Every website you visit wants more and more of your data.
| Facebook played a huge role in making this level of data sharing
| widespread.
| joe_the_user wrote:
| I would claim the opposite. Facebook normalized the belief that
| the information people put on their FB page was NOT being
| scraped by many people even though it was. The rise of Facebook
| accompanied a whole belief system about "things I share with my
| friends on the Internet".
|
| It seems like Facebook is now large enough that they're
| effectively owning up to the unavoidable truth - there's no way
| that information made available to all subscribers of some
| largish social network isn't going to be public to the world.
| uoaei wrote:
| That's not an accurate framing of the situation. Sure, they
| relied on people's technological illiteracy to do things
| people didn't really think were possible for a while. But in
| the face of the news about recent leaks, and the Cambridge
| Analytica scandal in particular, they have had to switch to a
| more active PR strategy to quell the concerns people have
| about their product(s).
| joe_the_user wrote:
| The Cambridge Analytica scandal was three years ago and
| this article is about PR moves Facebook is doing now.
|
| And sure, I don't give every gruesome detail in the rise
| but I'd still claim that the overall situation is that
| Facebook is large enough and its model porous enough that
| a variety of actors have scraped it, are scraping it and
| will scrape it. And given this, Facebook has to start
| owning up to an inevitable situation. Keep in mind, The
| Cambridge Analytica scandal was predicated on Facebook's
| claimed data model (which I'd claim isn't just false but
| also "can't be true"). Sure, the easiest way to scrape it
| is having API access, which it's hard not to give to your
| advertisers. But if Facebook gave no one API access,
| various actors would be directly scraping.
|
| And overall, I'd say The Cambridge Analytica scandal was
| the thing that wasn't a good framing of the broad problems
| of Facebook and privacy.
|
| Edit: _" But in the face of the news about recent leaks,
| and the Cambridge Analytica scandal in particular, they
| have had to switch to a more active PR strategy to quell
| the concerns people have about their product(s)."_
|
| And I'd say, this is again actually the wrong frame.
| Facebook is at the center of the storm, no doubt. But there
| is no large social network possible that wouldn't be
| subject to the general privacy problems of Facebook.
| Facebook created the fantasy definition of privacy,
| Facebook violated that definition but no one could satisfy
| it.
| Retric wrote:
| Scandals linger far longer than 3 years.
|
| A great example is M&M's dye choice became controversial
| due to customer confusion over which red dyes were
| harmful. So, the company couldn't simply change the dye
| because what they were using wasn't problematic. In the
| end they had to flat out stop selling red M&M's for over
| a decade, and their reintroduction was surprisingly
| controversial.
| joe_the_user wrote:
| If you read my gp, I'm not arguing the Cambridge
| Analytica scandal didn't influence Facebook. I'm arguing
| the real, larger frame is that Facebook can't help but be
| porous and it's acknowledging that truth out of
| self-interest. That helps them avoid scandal, yes, but contrary
| to the earlier poster "it's cause of scandal" or "it's
| cause Facebook bad" is a bad, distorting frame. And that
| isn't saying Facebook is good, it's saying the entire
| framework of social networks and things propagating on
| the Internet creates a certain kind of "playing field".
|
| I would speculate, in fact, that Facebook is acting now to make
| the obvious point that of course people are going to be
| scraping the data on their site because, after X many
| scandals, it's becoming obvious that people will do that,
| that they will do that to any site like Facebook, and that
| they'll have much clearer cover if they "normalize" things
| that are ... fricken normal.
|
| I'd further speculate that they couldn't act when
| Cambridge Analytica was fresher because then they'd be
| seen as being self-justifying and then they had to be
| seen as humble and apologetic.
| api wrote:
| It's not just Facebook. Every sales, marketing, or product
| person in the world basically has an unlimited appetite for
| data and will push to suck up as much data as possible.
|
| There is a logical reason for this: one of the toughest things
| is knowing what your users actually want and what their actual
| pain points are. In advertising there's an analogous problem
| often summarized as: "I know I am wasting 80% of my ad spend,
| but I don't know which 80%."
|
| Every single incentive on the business side incentivizes data
| grabbing. This will never change unless users vote hard with
| their wallets or unless there is protective legislation.
| Ticklee wrote:
| I wholeheartedly think http/s is irreparably damaged, for
| example it is impossible to find good information on search
| engines, even the free ones like Searx. If you are able to
| find a website you can bet it includes 10MiB of trackers and
| ads.
|
| Hopefully someone writes a better protocol with no third
| party cookies and heavily restricted javascript.
| elzbardico wrote:
| Hate to nitpick but those things are not features of the
| HTTP protocol from the IETF but of HTML from the W3C
| Ticklee wrote:
| Nitpick away, when I am ignorant I'd rather be told than
| stay ignorant.
| phailhaus wrote:
| As the other poster pointed out, those are properties of
| HTML and not HTTP/S. But what I'd like to point out is that
| this:
|
| > heavily restricted javascript
|
| Is basically impossible. Any useful subset of javascript
| would be turing-complete, and therefore enough to do
| whatever's necessary to track the user. Literally all you
| need to be able to do is make an HTTP request and bam, you
| can track.
| a1369209993 wrote:
| Turing-complete is (kind of) irrelevant, the question is
| what (equivalent of) system calls it has access to. Eg,
| javascript should not be able to set cookies or cause
| network traffic after page load by default.
|
| > all you need to be able to do is make an HTTP request
|
| Precisely. Inability to do this is (part of) what "heavily
| restricted javascript" _means_.
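|
| A toy model of that distinction, in Python rather than
| javascript: the script can compute whatever it likes, but every
| side effect has to go through a table of capabilities the host
| chose to grant. The names run_script, syscall, "render" and
| "http_request" are hypothetical, and this only illustrates the
| idea. It is not a real sandbox, since plain Python code could
| still import a network library directly.

```python
from typing import Callable, Dict

Syscall = Callable[..., object]


class CapabilityError(Exception):
    """Raised when a script asks for an operation the host did not grant."""


def run_script(script: Callable[[Syscall], object],
               granted: Dict[str, Syscall]) -> object:
    """Run `script`, exposing only the operations listed in `granted`."""

    def syscall(name: str, *args: object) -> object:
        if name not in granted:
            raise CapabilityError(f"capability not granted: {name}")
        return granted[name](*args)

    return script(syscall)


def tracking_script(syscall: Syscall) -> None:
    # Arbitrary computation is always available (the Turing-complete part).
    a, b = 0, 1
    for _ in range(10):
        a, b = b, a + b
    syscall("render", f"fib(10) = {a}")
    # Phoning home needs a capability the host may simply not grant.
    syscall("http_request", "https://tracker.example/collect")


if __name__ == "__main__":
    rendered = []
    try:
        # The host grants rendering only, no network access.
        run_script(tracking_script, {"render": rendered.append})
    except CapabilityError as err:
        print("blocked:", err)
    print(rendered)
```

| The script still gets to run its loop and draw its output; it
| just has no way to reach the network unless the host hands it
| one.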
| kwdc wrote:
| How different the world would be if companies that hold data
| about you suddenly have to pay rent unless they have specific
| explicit permission, eg direct association. Put a stinger that
| means all permission granted requires a complete chain of custody
| for the data. So no data brokers lurking in the shadows. And a
| cost for non-compliance. This might get people thinking twice
| about building databases "just because".
|
| If the database has value then perhaps it should have a regular
| cost?
|
| Who knows what data is out there? My experience with just my
| credit reports was that the files about me were full of errors.
| At least I was able to correct them.
|
| I also discovered a bunch of linkedin-scraped data about me that
| was posted on various contact sites. Multiple errors.
| joe_the_user wrote:
| Your plan would put Facebook, which gets data from people, in a
| better position, since they collect data on people with those
| people's permission, in exchange for services. They just have
| to assign a dollar value to their services and they would have
| fulfilled your requirements. Where yeah, it would be nice if
| credit companies had to get permission too.
|
| And nearly every website already warns me they're going to
| collect data. With your step, the next thing is signing away
| that rent.
|
| Or, if your plan involved rent that can't be signed away, well, no
| one would host anyone for anything since they wouldn't want to
| pay that.
| throwawayfeaxcz wrote:
| I am kind of with Facebook on this one, if you didn't want your
| phone scraped you shouldn't have plastered it on the internet
| next to your name.
| 1vuio0pswjnm7 wrote:
| This short article is somewhat misleading on Facebook's position.
| They are against scraping. Not on behalf of users but on behalf
| of mass user data collectors, what it calls "the industry", like
| itself. That is why they engage in "anti-scraping". That is also
| why LinkedIn has tried to sue others for scraping LinkedIn public
| data.
|
| Facebook does not want the public, outside of "the industry", to
| have the same public data that Facebook has collected. If
| everyone can potentially have the same data Facebook has, data
| collection potentially becomes democratised and the world does
| not need Facebook nor "the industry" anymore. These advertising
| services companies no longer have any special value.
|
| The problem with Facebook, and "the industry", is data
| collection, not lack of "anti-scraping" competence. Once the
| sensitive data is collected by private industry on a massive
| scale, then liability is created. The data is not any safer than
| if a government had collected it. In some jurisdictions it is
| less safe, because there are restrictions on this type of
| activity by government that do not apply to companies. This
| liability is why some people take the position that the data
| collection Google or Facebook does to further its "business" is
| neither harmless nor "acceptable".
|
| Facebook is framing this liability problem as one of "scraping",
| not collection. It is not trying to further the interests of
| users but instead to further its own interests. Facebook wants
| the courts and regulatory authorities to see mass quantities of
| public data about internet users as Facebook's semi-exclusive
| asset, to be protected as if it was "private" data. Facebook is
| arguing mass public data "leaks" are not acceptable and that's
| why "the industry" must step up its "anti-scraping" measures.
|
| However "scraping" the internet for public data is not the
| problem, it is only a symptom. Massive data collection initiated
| by these companies about internet users, for the purpose of
| selling advertising services, is the problem.
| readflaggedcomm wrote:
| The strategy as worded isn't wrong, assuming I understand their
| terminology. If an account with no special privileges can access
| the information at least once, it's essentially public
| information.
|
| People are very worried about what is and isn't public, but these
| in-between areas where a platform puts up hurdles still aren't
| private.
| uoaei wrote:
| Of course, Facebook designs their platform to incentivize
| sharing publicly whenever possible, and designs dark patterns
| to dissuade people from understanding the full extent of what
| they can control with respect to their privacy.
| joe_the_user wrote:
| Reality also incentivizes sharing things publicly. But
| screenshot sharing is huge on Facebook and on the Internet
| generally.
|
| The only thing kind-of-like-privacy that exists on the
| Internet is "encrypted messages sent to well-vetted actual
| friends" and anonymously posted things well-scrubbed of
| identifying information. Everything else is just something to
| make people feel better. And most people's stuff doesn't come
| out and create a scandal because most people's stuff is
| boring and unimportant, that's the main protection the
| average person has.
| IceWreck wrote:
| Web scraping is legal and even if it wasn't there is no way you
| can prevent it.
|
| Don't upload stuff on a public website if you don't want it
| scraped/harvested.
| Nextgrid wrote:
| Agreed. The problem here as I understand is that Facebook
| misled users about how "private" their info actually was.
| jandrese wrote:
| This is why I tell people to treat anything they put on the
| internet as public information. This includes cloud storage.
| If you have to put it up there then you encrypt it yourself
| before uploading. It only takes one compromised
| person/machine in the company to undermine all of that
| company's promises to you.
| joe_the_user wrote:
| _Facebook wants to "normalize" the idea that large scale scraping
| of user data from social networks like its own is a common
| occurrence_
|
| Get people used to the truth? Shock, horror!
|
| I mean, certainly Facebook rose to its position through a sort
| of opposite claim, that a user could be "public" (visible to a
| wide circle of friends-of-friends-of-etc) but not public (visible
| to Russian hackers, Brazilian botmasters or whoever). This claim
| is kind of a fairy tale, something that not only isn't true but
| couldn't be true. "This information is public to anyone who
| creates an account but not public en masse to the world". Still,
| the claim made an average FB user feel safer (and a lot of people
| "got on the Internet" in a big way through FB circa ~2010). And
| it's got a lot of traction now. But since the situation is
| fundamentally porous, now that FB is large, it seems it's in
| their legal interest to drop the bullshit and just say "if it's
| public, it's public, what the hell else do you expect".
|
| And yeah, the exploitation of public data arguably led to all
| sorts of bad effects and it would have been and would be nice to
| head this off in some fashion. But imagining you can head this off
| by maintaining a "quote-public versus totally-public" distinction
| isn't one of those ways.
___________________________________________________________________
(page generated 2021-04-20 23:01 UTC)