[HN Gopher] Web scraping for me, but not for thee
       ___________________________________________________________________
        
       Web scraping for me, but not for thee
        
       Author : mhb
       Score  : 219 points
       Date   : 2023-08-25 17:42 UTC (5 hours ago)
        
 (HTM) web link (blog.ericgoldman.org)
 (TXT) w3m dump (blog.ericgoldman.org)
        
       | waydegg wrote:
       | It's interesting seeing the reactions from other websites/orgs
       | after OpenAI publicly announced GPTBot. Tons of people blocking
       | GPTBot outright (made a small page that tracks this:
       | https://wayde.gg/websites-blocking-openai)
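
For readers unfamiliar with how this kind of blocking works, here is a minimal sketch, assuming a site publishes a robots.txt that disallows the `GPTBot` user-agent token and allows everyone else. The sample file contents, the `is_allowed` helper, and the example.com URL are illustrative assumptions and are not taken from any site mentioned in the thread; parsing uses Python's standard `urllib.robotparser`.

```python
import urllib.robotparser

# Hypothetical robots.txt: block the GPTBot crawler, allow all other agents.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print("GPTBot allowed:", is_allowed(ROBOTS_TXT, "GPTBot", page))
    print("Other agent allowed:", is_allowed(ROBOTS_TXT, "SomeBrowser", page))
```

Note that robots.txt is only a request: it works when the crawler chooses to honor it, which is part of what the rest of this thread argues about.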
        
         | dontupvoteme wrote:
         | Is there any legal issue with a spider trap designed to poison
         | LLMs?
        
         | version_five wrote:
         | I wonder if blocking gptbot is a good signal that a website has
         | non LLM generated content on it, and is therefore good training
         | data...
        
       | fasterik wrote:
       | _> Let's look at what Microsoft is doing right now, as an
       | example. In the last couple of weeks, Microsoft updated its
       | general terms of use to prohibit scraping, harvesting, or similar
       | extraction methods of its AI services. Also in the couple of
       | weeks, Microsoft affiliate OpenAI released a product called
       | GPTbot, which is designed to scrape the entire internet. And
       | while they don't admit this publicly, OpenAI has almost certainly
       | already scraped the entire non-authwalled-Internet and used it as
       | training data for GPT-3, ChatGPT, and GPT-4. Nonetheless, without
       | any obvious hints of irony, OpenAI's own terms of use prohibits
       | scraping._
       | 
       | I don't understand why this demonstrates hypocrisy. There is a
       | big difference between crawling the publicly accessible web
       | (which legitimate search engines do all the time) and scraping an
       | authenticated web application or API.
        
         | Atotalnoob wrote:
         | Or that while Microsoft is an investor in OpenAI, they do not
         | control OpenAI
        
       | einpoklum wrote:
       | Perhaps it's legal to scrape in some world states outside the
       | USA?
        
         | dontupvoteme wrote:
         | A good question. Japan came out and declared that copyright
         | does not apply to training AI.
         | 
         | There must be a good chunk of the world that doesn't have any
         | laws forbidding it. This isn't under the jurisdiction of WIPO
         | or anything like that, it's just a completely insane evolution
         | of anglo common law
        
       | SenAnder wrote:
       | > Mark Lemley observed this happening nearly 20 years ago, in his
       | prescient, seminal article, "Terms of Use.": _The problem is that
       | the shift from property law to contract law takes the job of
       | defining the Web site owner's rights out of the hands of the law
       | and into the hands of the site owner._
       | 
       | With "contracts" of adhesion proliferating, and how impossible it
       | has become to exist in the modern world without acceding to them
       | (something as simple as buying a new SSD involves agreeing to
       | one), this problem is getting worse by the day.
       | 
       | The law is becoming increasingly irrelevant, and more and more we
       | are ruled by one-sided "contracts" from giant companies that are
       | in a position to push them on us.
        
         | cvalka wrote:
         | Contractual law in the modern era regularly and persistently
         | undermines private property rights. Mandatory arbitration
         | clauses make it worse.
        
         | Buttons840 wrote:
         | Well said. This reminds me of my own thoughts:
         | 
         | There are two ways of thinking about what a webpage is:
         | 
         | 1) A web page is a billboard
         | 
         | 2) A web page is a pamphlet
         | 
         | If a webpage is a billboard, then it is morally wrong for me to
         | paint over those sections of the billboard that I do not like
         | (i.e., using an ad-blocker).
         | 
         | If a webpage is a pamphlet, then I'm free to cut it up and re-
         | arrange it however I want. Naturally, those with knowledge to
         | cut and re-arrange are more likely to take this view.
         | 
         | It's fair to say that Amazon.com contains Amazon's webpage, and
         | that Amazon owns that web page. And yet, I've never once viewed
         | Amazon.com without using an electronic device owned by myself
         | or another non-Amazon entity. Amazon.com doesn't exist on a
         | billboard, it requires the use of electronic devices owned by
         | other people. What rights do the owners of those electronic
         | devices have? Any?
        
         | nre wrote:
         | > With "contracts" of adhesion proliferating, and how
         | impossible it has become to exist in the modern world without
         | acceding to them (something as simple as buying a new SSD
         | involves agreeing to one), this problem is getting worse by the
         | day.
         | 
         | The craziest example of this is how all these contracts are
         | appearing in the _physical_ world as well. There are stores
         | that actually have a sign indicating that entering the store
         | constitutes acceptance of contract terms (with a QR code that
         | you presumably can scan with your phone to read the contract).
         | I've also seen public parks with the same thing basically
         | indicating that entry binds you to a legal agreement to not sue
         | the park/follow posted rules/etc.
        
           | golemiprague wrote:
           | [dead]
        
           | actuallyalys wrote:
           | Public parks saying that is kind of strange because the city
           | or town could already set the rules by enacting an ordinance.
           | Presumably they could also delegate that authority to the
           | parks department. I suppose Parks might be doing it because
           | the city council or mayor isn't enacting the ordinances they
           | want.
        
           | profile53 wrote:
           | There's a dead reply to your post saying that this occurs
           | because of the insanely litigious nature of the USA. I think
           | it's worth highlighting -- business/property owners are
           | essentially trying to use contract law to route around the
           | fact that the US legal system is broken with regards to civil
           | litigation and throwing out bogus cases. For example, having
           | a private pool in your own back yard can make you liable for
           | someone else's child breaking in and injuring themselves in
           | your pool, because not having enough barriers to stop them
           | means you allowed the access.
        
             | Eisenstein wrote:
             | > the US legal system is broken with regards to civil
             | litigation
             | 
             | And the problem with that has a lot to do with corporations.
             | For instance, if you are a pedestrian and get hit by a car
             | and end up in the hospital, in a lot of places in the USA
             | your health insurance will not cover you at all -- you have
             | to sue the driver and get compensated from their auto
             | insurance. The logical method would be for your insurance
             | to cover you and then the health insurance would recover
             | costs through appropriate parties.
             | 
             | It is the same with ridiculous lawsuits like the aunt who
             | sued her sister because the nephew jumped on her and threw
             | out her back. In order to recoup medical costs she _had_ to
             | sue her sister since the sister had homeowner's insurance.
             | 
             | You can't entirely blame the legal system when the
             | corporations are using it to perpetuate the problem for
             | their own gains at the expense of everyone.
        
             | giraffe_lady wrote:
             | And the litigiousness is downstream of having freakish
             | medical expenses and no universal safety net. An accident
             | can incur costs you could work your whole life to pay off so
             | of course there's a complex adversarial social system built
             | around those consequences.
        
         | pulvinar wrote:
         | What's needed to counter these is for customers to have their
         | own contract of adhesion that simply says if the company is to
         | accept them as a customer, then the company's own contract is
         | null-and-void. This would be backed by a legal team in
         | something like a customer's union or insurance that people
         | would subscribe to for a monthly fee. This contract would be as
         | enforceable (or not) as the company's, leveling the playing
         | field. It would no longer matter what they put in their fine
         | print since you wouldn't need to read it.
         | 
         | If a company doesn't accept the customer's contract or won't
         | let you bypass their own, you walk away -- no sale. Other
         | companies will get your business.
        
       | deepsun wrote:
       | > But the content that they're trying to protect isn't theirs --
       | it belongs to their users.
       | 
       | Kinda. Yes, Facebook says that content belongs to users
       | (otherwise they'd have a harder time explaining they are not liable
       | when it's illegal), but users also agree to give Facebook "non-
       | exclusive, transferable, sub-licensable, royalty-free, worldwide
       | license to use any IP content that you post on or in connection
       | with Facebook."
       | 
       | For example, if a user deleted their* content, Facebook can still
       | use it and show to their friends. That's why it's "kinda".
        
         | sib wrote:
         | That doesn't change who the content belongs to. It just gives
         | some rights to FB. And, in fact, without something like
         | "perpetual" and/or "irrevocable" in there, it doesn't imply
         | that they could keep using it after you deleted it (or that you
         | couldn't revoke a grant of rights.)
        
         | jeremyjh wrote:
         | A license is not ownership. Anyway that part of the article is
         | just context - none of what you describe constitutes the legal
         | basis for the suits or rulings discussed in it - it's just
         | explaining why property law isn't being used.
        
         | waynesonfire wrote:
         | Did you read the posted sign? "no walking on the road outside
         | my property"
        
         | antonf wrote:
         | > For example, if a user deleted their* content, Facebook can
         | still use it and show to their friends. That's why it's
         | "kinda".
         | 
         | I don't think that is correct. If you asked Facebook to remove
         | your data from the platform, it would be a GDPR (and probably
         | CCPA, etc.) violation for Facebook not to delete your data
         | within 1 month.
        
       | dclowd9901 wrote:
       | The primary grounds on which these cases rest is some nebulous
       | understanding of contractual agreement.
       | 
       | I have two thoughts:
       | 
       | - EULAs aren't written for companies to sign.
       | 
       | - I think EULAs are garbage anyway. They're completely one sided
       | and in most cases probably illegal or wouldn't hold up in court
       | if anyone actually had the resources to fight one.
       | 
       | Imo, the burden of ensuring someone has read and understands a
       | EULA should be on the company who creates it and they should not
       | be enforceable unless they can prove the person understood the
       | EULA entirely before accessing the site. EULAs are not a business
       | agreement. They're some kind of corporate pseudo-law companies
       | try to attach to the usage of a product. But what other product
       | in the world has a big list of rules that come with it that say
       | how you can use it (or be sued)?
       | 
       | So how does this all come back to this "company vs company
       | scraping"? If you put it on the web, and you don't have REAL
       | copyright on the content (that is, you didn't make it yourself),
       | you have no right to protect it from "theft."
       | 
       | PS yes, I know John Deere doesn't let its customers work on its
       | tractors but that's some bullshit too.
        
       | msie wrote:
       | The first company that came to mind from reading the title was
       | Google.
        
       | version_five wrote:
       | Good example from the Allen Institute discussed last week
       | https://news.ycombinator.com/item?id=37181415
       | 
       | They "released" a dataset scraped from public domain stuff under
       | a license that restricts how people can use it
        
       | sneak wrote:
       | If you think about it, if free lending libraries and web search
       | indices did not exist, and you tried to create them today, you
       | would get sued into oblivion.
        
       | karaterobot wrote:
       | The perceived hypocrisy sort of goes away when you stop thinking
       | about it as a collaboration or a community of equals, and instead
       | think of it as a competition, which is what it is. You would not
       | say of a football team "oh, it's okay for you to try to score a
       | goal on me, but if I try to score a goal on you, suddenly you're
       | blocking the ball?!"
       | 
       | Naturally, they're going to say "web scraping uses resources,
       | stop it!" but then keep web scraping in the background.
       | 
       | To be clear, it's bad behavior, it's just not hypocritical
       | behavior, as it's completely in keeping with what amoral
       | corporations locked in constant battle would be expected to do:
       | maximize benefits to themselves while minimizing benefits to
       | others.
        
         | philipov wrote:
         | Hypocrisy doesn't require one to believe what they say and
         | utter it in good faith but fall short of those ideals in
         | practice. Equivocating about football teams doesn't change that
         | one is trying to impose standards on another without holding
         | oneself equally to them. It is still hypocrisy, even if they do
         | it amorally in bad faith. _Especially_ then. What matters is
         | what policy you espouse; you don't get a pass for not really
         | believing what you say. The _implication_ is that hypocrites
         | are acting in bad faith.
        
         | runesofdoom wrote:
         | The problem with that sort of "that's what amoral corporations do"
         | reasoning is that corporations are _permitted to exist_ because
         | of the idea that they do contribute to the net public good.
         | Once that idea is out the window, then there's no reason for
         | society not to treat corporations as the hungry Lovecraftian
         | nightmares they are and obliterate them with fire and
         | steamship.
        
         | KieranMac wrote:
         | I think the difference is that defeating the other team is the
         | point of sports, whereas at least ostensibly the law is
         | supposed to provide a set of coherent rules for businesses to
         | compete against each other. Trademarks are defined according to
         | certain legal rules, and if you have one, this is how they
         | provide you with a limited monopoly in a certain context.
         | Allowing businesses to define property law through contracts
         | lets people define the rules however they want. And that leads
         | to irrational results.
        
         | gnomewascool wrote:
         | That's a very interesting comparison (thanks!), but I'm not
         | sure if it's the correct framing. Making scraping technically
         | difficult would be equivalent to trying to score a goal (so
         | still not great, for the rest of the world, but probably not
         | hypocritical).
         | 
         | Trying to prevent certain classes of behaviours via legal means
         | is more like trying to prevent certain types of play, by
         | appealing to the referee, while still doing them yourself.
         | Clearly, this often does happen in sports, but _is_ generally
         | seen as hypocritical.
        
         | jjoonathan wrote:
         | > football team
         | 
         | In football, the rules have been extensively tuned to promote a
         | fair fight.
         | 
         | Perhaps we should do a bit more of that sort of thing in
         | corporate law.
        
         | Kareem71 wrote:
         | The problem, as this article points out, is that democratically
         | elected courts should not be choosing winners in a capitalistic
         | competition
        
           | karaterobot wrote:
           | Only commenting on how we should expect corporations to act,
           | or more accurately why we should not be surprised at their
           | behavior.
        
         | hattmall wrote:
         | As the article states the issue is with courts not companies.
         | We need a state actor to pass a law similar to the weapons laws
         | of other states that guarantee a right to scrape. Then all the
         | scraping companies set up shop in that state. If a service
         | doesn't want their data scraped they need to make sure that it
         | doesn't get sent into that state. Ideally a large enough state
         | that companies wouldn't want to block.
        
           | nosecreek wrote:
           | Agreed that legal clarity is important - especially for
           | smaller players. I've built a significant hobby site that
           | relies fairly heavily on scraping (grocery price comparison
           | site). I believe what I'm doing is morally okay, and also
           | that big players wouldn't run into any issues, but when it's
           | just me (or even if it was a small company) the legal 'grey
           | area' makes it a much bigger risk.
        
           | tomcam wrote:
           | I like what you're saying, but how do you provide for the
           | existence of evil or simply incompetent scrapers who drag the
           | system down due to incompetence?
        
         | backtoyoujim wrote:
         | There are clearly more than two teams in these issues. It is
         | not a game, and it is not football.
         | 
         | It is a public policy issue that also outpaces "competition",
         | which is merely a subject change.
        
           | karaterobot wrote:
           | Football is metaphorical in this case.
        
         | autoexec wrote:
         | > Naturally, they're going to say "web scraping uses resources,
         | stop it!"
         | 
         | That's the expected cost of publishing something to the public
         | internet. People are going to access it. No one has a right to
         | complain when people access something that was put there for
         | the public to see. Scrapers can be dicks about it too, they
         | can get lazy and endlessly hammer away at some server or
         | repeatedly pull down the same content because they messed up,
         | but we don't need litigation for that. If something rises
         | to the level of DoS that's already covered under existing laws.
         | 
         | > it's completely in keeping with what amoral corporations
         | locked in constant battle would be expected to do: maximize
         | benefits to themselves while minimizing benefits to others.
         | 
         | Maybe we need to rethink giving some of these corporations the
         | privilege of corporate personhood if they are just going to
         | make things worse for everyone else while only enriching
         | themselves. We don't need to allow parasites and pillagers to
         | take whatever they want at our expense.
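
The technical half of the comment above (scrapers that hammer a server or re-pull the same content) can be handled on the scraper's side. The following is a minimal sketch, not anyone's actual implementation: `PoliteFetcher`, its parameters, and the injected `fetch` callable are hypothetical names chosen for illustration. It assumes a single-threaded scraper, spaces requests at least `min_interval` seconds apart, and keeps fetched bodies in an in-memory cache so the same URL is never requested twice.

```python
import time
from typing import Callable, Dict, Optional


class PoliteFetcher:
    """Wrap a fetch function with a minimum delay and an in-memory cache."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch                # does the real network request
        self._min_interval = min_interval  # seconds between real requests
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, str] = {}
        self._last_request: Optional[float] = None

    def get(self, url: str) -> str:
        # Serve repeats from the cache instead of hitting the server again.
        if url in self._cache:
            return self._cache[url]
        # Wait out the remainder of the interval since the last real request.
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        body = self._fetch(url)
        self._last_request = self._clock()
        self._cache[url] = body
        return body


if __name__ == "__main__":
    # Stand-in fetch function so the sketch runs without network access.
    fetcher = PoliteFetcher(lambda url: "body of " + url, min_interval=0.1)
    for url in ["https://example.com/a", "https://example.com/a"]:
        print(fetcher.get(url))
```

Passing the fetch function, clock, and sleep in as arguments is a design choice for testability; a real scraper would hand in something like an HTTP client call and would probably persist the cache to disk.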
        
           | mrkeen wrote:
           | > Scrapers can be dicks about it too
           | 
           | It's not always about individual bad actors. You can have
           | lots of small players causing problems. I wonder how many
           | Python developers there are right now trying to make their
           | own offline copy of stackoverflow.com.
           | 
           | Wikipedia has a great defence against this. They ask you not
           | to scrape, and at the same time, provide torrents of the data
           | (https://meta.wikimedia.org/wiki/Data_dump_torrents)
        
             | rzzzt wrote:
             | Stack Exchange also provides one:
             | https://archive.org/details/stackexchange
             | 
             | There was a hiccup around June but that seems resolved now:
             | https://meta.stackexchange.com/questions/389922/june-2023-d
             | a...
        
       | Klonoar wrote:
       | Hmmm, I'm a bit confused on something. The HiQ vs LinkedIn case,
       | to my knowledge, went through the following stages:
       | 
       | - LinkedIn sues HiQ, Ninth Circuit sides with HiQ
       | 
       | - LinkedIn pushes to Supreme Court, Supreme Court vacates citing
       | Van Buren
       | 
       | - Ninth Circuit re-reviews and _affirms their decision_
       | 
       | - LinkedIn moves to get the injunction preventing them from
       | blocking HiQ dissolved, which is granted
       | 
       | - A mixed judgement is finally issued in Nov 2022 ultimately
       | resulting in a private settlement
       | 
       | Where exactly does this leave things? I feel like everyone
       | loves to cite this case but never goes into the finer details.
       | 
       | Reading a summary of the mixed judgement from Nov 2022, it looks
       | like maybe the issue came from HiQ using people to log in and
       | thus the ToS came into play...? If I'm reading correctly, it
       | looks like the court did eventually side with LinkedIn in stating
       | that HiQ violated the ToS.
       | 
       | https://www.natlawreview.com/article/court-finds-hiq-breache...
       | 
       |  _Edit: Formatting._
        
         | dontupvoteme wrote:
         | What is the legal precedent of a mixed ruling? I was unaware
         | such a thing was even possible.
        
         | KieranMac wrote:
         | Not a mixed judgment in Nov. 22. It was a massive defeat for
         | hiQ Labs. Read the permanent injunction issued by the court.
        
           | Klonoar wrote:
           | Interesting. You appear to be a lawyer or in that realm, so
           | I'm curious about your take on it - though I also understand if you
           | don't want to publicly make statements or anything.
           | 
           | i.e., is the common take that people have of "scraping is legal
           | after HiQ vs LinkedIn" just completely wrong?
        
           | dontupvoteme wrote:
           | > Read the permanent injunction issued by the court.
           | 
           | Happen to have a link?
           | 
           | The question that matters is if this establishes any
           | precedent.
        
             | KieranMac wrote:
             | I don't. I just have a .pdf. Email me at
             | Kieran(at)McCarthyLG(dot)com if you want a copy.
        
       | zarazas wrote:
       | How is the situation legally and ethically for you to use scraped
       | data as text embeddings for a commercial product?
        
       | kazinator wrote:
       | > _Some of the biggest companies on earth--including Meta and
       | Microsoft--take aggressive, litigious approaches to prohibiting
       | web scraping on their own properties, while taking liberal
       | approaches to scraping data on other companies' properties._
       | 
       | Umm, no; author needs to study the word "hypocrisy" more deeply
       | than a cursory glance in the dictionary.
       | 
       | Doing something to others, while defending against the same
       | thing, is not hypocrisy.
       | 
       | For instance, a soccer player isn't a hypocrite for defending
       | against the ball going into his net, while trying to put it into
       | the other team's net.
       | 
       | A soldier on the war front isn't a hypocrite for shooting, while
       | also taking cover and dodging bullets.
       | 
       | These subjects are not hypocrites because they are not acting in
       | one way, while preaching that they, or others, ought to be acting
       | in a different way.
       | 
       | Microsoft would be hypocrites if they published an official
       | statement such that nobody who engages in web scraping has the
       | right to defend their own site against web scraping, because that
       | would not resemble their actual behavior and position which could
       | be inferred from their behavior. (Is there such a statement
       | somewhere?)
       | 
       | For hypocrisy to take place, you have to actually preach that you
       | and others should behave in a certain way, and then not actually
       | behave in exactly that way. If you only act, and don't preach,
       | you cannot be a hypocrite.
       | 
       | Moreover, your team's net is not the same object as another
       | team's net. If a soccer player loudly professes "it is morally
       | wrong for anyone to kick the ball into our net", but then kicks
       | the ball into the other team's net, that is not hypocrisy. His
       | statement references only his own net; he didn't proclaim that
       | it's wrong to kick any ball into any net whatsoever.
        
         | KieranMac wrote:
         | Scraping other sites while prohibiting it on your own is "do
         | what I say, not what I do" behavior, which I think is a fair,
         | consensus understanding of what it means to be hypocritical.
        
       | MBCook wrote:
       | I see two issues. Web scraping is clearly a business model
       | problem, and that's partly due to scale.
       | 
       | If you give away your content for free and expect ads to sustain
       | you, that will start failing once others get the value out of
       | your content without seeing the ads. Examples are ad blockers,
       | answers embedded in Google results, Stack Overflow clones, and
       | things like ChatGPT.
       | 
       | If ads weren't your business model you wouldn't be losing revenue
       | from it.
       | 
       | The other issue is scale, and I don't know how to address it.
       | 
       | It's easy for someone (say the government) to have a friendly
       | policy and say "you can dig in a park" thinking it's useful
       | to campers and such.
       | 
       | But when someone shows up with a professional strip mining crew,
       | things are different.
       | 
       | If you run a site providing quality information for free, making
       | money off book sales or professional services or such can be a
       | good living. Even if answers end up in the Google answer box,
       | more complicated stuff or analysis still requires a visit to read
       | and people can start following you from there.
       | 
       | But if ChatGPT or whatever can "read" your stuff and give out 80%
       | of the value without anyone even knowing it came from you, you're
       | screwed. Your business model no longer works. Any kind of "give
       | away good information" business model fails. Same issue artists
       | are now seeing.
       | 
       | And I don't know how you fix that without some kind of ban. But
       | unless every country everywhere enforces one... you have to work
       | with the lowest common denominator and lock all your content up.
       | No web search. No Google answers. No ChatGPT. "Please don't
       | scrape me" in robots.txt won't work.
        
         | tshaddox wrote:
         | It's interesting, because it's essentially the same exact
         | discussion as traditional copyrights (e.g. for books). The only
         | difference is that book authors are generally not giving away
         | their books for free on their personal website. Copyrights are
         | the attempt to protect the business model of authors who want
         | to sell copies of something that are otherwise extremely easy
         | and cheap to copy. Attempts to legally limit web scraping are
         | an attempt to protect the business model of creators who
         | want to _give away for free_ copies of things that are easy and
         | cheap to copy, but _only_ if we come directly to the creator to
         | get our free copy.
        
         | drunkencoder wrote:
         | You're right. That's why scraping must be unlimited and legal
         | for all. Any information accessible from the internet should be
         | legal to refine. That includes us using GPT services to train
         | our own models, scraping anything that's publicly accessible.
         | Our only defense is competing services that refine the data even
         | more than any general LLM. The solution is almost never
         | regulation but competition. Fair competition.
        
           | giraffe_lady wrote:
           | You're making an ideological argument but not confronting any
           | of the business problems raised in the other comment.
        
           | antonf wrote:
           | > You're right. That's why scraping must be unlimited and
           | legal for all.
           | 
           | Unlimited scraping makes some privacy regulations moot,
           | such as the right to erasure (the ability to delete personal
           | data from a platform).
        
             | fluoridation wrote:
             | Not exactly. You can request a site to erase all the data
             | it has on you, but not that they erase the memories of
             | everyone who has seen this data. How is this any different?
        
               | text0404 wrote:
               | scale
        
               | fluoridation wrote:
               | So at which scale does the copying of data lower privacy,
               | such that humans looking at it and potentially
               | screenshotting it doesn't, but automated processes
               | copying it does?
        
               | nawgz wrote:
               | Your tone implies you're serious, but I struggle to
               | believe anyone could possibly equate persisting digital
               | media with recalling a memory.
               | 
               | In case you really need an example to elucidate, consider
               | reproducing an image. A scraper can quite literally
               | accomplish that, trivially; a great artist would still be
               | limited in multiple facets of the recreation, such that
               | even one with the best memory and hand would find
               | themselves far short of pixel-perfect.
        
               | brendamn wrote:
               | How many people who have seen that data are acting as a
               | service to share it, at scale?
        
             | lelandbatey wrote:
             | I don't think that's true. "Right to erasure" still works
             | just as well as it always has, but you might need to ask
             | the folks who have scraped and are re-sharing your
             | information to also delete your personal data. That's not
             | an unreasonable thing to have happen, nor is it an
             | unreasonable thing to expect.
             | 
             | Let's suppose an embarrassing image of Person X is shared
             | on Facebook and Person X uses their right to erasure with
             | Facebook to delete their profile. Facebook has no control
             | over the folks who may have downloaded or screenshotted
             | that photo and turned it into subsequent memes. Likewise,
             | if someone straight up scrapes and re-shares, that's not
             | Facebook's responsibility.
             | 
             | What I _don't_ want to see happen is for:
             | 
             | 1. Facebook to make it somehow impossible for anyone to
             | ever copy or screenshot that or any photo, preventing
             | anyone from ever doing anything with photos on Facebook
             | without Facebook's explicit permission. This would seem to
             | be quite the loss of user agency for very little
             | society-wide benefit (also, how would they do this?)
             | 
             | 2. Facebook to somehow "control" that photo so closely that
             | Facebook is able to remotely revoke folks' copies and
             | screenshots of said photo in the spirit of "abiding by a
             | person's right to erasure"; that'd be a huge overreach, but
             | seems like the only other way to approach this (though
             | "how" is also an open question).
             | 
             | Even asserting that "unlimited scraping makes some privacy
             | regulations moot" seems like an implication that we can
             | only have privacy laws by going towards situation #1, and
             | that doesn't seem accurate given that folks can use
             | existing privacy laws to remove content from any
             | distributor (as long as they're compliant).
        
         | MetaWhirledPeas wrote:
         | > If you give away your content for free and expect ads to
         | sustain you, that will start failing once others get the value
         | out of your content without seeing the ads
         | 
         | I don't think a paywall would fix this. One paid account is all
         | a scraper needs. It couldn't really even be rate-limited if
         | it's just "reading" articles as they become available. After
         | the data is acquired it can be dispensed. If directly posting
         | it violates copyright, then obscuring it behind AI will do the
         | trick just fine.
        
           | MBCook wrote:
           | But it stops being trivial. Now to scrape websites en masse you
           | have to automate signing up for them, probably paying for it.
           | 
           | And unlike now to sign up you have to agree to a very
           | enforceable EULA.
           | 
           | So instead of going to court with "FunAI read my public
           | website and is making money off it, which I don't think
           | should be fair use", you have "FunAI violated a contract they
           | signed and committed fraud by lying on signup".
           | 
           | Seems to me that's much easier.
           | 
           | There will always be people who get the content for free
           | somehow. You don't have to stop 100%. Even stopping 95% would
           | be a lot better than the current 0%.
        
       | rvnx wrote:
       | Sweet memories of Facebook, which was spamming all the contacts
       | to invite them to join Facebook.
        
         | ilrwbwrkhv wrote:
         | Same as LinkedIn. In fact that was LinkedIn's actual growth
         | hack. Now they sell books talking about other things.
        
       ___________________________________________________________________
       (page generated 2023-08-25 23:00 UTC)