[HN Gopher] Google is the only search engine that works on Reddi...
       ___________________________________________________________________
        
       Google is the only search engine that works on Reddit now, thanks
       to AI deal
        
       Author : turkeytotal
       Score  : 352 points
       Date   : 2024-07-24 13:41 UTC (9 hours ago)
        
 (HTM) web link (www.404media.co)
 (TXT) w3m dump (www.404media.co)
        
       | VoidWhisperer wrote:
       | Wow, reddit found a way to make themselves even less useful
       | somehow. After the API fiasco, that seemed like it'd be pretty
       | hard to do.
        
         | wvenable wrote:
         | But, apparently, they did finally find a way to make money.
        
           | LunaSea wrote:
           | Barely enough to pay the CEO
        
           | jasode wrote:
           | _> But, apparently, they did finally find a way to make
           | money._
           | 
           | The most recent 10-Q financial results for 2024-03-31 (filed
           | 2024-05-08) show they actually _lost_ money:
           | https://www.sec.gov/edgar/browse/?CIK=1713445
           | 
           | (For 2024-Q1, Reddit _lost $575 million_ on revenue of $242
           | M.)
           | 
           | If the quoted _" $60 million deal"_[1] from Feb 2024 is
           | accurate, that small amount from Google may not be enough for
           | Reddit to turn a profit. It remains to be seen what the Q2 or
           | Q3 financials will show.
           | 
           | [1] https://www.google.com/search?q=google+ai+deal+reddit
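
The arithmetic behind the figures quoted above can be checked with a short sketch. It uses only the numbers from the comment (revenue, loss, and the quoted deal size) and makes one simplifying assumption, labeled in the code: that the loss is simply total costs minus revenue. The variable names are illustrative, and the thread does not say what period the deal figure covers.

```python
# Figures quoted in the comment above, in millions of US dollars.
q1_revenue = 242
q1_net_loss = 575
quoted_deal = 60  # the quoted "$60 million deal"; period not stated in the thread

# Simplifying assumption (not from the filing): net loss = total costs - revenue.
implied_q1_costs = q1_revenue + q1_net_loss

# Fraction of the Q1 loss that the quoted deal amount would cover.
deal_share_of_loss = quoted_deal / q1_net_loss

print(f"Implied Q1 costs: ${implied_q1_costs}M")
print(f"Deal covers {deal_share_of_loss:.1%} of the Q1 loss")
```

Under that assumption the implied costs are about $817M for the single quarter, and the quoted deal amount would cover roughly a tenth of the quarter's loss.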
        
             | wuiheerfoj wrote:
             | Wow, perhaps I'm naive but what the hell are they spending
             | over $800M a year on? That seems an obscene amount for a
             | glorified message board.
             | 
             | I just read they have 2000 employees which is also puzzling
             | to me
        
               | toomuchtodo wrote:
               | They were a public good currently larping as a for profit
               | concern now run by a vanity and wealth driven executive
               | driving it into the ground while it flails to monetize
               | when that is likely incompatible with the entity.
               | 
               | Compare and contrast to say, HN, run on two servers in a
               | colo with less than a handful of mods.
        
               | splwjs wrote:
               | it's not just a message board, it's an influence machine.
               | 
               | They need to make sure the stuff they want people to
               | think is posted often and has a big number next to it,
               | they need to make sure things that people like are
               | associated with the stuff they want people to
               | like/think/do and things that people don't like are never
               | associated with the stuff they want people to
               | like/think/do. They need to make sure that people who say
               | the wrong things are silenced or persuaded to leave, etc
               | etc. Man they probably have at least one contact in at
               | least one intelligence agency and they have to make sure
               | not to run afoul of that contact.
               | 
               | Like the news isn't just a list of what happened
               | recently, political debates aren't just two guys talking,
               | and reddit/twitter aren't just message boards.
        
               | alephxyz wrote:
               | They spent 400M on R&D this quarter, which means more
               | "personalisation"/ad targeting and probably cooking up
               | some DOA chatbot/assistant product that's costing them a
               | ton in compute
        
               | some_random wrote:
               | Almost 200 million is in CEO compensation
               | https://www.statista.com/statistics/1453196/reddit-top-
               | execu...
        
         | Hikikomori wrote:
         | The only things it does for me is forcing me to use Google as a
         | large amount of the answers I need is on reddit.
        
           | immibis wrote:
           | That's what Google is paying them for :)
        
           | brewdad wrote:
           | So then this gambit worked. It sucks and I hate it. I will
           | continue to use DDG/Bing first but it looks like I'll be
           | hitting up Google more often too.
        
           | WarOnPrivacy wrote:
           | > The only things it does for me is forcing me to use Google
           | 
           | Startpage, Kagi and Lukol are 3 that source from Google. I
           | imagine there are others.
        
         | stainablesteel wrote:
         | which is ironic because pre-AI every solid piece of obscure
         | information and non-programming question usually had an answer
         | on reddit, it's an extremely valuable dataset looking back. but
         | moving forward i think it's only going to become less valuable
         | and people will probably manually/custom-scrape all the
         | questions out of worthwhile subreddits and open up their data
         | for free
        
           | splwjs wrote:
           | When I was young, my brother knew a guy who was really into
           | movies. If you wanted to know about a movie you couldn't
           | remember, you would go talk to that guy.
           | 
           | For a while, the internet had an end-run play that made that
           | guy less useful. You can just go on the internet for obscure
           | movie information, buddy.
           | 
           | But now it seems like knowing a movie guy is going to be the
           | only way to get a real person's opinion on movies. The
           | internet is about to forget everything without a profit
           | motive and just start telling you that the latest product
           | from a monolith corp like disney is the only movie worth
           | watching. If someone scrapes all the useful movie opinions
           | off of reddit and spends their time crafting it into a usable
           | format, that guy's probably got a company. But not Bill.
           | Bill's just a guy you can know or not know. You can't
           | monetize knowing Bill. Sidenote that's probably why it irked
           | me so bad when some bozo coined the phrase "social capital".
        
         | splwjs wrote:
         | If they kept their API open then by now the entirety of the
         | site would be ai slop that was built with chatgpt and launched
         | with the api.
         | 
         | Then again most of what that site does is just blend and
         | regurgitate the information that's currently on it anyway.
        
           | miohtama wrote:
           | Those AI bots would likely be more intelligent commenters
           | than Redditors
        
         | abdullahkhalids wrote:
         | The API changes and these robots.txt were part of the same
         | strategy - preventing third parties from scraping their data
         | and reducing the AI generated content that makes it into their
         | data. So they can sell that data and make money.
        
           | kjkjadksj wrote:
           | Their dataset is already polluted with misinformation
           | campaigns and shilling
        
           | AlexandrB wrote:
           | > their data
           | 
           | Love how it's their data when it might make them money but
           | not their data if they get sued.
        
             | abdullahkhalids wrote:
             | That's fair. I agree with you that in some sense it is user
             | data. And that Reddit is operating unethically.
        
       | popcalc wrote:
       |     # Welcome to Reddit's robots.txt
       |     # Reddit believes in an open internet, but not the misuse of public content.
       |     # See https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy Reddit's Public Content Policy for access and use restrictions to Reddit content.
       |     # See https://www.reddit.com/r/reddit4researchers/ for details on how Reddit continues to support research and non-commercial use.
       |     # policy: https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy
       | 
       |     User-agent: *
       |     Disallow: /
       | 
       | Source: https://www.reddit.com/robots.txt
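
As a rough illustration of what the two directive lines quoted above mean to a crawler that chooses to honor them, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the bot name `ExampleBot`, and the sample URL are assumptions made for the example; the rules are parsed from a string, so nothing is fetched from Reddit.

```python
from urllib.robotparser import RobotFileParser

# The two directive lines from the robots.txt quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" is a hypothetical crawler name.
print(is_allowed("ExampleBot", "https://www.reddit.com/r/programming/"))
```

With `Disallow: /` under `User-agent: *`, every path is off limits to every crawler that follows the file. Compliance is voluntary, so this only describes well-behaved crawlers.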
        
         | will0 wrote:
         | Looks like it changed a month ago:
         | 
         | https://old.reddit.com/r/redditdev/comments/1doc3pt/updating...
        
         | immibis wrote:
         | Nobody who wants to be successful obeys robots.txt. And I do
         | mean nobody.
        
           | chippiewill wrote:
           | They changed it to disallow so that scrapers can't just claim
           | the robots.txt gave them permission.
        
             | toomuchtodo wrote:
             | Independent scrapers can launder the data between Reddit
             | and AI consumers. The only folks this hurts is users
             | seeking info via search engines and folks willing to kowtow
             | to rules that are potentially low impact to evade. Next
             | steps would be (from an adversarial perspective) browser
             | extensions that stream back data for ingestion similar to
             | Recap for Pacer [1].
             | 
             | [1] https://free.law/recap/faq
             | 
             | (full disclosure: assisting someone pursuing regulatory
             | action against reddit in the EU for a separate issue from
             | scraping, it's a valuable resource, but the folks who own
             | and control it are meh)
        
             | tedivm wrote:
             | According to the US court systems the robots.txt file is
             | meaningless. If they respond with a 200 status code giving
             | you the access then you can legally scrape it all you want.
             | If they require that you log in then you have to follow the
             | terms you agree to when creating an account. Public means
             | public though, and if Reddit doesn't want to make the
             | content private (put it behind a login) then we can scrape
             | away.
             | 
             | Note that scraping, regardless of the level of permission,
             | doesn't mean you can do anything you want with the content.
             | Copyright still applies. But you can scrape it, and if your
             | use falls under Fair Use or another caveat to the copyright
             | laws then you can go ahead and do it without needing any
             | permission from the authors.
        
           | JohnFen wrote:
           | Sadly true. That's why I gave up on robots.txt years ago and
           | started blocking crawlers outright in .htaccess
           | 
           | Of course, that became unsustainable so now I have everything
           | behind a login wall.
        
         | sunaookami wrote:
         | They serve a different robots.txt to Google:
         | https://merj.com/blog/investigating-reddits-robots-txt-cloak...
         | 
         | You can see it here: https://search.google.com/test/rich-
         | results/result?id=_mYogl... (click on "View Tested Page")
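
Serving a different robots.txt depending on who asks can be sketched as a pure function over the request's User-Agent header. This is only an illustration of the general idea, not Reddit's actual implementation: the function name `robots_txt_for`, the allowlist, and both policy bodies are hypothetical, and a real deployment would more likely verify crawlers by IP range or reverse DNS than trust a header that anyone can spoof.

```python
# Hypothetical policy bodies; these are not Reddit's actual files.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Hypothetical allowlist of substrings to look for in the User-Agent header.
ALLOWLISTED_TOKENS = ("googlebot",)


def robots_txt_for(user_agent: str) -> str:
    """Pick the robots.txt body to serve for a given User-Agent header."""
    ua = user_agent.lower()
    if any(token in ua for token in ALLOWLISTED_TOKENS):
        return ALLOW_ALL
    return BLOCK_ALL


print(robots_txt_for("ExampleBot/1.0"))
```

Any crawler outside the allowlist sees the blanket `Disallow: /`, which matches what the parent comment quoted from the public URL.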
        
         | dogleash wrote:
         | > # Reddit believes in an open internet, but not the misuse of
         | public content.
         | 
         | Calling it "public" content in the very act of exercising their
         | ownership over it. The balls on whoever wrote that.
        
           | pas wrote:
           | it's even worse. it's not theirs (it's the users'), they are
           | merely hosting it and using it (ToS gives them a fancy
           | irrevocable license I guess).
           | 
           | so they can do whatever they want with it and the actual
           | owners/authors have no chance to really influence Reddit at
           | all to make it crawlable. (the GDPR-like data takeout is
           | nice, but ... completely useless in these cases where the
           | value is in the composition and aggregation with other users'
           | content.)
        
             | deepfriedbits wrote:
             | On top of that, a sizable chunk of Reddit content is ripped
             | from elsewhere, whether videos, images, etc.
        
             | visarga wrote:
             | > the GDPR like data takeout is nice
             | 
             | Is there a way to export my history? How?
        
               | pas wrote:
               | https://www.reddit.com/settings/data-request
               | 
               | (and there's some help article for it that I didn't read,
               | but google found this first
               | https://support.reddithelp.com/hc/en-
               | us/articles/36004304835... that's how I got to the link)
        
             | throwaway290 wrote:
             | actually owners/authors like me would not want our stuff
             | crawlable because that gives up our ownership.
             | 
             | When I am answering some random dude on reddit with a
             | problem I want _that dude_ to read my solution. I don't
             | want this to be crawled and forever stored (probably
             | deanonymized) or enshrined in a dozen commercial LLMs.
             | There is substack for that stuff.
        
         | raverbashing wrote:
         | With the amount of crap in Reddit, cleaning it must be a very
         | non-trivial problem. (I mean, it never is, but in the case of
         | Reddit it's probably extra complicated)
        
       | Elfener wrote:
       | I mean, the reddit company did go public, so things like this
       | were inevitable.
       | 
       | Also things like the API fiasco, and also small annoyances like
       | the fact that when you click on an image on reddit, it now goes
       | to a wrapper html page instead of just the actual image (this was
       | one of the reasons reddit was better than most social media...).
        
         | mrec wrote:
         | Maybe it's just me or something temporary (I use Old Reddit,
         | like all right-thinking folk) but for the past couple of days
         | the image wrapper page seems to have been sent to the glue
         | factory. I'm just getting the image now, unadorned.
        
       | lifestyleguru wrote:
       | I deeply regret every minute spent on and kilobyte of text
       | contributed to reddit.
        
         | Ylpertnodi wrote:
         | I don't. There's nothing around that is similar...with the same
         | traction. The various 'verses are variations on cat pics. I'm
         | still looking, though.
        
           | wccrawford wrote:
           | While it's still not Reddit, I've been enjoying Lemmy. I
           | have a similar range of communities on each, and other than
           | some annoying groupthink, the content is often similar.
           | 
           | And to me, forgetting to log in to each of them feels
           | similar, too. For what that's worth. (I hate both of them
           | when not logged in.)
        
         | trallnag wrote:
         | I can confidently state that I'm a net negative for Reddit,
         | looking at the dozens of banned accounts in the trash bin of my
         | KeePass vault
        
         | card_zero wrote:
         | I mostly contributed to r/nonsense and I'm pleased by the
         | thought of that sub's content being used to train future AI,
         | with information about the architectural uses of super-tall
         | chef's hats, the prehistoric invasion of Europe by Beak People,
         | and so forth.
        
       | nerfbatplz wrote:
       | I propose we change the term _enshittification_ to
       | _engoogleification_ in regards to the internet.
        
         | crazygringo wrote:
         | This is about _Reddit_ disallowing other search engines.
         | 
         | Blame Reddit, not Google.
        
           | dvngnt_ wrote:
           | plenty of blame to go around
        
             | crazygringo wrote:
             | You'll have to demonstrate that.
             | 
             | Is Google's contract with Reddit exclusive, so that other
             | search engines aren't given the opportunity to also pay?
             | 
             | I highly doubt that, especially since the DOJ would go
             | after that immediately because of antitrust.
             | 
             | So no, pretty sure the blame here is 100% on Reddit unless
             | you have evidence otherwise.
        
               | dvngnt_ wrote:
               | I don't think the DOJ acts immediately.
               | 
               | > so that other search engines aren't given the
               | opportunity to also pay?
               | 
               | this makes it harder for new engines if google has
               | exclusive deals with some of the most popular sites
        
               | crazygringo wrote:
               | > _if google has exclusive deals_
               | 
               | My comment said, show me _that_ the Google deal with
               | Reddit is exclusive.
               | 
               | You haven't done that.
               | 
               | And there's no reason to think it would be, because of
               | antitrust. The DOJ doesn't have to act "immediately", the
               | point is that obvious antitrust violations come with
               | fines that make it unprofitable to attempt in the first
               | place. And this would be black-and-white obvious
               | antitrust violation, given Google's monopoly status in
               | search. This isn't a gray area where it might be worth it
               | for Google to roll the dice.
        
               | dvngnt_ wrote:
               | clearly some deal was reached between the two parties or we
               | wouldn't be here.
               | 
               | whether or not the deal is exclusive OR companies have to
               | pay to index reddit it's still bad for competition. money
               | has a barrier to entry preventing newcomers.
               | 
               | I can blame reddit for creating the deal and I can blame
               | google for accepting the deal if the effect is bing, ddg
               | and others cannot display reddit results without reaching
               | some deal.
        
               | crazygringo wrote:
               | I'm not saying it's not bad for competition.
               | 
               | I'm saying the blame is 100% with Reddit.
               | 
               | Blaming Google for accepting it makes no sense. That's
               | like if a shopper goes to a grocery store and buys an
               | expensive $20 piece of cheese, and other shoppers can't
               | afford cheese that pricey, and you're blaming that one
               | shopper for buying it because it means other shoppers
               | can't also get the cheese without paying for it. That
               | doesn't make any sense. The _store_ set the price, and
               | they're the one to blame if other shoppers can't afford
               | it.
               | 
               | If Bing, DDG and others can't reach a deal with Reddit,
               | that has _nothing_ to do with Google.
               | 
               | Again, blame here is _100%_ on Reddit, and 0% on Google.
               | To assign blame to a _purchaser_ in a case like this
               | doesn't make any sense.
        
               | dvngnt_ wrote:
               | bing probably has the money to reach a deal, smaller
               | companies without monopolies is less likely, and that's
               | the problem.
               | 
               | i don't think google is blameless like you propose.
        
               | crazygringo wrote:
               | > _bing probably has the money to reach a deal, smaller
               | companies without monopolies is less likely, and that's
               | the problem._
               | 
               | Reddit can charge smaller companies less money. So if
               | there's a problem, again, the problem is _100% with
               | Reddit_.
               | 
               | Google is absolutely blameless here. You may not like
               | Google, and you can certainly blame them for plenty of
               | other things. But in this situation, _literally all of
               | the blame is with Reddit_ for deciding to remove their
               | content from all search engines unless they pay. Reddit
               | didn't have to do that. Google didn't make them do that.
               | 
               | Reddit did this. Not Google.
        
               | dvngnt_ wrote:
               | takes two to tango. reddit can't do anything without google
               | signing papers as well
        
               | bryan_w wrote:
               | I don't think you're right about that.
        
         | frizlab wrote:
         | I back up this proposal.
        
       | debacle wrote:
       | Reddit has been ripe for disruption for years. It's just waiting
       | on an inflection point and someone to take it behind the barn.
        
         | onlyrealcuzzo wrote:
         | Or for Google to buy it.
         | 
         | They could monetize it much better while being less annoying.
         | 
         | Ultimately - Google is getting everything they want from Reddit
         | with this deal without having to buy it outright.
         | 
         | Short of Reddit transforming to an entirely different product
         | (difficult) - I'm not sure where the major growth opportunity
         | is for it.
        
           | rob74 wrote:
           | It wouldn't be the first time they have done something like
           | this either. Remember
           | https://en.wikipedia.org/wiki/Google_Groups ?
        
             | Suppafly wrote:
             | >Remember https://en.wikipedia.org/wiki/Google_Groups
             | 
             | It'd be somewhat hilarious if google bought reddit just to
             | archive it and shut it down.
        
         | jessriedel wrote:
         | Very few of the reddit users who are providing the content for
         | free are motivated by which search engines are allowed to index
         | the content, so I don't see how this would make it more ripe
         | for competition. (If you just mean society would now be even
         | better off if reddit were disrupted, ok, maybe, but that's a
         | different thing.)
        
         | crazygringo wrote:
         | The network effects are too strong.
         | 
         | Remember, the only reason Reddit "won" was because Digg
         | destroyed itself with a radical upgrade that everyone hated.
         | 
         | Reddit would have to do something similarly self-inflicted, and
         | I can't even guess where people would go. Reddit was already an
         | alternative to Digg -- what's the alternative to Reddit? I
         | mean, it's certainly not Quora.
        
           | tayo42 wrote:
           | Reddit is quietly a huge website with a significant amount of
           | users. So many people use it but don't talk about it. Google
           | search says 1 billion MAU? Twice as big as Twitter
        
           | nope1000 wrote:
           | There is Lemmy for example, very similar to old Reddit. The
           | big problem is the missing content outside of mainstream
           | communities.
        
           | NoMoreNicksLeft wrote:
           | It was already dead by then. Really, it was the various
           | Slashdot exoduses... sites like K5 got large initial boosts,
           | but stumbled and started to deteriorate. If the Digg exodus
           | is what sent you to Slashdot, chances are you're the kind of
           | user everyone else was trying to escape.
           | 
           | >what's the alternative to Reddit? I mean, it's certainly not
           | Quora.
           | 
           | If it was deliberate I certainly can't tell, but one of the
           | characteristics of Reddit is that it caused so many other
           | little tiny internet forums to just wither away. Most were
           | visually unappealing, running some ancient phpbb software or
           | whatever, but there were so many like stars in the night sky.
           | Now, if they're even still running, you look for the newest
           | post, and it will say "November 2023". Hell, the only reason
           | they are still running is that the credit card number on file
           | paying for hosting doesn't expire until next year somehow.
           | Reddit is a red tide algae choking out all life in the ocean,
           | nothing else gets to exist anymore.
        
             | bobajeff wrote:
             | I think you might be onto something with the observation
             | about people moving from old forum software like phpbb to
             | subreddits.
             | 
             | It's like what happened to personal websites when things
             | like Blogger, Tumblr and Facebook popped up.
             | 
             | It's hard to beat something that is easy to set up and pays
             | for hosting but still lets you control moderation. It's a
             | no brainer.
             | 
             | Managing your own domain where users post content is a
             | minefield of problems these days even if you didn't mind
             | the cost of running it.
             | 
             | Something like this might also explain the move to things
             | like Discord over IRC.
        
           | Suppafly wrote:
           | >Reddit was already an alternative to Digg -- what's the
           | alternative to Reddit?
           | 
           | This site is essentially 'orange reddit', they just need to
           | add sub-HNs or tagging or something and it'd be ready for an
           | influx of reddit refugees. Not that any of us really want it,
           | but it's possible.
        
           | CSMastermind wrote:
           | I don't think this is true.
           | 
           | The main thing I see Reddit being useful for are discussions
           | about entertainment.
           | 
           | There's probably a subreddit for your favorite sports team,
           | Twitch streamer, TV show, book series, video game, politics
           | (which is entertainment for some people).
           | 
           | Reddit has seriously degraded the experience of a lot of
           | these communities with things like restricting custom CSS.
           | 
           | It seems to me that the way you'd disrupt Reddit as a startup
           | is to pick a vertical and laser focus on becoming the best
           | discussion board for that community. If it's sports then have
           | integrations for live stats, scores, etc.
           | 
           | In general you could attract users by offering profit sharing
           | on ads the same way Youtube does for creators.
           | 
           | Have the best moderation tools in the world, a constant
           | painpoint with Reddit. Give admins more flexibility over the
           | appearance of the board, all things Reddit took away.
           | 
           | The other path for disruption would be if an established
           | company with those communities tackled the problem. Lots of
           | communities already use Discord, but they tend to also have a
           | subreddit because chat and forums are different communication
           | methods. Discord could easily offer a forum product as an
           | extension of their chat services. If they do it well they'd
           | drive a lot of users away from the subreddits.
        
         | api wrote:
         | Network effects are more powerful than we are. Witness the
         | number of people who despise Xhitter but are still on there.
         | Once something has a sufficient network effect they become
         | immune to normal market forces and able to abuse their position
         | with near impunity.
        
         | bdw5204 wrote:
         | The strange thing to me is how everybody keeps trying to make
         | distributed Twitter happen when distributed Reddit is the low
         | hanging fruit for federated social media.
         | 
         | You don't want to end up banned from a movies forum because you
         | also participate in a political forum. Federation solves that
         | problem because you can use separate accounts without either
         | forum knowing that you also use the other.
        
           | ravetcofx wrote:
           | This exists with Lemmy already and is fostering nice
           | communities (and due to ActivityPub is interoperable with
           | Mastodon accounts)
        
           | ks2048 wrote:
           | I like it in principle, but after watching the situation with
           | Twitter clones, I'm not too optimistic on federated services
           | taking off.
           | 
           | I would like to see a wikipedia-style system for
           | Twitter/Reddit: open access data, non-profit.
        
           | teabee wrote:
           | Is this not just what the internet was before reddit? What
           | features would "distributed reddit" have that an internet
           | full of independent community forums would be missing?
        
           | psunavy03 wrote:
           | They had this years ago, and they were called "forums."
        
           | Suppafly wrote:
           | >The strange thing to me is how everybody keeps trying to
           | make distributed Twitter happen when distributed Reddit is
           | the low hanging fruit for federated social media.
           | 
           | Honestly, it's strange to me how hard people are trying to
           | make distributed anything happen. Federation mostly solves a
           | problem that real people don't have or care about.
        
             | Yawrehto wrote:
             | >Honestly, it's strange to me how hard people are trying to
             | make distributed anything happen.
             | 
             | IMO, something federation is very good at is solving one
             | slow-moving problem - enshittification of social platforms.
             | It's not immune, of course, but an Elon Musk-style takeover
             | is much harder with Mastodon than Twitter, and it would be
             | hard to run it into the ground in other ways because the
             | platforms are owned by different people and groups.
        
         | escapecharacter wrote:
         | Man, I just want to be able to search the entire internet for
         | when I'm doing niche research.
         | 
         | Does this mean there will be a future where everyone is running
         | their own crawler? I suppose.
        
       | causal wrote:
       | It feels like Reddit is approaching an inflection point anyway
       | where bot-made content is concentrated enough to spoil the whole
       | experience. Closed servers like Discord and Slack may be the last
       | haven of online human interaction.
        
       | onlyrealcuzzo wrote:
       | This is an interesting development.
       | 
       | How many other sites might have leverage to charge to be indexed?
       | 
       | I don't want to live in a world where you have to use X search
       | engine to get answers from Y site - but this seems like the
       | beginning of that world.
       | 
       | From an efficiency perspective - it's obviously better for
       | websites to just lease their data to search engines than both
       | sides paying tons of bandwidth and compute to get that data onto
       | search engines.
       | 
       | Realistically, there are only 2 search engines now.
       | 
       | This seems very bad for Kagi - but possibly could lead to the
       | old, cool, hobbyist & un-monetized web being reinvented?
        
         | ColinHayhurst wrote:
         | Kagi uses at least Google and Mojeek
         | 
         | edit:
         | 
         | > Realistically, there are only 2 search engines now.
         | 
         | https://seirdy.one/posts/2021/03/10/search-engines-with-own-...
        
           | WarOnPrivacy wrote:
           | > Realistically, there are only 2 search engines now.
           | 
           | From the article:
           | 
           | > Many alternatives to GBY [Google, Bing, and Yandex] exist, but
           | > almost none of them have their own results;
           | 
           | This seems to assert that ~0 other search providers do any
           | crawling at all. Ever. Are we sure that's the case?
           | (they could crawl but never ever return those results == more
           | odd).
        
             | ColinHayhurst wrote:
             | It's a very long article so understandable that you did not
             | read on and learn about other search engines crawling
             | beyond GBY. Still there are indeed very few that are
             | crawling at web scale, and internationally. We are at 8
             | billion pages and totally independent [0], hence expressing
             | our concerns to 404 media after being blanked by Reddit.
             | 
             | [0] https://www.mojeek.com/about/why-mojeek
        
               | WarOnPrivacy wrote:
               | > did not read on and learn about other search engines
               | crawling beyond GBY. Still there are indeed very few that
               | are crawling at web scale, and internationally
               | 
               | That's helpful clarification.
               | 
               | In criticism of the article, you might agree that
               | 
               |  _none of them have their own results_
               | 
               | is a fairly absolute statement. It signals: Final word on
               | the matter; no nuance to follow.
        
               | topaz0 wrote:
               | Omitting the "almost" from "almost none" makes it sound
               | disingenuously more absolute than it actually is.
        
               | shadowgovt wrote:
               | I mean, I didn't read on because it's paid.
               | 
               | I'm not taking their reporting without compensation, but
               | that also means I didn't have the whole story. Such is
               | life in this era of the internet.
        
               | smallerize wrote:
               | It's not paid.
               | 
               |  _Sign up to support our work and for free access to this
               | article_
        
             | MichaelZuo wrote:
             | Bing provides far fewer verbatim results for pretty much
             | all search queries that I've tested.
             | 
             | And Yandex isn't much better for non cyrillic search, Baidu
             | is only for the Chinese web effectively.
             | 
             | And all other search engines either don't even attempt to
             | do full web crawls anymore and/or buy from one of the four
             | above.
             | 
             | So realistically there's just one search engine for the
             | full web that actually does the work.
        
               | WarOnPrivacy wrote:
               | > And Yandex isn't much better for non cyrillic search,
               | 
               | I like Yandex when I'm rabbit-holing after obscure
               | musicians/music. I routinely have a better experience
               | than I do with DDG or Kagi or Goog.
        
               | MichaelZuo wrote:
               | It's also vastly better for finding livejournal blogs.
        
               | dev1ycan wrote:
               | Brave has their own search engine, yandex I only use for
               | reverse image search, baidu's interface is really clean
               | and feels like old school google... but I don't speak
               | chinese so I can't use it.
               | 
               | I hope that one day they get a western version
        
               | MichaelZuo wrote:
               | Brave doesn't have its own index of the full web, and
               | it's even less useful than Yandex. And very likely buys
               | some of it, according to what I've heard. So it falls
               | into the last category.
        
               | em-bee wrote:
               | if that is true then they are lying on their site where
               | they claim: " _Brave Search operates from a fully
               | independent search index_ "
               | 
               | do you have any reference for your claim?
               | 
               | i use brave search and find it very useful. very rarely
               | there is something i can't find, and when i run into that
               | other search engines are not much better.
        
             | culi wrote:
             | I believe Brave Search is also starting their own index.
             | There are some tiny independent indexes too:
             | 
             | https://www.crawlson.com/ https://search.marginalia.nu/
             | https://wiby.me/ https://searchmysite.net/
        
             | darreninthenet wrote:
             | I believe Kagi has its own crawler as well and it merges
             | all the results and does whatever Kagi does behind the
             | scenes to show the mix
        
           | Yawrehto wrote:
           | Doesn't it list three major ones, Google, Bing, and Yandex,
           | plus Mojeek and a few other small ones? That's a bit more
           | than two.
        
         | McDyver wrote:
         | That seems like the business model for streaming. You subscribe
         | to X provider to watch Y series. So, as for streaming, I
         | suppose a pirate bay search engine will come up
        
           | toomuchtodo wrote:
           | Pirate Bay is probably not the most optimal analogy, more
           | like Anna's Archive imho [1], individually offered by web
           | property scrape runs compressed into a package, maybe served
           | by torrents like this Academic Torrents site example [2].
           | 
           | Scraper engine->validation/processing/cleanup->object
           | storage->index + torrent serving is rough pipeline sketch.
           | 
           | [1] https://hn.algolia.com/?dateRange=all&page=0&prefix=false
           | &qu... ("HN Search: annas archive")
           | 
           | [2] https://academictorrents.com/details/9c263fc85366c1ef8f5b
           | b9d... ("AcademicTorrents: Reddit comments/submissions
           | 2005-06 to 2023-12 [2.52TB]")
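
The rough pipeline in the comment above (scrape, validate/clean up, object storage, index) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the `scrape` generator returns made-up pages on example.com, a plain dict stands in for object storage, the index is a toy inverted index, and the torrent-serving stage is left out entirely. None of the function names come from the thread.

```python
import hashlib
import re
from collections import defaultdict


def scrape():
    """Stand-in for a scraper engine: yields (url, raw_text) pairs (made-up data)."""
    yield "https://example.com/a", "  Hello   federated WORLD \n"
    yield "https://example.com/b", "   "
    yield "https://example.com/c", "Hello search index"


def clean(raw_text):
    """Validation/cleanup: collapse whitespace, reject empty documents."""
    text = re.sub(r"\s+", " ", raw_text).strip()
    return text or None


def store(object_store, url, text):
    """Content-addressed 'object storage': the key is the SHA-256 of the body."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    object_store[key] = {"url": url, "text": text}
    return key


def build(pages):
    """Run every page through cleanup, storage, and indexing."""
    object_store = {}
    index = defaultdict(set)  # lowercase token -> set of object keys
    for url, raw_text in pages:
        text = clean(raw_text)
        if text is None:
            continue  # failed validation, skip it
        key = store(object_store, url, text)
        for token in set(text.lower().split()):
            index[token].add(key)
    return object_store, index


object_store, index = build(scrape())
hits = sorted(object_store[key]["url"] for key in index["hello"])
print(hits)
```

Keying stored objects by content hash is one possible design choice here: it deduplicates identical scrapes and gives each package a stable identifier that a distribution layer could reference.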
        
         | splwjs wrote:
         | idk man i bet you five bucks and a handshake it's just going to
         | play out like the existing startup grift.
         | 
         | There's an established player with institutional protections,
         | then a scrappy upstart takes a bunch of VC money, converts it
         | into runway, gives away the product for free, gradually
         | replaces and becomes the standard, then puts out an s-1
         | document saying "we don't make money and we never have, want to
         | invest?" and then they start to enjoy all the institutional
         | protections. Or they don't. Either way you pay yourself
         | handsomely from the runway money so who cares.
         | 
         | The upstart gets indexed and has an API, the established player
         | doesn't.
         | 
         | The upstart is more easily found and modular but the
         | institutional player can refuse to be indexed to own their data
         | and they can block their API to prevent ai slop from getting in
         | and dominating their content.
        
         | gtirloni wrote:
         | _> but this seems like the beginning of that world._
         | 
         | It's not the beginning, it's mere continuation.
         | 
         | Walled gardens have existed since the AOL days. They
         | deteriorate over time but it doesn't prevent companies from
         | trying (each time, in bigger attempts).
        
       | dvngnt_ wrote:
       | site:reddit.com works for kagi for new posts this week?
        
         | rozab wrote:
         | Basically all 'independent' search engines piggyback off Google
         | or Bing
         | 
         | https://help.kagi.com/kagi/search-details/search-sources.htm...
         | 
         | >Our search results also include anonymized API calls to all
         | major search result providers worldwide
        
           | ColinHayhurst wrote:
           | >Basically all 'independent' search engines piggyback off
           | Google or Bing
           | 
           | Incorrect: https://www.mojeek.com/about/why-mojeek
        
             | Suppafly wrote:
             | weird, I've never heard of mojeek before and this is the
             | 2nd comment in this thread I've seen mentioning it.
        
               | zamadatix wrote:
               | As of the time of writing there are 8 search matches in
               | this thread: 1 from you, the rest from Colin (CEO of said
               | company).
        
               | Suppafly wrote:
               | > the rest from Colin (CEO of said company)
               | 
               | I assumed he had financial connection to them, but didn't
               | want to take the time to research it. Mojeek is the new
               | fetch.
        
               | rozab wrote:
               | Spam is fine by me if it's from the CEO's personal
               | account, lol. Clearly I wasn't familiar with the product
               | so it's a helpful comment for me
        
             | em-bee wrote:
             | also brave: https://brave.com/search/#independent
        
         | AndroidKitKat wrote:
         | Kagi gets part of their index from Google, per the article, so
         | perhaps that's the reason Kagi still works. Wonder if Vlad and
         | Kagi will do (or have done) the calculus to see if buying
         | crawlability from Reddit itself is cheaper than buying results
         | from Google for Reddit search.
        
           | hugh_kagi wrote:
           | Not yet but it's something we want to look into.
        
         | ColinHayhurst wrote:
         | Kagi pays to use APIs from Mojeek and Google
        
         | karaterobot wrote:
         | From the second paragraph of the article:
         | 
         | > Searching for Reddit still works on Kagi, an independent,
         | paid search engine that buys part of its search index from
         | Google.
        
           | dvngnt_ wrote:
           | thanks i only read the first paragraph. then i went to kagi
           | discord and they provided more context
        
       | lpod wrote:
       | Interesting move by Reddit to lock down their search
       | functionality to just Google. I guess this means Bing and others
       | are out of luck. Seems like another step towards the walled
       | garden approach - good for ad revenue, but probably not great for
       | user choice. Wonder how long it'll be before other platforms
       | follow suit?
        
       | jedberg wrote:
       | They changed robots.txt a month or so ago. For the first 19 years
       | of life, reddit had a very permissive robots.txt. We allowed all
       | by default and then only restricted certain poorly behaved agents
       | (and Bender's Shiny Metal Ass(tm))
       | 
       | But I can understand why they made the change they did. The data
       | was being abused.
       | 
       | My guess is that this was an oversight -- that they will do an
       | audit and reopen it for search engines after those engines agree
       | not to use the data for training, because let's face it, reddit
       | is a for profit business and they have to protect their income
       | streams.
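
To make the difference between the two robots.txt policies described above concrete, here is a small sketch using Python's standard `urllib.robotparser`. Both policy strings are hypothetical illustrations ("allow all, block one misbehaving agent" versus "block everyone by default"), not copies of any real site's file, and the agent names `ExampleBot` and `BadBot` are invented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical permissive policy: everyone is allowed, one agent is singled out.
PERMISSIVE = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

# Hypothetical restrictive policy: every crawler is disallowed by default.
RESTRICTIVE = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


URL = "https://example.com/r/some_community/"

for name, policy in (("permissive", PERMISSIVE), ("restrictive", RESTRICTIVE)):
    for agent in ("ExampleBot", "BadBot"):
        print(name, agent, allowed(policy, agent, URL))
```

Note that robots.txt is advisory: it only affects crawlers that choose to honor it, so a site that wants to enforce a block also needs server-side measures.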
        
         | JohnMakin wrote:
         | One (in this case, 2) company's incentive for profit should not
         | take priority over the usability/well being of the internet as
         | a whole, ever, and is exactly why we are where we are now. This
         | is an absolutely terrible precedent.
        
           | jedberg wrote:
           | I agree with you in theory, but in practice someone has to
           | pay for all this magic.
        
             | JohnMakin wrote:
             | This is a false dichotomy. You can have services, and not
             | have them devolve into complete unusability in the name of
             | profit. This isn't sustainable either. The myopic pursuit
             | of short term gains at the expense of the product will
             | collapse at some point in the future, no matter how much
             | you believe in this weird frog-boil internet we've
             | inherited now.
        
               | talldayo wrote:
               | > The myopic pursuit of short term gains at the expense
               | of the product will collapse at some point in the future,
               | 
               | The myopic pursuit of short-term gains is the only
               | playbook that works. Long-term business strategy is a
               | gamble, and today's businesses have all learned that
               | they'd rather make hay when the sun is shining than be
               | remembered as a good business.
               | 
               | Twitter tried a long-term playbook to reverse their
               | unprofitable sinkhole of a website. That ended up with
               | them being undervalued and sold to the highest bidder.
        
               | twelve40 wrote:
               | Complete unusability is when ai tools clone the content
               | and people stop visiting the original service and
               | participating. I'll leave it up to them to defend
               | blocking duck duck go for example, but blocking "AI" bots
               | for an online community is a matter of survival at this
               | point.
        
               | talldayo wrote:
               | Alternatively, it's because the base platform has also
               | devolved into unusability. Both Reddit and Twitter are in
               | a position where their info is easily scraped, and their
               | community is barely worth the advertising/paid-premium
               | experience they demand from you. As both platforms
               | continue to decline in quality, you might not even need
               | to replace the original service. Both businesses appear
               | intent on getting replaced.
        
             | ToucanLoucan wrote:
             | We did. As in we, the Internet, existed for a long time
             | without anyone making money and we paid for the privilege.
             | Websites were built and hosted at owner's expense, _for
             | years,_ with no expectation that they be financially
             | rewarded. Sure some would run donation drives, or work with
             | sponsors relevant to the community in question, but a whole
             | ton, mine included, just cost me a lot of money over many
             | years.
             | 
             | Those websites were definitely technically inferior, as the
             | march of progress is unavoidable, but web hosting is
             | cheaper than it's ever been. A VPS that utterly blows away
             | what mine was capable of in 2007 for nearly a hundred a
             | month can now be had for about $10 per month. Yet everyone
             | wants these monolith platforms, but even that wouldn't be
             | the worst thing ever, except that every one of these
             | platforms has a backend to support that we in the Old
             | Internet never did: a C-suite's worth of executives and
             | millions of shareholders, who for some reason have decided
             | that reddit can't exist unless reddit makes them reams and
             | reams of money.
             | 
             | I'd be very, very interested to see how much of, even
             | what's probably the most massive one of all, Facebook, is
             | non-essential busywork that could easily be shut down
             | tomorrow with no adverse effects to the platform. Firstly
             | the entire executive class, just, they don't do shit to
             | make Facebook the product. In fact I'd argue their
             | decisions almost universally have made it worse as a
             | product very consistently for its entire lifetime. Then,
             | all the marketing people. There's just no goddamn reason to
             | advertise Facebook (or reddit for that matter) the brand is
             | so ubiquitous, if you actually found someone who'd never
             | heard of it, I'd give you a large chunk of money. Add to
             | that, if Facebook was doing a _good job_ of being what it
             | ostensibly is, then people immediately become the best
             | advertising, because people want to hang with people in
             | these digital spaces. Then get rid of the people working to
             | make Facebook addictive with dark patterns. Then get rid of
             | the entire targeted ad division, because it's gross and
             | inhumane. Pare the company down to engineers who build the
             | product, and if anything, _expand_ the moderation team so
             | they can actually ensure the safety of the platform, and of
             | course the IT staff to back them. Now what does Facebook
             | cost to operate?
             | 
             | As far as I'm concerned, this pearl-clutching about "well
             | websites have to make money" is grossly, grossly
             | overstated. Websites don't cost that much to run. A ton of
             | money is being siphoned off by the MBA parasites playing
             | games in Excel all day. A ton more is being wasted
             | developing features that advertisers want and users hate. A
             | ton more is being funneled into making products
             | artificially addictive to vulnerable people, to exploit
             | them, so let's just not do that. And of course, leadership,
             | rewarding themselves with generous compensation packages
             | they aren't even remotely able to justify. _Now_ what does
             | your website cost to maintain? Surely not nothing, and for
             | websites of substantial size, it will still be high, but
             | I'm willing to bet it's a hell, hell, hell of a lot less
             | than it was before.
        
               | kjkjadksj wrote:
               | Part of the issue is that it isn't just the web, but the
               | inevitable american corporate shareholder model. Even
               | businesses could be mom and pop ified and made way more
               | popular overnight: quit raising prices and cutting
               | corners and it would actually stand for itself like a
               | massive $7 burrito. However the expectation is that
               | shareholders get returns. Costs must be cut. Prices must
               | be raised. Margins must be improved. It doesn't matter if
               | this eats the business alive, as shareholders are
               | sufficiently leveraged. The whole system is incentivized
               | to select for inferior quality and taking all the
               | available money on the table.
        
               | ToucanLoucan wrote:
               | My rant above and your response reminded me of all those
               | tons of MMO games out there that are ancient, with a tiny
               | playerbase, that remain profitable nonetheless simply
               | because if you have a product that people like using,
               | putting it into maintenance mode and doing the bare
               | minimum to keep it running is a perfectly valid business
               | strategy. The companies that buy these service games and
               | run them effectively just buy completed money printers
               | and keep them operating. It's not going to make anyone
               | rich probably, but it's a perfectly valid and profitable
               | way to go about things.
               | 
               | The silicon valley "grow at all costs, always evolve and
               | innovate forever" model is so detached from the reality
               | of most businesses in my experience.
        
               | isoprophlex wrote:
               | In biology, you'd call that a cancer. But to people
               | praising the gospel of VC money, it's something
               | desirable...
        
               | Suppafly wrote:
               | >The companies that buy these service games and run them
               | effectively just buy completed money printers and keep
               | them operating.
               | 
               | I hadn't really thought about that topic in that way
               | before. Really explains why some of those older MMOs have
               | no desire to really make any improvements, the owners are
               | happy to just keep them powered up and collect a check
               | but have no incentive to invest in making them better.
        
               | ToucanLoucan wrote:
               | I think the notion that sometimes things are just "done"
               | is incredibly undervalued in our industry. Frankly I wish
               | a ton of games I play would STOP updating.
        
               | Suppafly wrote:
               | >I think the notion that sometimes things are just "done"
               | is incredibly undervalued in our industry.
               | 
               | I agree, but also the flip side is that things rapidly
               | switch from 'done and working' to 'dead' pretty quickly
               | if no one is willing to do minor maintenance.
        
               | u8080 wrote:
               | Yeah, like Rockstar with GTA V Online.
        
               | lotsofpulp wrote:
               | >Websites don't cost that much to run.
               | 
               | Popular websites that allow user content to be uploaded
               | or linked do cost that much to run, due to content
               | moderation.
               | 
               | There might be a small (relatively) forum here and there
               | that a few good moderators are willing to slave away at
               | keeping clean, but you will never see a website that
               | allows user content with as many users as
               | Reddit/Youtube/Instagram/etc be cheap.
               | 
               | Although, due to AI, the cost to spam the small forums
               | might be so small that even they might come into the
               | crosshairs.
        
               | megaman821 wrote:
               | Although it is quite surprising that mainly text websites
               | (Reddit, Twitter) are hard to run sustainably but video
               | and image websites (YouTube, Instagram, TikTok) can
               | because it is easier to sell ads against them.
        
             | meiraleal wrote:
             | how can we keep paying the ever-growing profits of multi-
             | trillion dollar companies? This is insane.
        
               | jsnell wrote:
               | Reddit is 100x from being a trillion-dollar company, and
               | is not profitable.
        
               | meiraleal wrote:
               | Reddit offers no magic; it is just a forum. Google used to
               | do some magic decades ago and still profits from it.
        
           | BeetleB wrote:
           | I know people will hate to hear this, but Reddit is not
           | important to the well-being of the Internet.
        
             | TeaBrain wrote:
             | I think it's the other way around, in that people don't
             | like to hear how Reddit has become important due to the
             | death of independent forums and the degree to which
             | information has become concentrated on the site.
        
               | BeetleB wrote:
               | The death of independent forums has been greatly
               | exaggerated.
               | 
               | Of all the forums I used to be active in, many are still
               | active. The ones that died did so because the community
               | died (i.e. they did not shift to Reddit and the like).
               | 
               | Reddit is great simply because it allowed _anyone_ to
               | create a community. No need to get a LAMP stack and deal
               | with security vulnerabilities in your forum SW.
               | 
               | These days you have Lemmy and its ilk. Much higher
               | barrier than the old LAMP stack, but also much superior
               | to it. I do hope it takes off.
        
         | fredgrott wrote:
         | the article quotes reddit policy change: Reddit considers
         | search and ads commercial activities and thus subject to
         | robots.txt block and exclusion.
        
         | ColinHayhurst wrote:
         | Person extensively quoted in the article here. They are welcome
         | to reach out. But not a single person from any level did that,
         | nor replied to my polite requests to explain and engage. We
         | first contacted them in early June and by 13th June, I had
         | escalated to Steve Huffman @spez.
        
           | toomuchtodo wrote:
           | An acquaintance investigating Reddit's moderation
           | mechanization inquired how a major subreddit was moderated
           | after an Associated Press post was auto removed by automod.
           | They were banned from said sub. They inquired why they were
           | banned, and they shared they would share any responses with a
           | journalism org (to be transparent where any replies would be
           | going, because they are going to a journalism org). They were
           | muted by mods for 28 days and were "told off" in a very poor
           | manner (per the screenshots I've seen) by the anonymous mod
           | who replied to them. They were then banned from Reddit for 3
           | days after an appeal for "harassment"; when they requested
           | more info about what was considered harassment, they were
           | ignored. Ergo, inquiring as to how the mods of a major sub
           | are automodding non-biased journalism sources (the AP, in
           | this case) without any transparency appears to be considered
           | harassment by Reddit. The interaction was submitted to the
           | FTC through their complaint system to contribute towards
           | their existing antitrust investigation of Reddit.
           | 
           | Shared because it is unlikely Reddit responds except when
           | required by law, so I recommend engaging regulators (FTC, and
           | DOJ at the bare minimum) and legislators (primarily those
           | focused on Section 230 reforms) whenever possible with
           | regards to this entity. They're the only folks worth
           | escalating to, as Reddit's incentives are to gate content,
           | keep ad buyers happy, and keep the user base in check while
           | they struggle to break even, sharing as little information
           | publicly as possible along the way [1] [2].
           | 
           | [1]
           | https://www.bloomberg.com/news/articles/2024-05-09/reddit-
           | la... | https://archive.today/wQuKM
           | 
           | [2] https://www.sec.gov/edgar/browse/?CIK=1713445
        
         | ColinHayhurst wrote:
         | The blocks for MojeekBot, a Cloudflare-verified and respectful
         | bot for 20 years, started before the robots.txt file changes.
         | We first noticed in early June.
         | 
         | We thought it was an oversight too at first. It usually is.
         | Large publishers have blocked us when they have not considered
         | the details, but then reinstated us when we got in touch and
         | explained.
        
         | Closi wrote:
         | > But I can understand why they made the change they did. The
         | data was being abused.
         | 
         | Depends how you see it - if you see it as 'their' data (legally
         | true) or if you see it as user content (how their users would
         | likely see it).
         | 
         | If you see it as 'user content', they are actually selling the
         | data to be abused by one company, rather than stopping it being
         | abused at all.
         | 
         | From a commercial 'let's sell user data and make a profit'
         | perspective I get it, although it does seem short-sighted to
         | decide to effectively de-list yourself from alternative search
         | engines (guess they just got enough cash to make it worth their
         | while).
        
           | Ajedi32 wrote:
           | > if you see it as 'their' data (legally true)
           | 
           | Is that actually true? Reddit may indeed have a license to
           | use that data (derived from their ToS), but I very much doubt
           | they actually own the copyright to it. If I write a comment
           | on Reddit, then copy-paste it somewhere else, can Reddit sue
           | me for copyright infringement?
        
             | jedberg wrote:
             | They own a non-exclusive worldwide right to it. You own the
             | copyright, they have a license to use it however they see
             | fit.
        
           | passwordoops wrote:
           | Enough cash or enough data on hand to show the majority of
           | traffic comes from the search monopoly
        
         | ekidd wrote:
         | I personally feel that this kind of "exclusive search only by
         | Google deal" should result in an anti-trust case against
         | Google. This is the kind of abuse of monopoly power that caused
         | anti-trust laws to be passed in the 1890s.
        
           | eddd-ddde wrote:
           | if i create a vacuum cleaner and decide to only sell it at
           | Walmart you can't get mad at me for not wanting to sell it at
           | costco
           | 
           | you can always buy a competitor's or make your own vacuum
           | cleaner if you hate buying at Walmart
           | 
           | maybe what you are really mad about is Reddit monopolising
           | content
        
             | ekidd wrote:
             | Usually, to trigger any kind of anti-trust law, you need to
             | have massive market share. In this case, for example,
             | Reddit almost certainly hasn't committed any antitrust
             | violations, because they're a relatively minor player in
             | their market.
             | 
             | Similarly, if you start a vacuum cleaner company, you can
             | make whatever exclusive deals you want. But if you control
             | 80% of the market for vacuum cleaners, then you might need
             | to be more careful about leveraging your market share in
             | unfair ways.
             | 
             | If a company is part of a robust, competitive market (like
             | Reddit), it's usually wiser to let customers vote with
             | their wallets, and leave the government out of it. If a
             | company becomes massively dominant (like Google or
             | TicketMaster), and if it starts pushing exclusive
             | contracts, it's much harder for customers to switch away.
        
       | PaulRobinson wrote:
       | This is great. It means I won't see Reddit content popping up all
       | over search results in other engines. Can Medium do the same? And
       | perhaps Quora?
        
         | lfkdev wrote:
         | Yeah awesome, reddit was one of the last useful results besides
         | the spam blogs and AI-generated articles.
        
         | bdjsiqoocwk wrote:
         | What a weird thing to say. Reddit has for a long time been a
         | place where real people hang out and have real conversations,
         | unlike quora and medium.
        
           | MattPalmer1086 wrote:
           | It's not strange to me. Every single time I've followed a
           | Reddit link from search results, I've got a short and fairly
           | useless conversation that doesn't help me at all. So I have
           | never understood why people like it.
           | 
           | Obviously, people do see value in it, or they wouldn't keep
           | saying so! I would happily exclude Reddit links from search
           | results though.
        
           | candiddevmike wrote:
           | I think Reddit lost that kind of authenticity a while ago.
           | Advertisers know the "site:reddit.com <product>" trick, and
           | when you look at the number of upvotes, it costs _pennies_ to
           | get your product trending in the comments.
        
             | Suppafly wrote:
             | I don't search reddit for <product> though I search it for
             | <highly technical issue with product> because reddit is the
             | only place where real people discuss such issues and the
             | solutions to them.
        
           | VancouverMan wrote:
           | > where real people hang out and have real conversations
           | 
           | I don't consider the discussions there to be "real" in any
           | meaningful way, thanks to the extensive moderation.
           | 
           | From what I've seen, there typically ends up being a small
           | handful of moderator-enforced narratives that are deemed
           | "acceptable" for a given subreddit, and any commenters
           | deviating from those narratives get banned, or their comments
           | end up as "[removed]" by "[deleted]", or the comments get
           | obscured with the "comment score below threshold" notice.
           | 
           | It's generally some of the most one-sided and blandest
           | discussion around. Given that there's often no meaningful
           | back-and-forth involving differing perspectives of any sort,
           | I'm not even sure if it should be considered "discussion".
           | It's more like regurgitation and repetition.
           | 
           | I've found the situation to be particularly bad on the
           | Canadian locale-specific subreddits, for example, but
           | enough of the tech-oriented ones I've seen seem to end up
           | like that, too.
        
           | psunavy03 wrote:
           | Yeah, but each sub, to a greater or lesser degree, has its own
           | hivemind you'll be run out of town (or possibly even banned)
           | for challenging. And the average member of Reddit is quite
           | willing to spout off confidently incorrect BS and downvote
           | people into the ground who actually know what they're talking
           | about.
           | 
           | Not exactly always a reliable source of info outside
           | uncontroversial niche topics or places like /r/AskHistorians
           | that actually moderate. And even there I've seen the
           | occasional humdinger.
        
         | kingnothing wrote:
         | What use do you get out of a search engine if not searching for
         | reddit and other forums? The rest of the internet has become a
         | cesspool of useless AI generated crap.
        
           | kevincox wrote:
           | To be fair Reddit threads are more and more often getting
           | filled with useless AI generated crap as well.
        
           | jjulius wrote:
           | To be fair, Reddit has plenty of astroturfing, too.
        
         | jonpurdy wrote:
         | FYI, Kagi lets you do this and personalize it as you desire.
         | They even share aggregated stats* about which domains users
         | choose to block/lower. (Mine generally match these stats.)
         | 
         | * - https://kagi.com/stats?stat=leaderboard&k=-2
        
           | WarOnPrivacy wrote:
           | > Kagi lets you do this and personalize it as you desire.
           | 
           | Kagi shill here. Are they finally applying filters and
           | operands to image searches?
           | 
           | Asking because it was a tough year seeing Pinterest as top
           | filter choice _and_ top result in images (when set as
           | filter=block).
           | 
           | (edit: I just tried searching->image: beautiful quilt
           | patterns. I didn't spot any Pinterest results!)
           | 
           | I have never understood why DDG, etc steadfastly refuse to
           | obey operands in image searches. Most days. Every blue moon
           | operands seem to work. I think.
           | 
           | sidebar: Yesterday I saw Yandex obey quotes in a web search.
           | It was the 1st time I've seen that.
        
             | hugh_kagi wrote:
             | > Are they finally applying filters and operands to image
             | searches?
             | 
             | That was a bug, apologies. It should be fixed now.
        
         | troyvit wrote:
         | Kagi lets you configure the search engine to deprioritize or
         | even fully eliminate search results. They ride on the back of
         | Google's indexing so -- if you ever change your mind -- you
         | could bring reddit searches back.
        
         | Suppafly wrote:
         | >This is great. It means I won't see Reddit content popping up
         | all over search results in other engines.
         | 
         | Honestly, that makes those other engines way less valuable
         | because for many topics, telling the engine to specifically
         | narrow the results down to reddit comments is the only way to
         | get a decent answer to what you're looking for. I'd definitely
         | support blocking Quora from everything though.
        
         | rkangel wrote:
         | Interesting. I have long found Reddit to be an excellent
         | source of solutions to problems. Stack Overflow usually beats
         | it for programming specific stuff, but for everything else
         | usually the most helpful answer comes from Reddit. It's a real
         | person, helping another real person with a real problem.
        
       | nomilk wrote:
       | Suppose a crawler or rival search engine doesn't respect
       | robots.txt: Reddit can't stop them. It can make things a bit
       | trickier, yes, but not stop them.
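For context on the robots.txt point above: the file is only a request that a well-behaved crawler checks before fetching, which is why it cannot stop a crawler that ignores it. A minimal sketch using Python's standard urllib.robotparser follows. The rules, the "ExampleBot" name, and the URL are hypothetical and are not Reddit's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not Reddit's real file):
# one named crawler is allowed, every other user agent is disallowed.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/r/some_thread"
    for agent in ("Googlebot", "ExampleBot"):
        print(agent, is_allowed(EXAMPLE_ROBOTS_TXT, agent, url))
```

The check runs entirely on the crawler's side. A crawler that never calls something like is_allowed() is unaffected by the file, so any real blocking has to come from server-side measures.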
        
         | eschneider wrote:
         | It is evidence that they didn't have permission if you sue
         | them.
        
           | kingnothing wrote:
           | There are no grounds on which to file suit. The 9th Circuit
           | found that scraping publicly accessible data is legal.
           | 
           | https://techcrunch.com/2022/04/18/web-scraping-legal-court/
        
             | tagawa wrote:
             | This is not even scraping - it's just crawling and
             | indexing.
        
         | miyuru wrote:
         | reddit blocked datacenter IPs even before this change.
        
           | nomilk wrote:
           | Could a motivated scraper not buy IPs/proxies that aren't in
           | those ranges, i.e. to blend in with general users?
        
             | xeromal wrote:
             | Just like every security feature in the physical and
             | digital worlds, security just inconveniences honest people
             | and the cost to bypass reduces the number of people who
             | try.
             | 
             | Eventually it becomes expensive to scrape reddit's data and
             | most people will stop.
        
             | Manuel_D wrote:
             | Proxy IPs are also known and typically blocked. In fact,
             | you can't even browse reddit without logging in when
             | connected to most proxies.
             | 
             | Many web scraping companies have loads of phones hooked up
             | in a rack in order to use mobile IPs. Companies can't just
             | block mobile IPs because their site would become unusable
             | for several city blocks (mobile IPs often correspond to a
             | specific cell tower). This is the face of modern web
             | scraping: https://i.imgur.com/U2RXi5G.jpeg
        
       | tempfile wrote:
       | Hopefully this paves the way for antitrust action, but I won't
       | hold my breath.
       | 
       | Reddit's justification for this is profoundly wrong. Their
       | "public content policy" is absurd doublespeak, and counter to
       | everything the open internet is and hopes to be. You cannot
       | simultaneously call yourself "open" and "public" while refusing
       | access to automated clients. Every client is automated. They even
       | go so far as to say that "crawling" (also known as "downloading")
       | is an "abuse" and violates user privacy.
       | 
       | This is absurd, and not justified. I would love to see
       | legislation that restricted server operators' ability to prohibit
       | automated access in this way, but I suppose it will never happen.
       | Some people in this thread have attempted to justify the policy
       | by saying "they have to protect their income streams". No they
       | don't. You don't have a right to an income stream, and you
       | certainly don't have a right to lie in order to get all the
       | benefits of an open internet with none of the downsides. Noting
       | of course that the "downsides" are in this case actually just
       | "competitors".
        
         | semiquaver wrote:
         | Sorry, what is the antitrust concern about Reddit blocking
         | crawlers that aren't paying them? Surely you don't think Reddit
         | has a monopoly on anything?
         | 
         | Or are you somehow suggesting that it's google's fault that
         | Reddit took this step? I don't see any indication that's the
         | case.
        
           | em-bee wrote:
           | not that reddit has a monopoly, but that google has.
           | 
           | google is using their power to prevent others from competing.
           | 
           | the problem here is of course that if reddit were in
           | financial trouble (i don't know if they are but let's imagine
           | they need this money), they'd be between a rock and a hard
           | place.
           | 
           | if google were not allowed to make exclusive deals, and
           | reddit could not survive without the deal, then what would be
           | left? google buys reddit, or the relevant authority approves
           | of the deal?
           | 
           | i thought about the same problem with firefox. let's assume
           | firefox is forced to allow people to make a choice of the
           | default search engine (just like microsoft was forced to
           | allow a choice of default browser on windows) then google
           | might stop paying mozilla, and they could end up in financial
           | trouble.
           | 
           | ideally no company ever depends on a single other company
           | that much, but that only works if we don't allow companies to
           | grow that much in the first place.
        
             | ColinHayhurst wrote:
             | > let's assume firefox is forced to allow people to make a
             | choice of the default search engine
             | 
             | let's assume apple is forced to allow people to make a
             | choice of the default search engine in safari then google
             | might stop paying apple, and ...
        
               | tempfile wrote:
               | surely firefox is the more interesting example, since
               | they have orders of magnitude less alternative revenue?
        
             | asadotzler wrote:
             | > let's assume firefox is forced to allow people to make a
             | choice of the default search engine
             | 
             | Firefox has always allowed people to make a choice of the
             | default search engine, since before it was even called
             | Firefox. I know. I was there building it.
        
               | em-bee wrote:
               | yes, but the default is google, and you have to go into
               | the settings to make a choice, so most people keep the
               | default. what i meant was the EU directive for microsoft
               | where they actually had to put up a prompt at first use
               | asking the user which browser they want, without allowing
               | any default (and, i am not sure, maybe even a randomized
               | list)
               | 
               | if the same was done for search engine choice for firefox
               | then google would no longer be the default, and they
               | would have no reason to pay firefox for that.
        
           | tempfile wrote:
           | Yes, sorry, should have been more clear: I claim google is in
           | a monopoly position, not reddit. The rest of the comment is
           | unrelated ranting about reddit's betrayal of their
           | previously-held "public data is public" position.
        
       | r_singh wrote:
       | I wonder how Aaron Swartz would react to this
        
         | geodel wrote:
         | My guess is he'd freak out once he heard that lawyers and law
         | enforcement may get involved on this issue.
        
       | ykonstant wrote:
       | It's ironic, because Reddit is the only search engine that works
       | on Google now thanks to shittening.
        
         | maxwell wrote:
         | They're both running on fumes at this point.
        
           | riiii wrote:
           | Also sniffing them.
        
       | voisin wrote:
       | Makes sense that Google did this deal since their search quality
       | tanked and they became a de facto front end UI for Reddit.
        
         | NoMoreNicksLeft wrote:
         | Up until 2016 (I think, +/- 1 year), if you could remember 3
         | uncommon words in a comment, you could find any reddit post
         | instantly on Google. I'd want to follow up on a thread from
         | weeks ago, and it was magic. Number one result. Then one day
         | that just stopped working, and even adding site:*.reddit.com
         | didn't fix it. At the time, I think, I didn't realize that it
         | was mostly Google's fault, I thought maybe Reddit had changed
         | their infrastructure so that it couldn't be crawled properly.
         | 
         | Google hasn't been a search engine in a long while, it's just
         | an advertisement engine now.
        
           | dev1ycan wrote:
           | it's so bad it's crazy, you can legit not find stuff on the
           | internet anymore, it's the same with youtube, I search
           | something and get like 20 or so results and then everything
           | else is hidden.
           | 
           | it started when youtube removed the ability to search for
           | videos older than 5 years, if I had to guess? cost saving,
           | have every old video in cheaper storage... but it sort of
           | fragments youtube, every couple of years you only get newer
           | content.
        
         | LegitShady wrote:
         | "we noticed that since our search results had gotten so bad
         | nobody can use them to find the things they want, people just
         | kept adding "reddit" to search terms anyways, so we figured we
         | might as well make it official and exclusive"
        
       | lowbloodsugar wrote:
       | Funny that source of TFA blocked me from reading the whole thing.
        
       | roughly wrote:
       | Boy, the LLMs have really been an apocalypse moment for the web,
       | haven't they? Between hoovering up and monetizing every bit of
       | content they can without any attribution or compensation and the
       | absolute flood of mediocre generated content, they've really done
       | in the last straggling remains of the open internet.
       | 
       | It's not like everyone wasn't already pulling the same grift, but
       | quantity really does have a quality all its own.
        
         | imglorp wrote:
         | Of course, we have to be careful not to villainize a neutral
         | tech. Instead let's call it what it is: unchecked capitalism
         | and monopolistic behaviors.
         | 
         | Capitalism seems to work ok for the common good until you
         | remove all the protections. LLMs provide a de facto monopoly for
         | the owner which must already be a near monopoly: they take vast
         | resources to train; only a giant corp can afford to buy all the
         | content and provision enough resources to train one.
         | 
         | LLMs did not enshittify what's left of the internet, greed did
         | it.
        
       | mediumsmart wrote:
       | that is awesome but I can't open old.reddit.com in my browser so
       | it's a non-issue.
        
       | daft_pink wrote:
       | I don't understand how this isn't anti-competitive behavior. It
       | seems like reddit has to offer this deal with similar terms to
       | google's competitors.
        
         | talldayo wrote:
         | They do offer that deal to others; it was a big news story
         | when OpenAI bought the data Reddit was selling:
         | https://openai.com/index/openai-and-reddit-partnership/
        
           | dathinab wrote:
           | yep, but for things which are "only" search engines it's not
           | a viable offer. Only if you expect "big AI business value"
           | from it does it make sense, maybe.
        
         | Suppafly wrote:
         | Most business deals are anti-competitive in some way. What
         | makes you think this specifically rises to the level where
         | they'd _legally_ have to offer similar terms to competitors?
        
         | carlosjobim wrote:
         | Why in the world would they have to do that? There are
         | thousands of exclusive business-to-business deals being signed
         | into action every second of the day.
        
         | eddd-ddde wrote:
         | I don't see how this tracks at all. Companies can decide to
         | only sell their products with some retailer if they want. You
         | can't force them to make deals with other companies.
        
           | gtirloni wrote:
           | You certainly can in monopoly situations (which apparently
           | isn't the case here).
        
       | mutatio wrote:
       | It's funny in the context of Google's past motto of "don't be
       | evil". I feel the right thing for Google here would have been to
       | decline any deal regarding exclusivity, then Reddit wouldn't have
       | pulled the trigger with its robots.txt update. The entire
       | manoeuvre required both parties.
        
         | peddling-brink wrote:
         | Google should abandon its mission to "organize the world's
         | information" because doing so requires spending money for
         | valuable data, and others might not want to spend that money?
        
       | tbeseda wrote:
       | https://archive.li/GS2I0
        
       | dathinab wrote:
       | Worse, it doesn't even really "work" anymore, given how most
       | searches are flooded with garbage SEO results and paid
       | advertisements "basically" looking like search results (most of
       | the time these are more garbage rather than the results you are
       | looking for; in the cases where they aren't, it is quite often
       | along the lines of "Google's algorithm blackmailing companies to
       | buy ads for users who want to find them through Google but
       | wouldn't without ads").
       | 
       | I wonder if this might affect Reddit, as in slowly kill its user
       | base, especially when it comes to users providing (and often also
       | looking for) high quality content, because which of those users
       | would want to use Google search?
        
         | john-radio wrote:
         | > Worse it doesn't even really "work" anymore, given how most
         | searches are flooded with garbage SEO results and paid
         | advertisements "basically" looking like search results ...
         | 
         | I don't understand what you're saying. That's exactly why
         | people append `site:reddit.com` to their searches in the first
         | place, because those search results typically aren't like that.
        
           | wwweston wrote:
           | Or at least, reddit posts and comments that _are_ content
           | messaging / marketing (human or AI) fit in better with
           | earnest and natural posts, so that they're more effective.
        
       | wtf242 wrote:
       | This problem is only going to get worse. For my
       | thegreatestbooks.org site I used to just get indexed/scraped by
       | Google and Bing. Now it's like 50+ AI bots scraping my entire
       | site just so they can train an LLM to answer questions my site
       | answers without having a user ever visit my site. I just checked
       | Cloudflare and in the past 24 hours I've had 1.2 million
       | bot/automated requests.
        
         | sct202 wrote:
         | There's a new setting in Cloudflare to block AI/scraper bots.
         | https://blog.cloudflare.com/declaring-your-aindependence-blo...
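A rough way to see how much automated traffic identifies itself as an AI crawler is to tally User-Agent strings from an access log. The sketch below assumes the log has already been reduced to a list of User-Agent strings. The token list is illustrative and incomplete, and the sample strings are made up. A crawler can send any User-Agent it likes, so this only counts bots that announce themselves.

```python
from collections import Counter

# Substrings that some AI-related crawlers are known to include in their
# User-Agent header. Illustrative and incomplete; adjust for your own logs.
AI_CRAWLER_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")


def count_ai_crawler_hits(user_agents):
    """Count requests per crawler token in an iterable of User-Agent strings."""
    hits = Counter()
    for ua in user_agents:
        lowered = ua.lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
                break  # attribute each request to at most one token
    return hits


# Made-up sample data standing in for parsed access-log entries.
SAMPLE_USER_AGENTS = [
    "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "CCBot/2.0 (https://commoncrawl.org/faq/)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Mozilla/5.0 (compatible; GPTBot/1.0)",
]

if __name__ == "__main__":
    for token, count in count_ai_crawler_hits(SAMPLE_USER_AGENTS).items():
        print(token, count)
```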
        
       | venkat223 wrote:
       | google is selfish
        
       | StrauXX wrote:
       | IANAL but as far as I understand the current legal status (in the
       | US) a change in robots.txt or terms and conditions is not binding
       | for web scrapers since the data is publicly accessible. Neither
       | does displaying a banner "By using this site you accept our terms
       | and conditions" change anything about that. The only thing that
       | can make these kinds of terms binding is if the data is only
       | accessible after proactively accepting terms. For instance by
       | restricting the website until one has created an account.
       | LinkedIn lost a case against a startup scraping and indexing
       | their data because of that a few years ago.
        
         | jpalomaki wrote:
         | Quite sure they are also enforcing these with some technical
         | measures to limit scraping.
        
           | renlo wrote:
           | As was LinkedIn, who was forced to stop rate limiting /
           | IP-banning scrapers for public pages.
        
         | qingcharles wrote:
         | At the federal level; but states have their own laws. For
         | instance, it can get you 5 years in prison in Illinois to
         | violate a web site ToS.
         | 
         | https://www.ilga.gov/legislation/ilcs/ilcs4.asp?DocName=0720...
        
           | redcobra762 wrote:
           | Has anyone ever been successfully prosecuted for violating
           | this statute?
        
       | numbers wrote:
       | "Information is power. But like all power, there are those who
       | want to keep it for themselves. The world's entire scientific and
       | cultural heritage, published over centuries in books and
       | journals, is increasingly being digitized and locked up by a
       | handful of private corporations." - Aaron Swartz (2008)
        
       | Khelavaster wrote:
       | robots.txt isn't legally binding. Can Reddit really force Bing
       | not to crawl it..?
        
       | melodyogonna wrote:
       | Wait that's actually terrible.
        
       | bitpush wrote:
       | When Microsoft strikes an exclusive deal with OpenAI to use their
       | models, it is a smart, brilliant, clever move.
       | 
       | When Apple strikes an exclusive deal with suppliers for parts, it
       | is sound business practice.
       | 
       | When Google strikes an exclusive deal with Reddit, it is ..
       | 
       | Some of you have no idea how businesses work, and it shows.
        
         | riku_iki wrote:
         | > When Google strikes an exclusive deal with Reddit, it is ..
         | 
         | It's because reddit is selling content created by users, based
         | on promises that reddit supports open internet, open data, etc,
         | without their consent and without sharing revenue, which may be
         | legal but likely not ethical.
        
           | bitpush wrote:
           | Let's get specific. You're confusing copyright with
           | licensing.
           | 
           | The users hold the copyright (reddit claim that they made the
           | meme) but reddit has the non-exclusive right to redistribute
           | and license the content.
           | 
           | Two different things.
        
             | riku_iki wrote:
             | > but reddit has the non-exclusive right to redistribute
             | and license the content.
             | 
             | that's what I said: it is legal.
        
       | arnaudsm wrote:
       | I understand the AI context, but this is dangerously
       | anticompetitive for other search engines.
       | 
       | This is a dangerous precedent for the internet. Business
       | conglomerates have been controlling most of the web, but refusing
       | basic interoperability is even worse.
        
         | zooq_ai wrote:
         | There is nothing preventing search companies from paying the
         | same $60 Million to license content.
         | 
         | If reddit had an exclusive agreement, it would be
         | anti-competitive.
         | 
         | This is classic HN anti-Google tirade (and downvoting facts,
         | logic and concepts of free market)
        
           | pluc wrote:
           | Paying 60 million to every site you want to index is also a
           | bad precedent to set. Why can Reddit get paid and XYZ can't?
        
             | zooq_ai wrote:
             | Anyone can ask for a licensing deal. I'm sure NY Times, Condé
             | Nast all have licensing deals. Mr. Beast signed a deal with
             | Amazon. Joe Rogan with Spotify. Why is it hard to
             | understand?
             | 
             | Even HN can get a licensing deal if they want to.
             | 
             | If you are producing content, you have every right to do
             | what you want to with the content.
        
               | SlackingOff123 wrote:
               | Reddit is not producing any content; its users are.
        
               | zooq_ai wrote:
               | Not the point. If users don't like it they can go
               | somewhere else to post.
               | 
               | For practical purposes, reddit can do whatever they want
               | with users' posts. It's right there in the ToS.
        
               | renewiltord wrote:
               | Users sign a deal to give Reddit the content.
        
             | spixy wrote:
             | maybe Reddit has more value than XYZ?
        
           | not_wyoming wrote:
           | > There is nothing preventing search companies paying the
           | same $60 Million to license content.
           | 
           | Yes, actually, there is - having $60m to throw around.
           | 
           | "Barriers to entry often cause or aid the existence of
           | monopolies and oligopolies" [0]. Monopolies and oligopolies
           | are definitionally the opposite of free market forces. This
           | is quite literally Econ 101.
           | 
           | [0] - https://en.wikipedia.org/wiki/Barriers_to_entry
        
             | saghm wrote:
             | Not to mention the fact that if this became commonplace,
             | other websites might start charging as well
        
             | zooq_ai wrote:
             | Having a Monopoly != Anti-Competitive.
             | 
             | Having Barriers to Entry != Anti-competitive
             | 
             | Yes, large players have advantages of Economies of Scale.
             | 
             | Just because you can't run an Airline because you don't
             | have money to buy an Airplane isn't anti-competitive.
             | 
             | Today Microsoft, Apple, OpenAI, Google, Amazon all can
             | afford those piddly $60m to license from reddit.
             | 
             | Not Anti-competitive at all.
             | 
             | But saddened by how much corporate-hate by HNers destroys
             | their credibility in debating these things.
             | 
             | Go ahead downvote
        
               | not_wyoming wrote:
               | If you check citations, you'd find the sentence preceding
               | my excerpt on barriers to entry:
               | 
               | "Because barriers to entry protect incumbent firms and
               | restrict competition in a market, they can contribute to
               | distortionary prices and are therefore most important
               | when discussing antitrust policy."
               | 
               | Antitrust policy then links to a page on competition law:
               | "Competition law is the field of law that promotes or
               | seeks to maintain market competition by regulating anti-
               | competitive conduct by companies." [0]
               | 
               | So yes, I'd downvote you if I could, but HN doesn't allow
               | downvotes - which is honestly pretty fitting in the
               | context of this conversation.
               | 
               | [0] - https://en.wikipedia.org/wiki/Competition_law
        
               | zooq_ai wrote:
               | Once again buying an Airplane and starting an Airline
               | business has probably the highest barrier to entry. Yet
               | the Airline industry is the most competitive.
        
               | not_wyoming wrote:
               | The air travel industry has also seen some of the most
               | significant government regulation in the form of blocking
               | mergers (ie monopolistic, anticompetitive behavior) -
               | meaning that competition in the airline space is due to
               | regulation, not free market dynamics alone.
               | 
               | I'm happy to continue this debate if you'd like to start
               | supporting your posts with citations but probably won't
               | engage further unless you do. Have a great day!
        
             | try_the_bass wrote:
             | And yet "free market forces" are often the reason why
             | monopolies and oligopolies arise...?
             | 
             | Monopolies are entirely consistent with free market
             | economics. After all, if there's clearly a "best product"
             | for a particular niche, it's entirely rational (free market
             | actor) behavior for everyone to use the same product,
             | leading to its monopoly in that market segment.
             | 
             | I don't understand why people think this isn't/won't
             | be/shouldn't be a common result of "free market forces".
        
               | not_wyoming wrote:
               | > Monopolies are entirely consistent with free market
               | economics.
               | 
               | This is a fair critique. I'm approaching this from an
               | admittedly American perspective in which "free market"
               | colloquially implies competition - but I recognize that
               | competition is not inherently a free market concept.
               | 
               | Good callout!
        
             | GuB-42 wrote:
             | Microsoft can throw around $60M, and Bing is used by most
             | of the "alternative" search engines.
             | 
             | It doesn't solve the problem, but if money is the only
             | thing preventing search engines from accessing Reddit, then
             | what goes for Google also goes for Microsoft.
        
       | thih9 wrote:
       | Story / rant warning.
       | 
       | I remember seeing an unhelpful hyperlink for the first time. It
       | was a random word in the body of a random tech site that
       | redirected to a list of articles from that site tagged with that
       | term.
       | 
       | I remember being stunned; my expectation was that the link would
       | lead me to another website, one that would be an authoritative
       | source on that term and freely accessible.
       | 
       | 20 years later we get a paywalled article about the fragmented
       | web - and we're not slowing down.
        
       | lmeyerov wrote:
       | FWIW, we contacted the Reddit sales team about paying for data
       | sometime last year, as we do elsewhere for use cases like
       | helping emergency responders, and even though they were launching
       | the program and asking for customers... no email back. Nor on our
       | second and, I think, third attempts.
       | 
       | I'm not sure what to make of that.
        
         | morkalork wrote:
         | How much were you willing to pay? Still, rude of them not to
         | even discuss the issue. Every time I've gone to buy data, if
         | I'm too small of a fish, vendors have always been happy to
         | refer me to a reseller.
        
           | lmeyerov wrote:
           | We do 4-6 figures/yr for providers which is normal in our
           | world
           | 
           | An enterprise sales team with only 1 customer happens (e.g.,
           | Mozilla's search bar), but... That's surprising here, and
           | scary as a sustainable & scalable business. Ignoring 5-6
           | figure/yr inquiries says a lot to me. In contrast, we did
           | that same-day with Twitter without talking to anyone.
        
           | heisenbit wrote:
           | Certainly rude but also possibly legally problematic. If they
           | were judged to be in a dominant position in a market and were
           | found making deals with exclusivity then it can get
           | expensive.
           | 
           | It all depends, of course, on what the market is. If one
           | looks at Reddit not as a whole but as a collection of niches,
           | then one could imho find niches where Reddit has a dominant
           | knowledge position.
        
       | jumploops wrote:
       | IIRC, GPT-2 was primarily trained on Reddit[0]
       | 
       | [0] https://www.reddit.com/r/ChatGPT/comments/133xgb5/gpt2_was_p...
        
       | neilv wrote:
       | I'm concerned about this in multiple ways, but I could also see
       | some positive fallout from it, if it sets precedents that help
       | protect 'content' owners from AI goldrush companies just taking
       | everything.
        
         | gtirloni wrote:
         | AI companies are the least of our worries in the Reddit
         | situation. The fact that Reddit has full control of user-
         | generated data gives them freedom to do as they please. I
         | think this is the crux of today's issue.
         | 
         | AI companies like Google, Microsoft and OpenAI have deep
         | pockets to 'unprotect' themselves from anything. The barrier to
         | entry is for small AI companies and those aren't really making
         | an impact currently.
        
       | r_singh wrote:
       | From Reddit's perspective, they have nothing to lose, really.
       | It's not like other search engines are going to pay any attention
       | to robots.txt, and Google's AI would still have scraped data from
       | Reddit regardless of the deal. Now they will possibly just feel
       | less bad about not citing sources, depending on the user
       | experience they want to deliver.
        
       | dbg31415 wrote:
       | Every time I think, "How scummy..." Reddit always finds another
       | way to go lower.
        
       | earthboundkid wrote:
       | They literally think the scissor statement is a real thing that
       | will really work, fml.
        
       ___________________________________________________________________
       (page generated 2024-07-24 23:09 UTC)